A Service Level Objective (SLO) is defined for a single Honeycomb dataset. To define and measure your SLO in Honeycomb, complete the following:
Create a Derived Column that returns true, false, or null to represent your Service Level Indicator (SLI).
To identify a suitable SLI, first express it in terms of user goals. For example, “a user should be able to load our home page and see a result quickly.”
Identify qualified events, that is, the events that contain information about the SLI.
In this example, our qualified events are events where request.path = "/home".
For those events, the criterion for your SLI determines which events are considered “successful”.
In this case, success means duration_ms < 100.
If an event is not qualified, then null is returned.
If an event is qualified, then the result is true if the event passes the criterion and false if it does not.
Honeycomb uses a Derived Column to define the SLI and evaluates “success” according to your definition.
The Derived Column returns true (for successful), false (for failed), or null (for not applicable) for each event in the dataset.
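As a rough illustration of this three-valued result, the following Python sketch mirrors the qualifier and criterion from the home page example. It is not how Honeycomb evaluates Derived Columns: the function name home_page_sli and the dictionary event shape are assumptions made for illustration, and the sketch assumes every qualified event carries a numeric duration_ms field.

```python
from typing import Optional


def home_page_sli(event: dict) -> Optional[bool]:
    """Hypothetical model of an SLI: True, False, or None for one event."""
    # Qualifier: only requests for the home page count toward the SLI.
    if event.get("request.path") != "/home":
        return None  # not qualified, so the event is "not applicable"
    # Criterion: a qualified event succeeds if it took under 100 ms.
    # Assumption: qualified events always include a numeric duration_ms.
    return event["duration_ms"] < 100


events = [
    {"request.path": "/home", "duration_ms": 42},    # qualified, successful
    {"request.path": "/home", "duration_ms": 250},   # qualified, failed
    {"request.path": "/login", "duration_ms": 30},   # not qualified
]
results = [home_page_sli(e) for e in events]
```

Here results holds one value per event, in the same order as the sample events.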
An SLI reports whether an event is successful or not in terms of the goals of the SLO. Before you configure the SLO, you must define the indicator that it uses to evaluate your level of success.
In the dataset where you plan to define the SLO, create a Derived Column to evaluate your Service Level Indicator (SLI).
From the left sidebar, select Data Settings.
Select the name of your target dataset to access its settings. The Dataset Settings page displays several tabs.
Select the Schema tab.
Select Derived Columns to expand.
The schema displays a button to create a new derived column, a search box, and a list of any existing derived columns.
Select Add new Derived Column.
A modal appears.
Enter a Name and Function at minimum.
Define your derived column expression within the Function section. Refer to the Derived Column Reference for syntax and a list of available functions. If the expression contains syntax errors, they appear underlined in red or marked with a red triangle. Hover over an error marker to display details about the error, or refer to the error message displayed by the editor.
For example, Honeycomb’s two-argument IF function can be convenient when you write your derived column: IF($a, $b) returns $b only if $a is true; otherwise, it returns null.
Therefore, most SLIs are written as IF(qualifier, criterion).
Continuing with the previous example, the derived column for this SLI would look similar to:
IF(EQUALS($request.path, "/home"), LT($duration_ms, 100))
Refer to SLI Formulas for more examples.
To test your SLI, query the associated dataset for a COUNT and a HEATMAP(duration_ms), broken down by the SLI derived column.
Confirm that you see three groups: true, false, and blank. (Blank events are those that are not qualified.)
Your current level is approximately #true / (#true + #false).
Confirm that the three groups look correct for your use case and understanding of the dataset’s contents.
This process is illustrated in our blog entry, Working Toward Service Level Objectives.
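The approximate current level can be computed from the COUNT values of the true and false groups, ignoring the blank group. This Python sketch is illustrative only: the function name current_level and the sample counts are made up and do not come from a real dataset.

```python
from typing import Optional


def current_level(true_count: int, false_count: int) -> Optional[float]:
    """Return #true / (#true + #false), or None if nothing is qualified."""
    qualified = true_count + false_count
    if qualified == 0:
        return None  # no qualified events, so the level is undefined
    return true_count / qualified


# Hypothetical counts read from a COUNT query grouped by the SLI column.
# Blank (unqualified) events are deliberately left out of the calculation.
level = current_level(true_count=9_850, false_count=150)
print(f"Current level: {level:.1%}")
```

With these sample counts, 9,850 of 10,000 qualified events succeeded, so the level is 98.5%.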
To define your SLO, answer the following question, where “qualified events” are as defined in your SLI:
Over what period of time do you expect what percentage of qualified events to pass the SLO?
For example, “I expect that 99% of qualified events will succeed over every 30 days” where 99% is the Target Percentage of success and 30 days is the Time Period being measured.
In addition to your SLI, your SLO uses these two variables (Target Percentage and Time Period) for its definition.
When you select a target, base it on your current state, which you can find by running a COUNT query grouped by your SLI derived column.
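One way to reason about a candidate Target Percentage is to translate it into a number of failures you can tolerate during the Time Period. The Python sketch below does that arithmetic. The function names and the event counts are hypothetical, and Honeycomb's own budget calculation may differ in detail.

```python
def allowed_failures(target_percentage: float, qualified_events: int) -> float:
    """Failures that still meet the target for this many qualified events."""
    return qualified_events * (100 - target_percentage) / 100


def budget_remaining(target_percentage: float,
                     qualified_events: int,
                     failed_events: int) -> float:
    """Fraction of the failure budget left; negative once it is overspent."""
    budget = allowed_failures(target_percentage, qualified_events)
    if budget == 0:
        raise ValueError("a 100% target or zero events leaves no budget")
    return (budget - failed_events) / budget


# Hypothetical example: a 99% target over a period with 1,000,000
# qualified events, 2,500 of which failed.
budget = allowed_failures(99, 1_000_000)
remaining = budget_remaining(99, 1_000_000, 2_500)
```

In this example the target tolerates 10,000 failures, and three quarters of that budget is still unspent. If your current level is already below the candidate target, the remaining fraction is negative from the start, which suggests choosing a lower target first.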
Once your SLO is defined, create your SLO in Honeycomb.
The form displays the following elements:
Monitor the SLOs for your team from the SLOs page and set up Burn Alerts, which provide notifications related to your SLO budget.
While Honeycomb tracks SLO values past your retention period, this applies only to the Budget Burndown and Historical Compliance graphs. You cannot use BubbleUp or the heatmap to examine times beyond your retention period.
You may attach only one SLO to any SLI derived column. For example, you may not attach both a 30-day and a 60-day SLO to the same SLI column. You may attach as many Burn Alerts to that SLO as you wish. (If you find that you need more than one SLO attached to an SLI derived column, please contact Honeycomb for support; we would like to understand that scenario better!)
SLOs are most effective when you have a reasonably high volume of data: a small number of failures in an hour should not make a major dent in your reliability.
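To see why volume matters, the following Python sketch compares how much of a period's failure allowance the same handful of failures consumes at two traffic levels. It assumes a constant hourly volume of qualified events, a 99% target, and a 30-day period; the function name and all numbers are illustrative.

```python
def budget_share(failures: int, events_per_hour: int,
                 target_percentage: float = 99.0, days: int = 30) -> float:
    """Fraction of the period's allowed failures used by `failures`."""
    # Assumption: qualified traffic is constant for the whole period.
    qualified = events_per_hour * 24 * days
    allowed = qualified * (100 - target_percentage) / 100
    if allowed == 0:
        raise ValueError("no failures are allowed at this target or volume")
    return failures / allowed


# Five failures in one bad hour, at two hypothetical traffic levels.
low_volume = budget_share(failures=5, events_per_hour=10)
high_volume = budget_share(failures=5, events_per_hour=10_000)
print(f"low volume: {low_volume:.1%}, high volume: {high_volume:.4%}")
```

At 10 events per hour, five failures use roughly 7% of the 30-day allowance, while at 10,000 events per hour the same five failures are barely visible.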
Keep the number of SLOs for any dataset small. Currently, the interface limits you to 30. SLOs should describe interfaces to a system rather than, for example, individual customers. Customers should behave roughly like one another; if groups of customers have properties that set them apart from others, try to write SLOs against those properties instead.