Service Level Objective (SLO) Creation Process | Honeycomb

Note
This feature is available as part of the Honeycomb Enterprise and Pro plans.

A Service Level Objective (SLO) is defined for a single Honeycomb dataset. To define and measure your SLO in Honeycomb, complete the following:

  1. Determine your Service Level Indicator (SLI).
  2. Create a Derived Column that returns true, false, or null to represent your Service Level Indicator (SLI).
  3. Define your SLO with your Derived Column/SLI.

Determine Your Service Level Indicator (SLI) 

To identify a suitable SLI, first express it in terms of user goals. For example, “a user should be able to load our home page and see a result quickly.”

Next, identify the qualified events: the events that contain information about the SLI. In this example, the qualified events are those where request.path = “/home”.

For those events, the criterion for your SLI determines which events are considered “successful”. In this case, success means duration_ms < 100.

If an event is not qualified, the SLI returns null. If an event is qualified, the SLI returns true when the event passes the criterion and false when it does not.
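
The qualifier and criterion logic above can be sketched in Python to make the three possible outcomes concrete. This only illustrates the semantics and is not how Honeycomb evaluates events; the function name `home_page_sli` and the dictionary representation of an event are assumptions made for this example.

```python
from typing import Optional


def home_page_sli(event: dict) -> Optional[bool]:
    """Hypothetical model of the example SLI: True, False, or None per event."""
    # Qualifier: only requests for the home page count toward this SLI.
    if event.get("request.path") != "/home":
        return None  # not qualified, so the event is ignored by the SLO
    # Criterion: a qualified event succeeds when it takes under 100 ms.
    # (This sketch assumes every qualified event carries duration_ms.)
    return event["duration_ms"] < 100


fast_home = {"request.path": "/home", "duration_ms": 42}
slow_home = {"request.path": "/home", "duration_ms": 250}
other_page = {"request.path": "/login", "duration_ms": 42}

print(home_page_sli(fast_home))   # True
print(home_page_sli(slow_home))   # False
print(home_page_sli(other_page))  # None
```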

Define Your SLI with a Derived Column 

Honeycomb uses a Derived Column to define the SLI and evaluates “success” according to your definition. The Derived Column returns true (for successful), false (for failed), or null (for not applicable) for each event in the dataset.

An SLI reports whether an event is successful or not in terms of the goals of the SLO. Before you configure the SLO, you must define the indicator that it uses to evaluate your level of success.

Create a Derived Column 

In the dataset to which the SLO will apply, create a Derived Column to evaluate your Service Level Indicator (SLI).

  1. From the left sidebar, select Data Settings.

  2. Select the name of your target dataset to access its settings. The Dataset Settings page displays several tabs.

  3. Select the Schema tab.

  4. Select Derived Columns to expand. The schema displays a button to create a new derived column, a search box, and a list of any existing derived columns.

  5. Select Add new Derived Column. A modal appears.

  6. Enter a Name and Function at minimum.

    Define your derived column expression within the Function section. Refer to the Derived Column Reference for syntax and a list of available functions. If syntax errors exist, errors in the expression appear underlined in red or with a red triangle. Hover over each error marker to display details about the error or refer to the error message displayed by the editor.

SLI Example 

For example, Honeycomb’s two-argument IF function is convenient for writing an SLI: IF( $a, $b) returns $b only if $a is true; otherwise, it returns null. Therefore, most SLIs are written as IF( qualifier, criterion).

Continuing with the previous example, the derived column for this SLI would look similar to:

IF( EQUALS( $request.path, "/home"), LT( $duration_ms, 100))

Refer to SLI Formulas for more examples.

Test the SLI 

To test your SLI, query the associated dataset for a COUNT and a HEATMAP(duration_ms), broken down by the SLI derived column.

Confirm that you see three groups: true, false and blank. (Blank events are those that are not qualified.) Your current level is approximately #true / (#true + #false).
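
The ratio above can be checked by hand from the group counts. The following is a minimal sketch of that arithmetic, assuming the per-event SLI results are available as a list of True, False, and None values; the function name `current_compliance` is hypothetical and not part of Honeycomb.

```python
from collections import Counter
from typing import Iterable, Optional


def current_compliance(sli_values: Iterable[Optional[bool]]) -> Optional[float]:
    """Return #true / (#true + #false), ignoring unqualified (None) events."""
    counts = Counter(sli_values)
    qualified = counts[True] + counts[False]
    if qualified == 0:
        return None  # no qualified events, so there is no level to report
    return counts[True] / qualified


# Three successes, one failure, and two unqualified events.
sample = [True, True, True, False, None, None]
print(current_compliance(sample))  # 0.75
```

Unqualified events do not appear in the denominator, which is why the blank group has no effect on the current level.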

Confirm that the three groups look correct for your use case and understanding of the dataset’s contents.

This process is illustrated in our blog entry, Working Toward Service Level Objectives.

Define Your SLO 

To define your SLO, answer the following question, where “qualified events” are as defined in your SLI:

Over what period of time do you expect what percentage of qualified events to pass the SLO?

For example, “I expect that 99% of qualified events will succeed over every 30 days” where 99% is the Target Percentage of success and 30 days is the Time Period being measured. In addition to your SLI, your SLO uses these two variables (Target Percentage and Time Period) for its definition.
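
It can help to translate a target percentage into the number of failures it allows. The sketch below shows that arithmetic under an assumed volume of 1,000,000 qualified events in the time period; the event count, the failure count, and the function names are illustrative assumptions, not values or APIs from Honeycomb.

```python
def allowed_failures(qualified_events: int, target_percentage: float) -> float:
    """Number of qualified events that may fail while still meeting the target."""
    return qualified_events * (100 - target_percentage) / 100


def budget_remaining(qualified_events: int, target_percentage: float,
                     failures_so_far: int) -> float:
    """Failures still permitted in the period; negative means the target is missed."""
    return allowed_failures(qualified_events, target_percentage) - failures_so_far


# A 99% target over a period with 1,000,000 qualified events (assumed volume).
print(allowed_failures(1_000_000, 99))         # 10000.0
print(budget_remaining(1_000_000, 99, 2_500))  # 7500.0
```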

When you choose a target, base it on your current level of success, which you can find by running a COUNT query grouped by your SLI derived column.

Once your SLO is defined, create your SLO in Honeycomb.

Create Your SLO 

  1. From the left sidebar, select SLOs.
  2. Select New SLO in the upper right corner. A form appears.
  3. Complete the form to create your SLO.

The form displays the following elements:

Dataset
Select the dataset to which the SLO applies.
Name
Enter a name for your SLO.
Description
This field provides additional information to help provide context or purpose for the SLO. Use Markdown to insert links and text formatting.
SLI Column
Select the Derived Column that defines success for your SLO.
Time Period (in Days)
Enter the number of days over which the SLO is measured.
Target Percentage
Enter the percentage of qualified events that you expect to succeed over the Time Period.
Create SLO
Select this button to save the SLO. The saved SLO then appears in the SLO display list.

Next Steps: Monitor Your SLO 

Monitor the SLOs for your team from the SLOs page and set up Burn Alerts, which provide notifications related to your SLO budget.

Best Practices and Usage Notes 

  • Honeycomb tracks SLO values past your retention period, but only for the Budget Burndown and Historical Compliance graphs. You cannot use BubbleUp or the heatmap to look at times beyond your retention period.

  • You may have only one SLO attached to any SLI derived column. For example, you may not have both a 30-day and a 60-day SLO attached to the same SLI column. You may have as many Burn Alerts attached to that SLO as you wish. (If you do find yourself needing more than one SLO attached to any SLI derived column, please contact Honeycomb for support; we would like to understand that scenario better!)

  • SLOs are most effective when you have a reasonably high volume of data: a small number of failures in an hour should not make a major dent in your reliability.

  • Keep the number of SLOs for any dataset small. Currently, the interface limits you to 30. SLOs should describe interfaces to a system rather than, for example, individual customers. Customers should behave roughly the same as each other; if groups of customers have properties that set them apart from others, try to write SLOs against those properties instead.
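
The note above about data volume can be illustrated with a little arithmetic: the same handful of failures consumes a very different share of the allowed failures depending on how many qualified events the period contains. This is a rough sketch under assumed numbers (5 failures, a 99% target, and two made-up volumes); the function name `budget_fraction_consumed` is hypothetical.

```python
def budget_fraction_consumed(failures: int, qualified_events: int,
                             target_percentage: float) -> float:
    """Share of the period's allowed failures used up by the given failures."""
    allowed = qualified_events * (100 - target_percentage) / 100
    return failures / allowed


# Five failures against a 99% target at two assumed volumes.
low_volume = budget_fraction_consumed(5, 1_000, 99)
high_volume = budget_fraction_consumed(5, 1_000_000, 99)

print(f"{low_volume:.2%}")   # 50.00%
print(f"{high_volume:.2%}")  # 0.05%
```

At the lower volume, five failures use half of what the target allows for the whole period, while at the higher volume they barely register.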