Service Level Objectives (SLOs) Process

This feature is available as part of the Honeycomb Enterprise plan.

An SLO is defined over a single Honeycomb dataset. To define and measure your SLO in Honeycomb, you will do the following:

  1. Create a derived column that returns true, false, or null to represent your service-level indicator (SLI).
  2. Use that derived column to define your SLO in Dataset Settings > SLOs, or under the main SLO list.
  3. Monitor all the SLOs for your team from the SLOs page, which you open by clicking the handshake icon in the left-hand menu.

Define the SLI with a Derived Column

An SLI reports whether an event is “successful” or not in terms of the goals of the SLO. Before you configure the SLO, you must define the indicator that it uses to evaluate your level of success. To do this, you create a derived column in Honeycomb that evaluates “success” as you’ve defined it and returns true (for successful), false (for failed) or null (for not applicable) for each event in the dataset.

To identify a suitable SLI, first express it in terms of user goals, such as “a user should be able to load our home page and see a result quickly.”

Next, identify the qualified events, that is, the events that contain information about the SLI. In this example, the qualified events are those where request.path = "/home".

For those events, the criterion for your SLI determines which events are considered “successful”. In this case, success means duration_ms < 100.

If an event is not qualified, the SLI returns null. If an event is qualified, the SLI returns true when the event passes the criterion and false when it does not.

Create a Derived Column to Measure the SLI

Now, create a derived column that reflects this qualifier and criterion. To create the SLI derived column, go to Dataset Settings for the dataset where this SLI and its associated SLO will be calculated. Under the Schema tab, select Derived Columns. For more detail, refer to the documentation for creating derived columns.

The two-argument form of Honeycomb’s IF function is convenient for writing SLI derived columns: IF($a, $b) returns $b only if $a is true; otherwise, it returns null. Therefore, most SLIs are written as IF(qualifier, criterion).

Continuing with the previous example, the derived column for this SLI would look similar to:

IF(EQUALS($request.path, "/home"), LT($duration_ms, 100))
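To make the three possible results concrete, the following Python sketch mimics the qualifier and criterion from this example (request.path = "/home" and duration_ms < 100) for a single event. It only illustrates the logic and is not how Honeycomb evaluates derived columns; the function name home_latency_sli and the dictionary shape of an event are assumptions made for this example.

```python
from typing import Optional


def home_latency_sli(event: dict) -> Optional[bool]:
    """Mimic IF(qualifier, criterion): None when the event is not qualified."""
    # Qualifier: only requests for the home page count toward this SLI.
    if event.get("request.path") != "/home":
        return None  # not qualified, so the SLO ignores this event
    # Criterion: a qualified event succeeds when it takes less than 100 ms.
    # Assumption: every qualified event carries a duration_ms field.
    return event["duration_ms"] < 100


print(home_latency_sli({"request.path": "/home", "duration_ms": 42}))   # True
print(home_latency_sli({"request.path": "/home", "duration_ms": 250}))  # False
print(home_latency_sli({"request.path": "/login", "duration_ms": 42}))  # None
```

The None result corresponds to the null returned for events that are not qualified.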

Refer to SLI Formulas for more examples.

Test the SLI

To test your SLI, query the associated dataset for a COUNT and a HEATMAP(duration_ms), broken down by the SLI derived column.

Confirm that you see three groups: true, false and blank. (Blank events are those that are not qualified.) Your current level is approximately #true / (#true + #false).

Confirm that the three groups look correct for your use case and understanding of the dataset’s contents.
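As a quick check of the arithmetic, the current level can be computed from the three groups as in the Python sketch below. The function name current_level and the sample counts are made up for illustration; in practice the counts come from the COUNT query grouped by the SLI derived column.

```python
from typing import Iterable, Optional


def current_level(sli_values: Iterable[Optional[bool]]) -> Optional[float]:
    """Return #true / (#true + #false), ignoring unqualified (None) events."""
    values = list(sli_values)
    passed = sum(1 for v in values if v is True)
    failed = sum(1 for v in values if v is False)
    qualified = passed + failed
    if qualified == 0:
        return None  # no qualified events, so the level is undefined
    return passed / qualified


# Hypothetical sample: 97 true, 3 false, and 50 blank (unqualified) events.
sample = [True] * 97 + [False] * 3 + [None] * 50
print(current_level(sample))  # 0.97
```

Note that the blank group does not affect the result: only qualified events appear in the denominator.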

This process is illustrated in our blog entry, Working Toward Service Level Objectives.

Define Your SLO

To define your SLO, answer the following question, where “qualified events” are as defined in your SLI:

Over what period of time do you expect what percentage of qualified events to pass the SLO?

For example, “I expect that 99% of qualified events will succeed over every 30 days.” As you select a level, base it on your current state, which you can find by running a COUNT query grouped by your SLI derived column.
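A target percentage implies an error budget: the number of qualified events that may fail within the period before the SLO is missed. The Python sketch below works through that arithmetic for the 99% example. The function names and the event counts are hypothetical, and the sketch assumes a target below 100%.

```python
def allowed_failures(qualified_events: int, target_percent: float) -> float:
    """Number of qualified events that may fail before the SLO is missed."""
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be at least 0 and below 100")
    return qualified_events * (100 - target_percent) / 100


def budget_remaining(qualified_events: int, failed_events: int,
                     target_percent: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = allowed_failures(qualified_events, target_percent)
    return 1 - failed_events / budget


# Hypothetical 30-day period with one million qualified events and a 99% target.
print(allowed_failures(1_000_000, 99))         # 10000.0
print(budget_remaining(1_000_000, 2_500, 99))  # 0.75
```

With these numbers, 2,500 failures consume a quarter of the budget, which leaves 75% for the rest of the period.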

Create Your SLO

You can get to the list of all SLOs across all datasets by selecting the SLO icon. Select “New SLO” to create a new SLO.

[Image: SLO from the menu]

You can also create an SLO on the SLOs tab of the Datasets page.

[Image: SLO from a dataset]

Complete the form to create your SLO.

[Image: SLO creation dialog]

Monitor Your SLOs

Access all SLOs for the current team by selecting the handshake icon in the left navigation bar. The SLO list view shows summary information for each SLO.

To see the details of a particular SLO, select the SLO within the SLO list.

The detail view for an SLO has four components.

[Image: SLO summary view]

Define Burn Alerts

SLOs are especially useful when they warn you of upcoming issues. Honeycomb Burn Alerts warn you when your SLO budget will be exhausted in a certain amount of time.

Choose the length of time for a given burn alert based on the context and goals of your organization. A 24-hour burn alert can be useful for learning that service quality is slowly degrading (and so might best be sent to Slack); a 4-hour alert can be useful for learning that there is an urgent issue (and might go to PagerDuty).

Honeycomb computes burn alerts by extrapolating the current rate of budget burn, measured over a recent window whose length is the alert’s exhaustion time divided by 4. For example, a 24-hour burn alert fires when the trend over the last 6 hours implies that the budget will be exhausted within 24 hours. A burn alert stays fired until the projected time to exhaustion rises back above the alert’s exhaustion time (plus a small buffer, to keep it from flapping).
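The extrapolation can be pictured with the simplified Python model below. It is a sketch of the idea only, not Honeycomb's implementation: it assumes a constant burn rate over the lookback window, ignores the anti-flapping buffer, and requires an exhaustion time greater than zero. The function names and the budget figures are hypothetical.

```python
def lookback_hours(exhaustion_hours: float) -> float:
    """Window used to estimate the burn rate: the exhaustion time divided by 4."""
    return exhaustion_hours / 4


def should_fire(remaining: float, burned_in_window: float,
                exhaustion_hours: float) -> bool:
    """Fire when the recent burn rate would use up the budget in time.

    remaining and burned_in_window are fractions of the total error budget.
    """
    if exhaustion_hours <= 0:
        raise ValueError("this sketch only models exhaustion times above zero")
    if remaining <= 0 or burned_in_window <= 0:
        return False  # nothing left to burn, or no burn observed in the window
    burn_per_hour = burned_in_window / lookback_hours(exhaustion_hours)
    hours_until_exhausted = remaining / burn_per_hour
    return hours_until_exhausted <= exhaustion_hours


print(lookback_hours(24))             # 6.0
# 40% of the budget left, 12% burned in the last 6 hours: about 20 hours left.
print(should_fire(0.40, 0.12, 24))    # True
# 40% of the budget left, 5% burned in the last 6 hours: about 48 hours left.
print(should_fire(0.40, 0.05, 24))    # False
```

In this model a 4-hour alert looks at only the last hour of data, which is why shorter alerts react faster but are also more sensitive to brief spikes.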

For each SLO, select Burn Alerts to add a burn alert. The Burn Alert endpoints list is populated from the Triggers list. Add Trigger Integrations in Team Settings > Integrations.

[Image: SLO Burn Alert fired]

Burn Alerts are measured in hours. While it is possible to express fractional hours (0.25 corresponds to 15 minutes, for example), our experience is that burn alerts are most useful when set either to zero (that is, to notify you when you are out of budget) or to a value ranging from an hour to a few days.

For periods shorter than an hour, there isn’t enough time to react, so the alert is not actionable. Conversely, a period longer than a few days almost never merits notification; instead, it effectively becomes the current SLO measurement.

[Image: SLO Burn Alert creation]

Reset Your Remaining Budget

Burn Alerts will only trigger if you have budget remaining. If you’ve blown your error budget due to some issue and then fixed the problem, it’s worth resetting your error budget so burn alerts will start working again.

You can reset your budget back to 100% by clicking the “Reset” button under the Budget Burndown chart:

[Image: SLO Reset button]

Selecting it erases all errors that have happened in the current SLO time period, up to and including the current hour. For example, if you have a 30-day SLO, your budget will be back to 100% for that 30-day period. This affects both your Budget Burndown and Historical SLO Compliance graphs on the Summary view, as well as the Current Percentage displayed in the SLO lists.
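One way to picture the effect of a reset is as a cutoff hour: failures recorded at or before that hour no longer count against the budget. The Python sketch below illustrates this under the assumption that failures are tallied per hour; the function name counted_failures, the hour indexes, and the counts are all hypothetical and say nothing about how Honeycomb stores this data.

```python
from typing import Dict, Optional


def counted_failures(hourly_failures: Dict[int, int],
                     reset_hour: Optional[int] = None) -> int:
    """Sum failures per hour, skipping hours at or before the reset hour."""
    return sum(
        count
        for hour, count in hourly_failures.items()
        if reset_hour is None or hour > reset_hour
    )


# Hypothetical tally: hour index -> failed qualified events in that hour.
failures = {0: 40, 1: 900, 2: 15, 3: 5}
print(counted_failures(failures))                # 960
print(counted_failures(failures, reset_hour=2))  # 5
```

Here a reset during hour 2 discards the incident in hour 1 as well as the errors in hour 2 itself, so only the failures from hour 3 onward are counted.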

Best Practices and Usage Notes