Alert with Triggers | Honeycomb

Alert with Triggers

Triggers let you receive notifications when your data in Honeycomb crosses the thresholds you configure. Because the graph you alert on is defined by an ordinary Honeycomb query, you can filter out known errors and other noise, which helps reduce false positives.

When a trigger fires, Honeycomb notifies you through the configured method: PagerDuty, Slack, Webhooks, or Email. The notification includes a link back to a graph showing the current status, which gives you a jumping-off point for further investigation.

Triggers have a duration, which sets the time range of data each run queries, and a frequency, which sets how often the trigger runs. A trigger with a duration of 5 minutes (a query time_range of 300 seconds) and a frequency of 2 minutes runs every 2 minutes over the most recent 5 minutes of data.
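To make the relationship between duration and frequency concrete, here is a minimal Python sketch (not Honeycomb code) that lists the time window each run would query. The function name `evaluation_windows` and the chosen first run time are illustrative assumptions.

```python
from datetime import datetime, timedelta


def evaluation_windows(first_run, frequency, duration, runs):
    """Return (start, end) for the first `runs` evaluations of a trigger.

    Each run looks back over the last `duration` of data, ending at the run time.
    """
    windows = []
    for i in range(runs):
        end = first_run + i * frequency  # runs are spaced `frequency` apart
        windows.append((end - duration, end))
    return windows


t0 = datetime(2024, 1, 1, 12, 0)  # hypothetical first run time
for start, end in evaluation_windows(t0, timedelta(minutes=2), timedelta(seconds=300), 3):
    print(f"{start:%H:%M} -> {end:%H:%M}")
# 11:55 -> 12:00
# 11:57 -> 12:02
# 11:59 -> 12:04
```

With a 5 minute duration and a 2 minute frequency, consecutive windows overlap by 3 minutes, so a single event can be evaluated by up to three consecutive runs.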

Important: The duration of a trigger query can be at most 1 day and cannot exceed 4 times the trigger’s frequency. For example, if the trigger’s frequency is 1 hour, the query duration cannot be more than 4 hours. The query duration also cannot be shorter than the trigger’s frequency.

Important: Trigger frequency must be specified in whole minutes, from 1 to 1440. Decimal values are truncated to the whole minute below them (3.6 becomes 3).
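The two rules above can be checked before you save a trigger. This is a minimal sketch that encodes only the limits stated here; the function names and error strings are hypothetical and do not come from Honeycomb's API.

```python
import math

MAX_DURATION_MIN = 24 * 60  # duration may be at most 1 day


def normalize_frequency(minutes):
    """Truncate a frequency to whole minutes and enforce the 1-1440 range."""
    whole = math.trunc(minutes)  # 3.6 -> 3
    if not 1 <= whole <= 1440:
        raise ValueError(f"frequency must be 1-1440 whole minutes, got {minutes}")
    return whole


def duration_errors(duration_min, frequency_min):
    """Return the documented duration rules that a (duration, frequency) pair breaks."""
    errors = []
    if duration_min > MAX_DURATION_MIN:
        errors.append("duration exceeds 1 day")
    if duration_min > 4 * frequency_min:
        errors.append("duration exceeds 4x frequency")
    if duration_min < frequency_min:
        errors.append("duration is shorter than frequency")
    return errors


print(duration_errors(240, 60))  # []
print(duration_errors(300, 60))  # ['duration exceeds 4x frequency']
```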

For this example, we want to know whenever the 95th percentile latency of our API server’s requests exceeds 30 ms. We want to exclude the /poll endpoint, because its long-held connections have artificially high durations that would skew the percentile.
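To illustrate why the /poll filter matters, the sketch below evaluates the example condition locally over a handful of in-memory events. It is not how Honeycomb computes the trigger: the event fields `path` and `duration_ms` are hypothetical column names, and the nearest-rank percentile is a simplification that may differ from Honeycomb's P95.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile; Honeycomb's own interpolation may differ."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def should_fire(events, threshold_ms=30.0, excluded_path="/poll"):
    """True if P95(duration_ms), excluding `excluded_path`, exceeds `threshold_ms`."""
    kept = [e["duration_ms"] for e in events if e["path"] != excluded_path]
    if not kept:
        return False  # no matching data: nothing to alert on in this sketch
    return p95(kept) > threshold_ms


events = (
    [{"path": "/api/users", "duration_ms": 12.0} for _ in range(19)]
    + [{"path": "/poll", "duration_ms": 30_000.0} for _ in range(5)]
)
print(should_fire(events))                     # False: /poll is filtered out
print(should_fire(events, excluded_path=None))  # True: /poll pollutes the P95
```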

Create a Trigger

Start the trigger creation process by building and running a query. Select the three-dot overflow menu and click “Make Trigger”.

The Query Builder

Configuration

Fill in the details for the trigger. Both the Name and Description are included in notifications about the trigger. Make sure the name clearly describes what has happened, and use the description to indicate next steps or link to documentation, so that whoever receives the alert knows how to respond.

Define New Trigger

To help you choose an appropriate threshold, the sample graph shows the 16 most recent periods for your query, where one period equals the trigger frequency and the graph granularity matches that period. For example, with a 5 minute frequency, the graph covers the previous 80 minutes at a 5 minute granularity.
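The preview arithmetic is simply 16 periods of the trigger frequency. A tiny sketch, with the helper name `preview_window` chosen for illustration only:

```python
from datetime import timedelta

PREVIEW_PERIODS = 16  # the sample graph shows the 16 most recent periods


def preview_window(frequency):
    """Return (total span shown, granularity) for a given trigger frequency."""
    return PREVIEW_PERIODS * frequency, frequency


span, step = preview_window(timedelta(minutes=5))
print(span, step)  # 1:20:00 0:05:00
```

For a 1 hour frequency the same rule gives a 16 hour preview at 1 hour granularity.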

Define Threshold

The Threshold defines the condition that generates a notification, and the Frequency determines how often to check for that condition. Consider which values are normal within your frequency window, so that notifications fire only for conditions worth alerting on.

The Duration determines the time range of data the trigger checks. Next to the duration field, a chart shows your dataset’s maximum and average event latency: the delay between the timestamp on an event and when it reached Honeycomb. Use this data to choose a duration that captures all your events, even delayed ones. For example, if the average event latency is 2 minutes and you want to run your trigger every 5 minutes, choose a 7 minute duration so that the trigger captures delayed events. If your traces span a long timeframe, this chart may show high latency even though the traces arrive as soon as they complete, because a long-running span is sent only after it finishes, well after its start timestamp.
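The rule of thumb above (frequency plus observed latency) can be combined with the documented limits. This sketch rounds the latency up to whole minutes, which is a conservative choice of ours rather than a Honeycomb requirement. It also caps the result at 4 times the frequency and at 1 day. The helper name is hypothetical.

```python
import math

MAX_DURATION_MIN = 24 * 60  # duration may be at most 1 day


def suggested_duration(frequency_min, latency_min):
    """Frequency plus observed ingest latency, rounded up to whole minutes and
    capped by the documented limits (at most 4x frequency and at most 1 day)."""
    wanted = frequency_min + math.ceil(latency_min)
    return min(wanted, 4 * frequency_min, MAX_DURATION_MIN)


print(suggested_duration(5, 2))   # 7, matching the example above
print(suggested_duration(5, 30))  # 20: capped at 4x frequency
```

When the cap applies, events delayed longer than the duration allows will still be missed, so a longer frequency may be the better fix. Plugging in the maximum rather than the average latency from the chart gives a more conservative duration.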

The trigger will notify all recipients listed when the measured value crosses the configured threshold.

If you have specified fields in the Group By clause, the trigger notifies all recipients each time a new group crosses the threshold. So if a trigger is already in a triggered state and another group crosses the threshold, the trigger notifies all recipients again and includes the newly triggered groups.
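The Group By behavior can be modeled as tracking which groups are currently over the threshold and notifying only when a group that was not already triggered appears. This is a hypothetical model, not Honeycomb's implementation. It assumes a `>` threshold operator. It also assumes that a group which drops back under the threshold is forgotten, so crossing again counts as new; the docs do not specify this.

```python
class GroupedTrigger:
    """Toy model of a trigger with Group By fields (assumes a '>' threshold)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.triggered = set()  # groups currently over the threshold

    def run(self, results):
        """`results` maps a group key to this run's measured value.

        Returns the newly triggered groups; a non-empty list means
        "notify all recipients about these groups".
        """
        over = {group for group, value in results.items() if value > self.threshold}
        new = over - self.triggered
        self.triggered = over  # assumption: recovered groups drop out of the set
        return sorted(new)


t = GroupedTrigger(threshold=30)
print(t.run({"/users": 45, "/orders": 12}))  # ['/users'] -> notify
print(t.run({"/users": 50, "/orders": 12}))  # [] -> already triggered, no new group
print(t.run({"/users": 50, "/orders": 40}))  # ['/orders'] -> notify again
```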

Add Recipients

By default, you can enter email recipients. Other recipient types appear in the dropdown after they have been configured at the team level.

Trigger Recipient Integrations (Slack, PagerDuty, Webhooks)

You can configure PagerDuty, Slack, or Webhooks as trigger recipients that anyone on your Honeycomb team can use. If you are in the middle of creating a trigger, click “Configure Integrations” on the trigger editing page to set them up on your team settings page.

Configure Recipients

You can also reach this page from your user icon in the navigation sidebar > Team settings > Integrations. Here you can add, edit, or remove your team-level trigger recipients. Deleting an integration here removes it from every trigger that uses it.

Configure team-level trigger recipients

Slack

Allow Honeycomb to connect to your Slack team to send trigger notifications to your Slack channels, with features such as link unfurling, which shows a preview of your Honeycomb query result graphs. Only one member of your Slack team needs to authorize Honeycomb for triggers to be sent to public channels. Team members who want to send triggers to their private channels or to themselves must each authorize Slack individually.

PagerDuty

PagerDuty’s API Integration docs describe how to create a generic API integration to PagerDuty. Following those steps will give you an Integration Key that you’ll enter in the Trigger Recipient configuration form.

Webhooks

You can specify Webhooks that Honeycomb sends JSON payloads to when a trigger fires, which lets you build custom integrations. A webhook is an HTTP endpoint running within your infrastructure to which Honeycomb sends notifications when the trigger changes state. Each notification includes an authentication header, and its body contains the trigger result as JSON.

The webhook notification API is described through an example webhook implementation that consumes these notifications.
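As a rough starting point, here is a standard-library Python receiver that checks an authentication header against a shared secret and parses the JSON body. It is a sketch, not Honeycomb's example implementation. The header name `X-Honeycomb-Webhook-Token`, the secret value, and the payload fields `name` and `status` are assumptions; confirm them against the example webhook implementation before relying on them.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions for illustration only: verify the real header and field names.
SECRET_HEADER = "X-Honeycomb-Webhook-Token"  # hypothetical header name
SHARED_SECRET = "change-me"  # the secret configured for this webhook recipient


def handle_notification(headers, body):
    """Validate and parse one notification; return (HTTP status, summary)."""
    token = headers.get(SECRET_HEADER) or ""
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(token.encode(), SHARED_SECRET.encode()):
        return 401, "bad token"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, "invalid JSON"
    if not isinstance(payload, dict):
        return 400, "unexpected payload"
    # "name" and "status" are assumed field names.
    return 200, f"{payload.get('name', '?')}: {payload.get('status', '?')}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, summary = handle_notification(self.headers, self.rfile.read(length))
        self.log_message("%s", summary)
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Block forever, handling POSTed notifications on `port`."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Call `serve()` to start listening. Keeping the validation in `handle_notification` separates it from the HTTP plumbing, so it can be unit-tested without opening a socket.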

Activate Trigger

When you save the trigger, it becomes active immediately and runs at the next frequency interval (for example, at the next 5 minute interval for a 5 minute frequency). To enable or disable a trigger, edit it and select the Enable or Disable option.
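If intervals are aligned to multiples of the frequency, the first run time can be estimated as the next such boundary after you save. The alignment is an assumption made for this sketch (counted from the Unix epoch in UTC); the text only says the trigger runs at the next frequency interval.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_run(saved_at, frequency):
    """Next interval boundary strictly after `saved_at` (assumes epoch-aligned intervals)."""
    elapsed = saved_at - EPOCH
    periods = elapsed // frequency + 1  # timedelta // timedelta -> int
    return EPOCH + periods * frequency


saved = datetime(2024, 1, 1, 12, 3, 20, tzinfo=timezone.utc)
print(next_run(saved, timedelta(minutes=5)))  # 2024-01-01 12:05:00+00:00
```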

List and Delete

You can find all triggers for your team by clicking the Triggers icon in the left-hand navigation bar. You can also find all triggers belonging to a specific dataset by opening that dataset’s settings page and clicking the “Triggers” tab.

Triggers Page

You’ll see a full list of the triggers. Click a trigger’s name to open and edit it.

List of Triggers within a Dataset page

To delete a trigger, use the delete button on the Triggers page or the delete button at the bottom of the trigger’s edit page.

Considerations

Trigger queries run your selected calculation over the trigger’s entire query duration. A COUNT, for example, is the total count for that period, not a per-second count. Averages and percentiles likewise cover the entire period, so to detect spikes, it is better to use MAX than AVG. Another alternative is to use a COUNT with a filter that restricts the set to the values you care about: for example, count the events slower than 100 ms and set a threshold on that count, instead of alerting when the average exceeds 100 ms.
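The difference between these calculations is easy to see on a single period of made-up data. The sketch below uses hypothetical latencies to compare AVG, MAX, and a filtered COUNT. It also shows how to scale a per-second rate into a COUNT threshold, given that COUNT is a total over the period the trigger query covers.

```python
from statistics import mean

# Hypothetical latencies (ms) for one trigger run: steady traffic plus one spike.
durations_ms = [20.0] * 59 + [900.0]

avg = mean(durations_ms)
peak = max(durations_ms)
slow = sum(1 for d in durations_ms if d > 100)  # COUNT with a "> 100 ms" filter

print(f"AVG={avg:.1f} MAX={peak:.0f} COUNT(>100ms)={slow}")  # AVG=34.7 MAX=900 COUNT(>100ms)=1


def count_threshold_for_rate(per_second, window_s):
    """COUNT is a total for the whole window, so scale a per-second rate up to it."""
    return round(per_second * window_s)


print(count_threshold_for_rate(2, 300))  # 600 events per 5 minute window
```

An AVG > 100 threshold would stay quiet here, while MAX > 100 or COUNT(>100 ms) > 0 would fire.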

The MIN calculation returns 0 when no data satisfies the query, which makes a valid minimum of 0 indistinguishable from a null result. For this reason, avoid combining MIN with the < threshold operator, because it produces false positives whenever no data is found. Instead, switch the query to a COUNT with a filter restricting the set, and alert when the count exceeds a threshold. For example, if a timeout should never be less than 100 ms, COUNT the events with timeout < 100 and trigger with a threshold of > 0.
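The false positive is easy to reproduce in a small model. In this sketch, `min_or_zero` mimics the documented behavior of MIN returning 0 on empty data; all function names are hypothetical.

```python
def min_or_zero(values):
    """Mimics the documented MIN behavior: 0 when no events match."""
    return min(values) if values else 0


def naive_min_alert(timeouts_ms, floor_ms=100):
    """MIN(timeout) < 100: also fires when there is no data at all."""
    return min_or_zero(timeouts_ms) < floor_ms


def count_alert(timeouts_ms, floor_ms=100):
    """COUNT(timeout < 100) > 0: fires only on real offending events."""
    return sum(1 for t in timeouts_ms if t < floor_ms) > 0


print(naive_min_alert([]), count_alert([]))            # True False
print(naive_min_alert([250, 80]), count_alert([250, 80]))  # True True
```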

Best practices

  • Use the Name and Description fields effectively. The Name should tell you what the alert is; the Description should tell you what to do about it. Links to internal wikis or runbooks work best.
  • Use filters to improve the quality of your signal. If you’re interested in latency but have a long-poll endpoint, filter that endpoint out of the calculation rather than adjusting the threshold.
  • To detect latency spikes, combine a COUNT with a filter at your cutoff (e.g., duration > 100 ms). The result is the number of events that exceed the cutoff.
  • To ignore latency spikes and trigger on overall performance, use the P95 or P99 calculations. These represent the majority of traffic better than AVG, which large outliers can skew.
  • When detecting errors, allowlist known-good values instead of looking for specific bad values. For example, instead of filtering for HTTP status code == 500, use filters that match events whose status code is not 200, 301, 302, 404, etc.
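The allowlist advice in the last item can be illustrated by comparing the two filters on a hypothetical sample of status codes. The set of expected codes is taken from the example in the text; a real service would list its own.

```python
# Example allowlist from the text; a real service would list its own expected codes.
EXPECTED_STATUS = {200, 301, 302, 404}


def is_error_denylist(status):
    return status == 500  # only catches the one code you thought of


def is_error_allowlist(status):
    return status not in EXPECTED_STATUS  # catches anything unexpected


statuses = [200, 200, 302, 404, 500, 502, 503]  # hypothetical sample
print(sum(map(is_error_denylist, statuses)), sum(map(is_error_allowlist, statuses)))  # 1 3
```

The denylist filter misses the 502 and 503 responses, while the allowlist filter counts every status the service did not expect.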