Alert with Triggers | Honeycomb

Alert with Triggers

Triggers let you receive notifications when your data in Honeycomb crosses thresholds that you configure. A trigger can alert on any graph you can build with a Honeycomb query, so you can, for example, filter out known errors to reduce false positives.

When a trigger fires, you are notified by the configured method(s). Currently supported methods are PagerDuty, Slack, Webhooks, and Email. The notification includes a link back to the graph, which shows the current status and provides a jumping-off point for further investigation.

Triggers have a duration over which they query data and a frequency, which determines how often they run. For example, a trigger with a duration of 5 minutes and a frequency of 2 minutes will run every 2 minutes over the last 5 minutes of data.
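Here is a minimal Python sketch of how duration and frequency relate. It is not Honeycomb's scheduler. `evaluation_windows` is a hypothetical helper, and the sketch assumes each run queries the window that ends exactly at its run time.

```python
from datetime import datetime, timedelta


def evaluation_windows(first_run: datetime, duration_min: int,
                       frequency_min: int, runs: int) -> list[tuple[datetime, datetime]]:
    """Return the (start, end) data window queried by each of the next `runs` runs.

    Each run happens `frequency_min` minutes after the previous one and
    looks back over the last `duration_min` minutes of data.
    """
    windows = []
    for i in range(runs):
        run_at = first_run + timedelta(minutes=i * frequency_min)
        windows.append((run_at - timedelta(minutes=duration_min), run_at))
    return windows


# Duration 5 minutes, frequency 2 minutes: consecutive windows overlap by 3 minutes.
for start, end in evaluation_windows(datetime(2024, 1, 1, 12, 0), 5, 2, 3):
    print(start.strftime("%H:%M"), "->", end.strftime("%H:%M"))
# 11:55 -> 12:00
# 11:57 -> 12:02
# 11:59 -> 12:04
```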

By default, users have a limit of two triggers across all environments. Upgrade to a Pro or Enterprise plan to increase the number of triggers available.

Create a Trigger 

You can create a trigger within the Triggers page or while using the Query Builder.

Select the Triggers icon in the left navigation bar to reach the Triggers page. From the Triggers page:

  1. Select New Trigger in the top right corner. If no previous triggers exist, select Create Your First Trigger instead.
  2. Choose your Dataset for the trigger and select Make Trigger.

Creating a new trigger from the Triggers page requires entering a query during trigger configuration.

Triggers Page with New Trigger Button
Note
You cannot create a trigger on a heatmap or a concurrency calculation. Learn more about trigger best practices.

From the Query Builder screen:

  1. Build and run a query.
  2. Select the three-dot overflow menu, located to the left of Run Query, and select Make Trigger.
Note
You cannot create a trigger on a heatmap, concurrency, or rate calculation. Learn more about trigger best practices.

For this example, we want to know whenever our cart returns an error that is considered “slow”, with the results grouped by userid, http.url, and requestID.

The Query Builder

To configure the trigger, define the trigger details, trigger alert threshold, and notification preferences in the next screen.

Define New Trigger 

Both the Name and Description are included in notifications about the trigger. The name should clearly describe what has happened. The description should give next steps or link to documentation, so that the person who receives the alert knows how to respond.

Define New Trigger

Trigger Query 

After you define the trigger details in the first section, the next section displays a sample graph that shows how the trigger query and the trigger threshold interact. When you create a trigger from the Query Builder, the sample graph appears automatically. When you create a trigger from the Triggers page, you must enter a query before the sample graph appears and the Threshold field in the Alerts section is populated.

The sample graph shows the trend of your query over the most recent 16 periods, each indicated by a marker. Duration in the Alerts section sets the period length (15 minutes by default). Adjust it to help choose an appropriate threshold. Frequency in the Alerts section controls how often the query runs, and also defaults to 15 minutes. Because the graph covers 16 periods of one Duration each, the sample graph for a trigger with a 30-minute frequency and a 120-minute duration shows the previous 1920 minutes (16 × 120), or 32 hours.
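The span of the sample graph is simple arithmetic. This Python sketch uses a hypothetical helper name. It assumes, as the example implies, that each of the 16 periods is one Duration long and that Frequency does not change the span.

```python
SAMPLE_GRAPH_PERIODS = 16  # the sample graph marks the 16 most recent periods


def sample_graph_span_minutes(duration_min: int) -> int:
    """Minutes of history on the sample graph: 16 periods, each one Duration long."""
    return SAMPLE_GRAPH_PERIODS * duration_min


span = sample_graph_span_minutes(120)  # 120-minute duration (frequency is irrelevant here)
print(span, "minutes =", span // 60, "hours")  # 1920 minutes = 32 hours
```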

Trigger Query with filters

Alerts 

Next, define the threshold for the trigger alert.

Define Threshold

The Threshold defines the condition that generates a notification. By default, a notification is sent when the condition first meets the threshold, and a resolved notification is sent once the condition is no longer met.

Note

About Triggered Groups: If you have specified fields in the GROUP BY clause of a trigger, then the trigger will notify all recipients when any new group crosses the trigger threshold.

For example, if a trigger is already in a triggered state, and any new group surpasses the trigger threshold, the trigger will again notify all recipients and include the new groups that have triggered the alert.
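The grouped-trigger rule can be modeled as a set difference. This Python sketch is an illustration only. `groups_to_notify` is a hypothetical helper, it assumes a `>` threshold operator, and it represents each group as a tuple of GROUP BY values.

```python
def groups_to_notify(current: dict[tuple, float], threshold: float,
                     already_triggered: set[tuple]) -> set[tuple]:
    """Return groups that now cross the threshold but were not already triggered.

    A non-empty result means every recipient is notified again.
    Assumes a `>` threshold operator.
    """
    crossing = {group for group, value in current.items() if value > threshold}
    return crossing - already_triggered


# Groups are (userid, http.url) tuples; the trigger is already firing for user-1.
already = {("user-1", "/cart")}
current = {("user-1", "/cart"): 12, ("user-2", "/cart"): 9, ("user-3", "/cart"): 1}
print(groups_to_notify(current, threshold=5, already_triggered=already))
# {('user-2', '/cart')}
```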

Also, set the number of times the Threshold, or trigger condition, must be met consecutively before you are alerted. Enter this value in Send an alert after the threshold has been met x times. The value defaults to 1 and cannot be greater than 5. The trigger condition is evaluated once per Frequency interval, so the required count corresponds to a length of time. For example, if a trigger’s threshold must be met 3 times before alerting and the trigger’s frequency is 5 minutes, then the trigger alerts only when its threshold has been met for the past 15 minutes, or 3 cycles of 5 minutes.
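The consecutive-run requirement can be sketched as a sliding window over recent results. `ConsecutiveCheck` is a hypothetical class for illustration, not part of any Honeycomb API. It only decides whether the condition has held long enough. Whether a notification is actually sent also depends on the trigger's alerting behavior.

```python
from collections import deque


class ConsecutiveCheck:
    """Report True once the threshold has been met `required` runs in a row (1 to 5)."""

    def __init__(self, required: int = 1):
        if not 1 <= required <= 5:
            raise ValueError("required must be between 1 and 5")
        self.required = required
        self.recent = deque(maxlen=required)  # results of the most recent runs only

    def record(self, threshold_met: bool) -> bool:
        """Record one run's result; return True if the condition has held long enough."""
        self.recent.append(threshold_met)
        return len(self.recent) == self.required and all(self.recent)


check = ConsecutiveCheck(required=3)
results = [check.record(met) for met in [True, True, False, True, True, True]]
print(results)  # [False, False, False, False, False, True]

frequency_min = 5
print(check.required * frequency_min, "minutes of sustained breach")  # 15 minutes of sustained breach
```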

If the “Send an alert every time threshold is met” checkbox is selected, an alert is sent to the alert recipients every time the trigger runs and the threshold condition is met. This setting overrides the default behavior of alerting once when crossing the threshold and once upon resolution. When it is enabled, no resolved alert is sent. Use this checkbox when:

  • You want to keep receiving alerts while a trigger continues to meet the threshold
  • The triggered event is more important than receiving a resolved event

For example, consider a trigger that has fields in its GROUP BY clause and has “Send an alert every time threshold is met” selected. It notifies all recipients when any new group crosses the trigger threshold or when any group still exceeds it. If one or more groups resolve, no resolved alert is sent.
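The following sketch contrasts the default behavior with “Send an alert every time threshold is met” for a single, ungrouped trigger that alerts after one run. The `notifications` function is a hypothetical model, not Honeycomb's implementation.

```python
def notifications(met_sequence: list[bool], every_time: bool = False) -> list[str]:
    """Return the notifications sent, given whether each run met the threshold.

    Default: one 'triggered' alert when the condition starts being met and one
    'resolved' alert when it stops. every_time=True: a 'triggered' alert on every
    run that meets the condition, and never a 'resolved' alert.
    """
    sent = []
    triggered = False  # state after the previous run
    for met in met_sequence:
        if every_time:
            if met:
                sent.append("triggered")
        elif met and not triggered:
            sent.append("triggered")
        elif not met and triggered:
            sent.append("resolved")
        triggered = met
    return sent


runs = [False, True, True, True, False]
print(notifications(runs))                   # ['triggered', 'resolved']
print(notifications(runs, every_time=True))  # ['triggered', 'triggered', 'triggered']
```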

The Duration determines the time range of data that the trigger checks. The default Duration value is 15 minutes. Next to the duration field, an Event Latency History graph shows the maximum and average delay between the timestamp on each event and the time it reached Honeycomb. Use this data to choose a duration that captures all your events, even delayed ones.

For example, if the average event latency is 2 minutes and you want to run your trigger every 5 minutes, choose a 7-minute duration so that the trigger captures delayed events. Note that if your traces span a long time frame, this chart may show high latency even though the traces arrive as soon as they complete. This happens because a span's timestamp records when the span started, not when it was sent.
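The rule of thumb in this example amounts to adding the expected latency to the frequency. This is a minimal sketch that assumes latency is rounded up to whole minutes. `suggested_duration` is a hypothetical helper, and its result must still fall within the duration limits for triggers.

```python
import math


def suggested_duration(frequency_min: int, avg_latency_min: float) -> int:
    """Frequency plus event latency, rounded up to whole minutes."""
    return frequency_min + math.ceil(avg_latency_min)


print(suggested_duration(5, 2))    # 7
print(suggested_duration(5, 1.5))  # 7
```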

Warning
The duration of a trigger query can be 1 day at most and cannot exceed 4 times the frequency of the trigger. For example, if the trigger’s frequency is 1 hour, then the query duration cannot be more than 4 hours. The query duration also cannot be less than the trigger’s frequency.
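These limits reduce to a single range check. This sketch encodes only the limits stated in the warning. `duration_is_valid` is a hypothetical helper, and Honeycomb performs its own validation.

```python
MAX_DURATION_MIN = 24 * 60  # 1 day


def duration_is_valid(duration_min: int, frequency_min: int) -> bool:
    """frequency <= duration <= min(4 x frequency, 1 day)."""
    return frequency_min <= duration_min <= min(4 * frequency_min, MAX_DURATION_MIN)


print(duration_is_valid(240, 60))  # True: 4 hours with a 1-hour frequency
print(duration_is_valid(300, 60))  # False: more than 4 x frequency
print(duration_is_valid(30, 60))   # False: shorter than the frequency
```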

The Frequency determines how often, in minutes, the trigger checks for the Threshold condition. The default Frequency value is 15 minutes. Consider what is normal within your frequency window so that notifications fire only for conditions worth alerting on.

Warning
Trigger frequency must be specified in whole minutes, from 1 to 1440. Decimal values are truncated to the preceding full minute. (3.6 becomes 3.)
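The frequency rule can be sketched as truncation followed by a range check. `normalize_frequency` is hypothetical. The sketch assumes that an out-of-range value is rejected rather than clamped.

```python
import math


def normalize_frequency(minutes: float) -> int:
    """Truncate to whole minutes, then require the 1 to 1440 range."""
    whole = math.floor(minutes)  # 3.6 -> 3
    if not 1 <= whole <= 1440:
        raise ValueError(f"frequency must be 1-1440 whole minutes, got {minutes}")
    return whole


print(normalize_frequency(3.6))  # 3
```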

Custom Scheduling Option 

Honeycomb allows you to specify a scheduled window in which the trigger runs. For example, you may need to be alerted Monday through Friday, but not on weekends.

To enable it, slide the Custom scheduling toggle to the right and specify the time range and days on which the trigger should run. Note that the start time and end time must be provided in Coordinated Universal Time (UTC).
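Because the schedule is defined in UTC, you must convert a local time before comparing it to the window. This is a minimal Python sketch that assumes the window does not cross midnight UTC. `in_schedule` and the day-name set are hypothetical, not Honeycomb's configuration format.

```python
from datetime import datetime, time, timedelta, timezone

DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday", "sunday"]
WEEKDAYS = set(DAY_NAMES[:5])  # Monday through Friday


def in_schedule(now: datetime, days: set[str], start: time, end: time) -> bool:
    """Return True if `now` (a timezone-aware datetime) falls inside the window.

    `start` and `end` are UTC times of day; assumes start < end, i.e. the
    window does not cross midnight UTC.
    """
    now_utc = now.astimezone(timezone.utc)
    day = DAY_NAMES[now_utc.weekday()]  # weekday(): Monday == 0
    return day in days and start <= now_utc.time() < end


# 08:30 in UTC-5 on Friday 2024-01-05 is 13:30 UTC, inside a 13:00-21:00 UTC window.
local = datetime(2024, 1, 5, 8, 30, tzinfo=timezone(timedelta(hours=-5)))
print(in_schedule(local, WEEKDAYS, time(13, 0), time(21, 0)))  # True
```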

Custom Scheduling Options

Recipients 

The trigger notifies all listed recipients when the measured value crosses the configured threshold. There is no limit on the number of recipients. By default, Honeycomb sends an alert to recipients once, when the trigger crosses the configured threshold and enters the Triggered state. It then sends a resolved alert once the trigger is back in an OK state.

To add a new recipient, select Add Recipient. Use Go to Integration Center to configure additional trigger recipient integration options, like Slack, PagerDuty, and Webhooks.

List of Trigger Recipients with email and PagerDuty recipients

After selecting Add Recipient, a form will appear with Recipient options in a dropdown list. By default, you can select Notify by Email and enter email recipients. Additional integration options, like Slack, PagerDuty, and Webhooks, can be selected once configured.

Add Recipient form displaying the two Notify by Email fields

Activate Trigger 

Finally, select Create Trigger to save your trigger configuration.

Once saved, the trigger is immediately active and will run at the next frequency interval, such as on the next 5 minute interval for a 5 minute frequency. You can enable or disable a trigger by editing the trigger and selecting the Enable or Disable option.
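This sketch shows one way to compute the next run time. It assumes that interval boundaries are multiples of the frequency counted from the Unix epoch. That assumption is for illustration only, and Honeycomb's actual alignment may differ. `next_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_run(saved_at: datetime, frequency_min: int) -> datetime:
    """First interval boundary strictly after `saved_at` (a timezone-aware datetime).

    Assumes boundaries are multiples of the frequency counted from the Unix epoch.
    """
    step = timedelta(minutes=frequency_min)
    elapsed = saved_at - EPOCH
    return EPOCH + (elapsed // step + 1) * step  # timedelta // timedelta is an int


print(next_run(datetime(2024, 1, 5, 12, 3, 20, tzinfo=timezone.utc), 5))
# 2024-01-05 12:05:00+00:00
```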

Trigger Recipient Integrations (Slack, PagerDuty, Webhooks) 

You can configure PagerDuty, Slack, or Webhooks as trigger recipients that anyone on your Honeycomb team can use. If you are in the middle of creating a trigger, use Go to Integration Center to configure recipient options on your Team settings page.

You can also find the Trigger Integrations list by selecting your user icon in the navigation sidebar and then selecting Team settings. Go to the Integrations tab to add, edit, or remove your team-level trigger recipients. Deleting an integration from this page removes it from all associated triggers.

Configure team-level trigger recipients

Slack 

When using the Integrations tab to link your Slack workspace to Honeycomb, authorization is needed to connect Honeycomb to your Slack team. An example of requested permissions appears below.

Only one of your Slack team’s members needs to authorize Honeycomb in order to send triggers to public channels. Team members who want to send triggers to their private channels or themselves must authorize Slack on an individual basis.

Screenshot example of Honeycomb's requested permissions when linking to your Slack workspace

Once authorized, Honeycomb can send triggers to your Slack channels with features such as link unfurling that shows a preview of your Honeycomb query result graphs.

PagerDuty 

PagerDuty’s API Integration docs describe how to create a generic API integration to PagerDuty. Following those steps will give you an Integration Key that you will enter in the Honeycomb Trigger Recipient configuration form.

Webhooks 

You can specify webhooks to which Honeycomb sends JSON payloads when a trigger fires, and use them to build custom integrations. A webhook is an HTTP endpoint running within your infrastructure to which Honeycomb sends notifications of the trigger’s changing state. Each notification includes an authentication header, and its body contains the result of the trigger in JSON.

The webhook notification API is described by an example webhook implementation that can consume webhook notifications.
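If you are building your own receiver, the following is a minimal sketch using only the Python standard library. The header name `X-Honeycomb-Webhook-Token`, the shared-secret value, and the port are all assumptions. Replace them with what your webhook configuration actually uses. The payload is treated as opaque JSON because Honeycomb defines its fields, not this sketch.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# ASSUMPTIONS: placeholders; replace with the header name and secret
# from your own webhook configuration.
AUTH_HEADER = "X-Honeycomb-Webhook-Token"
SHARED_SECRET = "replace-me"


def parse_notification(headers, body: bytes, secret: str) -> dict:
    """Check the authentication header, then decode the JSON body."""
    token = headers.get(AUTH_HEADER, "")
    # Constant-time comparison so the check does not leak timing information.
    if not hmac.compare_digest(token.encode(), secret.encode()):
        raise PermissionError("invalid or missing webhook token")
    return json.loads(body)


class TriggerWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = parse_notification(self.headers, body, SHARED_SECRET)
        except PermissionError:
            self.send_response(401)
            self.end_headers()
            return
        except ValueError:  # malformed JSON (JSONDecodeError is a ValueError)
            self.send_response(400)
            self.end_headers()
            return
        # Hand the payload to your own integration logic here.
        print("trigger notification:", payload)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for webhook notifications until interrupted."""
    HTTPServer(("0.0.0.0", port), TriggerWebhookHandler).serve_forever()
```

Call `serve()` to start listening on port 8080. Requests with a missing or wrong token get a 401, and malformed bodies get a 400.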

View and Edit Triggers 

View all triggers for your team by selecting the Triggers icon in the left navigation bar. You will see a full list of the triggers. Select the trigger name to view and edit each trigger.

Use the search function to find a trigger by its name.

Triggers Page

Delete Triggers 

To delete a trigger, either select the Delete button on the Triggers page, or while editing, select the Delete button at the bottom of the Edit Trigger page.

Considerations 

Trigger queries run your selected calculation over your configured query duration. COUNTs, for example, are the total count over the query duration, not a per-second count. Averages and percentiles likewise cover the entire duration, so to detect spikes it is better to use MAX than AVG. Another option is a COUNT with a filter that restricts the set to the values you care about. For example, you could count the number of events over 100ms and set a threshold on that count, instead of asking for the average to exceed 100ms.
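Here is a small numeric illustration of why AVG hides spikes. The durations below are made-up values for one trigger duration, not real data.

```python
# Made-up request durations (ms) within one trigger duration:
# 99 fast requests and a single 1000 ms spike.
durations_ms = [10] * 99 + [1000]

avg_ms = sum(durations_ms) / len(durations_ms)        # the spike is diluted: 19.9
max_ms = max(durations_ms)                            # the spike is visible: 1000
slow_count = sum(1 for d in durations_ms if d > 100)  # COUNT with filter duration > 100: 1

threshold_ms = 100
print(avg_ms > threshold_ms)  # False: AVG never crosses 100 ms
print(max_ms > threshold_ms)  # True
print(slow_count > 0)         # True
```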

The MIN calculation returns 0 when no data satisfies the query, which makes it difficult to distinguish a valid minimum of 0 from a null result. For this reason, avoid combining the MIN calculation with the < threshold operator, to prevent false positives when no data is found. Instead, switch the query to use COUNT with a filter that restricts the set, and alert when the value exceeds a threshold. For example, given that a timeout should never be less than 100 ms, you could COUNT the number of events with timeout < 100 and trigger with a threshold > 0.
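The MIN pitfall and the COUNT alternative can be illustrated in Python. `min_or_zero` models the behavior described for MIN, which returns 0 when nothing matches. Both helpers are hypothetical.

```python
def min_or_zero(values: list[float]) -> float:
    """Models MIN as described: 0 when no events match the query."""
    return min(values, default=0)


def count_matching(values: list[float], predicate) -> int:
    """Models COUNT with a filter: the number of events matching `predicate`."""
    return sum(1 for v in values if predicate(v))


def below_100(timeout_ms: float) -> bool:
    return timeout_ms < 100


no_data: list[float] = []  # no events arrived during the trigger duration
print(min_or_zero(no_data) < 100)                     # True: MIN(timeout) < 100 fires falsely
print(count_matching(no_data, below_100) > 0)         # False: COUNT(timeout < 100) > 0 stays quiet
print(count_matching([250, 80, 300], below_100) > 0)  # True: a real violation still fires
```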

Triggers on RATE_SUM, RATE_MIN, and RATE_AVG calculate the difference between all the aggregated data points over the most recent query duration and the preceding query duration. In other words, a RATE trigger with a duration of 20 minutes will return a result based on the last 40 minutes of data points. See more about using the RATE aggregations.
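The 40-minute figure follows from comparing two adjacent windows. This sketch computes only the window boundaries, using a hypothetical `rate_trigger_windows` helper. It does not reproduce how Honeycomb computes the RATE values themselves.

```python
from datetime import datetime, timedelta, timezone


def rate_trigger_windows(now: datetime, duration_min: int):
    """Return the (preceding, most_recent) windows compared by a RATE trigger."""
    d = timedelta(minutes=duration_min)
    return (now - 2 * d, now - d), (now - d, now)


now = datetime(2024, 1, 5, 12, 0, tzinfo=timezone.utc)
preceding, recent = rate_trigger_windows(now, 20)
print(preceding[0].time(), recent[1].time())  # 11:20:00 12:00:00, i.e. 40 minutes of data
```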

Best Practices 

  • Use the Name and Description fields effectively. The Name field should tell you what the alert is; the Description field should tell you what to do about the alert. Links to internal wikis or runbooks are best.
  • Use filters to improve the quality of your signal. If you are interested in latency but have a long-polling endpoint, use a filter to remove that endpoint from the calculation rather than adjusting the threshold values.
  • To detect spikes in latency metrics, combine COUNT with a filter on your cutoff (for example, >100ms). The result is the number of events that exceed your cutoff.
  • To ignore spikes in latency and trigger on overall performance, use the P95 or P99 calculations. These will be more representative of the majority of traffic than AVG, which can be polluted by large outliers.
  • When detecting errors, allow good values instead of looking for bad values. For example, instead of building a filter of HTTP status codes == 500, use several filters to look for events that do not have status codes 200, 301, 302, or 404.
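The last practice can be sketched as an allowlist, where anything not on the list counts as an error. The status set comes from the example in the list. `unexpected_status_count` is a hypothetical helper.

```python
ALLOWED_STATUSES = {200, 301, 302, 404}


def unexpected_status_count(statuses: list[int]) -> int:
    """COUNT of events whose status code is not on the allowlist."""
    return sum(1 for s in statuses if s not in ALLOWED_STATUSES)


statuses = [200, 200, 302, 404, 500, 503, 429]
print(unexpected_status_count(statuses))  # 3: 500, 503, and 429 all count
```

A filter of status == 500 would have counted only one of these three events.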