Triggers let you receive notifications when your data in Honeycomb crosses the thresholds you configure. The graph on which to alert is as flexible as a Honeycomb query, which helps reduce false positives due to known errors.
When a trigger fires, you’ll be notified via the configured method. The currently supported methods are PagerDuty, Slack, Webhooks, and Email. The notification includes a link back to the graph showing the current status, which gives you a jumping-off point for further investigation.
Triggers have a duration over which they query data and a frequency which determines how often they run. A trigger with a duration of 5 minutes (set query time_range to 300) and a frequency of 2 minutes will run every 2 minutes over the last 5 minutes of data.
Important: The duration of a trigger query can be at most 1 day, and cannot exceed 4 times the frequency of the trigger. For example, if the trigger’s frequency is 1 hour, the query duration cannot be more than 4 hours. The query duration also cannot be less than the trigger’s frequency.
Important: Trigger frequency must be specified in whole minutes, from 1 to 1440. Decimal values are truncated to the preceding full minute.
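The timing rules above can be checked mechanically before a trigger is saved. The following is a minimal sketch, not Honeycomb's own validation: the function name and the error messages are made up for illustration, and it assumes the duration is given in seconds (as with time_range) and the frequency in minutes.

```python
import math

MAX_DURATION_SECONDS = 24 * 60 * 60  # duration can be at most 1 day


def check_trigger_timing(frequency_minutes: float, time_range_seconds: int) -> list[str]:
    """Return a list of rule violations; an empty list means the pair is valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Decimal frequencies are truncated to the preceding full minute.
    frequency = math.floor(frequency_minutes)
    if not 1 <= frequency <= 1440:
        problems.append("frequency must be between 1 and 1440 minutes")
        return problems
    frequency_seconds = frequency * 60
    if time_range_seconds > MAX_DURATION_SECONDS:
        problems.append("duration exceeds 1 day")
    if time_range_seconds > 4 * frequency_seconds:
        problems.append("duration exceeds 4 times the frequency")
    if time_range_seconds < frequency_seconds:
        problems.append("duration is shorter than the frequency")
    return problems


# 5-minute duration with a 2-minute frequency: no violations.
print(check_trigger_timing(2, 300))
# 5-hour duration with a 1-hour frequency: breaks the 4x rule.
print(check_trigger_timing(60, 5 * 3600))
```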
For this example, we want to know whenever the 95th percentile of our API server’s requests exceeds 30ms, but we want to exclude the /poll endpoint because it has long-held connections, which pollute the data by being artificially high.
Start the trigger creation process by building and running a query. Select the three-dot overflow menu and click “Make Trigger”.
Fill in the details for the trigger. Both the Name and Description will be included in notifications about the trigger. Make sure the name describes clearly what has happened, while the description should indicate next steps or include links back to documentation, so that the person who receives the alert will know how to respond.
The sample graph displays the most recent 16 periods for your query (with a granularity equal to the period length) to help choose an appropriate threshold. For example, the graph for a 5 minute frequency shows the previous 80 minutes with a 5 minute granularity.
The Threshold indicates what condition generates a notification, and the Frequency determines how often to check for that condition. Consider what is normal within your frequency window so notifications only capture conditions worth alerting on.
The Duration determines what time range of data the trigger will check. You’ll see a chart of your dataset’s maximum and average event latency next to the duration field. This chart describes the delay between the timestamp on the event and when it reached Honeycomb. You can use this data to help choose a duration that captures all your events, even if they are delayed. For example, if the average event latency is 2 minutes, and you want to run your trigger every 5 minutes, choose a 7-minute duration to ensure that delayed events are captured by the trigger. Note that if your traces span a long timeframe, you may see high latency in this chart, even though the traces are arriving as soon as they complete.
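The arithmetic behind that choice is small enough to sketch. The helper below is hypothetical: it adds the observed latency to the frequency, then clamps the result to the documented limits (no shorter than the frequency, no longer than 4 times the frequency or 1 day). The clamping step is an assumption about how you might reconcile the two rules, not something Honeycomb does for you.

```python
import math

MAX_DURATION_MINUTES = 24 * 60  # duration can be at most 1 day


def suggest_duration_minutes(frequency_minutes: int, latency_minutes: float) -> int:
    """Frequency plus observed latency, clamped to the documented limits.

    Hypothetical helper for illustration only.
    """
    wanted = frequency_minutes + math.ceil(latency_minutes)
    upper = min(4 * frequency_minutes, MAX_DURATION_MINUTES)
    return max(frequency_minutes, min(wanted, upper))


print(suggest_duration_minutes(5, 2))   # 7, matching the example above
print(suggest_duration_minutes(1, 10))  # 4, capped at 4 times the frequency
```

If the cap applies, as in the second call, some delayed events will still fall outside the window, so a lower frequency may be the better fix.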
The trigger will notify all recipients listed when the measured value crosses the configured threshold.
If you have specified fields in the Group By clause, then the trigger will notify all recipients every time a new group surpasses the trigger threshold. Thus, if a trigger is already in a triggered state, and a new group surpasses the trigger threshold, the trigger will again notify all recipients and include the new groups that have triggered the alert.
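The Group By behavior can be pictured as a set difference between the groups currently over the threshold and the groups that have already triggered. This is only a sketch of the idea: the function name, the group names, and the use of a "greater than" comparison are assumptions for illustration.

```python
def groups_to_notify(
    previously_triggered: set[str],
    results: dict[str, float],
    threshold: float,
) -> set[str]:
    """Return groups that are over the threshold and were not before.

    Hypothetical helper; assumes the trigger condition is "value > threshold".
    """
    over_threshold = {group for group, value in results.items() if value > threshold}
    return over_threshold - previously_triggered


# "/login" already triggered earlier; "/checkout" is newly over 30.
already = {"/login"}
latest = {"/login": 45.0, "/checkout": 38.0, "/home": 12.0}
print(groups_to_notify(already, latest, threshold=30.0))
```

In this example only "/checkout" is new, so it is the group that prompts another notification while the trigger is already in a triggered state.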
By default, you will be able to enter email recipients. Additional recipients will be available in the dropdown after you have configured them on a team-level basis.
You can configure PagerDuty, Slack, or Webhooks as trigger recipients that can be used by anyone on your Honeycomb team. If you are in the process of creating a trigger, you can click “Configure Integrations” from within the trigger editing page to set them up in your team settings page.
You can also find this page by clicking on your user icon in the navigation sidebar > Team settings > Integrations. Here you can add, edit, or remove your team-level trigger recipients. Deleting an integration from here will remove it from all associated triggers.
Allow Honeycomb to connect to your Slack team to send triggers to your Slack channels, with nifty features such as link unfurling that will show you a preview of your Honeycomb query result graphs. Only one of your Slack team’s members needs to authorize Honeycomb in order to send triggers to public channels. Team members who want to send triggers to their private channels or themselves must authorize Slack on an individual basis.
Note: previously, our Slack trigger recipients used Slack’s Incoming Webhooks. These recipients will still work, but you will no longer be able to add new recipients in this way.
PagerDuty’s API Integration docs describe how to create a generic API integration to PagerDuty. Following those steps will give you an Integration Key that you’ll enter in the Trigger Recipient configuration form.
You can specify Webhooks for Honeycomb to send JSON payloads to when a trigger fires, so that you can build your own custom integrations. A webhook is an HTTP endpoint running within your infrastructure to which Honeycomb will send notifications of the trigger changing state. Each request includes an authentication header, and the body contains the result of the trigger as JSON.
The API for webhook notifications is described via an example webhook implementation that can consume webhook notifications.
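As a rough illustration of what a receiving endpoint does, the sketch below checks a shared secret and decodes the JSON body using only the Python standard library. It is not the example implementation mentioned above. The header name X-Example-Webhook-Token and the secret value are placeholders, since this page does not name the actual header, and the payload is treated as opaque JSON because its fields are not listed here.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler

TOKEN_HEADER = "X-Example-Webhook-Token"  # hypothetical header name
SHARED_SECRET = "replace-me"              # hypothetical shared secret


def parse_notification(headers, body: bytes, secret: str = SHARED_SECRET):
    """Return the decoded JSON payload, or None if the token does not match."""
    supplied = headers.get(TOKEN_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), secret.encode()):
        return None
    return json.loads(body)


class TriggerWebhookHandler(BaseHTTPRequestHandler):
    """Minimal POST handler; serve it with http.server.HTTPServer."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_notification(self.headers, self.rfile.read(length))
        if payload is None:
            self.send_response(401)  # missing or wrong token
        else:
            print("trigger notification:", payload)
            self.send_response(204)  # accepted, nothing to return
        self.end_headers()
```

Keeping the check in a plain function makes it easy to test without starting a server. A production endpoint would also need to handle malformed JSON and run behind TLS.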
When you save the trigger, it is immediately active and will run at the next frequency interval (such as on the next 5 minute interval for a 5 minute frequency).
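To estimate when a newly saved trigger will first run, you can round the save time up to the next multiple of the frequency. The sketch below assumes that intervals are aligned to clock boundaries (multiples of the frequency counted from the Unix epoch). That alignment is an assumption made for illustration and is not stated on this page.

```python
from datetime import datetime, timezone


def next_run(saved_at: datetime, frequency_minutes: int) -> datetime:
    """Return the first interval boundary strictly after saved_at.

    Hypothetical helper; assumes epoch-aligned intervals.
    """
    period_seconds = frequency_minutes * 60
    saved_ts = int(saved_at.timestamp())
    next_ts = (saved_ts // period_seconds + 1) * period_seconds
    return datetime.fromtimestamp(next_ts, tz=timezone.utc)


saved = datetime(2024, 1, 1, 12, 3, 20, tzinfo=timezone.utc)
print(next_run(saved, 5))  # 2024-01-01 12:05:00+00:00
```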
You can find the triggers belonging to a dataset by going into the settings page for the dataset, then clicking the “Triggers” tab.
You’ll see a full list of the triggers active for the dataset. You’ll be able to click through and edit each trigger from this list.
To delete a trigger, scroll to the bottom of the edit page.
Queries done for triggers run your selected calculation over your configured frequency period.
COUNTs, for example, will be the total count for that period, not a per-second count. Averages and percentiles likewise cover the entire period, so to detect spikes it is better to use MAX instead of AVG over a period. Another alternative is to use a COUNT with a filter restricting the set to the threshold you’re interested in. For example, you could count the number of events over 100ms and alert on that count instead of asking for the average to exceed 100ms.
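A small worked example shows why AVG can hide a spike that MAX or a filtered COUNT will catch. The latency values below are invented for illustration: nine ordinary requests and one slow one within a single period.

```python
# Invented request durations (ms) for one trigger period.
durations_ms = [20, 22, 25, 21, 24, 23, 400, 22, 20, 23]
THRESHOLD_MS = 100

avg_ms = sum(durations_ms) / len(durations_ms)                 # 60.0
max_ms = max(durations_ms)                                     # 400
slow_count = sum(1 for d in durations_ms if d > THRESHOLD_MS)  # 1

print(avg_ms > THRESHOLD_MS)  # False: the average hides the spike
print(max_ms > THRESHOLD_MS)  # True: MAX sees it
print(slow_count)             # 1: COUNT with a filter reports how many
```

The filtered COUNT has the added benefit of letting you alert on "more than N slow events" rather than on any single outlier.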
>100ms) with a COUNT. Your result will be the number of events that exceed your threshold.
P99 calculations. These will be more representative of the majority of traffic than AVG, which can be polluted by large outliers.
== 500, use several filters to look for events that don’t have status codes 200, 301, 302, 404, etc.