Triggers let you receive notifications when your data in Honeycomb crosses the thresholds that you configure. The graph on which to alert is as flexible as a Honeycomb query, which helps reduce false positives due to known errors.
When a trigger fires, you will be notified by the configured method(s). Currently supported methods are PagerDuty, Slack, Webhooks, and Email. The notification includes a link back to the graph, which shows the current status and provides a jumping off point for further investigation.
Triggers have a duration over which they query data and a frequency, which determines how often they run. For example, a trigger with a duration of 5 minutes and a frequency of 2 minutes will run every 2 minutes over the last 5 minutes of data.
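The relationship between duration and frequency can be sketched in a few lines. This is an illustrative model only, not Honeycomb's scheduler; the function name `query_windows` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def query_windows(first_run, duration_minutes, frequency_minutes, runs):
    """Return the (start, end) data window covered by each of the next `runs` runs."""
    duration = timedelta(minutes=duration_minutes)
    frequency = timedelta(minutes=frequency_minutes)
    windows = []
    for i in range(runs):
        end = first_run + i * frequency        # one run per frequency interval
        windows.append((end - duration, end))  # each run looks back one duration
    return windows


# A 5 minute duration with a 2 minute frequency: consecutive windows overlap.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
for window_start, window_end in query_windows(start, 5, 2, 3):
    print(window_start.strftime("%H:%M"), "->", window_end.strftime("%H:%M"))
# 11:55 -> 12:00
# 11:57 -> 12:02
# 11:59 -> 12:04
```

Because the duration is longer than the frequency in this example, each event is evaluated by more than one run.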
By default, users have a limit of two triggers available across all environments. Upgrade to a Pro or Enterprise plan to increase your number of Triggers.
You can create a trigger within the Triggers page or while using the Query Builder.
Select the Triggers icon in the left navigation bar to reach the Triggers page. From the Triggers page:
Creating a new trigger from the Triggers page will require entering a query during trigger configuration.
From the Query Builder screen:
For this example, we want to know whenever our cart has an error that is considered “slow” and to have the results grouped by userid, http.url, and requestID.
To configure the trigger, define the trigger details, trigger alert threshold, and notification preferences in the next screen.
Both the Name and Description will be included in notifications about the trigger. Ensure the name describes clearly what has happened, while the description should indicate next steps or include links back to documentation, so that the person who receives the alert will know how to respond.
After defining the trigger details within the first section, the next section displays a sample graph that shows how the trigger query and the trigger threshold components interact. When creating a new Trigger from Query Builder, the sample graph will appear automatically. When creating a new Trigger from the Triggers page, you must enter a query before the sample graph appears and the Threshold field in the Alerts section populates.
The sample graph displays the trends for your query with the most recent 16 periods as indicated by markers. Set the period length with Duration in the Alerts section to help choose an appropriate threshold. The default Duration value is 15 minutes. Use Frequency in the Alerts section to control the frequency of the query run. The frequency of the query is set at 15 minutes by default. For example, the sample graph for a 30 minute frequency with a 120 minute duration shows the previous 1920 minutes (or 32 hours).
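The arithmetic behind the sample graph's time span can be checked with a short calculation. The constant and function names here are hypothetical; the only inputs taken from the text are the 16 displayed periods and the Duration value.

```python
SAMPLE_GRAPH_PERIODS = 16  # number of periods the sample graph displays


def sample_graph_span_minutes(duration_minutes):
    """Total time shown on the sample graph: one Duration per displayed period."""
    return SAMPLE_GRAPH_PERIODS * duration_minutes


span = sample_graph_span_minutes(120)
print(span, "minutes =", span // 60, "hours")  # 1920 minutes = 32 hours
```

In the example above, the span follows from the Duration alone; the 30 minute Frequency does not enter the calculation.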
Next, define the threshold for the trigger alert.
The Threshold indicates what condition generates a notification. By default, a notification is sent when the condition first meets the threshold, and again when the trigger resolves because the condition no longer meets it.
About Triggered Groups: If you have specified fields in the GROUP BY clause of a trigger, then the trigger will notify all recipients when any new group crosses the trigger threshold.
For example, if a trigger is already in a triggered state, and any new group surpasses the trigger threshold, the trigger will again notify all recipients and include the new groups that have triggered the alert.
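One way to picture the triggered-groups behavior is as a set difference between the groups currently over the threshold and the groups that had already triggered. The sketch below is a simplified model, assuming a `>` threshold operator; `groups_to_announce` is a hypothetical helper, not part of Honeycomb.

```python
def groups_to_announce(previously_triggered, current_values, threshold):
    """Return the groups that are newly over the threshold, sorted by name.

    An empty list means no new group crossed the threshold on this run.
    """
    over = {group for group, value in current_values.items() if value > threshold}
    return sorted(over - set(previously_triggered))


# The trigger is already in a triggered state because of user-1.
already_triggered = {"user-1"}
latest = {"user-1": 12, "user-2": 15, "user-3": 3}
print(groups_to_announce(already_triggered, latest, threshold=10))  # ['user-2']
```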
Also, set the number of times the Threshold, or trigger condition, should be met consecutively before alerting you. Use Send an alert after the threshold has been met x times to enter this value. This value defaults to 1 and cannot be greater than 5.
How often the trigger condition is evaluated depends on its Frequency value. For example, if a trigger’s threshold must be met 3 times before alerting and the trigger’s frequency is 5 minutes, then this trigger alerts when its threshold has been met for the past 15 minutes, or 3 cycles of 5 minutes.
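The consecutive-count rule can be modeled as a check over the most recent run results. This is a sketch under simplifying assumptions (one boolean per run, oldest first); the function names are hypothetical.

```python
def should_alert(recent_results, required_consecutive=1):
    """Return True when the last `required_consecutive` runs all met the threshold.

    recent_results: list of booleans, oldest first, one entry per trigger run.
    """
    if not 1 <= required_consecutive <= 5:
        raise ValueError("required_consecutive must be between 1 and 5")
    if len(recent_results) < required_consecutive:
        return False
    return all(recent_results[-required_consecutive:])


def minutes_until_alert(required_consecutive, frequency_minutes):
    """Minimum time the condition must hold before the trigger alerts."""
    return required_consecutive * frequency_minutes


print(should_alert([False, True, True, True], required_consecutive=3))  # True
print(should_alert([True, True, False, True], required_consecutive=3))  # False
print(minutes_until_alert(3, 5))                                        # 15
```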
If the “Send an alert every time threshold is met” checkbox is selected, an alert will be sent to the alert recipients every time the condition meets or exceeds the threshold. This checkbox setting overrides the default behavior of alerting once when crossing the threshold and once upon resolution. When this setting is enabled, no resolved alert is sent. Use this checkbox when:
For example, if a trigger has specified fields in its GROUP BY clause, and “Send an alert every time threshold is met” selected, then the trigger will notify all recipients when any new group crosses the trigger threshold, or if any group still exceeds the threshold. If one or more groups resolve, no resolved alert will be sent.
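The difference between the default behavior and “Send an alert every time threshold is met” can be compared side by side with a small simulation. It ignores groups and consecutive counts, and `notifications` is a hypothetical name used only for illustration.

```python
def notifications(states, alert_every_time=False):
    """Simulate which notifications a sequence of trigger runs would produce.

    states: list of booleans, one per run (True means the threshold was met).
    Returns a list of (run_index, kind) tuples.
    """
    sent = []
    previously_met = False
    for i, met in enumerate(states):
        if alert_every_time:
            if met:
                sent.append((i, "triggered"))  # repeated on every matching run
        elif met and not previously_met:
            sent.append((i, "triggered"))      # once, on crossing the threshold
        elif previously_met and not met:
            sent.append((i, "resolved"))       # once, on returning to OK
        previously_met = met
    return sent


runs = [False, True, True, False]
print(notifications(runs))                         # [(1, 'triggered'), (3, 'resolved')]
print(notifications(runs, alert_every_time=True))  # [(1, 'triggered'), (2, 'triggered')]
```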
The Duration determines the time range of data that the trigger checks. The default Duration value is 15 minutes. You will see an Event Latency History graph next to the duration field. This graph describes the maximum and average amount of delay between the timestamp on the event and when it reached Honeycomb. You can use this data to help choose a duration that captures all your events, even if they are delayed.
For example, if the average event latency is 2 minutes, and you want to run your trigger every 5 minutes, choose a 7 minute duration to ensure that delayed events are captured by the trigger. Note that if your traces span a long time frame, you may see high latency in this chart, even though the traces are arriving as soon as they complete.
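The duration suggestion in this example is the frequency plus the observed event latency. A minimal sketch follows; rounding fractional latency up to a whole minute is an assumption, and `suggested_duration_minutes` is a hypothetical helper.

```python
import math


def suggested_duration_minutes(frequency_minutes, event_latency_minutes):
    """Frequency plus observed latency, rounded up to a whole minute (assumption)."""
    return frequency_minutes + math.ceil(event_latency_minutes)


print(suggested_duration_minutes(5, 2))  # 7
```

If occasional late events matter to you, consider passing the maximum latency from the graph rather than the average.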
The Frequency determines how often, in minutes, to check for the Threshold condition. The default Frequency value is 15 minutes. Consider what is normal within your frequency window so notifications only capture conditions worth alerting.
Valid Frequency values range from 1 to 1440. Decimal values are truncated to the preceding full minute. (3.6 becomes 3.)
Honeycomb allows you to specify a scheduled window in which the trigger will run. For example, you may need to be alerted Monday through Friday, but not on weekends.
To enable, slide the Custom scheduling toggle to the right and specify the time range and days that the trigger should run. Note that the start time and end time must be provided in Coordinated Universal Time (UTC).
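Because the schedule is expressed in UTC, it can help to convert a local time before deciding what to enter. The sketch below checks whether a moment falls inside a weekday window. It assumes the window does not cross midnight and that the end time is exclusive; `in_schedule` is a hypothetical helper, not a Honeycomb API.

```python
from datetime import datetime, time, timedelta, timezone


def in_schedule(moment, start, end, weekdays):
    """Return True if `moment` falls inside the scheduled window.

    moment:   timezone-aware datetime
    start/end: datetime.time values in UTC (window must not cross midnight)
    weekdays: set of ints, Monday=0 through Sunday=6, evaluated in UTC
    """
    utc = moment.astimezone(timezone.utc)
    return utc.weekday() in weekdays and start <= utc.time() < end


weekdays_only = {0, 1, 2, 3, 4}
eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only

# 10:00 at UTC-5 on Monday 2024-01-01 is 15:00 UTC, inside a 09:00 to 17:00 UTC window.
print(in_schedule(datetime(2024, 1, 1, 10, 0, tzinfo=eastern), time(9), time(17), weekdays_only))  # True
# Saturday 2024-01-06 is outside a Monday through Friday schedule.
print(in_schedule(datetime(2024, 1, 6, 10, 0, tzinfo=eastern), time(9), time(17), weekdays_only))  # False
```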
The trigger will notify all listed recipients when the measured value crosses the configured threshold. There is no limit on the number of recipients. By default, Honeycomb sends an alert to recipients once, when the trigger crosses the configured threshold and enters the Triggered state, and then sends a resolved alert once the trigger is back in an OK state.
To add a new recipient, select Add Recipient. Use Go to Integration Center to configure additional trigger recipient integration options, like Slack, PagerDuty, and Webhooks.
After selecting Add Recipient, a form will appear with Recipient options in a dropdown list. By default, you can select Notify by Email and enter email recipients. Additional integration options, like Slack, PagerDuty, and Webhooks, can be selected once configured.
Finally, select Create Trigger to save your trigger configuration.
Once saved, the trigger is immediately active and will run at the next frequency interval, such as on the next 5 minute interval for a 5 minute frequency. You can enable or disable a trigger by editing the trigger and selecting the Enable or Disable option.
You can configure PagerDuty, Slack, or Webhooks as trigger recipients that can be used by anyone on your Honeycomb team. If you are in the process of creating a trigger, use Go to Integration Center to configure recipient options in your Team settings page.
You can also find the Trigger Integrations list by selecting your user icon in the navigation sidebar and then selecting Team settings. Go to the Integrations tab to add, edit, or remove your team-level trigger recipients. Deleting an Integration from this Trigger Integrations page will remove it from all associated triggers.
When using the Integrations tab to link your Slack workspace to Honeycomb, authorization is needed to connect Honeycomb to your Slack team. An example of requested permissions appears below.
Only one of your Slack team’s members needs to authorize Honeycomb in order to send triggers to public channels. Team members who want to send triggers to their private channels or themselves must authorize Slack on an individual basis.
Once authorized, Honeycomb can send triggers to your Slack channels with features such as link unfurling that shows a preview of your Honeycomb query result graphs.
PagerDuty’s API Integration docs describe how to create a generic API integration to PagerDuty. Following those steps will give you an Integration Key that you will enter in the Honeycomb Trigger Recipient configuration form.
You can specify Webhooks to which Honeycomb sends JSON payloads when a trigger fires, which lets you build custom integrations. A webhook is an HTTP endpoint running within your infrastructure to which Honeycomb will send notifications of the trigger’s changing state. The request includes an authentication header, and the body of the webhook contains the result of the trigger in JSON.
The API for webhook notifications is described by an example webhook implementation that can consume webhook notifications.
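As a rough sketch of the receiving side, the function below checks the authentication header against a shared secret and parses the JSON body. It is framework-agnostic and illustrative only: the header name used as the default is an assumption that you should confirm against your webhook integration settings, `handle_webhook` is a hypothetical helper, and the payload fields in the usage example are made up.

```python
import hmac
import json


def handle_webhook(headers, body, shared_secret,
                   header_name="X-Honeycomb-Webhook-Token"):  # assumed header name
    """Validate and parse a webhook notification.

    headers: dict of request headers (lookup here is case-sensitive)
    body:    raw request body as str or bytes
    Returns (http_status, payload_or_None).
    """
    token = headers.get(header_name, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(token.encode(), shared_secret.encode()):
        return 401, None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, None
    return 200, payload


# Hypothetical request; the payload fields are placeholders, not Honeycomb's schema.
status, payload = handle_webhook(
    {"X-Honeycomb-Webhook-Token": "s3cret"},
    '{"example_field": "example value"}',
    shared_secret="s3cret",
)
print(status, payload)  # 200 {'example_field': 'example value'}
```

Keeping the validation in a plain function makes it easy to call from whichever HTTP framework hosts your endpoint.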
View all triggers for your team by selecting the Triggers icon in the left navigation bar. You will see a full list of the triggers. Select the trigger name to view and edit each trigger.
Use the search function to find a Trigger based on its name.
To delete a trigger, either select the Delete button on the Triggers page, or while editing, select the Delete button at the bottom of the Edit Trigger page.
Queries done for triggers run your selected calculation over your configured query duration. COUNTs, for example, will be the total count over the query duration, not a per-second count. Averages and percentiles likewise cover the entire duration, so to detect spikes, it is better to use MAX instead of AVG over a period. Another alternative is to use a COUNT with a filter restricting the set to the threshold you are interested in. For example, you could count the number of events over 100ms and use that with a threshold instead of asking for the average to exceed 100ms.
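A small worked example shows why AVG over a whole duration can hide a spike that MAX or a filtered COUNT would catch. The data is invented for illustration.

```python
# 100 request durations in one query window: mostly fast, with a short spike.
durations_ms = [20] * 95 + [900] * 5

avg = sum(durations_ms) / len(durations_ms)
maximum = max(durations_ms)
slow_count = sum(1 for d in durations_ms if d > 100)  # COUNT with a > 100ms filter

print(avg)         # 64.0  (an "AVG > 100" threshold would not fire)
print(maximum)     # 900   (a "MAX > 100" threshold would fire)
print(slow_count)  # 5     (a "COUNT > 0" threshold on filtered events would fire)
```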
The MIN calculation returns 0 for cases in which no data is found that satisfies the query, making it difficult to distinguish between a valid minimum of 0 and a null result. For this reason, you should avoid using the MIN calculation in combination with the < threshold operator in order to prevent false positives when no data is found. Instead, try switching the query to using COUNT with a filter restricting the set, and alert when the value exceeds a threshold. For example, given that a timeout should never be less than 100 ms, you could COUNT the number of events with timeout < 100, and trigger with a threshold > 0.
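The false positive described above can be reproduced with a toy model. `min_calculation` only imitates the documented behavior of returning 0 when no events match; both function names are hypothetical.

```python
def min_calculation(values):
    """Imitate the behavior described above: MIN reports 0 when no events match."""
    return min(values) if values else 0


def count_below(values, limit):
    """COUNT of events whose value is below `limit`."""
    return sum(1 for v in values if v < limit)


no_events = []
timeouts_ms = [250, 80, 400]

# MIN with a "< 100" threshold fires even though there is no data at all.
print(min_calculation(no_events) < 100)   # True  (false positive)
# COUNT of events with timeout < 100, with a "> 0" threshold, stays quiet.
print(count_below(no_events, 100) > 0)    # False
# With real data containing an 80 ms timeout, the COUNT-based trigger fires.
print(count_below(timeouts_ms, 100) > 0)  # True
```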
Triggers on RATE_SUM, RATE_MIN, and RATE_AVG calculate the difference between all the aggregated data points over the most recent query duration and the preceding query duration. In other words, a RATE trigger with a duration of 20 minutes will return a result based on the last 40 minutes of data points.
See more about using the RATE aggregations.
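To make the two-window behavior of RATE triggers concrete, the sketch below computes the current and preceding windows and takes the difference of their sums. It is a simplified reading of a RATE_SUM-style calculation, not Honeycomb's implementation; the function names and the half-open window boundaries are assumptions.

```python
def rate_windows(now_minute, duration_minutes):
    """Return (previous, current) windows as half-open (start, end) minute ranges."""
    current = (now_minute - duration_minutes, now_minute)
    previous = (now_minute - 2 * duration_minutes, now_minute - duration_minutes)
    return previous, current


def rate_sum(points, now_minute, duration_minutes):
    """Difference between the summed values of the current and preceding windows.

    points: list of (minute, value) tuples.
    """
    previous, current = rate_windows(now_minute, duration_minutes)

    def total(window):
        return sum(value for minute, value in points if window[0] <= minute < window[1])

    return total(current) - total(previous)


# A 20 minute duration evaluated at minute 40 reads data from minutes 0 through 40.
print(rate_windows(40, 20))  # ((0, 20), (20, 40))
points = [(5, 10), (15, 10), (25, 30), (35, 30)]
print(rate_sum(points, 40, 20))  # 40
```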
Use a filter (such as >100ms) with a COUNT. Your result will be the number of events that exceed your threshold.
Use P95 or P99 calculations. These will be more representative of the majority of traffic than AVG, which can be polluted by large outliers.
Instead of filtering on a status code == 500, use several filters to look for events that do not have status codes 200, 301, 302, or 404.