When working with a dataset, the primary control for constructing queries is the Query Builder. Below is an example of the Builder, ready for you to make changes to the query’s clauses and run it over your data by applying those changes.
A query in Honeycomb consists of five clauses: Visualize, Where, Group By, Order By, and Limit.
The default output for most queries will be a time series and a summary table, though the precise composition will depend on the composition of your query:
Let’s take a closer look at how to use each of these clauses, and cases in which each of them can be particularly useful. When we discuss the effects of each of these operations, events are inputs to the query (the series of raw payloads you sent that match a set of criteria). Results, on the other hand, refer to the output of a query after any applicable processing or aggregation.
Click on any box in the Query Builder to edit the clauses there. In this screenshot, the user has set Visualize to `count` and Group By to `hostname`. Now they are adding a new Where clause on status code. Honeycomb’s autocomplete helps construct the query.
Honeycomb supports a wide range of calculations to provide insight into your events. When a grouping is provided, calculations occur within each group; otherwise, they run over all matching events.
For example, say you’ve collected the following events from your web server logs:
Timestamp | uri | status_code | response_time_ms |
---|---|---|---|
2016-08-01 07:30 | /about | 500 | 126 |
2016-08-01 07:45 | /about | 200 | 57 |
2016-08-01 07:57 | /docs | 200 | 82 |
2016-08-01 08:03 | /docs | 200 | 23 |
Specifying a visualization for a particular attribute (e.g. `P95(response_time_ms)`) means applying the aggregation function (in this case `P95`, the 95th percentile) over the values of that attribute (`response_time_ms`) across all input events.
Defining multiple Visualize clauses is common and can be useful, especially when comparing the outputs of each visualization (e.g. comparing the `AVG` to the `P95` of some value).
While most Visualize queries return a line graph, the `Heatmap` visualization shows the distribution of your data in a rich, interactive way, and lets you use BubbleUp.
Scenario: we want to capture overall statistics for our web server. Given our four-event dataset described above, consider a query which contains:
- Visualize: `COUNT`
- Visualize: `AVG` of `response_time_ms` values
- Visualize: `P95` of `response_time_ms` values

These calculations would return statistics across the entirety of our dataset:
COUNT | AVG(response_time_ms) | P95(response_time_ms) |
---|---|---|
4 | 72 | 119.4 |
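To make the arithmetic concrete, here is a minimal Python sketch that reproduces the table above from the four example events. The source does not say how Honeycomb computes percentiles; the `percentile` helper below assumes linear interpolation between the two nearest ranks, which is one common method and happens to match the 119.4 shown. The `events` list and the helper name are illustrative only, not part of Honeycomb.

```python
import math

# The four example events from the web server log table above.
events = [
    {"uri": "/about", "status_code": 500, "response_time_ms": 126},
    {"uri": "/about", "status_code": 200, "response_time_ms": 57},
    {"uri": "/docs", "status_code": 200, "response_time_ms": 82},
    {"uri": "/docs", "status_code": 200, "response_time_ms": 23},
]


def percentile(values, p):
    """Return the p-th percentile using linear interpolation.

    Assumption: interpolation between the two nearest ranks. Honeycomb's
    actual method is not described in this document.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


times = [event["response_time_ms"] for event in events]

count = len(events)             # COUNT
avg = sum(times) / count        # AVG(response_time_ms)
p95 = percentile(times, 95)     # P95(response_time_ms)

print(count, avg, round(p95, 2))
```

With only four events, the P95 lands between the two slowest requests (82 ms and 126 ms), which is why it is so much higher than the average.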
Sometimes you want to constrain the events by some attribute besides time: ignoring an outlier case, for example, or isolating events triggered by a particular actor or circumstance.
For example, say you’ve collected the following events from your web server logs:
Timestamp | uri | status_code |
---|---|---|
2016-08-01 08:15 | /about | 500 |
2016-08-01 08:22 | /about | 200 |
2016-08-01 08:27 | /docs | 403 |
You can define any number of arbitrary constraints based on event values. Where clauses work in concert with the specified time range to define the events that are ultimately considered by any Group By or Visualize clauses.
Note that the Where clause does not require string delimiters or escape characters; to match a `uri` of `/docs`, simply enter `uri = /docs`.
Scenario: we want to understand the frequency of unsuccessful web requests. Given our three-event dataset described above, consider a query which contains:
- Visualize: `COUNT`
- Where: `status_code != 200`

The Where clause removes the successful event (our `/about` web request returning a `200`) from consideration, and counts only the first and third events towards our Visualize clause:
COUNT |
---|
2 |
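The same filter-then-count behavior can be sketched in a few lines of Python. This is only an illustration of the semantics using the three example events; the `events` list and variable names are assumptions, not Honeycomb code.

```python
# The three example events from the table above.
events = [
    {"timestamp": "2016-08-01 08:15", "uri": "/about", "status_code": 500},
    {"timestamp": "2016-08-01 08:22", "uri": "/about", "status_code": 200},
    {"timestamp": "2016-08-01 08:27", "uri": "/docs", "status_code": 403},
]

# Where: status_code != 200 (drop the successful request before aggregating).
matching = [event for event in events if event["status_code"] != 200]

# Visualize: COUNT (computed only over the events that passed the filter).
count = len(matching)

print(count)
```

The key point is the ordering: the Where clause narrows the input events first, and the Visualize calculation sees only what remains.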
Scenario: we want to refine our constraints further, to span multiple attributes for each event. Combining Where clauses returns events that satisfy either the intersection of all specified Where clauses (AND) or their union (OR). Given our three-event dataset described above, consider a query which contains:

- Where: `uri = "/about"`
- Where: `status_code != 200`

All three events are evaluated against the Where clauses, but only the first one satisfies both:
Timestamp | uri | status_code |
---|---|---|
2016-08-01 08:15 | /about | 500 |
Honeycomb also allows you to look at the union of clauses by combining them with `OR`:

- Where: `status_code = 500`
- Where: `status_code = 403`

Timestamp | uri | status_code |
---|---|---|
2016-08-01 08:15 | /about | 500 |
2016-08-01 08:27 | /docs | 403 |
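As a rough sketch of the difference between the two combinations, the Python below models each Where clause as a predicate and combines them with `all` (intersection) or `any` (union). The `where` helper and its `combine` parameter are hypothetical names chosen for this illustration, not Honeycomb’s API.

```python
# The three example events from the Where section.
events = [
    {"timestamp": "2016-08-01 08:15", "uri": "/about", "status_code": 500},
    {"timestamp": "2016-08-01 08:22", "uri": "/about", "status_code": 200},
    {"timestamp": "2016-08-01 08:27", "uri": "/docs", "status_code": 403},
]


def where(rows, predicates, combine="AND"):
    """Keep rows matching all predicates (AND) or at least one (OR)."""
    match = all if combine == "AND" else any
    return [row for row in rows if match(pred(row) for pred in predicates)]


# Intersection: uri = /about AND status_code != 200
and_rows = where(
    events,
    [lambda e: e["uri"] == "/about", lambda e: e["status_code"] != 200],
)

# Union: status_code = 500 OR status_code = 403
or_rows = where(
    events,
    [lambda e: e["status_code"] == 500, lambda e: e["status_code"] == 403],
    combine="OR",
)

print([e["timestamp"] for e in and_rows])
print([e["timestamp"] for e in or_rows])
```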
Being able to separate a series of events into groups by attribute is a powerful way to compare segments of your dataset against each other.
For example, say you’ve collected the following events from your web server logs:
Timestamp | uri | status_code |
---|---|---|
2016-08-01 07:30 | /about | 500 |
2016-08-01 07:45 | /about | 200 |
2016-08-01 07:57 | /docs | 200 |
You might want to analyze your web traffic in groups based on the `uri` (`/about` vs `/docs`) or the `status_code` (`500` vs `200`). Choosing to group by `uri` would return two result rows: one representing events in which `uri="/about"` and another representing events in which `uri="/docs"`. Each of these grouped result rows will be represented by a single line on a graph.
Grouping by more than one attribute will consider each unique combination of values as a single group. Here, choosing to group by both `uri` and `status_code` will return three groups: `/about`+`500`, `/about`+`200`, and `/docs`+`200`.
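One way to picture multi-attribute grouping is as a set of distinct value tuples. The short Python sketch below (illustrative names, not Honeycomb code) derives the group keys for the three example events.

```python
# The three example events from the Group By section.
events = [
    {"uri": "/about", "status_code": 500},
    {"uri": "/about", "status_code": 200},
    {"uri": "/docs", "status_code": 200},
]

# Group By: uri (one group per distinct uri value).
by_uri = {event["uri"] for event in events}

# Group By: uri, status_code (one group per distinct combination).
by_uri_and_status = {(event["uri"], event["status_code"]) for event in events}

print(sorted(by_uri))
print(sorted(by_uri_and_status))
```

Each added Group By attribute can only keep the number of groups the same or increase it, which is worth bearing in mind with high-cardinality fields.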
Grouping, paired with calculation, can often reveal interesting patterns in your underlying events. Grouping by `uri`, for example, and calculating response time stats will show you the slowest (or fastest) `uri`s.
TIP: Honeycomb supports grouping your data based on any attribute in an event, though you’ll likely receive the clearest results by choosing an attribute with an uneven distribution within your data.
Scenario: we want to examine performance of our web server by endpoint. Given the four-event dataset with response times described earlier, consider a query which contains:
- Group By: `uri`
- Visualize: `COUNT`
- Visualize: `AVG` of `response_time_ms` values

Pairing a Group By clause with a Visualize clause results in events being grouped by `uri`; Honeycomb draws one line for each group, and calculates statistics within each group:
uri | COUNT | AVG(response_time_ms) |
---|---|---|
/about | 2 | 91.5 |
/docs | 2 | 52.5 |
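The per-group calculation can be sketched in Python as "bucket the events by key, then aggregate each bucket". The code below uses the four events with response times and reproduces the table above; the data structure and names are assumptions made for illustration.

```python
from collections import defaultdict

# The four example events that include response times.
events = [
    {"uri": "/about", "status_code": 500, "response_time_ms": 126},
    {"uri": "/about", "status_code": 200, "response_time_ms": 57},
    {"uri": "/docs", "status_code": 200, "response_time_ms": 82},
    {"uri": "/docs", "status_code": 200, "response_time_ms": 23},
]

# Group By: uri (collect the response times seen for each uri).
groups = defaultdict(list)
for event in events:
    groups[event["uri"]].append(event["response_time_ms"])

# Visualize: COUNT and AVG(response_time_ms), computed within each group.
results = {
    uri: {"COUNT": len(times), "AVG": sum(times) / len(times)}
    for uri, times in groups.items()
}

for uri, stats in results.items():
    print(uri, stats["COUNT"], stats["AVG"])
```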
This technique is particularly powerful when paired with an Order By and a Limit to return “Top K”-style results.
In this figure, the user has set Visualize to `COUNT` and Group By to `eventtype`. The two curves, in purple and orange, show the two groups. The popup shows that the user is hovering over the `trace_span` eventtype, which has a count of 1460 in that 15-second time range.
When you roll your mouse over the results list at the bottom of the page, each group is highlighted in turn. The user has highlighted `request` and sees the orange line highlighted and the purple line dimmed.
Rollover for heatmaps is slightly different, and described on the Heatmaps page.
Order clauses define an ordering on results rows, while Limit clauses simply limit the total number of result rows to retrieve. They can be used independently but are most powerful together, to capture the “Top K” of some set of results.
For example, say you’ve collected the following events from your web server logs:
Timestamp | uri | status_code | response_time_ms |
---|---|---|---|
2016-08-01 09:17 | /about | 200 | 57 |
2016-08-01 09:18 | /about | 500 | 234 |
2016-08-01 09:20 | /404 | 200 | 12 |
2016-08-01 09:25 | /docs | 200 | 82 |
You can define any number of Order By clauses in a query and they will be respected in the order they’re specified.
The Order By clauses available to you for a particular query are influenced by whether any Group By or Visualize clauses are also specified. If none are, you may order by any of the attributes contained in the dataset. However, once a Group By or Visualize clause exists, you may only order by the values generated by those clauses.
Scenario: we just want to get a sense of the slowest endpoints in our web server. Given our four-event dataset described above, consider a query which contains:
- Order By: `response_time_ms` in descending (`DESC`) order

Remember that when no Visualize clauses are defined, we simply return raw events as the result rows:
Timestamp | uri | status_code | response_time_ms |
---|---|---|---|
2016-08-01 09:18 | /about | 500 | 234 |
Scenario: we want to capture statistics for our web server and know what we’re looking for (long `response_time_ms` values). Given our four-event dataset described above, consider a query which contains:

- Visualize: `P95` of `response_time_ms` values
- Group By: `uri`
- Order By: `P95(response_time_ms)` in descending (`DESC`) order
- Limit: `2` results

Our Group By and Visualize clauses influence what will be returned as result rows (`uri` and the `P95(response_time_ms)` for events within each distinct `uri` group), while the Order By determines the sort order of those results (longest `P95(response_time_ms)` first) and the Limit throws away any results beyond the top `2`:
uri | P95(response_time_ms) |
---|---|
/about | 225.15 |
/docs | 82 |
As you can see, the result for the `uri="/404"` group was excluded from our result set because of its relatively low `response_time_ms`.
This sort of Top K query is particularly valuable when working with high-cardinality datasets, where a Group By clause might split your dataset into a very large number of groups.
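The Top K pattern described above can be sketched as four steps in Python: group, aggregate, sort, truncate. As before, the percentile method (linear interpolation between nearest ranks) is an assumption that happens to reproduce the 225.15 in the example table; Honeycomb’s actual method is not stated in this document, and all names here are illustrative.

```python
import math
from collections import defaultdict

# The four example events from the Order By and Limit section.
events = [
    {"uri": "/about", "status_code": 200, "response_time_ms": 57},
    {"uri": "/about", "status_code": 500, "response_time_ms": 234},
    {"uri": "/404", "status_code": 200, "response_time_ms": 12},
    {"uri": "/docs", "status_code": 200, "response_time_ms": 82},
]


def percentile(values, p):
    """p-th percentile via linear interpolation (an assumed method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Group By: uri
groups = defaultdict(list)
for event in events:
    groups[event["uri"]].append(event["response_time_ms"])

# Visualize: P95(response_time_ms) within each group
rows = [(uri, percentile(times, 95)) for uri, times in groups.items()]

# Order By: P95(response_time_ms) DESC
rows.sort(key=lambda row: row[1], reverse=True)

# Limit: 2
top = rows[:2]

for uri, p95 in top:
    print(uri, round(p95, 2))
```

Because the Limit is applied after sorting, the `/404` group is computed but then discarded, which mirrors the behavior described above.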
The dataset switcher allows you to change the dataset you’re working on without changing the query.
When you switch datasets, Honeycomb will load the existing query on that dataset, but it won’t run it automatically. You can execute it by typing Shift + Enter or clicking the Run Query button. You can also clear the query by clicking the Clear link underneath the Run Query button.
This functionality is especially useful when switching between testing and production datasets, where much of the schema overlaps. It’s also useful when you need to view a specific time range across multiple datasets (yes, the time range of the query is included when you switch datasets).
If there are fields in the query that do not exist in the new dataset, Honeycomb will display an informational notice letting you know that those fields were removed from the query.
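The field-removal behavior can be pictured with a small Python sketch. Everything here is hypothetical: the dictionary-based query shape, the `switch_dataset` function, and the column names are invented for illustration and do not reflect how Honeycomb stores or processes queries internally.

```python
def switch_dataset(query, new_columns):
    """Return a copy of `query` without clauses that reference unknown fields.

    Hypothetical model: clauses whose field is missing from the new dataset
    are dropped, and the names of the dropped fields are reported back.
    """
    removed = set()

    def known(field):
        # A field of None means the clause needs no column (e.g. COUNT).
        if field is None or field in new_columns:
            return True
        removed.add(field)
        return False

    switched = {
        "visualize": [(op, f) for op, f in query["visualize"] if known(f)],
        "where": [(f, op, v) for f, op, v in query["where"] if known(f)],
        "group_by": [f for f in query["group_by"] if known(f)],
        # The time range carries over unchanged when switching datasets.
        "time_range": query["time_range"],
    }
    return switched, sorted(removed)


query = {
    "visualize": [("COUNT", None), ("P95", "response_time_ms")],
    "where": [("status_code", "!=", 200)],
    "group_by": ["hostname"],
    "time_range": "last 2 hours",
}

# Assume the new dataset has no `hostname` column.
switched, removed = switch_dataset(
    query, {"uri", "status_code", "response_time_ms"}
)

print(switched["group_by"], removed)
```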
Note: In Secure Tenancy environments, queries are not preserved on dataset switching.
Want more examples? Ask! We’re happy to help.