Querying metrics requires sending metrics data to Honeycomb first.
Metrics are stored in Honeycomb as fields on events. They can be queried just like any other data in a dataset. However, the kinds of queries typically written for metrics differ from those written for traces.
Use any of the following common operations in the VISUALIZE clause of Query Builder when visualizing metrics data:

- `HEATMAP(<metric_field_name>)`
- `AVG(<metric_field_name>)`
- `SUM(<metric_field_name>)`
- `MAX(<metric_field_name>)`
- `MIN(<metric_field_name>)`
- `PXX(<metric_field_name>)`
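As a rough illustration, the same visualizations can be expressed programmatically as the calculations of a query body sent to Honeycomb's Query Data API. The field name `host.memory_bytes` here is a hypothetical example, and the exact shape of your query will depend on your dataset:

```json
{
  "time_range": 7200,
  "calculations": [
    { "op": "HEATMAP", "column": "host.memory_bytes" },
    { "op": "P99", "column": "host.memory_bytes" }
  ]
}
```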
We recommend that you combine `HEATMAP` with other VISUALIZE operations to get a better sense of trends over time.
Refer to the Visualize Operations documentation for more information on these operators.
For metrics data, avoid using the `COUNT` VISUALIZE operation. `COUNT` measures the total number of metrics events rather than the actual value of a metric. For example, if tracking memory utilization of a host, the `COUNT` operator will not show the counter associated with memory utilization over time. Instead, assuming the instrument that measures memory utilization is called `host.memory_bytes`, use `HEATMAP(host.memory_bytes)` and `AVG(host.memory_bytes)` to visualize it.
Tracking the rate at which a measurement changes over time is a common operation when working with metrics data.
To do that, use the `RATE_MAX`, `RATE_AVG`, and `RATE_SUM` aggregate operators.
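To build intuition for what a rate aggregate computes, here is a short sketch in Python. This is an illustration of the general idea, not Honeycomb's actual implementation, and the counter values are made up:

```python
# Sketch: understanding RATE-style aggregates (illustrative only).
# A rate is derived from the per-interval change of a cumulative series.

def deltas(samples):
    """Per-interval change of a cumulative counter."""
    return [b - a for a, b in zip(samples, samples[1:])]

counter = [0, 100, 250, 250, 900]   # cumulative values at successive intervals
changes = deltas(counter)           # [100, 150, 0, 650]

rate_max = max(changes)             # largest change in any one interval
rate_avg = sum(changes) / len(changes)  # average change per interval
rate_sum = sum(changes)             # total change over the window
```

A flat stretch in the counter shows up as a zero-rate interval, which is why rate visualizations surface spikes that an average of the raw counter would hide.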
A common way to query metrics is to have two stacked visualization operations, such as:

| VISUALIZE |
|---|
| `AVG(host.memory_bytes)` |
| `RATE_AVG(host.memory_bytes)` |
When you visualize both operations, the results show the average memory utilization over time, and also interesting spikes in the rate of change.
The values for any given metric event are the measurements collected at the timestamp associated with the event.
Multiple metrics appear together as separate fields on the same event if they were received as part of the same OTLP request, have equivalent timestamps when truncated to the second (we truncate metric timestamps to the second for improved compaction), and share the same set of unique resources and attributes. Find out how Honeycomb converts incoming metrics data into events.
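The grouping rules above can be sketched roughly as follows. The data points and attribute names are hypothetical, and Honeycomb's actual ingest pipeline is more involved:

```python
# Sketch: grouping incoming metric points into events (illustrative only).
# Points sharing a second-truncated timestamp and the same resource/attribute
# set land on the same event as separate fields.
from collections import defaultdict

# Each point: (metric_name, timestamp_seconds, resource_attrs, value)
points = [
    ("host.cpu_usage",    1700000000.25, (("host.name", "web-1"),), 0.42),
    ("host.memory_bytes", 1700000000.75, (("host.name", "web-1"),), 3.2e9),
    ("host.cpu_usage",    1700000000.10, (("host.name", "web-2"),), 0.10),
]

events = defaultdict(dict)
for name, ts, attrs, value in points:
    key = (int(ts), attrs)  # truncate the timestamp to the second
    events[key][name] = value

# web-1's two metrics merge into a single event; web-2 gets its own.
```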
Counters, gauges, sums, and summary metrics result in single-valued numeric data, and these values show up as individual fields within a metric event, with the field name being the same as the metric name.
For example, an application might send metrics for `host.cpu_usage` and `app.memory_bytes`. These names will show up as individual fields in a metrics event.
OpenTelemetry (OTel) Histograms contain aggregated data – a collection of buckets, each of which stores the number of values that were added to that bucket during the reporting period.
When ingesting histograms, Honeycomb aggregates them differently: it creates a collection of fields, all of which are contained in a single event.
For a histogram named `latency`, these fields will include these aggregations:
| Field | Meaning |
|---|---|
| `latency.count` | The total number of points |
| `latency.sum` | The sum of all the values |
| `latency.avg` | The mean (average) of all the values (sum/count) |
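As an illustration of how these derived fields relate to each other, here is a hypothetical computation from an OTel-style histogram data point (the numbers are made up):

```python
# Illustrative only: deriving latency.count / .sum / .avg fields from a
# histogram data point's reported count and sum (hypothetical values).
hist = {"count": 8, "sum": 120.0}

fields = {
    "latency.count": hist["count"],
    "latency.sum": hist["sum"],
    "latency.avg": hist["sum"] / hist["count"],  # mean = sum / count
}
```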
In addition, Honeycomb records histogram data with fields containing `p` values, which are values that are greater than a given percentage of the data. For example, in a running race involving 10 competitors, `p50` would be the finishing time of the 5th competitor, and `p90` would correspond to the finishing time of the runner who finished 9th. The value for `p50` is also known as the median in statistics.
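The race example can be made concrete with a small nearest-rank percentile sketch in Python (the finishing times are invented, and real percentile estimators often interpolate rather than pick a single rank):

```python
# Illustrative: nearest-rank percentiles over sorted race finishing times
# (minutes, made-up data for 10 competitors).
times = sorted([31, 33, 34, 36, 38, 40, 41, 44, 47, 55])

def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil-ish pct% of the data."""
    idx = max(0, int(len(values) * pct / 100) - 1)
    return values[idx]

p50 = percentile(times, 50)  # the 5th finisher's time
p90 = percentile(times, 90)  # the 9th finisher's time
```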
Here is the full list of `p` values recorded for a histogram named `latency` over OTLP:
| p Value | Percentage |
|---|---|
| `latency.p001` | 0.1% |
| `latency.p01` | 1% |
| `latency.p05` | 5% |
| `latency.p10` | 10% |
| `latency.p20` | 20% |
| `latency.p25` | 25% |
| `latency.p50` | 50% |
| `latency.p75` | 75% |
| `latency.p80` | 80% |
| `latency.p90` | 90% |
| `latency.p95` | 95% |
| `latency.p99` | 99% |
| `latency.p999` | 99.9% |
We recommend querying histograms by using `MAX(pValue)`. For example, `MAX(latency.p99)` will show you the worst-case latency measurement for 99% of spans.
It may be useful to view infrastructure metrics for your systems alongside query results from non-metrics datasets. For instance, a system running out of memory, CPU, or network resources might be the reason for an out-of-compliance SLO or an alerting trigger, and seeing the graph of the problem alongside graphs of relevant system resources could confirm or deny this kind of hypothesis.
Define a Board of the relevant metric queries you want to see in relation to other data, and then use Correlations when using Query Builder to view and compare your query results with up to six saved queries on a Board.