Examples: Query for Metrics

Querying metrics requires sending metrics data to Honeycomb first.

Write Queries for Metrics Data 

Metrics are stored in Honeycomb as fields on events. They can be queried just like any other data in a dataset. However, the kinds of queries typically written for metrics differ from those written for traces.

Common VISUALIZE Operations 

Use any of the following common operations in the VISUALIZE clause of Query Builder when visualizing metrics data:

  • HEATMAP(<metric_field_name>)
  • AVG(<metric_field_name>)
  • SUM(<metric_field_name>)
  • MAX(<metric_field_name>)
  • MIN(<metric_field_name>)
  • PXX(<metric_field_name>)

We recommend that you combine HEATMAP with other Visualize Operations to get a better sense of trends over time.

Refer to the Visualize Operations documentation for more information on these operators.

For metrics data, avoid using the COUNT VISUALIZE operation. COUNT measures the total number of metrics events rather than the actual value of a metric.

For example, if you track memory utilization of a host, COUNT shows how many memory-utilization events were received over time, not the measured values themselves. Instead, assuming the instrument that measures memory utilization is called host.memory_bytes, use HEATMAP(host.memory_bytes) and AVG(host.memory_bytes) to visualize it, as shown below.
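Using the host.memory_bytes field from that example, such a query might look like:

VISUALIZE
HEATMAP(host.memory_bytes)
AVG(host.memory_bytes)

The heatmap shows the full distribution of values over time, while the AVG line summarizes the overall trend.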

Track the Rate of Change 

Tracking the rate at which a measurement changes over time is a common operation when working with metrics data. To do that, use the RATE_MAX, RATE_AVG, and RATE_SUM aggregate operators.

A common way to query metrics is to have two stacked visualization operations, such as:

VISUALIZE
AVG(host.memory_bytes)
RATE_AVG(host.memory_bytes)

When you visualize both operations, the results show the average memory utilization over time, as well as any interesting spikes in its rate of change.

How Metrics are Stored in Honeycomb 

Values and Fields 

The values for any given metric event are the measurements collected at the timestamp associated with the event.

Multiple metrics appear together as separate fields on the same event if they:

  • were received as part of the same OTLP request,
  • have equivalent timestamps when truncated to the second (we truncate metric timestamps to the second for improved compaction), and
  • share the same set of unique resources and attributes.

Find out how Honeycomb converts incoming metrics data into events.

Numeric Metrics 

Counters, gauges, sums, and summary metrics result in single-valued numeric data. These values show up as individual fields within a metric event, with the field name matching the metric name. For example, an application might send metrics named host.cpu_usage and app.memory_bytes, and each appears as its own field on the resulting metrics event.
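As a rough illustration, a single metrics event carrying those two fields (assuming they meet the conditions described above) might look like the following; the values shown are hypothetical:

Field Value
timestamp 2024-06-01T12:00:00Z
host.cpu_usage 0.42
app.memory_bytes 1073741824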

Histograms 

OpenTelemetry (OTel) Histograms contain aggregated data: a collection of buckets, each of which stores the number of values that were added to that bucket during the reporting period. When ingesting histograms, Honeycomb aggregates them differently: it creates a collection of fields, all of which are contained in a single event. For a histogram named latency, these fields include the following aggregations:

Field Meaning
latency.count The total number of points
latency.sum The sum of all the values
latency.avg The mean (average) of all the values (sum/count)

In addition, Honeycomb records histogram data with fields containing p values. A p value is the value at or below which a given percentage of the data falls. For example, in a running race involving 10 competitors, p50 would be the finishing time of the 5th competitor, and p90 would correspond to the finishing time of the runner who finished 9th. The p50 value is also known in statistics as the median. Here is the full list of p values recorded for a histogram named latency sent over OTLP:

p Value Percentage
latency.p001 0.1%
latency.p01 1%
latency.p05 5%
latency.p10 10%
latency.p20 20%
latency.p25 25%
latency.p50 50%
latency.p75 75%
latency.p80 80%
latency.p90 90%
latency.p95 95%
latency.p99 99%
latency.p999 99.9%

We recommend querying histograms by using MAX on a p value field. For example, MAX(latency.p99) shows the worst-case latency experienced by 99% of measurements over the query's time range.
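For instance, combining fields from the tables above, a query could pair the average with that worst-case percentile:

VISUALIZE
AVG(latency.avg)
MAX(latency.p99)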

Metrics Correlations 

It may be useful to view infrastructure metrics for your systems alongside query results from non-metrics datasets. For instance, a system running out of memory, CPU, or network resources might be the reason for an out-of-compliance SLO or an alerting trigger, and seeing the graph of the problem alongside graphs of relevant system resources could confirm or deny this kind of hypothesis.

Define a Board containing the relevant metric queries you want to see in relation to other data, and then use Correlations in Query Builder to view and compare your query results with up to six saved queries on that Board.