
Visualize Rate

The RATE_MAX, RATE_AVG, and RATE_SUM aggregates in a query allow you to visualize the rate of change of a field over time.

This visualization helps you understand how quickly a field is changing, and is especially useful for metric data that is collected as a counter or sum.

  • RATE_MAX(<numeric_field_name>) - Displays the difference between successive field values after applying the MAX operator
  • RATE_AVG(<numeric_field_name>) - Displays the difference between successive field values after applying the AVG operator
  • RATE_SUM(<numeric_field_name>) - Displays the difference between successive field values after applying the SUM operator

Rate Operations and Metrics Counters 

Metrics datasets often contain “counter” style fields. The values of these fields represent the number of times that a particular action has occurred since a specific point in time.

For example, the metric system.network.dropped is frequently a counter that represents the number of network packets that were dropped or discarded since the machine booted up.

With this type of data, a visualization of AVG(system.network.dropped) would yield a graph that is mostly a straight line, trending up and to the right. The slope of this line will change occasionally as network characteristics fluctuate over time. In fact, the slope of this line is the most useful piece of information provided by this visualization, as it represents the number of network packets being dropped at any given point in time rather than the total number of dropped packets since the system started up.

In this example, using RATE_AVG(system.network.dropped) in Query Builder would allow you to view a graph of this slope as it changes over time.

How the RATE Operations are Calculated 

The RATE operations are all calculated according to the same process.

In the simplest case, imagine a dataset field that contains one value every 10 seconds.

Timestamp    my.field
12:30:00     100
12:30:10     150
12:30:20     210
12:30:30     265

A rate query is designed to return the difference between the value of my.field in each time bucket and its value in the previous time bucket. A rate query with a ten-second granularity would return exactly that result. (Since each bucket contains only one value, RATE_MAX(my.field), RATE_SUM(my.field), and RATE_AVG(my.field) would all return the same result.)

Timestamp    RATE_xxx(my.field)
12:30:00     no data
12:30:10     150 - 100 = 50
12:30:20     210 - 150 = 60
12:30:30     265 - 210 = 55
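
To make the arithmetic concrete, here is a minimal Python sketch of this simplest case. It is an illustration of the calculation described above, not Honeycomb's implementation:

```python
# One value per 10-second bucket, so each RATE value is just the
# difference between a bucket's value and the previous bucket's value.
values = [("12:30:00", 100), ("12:30:10", 150), ("12:30:20", 210), ("12:30:30", 265)]

previous = None
for timestamp, value in values:
    result = "no data" if previous is None else value - previous
    print(timestamp, result)
    previous = value
```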

However, if there were more than one data point in an individual bucket, we would need to apply the aggregation (MAX, SUM, or AVG) against all data points in that bucket to coalesce them into a single value.

Additionally, if one or more empty buckets separate a value from the previous value, RATE_xxx(field) linearly interpolates: it spreads the difference between the two values evenly across the gap.

For example, with a dataset containing the following events:

Timestamp    my.field
12:30:10     0
12:30:13     3
12:30:15     6
12:30:20     10
12:30:23     11
12:30:25     24
12:30:30     27
12:30:33     29
12:30:35     34
12:30:50     39
12:30:53     44
12:30:55     49

Given the above events, querying the RATE aggregations against my.field with a granularity of 10 seconds would result in a graph of the following values:

Time        RATE_MAX(my.field)                          RATE_SUM(my.field)                          RATE_AVG(my.field)
12:30:10    no data (nothing to compare)                no data                                     no data
12:30:20    MAX(10,11,24) - MAX(0,3,6) = 18             SUM(10,11,24) - SUM(0,3,6) = 36             AVG(10,11,24) - AVG(0,3,6) = 12
12:30:30    MAX(27,29,34) - MAX(10,11,24) = 10          SUM(27,29,34) - SUM(10,11,24) = 45          AVG(27,29,34) - AVG(10,11,24) = 15
12:30:40    no data (no underlying data)                no data                                     no data
12:30:50    (MAX(39,44,49) - MAX(27,29,34)) / 2 = 7.5   (SUM(39,44,49) - SUM(27,29,34)) / 2 = 21    (AVG(39,44,49) - AVG(27,29,34)) / 2 = 7
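
The following Python sketch reproduces the table above under the process just described: events are grouped into 10-second buckets, each bucket is aggregated, and the difference from the previous non-empty bucket is divided by the number of buckets spanned. It is an illustration only, not Honeycomb's implementation:

```python
from collections import defaultdict

GRANULARITY = 10  # seconds

# (seconds past 12:30:00, my.field) -- the events from the table above
events = [(10, 0), (13, 3), (15, 6), (20, 10), (23, 11), (25, 24),
          (30, 27), (33, 29), (35, 34), (50, 39), (53, 44), (55, 49)]

# Group raw values into buckets keyed by bucket start time.
buckets = defaultdict(list)
for ts, value in events:
    buckets[ts // GRANULARITY * GRANULARITY].append(value)

def avg(xs):
    return sum(xs) / len(xs)

for agg_name, agg in (("RATE_MAX", max), ("RATE_SUM", sum), ("RATE_AVG", avg)):
    prev_start = prev_value = None
    for start in sorted(buckets):
        current = agg(buckets[start])
        if prev_start is not None:
            # Spread the difference evenly across any empty buckets in between.
            span = (start - prev_start) // GRANULARITY
            print(f"{agg_name} @ 12:30:{start:02d} = {(current - prev_value) / span}")
        prev_start, prev_value = start, current
```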

Rate Aggregations and Granularity 

Note
When making a query against a metrics dataset with a regular cadence of events, ensure that your query granularity is a whole multiple of the metrics capture interval.

Query granularity has an important effect on the results of RATE aggregations, since it determines how many data points are aggregated together in each bucket. RATE_SUM, in particular, is sensitive to the choice of granularity. If the number of data points that fall into each bucket varies, the result of the SUM aggregation will vary in turn, and the calculated rate of change between buckets will reflect that variability.

For instance, imagine a metrics counter, such as system.cpu.time.user, whose value is captured and stored in an event every 10 seconds. A RATE_SUM() aggregation over this counter with a granularity set to a multiple of 10 seconds (for example, a 30-second granularity) will result in a consistent number of data points in each bucket: in this example, 3 data points per host.

If the granularity is instead set to a value that is not a multiple of the 10-second capture interval (for example, a 15-second granularity), buckets will contain varying numbers of data points, and thus inconsistent aggregate results. In this example, alternating buckets would contain 2 data points per host and 1 data point per host.
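
A short sketch can demonstrate this effect. The helper below (hypothetical, for illustration only) counts how many 10-second capture timestamps land in each bucket for a given granularity:

```python
def bucket_counts(capture_interval, granularity, duration=60):
    """Count data points per bucket for a fixed capture cadence."""
    counts = {}
    for ts in range(0, duration, capture_interval):
        start = ts // granularity * granularity
        counts[start] = counts.get(start, 0) + 1
    return counts

print(bucket_counts(10, 30))  # {0: 3, 30: 3} -- consistent buckets
print(bucket_counts(10, 15))  # {0: 2, 15: 1, 30: 2, 45: 1} -- alternating 2 and 1
```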

Rate Aggregations and Triggers 

It is possible to create triggers for RATE_MAX, RATE_AVG, and RATE_SUM queries. Since query granularity is not available in triggers, RATE aggregations are calculated over the trigger duration. For instance, a trigger for RATE_MAX(system.cpu.time.user) with a duration of 15 minutes would calculate the difference between the most recent 15 minutes of data and the preceding 15 minutes of data.
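
As a rough illustration of that windowing, the hypothetical helper below compares an aggregate over the most recent window against the preceding window of the same length. It is a sketch of the behavior described above, not Honeycomb's trigger implementation:

```python
def trigger_rate(events, now, duration, agg=max):
    """events: list of (unix_timestamp, value) pairs for one host."""
    current = [v for ts, v in events if now - duration <= ts < now]
    previous = [v for ts, v in events if now - 2 * duration <= ts < now - duration]
    if not current or not previous:
        return None  # not enough data in one of the windows
    return agg(current) - agg(previous)

# e.g. a RATE_MAX-style check over a 15-minute duration (900 seconds),
# where cpu_time_events is a hypothetical list of (timestamp, value) pairs:
# trigger_rate(cpu_time_events, now=time.time(), duration=900, agg=max)
```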