Sampled Data in Honeycomb

Learn how Honeycomb adjusts for sample rate when querying sampled data and considerations for working with sampled data.

How Honeycomb Adjusts for Sample Rate 

When you sample your data with our sampling techniques, each span in a trace is given a SampleRate attribute. Its value is N when you keep only 1 out of every N traces, so each stored span stands in for N spans. This allows Honeycomb to weight counts to compensate for the fact that you are sampling your data.

For example, suppose you are doing head sampling at a 10% sampling rate, which means only 10% of traces are exported to Honeycomb:

Trace ID    Sample Rate (on each span)    duration_ms
abcd1234    10                            200
4321dcba    10                            1100

In this example, the SampleRate attribute is set to 10 because you are sampling 10% of traces, or 1 in 10 traces. With this information, Honeycomb can correct for sample rate and calculate accurate values for various aggregations:

  • COUNT of traces: (2 * 10) = 20
  • AVG(duration_ms): ((200 * 10) + (1100 * 10)) / (10 + 10) = 650

This means you can send less data and still see results in Honeycomb that closely approximate the unsampled data. Sample rate correction applies to SUM and percentile aggregations as well.
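The arithmetic above can be sketched in a few lines of Python. This is only an illustration of sample-rate weighting, not Honeycomb's implementation; the Span class and the weighted_* helper names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a stored span (not a Honeycomb type)."""
    trace_id: str
    sample_rate: int      # N: this span represents N original spans
    duration_ms: float


def weighted_count(spans):
    # Each stored span counts as sample_rate spans.
    return sum(s.sample_rate for s in spans)


def weighted_sum(spans):
    # Each value contributes sample_rate times to the total.
    return sum(s.duration_ms * s.sample_rate for s in spans)


def weighted_avg(spans):
    # Weighted total divided by weighted count.
    return weighted_sum(spans) / weighted_count(spans)


spans = [
    Span("abcd1234", 10, 200),
    Span("4321dcba", 10, 1100),
]

print(weighted_count(spans))  # 20
print(weighted_avg(spans))    # 650.0
```

Because every span here has the same sample rate, the weighted average equals the plain average. The weights start to matter when sample rates differ between spans, as they do with dynamic sampling.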

By setting the SampleRate attribute, your sampling techniques can be as simple or sophisticated as you need, and Honeycomb will do the rest. If you’re using Refinery, this is done automatically for its dynamic samplers.

COUNT_DISTINCT and Sampled Data 

Query Builder’s COUNT_DISTINCT operator does not compensate for sampling rate, so use it with care when working with sampled data.

COUNT_DISTINCT estimates the count of distinct values in a field using the HyperLogLog algorithm and can only count values that are actually present in the data.

When using COUNT_DISTINCT in a query, you can view the average sample rate for the query. Locate it in the metadata below the result summary table, alongside the elapsed query time and rows examined fields. The average sample rate field displays the average sample rate across all underlying events included in the query result.
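To see why a distinct count cannot be corrected the way COUNT can, consider the following minimal sketch. It assumes a made-up dataset of 100 events with 25 distinct user_id values, keeps every 10th event, and uses an exact Python set in place of HyperLogLog to keep the example short. The field name user_id and all variable names are hypothetical.

```python
SAMPLE_RATE = 10

# Hypothetical unsampled data: 100 events spread across 25 users.
events = [{"user_id": i % 25} for i in range(100)]

# Keep 1 in 10 events and record the sample rate on each kept event.
sampled = [dict(e, SampleRate=SAMPLE_RATE) for e in events[::SAMPLE_RATE]]

true_distinct = len({e["user_id"] for e in events})
sampled_distinct = len({e["user_id"] for e in sampled})

# COUNT can be recovered by summing sample rates...
weighted_count = sum(e["SampleRate"] for e in sampled)

# ...and the average sample rate hints at how much data is missing.
avg_sample_rate = weighted_count / len(sampled)

print(true_distinct)      # 25
print(sampled_distinct)   # 5
print(weighted_count)     # 100
print(avg_sample_rate)    # 10.0
```

In this sketch the weighted COUNT recovers the original 100 events, but only 5 of the 25 users appear in the sampled data. Multiplying 5 by the sample rate would give 50, which is also wrong, so there is no simple weighting that restores the true distinct count.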

Query Sampled Data Without Correcting for Sample Rate 

Sometimes you may want to query sampled data without taking sample rate into account. For example, you might want to see how many actual events your dynamic sampler sends so you can adjust your sampling strategy, or you might need to debug issues with the sampled data.

For these cases, you can use Usage Mode. Usage Mode provides a query builder that evaluates queries in an unweighted mode that does not correct for sample rate. To access Usage Mode:

  1. Select Usage in the left navigation of the Honeycomb UI.
  2. Under Per-environment Breakdown, select Usage Mode for the environment you want to query. Calculations in this mode don’t correct for sample rates.

In Usage Mode, you have access to the Sample Rate field, and COUNT (and all other calculation operations) are unweighted.
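The difference between weighted and unweighted counting can be illustrated with a short sketch. It assumes a hypothetical dynamic sampler that keeps every error (sample rate 1) and 1 in 100 successful events; the status field and the data are invented for the example, and the code only mimics the arithmetic, not how Honeycomb evaluates queries.

```python
from collections import Counter

# Hypothetical stored events after dynamic sampling.
stored = (
    [{"status": "error", "SampleRate": 1}] * 3
    + [{"status": "ok", "SampleRate": 100}] * 2
)

# Unweighted COUNT: the number of events actually stored.
unweighted_count = len(stored)

# Weighted COUNT: each event counts as SampleRate events.
weighted_count = sum(e["SampleRate"] for e in stored)

# Grouping stored events by sample rate shows what the sampler sent.
events_by_rate = Counter(e["SampleRate"] for e in stored)

print(unweighted_count)       # 5
print(weighted_count)         # 203
print(dict(events_by_rate))   # {1: 3, 100: 2}
```

The unweighted numbers are the ones to look at when tuning a sampler, because they reflect the events that were actually sent rather than the traffic they represent.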