Learn how Honeycomb adjusts for sample rate when querying sampled data and considerations for working with sampled data.
When you sample your data with our sampling techniques, each span in a trace is given a `SampleRate` attribute that represents N when you sample only 1/N traces.
This allows Honeycomb to weight counts to compensate for the fact that you are sampling your data.
For example, suppose you are doing head sampling at a 10% sampling rate, which means only 10% of traces are exported to Honeycomb:
Trace ID | Sample Rate (on each span) | duration_ms |
---|---|---|
abcd1234 | 10 | 200 |
4321dcba | 10 | 1100 |
In this example, the `SampleRate` attribute is set to `10` because you are sampling 10% of traces, or 1 in 10 traces.
With this information, Honeycomb can correct for sample rate and calculate accurate values for various aggregations:
- `COUNT` of traces: (2 * 10) = 20
- `AVG(duration_ms)`: ((200 * 10) + (1100 * 10)) / (10 + 10) = 650
This means you can send less data and yet still see usefully accurate data in Honeycomb.
Sample rate correction applies to `SUM` and percentile aggregations as well.
By setting the `SampleRate` attribute, your sampling techniques can be as simple or sophisticated as you need, and Honeycomb will do the rest.
If you’re using Refinery, this is done automatically for its dynamic samplers.
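The weighting arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not Honeycomb's implementation: the `Event` class and the function names are hypothetical, and the percentile function uses a simple nearest-rank rule over cumulative weights, which is an assumption made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Event:
    trace_id: str
    sample_rate: int      # N, when 1 in N traces is kept
    duration_ms: float


# The two sampled traces from the table above.
events = [
    Event("abcd1234", 10, 200),
    Event("4321dcba", 10, 1100),
]


def weighted_count(events):
    # Each stored event stands in for sample_rate original events.
    return sum(e.sample_rate for e in events)


def weighted_sum(events):
    return sum(e.duration_ms * e.sample_rate for e in events)


def weighted_avg(events):
    return weighted_sum(events) / weighted_count(events)


def weighted_percentile(events, p):
    # Nearest-rank percentile over cumulative weights (assumed method).
    threshold = weighted_count(events) * p / 100
    cumulative = 0
    for e in sorted(events, key=lambda e: e.duration_ms):
        cumulative += e.sample_rate
        if cumulative >= threshold:
            return e.duration_ms
    raise ValueError("no events")


print(weighted_count(events))  # 20
print(weighted_avg(events))    # 650.0
```

The weighted count and average match the worked values above: 20 and 650.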
COUNT_DISTINCT and Sampled Data

Query Builder’s `COUNT_DISTINCT` operator does not compensate for sample rate, so use it with care when working with sampled data.
`COUNT_DISTINCT` estimates the count of distinct values in a field using the HyperLogLog algorithm and can only count values that are actually present in the data.
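To see why a distinct count cannot simply be scaled by the sample rate, consider the following sketch. It uses exact Python sets rather than HyperLogLog, and a deterministic "keep every 10th event" sampler; both are simplifications, and the data is made up for illustration.

```python
SAMPLE_RATE = 10


def sample_every_nth(values, n):
    # Deterministic stand-in for 1-in-n sampling.
    return values[::n]


# 100 events from only 7 users, and 100 events from 100 different users.
few_users = [f"user-{i % 7}" for i in range(100)]
many_users = [f"user-{i}" for i in range(100)]

for name, population in [("few_users", few_users), ("many_users", many_users)]:
    sampled = sample_every_nth(population, SAMPLE_RATE)
    true_distinct = len(set(population))
    sampled_distinct = len(set(sampled))
    print(name, true_distinct, sampled_distinct, sampled_distinct * SAMPLE_RATE)
```

With few users, the sampled distinct count already equals the true count, so multiplying by the sample rate would overshoot. With many users, the sampled count is ten times too low. No single correction factor fits both cases, because values that were never sampled cannot be counted.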
When using `COUNT_DISTINCT` in a query, you can view the average sample rate for the query.
Locate it in the metadata below the result summary table, next to the elapsed query time and rows examined fields.
The average sample rate value is the average sample rate across all underlying events included in the query result.
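As a rough illustration, assuming the value is a plain arithmetic mean over the stored events (the exact calculation is not specified here), a result built from events with mixed sample rates would report:

```python
from statistics import mean

# Hypothetical sample rates of the stored events behind one query result:
# two events kept at rate 1, three kept at rate 100.
sample_rates = [1, 1, 100, 100, 100]

average_sample_rate = mean(sample_rates)
print(average_sample_rate)  # 60.4
```

A high average suggests that each stored event represents many original events, so the `COUNT_DISTINCT` result is more likely to miss values.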
Sometimes you may want to query sampled data without taking sample rate into account. For example, you want to see how many actual events your dynamic sampler sends so you can adjust your sampling strategy. Or you may need to debug issues with the sampled data.
For these cases, you can use Usage Mode. Usage Mode provides a query builder that evaluates queries in an unweighted mode that does not correct for sample rate. To access Usage Mode:
In Usage Mode, you have access to the `Sample Rate` field, and `COUNT` and all other calculation operations are unweighted.
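The difference between the two modes can be sketched as follows. The events and field values are hypothetical, chosen to resemble a dynamic sampler that keeps every error but only 1 in 100 successful requests; the functions illustrate the arithmetic and are not part of any Honeycomb API.

```python
# Hypothetical events as stored after dynamic sampling.
stored_events = [
    {"status": 500, "Sample Rate": 1},
    {"status": 500, "Sample Rate": 1},
    {"status": 200, "Sample Rate": 100},
    {"status": 200, "Sample Rate": 100},
    {"status": 200, "Sample Rate": 100},
]


def unweighted_count(events):
    # Usage Mode view: how many events were actually sent and stored.
    return len(events)


def weighted_count(events):
    # Default view: estimate of how many events occurred before sampling.
    return sum(e["Sample Rate"] for e in events)


print(unweighted_count(stored_events))  # 5
print(weighted_count(stored_events))    # 302
```

The unweighted count is the number to look at when checking how much data a sampler is sending. The weighted count estimates the original traffic.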