How Honeycomb Adjusts for Sample Rate
When you sample your data with our sampling techniques, each span in a trace is given a SampleRate attribute. Its value is N when you keep only 1 in every N traces.
This allows Honeycomb to weight counts to compensate for the fact that you are sampling your data.
For example, suppose you are doing head sampling at a 10% sampling rate, which means only 10% of traces are exported to Honeycomb:
| Trace ID | Sample Rate (on each span) | duration_ms |
|---|---|---|
| abcd1234 | 10 | 200 |
| 4321dcba | 10 | 1100 |
The SampleRate attribute is set to 10 because you are sampling 10% of traces, or 1 in 10 traces.
With this information, Honeycomb can correct for sample rate and calculate accurate values for various aggregations:
- COUNT of traces: (2 * 10) = 20
- AVG(duration_ms): ((200 * 10) + (1100 * 10)) / (10 + 10) = 650
Honeycomb applies the same correction to SUM and percentile aggregations as well.
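The weighting arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation, not Honeycomb's query engine; the `events` list and the helper names are assumptions made for the example.

```python
# Sampled spans from the table above: (trace_id, sample_rate, duration_ms).
events = [
    ("abcd1234", 10, 200),
    ("4321dcba", 10, 1100),
]

def weighted_count(events):
    # Each kept event stands in for `sample_rate` original events.
    return sum(rate for _, rate, _ in events)

def weighted_sum(events):
    # Each value is counted `sample_rate` times.
    return sum(value * rate for _, rate, value in events)

def weighted_avg(events):
    # Weighted sum divided by the weighted count.
    return weighted_sum(events) / weighted_count(events)

print(weighted_count(events))  # 20
print(weighted_sum(events))    # 13000
print(weighted_avg(events))    # 650.0
```

Because every event here has the same sample rate, the weighted average equals the plain average. The weights only change the result when sample rates differ between events.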
By setting the SampleRate attribute, your sampling techniques can be as simple or sophisticated as you need, and Honeycomb will do the rest.
If you’re using Refinery, this is done automatically for its dynamic samplers.
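As a rough illustration of the "simple" end of that spectrum, the sketch below makes a deterministic head-sampling decision per trace and tags every exported span with SampleRate. It does not use any Honeycomb SDK; `keep_trace`, `sample_trace`, and the span dictionaries are hypothetical names chosen for the example, and hashing the trace ID is just one possible way to make the decision consistent across the spans of a trace.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Deterministically keep roughly 1 in `sample_rate` traces."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the trace ID to a bucket in [0, sample_rate) and keep bucket 0.
    bucket = int.from_bytes(digest[:8], "big") % sample_rate
    return bucket == 0

def sample_trace(trace_id: str, spans: list[dict], sample_rate: int) -> list[dict]:
    """Return the spans to export: empty if the trace is dropped,
    otherwise a copy of each span tagged with SampleRate."""
    if not keep_trace(trace_id, sample_rate):
        return []
    return [{**span, "SampleRate": sample_rate} for span in spans]

# Hypothetical usage: the spans are exported only if the trace is kept.
spans = [{"name": "GET /cart", "duration_ms": 200}]
exported = sample_trace("abcd1234", spans, 10)
```

Every span of a kept trace carries the same SampleRate, so the weighting described above applies uniformly across the trace.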
COUNT_DISTINCT and Sampled Data
Query Builder’s COUNT_DISTINCT operator does not compensate for sampling rate, so use it with care when working with sampled data.
COUNT_DISTINCT estimates the count of distinct values in a field using the HyperLogLog algorithm and can only count values that are actually present in the data.
When using COUNT_DISTINCT in a query, you can view the average sample rate for the query.
Locate it in the metadata below the result summary table, alongside the elapsed query time and rows examined fields.
The average sample rate value is the average of the sample rates of all underlying events included in the query result.
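To see why a distinct count cannot be corrected the way COUNT can, consider this small sketch. The data is hypothetical, the sampling is a simple "keep every 10th event" rule, and an exact Python set stands in for HyperLogLog, so this illustrates the limitation rather than Honeycomb's implementation.

```python
# 100 original events spread over 25 distinct users (hypothetical data).
original = [{"user_id": f"user-{i % 25}"} for i in range(100)]

# Keep every 10th event and tag it with SampleRate = 10.
SAMPLE_RATE = 10
sampled = [{**e, "SampleRate": SAMPLE_RATE} for e in original[::SAMPLE_RATE]]

# COUNT can be recovered by weighting each kept event by its sample rate.
weighted_count = sum(e["SampleRate"] for e in sampled)

# A distinct count only sees the values that survived sampling.
distinct_in_sample = len({e["user_id"] for e in sampled})
distinct_in_original = len({e["user_id"] for e in original})

# Average sample rate across the events that feed the query.
average_sample_rate = sum(e["SampleRate"] for e in sampled) / len(sampled)

print(weighted_count)        # 100
print(distinct_in_sample)    # 5
print(distinct_in_original)  # 25
print(average_sample_rate)   # 10.0
```

Multiplying the sampled distinct count by the sample rate gives 50, which is also wrong. The average sample rate is therefore a hint about how much data is missing, not a correction factor.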
Query Sampled Data Without Correcting for Sample Rate
Sometimes you may want to query sampled data without taking sample rate into account. For example, you may want to see how many actual events your dynamic sampler sends so you can adjust your sampling strategy, or you may need to debug issues with the sampled data. For these cases, you can use Usage Mode, which provides a query builder that evaluates queries in an unweighted mode that does not correct for sample rate.
To access Usage Mode:
- Select Usage in the left navigation of the Honeycomb UI.
- Under Per-environment Breakdown, select Usage Mode for the environment you want to query.
Calculations in this mode don't correct for sample rates. The sample rate is available as a Sample Rate field, and COUNT (and all other calculation operations) are unweighted.
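The difference between the two modes can be sketched as follows. The events and their sample rates are hypothetical, standing in for what a dynamic sampler might keep: common successes sampled heavily, rare errors kept in full.

```python
from collections import Counter

# Hypothetical events kept by a dynamic sampler.
kept = [
    {"status": 200, "SampleRate": 100},
    {"status": 200, "SampleRate": 100},
    {"status": 200, "SampleRate": 100},
    {"status": 500, "SampleRate": 1},
    {"status": 500, "SampleRate": 1},
]

# Unweighted COUNT: how many events were actually sent.
unweighted = Counter(e["status"] for e in kept)

# Weighted COUNT: the estimated number of original events.
weighted = Counter()
for e in kept:
    weighted[e["status"]] += e["SampleRate"]

print(dict(unweighted))  # {200: 3, 500: 2}
print(dict(weighted))    # {200: 300, 500: 2}
```

The unweighted view shows what the sampler actually sent per status code, which is the number to look at when tuning a sampling strategy.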