How Honeycomb handles sampled data
Honeycomb adjusts for sample rate when working with and querying sampled data.
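As a rough illustration of what adjusting for sample rate means, the sketch below weights each kept event by the sample rate it was kept at. The `Event` shape and the function names are hypothetical, and the exact aggregation Honeycomb performs is not described here; this only shows the general idea that one kept event at a sample rate of N stands in for N original events.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A kept event plus the sample rate it was kept at (hypothetical shape)."""
    duration_ms: float
    sample_rate: int  # one kept event stands in for `sample_rate` original events


def adjusted_count(events):
    """Estimate the original event count by summing the sample rates."""
    return sum(e.sample_rate for e in events)


def adjusted_avg_duration(events):
    """Estimate the original average duration, weighting by sample rate."""
    total_weight = sum(e.sample_rate for e in events)
    if total_weight == 0:
        return 0.0
    return sum(e.duration_ms * e.sample_rate for e in events) / total_weight


events = [
    Event(duration_ms=10.0, sample_rate=100),  # a fast request, kept 1 in 100
    Event(duration_ms=250.0, sample_rate=1),   # a slow request, always kept
]
print(adjusted_count(events))
print(adjusted_avg_duration(events))
```

Without the weighting, the two stored events would suggest a much higher average duration than the estimated original traffic actually had.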
Observability and Sampling
Sampling is crucial to observability at scale. You might sample your telemetry data to reduce your total data volume or filter out noise from services with predictable traffic. Consider these different kinds of traces:
- Traces that finish successfully with no errors
- Traces with specific attributes on them
- Traces with high latency
- Traces with errors on them
Head Sampling
Head sampling makes the sampling decision without looking at the entire trace. The decision to sample or not sample a span in a trace is made as early as possible. In OpenTelemetry, a head sampling decision is made during span creation: unsampled spans are not created. The most common form of head sampling is deterministic probability sampling. Given a constant sampling rate that represents a fixed percentage of traces to sample, the sampler decides whether to sample each span by treating the trace ID as a random number. Because every span in a trace carries the same trace ID, disparate samplers make consistent decisions for all of the spans in a trace. See our guidelines on when you should consider head sampling.
OpenTelemetry SDK Support
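To make the trace-ID-based decision concrete, here is a minimal sketch in Python. It is not the OpenTelemetry SDK's implementation. The function name `should_sample` and the choice to compare the low 64 bits of the trace ID against a threshold are assumptions made for illustration.

```python
LOW_64_BITS = 2**64 - 1  # mask used to treat part of the trace ID as a random number


def should_sample(trace_id: int, sampling_probability: float) -> bool:
    """Decide deterministically from the trace ID whether to keep a trace.

    Illustrative only: the same trace ID always yields the same decision,
    so independent samplers agree on every span in a trace.
    """
    if not 0.0 <= sampling_probability <= 1.0:
        raise ValueError("sampling_probability must be between 0 and 1")
    # Keep the trace when its low 64 bits fall below a probability-derived threshold.
    threshold = int(sampling_probability * 2**64)
    return (trace_id & LOW_64_BITS) < threshold


trace_id = 0x5B8EFFF798038103D269B633813FC60C
# Two services evaluating the same trace ID reach the same decision.
print(should_sample(trace_id, 0.25) == should_sample(trace_id, 0.25))
```

Because the decision depends only on the trace ID and the configured probability, no coordination between services is needed as long as they share the same rate.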
The OpenTelemetry SDKs support deterministic probability sampling.
Tail Sampling
Tail sampling is where the sampling decision considers all or most of the spans within the trace. Because tail sampling inspects whole traces, you can apply many different sampling techniques, such as:
- Dynamic sampling: By configuring a set of fields on a trace that make up a key, the sampler automatically increases or decreases the sampling rate based on how frequently each unique value of that key occurs. For example, a key made up of http.status_code will keep a much smaller share of requests that return 200 than of requests that return 404, because 200 responses are typically far more frequent.
- Rules-based sampling: Define sampling rates for well-known conditions. For example, you can sample 100% of traces with an error and fall back to dynamic sampling for other traffic.
- Throughput-based sampling: Sample traces based on a fixed upper bound on the number of spans per second.
- Deterministic probability sampling: Although deterministic probability sampling is also used in head sampling, you can use it in tail sampling as well.
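The rules-based and dynamic techniques above can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular tail sampler is implemented: traces are lists of span dictionaries with the root span first, the key is the root span's http.status_code, and the dynamic rate is simply each key's count divided by a hypothetical `target_per_key`. All function names are made up for this example.

```python
import random
from collections import Counter


def trace_key(trace):
    """Build the dynamic sampling key from the root span (assumed to be first)."""
    return trace[0].get("http.status_code")


def dynamic_sample_rates(traces, target_per_key):
    """Give frequent keys a higher sample rate so each key keeps roughly
    `target_per_key` traces (a deliberately simple heuristic)."""
    counts = Counter(trace_key(t) for t in traces)
    return {key: max(1, count // target_per_key) for key, count in counts.items()}


def choose_sample_rate(trace, rates):
    """Rule: keep every trace containing an error; otherwise use the dynamic rate."""
    if any(span.get("error") for span in trace):
        return 1
    return rates.get(trace_key(trace), 1)


def tail_sample(traces, target_per_key, rng):
    """Return (trace, sample_rate) pairs for the traces that were kept."""
    rates = dynamic_sample_rates(traces, target_per_key)
    kept = []
    for trace in traces:
        rate = choose_sample_rate(trace, rates)
        # A sample rate of N keeps about 1 in N traces.
        if rng.random() < 1.0 / rate:
            kept.append((trace, rate))
    return kept
```

Recording the sample rate next to each kept trace is what lets a backend later weight that trace as standing in for `sample_rate` original traces.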