Sampling is when you select a few representative elements from a larger collection and extrapolate from the selected elements to learn about the larger collection. Head sampling and tail sampling are two different approaches to sampling your telemetry data.
Data chosen by a sampler as representative of a data set is sampled data. Sampled data is processed and exported to Honeycomb. Unsampled data is not processed or exported.
Sampling is crucial to observability at scale. You might sample your telemetry data to reduce your total data volume or filter out noise from services with predictable traffic.
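To make the idea of extrapolating from sampled data concrete, here is a minimal Python sketch. It assumes each kept event carries a sample rate of N, meaning "1 in N events like this one was kept". The `SampledEvent` type and the helper names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SampledEvent:
    duration_ms: float
    sample_rate: int  # assumption: this event stands in for `sample_rate` original events


def estimate_total_count(events):
    """Estimate how many events existed before sampling."""
    return sum(e.sample_rate for e in events)


def estimate_mean_duration(events):
    """Weighted mean: each kept event counts `sample_rate` times."""
    total = estimate_total_count(events)
    return sum(e.duration_ms * e.sample_rate for e in events) / total


events = [
    SampledEvent(duration_ms=100.0, sample_rate=10),  # kept 1 in 10
    SampledEvent(duration_ms=120.0, sample_rate=10),  # kept 1 in 10
    SampledEvent(duration_ms=900.0, sample_rate=1),   # kept at 100%
]

print(estimate_total_count(events))    # 21
print(estimate_mean_duration(events))  # weighted toward the heavily sampled events
```

Weighting by sample rate is what lets traffic sampled at different rates be combined in a single estimate.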
Consider these different kinds of traces:
Most of your traces are probably the first kind: traces that finish successfully with no errors. These traces represent healthy behavior of your services and are required for comparisons with other kinds of traces. But you don’t need all of them: a sample of these traces is enough to understand the health of your system.
The other kinds of traces are much more interesting. You might take larger samples, or even 100%, of these traces.
Head sampling is when you sample traces without looking at the entire trace. The decision to sample or not sample a span in a trace is made as early as possible. In OpenTelemetry, a head sampling decision is made during span creation: unsampled spans are not created.
The most common form of head sampling is deterministic probability sampling. Given a constant sampling rate that represents a fixed percentage of traces to sample, the sampler decides whether to sample each span by treating the trace ID as a random number. Because every span in a trace shares the same trace ID, disparate samplers make consistent decisions for all of the spans in that trace.
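The following Python sketch illustrates the idea of deterministic probability sampling. It is not the algorithm of any particular SDK. It assumes trace IDs are randomly generated 128-bit integers and that a sample rate of N means "keep 1 in N traces". The function name `should_sample` is hypothetical.

```python
import secrets

MAX_64 = 1 << 64


def should_sample(trace_id: int, sample_rate: int) -> bool:
    """Keep roughly 1 in `sample_rate` traces, using only the trace ID."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    # Assumption: the low 64 bits of the trace ID are uniformly random,
    # so they can stand in for a random number.
    low_bits = trace_id % MAX_64
    return low_bits < MAX_64 // sample_rate


# Two samplers in different services see the same trace ID,
# so they reach the same decision without coordinating.
trace_id = secrets.randbits(128)
frontend_decision = should_sample(trace_id, 4)
backend_decision = should_sample(trace_id, 4)
print(frontend_decision == backend_decision)  # True
```

Because the decision is a pure function of the trace ID and the rate, no state needs to be shared between samplers.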
See our guidelines on when you should consider head sampling.
The OpenTelemetry SDKs support deterministic probability sampling.
Tail sampling is when the sampling decision considers all or most of the spans within the trace. Because tail sampling is done by inspecting whole traces, you can apply many different sampling techniques. For example, a sampler that adjusts its rate based on http.status_code will sample much less traffic for requests that return 200 than for requests that return 404.
Honeycomb offers Refinery as a tail sampling solution to install in your environment. Tail sampling with Refinery lets you combine such techniques to create a sampling strategy tailored to your needs.
See our guidelines on when you should consider tail sampling.
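As a rough illustration of a tail sampling decision, the Python sketch below groups spans by trace and then keeps or drops each trace as a unit. It is not Refinery's configuration or algorithm. The `Span` type, the rate table, and the thresholds are all hypothetical values chosen for the example. A real tail sampler also has to decide when a trace is complete, which this sketch skips by assuming all spans have already been collected.

```python
import random
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    status_code: int  # value of the http.status_code attribute


# Hypothetical rates: keep 1 in N traces for each status code.
RATE_BY_STATUS = {200: 100, 404: 5}
DEFAULT_RATE = 10
SLOW_TRACE_MS = 1000.0


def choose_sample_rate(spans):
    """Inspect the whole trace and return N, meaning "keep 1 in N"."""
    if any(s.status_code >= 500 for s in spans):
        return 1  # keep every trace that contains a server error
    if max(s.duration_ms for s in spans) >= SLOW_TRACE_MS:
        return 1  # keep every trace that contains a slow span
    worst_status = max(s.status_code for s in spans)
    return RATE_BY_STATUS.get(worst_status, DEFAULT_RATE)


def tail_sample(spans, rng=random.random):
    """Group spans by trace, then keep or drop each trace as a whole."""
    traces = {}
    for span in spans:
        traces.setdefault(span.trace_id, []).append(span)

    kept = []
    for trace_spans in traces.values():
        rate = choose_sample_rate(trace_spans)
        if rng() < 1.0 / rate:
            kept.append((rate, trace_spans))
    return kept


spans = [
    Span("a", "GET /home", 35.0, 200),
    Span("a", "GET /api/user", 12.0, 200),
    Span("b", "GET /missing", 20.0, 404),
    Span("c", "GET /checkout", 80.0, 500),
]
for rate, trace_spans in tail_sample(spans):
    print(trace_spans[0].trace_id, "kept at 1 in", rate)
```

With these example rates, trace "c" is always kept, trace "b" is kept about 1 time in 5, and trace "a" about 1 time in 100.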