When to Sample

Guidelines for why and when to sample your telemetry data.

Tip
The OpenTelemetry Sampling documentation has further guidance on when and why you should sample.

Why Sample

Some of the main reasons to sample data include:

  • Reduce total data volume. A representative sample of your data will be much smaller than the entire volume of data produced.
  • Ensure you keep interesting traces. With a wide variety of traffic, especially irregular traffic, a purely random sample may miss the traces you care about, so deciding what counts as representative can be nuanced.
  • Filter out noise. For services with predictable traffic patterns, a small sample captures their behavior, so storing every trace adds little value.
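To see why a sample can stay representative while being much smaller, consider a common convention: record a sample rate of N on each kept trace, meaning "one in N was kept". Each kept trace then stands in for N original traces. The sketch below assumes a hypothetical `sample_rate` field and a made-up workload of 1,000 traces per second for one minute. It is an illustration of the idea, not how any particular backend stores data.

```python
import random

def sample_one_in_n(traces, n, seed=0):
    """Keep each trace with probability 1/n and record n as its sample rate."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    kept = []
    for trace in traces:
        if rng.random() < 1.0 / n:
            # Each kept trace stands in for n traces, including the n - 1 we dropped.
            # "sample_rate" is a hypothetical field name used for illustration.
            kept.append({**trace, "sample_rate": n})
    return kept

def estimate_total(kept):
    """Reconstruct the original volume by weighting each kept trace by its sample rate."""
    return sum(t["sample_rate"] for t in kept)

# Hypothetical workload: 1,000 traces per second for one minute.
traces = [{"trace_id": i} for i in range(60_000)]
kept = sample_one_in_n(traces, n=20)
print(len(kept))             # about 3,000 traces stored instead of 60,000
print(estimate_total(kept))  # about 60,000 once the sample rate is applied
```

Weighting by the sample rate is what lets counts and rates computed from the sample approximate the full traffic.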

When to Sample

You should consider sampling if:

  • Your services generate 1000 or more traces per second
  • A lot of your trace data represents healthy traffic and is fairly uniform
  • You have conditions you can use to identify data that is relevant to you

If you have a lot of data but it is fairly uniform, or it is not critical that you capture everything, you can use a simple sampling strategy. If many conditions matter to you, or your services have irregular traffic patterns, you will need a more sophisticated sampling strategy.

When to Use Head Sampling

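Head sampling is a blunt instrument. It decides whether to keep or drop a trace when the trace begins, usually at the root span, before any of the trace's work has happened. It is simple to configure and requires no additional infrastructure or operational overhead.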
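Because head sampling decides at the start of a trace, the only input it can rely on is something available immediately, such as the trace ID. The sketch below is a minimal, hypothetical ratio sampler in the spirit of OpenTelemetry's trace-ID-ratio sampler: it compares the low 64 bits of a random 128-bit trace ID against a threshold. The function name and details are illustrative assumptions, not the OpenTelemetry SDK's code; in practice you would configure your SDK's built-in sampler rather than write your own.

```python
import secrets

LOW_64_BITS = (1 << 64) - 1

def head_sample(trace_id: int, ratio: float) -> bool:
    """Keep or drop a trace using nothing but its trace ID (hypothetical helper).

    The decision is deterministic: services that see the same trace ID and
    use the same ratio reach the same answer.
    """
    bound = int(ratio * (1 << 64))  # ratio 0.0 keeps nothing, 1.0 keeps everything
    # Compare the low 64 bits of the (random) 128-bit trace ID against the bound.
    return (trace_id & LOW_64_BITS) < bound

# A new trace gets a random 128-bit ID, as in W3C Trace Context.
trace_id = secrets.randbits(128)
keep = head_sample(trace_id, ratio=0.1)
# At this point the trace has barely started: its errors and latency are unknown.
print(keep)
```

Nothing in this decision can depend on errors, latency, or attributes of later spans, which is exactly the limitation the rest of this section describes.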

But what head sampling offers in simplicity, it loses in flexibility:

  • You cannot sample traces based on errors they contain or their overall latency
  • You cannot sample traces based on attributes on different spans in a trace
  • You cannot dynamically adjust your sampling rate based on traffic to a service

To do any of these, you need tail sampling instead.

When to Use Tail Sampling

Tail sampling with Refinery makes the sampling decision after the spans of a trace have been collected, so you can sample traces in just about any way you can imagine. How you configure tail sampling depends on your needs and the complexity of your system.
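The core idea behind a tail-sampling proxy is to buffer spans by trace ID and make one decision per trace once it is complete. The sketch below is a deliberately simplified, hypothetical illustration of that idea, not Refinery's implementation. It assumes each span is a dict with `trace_id`, `parent_id`, and `error` fields, and it treats the arrival of the root span as "trace complete". A production tail sampler also has to handle late spans and traces whose root never arrives, which this sketch omits.

```python
from collections import defaultdict

class TailSampler:
    """Buffer spans by trace ID and decide once the whole trace has arrived."""

    def __init__(self, decide):
        self.decide = decide              # function: list of spans -> bool
        self.buffers = defaultdict(list)  # trace_id -> spans received so far
        self.forwarded = []               # spans kept and sent downstream

    def add_span(self, span):
        self.buffers[span["trace_id"]].append(span)
        # Simplification: treat the root span (no parent) as the end of the trace.
        if span["parent_id"] is None:
            trace = self.buffers.pop(span["trace_id"])
            if self.decide(trace):
                self.forwarded.extend(trace)

# The decision sees every span, so it can keep any trace that contains an error,
# even when the error happened in a child span.
sampler = TailSampler(decide=lambda trace: any(s["error"] for s in trace))
sampler.add_span({"trace_id": "t1", "parent_id": "root1", "error": True})
sampler.add_span({"trace_id": "t1", "parent_id": None, "error": False})
sampler.add_span({"trace_id": "t2", "parent_id": "root2", "error": False})
sampler.add_span({"trace_id": "t2", "parent_id": None, "error": False})
print(len(sampler.forwarded))  # 2: both spans of t1; t2 had no error and was dropped
```

The buffering is also where the extra cost comes from: every span must be held in memory until its trace can be judged.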

Most people tend to follow some common patterns:

  • Configure several rules to use a high or low sampling rate for well-known conditions, like keeping all traces that contain errors and dropping most health checks
  • Configure a dynamic sampler based on a low-cardinality key like http.status_code to sample traces proportionally across all values of that key
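The first pattern amounts to an ordered list of conditions, each with its own sample rate. The sketch below expresses that in plain Python; it is not Refinery's configuration format. The rule names, the `error` span field, the `/healthz` route, and the default rate of 10 are all hypothetical; `http.route` is used as an example span attribute.

```python
import random

# Hypothetical rules, checked in order; the first match sets the sample rate.
# A rate of N means "keep 1 in N"; a rate of 1 keeps every matching trace.
RULES = [
    ("keep all errors", lambda trace: any(s.get("error") for s in trace), 1),
    ("mostly drop health checks",
     lambda trace: any(s.get("http.route") == "/healthz" for s in trace), 100),
]
DEFAULT_RATE = 10  # applied to traces that match no rule

def choose_rate(trace):
    """Return the name of the first matching rule and its sample rate."""
    for name, matches, rate in RULES:
        if matches(trace):
            return name, rate
    return "default", DEFAULT_RATE

def should_keep(trace, rng=random):
    """Keep the trace with probability 1/rate for whichever rule matched."""
    _, rate = choose_rate(trace)
    return rng.random() < 1.0 / rate

print(choose_rate([{"http.route": "/healthz"}]))                 # ('mostly drop health checks', 100)
print(choose_rate([{"http.route": "/healthz", "error": True}]))  # ('keep all errors', 1)
print(choose_rate([{"http.route": "/checkout"}]))                # ('default', 10)
```

Rule order matters: a failing health check matches the error rule first, so it is kept rather than mostly dropped.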

The rules and key configuration will often have to take into account attributes that are unique to your system.
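For the dynamic-sampler pattern, one way to "sample proportionally across all values of a key" is to give each key value a share of an overall keep budget that grows with the logarithm of its count, so frequent values get high sample rates and rare values are mostly kept. The sketch below is a simplified illustration, loosely in the spirit of the average-sample-rate approach in Honeycomb's dynsampler-go library; it is not Refinery's code. The function name, the 1 + log10 weighting, and the example counts are assumptions.

```python
import math

def dynamic_rates(counts, goal_rate):
    """Compute a per-key sample rate from recent traffic counts (hypothetical helper).

    counts maps each value of the sampling key (for example an
    http.status_code) to how many traces had that value in the last window;
    every count must be at least 1. Keys get a share of the overall keep
    budget that grows with the log of their count, so common values are
    sampled heavily and rare ones are kept.
    """
    budget = sum(counts.values()) / goal_rate    # traces to keep across all keys
    weight = {k: 1 + math.log10(c) for k, c in counts.items()}
    remaining_budget = budget
    remaining_weight = sum(weight.values())
    rates = {}
    # Rarest keys first, so budget they cannot use flows to the common keys.
    for key in sorted(counts, key=counts.get):
        # Always leave room for at least one trace per key.
        share = max(remaining_budget * weight[key] / remaining_weight, 1.0)
        rates[key] = max(1, round(counts[key] / share))
        remaining_budget -= counts[key] / rates[key]
        remaining_weight -= weight[key]
    return rates

# Hypothetical counts of http.status_code values from the last window.
counts = {"200": 10_000, "500": 10}
rates = dynamic_rates(counts, goal_rate=100)
print(rates)  # {'500': 1, '200': 111}
```

With these numbers, every 500 is kept while 200s are sampled at about 1 in 111, and the total kept (about 100 traces) stays close to the overall goal of 1 in 100.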

The flexibility and sophistication of tail sampling come at a price: it takes more effort to configure and requires additional infrastructure and operational overhead to run. For extremely high-volume systems, you may also need to combine head sampling and tail sampling to protect your infrastructure from huge spikes of data.
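When you combine the two, head sampling shrinks what the tail sampler has to buffer, and each stored trace then represents the product of both rates. That combined rate has to be accounted for when reconstructing counts. A small arithmetic sketch with made-up traffic numbers and a hypothetical helper name:

```python
def effective_sample_rate(head_rate, tail_rate):
    """Combined rate when head sampling keeps 1 in head_rate traces and tail
    sampling then keeps 1 in tail_rate of the traces that reach it."""
    return head_rate * tail_rate

# Hypothetical spike: 50,000 traces per second arriving at your services.
incoming = 50_000
head_rate, tail_rate = 5, 20
print(incoming / head_rate)  # 10000.0 traces/s reach the tail sampler
print(incoming / effective_sample_rate(head_rate, tail_rate))  # 500.0 traces/s stored
```

Here the tail sampler only handles a fifth of the spike, and each stored trace stands for 100 original traces.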