Supported Sampling Methods

If using the dataset-only data model, refer to the Honeycomb Classic tab for instructions. Not sure? Learn more about Honeycomb versus Honeycomb Classic.

In rules.toml, you can specify different sampling methods and specify options for each.

You can specify sampling methods in a hierarchical fashion:

  • Methods for all environments
  • Different methods for different environments
  • Options and rules for specific services within environments

It is not possible to specify different types of samplers for different services within the same environment. This would imply sub-trace sampling, which Refinery does not support.

The default configuration uses the DeterministicSampler and a SampleRate of 1, meaning that no traffic will be dropped. These configurations are set through the root-level Sampler and SampleRate fields in the rules configuration. Sampler applies to all environments that do not specify their own Sampler. SampleRate applies to all environments that use an applicable Sampler type and do not specify their own SampleRate. To avoid issues, we recommend that every installation specify the environment-specific Sampler and (if applicable) SampleRate fields.
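
For example, a minimal rules.toml sketch that keeps the root-level defaults but overrides the rate for one environment (the environment name here is hypothetical) might look like this:

# Root-level defaults, applied to any environment without its own settings:
Sampler = "DeterministicSampler"
SampleRate = 1

# A hypothetical environment that overrides the default rate:
[production]
    Sampler = "DeterministicSampler"
    SampleRate = 20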

Sampling methods and rules are configured in rules.toml. See GitHub for an example rules file.

After setting up or modifying sampling rules for your dataset(s), we recommend validating your configuration and doing a Dry Run before dropping your traffic.

Sampling Example 

Here is an example of how we sample events from Honeycomb’s ingest service. Since this is a high-volume service, we have chosen to use the EMA Dynamic Sampler with a goal sample rate of 50, keeping roughly 1 out of every 50 traces.

Here is what our rules.toml file looks like:

# The name of the environment being sampled.
# Sampling decisions are applied to every dataset within this environment.
[prod]
    Sampler = "EMADynamicSampler"
    GoalSampleRate = 50
    FieldList = ["request.method","request.path","response.status_code"]
    UseTraceLength = false
    AddSampleRateKeyToTrace = true
    AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"
    AdjustmentInterval = 60
    MaxKeys = 10000
    Weight = 0.5

It is also possible to define rules that apply only to a single dataset within an environment. However, different datasets within the same environment cannot receive different sampling decisions, because that would imply sub-trace sampling, which Refinery does not support. See Rule-Based Sampling Configuration for an example of dataset-scoped rules.

In the dataset-only (Honeycomb Classic) data model, the same sampler is configured per dataset. Here is the equivalent rules.toml entry:

[IngestService] # the name of the dataset we are sampling
    Sampler = "EMADynamicSampler"
    GoalSampleRate = 50
    FieldList = ["request.method","request.path","response.status_code"]
    UseTraceLength = false
    AddSampleRateKeyToTrace = true
    AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"
    AdjustmentInterval = 60
    MaxKeys = 10000
    Weight = 0.5

The most important fields in this example are GoalSampleRate and FieldList. Our goal sample rate aims to keep 1 out of every 50 traces seen. This rate is used by the EMA Dynamic Sampler, which assigns a sample rate for each trace based on the sampling key generated from the fields in FieldList. A useful FieldList selection will therefore have consistent values for high-frequency, boring traffic and unique values for outliers and interesting traffic. For example, we have included response.status_code in the field list in addition to the HTTP endpoint (represented here by request.method and request.path), because it allows us to clearly see when there is failing traffic to any endpoint.

We have chosen not to UseTraceLength, which adds the number of spans in the trace to the sampling key. For our ingest service, trace length is not a useful indicator of which types of events we would like to see sampled.

The AddSampleRateKeyToTrace configuration fields we have enabled are convenience fields to help us understand why the sampler made specific decisions. Examining these fields in your data in Honeycomb may help you decide which fields to add to your FieldList configuration option going forward.

The AdjustmentInterval field defaults to 15 seconds, and determines how often the moving average used by the sampler is adjusted. We have chosen to increase this value to 60 seconds, as it is not necessary for us to evaluate changes more often.

By setting MaxKeys, we have chosen to limit the number of distinct keys tracked by the EMA Dynamic Sampler. We use this field to keep the sample rate map size from spiraling out of control.

Read more about all the configuration options for the EMA Dynamic Sampler.

Sampling Types 

The options available for sampling methods include DeterministicSampler, DynamicSampler, EMADynamicSampler, RulesBasedSampler, and TotalThroughputSampler.

EMADynamicSampler is recommended for most use cases.

Dynamic Sampling 

This strategy aims for the target sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average. Frequent traffic is sampled at a higher rate, while rarer events are kept or sampled at a lower rate. Use this strategy to keep high-resolution data about unusual events while maintaining a representative sample of your application’s overall behavior.

Briefly described, you configure Refinery to examine the trace for a set of fields: for example, request.status_code and request.method. It collects all the values found in those fields anywhere in the trace - for example, “200” and “GET” - together into a key that it hands to the dynsampler. The dynsampler code looks at the frequency with which that key appeared during the previous 30 seconds (or another window, set by the ClearFrequencySec setting) and uses that to hand back a desired sample rate. More frequent keys are sampled at a higher rate, so that an even distribution of traffic across the keyspace is represented in Honeycomb.

By selecting fields well, you can drop significant amounts of traffic while still retaining good visibility into the areas of traffic that interest you. For example, if you want to make sure you have a complete list of all URL handlers invoked, you would add the URL, or a normalized form of it, as one of the fields to include. Be careful in your selection, though, because if the combination of fields creates a unique key each time, you will not drop any traffic. For this reason, it is not effective to use fields that have unique values, like a UUID, as one of the sampling fields. Each field included should ideally have values that appear many times within any given 30-second window in order to effectively turn into a sample rate.

To see how this differs from random sampling in practice, consider a simple web service with the following characteristics: 90% of traffic is served correctly and returns a 200 response code. The remaining 10% of traffic is divided into a mix of 40x and 50x responses. If we sample events randomly, we can still see these characteristics. We can do analysis of aggregates, such as the average duration of an event, breaking down on fields like status code, endpoint, customer_id, and so on. At a high level, we can still learn a lot about our data from a completely random sample.

But what about those 50x errors? Typically, we would like to look at these errors in high resolution - they might all have different causes, or affect only a subset of customers. Discarding them at the same rate that we discard events describing healthy traffic is unfortunate - the errors are much more interesting! This is where dynamic sampling can help.

Dynamic sampling will adjust the sample rate of traces and events based on their frequency. To achieve the target sample rate, it will increase sampling on common events, while lowering the sample rate for less common events, all the way down to 1 and keeping unique events.

Dynamic Sampler Configuration 

The dynamic sampler configuration has the following fields:

SampleRate
The goal rate at which to sample. It indicates a ratio, where one sample trace is kept for every n traces seen. For example, a SampleRate of 30 will keep 1 out of every 30 traces. This rate is handed to the dynamic sampler, which assigns a sample rate for each trace based on the fields selected from that trace. Eligible for live reload.
ClearFrequencySec
Determines the period over which the sample rate is calculated. This setting defaults to 30. Eligible for live reload.
FieldList
A list of all the field names to use to form the key that will be handed to the dynamic sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field (or something like HTTP status code) is an excellent choice. Using fields with very high cardinality (like k8s.pod.id) is a bad choice. If the combination of fields essentially makes each key unique, the dynamic sampler will keep everything. If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces. For example, a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring, but interesting when it indicates an error) is a good set of fields, since it allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. The configuration for this would look something like FieldList = ["request.method", "http.target", "response.status_code"]. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique and therefore in keeping 100% of traces. Using only the HTTP endpoint field would also be a bad choice, as it is not unique enough, so interesting traces, like traces that experienced a 500, might not be kept. Field names may come from any span in the trace. Eligible for live reload.
UseTraceLength
When set to true, this field adds the number of spans in the trace into the dynamic sampler as part of the key. The number of spans is exact, so if there are normally small variations in trace length, you may want to leave this off. If traces are consistent lengths and changes in trace length are a useful indicator of traces you would like to see in Honeycomb, set this to true. Eligible for live reload.
AddSampleRateKeyToTrace
When set to true, the sampler will add a field to the root span of the trace containing the key used by the sampler to decide the sample rate. This can be helpful in understanding why the sampler is making certain decisions about sample rate, and can help you choose better fields for the FieldList setting described above.
AddSampleRateKeyToTraceField
The name of the field that the sampler will use when adding the sample rate key to the trace. This setting is only used when AddSampleRateKeyToTrace is set to true.
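
Putting these fields together, here is a sketch of a DynamicSampler configuration for an environment (the environment name and field values are illustrative, not prescriptive):

[prod]
    Sampler = "DynamicSampler"
    SampleRate = 30
    ClearFrequencySec = 30
    FieldList = ["request.method", "http.target", "response.status_code"]
    UseTraceLength = false
    AddSampleRateKeyToTrace = true
    AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"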

EMA Dynamic Sampling 

The Exponential Moving Average (EMA) Dynamic Sampler is an improvement upon DynamicSampler and is recommended for most use cases. Based on the DynamicSampler implementation, EMADynamicSampler differs in that rather than computing rates from a periodic sample of traffic, it maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals. The weight applied to more recent intervals is defined by Weight, a number between 0 and 1. Larger values weight the average more toward recent observations. In other words, a larger Weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller Weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time.

Keys that are not found in the Exponential Moving Average will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. In other words, every key will be represented at least once in any given window. More frequent keys will have their sample rate increased proportionally to wind up with the goal sample rate.

EMADynamicSampler Configuration 

The EMADynamicSampler configuration has the following fields:

GoalSampleRate
The goal rate at which to sample. It indicates a ratio, where one sample trace is kept for every n traces seen. For example, a GoalSampleRate of 30 will keep 1 out of every 30 traces. This rate is handed to the dynamic sampler, which assigns a sample rate for each trace based on the fields selected from that trace. Eligible for live reload.
FieldList
A list of all the field names to use to form the key that will be handed to the dynamic sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. A good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field (or something like HTTP status code) is an excellent choice. Using fields with very high cardinality (like k8s.pod.id) is a bad choice. If the combination of fields essentially makes each key unique, the dynamic sampler will keep everything. If the combination of fields is not unique enough, you will not be guaranteed samples of the most interesting traces. For example, a combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring, but interesting when it indicates an error) is a good set of fields, since it allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. The configuration for this would look something like FieldList = ["request.method", "http.target", "response.status_code"]. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique and therefore in keeping 100% of traces. Using only the HTTP endpoint field would also be a bad choice, as it is not unique enough, so interesting traces, like traces that experienced a 500, might not be kept. Field names may come from any span in the trace. Eligible for live reload.
UseTraceLength
When set to true, this field adds the number of spans in the trace into the dynamic sampler as part of the key. The number of spans is exact, so if there are normally small variations in trace length, you may want to leave this field off or set it to false. If traces are consistent lengths and changes in trace length are a useful indicator of traces that you would like to see in Honeycomb, set this to true. Eligible for live reload.
AddSampleRateKeyToTrace
When set to true, the sampler will add a field to the root span of the trace containing the key used by the sampler to decide the sample rate. This can be helpful in understanding why the sampler is making certain decisions about sample rate, and can help you choose better fields for the FieldList setting described above.
AddSampleRateKeyToTraceField
The name of the field the sampler will use when adding the sample rate key to the trace. This setting is only used when AddSampleRateKeyToTrace is set to true.
AdjustmentInterval
Defines how often (in seconds) we adjust the moving average from recent observations. The default is 15s. Eligible for live reload.
Weight
A value between 0 and 1 indicating the weighting factor used to adjust the EMA. With larger values, newer data will influence the average more, and older values will be factored out more quickly. In mathematical literature concerning EMA, this is referred to as the alpha constant. The default is 0.5. Eligible for live reload.
MaxKeys
If set to a number greater than 0, this field limits the number of distinct keys tracked in the EMA. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. You can use this to keep the sample rate map size under control. Eligible for live reload.
AgeOutValue
Indicates the threshold for removing keys from the EMA. The EMA of any key will approach 0 if it is not repeatedly observed, but will never truly reach it, so the AgeOutValue field decides what constitutes “zero”. Keys with averages below this threshold will be removed from the EMA. The default for this value is the same as the default for Weight, since this prevents a key with the smallest integer value (1) from being aged out immediately. This value should generally be less than or equal to (<=) Weight, unless you have very specific reasons to set it higher. Eligible for live reload.
BurstMultiple
If set, this field value is multiplied by the sum of the running average of counts to define the burst detection threshold. If total counts observed for a given interval exceed the threshold, EMA is updated immediately rather than waiting on the AdjustmentInterval. Using a negative value disables this field. With the default of 2, if your traffic suddenly doubles, burst detection will kick in. Eligible for live reload.
BurstDetectionDelay
Indicates the number of intervals to run after Start is called before burst detection kicks in. Defaults to 3. Eligible for live reload.
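
As a point of reference, the fields above control a standard EMA update of the form new_average = Weight × current_count + (1 − Weight) × previous_average (assuming the usual formulation referenced by the alpha constant noted under Weight). A hedged sketch of an EMADynamicSampler block that spells out these tuning fields, using the defaults given in the descriptions above, might look like:

[prod]
    Sampler = "EMADynamicSampler"
    GoalSampleRate = 50
    FieldList = ["request.method", "request.path", "response.status_code"]
    AdjustmentInterval = 15   # default: adjust the moving average every 15 seconds
    Weight = 0.5              # default: the EMA "alpha"
    AgeOutValue = 0.5         # default: same as Weight
    MaxKeys = 10000           # cap the number of distinct keys tracked
    BurstMultiple = 2         # default: a sudden doubling of traffic triggers burst detection
    BurstDetectionDelay = 3   # default: intervals to wait after start before burst detection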

Rule-Based Sampling 

This strategy allows you to define sampling rates explicitly based on the contents of your traces. Using a filter language that is similar to what you see when running queries, you can define conditions on fields across all spans in your trace. For instance, if your root span has a status_code field, and the span wrapping your database call has an error field, you can define a condition that must be met on both fields, even though the two fields are technically separate events. You can supply a sample rate to use when a match is found, or optionally drop all events in that category. Some examples of rules you might want to specify:

  • Drop all traces for your load balancer’s health-check endpoint
  • Keep all traces where the status code was 50x (sample rate of 1)
  • Keep all traces where status code was 200 but database duration was greater than (>) 500ms
  • Keep all traces for a specific customer id while sampling the rest of your traffic at 1 in 100 traces

Rules are evaluated in order, and the first rule that matches is used. For this reason, define more specific rules at the top of the list of rules, and broader rules at the bottom. The conditions making up a rule are combined and must all evaluate to true for the rule to match. If no rules match, a configurable default sampling rate is applied.

Rule-Based Sampling Configuration 

Rules apply to all datasets within an environment. Here is an example that specifies several rules for different services in an environment.

# 'prod' is the name of the environment
[prod]

    Sampler = "RulesBasedSampler"

    # 'prod.rule' is how you specify an environment-wide rule
    # This drops all healthchecks across an environment.
    [[prod.rule]]
        name = "drop healthchecks"
        drop = true
        [[prod.rule.condition]]
            field = "http.route"
            operator = "="
            value = "/health-check"

    # This keeps all slow 500 errors across an environment.
    [[prod.rule]]
        name = "keep slow 500 errors"
        SampleRate = 1
        [[prod.rule.condition]]
            field = "status_code"
            operator = "="
            value = 500
        [[prod.rule.condition]]
            field = "duration_ms"
            operator = ">="
            value = 1000.789


    # This dynamically samples all 200 responses across an environment.
    [[prod.rule]]
        name = "dynamically sample 200 responses"
        [[prod.rule.condition]]
            field = "status_code"
            operator = "="
            value = 200
        [prod.rule.sampler.EMADynamicSampler]
            Sampler = "EMADynamicSampler"
            GoalSampleRate = 15
            FieldList = ["request.method", "request.route"]
            AddSampleRateKeyToTrace = true
            AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"

    [[prod.rule]]
        SampleRate = 10 # default rule, applied when no other rules match

It is possible to define rules scoped only to a single service dataset within an environment. Here is an example:

# 'prod' is the name of the environment
[prod]

    Sampler = "RulesBasedSampler"

    # This rule is scoped to a single service by using two conditions: one to
    # match a specific service, and another to drop traffic from a specific route.
    [[prod.rule]]
        name = "drop healthchecks"
        drop = true
        [[prod.rule.condition]]
            field = "http.route"
            operator = "="
            value = "/health-check"
        [[prod.rule.condition]]
            field = "service.name"
            operator = "="
            value = "/service1"

In the dataset-only (Honeycomb Classic) data model, rules are defined per dataset. Here is an example of a series of rules defined for a specific dataset:

[dataset4] # the name of the dataset we are sampling

    Sampler = "RulesBasedSampler"

    [[dataset4.rule]]
        name = "drop healthchecks"
        drop = true
        [[dataset4.rule.condition]]
            field = "http.route"
            operator = "="
            value = "/health-check"

    [[dataset4.rule]]
        name = "keep slow 500 errors"
        SampleRate = 1
        [[dataset4.rule.condition]]
            field = "status_code"
            operator = "="
            value = 500
        [[dataset4.rule.condition]]
            field = "duration_ms"
            operator = ">="
            value = 1000.789

    [[dataset4.rule]]
        name = "dynamically sample 200 responses"
        [[dataset4.rule.condition]]
            field = "status_code"
            operator = "="
            value = 200
        [dataset4.rule.sampler.EMADynamicSampler]
            Sampler = "EMADynamicSampler"
            GoalSampleRate = 15
            FieldList = ["request.method", "request.route"]
            AddSampleRateKeyToTrace = true
            AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"

    [[dataset4.rule]]
        SampleRate = 10 # default rule, applied when no other rules match

Each rule has an optional name field, a SampleRate or a sampler, and may include one or more conditions. Use SampleRate to apply a static sample rate to traces that qualify for the given rule. Use a secondary sampler to apply a dynamic sample rate to traces that qualify for the given rule.

The sampling rate is determined in the following order:

  1. Use a secondary sampler, if defined
  2. Use the SampleRate field, which must not be less than 1
  3. If drop = true is specified, then the trace will be omitted
  4. A default sample rate of 1

Each condition in a rule consists of the following:

  • the field within your spans that you would like to sample on
  • the value which you are comparing the field to
  • the operator which you are using to compare the field to the value
  • an optional datatype parameter that coerces the field to match a specified type

The datatype parameter is optional and must be one of the following:

  • "int"
  • "float"
  • "string"
  • "bool"

The datatype parameter is helpful to let a rule handle multiple fields that come in as different data types. For example, it can be common that an http.status_code field comes in as either a string or an integer from different systems. Instead of writing the same rule twice, you can write it once and use the datatype parameter to coerce the field to the same type.
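
For example, here is a sketch of a single rule (reusing the rule structure from the examples above, and assuming the datatype parameter is written alongside field, operator, and value in the condition) that coerces the status code to an integer before comparing:

    [[prod.rule]]
        name = "keep 500 errors regardless of how the status code is encoded"
        SampleRate = 1
        [[prod.rule.condition]]
            field = "http.status_code"
            operator = "="
            value = 500
            datatype = "int"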

Condition operators:

  • exists - does the field exist
  • not-exists - does the field not exist
  • != - is the value of the field not equal to the value in the rule
  • = - is the value of the field equivalent to the value in the rule
  • > - is the value of the field greater than the value in the rule
  • >= - is the value of the field greater than or equal to the value in the rule
  • < - is the value of the field less than the value in the rule
  • <= - is the value of the field less than or equal to the value in the rule
  • starts-with - returns true if the field starts with the string defined in the value
  • contains - returns true if the field contains the string defined in the value
  • does-not-contain - returns true if the field does not contain the string defined in the value

Notes about operators:

  • Numeric values are compared as integers if they are both integers, or as floating point if one or both values are floating point
  • Comparing values always fails when comparing different data types. For example, Boolean/Integer, String/Integer, Boolean/Numeric, and so on
    • As an example, some instrumentation might return the value of http.status_code as an Integer, and some might return the value as a String. To account for this, two rules would be required: one rule that compares String values and one rule that compares Integer values

Here are a few examples of how sampling decisions would be made according to the rules in the above configuration example:

  • If a trace had a span with a http.route field that was equal to /health-check, then that trace would be dropped.
  • If a trace had a span with a status_code field that was equal to 500 and another span with a duration_ms field less than 1000.789, then that trace would fall through to the last configured rule, and thus would be sampled at a rate of 1 out of 10.
  • If a trace had a span with a status_code field that was equal to 500 and another span with a duration_ms field greater than 1000.789, then it would match the second rule and would be kept, because that rule has a SampleRate of 1.
  • If a trace had a span with a status_code field of 200, then that trace would match the third rule and be delegated to the secondary EMADynamicSampler sampler to determine the sample rate.
  • If a trace had a span with a status_code field of 400, then that trace would fall through to the last configured rule, and thus would be sampled at a rate of 1 out of 10.

Rules comparisons take the datatype of the fields into account. In particular, a rule that compares status_code to 200 (an integer) will fail if the status code is actually "200" (a string), and vice-versa. If you are working in a mixed environment where either one may be included in the telemetry, you should create a separate rule for each case.
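
For example, here is a sketch of that approach, with one rule per data type (alternatively, the datatype parameter described earlier lets you write a single rule):

    [[prod.rule]]
        name = "keep 500 errors reported as integers"
        SampleRate = 1
        [[prod.rule.condition]]
            field = "status_code"
            operator = "="
            value = 500

    [[prod.rule]]
        name = "keep 500 errors reported as strings"
        SampleRate = 1
        [[prod.rule.condition]]
            field = "status_code"
            operator = "="
            value = "500"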

Using a Secondary Sampler 

A secondary sampler can be specified using the sampler option. You can leverage any DynamicSampler, EMADynamicSampler, or TotalThroughputSampler as a secondary sampler. You need to specify the desired sampler as part of the configuration option, then include configuration options for the desired sampler. All options for the desired secondary sampler will be available.

Using a secondary sampler combines the precision of rules-based sampling, which captures important events such as errors or long requests, with the flexibility of dynamic sampling for higher-volume traffic.

Throughput-Based Sampling 

This strategy attempts to meet a goal throughput rate of a fixed number of spans, not traces, per second per Refinery node. This strategy is most useful if you need to quickly get event volume under control, or if your traces are fairly uniform and a consistent volume of events is preferred. It performs poorly when the active keyspace is very large, so ideally the number of active keys should be less than 10*GoalThroughputPerSec.

Sample rates are still calculated and set on the spans, but they are a function of the number of events seen for a key in a given window, as defined by ClearFrequencySec.

TotalThroughputSampler Configuration 

GoalThroughputPerSec
The goal rate of spans per second for this Refinery instance. This rate is handed to the dynamic sampler, which calculates a per-key sample rate by dividing the counted events for that key by the desired number of events. Defaults to 100 and must be greater than 0. Eligible for live reload.
ClearFrequencySec
How often the rate counters are reset in seconds. Defaults to 30. Eligible for live reload.
FieldList
A list of the field names to use to form the key that will be handed to the dynamic sampler. The cardinality of the combination of values from all of these keys should be reasonable in the face of the frequency of those keys. Using too many fields to form your key can cause the sampler to struggle to meet your goal throughput rate. Eligible for live reload.
UseTraceLength
When set to true, this field adds the number of spans in the trace into the dynamic sampler as part of the key. The number of spans is exact, so if there are normally small variations in trace length, you may want to leave this field off or set it to false. If traces are consistent lengths and changes in trace length are a useful indicator of traces that you would like to see in Honeycomb, set this to true. Eligible for live reload.
AddSampleRateKeyToTrace
When set to true, the sampler will add a field to the root span of the trace containing the key used by the sampler to decide the sample rate. This can be helpful in understanding why the sampler is making certain decisions about sample rate, and can help you choose better fields for the FieldList setting described above.
AddSampleRateKeyToTraceField
The name of the field the sampler will use when adding the sample rate key to the trace. This setting is only used when AddSampleRateKeyToTrace is set to true.

TotalThroughputSampler Example Configuration 

Here is an example TotalThroughputSampler configuration applied to an environment:

[prod]

    Sampler = "TotalThroughputSampler"
    GoalThroughputPerSec = 500
    ClearFrequencySec = 30
    FieldList = ["http.status_code"]
    UseTraceLength = false
    AddSampleRateKeyToTrace = true
    AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"

It is not possible to use this sampler with different configuration values for different datasets within the same environment.

Here is an example TotalThroughputSampler configuration for a given dataset:

[audit-service] # the name of the dataset we are sampling

    Sampler = "TotalThroughputSampler"
    GoalThroughputPerSec = 500
    ClearFrequencySec = 30
    FieldList = ["http.status_code"]
    UseTraceLength = false
    AddSampleRateKeyToTrace = true
    AddSampleRateKeyToTraceField = "meta.refinery.dynsampler_key"

Deterministic Sampler 

The Deterministic Sampler is the simplest sampling method. It applies a static sample rate, choosing traces randomly to keep or drop at the appropriate rate, and is not influenced by the contents of the trace.

Deterministic Sampler Configuration 

For deterministic sampling, the only field to set is SampleRate in rules.toml. SampleRate indicates a ratio, where one sample trace is kept for every n traces seen. For example, a SampleRate of 30 will keep 1 out of every 30 traces. The choice of whether to keep any specific trace is random, so the rate is approximate. Eligible for live reload.
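
For example, here is a minimal sketch for an environment that keeps roughly 1 in 30 traces:

[prod]
    Sampler = "DeterministicSampler"
    SampleRate = 30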

Validate Sampling Rules 

Run Refinery in Dry Run Mode 

When getting started with Refinery or when updating sampling rules, it may be helpful to verify that the rules are working as expected before you start dropping traffic. By enabling dry run mode, all spans in each trace will be marked with the sampling decision in a field called refinery_kept. All traces will be sent to Honeycomb regardless of the sampling decision. You can then run queries in Honeycomb on this field to check your results and verify that the rules are working as intended. Enable dry run mode by adding DryRun = true in your configuration, as noted in rules_complete.toml.
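
A sketch of the relevant line, assuming the root-level placement shown in rules_complete.toml:

# At the root of rules.toml:
DryRun = true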

When dry run mode is enabled, Refinery will set the meta.dryrun.sample_rate attribute on spans. This attribute allows you to inspect what the sample rate will be without sampling your data.

When dry run mode is enabled, the metric trace_send_kept will increment for each trace, and the metric for trace_send_dropped will remain 0, reflecting that we are sending all traces to Honeycomb.

Refinery can send telemetry that includes information that can help debug the sampling decisions that are made. To enable it, in the config file, set AddRuleReasonToTrace to true. Traces sent to Honeycomb will then include the field meta.refinery.reason. This field contains text that indicates the rule that caused the trace to be included.
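
A sketch of that setting (top-level placement in the config file is assumed here):

# In the Refinery config file (not rules.toml):
AddRuleReasonToTrace = true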

Use Usage Mode in the Query Builder 

It may also be helpful to use the “Usage Mode” version of the Query Builder to assess your sampling strategy. Since calculations in this mode do not correct for sample rates, you can check how many actual events match each category for a dynamic sampler.
