Specify Sampling Methods in Honeycomb Refinery

Update the fields in rules.yaml to specify sampling methods for your data. The default configuration at installation contains the minimum configuration needed to run Refinery.

After setting up or modifying sampling rules, we recommend validating your configuration and doing a Dry Run before dropping your traffic.

Complete your Refinery set-up after configuring rules.yaml by customizing your Refinery configuration in config.yaml.

Tip
This information applies to Refinery version 2.0 and later. If seeking information on Refinery version 1.x, refer to our GitHub repo documentation on config and rules. We recommend using our Configuration Conversion Tool to migrate to Refinery 2.0 and later.

Default Configuration 

The default Refinery configuration uses a hardcoded peer list for file-based peer management. It uses the DeterministicSampler Sampling Method and a SampleRate of 1, meaning that no traffic will be dropped. In the Refinery GitHub repository, a minimal default rules file exists.

Check out our Sampling Example, or our Refinery GitHub repository for an example rules file. To see the full set of available options, refer below to the Refinery Rules File.

Quick Start 

Configure sampling methods and rules in rules.yaml:

  1. Include the required __default__ section to handle scenarios not defined by your sampling rules
  2. Define one or more sampling rules, each with a defined Honeycomb Environment and a Sampler

After setting up your sampling rules, we recommend validating your configuration and doing a Dry Run before fully sampling your traffic with Refinery.

Required Default Section 

rules.yaml must contain a __default__ section under the Samplers heading.

For installations that expect most events to be matched to one of the primary rules, choose a default rule containing a DeterministicSampler and a SampleRate of 1, meaning that no unusual traffic will be dropped.

For installations expecting most traffic to be matched by the default rule, consider using an EMADynamicSampler or EMAThroughputSampler as the default, and then write rules to handle special cases.
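As a minimal sketch of the second approach, the fragment below uses an EMADynamicSampler as the default and adds one rule for a special case. The environment name staging and every value shown are illustrative assumptions, not recommendations.

```yaml
RulesVersion: 2
Samplers:
    # Default rule: expected to match most traffic in this sketch.
    __default__:
        EMADynamicSampler:
            GoalSampleRate: 20        # illustrative: aim to keep about 1 in 20 traces
            FieldList:
                - request.method
                - response.status_code
    # Hypothetical low-volume environment where every trace is kept.
    staging:
        DeterministicSampler:
            SampleRate: 1
```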

Sampling Options 

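Available sampling methods through samplers include:

  • Deterministic Sampler (DeterministicSampler)
  • Dynamic Sampler (DynamicSampler)
  • EMA Dynamic Sampler (EMADynamicSampler)
  • EMA Throughput Sampler (EMAThroughputSampler)
  • Windowed Throughput Sampler (WindowedThroughputSampler)
  • Rules-based Sampler (RulesBasedSampler)
  • Total Throughput Sampler (TotalThroughputSampler)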

EMADynamicSampler or EMAThroughputSampler are recommended for most Refinery use cases.

Concept: Dynamic Sampling 

Several of our Refinery sampling options use dynamic sampling, as indicated by their names. Dynamic sampling aims to achieve a target sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average. Frequent traffic is sampled less often, while rarer events are kept or sampled more frequently. Use dynamic sampling to keep high-resolution data about unusual events while maintaining a representative sample of your application’s overall behavior.

To achieve this, configure Refinery to examine the trace for a specific set of fields. For example, if you specify request.status_code and request.method, then Refinery collects all the values found in those fields anywhere in the trace - for example, “200” and “GET” - together into a key that it hands to the dynsampler. The dynsampler code looks at how often that key appeared during the previous time slice and uses that frequency to hand back a desired sample rate. More frequent keys are kept less often, so that an even distribution of traffic across the keyspace is represented in Honeycomb.

By selecting fields well, you can drop significant amounts of traffic while still retaining good visibility into the areas of traffic that interest you. For example, if you want to make sure you have a complete list of all URL handlers invoked, you would add the URL (or a normalized form), as one of the fields to include.

Be careful in your selection, because if the combination of fields creates a unique key each time, you will not drop any traffic. Because of this, it is not effective to use fields that have unique values, like a UUID, as one of the sampling fields. Each field included should ideally have values that appear many times within any given 30 second window in order to generate a useful sample rate.
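To make the field selection concrete, here is a sketch of a dynamic sampler keyed on the two fields mentioned above. The production environment name and the GoalSampleRate value are assumptions for illustration; request.id is a hypothetical unique field, shown only in a comment to mark what to leave out.

```yaml
RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1
    production:
        EMADynamicSampler:
            GoalSampleRate: 10
            FieldList:
                # Low-cardinality fields: values such as "200" and "GET"
                # repeat many times within a 30 second window.
                - request.status_code
                - request.method
                # Avoid fields that are unique per trace (a hypothetical
                # request.id holding a UUID, for example): every key would
                # be new and no traffic would be dropped.
```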

To see how this differs from random sampling in practice, consider a simple web service with the following characteristics: 90% of traffic is served correctly and returns a 200 response code. The remaining 10% of traffic is divided into a mix of 40x and 50x responses. If we sample events randomly, we can see these characteristics. We can do analysis of aggregates such as: what is the average duration of an event, breaking down on fields like status code, endpoint, customer_id, and so on. At a high level, we can still learn a lot about our data from a completely random sample.

But what about those 50x errors? Typically, we would like to look at these errors in high resolution - they might all have different causes, or affect only a subset of customers. Discarding them at the same rate that we discard events describing healthy traffic is unfortunate - the errors are much more interesting! Here is where dynamic sampling can help.

Dynamic sampling adjusts the sample rate of traces and events based on their frequency. To achieve the target sample rate, it raises the sample rate for common events, dropping more of them, while lowering the sample rate for less common events, all the way down to 1 for unique events, which are always kept.

The details of all of the samplers and their configuration values are documented in the Refinery Rules documentation below.

Testing Your Sampling Rules 

Two methods exist for testing your sampling rules: using Refinery’s Dry Run Mode to verify your rules, and using Usage Mode to check the expected versus actual sampling rate.

Run Refinery in Dry Run Mode 

When getting started with Refinery or when updating sampling rules, it may be helpful to verify that the rules are working as expected before you start dropping traffic.

When Dry Run Mode is enabled, all spans in each trace are marked with the sampling decision in a field called refinery_kept. All traces are sent to Honeycomb regardless of the sampling decision. You can then run queries in Honeycomb on this field to check your results and verify that the rules are working as intended. Enable Dry Run Mode by setting DryRun to true in your config.yaml configuration. Refer to Dry Run documentation for more details.

When Dry Run Mode is enabled:

  • Refinery will set the meta.dryrun.sample_rate attribute on spans. This attribute allows you to inspect what the sample rate will be without sampling your data.
  • The trace_send_kept metric increments for each trace, and the trace_send_dropped metric remains at 0, which reflects that all traces are being sent to Honeycomb.

Also, Refinery can send telemetry that includes information that can help debug the sampling decisions that are made. To enable this, set AddRuleReasonToTrace to true in your config.yaml file. Traces sent to Honeycomb will then include the field meta.refinery.reason. This field contains text that indicates the rule that caused the trace to be included.
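Put together, the two settings above might look like the following config.yaml fragment. The section names Debugging and RefineryTelemetry are assumptions based on the Refinery 2.x configuration layout; confirm the exact placement against the configuration reference for your version.

```yaml
# config.yaml (fragment); section placement is an assumption, see note above.
Debugging:
    DryRun: true                  # mark sampling decisions but send every trace
RefineryTelemetry:
    AddRuleReasonToTrace: true    # adds meta.refinery.reason to traces
```

Remember to set DryRun back to false once the rules behave as intended, since no traffic is dropped while it is enabled.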

Use Usage Mode in the Query Builder 

It may also be helpful to use the “Usage Mode” version of the Query Builder to assess your sampling strategy. Since calculations in this mode do not correct for sample rates, you can check how many actual events match each category for a dynamic sampler.

Sampling Example 

Here is an example of how we sample events from Honeycomb’s ingest service. Since this is a high volume service, we use the EMA Dynamic Sampler (EMADynamicSampler) with a target rate of 1/50 traces.

Here is what our rules.yaml file looks like:

RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1

    IngestService:
        EMADynamicSampler:
            GoalSampleRate: 50
            AdjustmentInterval: 60s
            FieldList:
                - request.method
                - request.path
                - response.status_code

where:

  • The required default section (__default__)

    • applies to all data not applicable to the IngestService conditions
    • uses a Deterministic Sampler (DeterministicSampler)
    • keeps all applicable traffic with a SampleRate of 1
  • The IngestService section:

    • uses an EMA Dynamic Sampler (EMADynamicSampler)
    • has a goal sample rate (GoalSampleRate) of 50, which keeps 1 out of every 50 traces seen. This rate is used by the EMA Dynamic Sampler, which assigns a sample rate for each trace based on the sampling key generated by the fields in FieldList.
    • has an AdjustmentInterval of 60s, so the EMA Dynamic Sampler recalculates its internal counters every 60 seconds. While AdjustmentInterval’s default value is 15 seconds, we increased this value to 60 seconds, as it is not necessary to evaluate changes more often.
    • has a FieldList selection of response.status_code in addition to the HTTP endpoint (represented here by request.method and request.path), because it allows us to clearly see when there is failing traffic to any endpoint. A useful FieldList selection has consistent values for high frequency boring traffic and unique values for outliers and interesting traffic.

Read more about the configuration options for the EMA Dynamic Sampler.

Refinery Rules File

The Refinery rules file is a YAML file.

Example 

Below is a simple example of a rules file. For a complete example, visit the Refinery GitHub repository.

RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1
    production:
        DynamicSampler:
            SampleRate: 2
            ClearFrequency: 30s
            FieldList:
                - request.method
                - http.target
                - response.status_code

where:

RulesVersion is a required parameter used to verify the version of the rules file. It must be set to 2.

Samplers maps targets to sampler configurations. Each target is a Honeycomb environment (or a dataset for Honeycomb Classic keys). The value is the sampler to use for that target. A __default__ target is required. The target called __default__ will be used for any target that is not explicitly listed.

The targets are determined by examining the API key used to send the trace. If the API key is a Honeycomb Classic key with a 32-character hexadecimal value, then the specified dataset name is used as the target. If the API key is a key with 20-23 alphanumeric characters, then the key’s environment name is used as the target.

The remainder of this page describes the samplers that can be used within the Samplers section and the fields that control their behavior.

Deterministic Sampler 

The Deterministic Sampler (DeterministicSampler) uses a fixed sample rate to sample traces based on their trace ID. This is the simplest sampling algorithm - it is a static sample rate, choosing traces randomly to either keep or drop (at the appropriate rate). It is not influenced by the contents of the trace other than the trace ID.

SampleRate 

The sample rate to use. It indicates a ratio, where one sample trace is kept for every N traces seen. For example, a SampleRate of 30 will keep 1 out of every 30 traces. The choice on whether to keep any specific trace is random, so the rate is approximate. The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together. A SampleRate of 1 or less will keep all traces. Specifying this value is required.

  • Type: int
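For instance, a rules file that applies the SampleRate of 30 described above to all traffic could be sketched as follows. Using the __default__ target here is an illustrative choice.

```yaml
RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 30    # keep roughly 1 out of every 30 traces
```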

Dynamic Sampler 

The Dynamic Sampler (DynamicSampler) is the basic Dynamic Sampler implementation. Most installations will find the EMA Dynamic Sampler to be a better choice. This sampler collects the values of a number of fields from a trace and uses them to form a key. This key is handed to the standard dynamic sampler algorithm, which generates a sample rate based on the frequency with which that key has appeared during the previous ClearFrequency. See https://github.com/honeycombio/dynsampler-go for more detail on the mechanics of the Dynamic Sampler. This sampler uses the AvgSampleRate algorithm from that package.

SampleRate 

The sample rate to use. It indicates a ratio, where one sample trace is kept for every N traces seen. For example, a SampleRate of 30 will keep 1 out of every 30 traces. The choice on whether to keep any specific trace is random, so the rate is approximate. The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together. A SampleRate of 1 or less will keep all traces. Specifying this value is required.

  • Type: int

ClearFrequency 

The duration after which the Dynamic Sampler should reset its internal counters. It should be specified as a duration string. For example, “30s” or “1m”. Defaults to “30s”.

  • Type: duration

FieldList 

A list of all the field names to use to form the key that will be handed to the Dynamic Sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. When choosing field names for FieldList, a good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field, or something like HTTP status code, is an excellent choice. Using fields with very high cardinality, like k8s.pod.id, is a bad choice. If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything. If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.

As an example of a good set of fields, consider the combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error). This set allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces. Using only the HTTP endpoint field is also a bad choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a 500, might not be sampled.

Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

  • Type: stringarray

MaxKeys 

Limits the number of distinct keys tracked by the sampler. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. Use this field to keep the sample rate map size under control. Defaults to 500; Dynamic Samplers will rarely achieve their sampling goals with more keys than this.

  • Type: int

UseTraceLength 

Indicates whether to include the trace length (number of spans in the trace) as part of the key. The number of spans is exact, so if there are normally small variations in trace length, we recommend setting this field to false (the default). If your traces are consistent in length and changes in trace length are a useful indicator to view in Honeycomb, then set this field to true.

  • Type: bool
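The Dynamic Sampler fields described above can be combined as in the following sketch. The production environment name and the SampleRate value are illustrative assumptions; the other values repeat the documented defaults so that they are visible in one place.

```yaml
RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1
    production:
        DynamicSampler:
            SampleRate: 10           # illustrative target: about 1 in 10 traces
            ClearFrequency: 30s      # documented default
            FieldList:
                - request.method
                - http.target
                - response.status_code
            MaxKeys: 500             # documented default
            UseTraceLength: false    # documented default
```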

EMA Dynamic Sampler 

The Exponential Moving Average (EMA) Dynamic Sampler (EMADynamicSampler) attempts to average a given sample rate, weighting rare traffic and frequent traffic differently so as to end up with the correct average. EMADynamicSampler is an improvement upon the simple DynamicSampler and is recommended for many use cases. Based on the DynamicSampler, EMADynamicSampler differs in that rather than computing the rate based on a periodic sample of traffic, it maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals. The weight applied to more recent intervals is defined by Weight, a number in the range (0, 1). Larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time. Keys that are not already present in the EMA will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. Every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to trend towards the goal sample rate.

GoalSampleRate 

The sample rate to use. It indicates a ratio, where one sample trace is kept for every N traces seen. For example, a GoalSampleRate of 30 will keep 1 out of every 30 traces. The choice on whether to keep any specific trace is random, so the rate is approximate. The sample rate is calculated from the trace ID, so all spans with the same trace ID will be sampled or not sampled together. A GoalSampleRate of 1 or less will keep all traces. Specifying this value is required.

  • Type: int

AdjustmentInterval 

The duration after which the EMA Dynamic Sampler should recalculate its internal counters. It should be specified as a duration string. For example, 30s or 1m. Defaults to 15s.

  • Type: duration

Weight 

The weight to use when calculating the EMA. It should be a number between 0 and 1. Larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time. The default value is 0.5.

  • Type: float

AgeOutValue 

Indicates the threshold for removing keys from the EMA. The EMA of any key will approach 0 if it is not repeatedly observed, but will never truly reach it, so this field determines what constitutes “zero”. Keys with averages below this threshold will be removed from the EMA. Default is the value of Weight, as this prevents a key with the smallest integer value (1) from being aged out immediately. This value should generally be less than or equal to (<=) Weight, unless you have very specific reasons to set it higher.

  • Type: float
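One way to see why the default AgeOutValue equals Weight is to write out the textbook exponential moving average update. This is a sketch that assumes the sampler follows the standard recurrence. The symbols are introduced here for illustration only: w is Weight, c_t is the count observed for a key in interval t, and E_t is that key's average after the interval.

```latex
E_t = w \, c_t + (1 - w) \, E_{t-1}
% A key seen for the first time, exactly once (c_t = 1, E_{t-1} = 0):
E_t = w \cdot 1 + (1 - w) \cdot 0 = w
% A key that then stops appearing (c = 0 in each later interval) decays geometrically:
E_{t+k} = (1 - w)^k \, E_t
```

Under that assumption, a key observed once starts at exactly Weight, so a threshold no higher than Weight does not remove it in the same interval. With Weight at 0.5, the average of a key that is no longer seen halves in each interval until it falls below AgeOutValue.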

BurstMultiple 

If set, then this value is multiplied by the sum of the running average of counts to dynamically define the burst detection threshold. If total counts observed for a given interval exceed this threshold, then EMA is updated immediately, rather than waiting on the AdjustmentInterval. Defaults to 2; a negative value disables. With the default of 2, if your traffic suddenly doubles, then burst detection will kick in.

  • Type: float

BurstDetectionDelay 

Indicates the number of intervals to run before burst detection kicks in. Defaults to 3.

  • Type: int

FieldList 

A list of all the field names to use to form the key that will be handed to the Dynamic Sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. When choosing field names for FieldList, a good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field, or something like HTTP status code, is an excellent choice. Using fields with very high cardinality, like k8s.pod.id, is a bad choice. If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything. If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.

As an example of a good set of fields, consider the combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error). This set allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces. Using only the HTTP endpoint field is also a bad choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a 500, might not be sampled.

Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

  • Type: stringarray

MaxKeys 

Limits the number of distinct keys tracked by the sampler. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. Use this field to keep the sample rate map size under control. Defaults to 500; Dynamic Samplers will rarely achieve their sampling goals with more keys than this.

  • Type: int

UseTraceLength 

Indicates whether to include the trace length (number of spans in the trace) as part of the key. The number of spans is exact, so if there are normally small variations in trace length, we recommend setting this field to false (the default). If your traces are consistent in length and changes in trace length are a useful indicator to view in Honeycomb, then set this field to true.

  • Type: bool
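As a sketch, an EMA Dynamic Sampler with every field above written out could look like this. The production environment name and the GoalSampleRate value are illustrative assumptions; the remaining values repeat the documented defaults and can be omitted in practice.

```yaml
RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1
    production:
        EMADynamicSampler:
            GoalSampleRate: 50         # illustrative
            AdjustmentInterval: 15s    # documented default
            Weight: 0.5                # documented default
            AgeOutValue: 0.5           # defaults to the value of Weight
            BurstMultiple: 2           # documented default
            BurstDetectionDelay: 3     # documented default
            FieldList:
                - request.method
                - request.path
                - response.status_code
            MaxKeys: 500               # documented default
            UseTraceLength: false      # documented default
```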

EMA Throughput Sampler 

The Exponential Moving Average (EMA) Throughput Sampler (EMAThroughputSampler) attempts to achieve a given throughput (number of spans per second), weighting rare traffic and frequent traffic differently so as to end up with the correct rate. The EMAThroughputSampler is an improvement upon the Total Throughput Sampler and is recommended for most throughput-based use cases. Like the EMADynamicSampler, EMAThroughputSampler maintains an Exponential Moving Average of counts seen per key, and adjusts this average at regular intervals. The weight applied to more recent intervals is defined by Weight, a number in the range (0, 1); larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time. New keys that are not already present in the EMA will always have a sample rate of 1. Keys that occur more frequently will be sampled on a logarithmic curve. Every key will be represented at least once in any given window and more frequent keys will have their sample rate increased proportionally to trend towards the goal throughput.

GoalThroughputPerSec 

The desired throughput per second. This is the number of events per second you want to send to Honeycomb. The sampler will adjust sample rates to try to achieve this desired throughput. This value is calculated for the individual instance, not for the cluster; if your cluster has multiple instances, then you will need to divide your total desired throughput by the number of instances to get this value.

  • Type: int

UseClusterSize 

Indicates whether to use the cluster size to calculate the goal throughput. If true, then the goal throughput will be divided by the number of instances in the cluster. If false (the default), then the goal throughput will be the value specified in GoalThroughputPerSec.

  • Type: bool

InitialSampleRate 

InitialSampleRate is the sample rate to use during startup, before the sampler has accumulated enough data to calculate a reasonable throughput. This is mainly useful in situations where unsampled throughput is high enough to cause problems. Default value is 10.

  • Type: int

AdjustmentInterval 

The duration after which the EMA Throughput Sampler should recalculate its internal counters. It should be specified as a duration string. For example, 30s or 1m. Defaults to 15s.

  • Type: duration

Weight 

The weight to use when calculating the EMA. It should be a number between 0 and 1. Larger values weight the average more toward recent observations. In other words, a larger weight will cause sample rates to adapt more quickly to traffic patterns, while a smaller weight will result in sample rates that are less sensitive to bursts or drops in traffic and thus more consistent over time. The default value is 0.5.

  • Type: float

AgeOutValue 

Indicates the threshold for removing keys from the EMA. The EMA of any key will approach 0 if it is not repeatedly observed, but will never truly reach it, so this field determines what constitutes “zero”. Keys with averages below this threshold will be removed from the EMA. Default is the value of Weight, as this prevents a key with the smallest integer value (1) from being aged out immediately. This value should generally be less than or equal to (<=) Weight, unless you have very specific reasons to set it higher.

  • Type: float

BurstMultiple 

If set, then this value is multiplied by the sum of the running average of counts to dynamically define the burst detection threshold. If total counts observed for a given interval exceed this threshold, then EMA is updated immediately, rather than waiting on the AdjustmentInterval. Defaults to 2; a negative value disables. With the default of 2, if your traffic suddenly doubles, then burst detection will kick in.

  • Type: float

BurstDetectionDelay 

Indicates the number of intervals to run before burst detection kicks in. Defaults to 3.

  • Type: int

FieldList 

A list of all the field names to use to form the key that will be handed to the Dynamic Sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. When choosing field names for FieldList, a good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field, or something like HTTP status code, is an excellent choice. Using fields with very high cardinality, like k8s.pod.id, is a bad choice. If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything. If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.

As an example of a good set of fields, consider the combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error). This set allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces. Using only the HTTP endpoint field is also a bad choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a 500, might not be sampled.

Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

  • Type: stringarray

MaxKeys 

Limits the number of distinct keys tracked by the sampler. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. Use this field to keep the sample rate map size under control. Defaults to 500; Dynamic Samplers will rarely achieve their sampling goals with more keys than this.

  • Type: int

UseTraceLength 

Indicates whether to include the trace length (number of spans in the trace) as part of the key. The number of spans is exact, so if there are normally small variations in trace length, we recommend setting this field to false (the default). If your traces are consistent in length and changes in trace length are a useful indicator to view in Honeycomb, then set this field to true.

  • Type: bool
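A throughput-based configuration using the fields above might be sketched as follows. The production environment name and the GoalThroughputPerSec value are illustrative assumptions; the other values repeat the documented defaults.

```yaml
RulesVersion: 2
Samplers:
    __default__:
        DeterministicSampler:
            SampleRate: 1
    production:
        EMAThroughputSampler:
            GoalThroughputPerSec: 100    # illustrative target for each instance
            UseClusterSize: false        # documented default
            InitialSampleRate: 10        # documented default, used during startup
            AdjustmentInterval: 15s      # documented default
            Weight: 0.5                  # documented default
            FieldList:
                - request.method
                - request.path
                - response.status_code
```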

Windowed Throughput Sampler 

Windowed Throughput Sampler (WindowedThroughputSampler) is an enhanced version of total throughput sampling. Just like the Total Throughput Sampler, WindowedThroughputSampler attempts to meet the goal of a fixed number of events per second sent to Honeycomb. The original throughput sampler updates the sampling rate every “ClearFrequency” seconds. While this parameter is configurable, it suffers from the following tradeoff:

  • Decreasing it makes the sampler more responsive to load spikes, but at the cost of making the sampling decision on less data.
  • Increasing it makes the sampler less responsive to load spikes, but sample rates will be more stable because they are computed from more data.

The Windowed Throughput Sampler resolves this by introducing two different, tunable parameters:

  • UpdateFrequency: how often the sampling rate is recomputed
  • LookbackFrequency: how much total time is considered when recomputing the sampling rate

A standard configuration would be to set UpdateFrequency to 1s and LookbackFrequency to 30s. In this configuration, every second, we look back at the last 30 seconds of data in order to compute the new sampling rate. The actual sampling rate computation is nearly identical to the original Throughput Sampler, but this variant has better support for floating point numbers and does a better job with less-common keys.

GoalThroughputPerSec 

The desired throughput per second. This is the number of events per second you want to send to Honeycomb. The sampler will adjust sample rates to try to achieve this desired throughput. This value is calculated for the individual instance, not for the cluster; if your cluster has multiple instances, then you will need to divide your total desired throughput by the number of instances to get this value.

  • Type: int

UseClusterSize 

Indicates whether to use the cluster size to calculate the goal throughput. If true, then the goal throughput will be divided by the number of instances in the cluster. If false (the default), then the goal throughput will be the value specified in GoalThroughputPerSec.

  • Type: bool

UpdateFrequency 

The duration between sampling rate computations. It should be specified as a duration string. For example, 30s or 1m. Defaults to 1s.

  • Type: duration

LookbackFrequency 

This controls how far back in time to look when dynamically adjusting the sampling rate. Default is 30 * UpdateFrequency. This field is forced to be an integer multiple of UpdateFrequency.

  • Type: duration

FieldList 

A list of all the field names to use to form the key that will be handed to the Dynamic Sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another. When choosing field names for FieldList, a good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field, or something like HTTP status code, is an excellent choice. Using fields with very high cardinality, like k8s.pod.id, is a bad choice. If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything. If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.

As an example of a good set of fields, consider the combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error). This set allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. In contrast, a combination of HTTP endpoint, status code, and pod id is a bad set of fields, since it would result in keys that are all unique, and therefore in sampling 100% of traces. Using only the HTTP endpoint field is also a bad choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a 500, might not be sampled.

Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

  • Type: stringarray
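A minimal sketch of the "good set of fields" described above. The field names are hypothetical and depend on what your instrumentation actually emits.

```yaml
# Field names are hypothetical; use the names your instrumentation emits.
FieldList:
  - http.route         # endpoint: high-frequency, mostly boring
  - http.method
  - http.status_code   # usually boring, interesting when it signals an error
# Avoid adding a per-pod or per-request identifier such as k8s.pod.id:
# it would make nearly every key unique.
```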

MaxKeys 

Limits the number of distinct keys tracked by the sampler. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. Use this field to keep the sample rate map size under control. Defaults to 500; Dynamic Samplers will rarely achieve their sampling goals with more keys than this.

  • Type: int

UseTraceLength 

Indicates whether to include the trace length (number of spans in the trace) as part of the key. The number of spans is exact, so if there are normally small variations in trace length, we recommend setting this field to false (the default). If your traces have consistent lengths and changes in trace length are a useful indicator to view in Honeycomb, then set this field to true.

  • Type: bool

Rules-based Sampler 

The Rules-based sampler allows you to specify a set of rules that will determine whether a trace should be sampled or not. Rules are evaluated in order, and the first rule that matches will be used to determine the sample rate. If no rules match, then the SampleRate defaults to 1 and all traces will be kept. Rules-based samplers will usually be configured to have the last rule be a default rule with no conditions that uses a downstream Dynamic Sampler to keep overall sample rate under control.

Rules 

Rules is a list of rules to use to determine the sample rate.

  • Type: objectarray

CheckNestedFields 

Indicates whether to expand nested JSON when evaluating rules. If false (the default), nested JSON will be treated as a string. If true, nested JSON will be expanded into a map[string]interface{} and the value of the field will be the value of the nested field. For example, if you have a field called http.request.headers and you want to check the value of the User-Agent header, then you would set this to true and use http.request.headers.User-Agent as the field name in your rule. This is a computationally expensive option and may cause performance problems if you have a large number of spans with nested JSON.

  • Type: bool
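The User-Agent case above might be configured roughly as follows. This is a sketch: the rule name, the header value, and the sample rate are made up for illustration.

```yaml
# Sketch only: rule name, header value, and rate are illustrative assumptions.
RulesBasedSampler:
  CheckNestedFields: true   # expand nested JSON so the dotted path below can resolve
  Rules:
    - Name: sample synthetic monitor traffic heavily
      SampleRate: 100
      Conditions:
        - Field: http.request.headers.User-Agent
          Operator: starts-with
          Value: "hypothetical-monitor/"
          Datatype: string
```

Because this option is computationally expensive, it is probably worth enabling only when a rule actually needs a nested field.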

Rules for Rules-based Samplers 

Rules are evaluated in order, and the first rule that matches will be used to determine the sample rate. If no rules match, then the SampleRate will be 1 and all traces will be kept. If a rule matches, one of three things happens, and they are evaluated in this order: a) if the rule specifies a downstream Sampler, that sampler is used to determine the sample rate; b) if the rule has the Drop flag set to true, the trace is dropped; c) the rule’s sample rate is used.

Name 

The name of the rule. This field is used for debugging and will appear in the trace metadata if AddRuleReasonToTrace is set to true.

  • Type: string

Sampler 

The sampler to use if the rule matches. If this is set, the sample rate will be determined by this downstream sampler. If this is not set, the sample rate will be determined by the Drop flag or the SampleRate field.

  • Type: object

Drop 

Indicates whether to drop the trace if it matches this rule. If true, then the trace will be dropped. If false, then the trace will be kept.

  • Type: bool

SampleRate 

If the rule is matched, there is no Sampler specified, and the Drop flag is false, then this is the sample rate to use.

  • Type: int

Conditions 

Conditions is a list of conditions to use to determine whether the rule matches. All conditions must be met for the rule to match. If there are no conditions, then the rule will always match. A no-condition rule is typically used for the last rule to provide a default behavior.

  • Type: objectarray

Scope 

Controls the scope of the rule evaluation. If set to trace (the default), then each condition can apply to any span in the trace independently. If set to span, then all of the conditions in the rule will be evaluated against each span in the trace and the rule only succeeds if all of the conditions match on a single span together.

  • Type: string
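To show how these rule fields fit together, here is a sketch of a rule list that exercises each of the three outcomes (downstream Sampler, Drop, and SampleRate). Rule names, field names, and rates are illustrative assumptions, and the EMADynamicSampler settings shown are placeholders.

```yaml
# Sketch only: rule names, field names, and rates are illustrative assumptions.
RulesBasedSampler:
  Rules:
    - Name: drop health checks          # Drop is true, so matching traces are dropped
      Drop: true
      Conditions:
        - Field: http.route
          Operator: "="
          Value: /healthz
          Datatype: string
    - Name: keep slow errors            # no Sampler, Drop unset: SampleRate is used
      Scope: span                       # both conditions must hold on the same span
      SampleRate: 1
      Conditions:
        - Field: http.status_code
          Operator: ">="
          Value: 500
          Datatype: int
        - Field: duration_ms
          Operator: ">"
          Value: 1000
          Datatype: float
    - Name: default                     # no Conditions: always matches
      Sampler:                          # downstream sampler decides the rate
        EMADynamicSampler:
          GoalSampleRate: 10
          FieldList:
            - http.method
            - http.route
```

Operators are quoted here because characters such as `>` and `!` have special meaning at the start of a YAML value.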

Conditions for the Rules in Rules-based Samplers 

Conditions are evaluated in order, and the first condition that does not match will cause the rule to not match. If all conditions match, then the rule will match. If there are no conditions, then the rule will always match.

Field 

The field to check. This can name any field in the trace. If the field is not present, then the condition will not match. The comparison is case-sensitive.

The field can also include a prefix that changes the span used for evaluation of the field. The only prefix currently supported is root, as in root.http.status. Specifying root. causes the condition to be evaluated against the root span. For example, if the Field is root.url, then the condition will be processed using the url field from the root span. The setting Scope: span for a rule does not change the meaning of this prefix: the condition is still evaluated on the root span and is treated as if it were part of the span being processed.

When using the root. prefix on a field with a not-exists operator, include the has-root-span: true condition in the rule. The not-exists condition on a root.-prefixed field will evaluate to false if the existence of the root span is not checked and the root span does not exist. The primary reason a root span is not present on a trace when a sampling decision is being made is that the root span takes longer to complete than the configured TraceTimeout.

  • Type: string
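The following sketch pairs a not-exists check on a root.-prefixed field with a root span check. The field and rule names are hypothetical, and the shape of the has-root-span condition (an Operator with Value: true and no Field) is inferred from the "has-root-span: true" wording above rather than confirmed here.

```yaml
# Sketch only: field and rule names are hypothetical.
- Name: root span has no user id
  SampleRate: 5
  Conditions:
    - Operator: has-root-span   # confirm the root span arrived first
      Value: true
    - Field: root.user.id
      Operator: not-exists
```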

Fields 

An array of field names to check. These can name any field in the trace. The fields are checked in the order defined here, and the first named field that contains a value will be used for the condition. Only the first populated field will be used, even if the condition fails. If a root. prefix is present on a field, but the root span is not on the trace, that field will be skipped. If none of the fields are present, then the condition will not match. The comparison is case-sensitive. All fields are checked as individual fields before any of them are checked as nested fields (see CheckNestedFields).

  • Type: stringarray
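One plausible use of Fields is to cover two naming conventions for the same attribute. The sketch below assumes hypothetical field names; only the first one that holds a value is compared.

```yaml
# Sketch only: field names and rule name are hypothetical.
- Name: keep server errors
  SampleRate: 1
  Conditions:
    - Fields:
        - http.response.status_code   # checked first
        - http.status_code            # used only if the first is absent
      Operator: ">="
      Value: 500
      Datatype: int
```

If http.response.status_code is populated with 200, the condition fails even when http.status_code holds 500, because only the first populated field is used.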

Operator 

The comparison operator to use. String comparisons are case-sensitive. WARNING: Rules can have Scope: trace or Scope: span. Using a negative operator (!=, does-not-contain, not-exists, and not-in) with Scope: trace will cause the condition to be true if any single span in the entire trace matches. In most cases, use negative operators in a rule with Scope: span.

  • Type: string
  • Options: =, !=, >, <, >=, <=, starts-with, contains, does-not-contain, exists, not-exists, has-root-span, matches, in, not-in
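A sketch of a negative operator used with Scope: span, as the warning above recommends. The field names, rule name, and rate are illustrative assumptions.

```yaml
# Sketch only: names and rate are illustrative assumptions.
- Name: errors on non-GET requests
  Scope: span            # both conditions must match on the same span
  SampleRate: 1
  Conditions:
    - Field: http.method
      Operator: "!="     # quoted: a leading ! would otherwise start a YAML tag
      Value: GET
      Datatype: string
    - Field: error
      Operator: exists
```

With Scope: trace instead, the != condition could be satisfied by any span whose http.method differs from GET, even if the error field sits on an unrelated span.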

Value 

The value to compare against. If Datatype is not specified, then the value and the field will be compared based on the type of the field. The in and not-in operators can accept a list of values, which should all be of the same datatype.

  • Type: sliceorscalar

Datatype 

The datatype to use when comparing the value and the field. If Datatype is specified, then both values will be converted (best-effort) to that type and then compared. Errors in conversion will result in the comparison evaluating to false. This is especially useful when a field like HTTP status code may be rendered as a string by some environments and as a number or boolean by others. The best practice is to always specify Datatype; this avoids ambiguity, allows for more accurate comparisons, and offers a minor performance improvement.

  • Type: string
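As an illustration of Value and Datatype working together, the sketch below compares a status code against a list. The field name is hypothetical, and the Datatype value int is assumed to be one of the accepted type names, which this section does not list.

```yaml
# Sketch only: field name is hypothetical; "int" is an assumed Datatype name.
Conditions:
  - Field: http.status_code
    Operator: in
    Value: [500, 502, 503]   # all list entries share one datatype
    Datatype: int            # a status sent as the string "503" is converted before comparing
```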

Total Throughput Sampler 

Total Throughput Sampler (TotalThroughputSampler) attempts to meet a goal of a fixed number of events per second sent to Honeycomb. This sampler is deprecated and present mainly for compatibility. Consider using either EMAThroughputSampler or WindowedThroughputSampler instead.

If your key space is sharded across different servers, then this is a good method for making sure each server sends roughly the same volume of content to Honeycomb. It performs poorly when the active keyspace is very large. GoalThroughputPerSec * ClearFrequency (in seconds) defines the upper limit of the number of keys that can be reported and stay under the goal, but with that many keys, you’ll only get one event per key per ClearFrequency, which is very coarse. Aim for between 1 event per key per second and 1 event per key per 10 seconds to get reasonable data. In other words, the number of active keys should be less than 10 * GoalThroughputPerSec.
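To make the key-count arithmetic concrete, here is a sketch with placeholder values chosen only to keep the numbers easy. The field names are hypothetical.

```yaml
# Sketch only: values are placeholders chosen to make the arithmetic easy.
TotalThroughputSampler:
  GoalThroughputPerSec: 100
  ClearFrequency: 30s
  FieldList:
    - http.route
    - http.status_code
# Upper limit of keys:  100 events/s * 30 s = 3000 keys (one event per key per 30 s).
# Recommended ceiling:  10 * 100 = 1000 active keys (one event per key per 10 s).
```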

GoalThroughputPerSec 

The desired throughput per second of events sent to Honeycomb. This is the number of events per second you want to send. This is not the same as the Sample Rate.

  • Type: int

UseClusterSize 

Indicates whether to use the cluster size to calculate the goal throughput. If true, then the goal throughput will be divided by the number of instances in the cluster. If false (the default), then the goal throughput will be the value specified in GoalThroughputPerSec.

  • Type: bool
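A small sketch of the effect of UseClusterSize, assuming a hypothetical cluster of four Refinery instances:

```yaml
# Sketch only: the four-instance cluster is a hypothetical example.
TotalThroughputSampler:
  GoalThroughputPerSec: 100
  UseClusterSize: true   # with 4 instances, each instance aims for 100 / 4 = 25 events/s
```

With UseClusterSize set to false, each of the four instances would aim for 100 events per second on its own.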

ClearFrequency 

The duration after which the Dynamic Sampler should reset its internal counters. It should be specified as a duration string. For example, “30s” or “1m”. Defaults to “30s”.

  • Type: duration

FieldList 

A list of all the field names to use to form the key that will be handed to the Dynamic Sampler. The combination of values from all of these fields should reflect how interesting the trace is compared to another.

When choosing field names for FieldList, a good field selection has consistent values for high-frequency, boring traffic, and unique values for outliers and interesting traffic. Including an error field, or something like HTTP status code, is an excellent choice. Using fields with very high cardinality, like k8s.pod.id, is a bad choice. If the combination of fields essentially makes each trace unique, then the Dynamic Sampler will sample everything. If the combination of fields is not unique enough, then you will not be guaranteed samples of the most interesting traces.

For example, a good set of fields is the combination of HTTP endpoint (high-frequency and boring), HTTP method, and status code (normally boring but can become interesting when indicating an error), since it allows proper sampling of all endpoints under normal traffic and calls out failing traffic to any endpoint. In contrast, a bad set of fields is the combination of HTTP endpoint, status code, and pod id, since it results in keys that are all unique, and therefore in sampling 100% of traces. Using only the HTTP endpoint field is also a bad choice, as it is not unique enough, and therefore interesting traces, like traces that experienced a 500, might not be sampled.

Field names may come from any span in the trace; if they occur on multiple spans, then all unique values will be included in the key.

  • Type: stringarray

MaxKeys 

Limits the number of distinct keys tracked by the sampler. Once MaxKeys is reached, new keys will not be included in the sample rate map, but existing keys will continue to be counted. Use this field to keep the sample rate map size under control. Defaults to 500; Dynamic Samplers will rarely achieve their sampling goals with more keys than this.

  • Type: int

UseTraceLength 

Indicates whether to include the trace length (number of spans in the trace) as part of the key. The number of spans is exact, so if there are normally small variations in trace length, we recommend setting this field to false (the default). If your traces have consistent lengths and changes in trace length are a useful indicator to view in Honeycomb, then set this field to true.

  • Type: bool