Update the fields in config.toml to customize your configuration.
The default configuration at installation contains the minimum configuration needed to run Refinery.
Supported sampling methods and their configuration are set in rules.toml.
When running Refinery within Docker, mount the directory that contains the configuration and rules files, not the individual files. The configuration component, Viper, monitors the directory containing the files, not the files themselves.
The default Refinery configuration uses a hardcoded peer list for file-based peer management.
It also uses the DeterministicSampler sampling method with a SampleRate of 1, meaning that no traffic is dropped.
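The default sampling behavior described above corresponds to a rules.toml along the lines of the following minimal sketch. It assumes the top-level Sampler and SampleRate keys; compare it against the example rules file on GitHub for your Refinery version.

```toml
# rules.toml (sketch): default sampling behavior.
# Assumption: Sampler and SampleRate are top-level keys in the rules file.

# DeterministicSampler keeps 1 out of every SampleRate traces,
# choosing them by hashing the trace ID.
Sampler = "DeterministicSampler"

# A SampleRate of 1 keeps every trace, so no traffic is dropped.
SampleRate = 1
```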
To see the full set of default fields, see the full example configuration file on GitHub.
Use the fields below to customize your configuration file.
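ListenAddr
The IP address and port on which Refinery listens for incoming events.
Incoming traffic is expected to be HTTP, so if you use SSL, put something like nginx in front to do the decryption.
Should be in the form 0.0.0.0:8080.
HTTP endpoints support both Honeycomb JSON and OpenTelemetry OTLP binary formatted data.
GRPCListenAddr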
If you use SSL, put something like nginx in front to do the decryption.
Should be in the form 0.0.0.0:9090.
Refinery can be configured to receive OpenTelemetry OTLP traffic over gRPC with GRPCListenAddr.
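As a minimal sketch, the listener settings could look like this in config.toml. The addresses are the example values used on this page, and including PeerListenAddr (described below) alongside the other two is an illustrative choice; adjust hosts and ports for your deployment.

```toml
# config.toml (sketch): listener addresses, using the example values from this page.

# HTTP endpoint for Honeycomb JSON and OTLP binary data.
ListenAddr = "0.0.0.0:8080"

# gRPC endpoint for OTLP traffic.
# If REFINERY_GRPC_LISTEN_ADDRESS is set, it overrides this value.
GRPCListenAddr = "0.0.0.0:9090"

# Traffic forwarded between Refinery peers; must differ from ListenAddr.
PeerListenAddr = "0.0.0.0:8081"
```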
If the environment variable REFINERY_GRPC_LISTEN_ADDRESS is set, it takes precedence and this GRPCListenAddr value is ignored.
PeerListenAddr
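The IP address and port on which Refinery listens for traffic forwarded from its peers.
If you use SSL, put something like nginx in front to do the decryption.
Must be different from the ListenAddr setting.
Should be in the form 0.0.0.0:8081.
CompressPeerCommunication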
APIKeys
HoneycombAPI
SendDelay
BatchTimeout
Defaults to the libhoney default batch timeout (100ms).
Eligible for live reload.
TraceTimeout
MaxBatchSize
Default is 500.
SendTicker
LoggingLevel
UpstreamBufferSize and PeerBufferSize
DebugServiceAddr
The debug service runs only if the -d command-line flag is specified.
By default, the debug service runs on the first open port between localhost:6060 and localhost:6069.
AddHostMetadataToTrace
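If enabled, Refinery adds information about the host it runs on to each span, in fields prefixed with meta.Refinery.
(For example, meta.Refinery.local_hostname.)
EnvironmentCacheTTL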
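How long Refinery caches the environment name associated with an API key.
On a cache miss, Refinery looks up the environment name using the HoneycombAPI config value.
Default is 1 hour (1h).
Not eligible for live reload.
The EnvironmentCacheTTL configuration option is not valid for Honeycomb Classic.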
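A few of the top-level settings above, written out as a config.toml sketch. The values are the defaults stated on this page (plus the first port of the debug service range); writing durations as strings with units, such as "100ms" and "1h", is an assumption based on the 1h notation used for EnvironmentCacheTTL, so check the example configuration file for the exact format your version expects.

```toml
# config.toml (sketch): top-level batching, caching, and debug settings.
# Assumption: durations are written as strings with units.

BatchTimeout = "100ms"        # documented default (libhoney)
MaxBatchSize = 500            # documented default
EnvironmentCacheTTL = "1h"    # documented default; not eligible for live reload

# Only used when Refinery is started with the -d flag.
DebugServiceAddr = "localhost:6060"
```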
AddRuleReasonToTrace
If enabled, traces sent to Honeycomb include the field meta.Refinery.reason.
This field contains text indicating which rule was evaluated that caused the trace to be included.
Spans arriving after the trace’s sampling decision has already been made have their meta.Refinery.reason set to late before being sent to Honeycomb.
Default is false.
Eligible for live reload.
AdditionalErrorFields
Additional span fields to include in Refinery's error logs.
The fields dataset, apihost, and environment are always included.
Fields not present in the span do not appear in the error log.
Default is ["trace.span_id"].
Eligible for live reload.
AddSpanCountToRoot
If enabled, adds a metadata field, meta.span_count, to root spans, which indicates the number of child spans on the trace at the time the sampling decision was made.
This value is available to the rules-based sampler, making it possible to write rules that depend on the number of spans in the trace.
Default is false.
Eligible for live reload.
CacheOverrunStrategy
Setting CacheOverrunStrategy to resize means that when a cache overrun occurs, the cache shrinks and never grows again, which is generally not helpful unless the overrun is caused by a permanent change in traffic patterns.
Setting CacheOverrunStrategy to impact means that the items having the most impact on the cache size are ejected from the cache earlier than normal, but the cache is not resized.
In both cases, CacheOverrunStrategy applies only if MaxAlloc is nonzero.
Default is resize for backwards compatibility, but impact is recommended for most installations.
Eligible for live reload.
The names that Refinery uses for trace ID and parent span ID are configurable. This can be helpful if you are using a tracing system with a non-standard naming scheme for these fields.
By default, Refinery recognizes the following field names in incoming data:
trace.trace_id
trace.parent_id
traceId
parentId
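You can recognize additional field names by adding them to the TraceIdFieldNames and ParentIdFieldNames lists in the configuration file. A list set in the configuration file replaces the defaults, so keep the default names in the list if your data still uses them:
```toml
# Custom field names for trace ID (default names kept in the list)
TraceIdFieldNames = [
  "trace.trace_id",
  "traceId",
  "trace.my_trace_id",
  "trace_id"
]
# Custom field names for parent span ID (default names kept in the list)
ParentIdFieldNames = [
  "trace.parent_id",
  "parentId",
  "trace.my_parent_id",
  "parent_id"
]
```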
Sample Cache Configuration controls the sample cache used to retain information about trace status after the sampling decision has been made.
legacy:
“legacy” is a strategy where both keep and drop decisions are stored in a circular buffer that is 5x the size of the trace cache.
This is Refinery’s original sample cache strategy.
It is the default.
Not eligible for live reload (you cannot change the type of cache with reload).
cuckoo:
“cuckoo” is a strategy where dropped traces are preserved in a “Cuckoo Filter”, which can remember a much larger number of dropped traces, leaving capacity to retain a much larger number of kept traces.
It is also more configurable (see below).
The cuckoo filter is recommended for most installations.
Not eligible for live reload as you cannot change the type of cache with reload.
KeptSize:
Controls the number of traces preserved in the kept traces cache.
Refinery keeps a record of each trace that was kept and sent to Honeycomb, along with some statistical information.
This is most useful in cases where the trace was sent before sending the root span, so that the root span can be decorated with accurate metadata.
Default is 10_000 traces (each trace in this cache consumes roughly 200 bytes).
Does not apply to the “legacy” type of cache.
Eligible for live reload.
DroppedSize:
Controls the size of the cuckoo dropped traces cache.
This cache consumes 4-6 bytes per trace at a scale of millions of traces.
Changing its size with live reload sets a future limit, but does not have an immediate effect.
Default is 1_000_000 traces.
Does not apply to the “legacy” type of cache.
Eligible for live reload.
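A minimal sketch of a cuckoo sample cache using the default sizes above. The [SampleCacheConfig] table name and the Type key are assumptions about how the example configuration file names this section; verify them against your version. The comments work through the memory estimates implied by the per-trace sizes stated above.

```toml
# config.toml (sketch). Assumption: the section is named [SampleCacheConfig]
# and the cache type is selected with a Type key.
[SampleCacheConfig]
# "cuckoo" is recommended; "legacy" is the default. Changing Type requires a restart.
Type = "cuckoo"

# Kept-trace records at roughly 200 bytes each: 10_000 * 200 B is about 2 MB.
KeptSize = 10_000

# Dropped-trace entries at 4-6 bytes each: 1_000_000 entries is about 4-6 MB.
DroppedSize = 1_000_000
```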
Controls the parameters of the stress relief system. There is a metric called stress_level that is emitted as part of Refinery metrics. It is a measure of Refinery’s throughput rate relative to its processing rate, combined with the amount of room in its internal queues, and ranges from 0 to 100. It is generally expected to be low except under heavy load. When stress levels reach 100, there is an increased chance that Refinery will become unstable.
To avoid this problem, the Stress Relief system can do deterministic sampling on new trace traffic based solely on TraceID, without having to store traces in the cache or take the time to process sampling rules. Existing traces in flight are processed normally, but while Stress Relief is active, trace decisions are made deterministically on a per-span basis; all spans are sampled according to the StressSamplingRate specified below.
Once Stress Relief activates (by exceeding the ActivationLevel), it does not deactivate until stress_level falls below the DeactivationLevel.
When it deactivates, normal trace decisions are made – and any additional spans that arrive for traces that were active during Stress Relief will respect those decisions.
The measurement of stress is a lagging indicator and is highly dependent on Refinery configuration and scaling. Other configuration values should be well tuned first, before adjusting the Stress Relief Activation parameters.
Mode:
A string indicating how to use Stress Relief.
"never" means that Stress Relief never activates.
"monitor" is the recommended setting, and means that Stress Relief monitors the status of Refinery and activates according to the levels set below.
"always" means that Stress Relief is always on, which may be useful in an emergency situation.
Default is "never".
Eligible for live reload.
ActivationLevel:
The stress_level (from 0 to 100) at which Stress Relief is triggered.
Default value is 75.
Eligible for live reload.
DeactivationLevel:
The stress_level (from 0 to 100) at which Stress Relief is turned off (subject to MinimumActivationDuration).
Under normal circumstances, it should be well below ActivationLevel to avoid oscillations.
Default value is 25.
Eligible for live reload.
StressSamplingRate:
The sampling rate to use when Stress Relief is activated.
All new traces will be deterministically sampled at this rate based only on the traceID.
Default value is 100.
Eligible for live reload.
MinimumActivationDuration:
The minimum time that stress relief will stay enabled, once activated.
This prevents oscillations.
Default value is 10s.
Eligible for live reload.
MinimumStartupDuration:
Used when switching into Monitor mode.
When stress monitoring is enabled, it will start up in stressed mode for at least this amount of time to try to make sure that Refinery can handle the load before it begins processing it in earnest.
This is to help address the problem of trying to bring a new node into an already-overloaded cluster.
If this duration is 0, Refinery will not start in stressed mode.
This can provide faster startup at the possible cost of startup instability.
Default value is “3s”.
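The Stress Relief settings above could be written as follows. This sketch assumes a [StressRelief] table in config.toml and switches Mode to the recommended "monitor"; the other values are the documented defaults.

```toml
# config.toml (sketch). Assumption: the section is named [StressRelief].
[StressRelief]
Mode = "monitor"                   # recommended; the default is "never"
ActivationLevel = 75               # activate once stress_level exceeds 75
DeactivationLevel = 25             # stay active until stress_level falls below 25
StressSamplingRate = 100           # keep about 1 in 100 new traces while active
MinimumActivationDuration = "10s"  # once active, stay on for at least 10s
MinimumStartupDuration = "3s"      # start in stressed mode for 3s in monitor mode
```

The wide gap between ActivationLevel and DeactivationLevel keeps Stress Relief from switching on and off repeatedly while stress_level hovers near a single threshold; MinimumActivationDuration adds a time-based guard on top of that.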
For proper data distribution, each Refinery process needs to know how to identify and communicate with its peers, the other Refinery processes participating in the cluster. The list of peer identifiers can be obtained dynamically through Redis (Redis-based peer management, recommended) or set explicitly in a hard-coded list in the config file (file-based peer management).
All of the peer management options are set within the [PeerManagement] section of the Refinery config file.
Strategy
Controls how traces are assigned to Refinery nodes. The recommended value is "hash".
With the "hash" strategy, only 1/N traces (where N is the number of nodes) get redistributed when the number of nodes changes.
The "legacy" strategy, which is the default, uses a simple algorithm that causes 1/2 of the in-flight traces to be assigned to a different node whenever the number of nodes changes.
The legacy strategy is deprecated and is intended to be removed in a future release.
Not eligible for live reload.
Configuring Refinery for peer management with Redis requires more configuration information than the default file-based peer management, but it is recommended: as a Refinery cluster scales up with new instances, existing instances learn of their new peers without further intervention.
Refinery needs to know the Redis hostname and port, which can be specified in one of two ways:
the REFINERY_REDIS_HOST environment variable, or
the RedisHost field in the config file.
Similarly, a password for Redis can be specified with:
the REFINERY_REDIS_PASSWORD environment variable, or
the RedisPassword field in the config file.
To customize Redis-based peer management for your environment, set the following fields under the [PeerManagement] section of config.toml:
Type
Set to redis to use Redis for managing the peer registry.
RedisHost
If the environment variable REFINERY_REDIS_HOST is set, it takes precedence and this RedisHost value is ignored.
Not eligible for live reload.
The Redis host should be a hostname and a port.
For example: redis.mydomain.com:6379.
The example config file has localhost:6379, which will not work with more than one host.
RedisUsername
If the environment variable REFINERY_REDIS_USERNAME is set, it takes precedence and this RedisUsername value is ignored.
Not eligible for live reload.
RedisPassword
If the environment variable REFINERY_REDIS_PASSWORD is set, it takes precedence and this RedisPassword value is ignored.
Not eligible for live reload.
UseTLS
IdentifierInterfaceName
The name of a network interface whose address Refinery registers in Redis as its peering identifier.
Refinery uses the first available unicast address on this interface.
The address is IPv4 by default, or IPv6 if UseIPV6Identifier is set to true.
UseIPV6Identifier
Set to true if the peering network is IPv6 and IdentifierInterfaceName is set.
Refinery then uses the first IPv6 unicast address found instead of an IPv4 address.
RedisIdentifier
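An explicit peering identifier (for example, an IP address) to register in Redis. Overrides IdentifierInterfaceName if both are set.
Timeout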
File-based peer management is the default behavior. This peer management option is not recommended if you expect to add Refinery instances, because every change to the cluster requires updating the peer list in the configuration file of every instance.
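For example, a three-node cluster using file-based peer management might use a [PeerManagement] section like the following sketch. The hostnames are hypothetical, the sketch assumes each node's PeerListenAddr uses port 8081, and the same list has to be deployed to every node.

```toml
# config.toml (sketch): file-based peer management for three nodes.
[PeerManagement]
Type = "file"
Strategy = "hash"   # recommended over the deprecated "legacy" default

# Every node in the cluster, including this one, as scheme://host:port.
# Hypothetical hostnames; port 8081 assumes each node's PeerListenAddr.
Peers = [
  "http://refinery-1.example.com:8081",
  "http://refinery-2.example.com:8081",
  "http://refinery-3.example.com:8081",
]
```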
To use file-based peer management, configure the following fields in the [PeerManagement] section of config.toml:
Type
Set to file to use the Refinery configuration file to list Refinery peers.
Peers
The list of peers, each specified with a scheme, hostname (or IP address), and port.
All servers in the cluster should be in this list, including this host.
Refinery supports the following environment variables. Environment variables take precedence over file configuration.
Environment Variable | Configuration Field |
---|---|
REFINERY_GRPC_LISTEN_ADDRESS | GRPCListenAddr |
REFINERY_REDIS_HOST | PeerManagement.RedisHost |
REFINERY_REDIS_USERNAME | PeerManagement.RedisUsername |
REFINERY_REDIS_PASSWORD | PeerManagement.RedisPassword |
REFINERY_HONEYCOMB_API_KEY | HoneycombLogger.LoggerAPIKey |
REFINERY_HONEYCOMB_METRICS_API_KEY, REFINERY_HONEYCOMB_API_KEY | HoneycombMetrics.MetricsAPIKey |
REFINERY_QUERY_AUTH_TOKEN | QueryAuthToken |
REFINERY_HONEYCOMB_METRICS_API_KEY takes precedence over REFINERY_HONEYCOMB_API_KEY for the HoneycombMetrics.MetricsAPIKey configuration.
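Combining the Redis fields with the environment variables above, a Redis-based [PeerManagement] section might look like the following sketch. The host is the example value from this page; keeping the password out of the file and supplying it through REFINERY_REDIS_PASSWORD is one way to use the precedence rules, not a requirement. The interface name eth0 is hypothetical.

```toml
# config.toml (sketch): Redis-based peer management.
[PeerManagement]
Type = "redis"
Strategy = "hash"

# Hostname and port of the shared Redis instance.
# If REFINERY_REDIS_HOST is set, it overrides this value.
RedisHost = "redis.mydomain.com:6379"

# Password left out of the file; supply it with REFINERY_REDIS_PASSWORD,
# which takes precedence over RedisPassword.
# RedisPassword = ""

# Optional: register the first unicast address on this (hypothetical)
# interface as the peering identifier.
# IdentifierInterfaceName = "eth0"
```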
There are a few components of Refinery with multiple implementations; the config file lets you choose your desired implementation.
For example, there are two logging implementations: one that uses logrus and sends logs to STDOUT, and a honeycomb implementation that sends the log messages to a Honeycomb dataset instead.
Components with multiple implementations have one top level config item that lets you choose which implementation to use and then a section further down with additional config options for that choice. For example, the Honeycomb logger requires an API key.
Changing implementation choices requires a process restart; these changes will not be picked up by a live configuration reload. (Individual configuration options for a given implementation may be eligible for live reload).
Collector describes which collector to use for collecting traces.
The only current valid option is InMemCollector.
More can be added by adding implementations of the Collector interface.
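As a sketch of the pattern described above (a top-level choice plus a section for that choice), the collector configuration might look like this. The [InMemCollector] table name and the CacheCapacity value are assumptions for illustration; size the cache using the Scale and Troubleshoot guidance.

```toml
# config.toml (sketch): choose the collector implementation.
Collector = "InMemCollector"   # the only current valid option

# Assumption: options for the chosen implementation live in a table named after it.
[InMemCollector]
# Illustrative value only, not a sizing recommendation.
CacheCapacity = 1000
```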
Use the fields below to modify your Collector settings.
CacheCapacity
For guidance on how best to configure the CacheCapacity value, refer to the Scale and Troubleshoot documentation.
In addition, a cache remembers the sampling decision for any spans that might come in after the trace has been marked “complete” (either by timing out or seeing the root span); that capacity will be 5x this value.
This setting is eligible for live reload; growing the cache capacity with a live config reload is fine.
Avoid shrinking it with a live reload (you can, but it may cause temporary odd sampling decisions).
If the cache capacity is too low, the collect_cache_buffer_overrun metric increments.
If this happens, increase the CacheCapacity value.
MaxAlloc
Logger describes which logger to use for Refinery logs.
Valid options are logrus and honeycomb.
This setting controls where log events go: use honeycomb to send logs to the Honeycomb API, or logrus to send logs to STDOUT.
LoggerHoneycombAPI
LoggerAPIKey
The API key used to send Refinery's own logs to Honeycomb. This is separate from the APIKeys used to authenticate regular traffic.
Eligible for live reload.
LoggerDataset
LoggerSamplerEnabled
LoggerSamplerThroughput
There are no configurable options for the logrus logger yet.
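A sketch of sending Refinery's own logs to Honeycomb. The [HoneycombLogger] table name follows the HoneycombLogger.LoggerAPIKey entry in the environment variable table; the API key is a placeholder and the dataset name is hypothetical.

```toml
# config.toml (sketch): send Refinery's logs to a Honeycomb dataset.
Logger = "honeycomb"

[HoneycombLogger]
LoggerHoneycombAPI = "https://api.honeycomb.io"
# Placeholder; if REFINERY_HONEYCOMB_API_KEY is set, it overrides this value.
LoggerAPIKey = "YOUR_API_KEY"
# Hypothetical dataset name.
LoggerDataset = "Refinery Logs"
```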
Metrics describes which service to use for Refinery metrics.
Valid options are prometheus and honeycomb.
The prometheus option starts a listener that replies to requests for /metrics.
The honeycomb option sends summary metrics to the Honeycomb dataset you specify.
Refinery emits metrics as a Honeycomb event containing all values at each reporting interval. This configuration does not send OTLP Metrics to Honeycomb.
MetricsHoneycombAPI
MetricsAPIKey
MetricsDataset
MetricsReportingInterval
MetricsListenAddr
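A sketch of sending Refinery's metrics to Honeycomb. The [HoneycombMetrics] table name follows the HoneycombMetrics.MetricsAPIKey entry in the environment variable table; the API key is a placeholder and the dataset name is hypothetical.

```toml
# config.toml (sketch): send Refinery's metrics to a Honeycomb dataset.
Metrics = "honeycomb"

[HoneycombMetrics]
MetricsHoneycombAPI = "https://api.honeycomb.io"
# Placeholder. If set, REFINERY_HONEYCOMB_METRICS_API_KEY overrides this value;
# otherwise REFINERY_HONEYCOMB_API_KEY does, if set.
MetricsAPIKey = "YOUR_API_KEY"
# Hypothetical dataset name.
MetricsDataset = "Refinery Metrics"
```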
Here ends the list of the general Refinery configuration options. Remember to customize your Sampling Methods configuration to complete your Refinery set-up.