drainprocessor applies the Drain online log parsing algorithm to each log record and attaches the derived template as an attribute.
A template captures the invariant skeleton of a log message, with variable parts replaced by <*> wildcards.
For example, the messages user 1234 logged in from 10.0.0.1 and user 5678 logged in from 10.0.0.2 both reduce to user <*> logged in from <*>.
Deriving templates at the collector means a stable template attribute, defaulting to log.record.template, lands on every log record in your Honeycomb dataset.
You can GROUP BY it to find dominant patterns, build SLOs or Triggers against specific templates, and feed template-based conditions into Refinery sampling rules.
When to use the drain processor
The drain processor is the right choice when you need to discover and stabilize log patterns at the collector rather than define them manually. Use it when:
- You have high-volume or noisy logs and want to identify the dominant message patterns without manually defining them.
- You want a stable template attribute on every event so that SLOs, Triggers, and Refinery rules can reference it directly.
- You want to make log volume reduction decisions (sampling, dropping, or routing) at the collector based on pattern rather than message content.
Consider an alternative when:
- You only need to template logs for a single ad-hoc query. A Calculated Field using regex or string functions is enough and doesn’t require collector changes. Keep in mind that Calculated Fields are per-query and can’t be referenced by SLOs, Triggers, or Refinery.
- You already know the exact patterns and just need to tag them. The transformprocessor with OTTL is more precise than a learned algorithm.
- You need a stable, generally-available processor and can’t adopt alpha components.
How drain works
Drain incrementally builds a fixed-depth parse tree of tokens seen across log messages. Each leaf holds a small set of templates; an incoming message walks the tree to the matching leaf and is merged with the closest template if a sufficient fraction of tokens match. This approach is fast, runs online without retraining, and converges to a stable set of templates after a short warmup. The two parameters that most affect output quality are tree_depth and merge_threshold.
To learn more, refer to the Tune the algorithm section.
Configuration reference
This table covers the most common fields. For the authoritative option list at the version of the collector you run, refer to the upstream drainprocessor README.
| Field | Type | Required | Default | Purpose |
|---|---|---|---|---|
| tree_depth | int | No | 4 | Depth of the parse tree. Minimum is 3. Higher values increase precision but use more memory. |
| merge_threshold | float | No | 0.4 | Fraction of tokens that must match an existing template (range 0.0–1.0). Lower values merge more aggressively into fewer templates. |
| max_node_children | int | No | 100 | Maximum children per internal tree node. |
| max_clusters | int | No | 0 | Maximum number of templates retained. 0 means unlimited. When the cap is hit, the least-recently-used template is evicted. |
| extra_delimiters | []string | No | [] | Additional characters to treat as token delimiters, in addition to whitespace. |
| body_field | string | No | "" | When the log body is a map, the key whose value should be templated. Empty means template the stringified body. |
| template_attribute | string | No | log.record.template | Attribute name written to each log record. |
| seed_templates | []string | No | [] | Pre-loaded templates the processor starts with, using <*> for wildcards. |
| seed_logs | []string | No | [] | Example log lines used to seed templates at startup. |
| warmup_min_clusters | int | No | 0 | Minimum templates that must exist before the processor begins annotating records. Useful to avoid noisy templates during a cold start. |
| storage | string | No | in-memory | ID of a storage extension for persisting templates across collector restarts. |
| save_interval | duration | No | 0s | How often to snapshot template state to the storage extension. 0s disables periodic snapshots. |
Basic configuration
Wire the processor into your logs pipeline. Span and metric pipelines should not include it. With the default settings, every log record that passes through the processor leaves with the log.record.template attribute set to its derived template.
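The wiring described here might look like the following minimal sketch. It assumes the processor is registered under the component ID drain (following the usual naming convention for collector processors; check the upstream README for your version), an OTLP receiver, and an OTLP exporter pointed at Honeycomb with the API key read from a HONEYCOMB_API_KEY environment variable. The processor values shown are simply the defaults from the configuration reference.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  # Assumption: the component ID is "drain". Verify against the upstream README.
  drain:
    tree_depth: 4                              # default from the reference table
    merge_threshold: 0.4                       # default from the reference table
    template_attribute: log.record.template    # default attribute name

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      # Assumption: the API key is supplied through this environment variable.
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    # Only the logs pipeline includes the drain processor.
    logs:
      receivers: [otlp]
      processors: [drain]
      exporters: [otlp]
```

Because every drain field is optional, an empty drain: entry should behave the same way; the explicit values are shown only to make the defaults visible.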
To start with a curated set of templates rather than learning from scratch, seed the processor with seed_templates, seed_logs, or both.
warmup_min_clusters holds back annotation until the processor has learned the specified number of templates, preventing a noisy first few minutes of data.
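A seeded configuration could be sketched as follows. The component ID drain is an assumption, the first seed template reuses the login example from the introduction, and the seed log line and the warmup value of 10 are hypothetical placeholders to replace with patterns from your own services.

```yaml
processors:
  # Assumption: the component ID is "drain".
  drain:
    # Templates the processor starts with; <*> marks a variable token.
    seed_templates:
      - "user <*> logged in from <*>"
    # Raw example lines the processor templates at startup (hypothetical line).
    seed_logs:
      - "connection to 10.0.0.1 timed out after 30 seconds"
    # Hold back annotation until at least this many templates exist
    # (illustrative value, not a recommendation).
    warmup_min_clusters: 10
```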
Tune the algorithm
The defaults work well for general application logs but produce poor results for very high-volume logs with many distinct patterns, or very low-volume logs where the algorithm has little data to learn from. Two parameters do most of the work:
- tree_depth: Controls how many leading tokens the tree splits on. Increase it (5–6) for logs where the first few tokens carry less signal, such as logs prefixed with a long timestamp or request ID. Decrease it (3) for short, structured logs.
- merge_threshold: Controls how readily new messages merge into existing templates. Lower values (0.2–0.3) produce fewer, broader templates. Higher values (0.5–0.7) produce more, narrower templates. If you see hundreds of near-duplicate templates, lower the threshold.
max_clusters is a memory ceiling, not a quality control.
Set it to roughly twice the number of stable templates you expect; LRU eviction handles the rest.
For most services this is in the low hundreds.
If a single source emits structured logs as a map, set body_field to the key that holds the human-readable message so the processor templates the right value.
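As a sketch of how these settings combine, the fragment below tunes the processor for logs that start with a long timestamp and request ID, arrive as a map, and currently produce too many near-duplicate templates. The component ID drain, the message key, and every value are illustrative assumptions; only the field names come from the configuration reference.

```yaml
processors:
  # Assumption: the component ID is "drain".
  drain:
    tree_depth: 5          # leading tokens carry little signal, so split deeper
    merge_threshold: 0.3   # merge more readily to collapse near-duplicates
    max_clusters: 400      # roughly twice an expected ~200 stable templates
    body_field: message    # hypothetical map key holding the readable message
```

Change one parameter at a time and compare the number of distinct log.record.template values before and after, so the effect of each change stays attributable.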
Use templates in Honeycomb
Once log records carry log.record.template, you can use it like any other attribute:
- Find dominant patterns: Run a query over your logs dataset with COUNT grouped by log.record.template, ordered descending. The top rows are your noisiest patterns.
- Build SLOs and Triggers: Reference log.record.template = "<specific template>" in the SLI or Trigger condition. A stable template attribute makes these conditions resilient to changes in variable values that would otherwise require regex.
- Drive Refinery sampling: Configure Refinery rules to drop or downsample high-volume templates while keeping a representative sample of each pattern.
- Identify reduction candidates: A small number of templates accounting for a large fraction of log volume usually means easy reductions are available, either by dropping at the collector, by pairing with the log deduplication processor downstream, or by addressing the source.
Reduce volume by pairing with the log deduplication processor
A common and effective pattern is to put the logdedupprocessor immediately downstream of the drain processor.
The drain processor attaches a stable template attribute to every record; the log deduplication processor then collapses records that share the same template within a time window into a single record annotated with a count.
For high-volume, repetitive logs, this can reduce egress to Honeycomb by an order of magnitude or more while preserving the information needed to reason about pattern frequency.
By default, the log deduplication processor matches on the full body, attributes, severity, and resource, which prevents two records with the same template but different variable values from collapsing.
To deduplicate by template, restrict matching to the template attribute and any resource attributes you care about with include_fields.
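A sketch of this pairing is shown below. It assumes the component IDs drain and logdedup, an otlp receiver and exporter defined elsewhere in the collector configuration, and the logdedup options interval, log_count_attribute, and include_fields as I understand them from the upstream README; confirm the names and the dot-escaping syntax against the version you run. The 10s interval is an illustrative value.

```yaml
processors:
  # Assumption: the component ID is "drain".
  drain:
    template_attribute: log.record.template

  logdedup:
    # Time window over which matching records are collapsed (illustrative).
    interval: 10s
    # Attribute that receives the number of collapsed records.
    log_count_attribute: log.record.count
    # Match only on the template attribute. Dots inside the attribute name
    # are escaped with a backslash so they are not read as nesting.
    include_fields:
      - attributes.log\.record\.template

service:
  pipelines:
    logs:
      receivers: [otlp]                 # assumed to be defined elsewhere
      processors: [drain, logdedup]     # drain must run before logdedup
      exporters: [otlp]                 # assumed to be defined elsewhere
```

Order matters in the processors list: logdedup can only match on the template attribute after drain has written it.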
Each deduplicated record carries a count attribute (named log.record.count in this guide) and the first_observed_timestamp/last_observed_timestamp attributes that the log deduplication processor adds.
Use SUM(log.record.count) rather than COUNT when summing volume across templates.
The log deduplication processor is also at alpha stability for logs.
The same upgrade caveats apply.
Persist templates across restarts
By default, the processor’s learned templates are in-memory and are lost on collector restart. After a restart, the processor relearns templates and may produce different output until the algorithm converges again. To persist templates, configure a storage extension and reference it from the processor’s storage field. save_interval controls how often state is snapshotted.
Shorter intervals reduce data loss on crash at the cost of more disk activity.
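One way to set this up, sketched under assumptions, uses the file_storage extension. The component ID drain, the directory path, and the 30s interval are hypothetical choices; the otlp receiver and exporter are assumed to be defined elsewhere in the collector configuration.

```yaml
extensions:
  file_storage:
    # Hypothetical path; the directory must exist and be writable by the collector.
    directory: /var/lib/otelcol/drain

processors:
  # Assumption: the component ID is "drain".
  drain:
    storage: file_storage   # ID of the storage extension declared above
    save_interval: 30s      # snapshot cadence (illustrative); 0s disables snapshots

service:
  # The extension must be enabled here as well as declared above.
  extensions: [file_storage]
  pipelines:
    logs:
      receivers: [otlp]     # assumed to be defined elsewhere
      processors: [drain]
      exporters: [otlp]     # assumed to be defined elsewhere
```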
Limitations
Keep these constraints in mind when planning your pipeline configuration:
- The processor operates on logs only. Span and metric pipelines should not include it.
- Drain is an online clustering algorithm; output is deterministic only when seeded with seed_templates or seed_logs. Two collectors without shared state will produce slightly different template inventories.
- Without a storage extension, templates do not survive collector restarts.
- Alpha stability means option names and defaults may change between collector releases. Validate after every upgrade.
Related topics
These resources cover the tools and approaches that work alongside or instead of the drain processor:
- Handle Sensitive Information with the OpenTelemetry Collector: Covers the transform and redaction processors, which are the right tools when you know the patterns up front rather than learning them.
- Calculated Fields: Honeycomb-side, per-query templating when collector changes are not an option.
- Refinery: Apply template-based sampling rules using the attribute the drain processor produces.
- Upstream drainprocessor README
- Upstream logdedupprocessor README
- Drain: An Online Log Parsing Approach with Fixed Depth Tree (paper)