Deduplicate Logs


Note
This feature is available as an add-on for the Honeycomb Enterprise plan. Please contact your Honeycomb account team for details.

Description 

The Deduplicate Logs processor deduplicates logs over a configurable time interval and emits a single log that carries the count of duplicates.

Logs are considered duplicates if all of the following match:

  • Severity
  • Log Body
  • Resource Attributes
  • Log Attributes
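The matching rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the processor's actual implementation: the `LogRecord` class and the `dedup_key` and `deduplicate` functions are hypothetical names, one call to `deduplicate` stands in for one interval's worth of logs, and timestamps are kept as plain numbers rather than being formatted in a timezone.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    # Hypothetical, simplified log record used only for this sketch.
    severity: str
    body: object
    resource: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)
    timestamp: float = 0.0  # observation time in seconds


def dedup_key(log):
    # Two logs are duplicates when severity, body, resource attributes,
    # and log attributes all match. Sorting keys makes the serialized
    # form independent of dictionary ordering.
    return json.dumps(
        [log.severity, log.body, log.resource, log.attributes],
        sort_keys=True,
        default=str,
    )


def deduplicate(logs, log_count_attribute="log_count"):
    # Group the logs seen during one interval by their dedup key.
    groups = {}
    for log in logs:
        group = groups.setdefault(
            dedup_key(log),
            {"sample": log, "count": 0, "first": log.timestamp, "last": log.timestamp},
        )
        group["count"] += 1
        group["first"] = min(group["first"], log.timestamp)
        group["last"] = max(group["last"], log.timestamp)

    # Emit one log per group, annotated with the duplicate count and the
    # first and last observation times.
    emitted = []
    for group in groups.values():
        sample = group["sample"]
        attributes = dict(sample.attributes)
        attributes[log_count_attribute] = group["count"]
        attributes["first_observed_timestamp"] = group["first"]
        attributes["last_observed_timestamp"] = group["last"]
        emitted.append(
            LogRecord(sample.severity, sample.body, dict(sample.resource), attributes, group["last"])
        )
    return emitted
```

In this sketch, passing two identical INFO logs and one ERROR log yields two emitted logs, and the first carries a log_count of 2.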

Supported Types 

Metrics | Logs | Traces
No | Yes | No

Configuration Table 

Parameter | Type | Default | Description
interval* | int | 10 | The interval, in seconds, over which to aggregate logs. An aggregated log is emitted after the interval passes.
log_count_attribute* | string | log_count | The name of the attribute, added to the emitted log, that holds the count of deduplicated logs.
timezone* | string | UTC | The timezone used for the first_observed_timestamp and last_observed_timestamp attributes on the emitted log.
exclude_fields | strings | | A list of fields to exclude from duplicate matching. Fields can be excluded from the log body or attributes. Excluded fields are not present in the emitted log. See the exclude_fields Parameter section below.

*required field

exclude_fields Parameter 

The exclude_fields parameter removes fields from consideration when the processor looks for duplicate logs. Fields can be excluded from either the body or the attributes of a log, but the entire body cannot be excluded. To specify a nested field, separate each part of the path with a period (.). If a field name itself contains a period, escape it with a backslash (\.).

Below are a few examples and how to specify them:

  • Exclude timestamp field from the body -> body.timestamp
  • Exclude a host.name field from the log attributes -> attributes.host\.name
  • Exclude a nested ip field inside a src attribute -> attributes.src.ip
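The path syntax above can be illustrated with a short Python sketch. It is only an interpretation of the rules described here, not the processor's parser: `split_field_path` and `exclude_field` are hypothetical helpers, and the record is assumed to be a plain dictionary with "body" and "attributes" keys.

```python
def split_field_path(path):
    # Split on unescaped periods; a backslash-escaped period (\.) is kept
    # as a literal period inside the current path segment.
    parts, current = [], []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path) and path[i + 1] == ".":
            current.append(".")
            i += 2
        elif ch == ".":
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def exclude_field(record, path):
    # Remove the field named by path from a record shaped like
    # {"body": {...}, "attributes": {...}} (an assumed layout).
    parts = split_field_path(path)
    if parts == ["body"]:
        raise ValueError("the entire body cannot be excluded")
    node = record
    for key in parts[:-1]:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return  # the path is not present, so there is nothing to remove
    if isinstance(node, dict):
        node.pop(parts[-1], None)
```

Under these assumptions, `body.timestamp` splits into `body` and `timestamp`, `attributes.host\.name` splits into `attributes` and `host.name`, and `attributes.src.ip` splits into `attributes`, `src`, and `ip`.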

Example Configuration 

Basic Configuration 

This example sets a custom log_count_attribute and timezone and deduplicates logs on a 60-second interval.

Web Interface 

Honeycomb Docs - Deduplicate Logs - image 1

Standalone Processor 

apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: log-dedup
  name: log-dedup
spec:
  type: log_dedup
  parameters:
    - name: interval
      value: 60
    - name: log_count_attribute
      value: 'dedup_count'
    - name: timezone
      value: 'America/Los_Angeles'

Exclude Fields 

This example adds the exclude_fields parameter. For details on the path syntax, see the exclude_fields Parameter section above.

Web Interface 

Honeycomb Docs - Deduplicate Logs - image 2

Standalone Processor 

apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: exclude-fields
  name: exclude-fields
spec:
  type: log_dedup
  parameters:
    - name: interval
      value: 10
    - name: log_count_attribute
      value: 'log_count'
    - name: timezone
      value: 'UTC'
    - name: exclude_fields
      value:
        - 'attributes.timestamp'
        - 'body.time'
        - 'attributes.log\.file\.name'