Send Logs with the OpenTelemetry Collector

Collect logs from different sources and convert them to OpenTelemetry’s structured log format.

The OpenTelemetry Collector can collect logs from many different sources in common or custom formats. You can use a Collector as a logging agent, often as a drop-in replacement for other logging agents. This lets you process your logs, traces, and metrics in one place.

Use a Receiver to Collect Logs 

The OpenTelemetry Collector supports a large number of receivers that can be used to collect logs from a variety of sources.

Setup 

  1. Make sure you’re using the Collector Contrib distribution of the OpenTelemetry Collector, which includes components that are not part of the core repository or core distribution.
  2. Prepare your collector configuration file by adding the following boilerplate:
receivers:
  # Add your receiver here
  # ...

processors:
  batch:
  # Add any additional processors here
  # ...

exporters:
  otlp/logs:
    endpoint: "api.honeycomb.io:443"
    headers:
      "x-honeycomb-team": "YOUR_API_KEY"
      "x-honeycomb-dataset": "YOUR_LOGS_DATASET_NAME"

service:
  pipelines:
    logs:
      receivers: [receiver1,receiver2,etc]
      processors: [batch]
      exporters: [otlp/logs]
Note
If you are sending OTLP logs from a service with a service.name defined, then the dataset for those logs will be the name of the service, and the x-honeycomb-dataset header will not be used.
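
If you want these logs routed to a specific dataset anyway, one option is to set service.name yourself with the resource processor. A minimal sketch, assuming you want the logs in a dataset named my-service (a placeholder name):

processors:
  resource/logs:
    attributes:
      # Set service.name, overwriting it if it already exists.
      - key: service.name
        value: my-service
        action: upsert

Add resource/logs to the processors list of your logs pipeline so it runs before the exporter.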

Collect Any Log with the Filelog Receiver 

The Filelog Receiver supports reading and parsing any arbitrary log written to a file on a server.

The Filelog Receiver is the most flexible receiver, but depending on the shape of your logs, it may require additional configuration to parse your logs correctly.

For example, here is a configuration that reads an NGINX access log and parses it into a structured log:

receivers:
  filelog:
    include: ["/var/log/nginx/access.log"]
    operators:
      - type: "regex_parser"
        regex: "(?P<remote>[^ ]*) - - \\[(?P<time>[^\\]]*)\\] \"(?P<method>\\S+)(?: +(?P<path>[^ ]*) +\\S*)?\" (?P<status>\\d+) (?P<size>\\d+) \"(?P<referer>[^\"]*)\" \"(?P<agent>[^\"]*)\""
        timestamp:
          parse_from: attributes.time
          layout: "%d/%b/%Y:%H:%M:%S %z"

Here’s a configuration that reads a JSON log:

receivers:
  filelog:
    include: [ /var/log/myservice/*.json ]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
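
For reference, a log line that this configuration would parse might look like the following. This is an invented example; note that the value of the time field matches the layout above:

{"time": "2024-03-19 14:49:51", "level": "info", "message": "Listening on http://localhost:8000/name"}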

Logs with Mixed Formats 

Sometimes, logs contain a mix of structured and unstructured information, such as an info log with JSON embedded in it. To parse these, you need to parse each piece of the log into its own element and decide how to handle the embedded structured data.

For example, let’s say you have a log at /var/log/name-service/0.log that mixes text and JSON:

2024-03-19T14:49:51.998-0600 info name-service/main.go:212 Listening on http://localhost:8000/name {"name": "name-service", "function": "main", "featureflag.allow-future": true, "name-format": "lowercase"}

You can use a filelogreceiver configuration to do the following:

  1. Read the log from /var/log/name-service/*.log.
  2. Parse the full text of the log into a timestamp, severity, file name, message, and “details”.
  3. Parse the “details” text using a JSON parser, where each key in the JSON object is turned into an attribute for the log.
  4. Parse the service name from the file path.
  5. Set the service.name resource attribute to the parsed service name.

The following filelogreceiver configuration completes the required tasks:

filelog:
  include:
    - /var/log/name-service/*.log
  include_file_path: true
  operators:
    # Parse the text.
    # Each capture group becomes an attribute on the log.
    # The original full string becomes the body.
    - type: regex_parser
      regex: (?P<timestamp>^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}-\d{4}) (?P<severity>\w+) (?P<filename>[\w\/\.\:\-]+) (?P<message>[^\{]*) (?P<details>.*)
      timestamp:
        layout: "%Y-%m-%dT%H:%M:%S.%f%z"
        parse_from: attributes.timestamp
      severity:
        parse_from: attributes.severity
    # The "details" attribute is a json string.
    # This operator parses the json string to turn it into a map.
    # Honeycomb will flatten the map on ingest.
    - type: json_parser
      parse_from: attributes.details
      parse_to: attributes.details
    # We can extract the service name, name-service, from the file path
    - type: regex_parser
      regex: \/var\/log\/(?P<servicename>.*)\/\d+\.log
      parse_from: attributes["log.file.path"]
    # Move the extracted attribute to the service name resource attribute
    - type: move
      from: attributes.servicename
      to: resource["service.name"]

The configuration produces a structured log that, when exported to Honeycomb, contains the following fields:

{
    "body": "2024-03-19T14:49:51.998-0600 info name-service/main.go:212 Listening on http://localhost:8000/name {\"name\": \"name-service\", \"function\": \"main\", \"featureflag.allow-future\": true, \"name-format\": \"lowercase\"}",
    "severity": "info",
    "severity_code": 9,
    "service.name": "name-service",
    "message": "Listening on http://localhost:8000/name",
    "timestamp": 1679265391998,
    "details.featureflag.allow-future": true,
    "details.function": "main",
    "details.name": "name-service",
    "details.name-format": "lowercase",
    "log.file.name": "0.log",
    "log.file.path": "/var/log/name-service/0.log",
    "flags": 0,
}

Use the Explore Data tab in Query Results to view the structured log in Honeycomb.

Log Sources 

You can configure many different receivers to collect logs from a specific source.

AWS CloudWatch 

The AWS CloudWatch Receiver supports autodiscovery of log groups and log streams in Amazon CloudWatch, with optional filtering of those sources.

For example, here is a configuration that autodiscovers only EKS logs from us-west-1:

receivers:
  awscloudwatch:
    region: us-west-1
    logs:
      poll_interval: 1m
      groups:
        autodiscover:
          limit: 100
          prefix: /aws/eks/

Azure Blob 

The Azure Blob Receiver reads logs and trace data from Azure Blob Storage.

For example, here is a configuration that reads logs from a specific container in Azure Blob Storage:

receivers:
  azureblob:
    connection_string: DefaultEndpointsProtocol=https;AccountName=accountName;AccountKey=<your-key>;EndpointSuffix=core.windows.net
    event_hub:
      endpoint: Endpoint=sb://oteldata.servicebus.windows.net/;SharedAccessKeyName=otelhubbpollicy;SharedAccessKey=<access-key>;EntityPath=otellhub
    logs:
      container_name: name-of-container

Azure Event Hub 

The Azure Event Hub Receiver pulls logs from an Azure Event Hub and transforms them.

For example, here is a configuration that reads logs from a specific partition and consumer group, then parses them into structured logs:

receivers:
  azureeventhub:
    connection: Endpoint=<your-endpoint>;SharedAccessKeyName=<your-key>;SharedAccessKey=<your-key>;EntityPath=hubName
    partition: my-partition
    group: my-consumer-group

Cloudflare 

The Cloudflare Receiver accepts logs from Cloudflare’s Logpush jobs.

For example, here is a configuration that reads logs from a specific LogPush Job:

receivers:
  cloudflare:
    logs:
      tls:
        key_file: some_key_file
        cert_file: some_cert_file
      endpoint: <your-endpoint>
      secret: <your-secret>
      timestamp_field: EdgeStartTimestamp
      attributes:
        ClientIP: http_request.client_ip
        ClientRequestURI: http_request.uri

Fluent Forward 

The Fluent Forward Receiver runs a TCP server that accepts logs via the Fluent Forward protocol, which enables collecting logs from Fluent Bit and Fluentd.

For example, here is a configuration that reads all logs on port 8006:

receivers:
  fluentforward:
    endpoint: 0.0.0.0:8006
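
On the sending side, point Fluent Bit or Fluentd at that address. Here is a minimal sketch of the output section of a Fluent Bit configuration in YAML format, assuming the Collector runs on the same host; add your inputs as usual:

pipeline:
  outputs:
    # Forward all records to the Collector's Fluent Forward receiver.
    - name: forward
      match: "*"
      host: 127.0.0.1
      port: 8006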

Google Pubsub 

The Google Pubsub Receiver reads logs from a Google Pubsub subscription.

For example, here is a configuration that reads raw text logs and wraps them into an OpenTelemetry Log:

receivers:
  googlecloudpubsub:
    project: otel-project
    subscription: projects/otel-project/subscriptions/otlp-logs
    encoding: raw_text

Journald 

The Journald Receiver parses journald events from systemd.

For example, here is a configuration that reads all logs from Journald from some specific units:

receivers:
  journald:
    directory: /run/log/journal
    units:
      - ssh
      - kubelet
      - docker
      - containerd
    priority: info

Kafka 

The Kafka Receiver reads logs, metrics, and traces from Kafka.

For example, here is a configuration that reads all Kafka data:

receivers:
  kafka:
    protocol_version: 2.0.0
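
The example above relies on the receiver’s defaults. A more explicit sketch for a logs-only setup, where the broker address and topic name are placeholders for your own values:

receivers:
  kafka/logs:
    protocol_version: 2.0.0
    # Placeholder broker and topic; adjust to your cluster.
    brokers: ["localhost:9092"]
    topic: otlp_logs
    encoding: otlp_proto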

Kubernetes 

The OpenTelemetry Collector has several receivers that can be used to collect logs from Kubernetes. To learn more, visit Kubernetes Log Collection and Kubernetes Event Collection.

Loki 

The Loki Receiver allows Promtail instances to send logs to the OpenTelemetry Collector.

For example, here is a configuration that reads all logs from an endpoint:

receivers:
  loki:
    protocols:
      http:
        endpoint: 0.0.0.0:3500
      grpc:
        endpoint: 0.0.0.0:3600
    use_incoming_timestamp: true
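
On the Promtail side, add the Collector as a client. A minimal sketch of the relevant clients section of a Promtail configuration, assuming the Collector’s HTTP endpoint is reachable at collector:3500 (a placeholder host):

clients:
  # Push logs to the Collector's Loki-compatible endpoint.
  - url: http://collector:3500/loki/api/v1/push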

MongoDB Atlas 

The MongoDB Atlas Receiver reads logs from MongoDB Atlas.

For example, here is a configuration that reads all logs from a specific project:

receivers:
  mongodbatlas:
    logs:
      enabled: true
      projects: 
        - name: "project 1"
          collect_audit_logs: true
          collect_host_logs: true

OTLPJson File 

The OTLPJson File Receiver reads logs, metrics, and traces in OTLP JSON format from files on a server.

For example, here is a configuration that reads from a specific directory and excludes a specific file:

receivers:
  otlpjsonfile:
    include:
      - "/var/log/*.log"
    exclude:
      - "/var/log/example.log"

Apache Pulsar 

The Pulsar Receiver collects logs, metrics, and traces from Apache Pulsar.

For example, here is a configuration that reads data from a Pulsar cluster:

receivers:
  pulsar:
    endpoint: pulsar://localhost:6650
    topic: otlp-spans
    subscription: otlp_spans_sub
    consumer_name: otlp_spans_sub_1
    encoding: otlp_proto
    auth:
      tls:
        cert_file: cert.pem
        key_file: key.pem
    tls_allow_insecure_connection: false
    tls_trust_certs_file_path: ca.pem

SignalFx 

The SignalFx Receiver reads logs from a SignalFx endpoint.

For example, here is a configuration that reads data from a SignalFx endpoint:

receivers:
  signalfx:
    endpoint: 0.0.0.0:9943
  signalfx/advanced:
    endpoint: 0.0.0.0:9943
    access_token_passthrough: true
    tls:
      cert_file: /test.crt
      key_file: /test.key

Splunk HEC 

The Splunk HEC Receiver accepts events in the Splunk HEC format.

For example, here is a configuration that reads JSON HEC events and raw log data:

receivers:
  splunk_hec:
    endpoint: 0.0.0.0:8088
  splunk_hec/advanced:
    endpoint: 0.0.0.0:8088
    access_token_passthrough: true
    tls:
      cert_file: /test.crt
      key_file: /test.key
    raw_path: "/raw"
    hec_metadata_to_otel_attrs:
      source: "mysource"
      sourcetype: "mysourcetype"
      index: "myindex"
      host: "myhost"

Syslog 

The Syslog Receiver parses Syslogs received over UDP or TCP.

For example, here is a configuration that reads Syslogs from TCP:

receivers:
  syslog:
    tcp:
      listen_address: "0.0.0.0:54526"
    protocol: rfc5424
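
And here is a sketch of the UDP equivalent, assuming RFC 3164-formatted messages (the port is a placeholder):

receivers:
  syslog/udp:
    udp:
      listen_address: "0.0.0.0:54527"
    protocol: rfc3164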

TCP 

The TCP Receiver receives logs over TCP.

For example, here is a configuration that reads logs from TCP over a particular address:

receivers:
  tcplog:
    listen_address: "0.0.0.0:54525"

UDP 

The UDP Receiver receives logs over UDP.

For example, here is a configuration that reads logs from UDP over a particular address:

receivers:
  udplog:
    listen_address: "0.0.0.0:54525"

Webhook Event 

The Webhook Event Receiver allows for any webhook-style data source to send logs to the OpenTelemetry Collector.

For example, here is a configuration that reads logs from a webhook:

receivers:
  webhookevent:
    endpoint: localhost:8088
    read_timeout: "500ms"
    path: "eventsource/receiver"
    health_path: "eventreceiver/healthcheck"
    required_header:
      key: "required-header-key"
      value: "required-header-value"

Windows Event Log 

The Windows Event Log Receiver tails and parses logs from the Windows event log API.

For example, here is a configuration that reads logs from a named channel:

receivers:
  windowseventlog:
    channel: application