Configure Collector Export to Amazon S3 Archive


Note
This feature is available as an add-on for the Honeycomb Enterprise plan. Please contact your Honeycomb account team for details.

Honeycomb supports rehydrating archived, unsampled OpenTelemetry (OTel) trace and log data from an Amazon S3 bucket. In this guide, you will learn how to configure an OpenTelemetry Collector to export unsampled data to an Amazon S3 bucket.

Configure the AWS S3 Exporter 

Set the following s3uploader attributes in your AWS S3 Exporter:

  • region: your S3 region.
  • s3_bucket: the name of your S3 bucket.
  • s3_prefix: the directory to store your trace and log data in.
  • compression: set to gzip to reduce object storage cost.
  • marshaler: set to otlp_proto to improve parsing performance.

In this configuration, files will be organized into subdirectories with minute resolution: "year=%Y/month=%m/day=%d/hour=%H/minute=%M".
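To make the resulting layout concrete, the sketch below predicts the S3 key prefix an object lands under for a given export time. The helper name is hypothetical; it simply applies the minute-resolution partition format described above to the bucket and prefix used later in this guide.

```python
from datetime import datetime, timezone

# Hypothetical helper: applies the exporter's minute-resolution partition
# layout ("year=%Y/month=%m/day=%d/hour=%H/minute=%M") so you can predict
# where an uploaded object will land in your bucket.
def s3_object_prefix(bucket: str, prefix: str, ts: datetime) -> str:
    partition = ts.strftime("year=%Y/month=%m/day=%d/hour=%H/minute=%M")
    return f"s3://{bucket}/{prefix}/{partition}/"

# Example: data exported at 2024-06-01 13:05 UTC
print(s3_object_prefix("my-test-bucket", "traces-logs-directory",
                       datetime(2024, 6, 1, 13, 5, tzinfo=timezone.utc)))
# s3://my-test-bucket/traces-logs-directory/year=2024/month=06/day=01/hour=13/minute=05/
```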

You can batch the data sent to your S3 bucket by configuring the sending_queue section of the exporter. The batch-related settings (flush_timeout, min_size, and max_size) live in its nested batch block:

  • flush_timeout: The time after which a batch will be sent regardless of its size. Must be a non-zero value.
  • min_size: The minimum size of a batch.
  • max_size: The maximum size of a batch, which enables batch splitting. Must be greater than or equal to min_size.
  • queue_size: The maximum size the queue can hold, measured in units defined by sizer. Defaults to 1000.
  • sizer: How the queue and batches are measured. Default is requests. Available options:
    • requests: number of incoming batches of metrics, logs, traces (the most performant option).
    • items: number of the smallest parts of each signal (spans, metric data points, log records).
    • bytes: the size of serialized data in bytes (the least performant option).

You can also set a custom timeout for the exporter, which is the amount of time to wait per individual attempt to send data to a backend. The default is 5 seconds.
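The interaction of min_size, max_size, and flush_timeout can be easier to see in code. The toy batcher below is an illustration of the semantics described above (with sizer set to items), not the Collector's actual implementation: full batches of max_size items are split off immediately, and anything remaining is flushed once it reaches min_size or once flush_timeout elapses.

```python
import time

# Toy illustration (not Collector code) of how min_size, max_size, and
# flush_timeout interact when sizer is "items".
class Batcher:
    def __init__(self, min_size: int, max_size: int, flush_timeout: float):
        self.min_size = min_size
        self.max_size = max_size
        self.flush_timeout = flush_timeout
        self.items: list = []
        self.last_flush = time.monotonic()

    def add(self, items: list) -> list:
        """Buffer items; return any batches that are ready to send."""
        self.items.extend(items)
        flushed = []
        # max_size enables batch splitting: emit full batches immediately.
        while len(self.items) >= self.max_size:
            flushed.append(self.items[: self.max_size])
            self.items = self.items[self.max_size :]
            self.last_flush = time.monotonic()
        # Flush the remainder once min_size is reached, or once
        # flush_timeout has elapsed regardless of size.
        if len(self.items) >= self.min_size or (
            self.items
            and time.monotonic() - self.last_flush >= self.flush_timeout
        ):
            flushed.append(self.items)
            self.items = []
            self.last_flush = time.monotonic()
        return flushed
```

Setting min_size equal to max_size, as in the example configuration below, means batches are normally sent at exactly that size, with flush_timeout as the backstop for slow periods.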

Here is an example AWS S3 Exporter configuration:

exporters:
  awss3:
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_prefix: 'traces-logs-directory'
      compression: 'gzip'
      marshaler: 'otlp_proto'
    sending_queue:
      enabled: true
      sizer: items
      queue_size: 500000
      batch:
        flush_timeout: 30s
        min_size: 50000
        max_size: 50000
    timeout: 30s

Configure a Pipeline 

Honeycomb supports processing both trace and log OpenTelemetry signal types for data activation and rehydration. These signal types can be sent through the AWS S3 exporter in your collector’s pipelines configuration.

You should set up a separate pipeline configuration block for each signal type. In this example, the logs and traces pipelines intended for object storage in Amazon S3 are given an /objectstorage label:

service:
  pipelines:
    traces:
      # [...]
    logs:
      # [...]
    logs/objectstorage:
      receivers:
        - otlp
      exporters:
        - awss3
    traces/objectstorage:
      receivers:
        - otlp
      exporters:
        - awss3

Full Example Configuration

The example below shows a simple but complete OpenTelemetry (OTel) Collector configuration for exporting both log and trace OTel signal types through the AWS S3 exporter, where:

  • Trace and log telemetry is exported to an Amazon S3 bucket called "my-test-bucket".
  • An S3 prefix is used at the root of the bucket called "traces-logs-directory".
  • Data will be partitioned into subdirectories underneath the prefix with minute resolution.
  • Files uploaded to S3 will be compressed with gzip.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  awss3:
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_prefix: 'traces-logs-directory'
      compression: 'gzip'
      marshaler: 'otlp_proto'
    sending_queue:
      enabled: true
      sizer: items
      queue_size: 500000
      batch:
        flush_timeout: 30s
        min_size: 50000
        max_size: 50000
    timeout: 30s

service:
  pipelines:
    logs/objectstorage:
      receivers:
        - otlp
      exporters:
        - awss3
    traces/objectstorage:
      receivers:
        - otlp
      exporters:
        - awss3