Collector Export to Amazon S3 Archive


Note
This feature is available as an add-on for the Honeycomb Enterprise plan. Please contact your Honeycomb account team for details.

Honeycomb supports rehydrating archived, unsampled OpenTelemetry (OTel) trace and log data stored in an Amazon S3 bucket.

In this guide, you will learn how to configure an OpenTelemetry Collector to export unsampled trace and log data to an Amazon S3 bucket using Honeycomb’s Enhance Indexing S3 Exporter. This exporter extends the standard OpenTelemetry AWS S3 Exporter by adding field-based indexing, so you can rehydrate data more quickly and cost-effectively.

Important
Adding AWS Credentials requires assistance from your Honeycomb account team. To learn more, visit Set Up an Archive.

How Indexing Works 

When you send data to your S3 archive, the exporter automatically indexes these fields:

  • trace.trace_id: Unique identifier of the trace
  • service.name: Name of the service
  • session.id: Unique identifier of the session

You can also index additional fields that you frequently query, such as user.id, customer.id, or environment.

When you run a query that requires rehydration, you can use these indexes to locate and retrieve only the relevant subset of your archived data. This gives you:

  • Faster rehydration: Retrieve only the events that match your query.
  • Lower costs: Reduce the amount of archived data you need to ingest.
  • Better performance: Investigate with a smaller, more targeted dataset.

Configuring the Enhance Indexing S3 Exporter 

The exporter requires configuration in the following areas.

Honeycomb API Credentials 

The exporter uses a Honeycomb Management API key with the enhance:write scope for authentication and usage tracking.

Configure these fields in your exporter configuration, as shown in the sketch below:

  • api_key: Your Honeycomb Management API key. Must have the enhance:write scope.
  • api_secret: Your Honeycomb Management API secret.
  • api_endpoint: URL of the Honeycomb API endpoint.
    • US: https://api.honeycomb.io
    • EU: https://api.eu1.honeycomb.io
Note
To learn how to create a Management API key, visit Managing API Keys.
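
A minimal sketch of the credentials block, assuming the key and secret are stored in environment variables named HONEYCOMB_MANAGEMENT_API_KEY and HONEYCOMB_MANAGEMENT_API_SECRET:

exporters:
  enhance_indexing_s3_exporter:
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io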

S3 Uploader Settings 

Set these attributes in the s3uploader section; a configuration sketch follows the list:

  • s3_bucket: Amazon S3 bucket in which to store the data. Not required when endpoint is set.
  • region: AWS region for your bucket. Default: us-east-1.
  • s3_prefix: (Optional) Directory-like prefix for organizing files in your S3 bucket. Example: traces-logs-directory.
  • s3_partition_format: (Optional) Partition format to use when writing files to S3. Default: "year=%Y/month=%m/day=%d/hour=%H/minute=%M" (minute-level resolution).
  • compression: Compression algorithm for S3 files. Options:
    • gzip
    • none
  • retry_mode: (Optional) Retry behavior for failed requests. Default: standard. Options:
    • standard (fixed intervals)
    • adaptive (adjusts based on server response)
  • retry_max_attempts: (Optional) Maximum number of times to retry a failed attempt. Default: 3.
  • endpoint: (Optional) Custom S3 endpoint for S3-compatible services, such as MinIO or on-premises object storage. Overrides the default AWS S3 endpoint. Example: http://localhost:9000.
  • s3_force_path_style: (Optional) Use path-style rather than virtual-hosted style URLs. Required for MinIO and some S3-compatible services. Default: false.
  • disable_ssl: (Optional) Allow unencrypted HTTP connections. Use only for local development. Default: false.
Tip
The file_prefix attribute is not supported and will cause validation to fail.
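
A sketch of an s3uploader block that uses these settings; the bucket name and prefix are placeholders:

s3uploader:
  region: 'us-east-1'
  s3_bucket: 'my-archive-bucket'
  s3_prefix: 'telemetry-data'
  # With this partition format, objects land under paths like
  # telemetry-data/year=2025/month=01/day=15/hour=09/minute=30/
  s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
  compression: 'gzip'
  retry_mode: 'standard'
  retry_max_attempts: 3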

Data Format Settings 

Choose how telemetry is encoded before being written to S3:

  • marshaler: Format for encoding telemetry data. Default: otlp_protobuf. Options:
    • otlp_protobuf (smaller files, better performance)
    • otlp_json (human-readable output)
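
For example, to produce human-readable files while debugging, at the cost of larger files, you might set:

marshaler: 'otlp_json'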

Batching and Queue Settings 

Batch data before sending to S3 by configuring the sending_queue section. The flush_timeout, min_size, and max_size attributes nest under its batch key, as shown in the sketch after this list:

  • flush_timeout: Maximum time before a batch is sent to S3, even if it is not yet full. Must be a non-zero value.
  • min_size: Minimum size of a batch, measured in units defined by sizer. Default: 50000.
  • max_size: Maximum size of a batch, measured in units defined by sizer. Enables batch splitting. Must be greater than or equal to min_size. Default: 50000.
  • queue_size: Maximum number of units the queue can hold, measured in units defined by sizer. Default: 500000.
  • sizer: Unit used to measure the queue and batch size. Default: items. Options:
    • requests: Number of incoming batches of traces and logs (the most performant option).
    • items: Number of the smallest parts of each signal (spans, log records).
    • bytes: Size of serialized data in bytes (the least performant option).

You can also set a custom timeout for the exporter:

  • timeout: Maximum time to wait for each S3 send attempt. Default: 30s.
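
A sketch that combines these settings at the exporter level, using the defaults listed above; the flush_timeout value of 30s is illustrative and mirrors the full example later in this guide:

sending_queue:
  enabled: true
  sizer: items
  queue_size: 500000
  batch:
    flush_timeout: 30s
    min_size: 50000
    max_size: 50000
timeout: 30s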

Retry Settings 

Control how the exporter retries failed send attempts by configuring the retry_on_failure section:

  • enabled: Allow the exporter to retry failed sends.
  • initial_interval: Amount of time to wait before the first retry. Example: 5s.
  • max_interval: Maximum amount of time to wait between retries. Retries use exponential backoff, so each retry waits longer than the previous one, up to this maximum. Example: 30s.
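
A sketch of a retry_on_failure block using the example values above:

retry_on_failure:
  enabled: true
  initial_interval: 5s
  max_interval: 30s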

Custom Indexed Fields 

Add indexed fields beyond the built-in ones (trace.trace_id, service.name, session.id) using the indexed_fields section.

Choose fields that are frequently used in your queries or that help narrow down your investigations. High-cardinality fields are especially useful because they make rehydration more selective. Examples:

  • user.id
  • customer.id
  • environment
  • deployment.version
Tip
To minimize Collector processing time and compute, we recommend indexing no more than five custom fields.

For example:

indexed_fields:
  - "user.id"
  - "customer.id"
  - "environment"
Important
Share your indexed fields with your Honeycomb account team, along with your S3 bucket information and IAM role ARN. This ensures that rehydration is configured correctly.

Examples 

These examples show common configurations, so you can choose the setup that best fits your needs.

Basic Configuration 

A basic example of the Enhance Indexing S3 Exporter configuration:

exporters:
  enhance_indexing_s3_exporter:

    # Honeycomb API credentials
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # S3 configuration
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'
        
    # Data format
    marshaler: 'otlp_protobuf'

Pipeline Configuration 

Honeycomb supports processing both trace and log OpenTelemetry signal types for data archival and rehydration. Set up a separate pipeline configuration block for each signal type.

In this example, the logs/objectstorage and traces/objectstorage pipelines are labeled to indicate that they are intended for object storage (storage in Amazon S3):

service:
  pipelines:
    traces:
      # [...]
    logs:
      # [...]
    logs/objectstorage:
      exporters:
        - enhance_indexing_s3_exporter
      receivers:
        - otlp
    traces/objectstorage:
      exporters:
        - enhance_indexing_s3_exporter
      receivers:
        - otlp

Full Configuration 

The example below shows a simple but complete OpenTelemetry (OTel) Collector configuration for exporting both log and trace OTel signal types through the Enhance Indexing S3 Exporter:

  • Trace and log telemetry is exported to an Amazon S3 bucket called my-test-bucket.
  • An S3 prefix is used at the root of the bucket called telemetry-data.
  • Data will be partitioned into subdirectories underneath the prefix with minute resolution.
  • Files uploaded to S3 will be compressed with gzip compression.
  • Custom indexed fields are configured for user.id, customer.id, environment, and deployment.version.
  • Queue batching is configured.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  enhance_indexing_s3_exporter:

    # Honeycomb API credentials
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # S3 configuration
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_prefix: 'telemetry-data'
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'
      retry_mode: 'adaptive'
      retry_max_attempts: 5

    # Data format
    marshaler: 'otlp_protobuf'

    # Custom indexed fields
    indexed_fields:
      - "user.id"
      - "customer.id"
      - "environment"
      - "deployment.version"

    # Batching, timeout, and retry configuration
    sending_queue:
      batch:
        flush_timeout: 30s
        max_size: 50000
        min_size: 50000
      enabled: true
      queue_size: 500000
      sizer: items
    timeout: 30s

# Pipeline configuration
service:
  pipelines:
    logs:
      receivers:
        - otlp
      exporters:
        - enhance_indexing_s3_exporter
    traces:
      receivers:
        - otlp
      exporters:
        - enhance_indexing_s3_exporter

Local Development with MinIO 

For local development and testing, you can use MinIO as an S3-compatible object storage service:

exporters:
  enhance_indexing_s3_exporter:
  
    # Honeycomb API credentials (required even for local development)
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # MinIO configuration
    s3uploader:
      region: 'us-east-1'
      endpoint: 'http://localhost:9000'
      s3_bucket: 'telemetry-bucket'
      s3_force_path_style: true
      disable_ssl: true
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'

    # Data format
    marshaler: 'otlp_json'

    # Custom indexed fields
    indexed_fields:
      - "user.id"
      - "customer.id"