# Collector Export to Amazon S3 Archive

> Configure an OpenTelemetry Collector to export unsampled trace and log data to Amazon S3 with field-based indexing for fast, cost-effective rehydration in Honeycomb.

<Badge className="hny-badge-enterprise-addon" stroke>Ent+</Badge>

<Note>
  This feature is available as an add-on for the [Honeycomb Enterprise plan](https://www.honeycomb.io/pricing/).
  Please contact your Honeycomb account team for details.
</Note>

Honeycomb supports rehydrating archived, unsampled OpenTelemetry (OTel) trace and log data stored in an Amazon S3 bucket.

In this guide, you will learn how to configure an OpenTelemetry Collector to export unsampled trace and log data to an Amazon S3 bucket using Honeycomb's [Enhance Indexing S3 Exporter](https://github.com/honeycombio/enhance-indexing-s3-exporter/tree/main/enhanceindexings3exporter/README.md).
This exporter extends the standard [OpenTelemetry AWS S3 Exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/awss3exporter/README.md) by adding field-based indexing, so you can rehydrate data more quickly and cost-effectively.

<Info>
  Adding AWS credentials requires assistance from your Honeycomb account team.
  To learn more, visit [Set Up an Archive](/send-data/telemetry-pipeline/enhance/#set-up-an-archive).
</Info>

## How Indexing Works

When you send data to your S3 archive, the exporter automatically indexes these fields:

* `trace.trace_id`: Unique identifier of the trace
* `service.name`: Name of the service
* `session.id`: Unique identifier of the session

You can also index additional fields that you frequently query, such as `user.id`, `customer.id`, or `environment`.

When you run a query that requires rehydration, you can use these indexes to locate and retrieve only the relevant subset of your archived data.
This gives you:

* **Faster rehydration**: Retrieve only the events that match your query.
* **Lower costs**: Reduce the amount of archived data you need to ingest.
* **Better performance**: Investigate with a smaller, more targeted dataset.

## Configuring the Enhance Indexing S3 Exporter

The [Honeycomb OpenTelemetry Collector distribution](https://github.com/honeycombio/honeycomb-collector-distro) includes the [Enhance Indexing S3 Exporter](https://github.com/honeycombio/enhance-indexing-s3-exporter/tree/main/enhanceindexings3exporter/README.md) by default. If you're using another collector distribution or building your own, make sure it is built with the Enhance Indexing S3 Exporter.

The exporter requires configuration in these areas.

### Honeycomb API Credentials

The exporter uses a Honeycomb Management API key with the `enhance:write` scope for authentication and usage tracking.

Configure these fields in your exporter configuration:

* `api_key`: Your Honeycomb Management API key.
  Must have the `enhance:write` scope.
* `api_secret`: Your Honeycomb Management API secret.
* `api_endpoint`: URL of the Honeycomb API endpoint.
  * US: `https://api.honeycomb.io`
  * EU: `https://api.eu1.honeycomb.io`

<Note>
  To learn how to create a Management API key, visit [Managing API Keys](/configure/teams/manage-api-keys/).
</Note>
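As a minimal sketch, the credential fields above might look like this for a team in the EU region. The environment variable names match the ones used in the examples later in this guide; reading the key and secret from the environment keeps them out of the configuration file.

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:
    # Management API key with the enhance:write scope, read from the environment
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    # EU endpoint; US teams would use https://api.honeycomb.io instead
    api_endpoint: https://api.eu1.honeycomb.io
```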

### S3 Uploader Settings

Set these attributes in the `s3uploader` section:

* `s3_bucket`: Amazon S3 bucket in which to store the data.
  Not required when `endpoint` is set.
* `region`: AWS region for your bucket.
  Default: `us-east-1`.
* `s3_prefix`: (Optional) Directory-like prefix for organizing files in your S3 bucket.
  Example: `traces-logs-directory`.
* `s3_partition_format`: (Optional) Partition format to use when writing files to S3.
  Default: `"year=%Y/month=%m/day=%d/hour=%H/minute=%M"` (minute-level resolution).
* `compression`: Compression algorithm for S3 files.
  Options:
  * `gzip`
  * `none`
* `retry_mode`: (Optional) Retry behavior for failed requests.
  Default: `standard`.
  Options:
  * `standard` (fixed intervals)
  * `adaptive` (adjusts based on server response)
* `retry_max_attempts`: (Optional) Maximum number of times to retry a failed attempt.
  Default: `3`.
* `endpoint`: (Optional) Custom S3 endpoint for S3-compatible services, such as MinIO or on-premises object storage.
  Overrides the default AWS S3 endpoint.
  Example: `http://localhost:9000`.
* `s3_force_path_style`: (Optional) Use path-style rather than virtual-hosted style URLs.
  Required for MinIO and some S3-compatible services.
  Default: `false`.
* `disable_ssl`: (Optional) Allow unencrypted HTTP connections.
  Use only for local development.
  Default: `false`.

<Tip>
  The `file_prefix` attribute is not supported and will cause validation to fail.
</Tip>
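The fragment below is a sketch of an `s3uploader` section that writes out the documented defaults explicitly, which can make a configuration easier to review. The bucket name `my-archive-bucket` is hypothetical, and the API credential fields are omitted for brevity.

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:
    s3uploader:
      s3_bucket: 'my-archive-bucket'       # hypothetical bucket name
      region: 'us-east-1'                  # documented default
      s3_prefix: 'traces-logs-directory'   # optional prefix from the example above
      # Documented default: minute-level partitions
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'
      retry_mode: 'standard'               # documented default
      retry_max_attempts: 3                # documented default
      # Do not set file_prefix: it is not supported and fails validation.
```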

### Data Format Settings

Choose how telemetry is encoded before being written to S3:

* `marshaler`: Format for encoding telemetry data.
  Default: `otlp_proto`.
  Options:
  * `otlp_proto` (smaller files, better performance)
  * `otlp_json` (human-readable output)

### Batching and Queue Settings

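Batch data before sending to S3 by configuring the `sending_queue` section.
In the full configuration example in this guide, `flush_timeout`, `min_size`, and `max_size` are nested under `sending_queue.batch`, while `queue_size` and `sizer` sit directly under `sending_queue`: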

* `flush_timeout`: Maximum time before a batch is sent to S3, even if not full.
  Must be a non-zero value.
* `min_size`: Minimum size of a batch.
  Measured in units defined by `sizer`.
  Default: `50000`.
* `max_size`: Maximum size of a batch.
  Enables batch splitting.
  Must be greater than or equal to `min_size`.
  Measured in units defined by `sizer`.
  Default: `50000`.
* `queue_size`: Maximum amount of data the queue can hold.
  Measured in units defined by `sizer`.
  Default: `500000`.
* `sizer`: Unit used to measure the queue and batch size.
  Default: `items`.
  Options:
  * `requests`: Number of incoming batches of traces and logs (the most performant option).
  * `items`: Number of the smallest parts of each signal (spans, log records).
  * `bytes`: Size of serialized data in bytes (the least performant option).

You can also set a custom timeout for the exporter:

* `timeout`: Maximum time to wait for each S3 send attempt.
  Default: `30s`.
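As a sketch of how these settings fit together, the fragment below uses smaller batches than the defaults and lets `max_size` exceed `min_size` so that oversized batches can be split. The nesting follows the full configuration example in this guide; the `flush_timeout`, `min_size`, and `max_size` values are illustrative assumptions, not recommendations.

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:
    sending_queue:
      enabled: true
      sizer: items          # sizes below count spans and log records
      queue_size: 500000    # documented default
      batch:
        flush_timeout: 10s  # illustrative: send a partial batch after 10 seconds
        min_size: 10000     # illustrative
        max_size: 50000     # must be greater than or equal to min_size
    timeout: 30s            # documented default for each S3 send attempt
```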

### Retry Settings

Control how the exporter retries failed send attempts by configuring the `retry_on_failure` section:

* `enabled`: Allow the exporter to retry failed sends.
* `initial_interval`: Amount of time to delay before the first retry.
  Example: `5s`.
* `max_interval`: Maximum amount of time to delay between retries.
  Retries use exponential backoff, so each retry waits longer than the previous one, up to this limit.
  Example: `30s`.
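A minimal sketch of a `retry_on_failure` section using the example values above. Placing the section directly under the exporter, alongside `sending_queue`, is an assumption based on how the other exporter-level sections are laid out in this guide.

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:
    # Assumed placement: a sibling of sending_queue and timeout
    retry_on_failure:
      enabled: true          # retry failed sends
      initial_interval: 5s   # wait 5 seconds before the first retry
      max_interval: 30s      # backoff delay never exceeds 30 seconds
```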

### Custom Indexed Fields

Add indexed fields beyond the built-in ones (`trace.trace_id`, `service.name`, `session.id`) using the `indexed_fields` section.

Choose fields that are frequently used in your queries or that help narrow down your investigations. High-cardinality fields are especially useful because they make rehydration more selective.
Examples:

* `user.id`
* `customer.id`
* `environment`
* `deployment.version`

<Tip>
  To minimize Collector processing compute and time, we recommend indexing no more than 5 custom fields.
</Tip>

```yaml theme={}
indexed_fields:
  - "user.id"
  - "customer.id"
  - "environment"
```

<Info>
  Share your indexed fields with your Honeycomb account team, along with your S3 bucket information and IAM role ARN.
  This ensures that rehydration is configured correctly.
</Info>

## Examples

These examples show common configurations, so you can choose the setup that best fits your needs.

### Basic Configuration

A basic example of the Enhance Indexing S3 Exporter configuration:

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:

    # Honeycomb API credentials
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # S3 configuration
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'
        
    # Data format
    marshaler: 'otlp_proto'
```

### Pipeline Configuration

Honeycomb supports processing both trace and log OpenTelemetry signal types for data archival and rehydration.
Set up a separate pipeline configuration block for each signal type.

In this example, the `logs/objectstorage` and `traces/objectstorage` pipelines are named to show that they send data to object storage (Amazon S3), alongside the existing `logs` and `traces` pipelines:

```yaml theme={}
service:
  pipelines:
    traces:
      # [...]
    logs:
      # [...]
    logs/objectstorage:
      exporters:
        - enhance_indexing_s3_exporter
      receivers:
        - otlp
    traces/objectstorage:
      exporters:
        - enhance_indexing_s3_exporter
      receivers:
        - otlp
```

### Full Configuration

The example below shows a simple but complete OpenTelemetry (OTel) Collector configuration that exports both logs and traces through the Enhance Indexing S3 Exporter:

* Trace and log telemetry is exported to an Amazon S3 bucket called `my-test-bucket`.
* An S3 prefix called `telemetry-data` is used at the root of the bucket.
* Data will be partitioned into subdirectories underneath the prefix with minute resolution.
* Files uploaded to S3 will be compressed with gzip compression.
* Custom indexed fields are configured for `user.id`, `customer.id`, `environment`, and `deployment.version`.
* Queue batching is configured, and S3 requests use the `adaptive` retry mode with up to 5 attempts.

```yaml theme={}
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  enhance_indexing_s3_exporter:

    # Honeycomb API credentials
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # S3 configuration
    s3uploader:
      region: 'us-east-1'
      s3_bucket: 'my-test-bucket'
      s3_prefix: 'telemetry-data'
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'
      retry_mode: 'adaptive'
      retry_max_attempts: 5

    # Data format
    marshaler: 'otlp_proto'

    # Custom indexed fields
    indexed_fields:
      - "user.id"
      - "customer.id"
      - "environment"
      - "deployment.version"

    # Batching and timeout configuration
    sending_queue:
      batch:
        flush_timeout: 30s
        max_size: 50000
        min_size: 50000
      enabled: true
      queue_size: 500000
      sizer: items
    timeout: 30s

# Pipeline configuration
service:
  pipelines:
    logs:
      receivers:
        - otlp
      exporters:
        - enhance_indexing_s3_exporter
    traces:
      receivers:
        - otlp
      exporters:
        - enhance_indexing_s3_exporter
```

### Local Development with MinIO

For local development and testing, you can use MinIO as an S3-compatible object storage service:

```yaml theme={}
exporters:
  enhance_indexing_s3_exporter:
  
    # Honeycomb API credentials (required even for local development)
    api_key: ${env:HONEYCOMB_MANAGEMENT_API_KEY}
    api_secret: ${env:HONEYCOMB_MANAGEMENT_API_SECRET}
    api_endpoint: https://api.honeycomb.io

    # MinIO configuration
    s3uploader:
      region: 'us-east-1'
      endpoint: 'http://localhost:9000'
      s3_bucket: 'telemetry-bucket'
      s3_force_path_style: true
      disable_ssl: true
      s3_partition_format: "year=%Y/month=%m/day=%d/hour=%H/minute=%M"
      compression: 'gzip'

    # Data format
    marshaler: 'otlp_json'

    # Custom indexed fields
    indexed_fields:
      - "user.id"
      - "customer.id"
```
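To try the configuration above, you need a MinIO server listening on `localhost:9000`. One way to start one is with Docker Compose, sketched below. The file name, the `minioadmin` credentials (MinIO's well-known defaults, suitable only for local testing), and the console port are assumptions for illustration. You would still need to create the `telemetry-bucket` bucket yourself, for example through the MinIO console, and give the Collector credentials that MinIO accepts. How the exporter picks up those credentials is not covered in this guide.

```yaml theme={}
# docker-compose.yaml (hypothetical file name)
services:
  minio:
    image: minio/minio
    # Serve objects from /data and expose the web console on port 9001
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3-compatible API, matches the exporter's endpoint
      - "9001:9001"   # web console
    environment:
      MINIO_ROOT_USER: minioadmin       # local testing only
      MINIO_ROOT_PASSWORD: minioadmin   # local testing only
```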
