Query with Archive


Note
This feature is available as an add-on for the Honeycomb Enterprise plan. Please contact your Honeycomb account team for details.

Are your query results empty or incomplete? Sampling may have filtered out the events you need, or the data may have aged out of your retention period. Sampling and retention limits help control costs, but they also mean that some data is not available in Honeycomb. Archive rehydration fills these gaps by retrieving your full, unsampled data from S3 storage on demand, so you can investigate without missing events.

How It Works 

If you have configured an Amazon S3 bucket as an archive for OpenTelemetry trace and log data, you can rehydrate that data and query it in Honeycomb. This is useful for investigating data that was sampled out or data that has expired from your standard retention period.

Filtering 

Time ranges and indexed fields let you filter and retrieve only the part of your archived data needed for your investigation, resulting in faster rehydration and lower costs.

Indexed fields are attributes configured when you set up your S3 exporter.

When you rehydrate data from your archive, Honeycomb:

  1. Reads your S3 bucket using the IAM role you configured during archive setup.
  2. Filters by time range and indexed fields to retrieve only files that contain events matching your criteria. For example, a request might retrieve all files with events between 2024-01-15 10:00 and 2024-01-15 11:00 where app.customer.id=12345.
  3. Ingests the matching files into your S3 Archive environment for querying. Files that were already ingested from previous rehydration requests and are still within your retention period are skipped.
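
The selection logic in the steps above can be pictured with a small Python model. This is only an illustrative sketch, not Honeycomb's implementation: the `ArchiveFile` record, the `select_files` function, and the file keys are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ArchiveFile:
    """Hypothetical summary of one archived file."""
    key: str
    first_event: datetime
    last_event: datetime
    indexed_values: dict  # indexed field name -> set of values seen in the file


def select_files(files, start, end, field, values, already_ingested):
    """Return keys of files that overlap [start, end], contain a wanted
    value for the indexed field, and have not been ingested yet."""
    wanted = set(values)
    selected = []
    for f in files:
        overlaps = f.first_event <= end and f.last_event >= start
        matches = bool(f.indexed_values.get(field, set()) & wanted)
        if overlaps and matches and f.key not in already_ingested:
            selected.append(f.key)
    return selected


files = [
    ArchiveFile("file-a", datetime(2024, 1, 15, 9, 50), datetime(2024, 1, 15, 10, 5),
                {"app.customer.id": {"12345"}}),
    ArchiveFile("file-b", datetime(2024, 1, 15, 10, 10), datetime(2024, 1, 15, 10, 20),
                {"app.customer.id": {"67890"}}),   # wrong customer
    ArchiveFile("file-c", datetime(2024, 1, 15, 10, 30), datetime(2024, 1, 15, 10, 45),
                {"app.customer.id": {"12345"}}),   # already ingested earlier
    ArchiveFile("file-d", datetime(2024, 1, 15, 12, 0), datetime(2024, 1, 15, 12, 15),
                {"app.customer.id": {"12345"}}),   # outside the time range
]

to_ingest = select_files(
    files,
    start=datetime(2024, 1, 15, 10, 0),
    end=datetime(2024, 1, 15, 11, 0),
    field="app.customer.id",
    values=["12345"],
    already_ingested={"file-c"},
)
print(to_ingest)  # ['file-a']
```

In this model only `file-a` is ingested: `file-b` has no matching value, `file-c` was ingested by an earlier request, and `file-d` falls outside the requested time range.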

Rehydrated Data Persistence 

Rehydrated data persists for your standard retention period from the time of ingestion. During this time, you can query it as many times as needed without rehydrating again.

When you request a rehydration, Honeycomb checks which data has already been ingested. If some of the requested data already exists in your environment, only the missing data is ingested. Your queries then run against all the rehydrated data in your environment.
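
As a rough illustration of the persistence rule, the sketch below computes how long rehydrated data stays queryable and whether a new request would need to ingest it again. The 60-day retention value and the function names are assumptions made for this example; use the retention period that applies to your own plan.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention period for illustration only; check your plan for the real value.
RETENTION = timedelta(days=60)


def queryable_until(ingested_at, retention=RETENTION):
    """Retention counts from the time of ingestion, not from the event time."""
    return ingested_at + retention


def needs_rehydration(ingested_at, now, retention=RETENTION):
    """True when the data was never ingested or its retention window has passed."""
    return ingested_at is None or now >= queryable_until(ingested_at, retention)


ingested = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(queryable_until(ingested).date())                                       # 2024-03-15
print(needs_rehydration(ingested, datetime(2024, 2, 1, tzinfo=timezone.utc)))  # False
print(needs_rehydration(ingested, datetime(2024, 4, 1, tzinfo=timezone.utc)))  # True
print(needs_rehydration(None, datetime(2024, 2, 1, tzinfo=timezone.utc)))      # True
```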

Enhancing a Query with Archived Data 

You can enhance an existing query by pulling in relevant archived data that matches your time range and indexed fields.

To enhance a query with archived data from your S3 bucket:

  1. Run a query in the Query Builder and receive your query results.

  2. From the Query Results, select Enhance from Archive.

    Screenshot of Query Results
  3. In the Enhance from Archive modal, define the scope of your rehydration:

    • Start time: Start of the event time range to rehydrate. Automatically populated with the start time of your query range.
    • End time: End of the event time range to rehydrate. Automatically populated with the end time of your query range.
    • Index: Indexed field to filter by. Automatically populated when the field is included in your query. Choose a field with high cardinality for more precise filtering.
    • Values: Value(s) to filter by. Automatically populated when the field is included in your query. Use specific values to minimize the number of events ingested during rehydration.
  4. Review your usage estimate to confirm:

    • Approximate number of events based on the average size of previously rehydrated events
    • Approximate number of free monthly rehydration events that will be used and free events remaining
    • Approximate regular monthly events that will be used and regular events remaining

    Recalculate as often as needed.

    Tip
    Typically, your free rehydration quota is 20% of your monthly event budget. To find your actual quota, contact your Honeycomb Account representative.
  5. Select Rehydrate data to begin ingesting the archived data that matches your query and chosen rehydration scope. You will be redirected to the History (History menu icon) page in your S3 Archive environment.

    When ingestion completes, Honeycomb automatically re-runs your query using the rehydrated data. A notification appears with a link to your query results.

    Screenshot of successful completion of rehydration from archive
  6. Select the link in the notification to view your query with the rehydrated data.
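
To make the usage estimate in step 4 concrete, here is a small worked example in Python. It assumes the typical free quota of 20% of the monthly event budget mentioned in the tip, and it assumes that free rehydration events are consumed before regular events; confirm both with your Honeycomb account team. The function name and all numbers are hypothetical.

```python
def split_usage(estimated_events, free_remaining, regular_remaining):
    """Split an estimated rehydration across the free quota first, then regular events.
    The free-first ordering is an assumption made for this sketch."""
    free_used = min(estimated_events, free_remaining)
    regular_used = estimated_events - free_used
    return {
        "free_used": free_used,
        "free_left": free_remaining - free_used,
        "regular_used": regular_used,
        "regular_left": regular_remaining - regular_used,
    }


monthly_budget = 100_000_000                # hypothetical monthly event budget
free_quota = monthly_budget * 20 // 100     # typical 20% free rehydration quota
free_remaining = free_quota - 18_500_000    # suppose 18.5M free events were already used

usage = split_usage(
    estimated_events=2_000_000,
    free_remaining=free_remaining,
    regular_remaining=40_000_000,
)
print(usage)
# {'free_used': 1500000, 'free_left': 0, 'regular_used': 500000, 'regular_left': 39500000}
```

Narrowing the time range or choosing more specific values lowers `estimated_events`, which is why recalculating the estimate after each adjustment is worthwhile.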

Tip
If you investigate a trace in your rehydrated data and notice missing spans, you may need to rehydrate additional data using the trace ID. To learn more, visit Enhancing Traces with Missing Spans.

Enhancing Traces with Missing Spans 

When you rehydrate data using a filtered index (such as customer.id), your trace waterfall may show gaps where spans are missing. This happens because not every span in a trace contains the indexed field you filtered by. To retrieve the missing spans, rehydrate again using the trace ID:

  1. From the Trace Waterfall view, select Enhance again.

    Note

    To reach the Trace Waterfall:

    From your query results, select a data point on the graph and choose View Trace from the context menu.

    Screenshot of Trace needing Enhance Again
  2. In the modal, review the automatically populated scope of your rehydration:

    • Start time: Start of the event time range to rehydrate. Automatically set to two hours before the trace timestamp to capture all related spans.
    • End time: End of the event time range to rehydrate. Automatically set to two hours after the trace timestamp to capture all related spans.
    • Index: Indexed field to filter by. Automatically set to your trace ID field to retrieve all spans for this trace.
    • Values: Value(s) to filter by. Automatically populated with the trace ID to ensure all spans are retrieved.
  3. Review your usage estimate to confirm:

    • Approximate number of events based on the average size of previously rehydrated events
    • Approximate number of free monthly rehydration events that will be used and free events remaining
    • Approximate regular monthly events that will be used and regular events remaining

    Tip
    Typically, your free rehydration quota is 20% of your monthly event budget. To find your actual quota, contact your Honeycomb Account representative.
  4. Select Rehydrate data to ingest the missing spans. You will be redirected to the History (History menu icon) page in your S3 Archive environment.

When ingestion completes, Honeycomb automatically re-runs your query using the rehydrated data. A notification appears with a link to your results. Your trace waterfall with all spans is now available for investigation.
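
The automatically populated scope described in step 2 amounts to a window of two hours on either side of the trace timestamp, filtered by the trace ID. A minimal sketch of that calculation follows; the function name, the dictionary layout, the example trace ID, and the default field name `trace.trace_id` are assumptions for illustration, so substitute the trace ID field your data uses.

```python
from datetime import datetime, timedelta, timezone


def trace_rehydration_scope(trace_timestamp, trace_id,
                            trace_id_field="trace.trace_id",
                            padding=timedelta(hours=2)):
    """Build a rehydration scope centered on a trace timestamp."""
    return {
        "start_time": trace_timestamp - padding,  # two hours before the trace
        "end_time": trace_timestamp + padding,    # two hours after the trace
        "index": trace_id_field,
        "values": [trace_id],
    }


scope = trace_rehydration_scope(
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc),
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",  # made-up example trace ID
)
print(scope["start_time"].isoformat())  # 2024-01-15T08:30:00+00:00
print(scope["end_time"].isoformat())    # 2024-01-15T12:30:00+00:00
```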

Querying Only Archived Data 

You can explore archived data independently from your live telemetry by rehydrating and querying it in your dedicated archive environment.

To query only archived data from your S3 bucket:

  1. Select Manage Data (Manage Data menu icon) from the navigation menu, and choose Environments.

  2. Select your S3 Archive Environment.

  3. Define the scope of your rehydration:

    • Start time: Start of the event time range to rehydrate.
    • End time: End of the event time range to rehydrate.
    • Index: Indexed field to filter by. Choose a field with high cardinality for more precise filtering.
    • Values: Value(s) to filter by. Use specific values to minimize the number of events ingested during rehydration.
    Screenshot of S3 Archive Home screen
  4. Review your usage estimate and adjust your criteria to optimize cost and event count.

  5. Select Rehydrate data.

  6. After rehydration completes, select New Query to begin querying your ingested archived data.

  7. Honeycomb pre-populates the query with your chosen rehydration scope as filters. Add fields, filters, or visualizations to further refine your query.
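
To picture how a rehydration scope maps onto query filters, the sketch below builds a query description from a scope. The key names (`start_time`, `end_time`, `filters`, `calculations`) follow the layout of Honeycomb's query specification as best understood here, but whether the pre-populated query matches this shape exactly is an assumption, and `scope_to_query` is a hypothetical helper rather than a Honeycomb API.

```python
from datetime import datetime, timezone


def scope_to_query(start, end, field, values):
    """Build a query dict whose time range and filter mirror a rehydration scope."""
    if len(values) == 1:
        scope_filter = {"column": field, "op": "=", "value": values[0]}
    else:
        scope_filter = {"column": field, "op": "in", "value": list(values)}
    return {
        "start_time": int(start.timestamp()),  # UNIX seconds
        "end_time": int(end.timestamp()),
        "filters": [scope_filter],
        "calculations": [{"op": "COUNT"}],     # assumed starting visualization
    }


query = scope_to_query(
    datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc),
    field="app.customer.id",
    values=["12345"],
)
print(query["filters"])  # [{'column': 'app.customer.id', 'op': '=', 'value': '12345'}]
```

From there, refining the query means adding further entries to the filters or calculations while keeping the scope filter in place.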

Reviewing Rehydration History 

You can review all rehydration requests for your team in your archive environment:

  1. Select Manage Data (Manage Data menu icon) from the navigation menu, and choose Environments.

  2. Select your S3 Archive environment.

  3. Select History (History menu icon) from the navigation menu.

    Screenshot of S3 Archive History screen

    For each rehydration request, the history shows:

    • Status: State of the rehydration request. Possible states:
      • Completed: Data was successfully rehydrated and is ready to query.
      • No Data: No archived data matched your filter criteria. Verify that your S3 bucket contains data for the specified time range and index values.
      • Failed: An error occurred during rehydration. Our team has been notified.
    • Initiated date: Date and time at which the rehydration request was initiated.
    • Rehydration time range: Time range filter applied to the rehydration.
    • Index fields: Field-based indexes and values used to filter the rehydrated data.
    • Initiated by: User who initiated the rehydration request.
    • Actions: Available actions for this rehydration request:
      • New Query: View the results of a completed rehydration request. The Query Builder opens, showing a query pre-populated with the rehydration filter criteria.

Best Practices for Rehydration 

  • Use specific indexed fields: Filter by high-cardinality fields like user.id or trace.trace_id rather than low-cardinality fields like environment to reduce the number of events ingested.
  • Start with narrow time ranges: Begin with shorter time windows and expand as needed to control costs.
  • Check the estimate: Always review the estimated event count before rehydrating to avoid unexpectedly large ingestion volumes.
  • Don’t worry about tracking rehydrated data: Honeycomb automatically tracks which files have been ingested, so you can make overlapping rehydration requests without duplicating ingestion or costs. Once data is rehydrated, it remains queryable for your standard retention period; you don’t need to rehydrate the same data multiple times.