Best Practices for Querying using Relational Fields

Query Warnings 

Understanding the “Ignoring long traces” warning 

When you run a query with relational fields, Honeycomb uses a finite join buffer to bring together each trace’s relevant spans. If a trace’s spans are too far apart in your data stream (meaning they were ingested by Honeycomb at very different times), we may retire that trace before reading all of its constituent spans, resulting in this warning. There are three common reasons you might see this:

  • For high-volume environments, the buffer is typically around 3-10 minutes, so traces longer than this are at risk. You may be able to mitigate this by including more filters (especially on service name) in the query.
  • Delays in your ingestion pipeline that hold back parts of a trace can also trigger this warning. You can identify this issue by looking for divergence between your spans’ timestamps and their recorded ingestion time. See the Derived Column Formula Reference for how to access this field.
  • Some customers use a single trace ID for a very long-running or background operation, resulting in a very large “giga-trace”. (We have observed single traces with hundreds of millions of spans across multiple days.) These are impossible to join successfully, but may not be relevant to your query. You can identify this issue by looking for traces with extremely large numbers of spans (a COUNT grouped by trace.trace_id, as sketched after this list), although in high-volume environments you’ll want to scope that query to as narrow a time range as possible.
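
As a rough illustration, the check above might look like the following query specification. The spec shape is an assumption, loosely modeled on Honeycomb’s Query Data API, and the same query can be built directly in the Query Builder (a COUNT grouped by trace.trace_id over a narrow time range).

```python
import json

# Sketch of the giga-trace check described above. The spec format is an
# assumption modeled on Honeycomb's Query Data API; you can build the same
# query directly in the Query Builder.
giga_trace_check = {
    "time_range": 900,                  # keep the window narrow: 15 minutes
    "calculations": [{"op": "COUNT"}],  # COUNT of spans...
    "breakdowns": ["trace.trace_id"],   # ...grouped by trace.trace_id
    "orders": [{"op": "COUNT", "order": "descending"}],  # largest traces first
    "limit": 100,
}

print(json.dumps(giga_trace_check, indent=2))
```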

Query Performance 

To make your queries run faster, we recommend that you follow certain best practices when querying using relational fields.

Use more filters 

Tip
Using more filters will give the greatest performance increase. Wherever possible, use filters!

To get the greatest performance increase:

  • add filters that use field names both with and without relational field prefixes, even if some of those filters are duplicated.
  • if you add a field with a relational field prefix to the GROUP BY clause, add a filter that uses the same relational field prefix to the WHERE clause. For example, if you are grouping by parent.name, also add a filter with the parent. relational field prefix to the WHERE clause.
  • make sure each additional filter you use excludes a meaningful number of events.

If all else fails, identify a field that applies to all spans (for example, service.name) and include it in the WHERE clause.
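
As a sketch, a query that follows these recommendations might look like the spec below. The spec shape is again only an assumption, loosely modeled on Honeycomb’s Query Data API, and the service, span, and field names are placeholders; substitute fields from your own datasets.

```python
import json

# Sketch of the "use more filters" advice. The spec shape is an assumption
# modeled on Honeycomb's Query Data API; service and field values are
# placeholders.
filtered_query = {
    "time_range": 3600,
    "calculations": [{"op": "COUNT"}],
    "breakdowns": ["root.name"],  # grouping on a root.-prefixed field...
    "filters": [
        # ...so also filter on the same root. prefix,
        {"column": "root.service.name", "op": "=", "value": "api-gateway"},
        # duplicate the filter without the prefix, even though it overlaps,
        {"column": "service.name", "op": "=", "value": "api-gateway"},
        # and include filters that exclude a meaningful number of events.
        {"column": "duration_ms", "op": ">", "value": 100},
    ],
    "filter_combination": "AND",
}

print(json.dumps(filtered_query, indent=2))
```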

Following these recommendations will be particularly helpful for more expensive prefixes like anyX. (any., any2., any3.) and parent..

Root > anyX > parent 

Queries involving the root. prefix generally run faster than queries involving the anyX. (any., any2., any3.) prefix, which generally run faster than queries involving the parent. prefix.

When you use the root. prefix, you get a free, implied is_root filter in the WHERE clause, which will usually filter out a substantial number of spans.

When you use anyX. (any., any2., any3.), Honeycomb chooses the first span that matches your criteria, which also filters out a substantial number of spans.

When you use parent., Honeycomb must find the parent span for every event that matches the criteria defined by your non-prefixed fields, which can take some time. To improve performance, you could add a parent.name filter to the WHERE clause (see the sketch below).
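
For instance, a minimal sketch of that suggestion, using the same assumed query-spec shape as the earlier examples (the span and service names are placeholders):

```python
import json

# Sketch of the parent. advice above: when querying on parent.-prefixed
# fields, add an explicit parent.name filter to the WHERE clause.
# Spec shape is assumed; names are placeholders.
parent_query = {
    "time_range": 1800,
    "calculations": [{"op": "AVG", "column": "duration_ms"}],
    "breakdowns": ["parent.name"],
    "filters": [
        # Explicit parent.name filter, per the guidance above.
        {"column": "parent.name", "op": "=", "value": "HTTP POST /checkout"},
        {"column": "service.name", "op": "=", "value": "payment"},
    ],
    "filter_combination": "AND",
}

print(json.dumps(parent_query, indent=2))
```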

Use shorter time ranges 

Use shorter time ranges for queries, including queries using relational fields.

While Honeycomb can do an impressive amount of parallel processing of infrequently accessed data, we can do only so much within a given time frame. Be prepared for queries with long time ranges to take a while.

Use traces with fewer spans 

Smaller traces mean fewer events to hold in memory at once and less work for Honeycomb.

Use similar services in a single trace 

Honeycomb uses the ingest time to determine what fits into a “window” of events that we keep in memory at a time. If a specific service has a significant ingest delay, relational field queries that rely on joining that service’s data with other services might suffer.

For example, if you use a mix of AWS Lambda and non-Lambda services in a single trace, your ingest delay will likely vary significantly. AWS freezes the Lambda execution environment before spans can be flushed, which increases the ingest delay for Lambda spans compared to spans from other services.
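
One way to spot this kind of skew is to chart ingest delay per service. The sketch below uses the same assumed query-spec shape as the earlier examples; ingest_delay_sec is a hypothetical derived column (ingest time minus event timestamp; see the Derived Column Formula Reference mentioned earlier for how to access the ingest time field).

```python
import json

# Sketch for spotting per-service ingest delay. "ingest_delay_sec" is a
# hypothetical derived column (ingest time minus event timestamp); the spec
# shape is assumed, as in the earlier examples.
ingest_delay_by_service = {
    "time_range": 3600,
    "calculations": [
        {"op": "P50", "column": "ingest_delay_sec"},
        {"op": "MAX", "column": "ingest_delay_sec"},
    ],
    "breakdowns": ["service.name"],
}

print(json.dumps(ingest_delay_by_service, indent=2))
```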

Note
This only matters if the amount of ingest delay varies by service/span. If all of your services have roughly the same amount of ingest delay (for example, all consistently two to three minutes late), then your queries should not be affected.