Trace datasets are exactly the same as any other dataset in Honeycomb. We treat each span as an individual event. Therefore, you can group, filter and aggregate by any field on the span.
This model has a subtlety: a query with an aggregate function, like
SUM, works across all spans in a dataset. This is almost never the expected behavior: a
COUNT across all events counts every span of every trace, and a
P95(duration_ms) produces the 95th percentile across every span, roots and children alike.
Instead, consider using a
WHERE filter to limit to a specific type of span. For example, to limit only to trace roots, use
trace.parentId does-not-exist. Conversely, to limit only to spans that are HTTP calls, you might filter on a field that only HTTP spans carry, such as a request URL or method field.
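To make the difference concrete, here is a minimal Python sketch — using invented span records, not a Honeycomb API — showing how an aggregate over every span diverges from one restricted to trace roots:

```python
# Illustrative sketch: why aggregating over every span differs from
# aggregating over root spans only. All span data here is invented.
import statistics

spans = [
    {"trace_id": "a", "parent_id": None, "duration_ms": 900},   # root
    {"trace_id": "a", "parent_id": "a1", "duration_ms": 20},
    {"trace_id": "a", "parent_id": "a1", "duration_ms": 15},
    {"trace_id": "b", "parent_id": None, "duration_ms": 450},   # root
    {"trace_id": "b", "parent_id": "b1", "duration_ms": 5},
]

def p95(values):
    """95th percentile using the 'inclusive' quantile method."""
    return statistics.quantiles(values, n=100, method="inclusive")[94]

all_durations = [s["duration_ms"] for s in spans]
root_durations = [s["duration_ms"] for s in spans if s["parent_id"] is None]

# COUNT over the dataset counts every span, not every trace:
print(len(spans))            # 5 spans, but only 2 traces
# P95 over all spans mixes fast leaf spans in with end-to-end durations,
# while P95 over roots reflects only request latency:
print(p95(all_durations))
print(p95(root_durations))
```

Filtering to `parent_id is None` plays the role of the `trace.parentId does-not-exist` clause above.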
To view traces associated with your query, select the “Traces” tab. This will show you the slowest traces that match the query, calculated by root span duration.
The “Traces” tab will show a maximum of 10 traces, ordered from slowest to fastest. The span summary shows the distribution of spans.
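The ordering described above — slowest first, by root span duration — can be sketched as follows; the span records and field names are invented for illustration:

```python
# Sketch: selecting the 10 slowest traces by root-span duration, mirroring
# how the Traces tab orders its results. Span data is invented.
def slowest_traces(spans, limit=10):
    """Return trace IDs of root spans, slowest first, capped at `limit`."""
    roots = [s for s in spans if s["parent_id"] is None]
    roots.sort(key=lambda s: s["duration_ms"], reverse=True)
    return [s["trace_id"] for s in roots[:limit]]

spans = [
    {"trace_id": "t1", "parent_id": None, "duration_ms": 120},
    {"trace_id": "t1", "parent_id": "x", "duration_ms": 80},
    {"trace_id": "t2", "parent_id": None, "duration_ms": 900},
    {"trace_id": "t3", "parent_id": None, "duration_ms": 300},
]
print(slowest_traces(spans))  # ['t2', 't3', 't1']
```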
If you want to see specific traces that correspond to a feature on the graph, such as a latency spike, you can select a region on the graph, and the list of trace summaries will update:
Click the trace button, or one of the
trace.traceId links, to see the full waterfall diagram for that trace.
You can get into traces in several different ways.
Any view that can show a
trace.traceId will show a button that connects into the trace. This includes the Raw Data view and the Results view from a query. Adding
trace.traceId to a query makes those links appear in its results.
In addition, clicking on either the line graph or the heatmap jumps directly to a trace that corresponds to that point. Note that this picks one trace that sits at that time; to find a specific trace, the Traces tab, Raw Data tab, or the Query Results will be more effective.
Note that, given a specific
TraceId, it is also possible to construct a URL that links directly to the waterfall for that trace.
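For example, a helper like the following could assemble such a link. The exact URL shape here is an assumption modeled on trace links copied from the UI — verify it against a real trace URL from your own environment before relying on it:

```python
# Sketch: building a direct link to a trace waterfall. The URL format below
# is an ASSUMPTION -- compare it against a link copied from your own
# Honeycomb environment.
from urllib.parse import urlencode

def trace_url(team, dataset, trace_id, start_ts, end_ts):
    """Return a URL that (assuming this format) opens the trace waterfall.

    start_ts / end_ts are Unix timestamps bounding the trace, so the UI
    knows which time window to fetch.
    """
    query = urlencode({
        "trace_id": trace_id,
        "trace_start_ts": start_ts,
        "trace_end_ts": end_ts,
    })
    return f"https://ui.honeycomb.io/{team}/datasets/{dataset}/trace?{query}"

print(trace_url("my-team", "api-prod", "7abf2d", 1700000000, 1700000060))
```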
The first row is the root span, the parent of the entire trace. Below it are each of the spans in this trace. The bars on the right display how much time each operation took to execute. This type of graph is also called a waterfall diagram, as it shows how operations cascade across the total execution time.
To see more detail about an individual span, click on a row to view that span’s raw data in the right sidebar:
To display additional columns, open the column selection menu by clicking the plus icon in the header. Check the boxes for the columns you’d like to include. To hide a column, click on the column name and select “Hide column” from the menu. To collapse or expand a portion of the trace, click on the arrow on the left side of a span row.
When a trace contains many spans, it can be useful to collapse the ones that aren’t of interest. Clicking the box next to a parent span collapses its children. To show they are collapsed, the box changes from outlined to filled, and its number changes from the count of direct children to the total size of the subtree.
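The two numbers involved — direct children versus total subtree size — can be sketched like this, over invented span records:

```python
# Sketch: computing direct-child counts vs. total subtree sizes, the two
# numbers an expanded/collapsed box displays. Span data is invented.
from collections import defaultdict

# Map of span id -> parent span id (None marks the root).
spans = {
    "root": None,
    "fetch1": "root",
    "fetch2": "root",
    "merge1": "fetch2",
    "merge2": "fetch2",
}

children = defaultdict(list)
for span_id, parent_id in spans.items():
    if parent_id is not None:
        children[parent_id].append(span_id)

def subtree_size(span_id):
    """Total number of descendants below span_id (what a collapsed box shows)."""
    return sum(1 + subtree_size(c) for c in children[span_id])

print(len(children["fetch2"]))   # direct children of fetch2: 2
print(subtree_size("root"))      # total spans below the root: 4
```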
In this screenshot, the first
FetchPartial is not collapsed, but the second one is, showing that it has 7 children. Below, both instances of
merging are collapsed; one has 64 spans below it, while the other has 44 spans.
In a particularly large trace, it can be convenient to collapse all spans that share a name, or that are at the same level of indent. The expand/collapse menu, accessed from the triple-dots to the left of the trace, controls this.
You can show and hide additional columns with the trace field selector. By default, the tracing UI shows the
trace.service_name. Changing this can highlight errors or help scan for interesting columns.
By default, rows are colored by the service name if there is more than one service name defined; otherwise, they are colored by
trace.name. Changing this can help color the waterfall to emphasize different attributes. Each distinct value gets its own color.
It can be useful to search across the entirety of a trace. For example, you might know that a full trace has a span with an error field somewhere within it. The “Search” box, located immediately above the trace, searches both span names and values for a given string. It also highlights any matching spans in the span sidebar.
The span sidebar also has a filter; this will filter the sidebar (only) to show only those fields that match the filter. This setting remains as you click between spans; it can be useful to jump between a set of spans to track a particular field.
When collapsing the tree, it can be difficult to see the extent of the child spans of a parent. For example, an async process might have a parent that returns in a few milliseconds – but it starts a database query that runs for several minutes. Honeycomb draws a thin line to the right of the parent span to show the duration of the child process. For example, in this figure, the span with
head.startQuery runs for 2.533 ms – but it has children that run much longer. (The three subprocesses called
FetchPartial all run for over 2.5 seconds.)
Clock skew, temporal anomalies, or time travel can cause a child to start before the parent span. Honeycomb renders these with similar lines to the left of the main body.
Sometimes, a trace has spans that claim a
trace.parentId – but no span with that
spanId exists. Honeycomb renders a missing span in that slot: a placeholder, with all the children that point to the absent parent attached to it. Honeycomb also expects exactly one span to have no parent: the root span. If no such span exists, the root is treated as missing and is drawn accordingly.
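A renderer could detect both situations with a check like the following; the field names mirror the text, and the span data is invented:

```python
# Sketch: detecting missing parent spans and a missing root.
# Field names mirror the text above; span data is invented.
def find_missing(spans):
    """Return (missing_parent_ids, has_root) for a list of span dicts."""
    ids = {s["spanId"] for s in spans}
    missing = {s["parentId"] for s in spans
               if s["parentId"] is not None and s["parentId"] not in ids}
    has_root = any(s["parentId"] is None for s in spans)
    return missing, has_root

spans = [
    {"spanId": "b", "parentId": "a"},   # parent "a" was never received
    {"spanId": "c", "parentId": "b"},
]
missing, has_root = find_missing(spans)
print(missing)    # {'a'}: render a placeholder span in that slot
print(has_root)   # False: no parentless span, so draw a missing root
```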
The minigraph, at the top of the trace view, shows a heatmap view of the selected span relative to others with the same
trace.systemName. Below, the trace shows that this is a long instance of
run_trigger_summary, but not an unusually long one. The X axis runs from several minutes before this trace to several minutes afterward.
Clicking on the minigraph re-enters the query builder with the query that corresponds to the minigraph.
Any value, on any span in the trace, can be an entry point back to the query builder. The options in the drop down, including adding to
GROUP BY, will create a new query that re-uses the original query but adds these fields.
Note that the new query may not return results. If the fields you choose are only found on leaf spans, but the original query has
trace.parentId does-not-exist, then the new query is self-contradictory and will return no results. Remove any unneeded clauses to get results.
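A tiny sketch of the contradiction, using an invented leaf-only field:

```python
# Sketch: why combining a leaf-only field with a root-only filter
# (`trace.parentId does-not-exist`) returns nothing. Data is invented;
# "db.query" stands in for any field that only leaf spans carry.
spans = [
    {"parentId": None, "db.query": None},       # root: lacks the leaf field
    {"parentId": "r", "db.query": "SELECT 1"},  # leaf: has the field
]

# WHERE trace.parentId does-not-exist AND db.query exists
results = [s for s in spans
           if s["parentId"] is None and s["db.query"] is not None]
print(results)  # []: no span satisfies both clauses
```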
Here is an example of investigating a problem using tracing:
A few users are reporting slow API calls. The initial investigation shows one endpoint is periodically slow. One user is making a lot of calls, and that is slowing down responses for everyone.
Selecting a representative trace shows the full execution of the API request, helping us understand why these API requests are slow.
The first part seems normal, but
ticketBackend is making many calls to
mysql. Clicking on one of the
mysql spans shows more information
about it in the sidebar, including the specific database query.
The information from this trace points to a possible next step. Is the user running a script gone wrong? Or maybe this endpoint needs tighter rate limiting. Traces show a level of detail that can be hard to see otherwise.
Tracing can help shine a light into background/batch jobs as well! Check out this blog post for some examples.