If you are running a user-facing software service, it probably qualifies as a distributed service. You might have a proxy, an application, and a database, or a more complicated microservice architecture. Regardless of the level of complexity, a distributed system means that multiple distinct services must work in concert.
Tracing helps tie together instrumentation from separate services, or from different methods within one service. This makes it easier to identify the source of errors, find performance problems, or understand how data flows through a large system.
A trace tells the story of a complete unit of work in your system.
For example, when a user loads a web page, their request might go to an edge proxy. That proxy talks to a frontend service, which calls out to an authorization and a rate-limiting service. There could be multiple backend services, each with its own data store. Finally, the frontend service returns a result to the client.
Each part of this story is told by a span. A span is a single piece of instrumentation from a single location in your code. It represents a single unit of work done by a service. Each tracing event, one per span, contains several key pieces of data:
OpenTelemetry automatically defines these fields. You can manually configure which fields on your events correspond to these pieces of data.
A trace is made up of multiple spans. Honeycomb uses the metadata from each span to reconstruct the relationships between them and generate a trace diagram.
The image below is a portion of a trace diagram for an incoming API request:
In this example, the /api/v2/tickets/export endpoint first checks if the request is allowed by the rate limiter.
Then, it authenticates the requesting user, and finally, fetches the tickets requested.
Each of those three operations also calls a datastore.
In the trace diagram, you can see the order in which these operations were executed, which service called which other service, and how long each call took.
Each span is one event to Honeycomb.
That event has fields, like parentID and traceID, which describe that span’s relationship to other spans.
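As a rough illustration of how parent/child metadata can be turned into a trace diagram, the sketch below rebuilds a call tree from a handful of span events. The sample spans, the spanID, name, and duration_ms field names, and the helper functions are all hypothetical. Only parentID and traceID come from the text above, and this is not a description of how Honeycomb implements it.

```python
from collections import defaultdict

# Hypothetical span events for one trace. Only "parentID" and "traceID"
# are named in the text; the other keys are assumptions for illustration.
spans = [
    {"traceID": "t1", "spanID": "a", "parentID": None, "name": "/api/v2/tickets/export", "duration_ms": 120},
    {"traceID": "t1", "spanID": "b", "parentID": "a", "name": "check_rate_limit", "duration_ms": 15},
    {"traceID": "t1", "spanID": "c", "parentID": "b", "name": "datastore_query", "duration_ms": 5},
    {"traceID": "t1", "spanID": "d", "parentID": "a", "name": "authenticate", "duration_ms": 30},
    {"traceID": "t1", "spanID": "e", "parentID": "d", "name": "datastore_query", "duration_ms": 12},
    {"traceID": "t1", "spanID": "f", "parentID": "a", "name": "fetch_tickets", "duration_ms": 60},
    {"traceID": "t1", "spanID": "g", "parentID": "f", "name": "datastore_query", "duration_ms": 45},
]


def build_children(spans):
    """Index spans by the ID of their parent; the root span has parentID None."""
    children = defaultdict(list)
    for span in spans:
        children[span["parentID"]].append(span)
    return children


def render(children, parent_id=None, depth=0):
    """Return indented lines, one per span, walking the tree depth-first."""
    lines = []
    for span in children.get(parent_id, []):
        lines.append(f"{'  ' * depth}{span['name']} ({span['duration_ms']} ms)")
        lines.extend(render(children, span["spanID"], depth + 1))
    return lines


tree_lines = render(build_children(spans))
print("\n".join(tree_lines))
```

Each level of indentation in the printed output corresponds to one parent-to-child hop, which is the same relationship a trace diagram draws as nested bars.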
A trace ties together a unit of work that occurs across multiple events (or spans) in a distributed service.
A trace is a group of spans that all share the same traceID.
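To make the grouping concrete, here is a minimal sketch that collects span events into traces by their shared traceID. The event data and the group_by_trace helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical span events from two unrelated requests.
events = [
    {"traceID": "t1", "name": "/api/v2/tickets/export"},
    {"traceID": "t2", "name": "/api/v2/users"},
    {"traceID": "t1", "name": "authenticate"},
    {"traceID": "t1", "name": "fetch_tickets"},
    {"traceID": "t2", "name": "datastore_query"},
]


def group_by_trace(events):
    """Group span events so that each key is a traceID and each value is that trace's spans."""
    traces = defaultdict(list)
    for event in events:
        traces[event["traceID"]].append(event)
    return dict(traces)


traces = group_by_trace(events)
for trace_id, trace_spans in traces.items():
    print(trace_id, [span["name"] for span in trace_spans])
```

Spans can arrive in any order and from any service. The shared traceID is what lets them be collected into one trace.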
When using OpenTelemetry, service.name in each span defines the current service context and creates Service Datasets.