If you’re running a user-facing software service, it’s probably a distributed system. You might have a proxy, an application and a database, or a more complicated microservice architecture. Regardless of the level of complexity, a distributed system means that multiple distinct services must work together.
Tracing helps tie together instrumentation from separate services, or from different methods within one service. This makes it easier to identify the source of errors, find performance problems, or understand how data flows through a large system.
A trace tells the story of a complete unit of work in your system.
For example, when a user loads a web page, their request might go to an edge proxy. That proxy talks to a frontend service, which calls out to an authorization and a rate-limiting service. There could be multiple backend services, each with its own data store. Finally, the frontend service returns a result to the client.
Each part of this story is told by a span. A span is a single piece of instrumentation from a single location in your code. It represents a single unit of work done by a service. Each tracing event, one per span, contains several key pieces of data:
A trace is made up of multiple spans. Honeycomb uses the metadata from each span to reconstruct the relationships between them and generate a trace diagram.
The image below is a portion of a trace diagram for an incoming API request:
In this example, the /api/v2/tickets/export endpoint first checks if the request is allowed by the rate limiter. Then it authenticates the requesting user, and finally fetches the tickets requested. Each of those calls also called a datastore.
You can see in the trace diagram the order these operations were executed, which service called which other service, and how long each call took.
There are many ways to create tracing data. The following is a comparison of two popular implementations: OpenCensus and OpenTracing. In the following examples, both were set up to send data to Honeycomb and to Zipkin.
OpenCensus: A vendor-agnostic tool that provides metrics collection and tracing for your services.
OpenTracing: A vendor-neutral standard for distributed tracing data.
Zipkin: A distributed tracing system. For any Zipkin implementation, you need to have a Zipkin server running to receive the data.
All of the tracing instrumentation APIs produce the same trace data. Here is an example of the output in Honeycomb:
Zipkin and Honeycomb have pretty similar waterfall diagrams. Sending data to both of them from OpenCensus was incredibly easy. In fact, switching from one to the other is just a matter of changing the initial configuration. Below, you can see a line-for-line comparison of the differences between the two setups:
With OpenTracing, we ran the same example app against both Zipkin and Honeycomb, with a different Docker process running for each. The resulting output is the same as with OpenCensus.
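The idea that only the initial configuration changes when you switch backends can be sketched with a small interface. This is not the OpenCensus or OpenTracing API: the Exporter interface, the two exporter types, newExporter, and the endpoint and dataset values are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Exporter is a hypothetical interface that a tracing backend implements.
type Exporter interface {
	Export(spanName string) string
}

// zipkinExporter and honeycombExporter are illustrative stand-ins; a real
// exporter would send span data over the network.
type zipkinExporter struct{ endpoint string }
type honeycombExporter struct{ dataset string }

func (z zipkinExporter) Export(name string) string {
	return "zipkin(" + z.endpoint + "): " + name
}

func (h honeycombExporter) Export(name string) string {
	return "honeycomb(" + h.dataset + "): " + name
}

// newExporter is the only place that knows which backend is in use.
// The endpoint and dataset values are placeholders.
func newExporter(backend string) (Exporter, error) {
	switch backend {
	case "zipkin":
		return zipkinExporter{endpoint: "http://localhost:9411/api/v2/spans"}, nil
	case "honeycomb":
		return honeycombExporter{dataset: "example-dataset"}, nil
	default:
		return nil, fmt.Errorf("unknown backend %q", backend)
	}
}

// handleRequest stands in for instrumented application code. It does not
// change when the backend changes.
func handleRequest(e Exporter) string {
	return e.Export("handleRequest")
}

func main() {
	for _, backend := range []string{"zipkin", "honeycomb"} {
		e, err := newExporter(backend)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(handleRequest(e))
	}
}
```

Because the application code depends only on the interface, swapping backends touches a single line at startup, which matches the experience described above.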
Below is a comparison of instrumenting traces with OpenCensus versus OpenTracing:
OpenCensus’s StartSpan and OpenTracing’s StartSpanFromContext functions take a context as the first parameter. Under the hood, each looks into the context to see if there is an existing span there. If not, a new root span is created and added to the context. If there is an existing span, a new child span is added, a bit like nesting dolls. In the above example, the root (or parent, as it’s sometimes called) span is added in the first line of readEvaluateProcess, and then two child spans are added in processLine. Child spans can have their own child spans, creating very deep nests, but in this very simple example the root span has just two children, which are siblings of each other.
The instrumentation of OpenTracing and OpenCensus is very similar, and either implementation can send data to Honeycomb.
Dive into the documentation below to get started tracing your own services: