Beeline for Python

The Python Beeline for Honeycomb is a quick and easy way to instrument your Python application. It has two powerful features: automatic instrumentation for common frameworks, and a tracing API for adding your own spans and context.

Official API reference docs can be found here.

To see an example of the Python Beeline in action, try out the Python-Gatekeeper Example App.

If you’d like to see more options or functionality in the Python Beeline, please file an issue or vote up one already filed! You can also contact us at support@honeycomb.io.

If you prefer more control over your application’s instrumentation, check out our Python SDK.

Requirements

You can find your API key on your Team Settings page. If you don’t have an API key yet, sign up for a Honeycomb trial.

To use the Python Beeline for automatic instrumentation, you need to be using one of the following frameworks: Django (1.10 or later), Flask, Bottle, Tornado, or AWS Lambda.

The Python Beeline does not currently support asyncio-based frameworks, and has limited support for Tornado. If you are using an asynchronous framework and would like to see support added in the Beeline, please file an issue.

Installation

To install the Python Beeline for your application:

pip install honeycomb-beeline

Note: Make sure your version of setuptools is up to date (pip install -U setuptools).

Alternatively, you can add honeycomb-beeline to your project’s requirements.txt.

To initialize the Beeline, include the following at the entry point to your application:

import beeline
beeline.init(
    # Get this via https://ui.honeycomb.io/account after signing up for Honeycomb
    writekey='YOUR_API_KEY',
    # The name of your app is a good choice to start with
    dataset='my-python-app',
    service_name='my-python-app',
    debug=True,
)

Note: Honeycomb API keys have the ability to create and delete data, and should be managed in the same way as your other application secrets. For example, you might prefer to configure production API keys via environment variables, rather than checking them into version control.
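For example, here is a minimal sketch of reading the key from an environment variable (HONEYCOMB_API_KEY is an arbitrary name chosen for this example):

import os
import beeline

beeline.init(
    # read the key from the environment rather than hard-coding it
    writekey=os.environ.get('HONEYCOMB_API_KEY'),
    dataset='my-python-app',
    service_name='my-python-app',
)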

Using Automatic Instrumentation

Implementation of automatic instrumentation varies between frameworks, but each needs only a few lines of code. Each implementation instruments the entry point for your application (usually a request) by starting a trace, collecting some commonly used fields, ending the trace, and sending the event data to Honeycomb.

You can build on this instrumentation by adding application-specific context as well as your own trace spans.

Django

The Beeline uses Django’s request/response middleware (>1.10) and database query execution wrapper (>2.0).

In your project’s settings.py file:

MIDDLEWARE = [
    ...
    'beeline.middleware.django.HoneyMiddleware',
    ...
]

(If you’d prefer not to use db instrumentation, use the HoneyMiddlewareHttp class instead.)
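For example, a sketch of the settings.py entry, assuming HoneyMiddlewareHttp lives in the same module as HoneyMiddleware:

MIDDLEWARE = [
    ...
    'beeline.middleware.django.HoneyMiddlewareHttp',
    ...
]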

Then, in your app’s apps.py file:

from django.apps import AppConfig
import beeline

class MyAppConfig(AppConfig):
    # must match the app referenced in default_app_config below
    name = 'myapp'

    def ready(self):
        beeline.init(
            writekey='YOUR_API_KEY',
            dataset='my-django-app',
            service_name='my-app-name',
            debug=True,
        )

Don’t forget to set your app’s default config in your app’s __init__.py file:

default_app_config = 'myapp.apps.MyAppConfig'

Flask

The Beeline makes use of WSGI middleware. If you are using Flask’s SQLAlchemy extension, you can also include our database middleware to get built-in query instrumentation.

Pass your Flask app to HoneyMiddleware.

import beeline
from flask import Flask
from beeline.middleware.flask import HoneyMiddleware

# see note below if you are using uwsgi/gunicorn pre-fork models (multiple processes)
beeline.init(writekey='YOUR_API_KEY', dataset="my-flask-app", service_name="my-app-name", debug=True)
app = Flask(__name__)
# db_events defaults to True, set to False if not using our db middleware with Flask-SQLAlchemy
HoneyMiddleware(app, db_events=True)

This will instrument HTTP requests. If you are using Flask’s SQLAlchemy extension, it will also instrument DB events. If you are not using Flask SQLAlchemy or prefer not to instrument DB events, set db_events to False.

AWS Lambda

The Beeline instruments AWS Lambda using a decorator function that is applied to each Lambda handler function.

beeline.init(writekey='YOUR_API_KEY', dataset="my-lambda-app", service_name="my-lambda-function", debug=True)
from beeline.middleware.awslambda import beeline_wrapper

# wrapper starts a trace that ends when the function exits
@beeline_wrapper
def my_handler(event, context):
    # my code

    # add some context to this event
    beeline.add_context({"user_id", user_id})

Tornado

The Tornado integration patches the Tornado Web RequestHandlers with code to instrument HTTP requests and exceptions.

import beeline
import libhoney
from beeline.patch import tornado

beeline.init(
    writekey='YOUR_API_KEY',
    dataset='my-app',
    service_name='my-app',
    # use a tornado coroutine rather than a threadpool to send events
    transmission_impl=libhoney.transmission.TornadoTransmission(),
    debug=True,
)

Bottle

The Beeline makes use of WSGI middleware to instrument HTTP requests.

To use it, add the following code where your Bottle app is initialized:

import bottle
import beeline
from beeline.middleware.bottle import HoneyWSGIMiddleware

beeline.init(
    writekey='YOUR_API_KEY',
    dataset='my-app',
    service_name='my-app',
    debug=True,
)

app = bottle.app()
myapp = HoneyWSGIMiddleware(app)
bottle.run(app=myapp)

Using Traces

A Complete Trace

The Python Beeline has a complete tracing API that you can use on your own or in addition to the default automatic instrumentation. Tracing can be reduced to a few basic steps:

  1. Start a trace - this creates a “root span” which is the top of your trace
  2. Create one or more “child spans”. Each span covers a notable section of your application: a database call, an external API call, a long computation, etc.
  3. Close child spans in reverse order.
  4. Close trace (root span).


Starting and ending a trace

start_trace is the starting point for all tracing. You’ll want to call it at the start of an interesting transaction, operation, or workflow that you want to instrument.

start_trace returns a copy of the root span which should be passed back to finish_trace when you want to complete your trace. Calling finish_trace concludes the trace in the Beeline’s internal state manager and sends its event data to Honeycomb.

def main():
    trace = beeline.start_trace(context={
        "name": "my application",
        "hostname": hostname,
        # other initial context
    })

    # application code

    # add some more context
    beeline.add_context({"user_id", user_id})

    # more application code

    beeline.finish_trace(trace)

    # since our app is shutting down, call close() to flush any remaining
    # events
    beeline.close()

If you are using the middleware for Django, Flask, Lambda, etc., you do not need to explicitly start a trace - it will be done for you.

Starting a span

Tracing tells a story of what transpired inside your transaction or workflow, but to tell that story, you need spans! Spans represent noteworthy parts of your application that you’d like to instrument. Examples include: database queries, external service calls, long computations, batch processing, and more. If it’s a potential bottleneck or point of failure, consider enclosing it in a span.

start_span creates a new span as a child of the active span. When you start a trace, the root span is the active span. As you add and finish spans, this will change. The beeline keeps track of this for you, but it is important that each call to start_span is matched with a call to finish_span in order for the Beeline’s internal state manager to keep the trace’s ordering intact.

You must have an active trace to call start_span or no span will be created.

# trace is started - we have one active span: the root span
trace = beeline.start_trace()

# ...

# something interesting is coming up: a db query!
# create a new span
span = beeline.start_span(context={
    "name": "db query",
    "db_host": db_host,
    "query": sql,
})

# run db query

# add some more context to the current span
beeline.add_context({"db_result_count", result_count})

# close this span. This will automatically add a `duration_ms` field describing
# the length of the span
beeline.finish_span(span)

# more code

beeline.finish_trace(trace)

Using a Context Manager

The Python Beeline includes an optional tracer context manager, which will do the work of starting and closing a span for you.

# This will create a span - or if no trace is in progress, will also
# start a trace
with beeline.tracer(name="external_api_call"):
    # add some context to this span
    beeline.add_context({"request_params": request_params})
    result, error = external_api_request(request_params)
    # add more context once we have results
    beeline.add_context({"result_count": len(result), "error": error})

    # You can have nested tracer context managers. A new child span will be
    # created
    with beeline.tracer(name="process api results"):
        # if an unhandled exception occurs inside a context manager, the event
        # will still get closed, and the exception type and message
        # will get added to the event
        raise Exception("something went wrong!")

# We've exited the context manager, which has done the work of closing the span
# for us

Using a Decorator

You can instrument entire functions using the traced decorator. Wrapping a function with the traced decorator creates a span when the function is called and closes it when the function returns. If no trace is ongoing, calling the decorated function will start a new trace.

import beeline

@beeline.traced(name='external_api_request')
def external_api_request(request_params):
    # ...

    # adding fields here will add it to the active span wrapping
    # this function
    beeline.add_context_field("response_time", response_time)

    # ...


@beeline.traced(name='main')
def main():
    beeline.add_context({"request_params": request_params})
    # calling this function will create a new span, under the "main" span
    result, error = external_api_request(request_params)
    # add more context once we have results
    beeline.add_context({"result_count": len(result), "error": error})

# This will create a span - or if no trace is in progress, will also
# start a trace
main()

# Don't forget to close the beeline to ensure all spans get sent
# before the application ends!
beeline.close()

Distributed Tracing

Distributed Tracing enables you to trace and visualize interactions between multiple instrumented services. For example, your users may interact with a front-end API service which talks to two internal APIs to fulfill their request. To get the complete picture of a transaction, you want Distributed Tracing.

Distributed Tracing requires each service in the request path to be instrumented with a Beeline, and trace context to be propagated from one service to the next with each request.

The Python, Go, Ruby, and NodeJS beelines all support the exchange of trace context between HTTP services using the X-Honeycomb-Trace header, which can be added to outbound requests. Upon receiving a request from an upstream service with this header, trace context can be extracted and a local trace can be started.

Here is an example of the header contents:

1;trace_id=weofijwoeifj,parent_id=owefjoweifj,context=SGVsbG8gV29ybGQ=

trace_id is the upstream trace ID that we want to continue. parent_id is the ID of the originating (parent) span. context is a base64-encoded JSON string containing key/value pairs that make up the trace fields. These are optional fields that the upstream service wants to propagate to the entire trace.
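For illustration, here is a minimal sketch of how a header in this format could be assembled by hand with the standard library; in practice the Beeline builds and parses it for you (see marshal_trace_context and unmarshal_trace_context below). The IDs and trace fields here are placeholders.

import base64
import json

# placeholder values standing in for real IDs from an upstream trace
trace_id = "weofijwoeifj"
parent_id = "owefjoweifj"
trace_fields = {"user_id": 42}  # optional trace fields to propagate

encoded_context = base64.b64encode(json.dumps(trace_fields).encode("utf-8")).decode("ascii")
header_value = "1;trace_id={},parent_id={},context={}".format(trace_id, parent_id, encoded_context)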

For more information about tracing, see our tracing docs.

Extracting Trace Context in Django, Flask, and Lambda

The Django, Flask, and Lambda middleware in the Python Beeline already look for an incoming X-Honeycomb-Trace header and, if found, start a trace using that context information.

Extracting Trace Context Manually

If your framework of choice does not have automatic instrumentation in the Beeline, you can still get trace context using the trace API and use it to start your trace manually.

# req is my generic incoming request with a dict of headers
header = req.headers['X-Honeycomb-Trace']

# trace_id and parent_id are strings
# context is a dict of key/value pairs
# these values will be None if the header is malformed
trace_id, parent_id, context = beeline.trace.unmarshal_trace_context(header)

trace = beeline.start_trace(trace_id=trace_id, parent_span_id=parent_id)

# populate our trace fields using the upstream context object
# these will be added to the current span as well as all future spans
for k, v in context.items():
    beeline.add_trace_field(k, v)

Sending trace context downstream

If you are using the requests library, the beeline can instrument outbound requests for you. Simply import the requests patch:

from beeline.patch import requests
import requests

This will inject the X-Honeycomb-Trace header automatically on all requests.

Be careful when calling third-party services with the X-Honeycomb-Trace header. If you have added sensitive fields with add_trace_field, these will be sent in the header, potentially exposing them.

Sending trace context manually

You might want finer control over when trace context propagates. Simply add the header only on outbound requests to services you want to send it to. Use the marshal_trace_context method to generate the header contents.

import requests

trace_context = beeline.get_beeline().tracer_impl.marshal_trace_context()
requests.post("https://myservice/endpoint", data=mydata, headers={'X-Honeycomb-Trace': trace_context})

Threading and Traces

Due to its use of thread-local storage to keep track of spans, tracing in the Python Beeline does not currently work smoothly across threads. If you use the tracer context manager inside a thread, it will generate a new trace ID separate from your other traced events. If you call start_span inside a thread, it will not work unless you also explicitly start a new trace with start_trace inside that thread.

import threading

trace = beeline.start_trace()

def thread_func():
    # there's no trace context inside this thread, so the span will not start
    span = beeline.start_span(context={'name': 'thread_func'})

    # ...

    beeline.finish_span(span)

    # tracer will detect that there is no active trace inside this thread, and
    # start a new one
    with beeline.tracer(name='thread_func'):
        pass  # ...

t = threading.Thread(target=thread_func)
t.start()

To work around this, you can explicitly start a new trace inside the thread as a child of the current span.

trace = beeline.start_trace()

def thread_func():
    child_trace = beeline.start_trace(
        context={'name': 'thread_func'},
        trace_id=trace.trace_id,
        # trace.id refers to the root span id. You could also do this with
        # a child span returned by `start_span`
        parent_span_id=trace.id,
    )

    # ...

    beeline.finish_trace(child_trace)


t = threading.Thread(target=thread_func)
t.start()

beeline.finish_trace(trace)

Adding context to a span

Having detailed events is key to understanding your application. Consider keeping as much context as possible. Try putting a timer around a section of code, adding per-user information, or details about what it took to craft a response. You can add fields when and where you need to, or for some events but not others. (Error handlers are a good example of this.)

Use beeline.add_context_field to add fields to the currently active span. For example, if you want to add the user ID associated with the current request before the event is sent to Honeycomb:

beeline.add_context_field('user_id', user_id)

The currently active span is determined by the beeline’s internal state manager. If a field is being attributed to the wrong event, make sure that for every call to beeline.start_span() there is a matching call to beeline.finish_span().
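One way to keep those calls matched even when exceptions occur is to pair them in a try/finally block. A minimal sketch, assuming a hypothetical process_payment() function and user_id value:

span = beeline.start_span(context={"name": "process payment"})
try:
    beeline.add_context_field("user_id", user_id)
    process_payment()
finally:
    # finish_span runs even if process_payment() raises,
    # so start_span and finish_span stay balanced
    beeline.finish_span(span)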

Adding context to an entire trace

Sometimes it’s useful for context you capture earlier in a trace to be available in other spans. For example, maybe you have an instrumented function shared by multiple HTTP endpoint handlers, and you’d like to know which endpoint called it. You can do this with a trace field.

# Start a new trace
trace = beeline.start_trace(context={
    "name": "my application",
    "src_addr": src_addr,
    "method": http_method,
    # other initial context
})

# this field will be added to our root span and all child spans going forward
beeline.add_trace_field("endpoint", http_endpoint)

# ...

# here's our shared function, we'll wrap it with `tracer` to start a new span
# The span automatically inherits the field "endpoint" from the trace
# with no extra effort
with beeline.tracer(name="my_shared_function"):
    my_shared_function()

# ...

# don't forget to finish our trace!
beeline.finish_trace(trace)

Creating isolated events

There may be circumstances where you’d like to send events that are not closely related to an ongoing trace. You can access the Beeline’s underlying Libhoney client to create raw events that do not contain trace metadata.

client = beeline.get_beeline().client
event = client.new_event()
event.add(myfields)
event.send()

Sampling events

We have built-in support for sampling in the Beeline. Simply set the sample_rate variable when calling beeline.init():

beeline.init(
    writekey='YOUR_API_KEY',
    dataset='my-app',
    service_name='my-app',
    debug=True,
    sample_rate=10,
)

The Python Beeline uses a deterministic sampling algorithm when sampling traces. Since trace IDs are randomly generated, sampling can be applied to the ID rather than randomly selecting individual events. This allows traces to stay intact when sampling is enabled, while still being random. Setting the sample rate to 10 will sample one in 10 traces.

Customizing sampling logic

Our sampler hook gives you more power over how events are sampled. By including a sampler hook, you override the built-in sampling logic driven by the sample_rate variable and replace it with your own.

For example, assume you have instrumented an HTTP app and want a default sampling rate of 1 in 10 events. However, you’d like to keep all errored requests, and heavily sample healthy traffic (200 response codes). Also, you don’t really care about 302 redirects in your app, and want to drop those. You could define a sampler function like so:

import hashlib
import math
import struct

MAX_INT32 = math.pow(2, 32) - 1

# Deterministic _should_sample taken from https://github.com/honeycombio/beeline-python/blob/1ffe66ed1779143592edf9227d3171cb805216b6/beeline/trace.py#L258-L267
def _should_sample(trace_id, sample_rate):
  sample_upper_bound = MAX_INT32 / sample_rate
  sha1 = hashlib.sha1()
  sha1.update(trace_id.encode('utf-8'))
  # convert last 4 digits to int
  value, = struct.unpack('<I', sha1.digest()[-4:])
  return value < sample_upper_bound

def sampler(fields):
  # our default sample rate, sample one in every 10 events
  sample_rate = 10

  response_code = fields.get('response.status_code')
  # False indicates that we should not keep this event
  if response_code == 302:
    return False, 0
  elif response_code == 200:
    # heavily sample healthy requests
    sample_rate = 100
  elif response_code >= 500:
    # sample every error request
    sample_rate = 1

  # The truthiness of the first return argument determines whether we keep the
  # event. The second argument, the sample rate, tells Honeycomb what rate the
  # event was sampled at (important to correctly weight calculations on the data).
  trace_id = fields.get('trace.trace_id')
  if _should_sample(trace_id, sample_rate):
    return True, sample_rate
  return False, 0

To apply this new logic, pass this function to beeline.init():

import beeline
beeline.init(writekey='mywritekey', dataset='myapp', debug=True, sampler_hook=sampler)

Note: Defining a sampling hook overrides the deterministic sampling behavior for trace IDs. Unless you take trace.trace_id into account (as we did above by hashing the trace ID), you will get incomplete traces.

Scrubbing Sensitive Data

Our presend hook enables you to modify data right before it is sent to Honeycomb. For example, if you have a field that sometimes contains PII or other sensitive data, you might want to scrub the field or drop it all together.

def presend(fields):
  # We don't want to log customer IPs that get captured in the beeline
  if 'request.remote_addr' in fields:
    del fields['request.remote_addr']

  # this field is useful, but sometimes contains sensitive data.
  # Run a scrubber method against it before sending
  if 'transaction_log_msg' in fields:
    fields['transaction_log_msg'] = scrub_msg(fields['transaction_log_msg'])

After defining your presend hook function, pass it to the beeline’s init method:

import beeline
beeline.init(writekey='mywritekey', dataset='myapp', debug=True, presend_hook=presend)

Note: Sampler hooks are executed before presend hooks.

Example events

Below is a sample event from the Python Beeline. This example is an http_server event, generated when your app handles an incoming HTTP request.

{
  "Timestamp": "2018-07-03T04:57:12.517022Z",
  "duration_ms": 619.703,
  "meta.beeline_version": "0.1.1",
  "meta.local_hostname": "hostname123",
  "name": "django_http_get",
  "request.content_length": "",
  "request.host": "localhost:8000",
  "request.method": "GET",
  "request.path": "/",
  "request.post": "{}",
  "request.query": "{}",
  "request.remote_addr": "127.0.0.1",
  "request.scheme": "http",
  "request.secure": false,
  "request.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
  "request.xhr": false,
  "response.status_code": 200,
  "service_name": "my-app",
  "trace.span_id": "3eada0ce-934b-4ffd-bb72-2c9f57b02bf1",
  "trace.trace_id": "07d2150c-5397-4e8e-aa23-3becd17d7266",
  "type": "http_server"
}

Events are also (optionally) created for DB queries.

{
  "Timestamp": "2018-07-03T04:57:13.119424Z",
  "db.duration": 6.8919999999999995,
  "db.last_insert_id": 0,
  "db.query": "SELECT `blog_post`.`id`, `blog_post`.`author_id`, `blog_post`.`title`, `blog_post`.`text` FROM `blog_post`",
  "db.query_args": "[]",
  "db.rows_affected": 6,
  "duration_ms": 6.986,
  "meta.beeline_version": "0.1.1",
  "meta.local_hostname": "hostname123",
  "name": "django_mysql_query",
  "service_name": "my-app",
  "trace.parent_id": "3eada0ce-934b-4ffd-bb72-2c9f57b02bf1",
  "trace.span_id": "4a5bb9ab-1aab-4772-ab06-4ba31e196bea",
  "trace.trace_id": "07d2150c-5397-4e8e-aa23-3becd17d7266",
  "type": "db"
}

Queries to try

Here are some examples to get you started querying your app’s behavior:

Which of my app’s routes are the slowest?

Where is my app spending the most time?

Which users are using the endpoint that I’d like to deprecate?

This query uses a custom user.email field. To create a custom field, see Adding context to a span, above.
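For example, a handler could attach that field to the active span with add_context_field (the user object here is hypothetical):

beeline.add_context_field("user.email", user.email)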

Customizing Event Transmission

By default, events are sent to the Honeycomb API. It’s possible to override the default transmission implementation by specifying transmission_impl to init. A couple of alternative implementations ship with the libhoney SDK.

To override the default transmission and write events out to stderr:

import sys

import beeline
from libhoney.transmission import FileTransmission

beeline.init(writekey='yourwritekey', transmission_impl=FileTransmission(output=sys.stderr))

Troubleshooting the Beeline

Not seeing your newly instrumented dataset appearing in Honeycomb? Here are some things to check.

Debug Mode

The Python Beeline supports an optional debug mode. When enabled, additional logging to stderr will indicate when an event is enqueued, when a trace starts and ends, and whether or not the Beeline initialized correctly. To enable debug mode, set debug=True in the call to beeline.init():

beeline.init(
    writekey='YOUR_API_KEY',
    dataset='my-app',
    service_name='my-app',
    debug=True,
)

Checking the Responses Queue

You can use the queue returned by beeline.get_responses_queue() to check whether events were successfully received by Honeycomb’s servers.

Each response describes the outcome of a send, including fields such as the HTTP status code returned by the API, the duration of the request, any error, and the metadata attached to the event.
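Here is a minimal sketch of draining that queue after sending events; it assumes the Beeline has already been initialized and has sent at least one event.

import queue

responses = beeline.get_responses_queue()
while True:
    try:
        resp = responses.get(block=False)
    except queue.Empty:
        break
    # inspect each response, for example to log failed sends
    print(resp)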

Call close at application shutdown

Honeycomb events are sent in batches. By default, this happens every 100ms. When your application terminates, it’s possible for some events to still be in the queue, unsent. To ensure that all events get flushed before shutdown, call beeline.close() before your application exits. You can register an atexit handler after initializing the Beeline like so:

import beeline
import atexit

beeline.init(...)
atexit.register(beeline.close)

Using the Python Beeline with Python Pre-fork Models

Popular servers like uWSGI and Gunicorn utilize a pre-fork model where requests are delegated to separate Python processes.

Initializing the Python Beeline before the fork happens can lead to a state where events cannot be sent. To initialize the Python Beeline correctly, you will need to run your init code inside a post-fork hook.

Using the Python Beeline with Gevent

The beeline uses the requests lib, which uses urllib3. If you are using gevent, you should call its monkey patching functions before importing the beeline for the first time, or you may encounter a RecursionError when sending events.

import gevent.monkey
gevent.monkey.patch_all()

import beeline

uWSGI

Users of uWSGI can use a postfork decorator. Simply add the @postfork decorator to the function that initializes the Python Beeline, and it will be executed post-fork.

import logging
import os

import beeline
from uwsgidecorators import postfork

@postfork
def init_beeline():
    logging.info(f'beeline initialization in process pid {os.getpid()}')
    beeline.init(writekey="YOUR_API_KEY", dataset="honeycomb-uwsgi-example", debug=True)

Gunicorn

Gunicorn users can define a post_worker_init function in the Gunicorn config, and initialize the Python Beeline there.

 # conf.py
import logging
import os
import beeline

def post_worker_init(worker):
    logging.info(f'beeline initialization in process pid {os.getpid()}')
    beeline.init(writekey="YOUR_API_KEY", dataset="honeycomb-gunicorn-example", debug=True)

Then start gunicorn with the -c option:

gunicorn -c /path/to/conf.py

Celery

Celery uses a pre-fork approach to create worker processes. You can connect a function to the worker_process_init signal to initialize the Python Beeline after each worker process has started.

import logging
import os
import beeline

from celery.signals import worker_process_init
@worker_process_init.connect
def initialize_honeycomb(**kwargs):
    logging.info(f'beeline initialization in process pid {os.getpid()}')
    beeline.init(writekey="YOUR_API_KEY", dataset="honeycomb-celery-example", debug=True)

Contributions

Features, bug fixes and other changes to Beelines are gladly accepted. Please open issues or a pull request with your change via GitHub. Remember to add your name to the CONTRIBUTORS file!

All contributions will be released under the Apache License 2.0.