Configuring the Honeycomb Kubernetes Agent

Applications in Kubernetes tend to use different logging formats. In our opinion, your own applications should use a structured, self-describing log format such as JSON. But Kubernetes system components use the glog format, reverse proxies and ingress controllers may use a combined log format, and so on.

You might also want to aggregate events only from specific services, rather than from everything that might be running in a cluster. Or you might want to send logs from different services to different datasets.

To accommodate these real-world use cases, you can customize the Honeycomb Kubernetes Agent’s behavior with a YAML configuration file. Ordinarily, you’ll create this file as a Kubernetes ConfigMap that will be mounted inside the agent container.

How the Honeycomb Kubernetes Agent works  🔗

First, a bit of background. The Honeycomb Kubernetes Agent runs as a DaemonSet. That is, one copy of the agent runs on each node in the cluster.
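
To verify that the DaemonSet is running as expected, you can list the agent pods and the nodes they were scheduled on. This sketch assumes the agent pods carry the k8s-app=honeycomb-agent label used later in this guide; adjust the selector if your deployment labels differ:

kubectl get pods --selector k8s-app=honeycomb-agent --output wide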

[Diagram: the Honeycomb Kubernetes Agent running as a DaemonSet, one pod per node]

Metrics for resources running on the node are collected by communicating directly with the local node's kubelet.

Logs from containers' stdout and stderr are written by the Docker daemon to the local node filesystem. The Honeycomb Kubernetes Agent reads these logs, augments them with metadata from the Kubernetes API, and ships them to Honeycomb so that you can see what’s going on.

[Diagram: the agent inside a node]

The Honeycomb Kubernetes Agent’s configuration file describes which pods' logs to process and how to handle them, as well as which resource types to collect metrics from.

Metrics collection configuration  🔗

Metrics collection can be configured using the metrics configuration property.

metrics:
  enabled: true
  dataset: kubernetes-metrics
  clusterName: MyCluster
  metricGroups:
    - node
    - pod  

A clusterName should be specified for each cluster. By default, only node and pod metrics will be collected.

The following table describes all properties for metrics configuration:

| key | required? | type | description |
|-----|-----------|------|-------------|
| enabled | yes | bool | Enables metrics to be collected and sent to Honeycomb. default: false |
| dataset | yes | string | Name of the dataset to send events to. default: kubernetes-metrics |
| clusterName | yes | string | Name of the Kubernetes cluster; emitted as a field on each event. default: k8s-cluster |
| interval | no | string | Collection interval, as a time duration. default: 10s |
| metricGroups | no | list | Resource groups to collect metrics from. Valid values: node, pod, container, volume. default: node, pod |
| omitLabels | no | list | Labels in this list will not be collected and sent as fields to Honeycomb. default: nil |
| additionalFields | no | map | A map of field names and values to apply to each metric event. default: nil |

Example metrics configuration  🔗

This configuration collects metrics from nodes, pods, containers, and volumes every 10 seconds. The autogenerated controller-revision-hash label is omitted, and additional fields for region and az are added.

metrics:
  enabled: true
  dataset: kubernetes-metrics
  clusterName: k8s-cluster
  interval: 10s
  metricGroups:
    - node
    - pod
    - container
    - volume
  omitLabels:
    - controller-revision-hash
  additionalFields:
    region: us-east
    az: us-east-1a 

Log Watchers configuration  🔗

Log parsing and collection can be configured using the watchers configuration property.

watchers:
- labelSelector: "app=nginx"
  parser: nginx
  dataset: kubernetes-nginx

- labelSelector: "app=frontend"
  parser: json
  dataset: kubernetes-frontend

Each block in the watchers list describes a set of pods whose logs you want to handle in a specific way, and has the following keys:

| key | required? | type | description |
|-----|-----------|------|-------------|
| labelSelector | yes* | string | A Kubernetes label selector identifying the set of pods to watch. |
| parser | yes | string | Describes how this watcher should parse events. |
| dataset | yes | string | The dataset that this watcher should send events to. |
| containerName | no | string | If you only want to consume logs from one container in a multi-container pod, the name of the container to watch. |
| processors | no | list | A list of processors to apply to events after they're parsed. |
| namespace | no | string | The Kubernetes namespace the pods are located in. If not supplied, the default namespace is used. |
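
Before deploying a watcher, it can help to confirm which pods a given labelSelector actually matches. A quick check with kubectl, using the app=nginx selector from the example above:

kubectl get pods --selector app=nginx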

Validating a configuration file  🔗

To check a configuration file without needing to deploy it into the cluster, you can run the Honeycomb Kubernetes Agent container locally with the --validate flag:

docker run -v /FULL/PATH/TO/YOUR/config.yaml:/etc/honeycomb/config.yaml honeycombio/honeycomb-kubernetes-agent:head --validate

Uploading a configuration file to a cluster  🔗

To make a configuration file visible to the Honeycomb Kubernetes Agent inside a Kubernetes cluster, you’ll need to create a Kubernetes ConfigMap from it.

To create a brand-new ConfigMap from a local file config.yaml, run:

kubectl create configmap honeycomb-agent-config --from-file=config.yaml

To replace an existing ConfigMap, you can run:

kubectl create configmap honeycomb-agent-config \
    --from-file=config.yaml --output=yaml \
    --dry-run | kubectl replace --filename=-

Then restart running agent pods:

kubectl delete pod --selector k8s-app=honeycomb-agent

Parsers  🔗

Currently, the following parsers are supported:

nop  🔗

Does no parsing on logs, and submits an event with the entire contents of the log line in a "log" field, plus some metadata. Use this if you just want the “raw” log line, or if your log line structure doesn’t match one of the parsers below. You can still query datasets with raw log lines to some degree using string filters and derived columns, but structuring your logs is strongly encouraged.
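
For example, a minimal watcher using the nop parser might look like this sketch (the label selector and dataset name here are placeholders):

watchers:
# ship raw, unparsed log lines from matching pods
- labelSelector: "app=legacy-worker"
  parser: nop
  dataset: kubernetes-raw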

json  🔗

JSON is a great format for structured logs. With the JSON parser, we map JSON key/value pairs to event fields.
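
For instance, a log line such as (field names here are purely illustrative):

{"status": 200, "path": "/api/v1/users", "latency_ms": 12}

becomes an event with the fields status, path, and latency_ms.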

nginx  🔗

Parses NGINX access logs.

If you’re using a custom NGINX log format, you can specify the format using the following configuration:

watchers:
- labelSelector: "io.kompose.service=nginx"
  dataset: nginx-inner
  parser:
    name: nginx
    options:
      log_format: '$remote_addr - $remote_user [$time_local] $host "$request" $status $bytes_sent $body_bytes_sent $request_time "$http_referer" "$http_user_agent" $request_length "$http_authorization" "$http_x_forwarded_proto" "$http_x_forwarded_for" $server_name'

Note: This uses the enhanced additional log fields from our Using NGINX with Honeytail guide. You may need to modify the log format in the watcher YAML, or in your NGINX config file / ConfigMap, to match.

glog  🔗

Parses logs produced by glog, which look like this:

I0719 23:09:54.422170       1 kube.go:118] Node controller sync successful

This format is commonly used by Kubernetes system components, such as the API server.
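
A watcher for such components might look like this sketch (the label selector is an assumption; match whatever labels your control-plane pods actually carry):

watchers:
# parse glog-formatted output from API server pods
- labelSelector: "k8s-app=kube-apiserver"
  namespace: kube-system
  parser: glog
  dataset: kubernetes-system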

redis  🔗

Parses logs produced by redis 3.0+, which look like this:

1:M 08 Aug 22:59:58.739 * Background saving started by pid 43

Thanks to MacRae Linton for contributing the Redis parser.
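
A watcher using this parser could look like the following sketch (the label and dataset names are placeholders):

watchers:
- labelSelector: "app=redis"
  parser: redis
  dataset: kubernetes-redis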

keyval  🔗

Parses logs in key=value format.

Key-value formatted logs often have a special prefix, such as a timestamp. For example, Kubernetes audit logs are formatted as:

2017-08-25T17:54:56.783361454Z AUDIT: ip="172.20.67.135" method="PUT" ...

You can specify a regular expression to parse that prefix using the following configuration:

watchers:
- labelSelector: "com.myco.logging.keyvalformat=true"
  parser:
    name: keyval
    options:
      prefixRegex: "(?P<timestamp>[0-9:\\-\\.TZ]+) AUDIT: "

More parsers will be added in the future. If you’d like to see support for additional log formats, please open an issue or email support@honeycomb.io!

Processors  🔗

Processors transform events after they’re parsed. Currently, the following processors are supported:

additional_fields  🔗

The additional_fields processor accepts a static map of field names and values and appends those to every event it processes. These values will overwrite existing fields of the same name, if they exist.

For example, with the following configuration:

processors:
  - additional_fields:
      environment: production
      owner: me@example.com

the fields environment and owner will be added to the event.

sample  🔗

The sample processor will only send a subset of events to Honeycomb. Honeycomb natively supports sampled event streams, allowing you to send a representative subset of events while still getting high-fidelity query results.

Options:

| key | type | description |
|-----|------|-------------|
| type | "static" or "dynamic" | How events should be sampled. |
| rate | integer | The rate at which to sample events. Specifying a sample rate of 20 will cause one in 20 events to be sent. |
| keys | list of strings | The list of field keys to use when doing dynamic sampling. |
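
For example, here is a sketch of dynamic sampling keyed on fields that the nginx parser and request_shape processor produce (status and request_method; swap in whichever fields best partition your traffic):

processors:
  - sample:
      type: dynamic
      # target rate: roughly one in 20 events per key combination
      rate: 20
      keys:
        - status
        - request_method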

drop_field  🔗

The drop_field processor will remove the specified field from all events before sending them to Honeycomb. This is useful for removing sensitive information from events.

Options:

| key | type | description |
|-----|------|-------------|
| field | string | The name of the field to drop. |
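
For example, to strip a user_email field from events before they leave the cluster:

processors:
  - drop_field:
      field: user_email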

request_shape  🔗

The request_shape processor will take a field representing an HTTP request, such as GET /api/v1/users?id=22 HTTP/1.1, and unpack it into its constituent parts.

Options:

| key | type | description |
|-----|------|-------------|
| field | string | The name of the field containing the HTTP request (e.g., "request") |
| patterns | list of strings | A list of URL patterns to match when unpacking the request |
| queryKeys | list of strings | A whitelist of keys in the URL query string to unpack |
| prefix | string | A prefix to prepend to the unpacked field names |

For example, with the following configuration:

processors:
- request_shape:
    field: request
    patterns:
    - /api/:version/:resource
    queryKeys:
    - id

the request_shape processor will expand the event

{"request": "GET /api/v1/users?id=22 HTTP/1.1", ...}

into

{
    "request": "GET /api/v1/users?id=22 HTTP/1.1",
    "request_method": "GET",
    "request_protocol_version": "HTTP/1.1",
    "request_uri": "/api/v1/users?id=22",
    "request_path": "/api/v1/users",
    "request_query": "id=22",
    "request_shape": "/api/:version/:resource?id=?",
    "request_path_version": "v1",
    "request_path_resource": "users",
    "request_pathshape": "/api/:version/:resource",
    "request_queryshape": "id=?",
    "request_query_id": "22",
    ...
}

timefield  🔗

The timefield processor will replace the default timestamp in an event with one extracted from a specific field in the event.

Options:

| key | type | description |
|-----|------|-------------|
| field | string | The name of the field containing the timestamp |
| format | string | The format of the timestamp found in that field, in strftime or Golang format |

Note: This processor isn’t generally necessary when collecting pod logs. The Honeycomb Kubernetes Agent will automatically use the timestamp recorded by the Docker json-log driver. It’s useful when parsing logs that live at a particular path on the node filesystem, such as Kubernetes audit logs.
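
For example, the keyval configuration shown earlier captures a timestamp group from Kubernetes audit logs; a timefield processor could then promote that capture to the event timestamp. The Golang layout below is a sketch matching the RFC3339 timestamps in that example:

processors:
  - timefield:
      # "timestamp" is the named capture group from the keyval prefixRegex above
      field: timestamp
      format: "2006-01-02T15:04:05.999999999Z07:00"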

Sample configurations  🔗

Here are some example Honeycomb Kubernetes Agent configurations.

Parse logs from pods labelled with app: nginx:

---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: app=nginx
    parser: nginx
    dataset: nginx-kubernetes

    processors:
    - request_shape:
        field: request

Send logs from different services to different datasets:

---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=nginx"
    parser: nginx
    dataset: nginx-kubernetes

  - labelSelector: "app=frontend-web"
    parser: json
    dataset: frontend

Sample events from a frontend-web deployment: send only one in 20 events from the prod namespace (also dropping the user_email field there), and one in 10 events from the staging namespace.

---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=frontend-web"
    namespace: prod
    parser: json
    dataset: frontend

    processors:
      - sample:
          type: static
          rate: 20
      - drop_field:
          field: user_email

  - labelSelector: "app=frontend-web"
    namespace: staging
    parser: json
    dataset: frontend

    processors:
      - sample:
          type: static
          rate: 10

Get logs from a multi-container pod, but only from the sidecar container:

---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=frontend-web"
    containerName: sidecar
    parser: json
    dataset: frontend

Getting help  🔗

Have questions? Encountering difficulties? Missing features? Drop us a line at support@honeycomb.io!