Applications in Kubernetes tend to use different logging formats. In our opinion, your own applications should use a structured, self-describing log format such as JSON. But Kubernetes system components use the glog format, reverse proxies and ingress controllers may use a combined log format, and so on.
You might also want to aggregate events only from specific services, rather than from everything that might be running in a cluster. Or you might want to send logs from different services to different datasets.
To accommodate these real-world use cases, you can customize the Honeycomb Kubernetes Agent’s behavior with a YAML configuration file.
Ordinarily, you will create this file as a Kubernetes ConfigMap that is mounted inside the agent container.
First, a bit of background. The Honeycomb Kubernetes Agent runs as a DaemonSet. That is, one copy of the agent runs on each node in the cluster.
Metrics for resources running on the node are collected by communicating directly with the local node's kubelet. Logs from containers' stdout and stderr are written by the Docker daemon to the local node filesystem.
The Honeycomb Kubernetes Agent reads these logs, augments them with metadata from the Kubernetes API, and ships them to Honeycomb so that you can see what is going on.
The Honeycomb Kubernetes Agent’s configuration file describes which pods' logs to process, and how to handle them, as well as which resource types to collect metrics from.
What metadata does the agent add to these logs? The following fields are added by default:

- `pod.labels`
- `pod.name`
- `pod.namespace`
- `pod.resourceVersion`
- `pod.UID`
- `pod.nodeName`
- `pod.nodeSelector`
- `pod.serviceAccountName`
- `pod.subdomain`
- `pod.annotations`
- `container.args`
- `container.command`
- `container.name`
- `container.env`
- `container.image`
- `container.ports`
- `container.VolumeMounts`
- `container.workingDir`
- `container.resources`
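For example, a parsed log event might include metadata fields like the following (an illustrative sketch; all values are hypothetical):

```json
{
  "log": "GET /healthz HTTP/1.1 200",
  "pod.name": "nginx-7c5ddbdf54-abcde",
  "pod.namespace": "default",
  "pod.nodeName": "node-1",
  "container.name": "nginx",
  "container.image": "nginx:1.21"
}
```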
Node metadata can be added by specifying `includeNodeLabels: true` within the agent's metrics configuration.
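A minimal sketch:

```yaml
metrics:
  enabled: true
  dataset: kubernetes-metrics
  includeNodeLabels: true
```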
Metrics collection can be configured using the `metrics` configuration property:

```yaml
metrics:
  enabled: true
  dataset: kubernetes-metrics
  clusterName: MyCluster
  metricGroups:
    - node
    - pod
```
A `clusterName` should be specified for each cluster.
By default, only node and pod metrics will be collected.
The following table describes all properties for metrics configuration:
| key | required? | type | description |
|---|---|---|---|
| `enabled` | yes | bool | Enables metrics to be collected and sent to Honeycomb. default: `false` |
| `dataset` | yes | string | Name of the dataset to send events to. default: `kubernetes-metrics` |
| `clusterName` | yes | string | Name of the Kubernetes cluster. Will be emitted as a field to Honeycomb. default: `k8s-cluster` |
| `interval` | no | string | Collection interval in time duration format, specified with a duration suffix. Valid time units are `ns`, `us` (or `µs`), `ms`, `s`, `m`, `h`. default: `10s` |
| `metricGroups` | no | list | Resource groups to collect metrics from. Valid values are: `node`, `pod`, `container`, `volume`. default: `node`, `pod` |
| `omitLabels` | no | list | Labels in this list will not be collected and sent as fields to Honeycomb. default: nil |
| `additionalFields` | no | map | A map of field names and values to apply to each metric event. default: nil |
| `includeNodeLabels` | no | bool | If enabled, attaches node metadata to metric events. Node labels respect the `omitLabels` list. |
This configuration will collect metrics from the node, pods, containers, and volumes every 10 seconds. The auto-generated `controller-revision-hash` label will be omitted, and additional fields for region and az will be added:
```yaml
metrics:
  enabled: true
  dataset: kubernetes-metrics
  clusterName: k8s-cluster
  interval: 10s
  metricGroups:
    - node
    - pod
    - container
    - volume
  omitLabels:
    - controller-revision-hash
  additionalFields:
    region: us-east
    az: us-east-1a
```
Log parsing and collection can be configured using the `watchers` configuration property:
```yaml
watchers:
  - labelSelector: "app=nginx"
    parser: nginx
    dataset: kubernetes-nginx
  - labelSelector: "app=frontend"
    parser: json
    dataset: kubernetes-frontend
```
Each block in the `watchers` list describes a set of pods whose logs you want to handle in a specific way, and has the following keys:
| key | required? | type | description |
|---|---|---|---|
| `labelSelector` | yes* | string | A Kubernetes label selector identifying the set of pods to watch. |
| `parser` | yes | string | Describes how this watcher should parse events. |
| `dataset` | yes | string | The dataset that this watcher should send events to. |
| `containerName` | no | string | If you only want to consume logs from one container in a multi-container pod, the name of the container to watch. |
| `processors` | no | list | A list of processors to apply to events after they are parsed. |
| `namespace` | no | string | The Kubernetes namespace the pods are located in. If not supplied, the default namespace is used. |
| `paths` | no | string array | Glob-style* paths to the log files. If not supplied, the default Kubernetes log paths and filenames are used. |
| `exclude` | no | string array | Glob-style* paths for files to exclude from consideration. If a given file matches an exclude, it will not be watched. If not supplied, no files are excluded. |
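For instance, a sketch of a watcher that tails log files by path instead of by label selector (the paths and dataset name are hypothetical):

```yaml
watchers:
  - paths:
      - /var/log/kubernetes/audit/*.log
    exclude:
      - /var/log/kubernetes/audit/*.gz
    parser: json
    dataset: kubernetes-audit
```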
“Glob-style” means:

- `*` matches any sequence of non-path-separators
- `/**/` matches zero or more directories
- `?` matches any single non-path-separator character

To check a configuration file without needing to deploy it into the cluster, you can run the Honeycomb Kubernetes Agent container locally with the `--validate` flag:
```shell
docker run -v /FULL/PATH/TO/YOUR/config.yaml:/etc/honeycomb/config.yaml honeycombio/honeycomb-kubernetes-agent:head --validate
```
To make a configuration file visible to the Honeycomb Kubernetes Agent inside a Kubernetes cluster, you will need to create a Kubernetes ConfigMap from it. To create a brand-new ConfigMap from a local file `config.yaml`, run:
```shell
kubectl create configmap honeycomb-agent-config --from-file=config.yaml
```
To replace an existing ConfigMap, you can run:

```shell
kubectl create configmap honeycomb-agent-config \
  --from-file=config.yaml --output=yaml \
  --dry-run | kubectl replace --filename=-
```
Then restart running agent pods:
```shell
kubectl delete pod --selector k8s-app=honeycomb-agent
```
Currently, the following parsers are supported:
Does no parsing on logs, and submits an event with the entire contents of the log line in a `"log"` field, plus the aforementioned Kubernetes metadata.
Use this if you just want the “raw” log line, or if your log line structure does not match one of the parsers below.
You can still query datasets with raw log lines to some degree using string filters and derived columns, but structuring your logs is strongly encouraged.
JSON is a great format for structured logs. With the JSON parser, we map JSON key/value pairs to event fields.
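For example (values are illustrative), a log line like

```json
{"level": "info", "msg": "user login", "duration_ms": 12}
```

becomes an event with `level`, `msg`, and `duration_ms` fields, plus the Kubernetes metadata described above.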
The `nginx` parser parses NGINX access logs. If you are using a custom NGINX log format, you can specify the format using the following configuration:
```yaml
watchers:
  - labelSelector: "io.kompose.service=nginx"
    parser:
      name: nginx
      dataset: nginx-inner
      options:
        log_format: '$remote_addr - $remote_user [$time_local] $host "$request" $status $bytes_sent $body_bytes_sent $request_time "$http_referer" "$http_user_agent" $request_length "$http_authorization" "$http_x_forwarded_proto" "$http_x_forwarded_for" $server_name'
```
Note: This uses the enhanced additional log fields from our Using NGINX with Honeytail guide. You may need to modify the log format in the watcher YAML, or in your NGINX config file / ConfigMap, to match.
Parses logs produced by glog, which look like this:

```
I0719 23:09:54.422170 1 kube.go:118] Node controller sync successful
```
This format is commonly used by Kubernetes system components, such as the API server.
Parses logs produced by Redis 3.0+, which look like this:

```
1:M 08 Aug 22:59:58.739 * Background saving started by pid 43
```
Thanks to MacRae Linton for contributing the Redis parser.
The `keyval` parser parses logs in `key=value` format, such as:

```
time=2022-05-15T05:43:19Z msg="server response - time 12ms code: 401 - request: GET /hello ..."
```
Key-value formatted logs often have a special prefix, such as a log level.
```
INFO: time=2022-05-15T05:43:19Z msg="server response - time 12ms code: 401 - request: GET /hello ..."
```
To parse lines that carry such a prefix without an equals sign, and to ensure its contents are extracted into proper fields in the Honeycomb UI, specify a regular expression for the prefix in the configuration:
```yaml
watchers:
  - labelSelector: "com.myco.logging.keyvalformat=true"
    parser:
      name: keyval
      options:
        prefixRegex: "(?P<loglevel>[A-Z]+): (?P<timestamp>time=[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z )"
```
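With this configuration, the named capture groups in `prefixRegex` (here `loglevel` and `timestamp`) should be extracted as their own fields, so the prefixed example line above yields a `loglevel` field of `INFO` before the rest of the line is parsed as key/value pairs.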
If you would like to see support for additional log formats, please open an issue!
Processors transform events after they are parsed. Currently, the following processors are supported:
The `additional_fields` processor accepts a static map of field names and values and appends those to every event it processes.
These values will overwrite existing fields of the same name, if they exist.
For example, with the following configuration:
```yaml
processors:
  - additional_fields:
      environment: production
      owner: me@example.com
```
The fields `environment` and `owner` will be added to each event.
The `sample` processor will only send a subset of events to Honeycomb.
Honeycomb natively supports sampled event streams, allowing you to send a representative subset of events while still getting high-fidelity query results.
Options:
| key | type | description |
|---|---|---|
| `type` | `"static"` or `"dynamic"` | How events should be sampled. |
| `rate` | integer | The rate at which to sample events. Specifying a sample rate of 20 will cause one in 20 events to be sent. |
| `keys` | list of strings | The list of field keys to use when doing dynamic sampling. |
| `windowSize` | integer | How often to refresh estimated sample rates during dynamic sampling, in seconds. default: `30` |
| `minEventsPerSec` | integer | Whenever the number of events per second being processed falls below this value for a time window (see `windowSize`), sampling will be disabled for the next time window (all events will be sent with a sample rate of 1). default: `50`. The minimum possible value is `1`. |
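For example, a sketch of dynamic sampling keyed on two hypothetical fields, `status` and `request_path`:

```yaml
processors:
  - sample:
      type: dynamic
      rate: 20
      keys:
        - status
        - request_path
```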
The `drop_field` processor will remove the specified field from all events before sending them to Honeycomb.
This is useful for removing sensitive information from events.
Options:
| key | value | description |
|---|---|---|
| `field` | string | The name of the field to drop. |
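For example, to drop a (hypothetical) `user_email` field from every event:

```yaml
processors:
  - drop_field:
      field: user_email
```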
The `request_shape` processor will take a field representing an HTTP request, such as `GET /api/v1/users?id=22 HTTP/1.1`, and unpack it into its constituent parts.
Options:
| key | value | description |
|---|---|---|
| `field` | string | The name of the field containing the HTTP request (for example, `"request"`). |
| `patterns` | list of strings | A list of URL patterns to match when unpacking the request. |
| `queryKeys` | list of strings | An allowlist of keys in the URL query string to unpack. |
| `prefix` | string | A prefix to prepend to the unpacked field names. |
For example, with the following configuration:
```yaml
processors:
  - request_shape:
      field: request
      patterns:
        - /api/:version/:resource
      queryKeys:
        - id
```
the `request_shape` processor will expand the event

```json
{"request": "GET /api/v1/users?id=22 HTTP/1.1", ...}
```

into
```json
{
  "request": "GET /api/v1/users?id=22 HTTP/1.1",
  "request_method": "GET",
  "request_protocol_version": "HTTP/1.1",
  "request_uri": "/api/v1/users?id=22",
  "request_path": "/api/v1/users",
  "request_query": "id=22",
  "request_shape": "/api/:version/:resource?id=?",
  "request_path_version": "v1",
  "request_path_resource": "users",
  "request_pathshape": "/api/:version/:resource",
  "request_queryshape": "id=?",
  "request_query_id": "22",
  ...
}
```
The `timefield` processor will replace the default timestamp in an event with one extracted from a specific field in the event.
Options:
| key | value | description |
|---|---|---|
| `field` | string | The name of the field containing the timestamp. |
| `format` | string | The format of the timestamp found in that field, in strftime or Golang format. |
Note: This processor is not generally necessary when collecting pod logs. The Honeycomb Kubernetes Agent will automatically use the timestamp recorded by the Docker json-log driver. It is useful when parsing logs that live at a particular path on the node filesystem, such as Kubernetes audit logs.
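A minimal sketch (the field name `time` and the strftime format shown are illustrative):

```yaml
processors:
  - timefield:
      field: time
      format: "%Y-%m-%dT%H:%M:%SZ"
```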
Here are some example Honeycomb Kubernetes Agent configurations.
Parse logs from pods labelled with `app: nginx`:
```yaml
---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: app=nginx
    parser: nginx
    dataset: nginx-kubernetes
    processors:
      - request_shape:
          field: request
```
Send logs from different services to different datasets:
```yaml
---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=nginx"
    parser: nginx
    dataset: nginx-kubernetes
  - labelSelector: "app=frontend-web"
    parser: json
    dataset: frontend
```
Sample events from a frontend-web deployment: only send one in 20 events from the prod namespace, and one in 10 events from the staging namespace.
```yaml
---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=frontend-web"
    namespace: prod
    parser: json
    dataset: frontend
    processors:
      - sample:
          type: static
          rate: 20
      - drop_field:
          field: user_email
  - labelSelector: "app=frontend-web"
    namespace: staging
    parser: json
    dataset: frontend
    processors:
      - sample:
          type: static
          rate: 10
```
Get logs from a multi-container pod, but only from the `sidecar` container:
```yaml
---
writekey: "YOUR_API_KEY"
watchers:
  - labelSelector: "app=frontend-web"
    containerName: sidecar
    parser: json
    dataset: frontend
```
Have questions? Encountering difficulties? Missing features? Join our Pollinators Community Slack to ask questions and learn more.