- How do resource limits compare to container resource use?
- How does application performance vary with container resource limits?
- Are application errors happening on specific nodes or across the fleet?
Before You Begin
Before beginning this guide, you should have:

- Created a running Kubernetes cluster.
- Deployed an application to Kubernetes.
- Completed the Kubernetes Quick Start.
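The prerequisites above assume you already have a workload running. If you need one for testing, a minimal Deployment manifest looks roughly like the following sketch; the name and image are placeholders, not from this guide:

```yaml
# Hypothetical test workload; replace the name and image with your own.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: nginx:1.25   # any container image you can pull
          ports:
            - containerPort: 80
```

Apply it with kubectl, and the pod and deployment attributes it produces will show up in the queries described below.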
Explore Your Data
In Honeycomb, you can slice and dice your data by Kubernetes attributes from your Home view, or create Boards to save relevant queries and visualizations. Either way, you will want to leverage Honeycomb’s features to create Triggers and investigate outliers using BubbleUp and Correlations.
Slice and Dice Your Data
Once you have Kubernetes data in Honeycomb, navigate to the Home view and select your Kubernetes dataset to begin exploring your data. At a minimum, you will see event data, which you can group by various Kubernetes attributes. If you have instrumented your code, you will also see trace data.
Create a Board
For quick reference over time, you should create a Board that you can customize to show Kubernetes-specific items of interest. When creating your Board, we recommend that you use one of our customized Board Templates for Kubernetes data, which will get you started with queries and visualizations of particular interest to Kubernetes users. You can locate Board Templates by selecting Explore Templates from the Home view. Our Kubernetes Board Templates include:

- Kubernetes Pod Metrics
- Kubernetes Node Metrics
- Kubernetes Workload Health
Kubernetes Pod Metrics: Queries and visualizations that help you investigate pod performance and resource usage within Kubernetes clusters.
For example, you could use the Kubernetes Pod Metrics Board Template to determine if a pod uses too many resources.
Queries include:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Pod CPU Usage | Shows the amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. In Kubernetes, one cpu is equivalent to 1 vCPU/core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | |
| Pod Memory Usage | Shows the amount of memory used by each Kubernetes pod. | |
| Pod Uptime Smokestacks | Because pod uptime increases continuously, this query uses the smokestack method, which applies a LOG10 to the Pod Uptime metric. Newly started or restarted pods stand out, while pods that have been running a long time eventually flatten into a straight line. | |
| Unhealthy Pods | Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events occur during start-up and resolve on their own, so the presence of a count is not necessarily bad. | |
| Pod CPU Utilization vs. Limit | When a CPU limit is present in a pod configuration, shows how much CPU each pod uses as a percentage of that limit. | |
| Pod CPU Utilization vs. Request | When a CPU request is present in a pod configuration, shows how much CPU each pod uses as a percentage of that request value. | |
| Pod Memory Utilization vs. Limit | When a memory limit is present in a pod configuration, shows how much memory each pod uses as a percentage of that limit value. | |
| Pod Memory Utilization vs. Request | When a memory request is present in a pod configuration, shows how much memory each pod uses as a percentage of that request value. | |
| Pod Network IO Rates | Displays Network IO RATE_MAX for transmit and receive network traffic (in bytes) as a stacked graph, giving the overall network rate and the individual rate for each node. | |
| Pods With Low Filesystem Availability | Shows any pods where filesystem availability is below 5 GB. | |
| Pod Filesystem Usage | Shows the amount of filesystem usage per Kubernetes pod, displayed in a stacked graph to show total filesystem usage across all pods. | |
| Pods Per Namespace | Shows the number of pods currently running in each Kubernetes namespace. | |
| Pods Per Node | Shows the number of pods currently running on each Kubernetes node. | |
| Pod Network Errors | Shows network errors in receive and transmit, grouped by pod. | |
| Pods Per Deployment | Shows the number of pods currently deployed in different Kubernetes deployments. | |
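The four Utilization vs. Limit and Utilization vs. Request queries only return data when requests and limits are actually set in the pod spec. As a minimal sketch of that configuration, with illustrative values rather than recommendations:

```yaml
# Illustrative values only; tune requests and limits for your workload.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: example/app:latest
      resources:
        requests:
          cpu: "250m"        # 0.25 of one cpu unit (1 vCPU/core)
          memory: "256Mi"
        limits:
          cpu: "500m"        # throttled beyond this
          memory: "512Mi"    # exceeding this can result in OOMKilled
```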
Create Triggers
Once you have created a Board, you will likely want to configure some Triggers, so you can receive notifications when your data in Honeycomb crosses defined thresholds. Some examples of triggers that may be of interest to Kubernetes users include:

- CPU Use: Pods or nodes that are reaching set CPU limits
- Memory Use: Pods experiencing OOMKilled or nodes that are reaching a certain memory usage limit
- Unhealthy pods: Pods that are experiencing a problematic status in Kubernetes Events, such as a reason of BackOff, Failed, Err, or Unhealthy
To create a Trigger from a Board query:

- On your Board, locate a Query from which you want to create a Trigger.
- Select the query to open it in the Query Builder display.
- Select the three-dot overflow menu, located to the left of Run Query, and select Make Trigger.
- Configure the trigger by defining trigger details, an alert threshold, and your notification preferences.
Investigate Outliers
Follow our guided Kubernetes sandbox scenario to see how you can simplify debugging by using the core analysis loop—Honeycomb BubbleUp, Correlations, and rich queryable data—to link application behavior to underlying infrastructure.
Examples
Let’s look at some examples to learn how to use Honeycomb to investigate some common issues.
Trace an Application Issue to Infrastructure
The OpenTelemetry Kubernetes Attributes Processor adds Kubernetes context to your telemetry, allowing for correlation with an application’s traces, metrics, and logs. With this data now on our spans, let’s investigate some slow traces and identify their cause.
Find Slow Traces
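The Kubernetes Attributes Processor mentioned above is enabled in the OpenTelemetry Collector configuration. A minimal sketch follows; the otlp receiver and exporter names are assumptions about your pipeline, not taken from this guide:

```yaml
# Sketch: enable the k8sattributes processor in a traces pipeline
# (OpenTelemetry Collector contrib distribution).
processors:
  k8sattributes:
    extract:
      metadata:
        - k8s.pod.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.deployment.name
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes, batch]
      exporters: [otlp]
```

With this in place, spans carry fields like k8s.pod.name, which the examples below rely on.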
To find slow traces:

- In Query Builder, enter VISUALIZE HEATMAP(duration_ms).
- Select Run Query. This creates a heatmap below the Query Builder. The slowest traces appear towards the top of the heatmap.
- In the heatmap, select a slow request towards the top of the chart. In this example, it appears as a teal square with a high duration.
- In the menu that appears, select View trace.


Identify the Cause
To investigate further, return to the previous query and use BubbleUp:

- In the top left of the Trace Waterfall view, select the back arrow next to “Query in all datasets”. The previous Query Results page with the heatmap appears.
- In the heatmap, draw a box around the slow trace data to define the selection. A menu appears.
- Select BubbleUp Outliers. The BubbleUp charts appear below the heatmap. BubbleUp creates charts to show differences between the selected slow requests and all other requests returned for the time window.
In this example, we can see that a specific pod (k8s.pod.name) looks to be a significant outlier and that audit job (audit_id) 130720 is failing.


Find Correlations
Now, let’s see if there are any correlations between our previously identified application issue and our infrastructure. Within the query results, select the Correlations tab below the heatmap. The dropdown window allows us to use a pre-existing Board with saved queries to correlate data with our Query Results. In our example, our Correlations board is Kubernetes Pod Metrics, which is available as a Board Template for your own use. The Correlations results show two indicators: spikes in pod memory and in CPU consumption.
These resource spikes coincide with the failing audit job 130720, which leads to the application issues we originally noticed.
Pinpoint Users Affected by an Infrastructure Issue
You can use the Kubernetes Workload Health Board to monitor and investigate infrastructure issues, and in conjunction with Query Builder, identify affected users. (Kubernetes Workload Health Board is available as a Board Template, which you can use and apply to your data.) In our example, the Unhealthy Nodes Query in our Kubernetes Workload Health Board is showing unhealthy nodes appearing off and on between the 23rd and 27th of October.
- Run a query with:
  - VISUALIZE: HEATMAP(duration_ms)
  - WHERE: user.id exists
  - GROUP BY: user.id
- Use the time picker to adjust the time window to the Last 7 days.
- Select Run Query.

- Select the Correlations tab below the heatmap.
- Select the dropdown window that displays the selected data source.
- Choose the Kubernetes Workload Health board from the available options.

- Select the Overview tab to the left of the Correlations tab. The query being used includes GROUP BY user.id, which allows you to group results by that field and see them listed in the Overview tab’s summary table. Hovering over each user.id’s row adjusts the heatmap display of slow traces above, so another correlation between a specific affected user and slow traces can be determined.
- Try adding additional fields to the existing query, such as user information like user.email, to the GROUP BY clause. The goal is to give more readable and potentially actionable information about the affected users.
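Putting that last step together, the extended query might look like this in Query Builder terms; user.email is a common convention, but the exact field name depends on your own instrumentation:

```
VISUALIZE: HEATMAP(duration_ms)
WHERE:     user.id exists
GROUP BY:  user.id, user.email
```

Grouping by both fields keeps the stable identifier for correlation while surfacing a human-readable address for follow-up with affected users.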