Use Board Templates | Honeycomb

Use Board Templates

Note
This feature is available for teams using the Environments and Services data model. Honeycomb Classic users must migrate first to use this feature.

Get insights into your system instantly with Board templates. The out-of-the-box boards provide you with custom queries and visualizations on your data with minimal clicking.

Access Templates 

To access Board Templates:

  From Boards:
  1. Select Boards in the left navigation bar.
  2. In the Boards display, select the Templates tab at the top of the page.

  From Home:
  1. Select Home in the left navigation bar. The Home display appears.
  2. In the Home display, select the Explore Templates button in the top right of the page.

Use Templates 

To create a board from a template:

  1. In the Templates tab, select a template. The next screen shows a preview of visualizations using your data.
  2. Review the preview to decide whether you want to create a board from those queries. A progress bar indicates how many of the template's queries can be populated from your data. Select the Review Setup link or view the Setup tab to learn more about unavailable queries and see tips on how to interpret template queries.
    Tip
    To troubleshoot missing queries, add any required fields related to the board’s queries, as found in our available templates or in the Setup tab of the template.
  3. In the upper right corner, select Use Template. The new board contains the queries visible in the template. Queries that are missing when the board is created are not included.

Available Templates 

Refinery 

For teams using Refinery to sample their data, the Refinery board template gives an overview of sampling operations.

Tip
Refinery emits metrics that indicate its health as well as its trace throughput and sampling statistics. The required fields in the Refinery board template correspond to these Refinery metric fields, and populate automatically when sent to Honeycomb. Read more about these fields in Refinery Configuration.

The Refinery board template includes the following queries:

Query Name | Query Description | Fields Required
Stress Relief Status | Current stress level of the Refinery cluster. | stress_level; stress_relief_activated; hostname or host.name
Dropped From Stress | Total traces dropped due to stress on the Refinery cluster. | dropped_from_stress; hostname or host.name
Stress Relief Log | Reasons why Refinery is going into stress relief. | StressRelief; reason; msg; hostname or host.name
Cache Health | Metrics for cache health. | collect_cache_buffer_overrun; memory_inuse; collect_cache_entries_max or collect_cache_entries.max; collect_cache_capacity; num_goroutines; process_uptime_seconds; hostname or host.name
Cache Ejections | Number of traces ejected from cache. | trace_send_ejected_full; trace_send_ejected_memsize; hostname or host.name
Intercommunications | Total events from outside Refinery and events redirected from a peer. | incoming_router_span; peer_router_batch; hostname or host.name
Receive Buffers | Receive buffer operations. | incoming_router_dropped; peer_router_dropped; hostname or host.name
Peer Send Buffers | Metrics for the queue used to buffer spans to send to peer nodes. | libhoney_peer_queue_overflow; libhoney_peer_send_errors; hostname or host.name
Upstream Send Buffers | Metrics for the queue used to buffer spans to send to Honeycomb. | libhoney_upstream_queue_length; libhoney_upstream_enqueue_errors; libhoney_upstream_response_errors; libhoney_upstream_send_errors; libhoney_upstream_send_retries; hostname or host.name
EMADynamicSampler Performance | EMADynamicSampler sampling effectiveness. | emadynamic_sample_rate_avg; emadynamic_keyspace_size; emadynamic_num_kept; emadynamic_num_dropped
EMAThroughputSampler Performance | EMAThroughputSampler sampling effectiveness. | emathroughput_sample_rate_avg; emathroughput_keyspace_size; emathroughput_num_kept; emathroughput_num_dropped
WindowedThroughput Performance | WindowedThroughput sampling effectiveness. | windowedthroughput_sample_rate_avg; windowedthroughput_keyspace_size; windowedthroughput_num_kept; windowedthroughput_num_dropped
TotalThroughputSampler Performance | TotalThroughputSampler sampling effectiveness. | totalthroughput_sample_rate_avg; totalthroughput_keyspace_size; totalthroughput_num_kept; totalthroughput_num_dropped
Dynamic Performance | Dynamic sampling effectiveness. | dynamic_sample_rate_avg; dynamic_keyspace_size; dynamic_num_kept; dynamic_num_dropped
RulesBasedSampler Performance | RulesBasedSampler sampling effectiveness. | rulesbased_sample_rate_avg; rulesbased_num_kept; rulesbased_num_dropped
Trace Indicators | Total traces sent before completion and spans received for a trace already sent. | trace_sent_cache_hit; trace_send_no_root
Sampling Decisions | Total traces accepted and sent or dropped. | trace_accepted; trace_send_dropped; trace_send_kept
Refinery Send Event Error Logs | Errors when Refinery sends events to its peers or upstream to our API server. | msg; dataset; api_host; error
Refinery Handler Event Error Logs | Errors when receiving or parsing events being sent to a node. | msg; dataset; api_host; error.err; error.msg
Refinery Events Exceeding Max Size | Errors when events are too large to be sent to Honeycomb. | msg; dataset; api_host; error

Service Health 

The Service Health board template gives an overview of the health of your services. It provides insights into request volumes, where the slowest requests are occurring, and more.

Tip
The required fields in the Service Health board template are derived from Dataset Definitions. Send the corresponding fields to automatically populate field definitions or manually configure another field.

The Service Health board template includes the following queries:

Query Name | Query Description | Fields Required
Trace Counts by Service | View total trace volume by service. | Parent Span Id; Service Name
Trace Counts by HTTP Status Code | View total trace volume by status code. | Parent Span Id; HTTP Status Code
Trace Duration Heatmap | Heatmap of the duration for all traces in the environment. | Duration; Parent Span Id
Duration Heatmap | View the heatmap for duration across all services. | Duration
Duration by Service | View key duration percentiles by service. | Duration; Service Name
Duration by Route | View duration by route or endpoint. | Duration; Route
Duration by Name | View duration by function name. | Duration; Name
Errors by Service | View errors by service. | Error; Service Name
Errors by Route | View errors by route or endpoint. | Error; Route

Real User Monitoring (RUM) 

The RUM board template gives an overview of real user monitoring information in your frontend applications.

Tip
The required fields in the RUM board template are derived from Dataset Definitions. Send the corresponding fields to automatically populate field definitions or manually configure another field. Learn more about instrumenting your frontend application.

The RUM board template includes the following queries:

Query Name | Query Description | Fields Required
Total Pageviews | Total number of pageviews | Name
Most Visited Pages | Pages with most views | Name; Route
Pages with most interactions | Pages with the most clicks | Name; Route
Average and p95 Duration of Page Loads | Average and 95th percentile duration for page loads | Duration; Name
Largest Page Resources | View and investigate heatmaps for the largest page resources | Name; Duration; Route

Kubernetes 

Tip
Use the Kubernetes Quick Start to instrument the required fields for all Kubernetes board templates.

Kubernetes Pod Metrics 

The Kubernetes Pod Metrics board template includes queries to help you investigate pod performance and resource usage within Kubernetes clusters.

Query Name | Query Description | Fields Required
Pod CPU Usage | The amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | k8s.pod.cpu.utilization; k8s.pod.name
Pod Memory Usage | The amount of memory being used by each Kubernetes pod. | k8s.pod.memory.usage; k8s.pod.name
Pod Uptime Smokestacks | Because pod uptime increases continuously, this query uses the smokestack method, which applies LOG10 to the pod uptime metric. Newly started or restarted pods stand out more than long-running pods, which eventually flatten into a straight line. | LOG10($k8s.pod.uptime); k8s.pod.name; k8s.pod.uptime
Unhealthy Pods | This query shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved, so the presence of a count isn’t necessarily bad. | k8s.namespace.name; k8s.pod.name; reason
Pod CPU Utilization vs. Limit | When a CPU Limit is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that limit. | k8s.pod.cpu_limit_utilization; k8s.pod.name
Pod CPU Utilization vs. Request | When a CPU Request is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that request value. | k8s.pod.cpu_request_utilization; k8s.pod.name
Pod Memory Utilization vs. Limit | When a Memory Limit is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that limit value. | k8s.pod.memory_limit_utilization; k8s.pod.name
Pod Memory Utilization vs. Request | When a Memory Request is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that request value. | k8s.pod.memory_request_utilization; k8s.pod.name
Pod Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives the overall network rate and the individual rate for each node. | k8s.pod.name; k8s.pod.network.io.receive; k8s.pod.network.io.transmit
Pods With Low Filesystem Availability | Shows any pods where filesystem availability is below 5 GB. | k8s.pod.filesystem.available; k8s.pod.name
Pod Filesystem Usage | Shows the amount of filesystem usage per Kubernetes pod, displayed as a stacked graph to show total filesystem usage of all pods. | k8s.pod.filesystem.usage; k8s.pod.name
Pods Per Namespace | Shows the number of pods currently running in each Kubernetes namespace. | k8s.namespace.name; k8s.pod.name
Pods Per Node | Shows the number of pods currently running in each Kubernetes Node. | k8s.node.name; k8s.pod.name
Pod Network Errors | Shows network errors in receive and transmit, grouped by pod. | k8s.pod.name; k8s.pod.network.errors.receive; k8s.pod.network.errors.transmit
Pods Per Deployment | The number of pods currently deployed in different Kubernetes deployments. | k8s.deployment.name; k8s.pod.name
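
The smokestack method used by the Pod Uptime Smokestacks query can be illustrated with a small local sketch. It is not Honeycomb's implementation: the function name, the sample pods, the assumption that uptime is reported in seconds, and the handling of non-positive values are all illustrative choices.

```python
import math
from typing import Optional


def smokestack(uptime_seconds: float) -> Optional[float]:
    """Mirror LOG10($k8s.pod.uptime) for a single uptime value.

    Returning None for non-positive input is an assumption of this sketch;
    the logarithm is undefined there.
    """
    if uptime_seconds <= 0:
        return None
    return math.log10(uptime_seconds)


# Hypothetical pods: one restarted a minute ago, two long-running.
uptimes = {"pod-restarted": 60, "pod-one-day": 86_400, "pod-ten-days": 864_000}

for name, seconds in uptimes.items():
    print(f"{name}: {smokestack(seconds):.2f}")
```

On the log scale, a tenfold difference in uptime between the two long-running pods becomes a difference of 1, while the recently restarted pod sits far below both. That separation is what makes restarts easy to spot on the graph.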

Kubernetes Node Metrics 

The Kubernetes Node Metrics board template includes queries to help you investigate node performance and resource usage within Kubernetes clusters.

Query Name | Query Description | Fields Required
Node CPU Usage | The amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | k8s.node.cpu.utilization; k8s.node.name
Node Memory Utilization | Shows percent of memory used on each Kubernetes node. | IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100)); k8s.node.memory.available; k8s.node.memory.usage; k8s.node.name
Node Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives the overall network rate and the individual rate for each node. | k8s.node.name; k8s.node.network.io.receive; k8s.node.network.io.transmit
Unhealthy Nodes | This query shows errors that Kubernetes nodes are experiencing. | k8s.namespace.name; k8s.node.name; reason; severity_text
Node Filesystem Utilization | Shows percent of filesystem used on each node. | IF(EXISTS($k8s.node.filesystem.usage), MUL(DIV($k8s.node.filesystem.usage, $k8s.node.filesystem.capacity), 100)); k8s.node.filesystem.capacity; k8s.node.filesystem.usage; k8s.node.name
Node Uptime Smokestack | Because node uptime increases continuously, this query uses the smokestack method, which applies LOG10 to the node uptime metric. Newly started or restarted nodes stand out more than long-running nodes, which eventually flatten into a straight line. | LOG10($k8s.node.uptime); k8s.node.name; k8s.node.uptime
Node Network Errors | Shows network transmit and receive errors for each node. | k8s.node.name; k8s.node.network.errors.receive; k8s.node.network.errors.transmit
Pods and Containers per Node | Shows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment. | k8s.container.name; k8s.node.name; k8s.pod.name
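
The Node Filesystem Utilization expression follows an "if the field exists, divide and multiply by 100" pattern. The sketch below reproduces that arithmetic locally to show what the resulting percentage means. The function name and sample values are hypothetical, and returning None for a zero capacity is an assumption of this sketch rather than documented Honeycomb behavior.

```python
from typing import Optional


def percent_utilization(used: Optional[float], capacity: Optional[float]) -> Optional[float]:
    """Mirror IF(EXISTS($used), MUL(DIV($used, $capacity), 100)).

    A missing value (None) stands in for a field that does not exist on the event.
    """
    if used is None or capacity is None or capacity == 0:
        return None
    return used / capacity * 100


# Hypothetical per-node samples in bytes: (filesystem usage, filesystem capacity).
nodes = {
    "node-a": (50e9, 200e9),
    "node-b": (180e9, 200e9),
    "node-c": (None, 200e9),  # usage field not reported for this node
}

for name, (used, capacity) in nodes.items():
    pct = percent_utilization(used, capacity)
    print(name, "no data" if pct is None else f"{pct:.1f}%")
```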

Kubernetes Workload Health 

The Kubernetes Workload Health board template includes queries that help you investigate Kubernetes-related application problems.

Query Name | Query Description | Fields Required
Container Restarts | Shows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero. | k8s.container.name; k8s.container.restarts; k8s.namespace.name; k8s.pod.name
Unhealthy Pods | This query shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved, so the presence of a count isn’t necessarily bad. | k8s.namespace.name; k8s.pod.name; reason
Pending Pods | Find pods in a “Pending” state. | k8s.pod.name; k8s.pod.phase
Failed Pods | Find pods in a “Failed” or “Unknown” state. | k8s.pod.name; k8s.pod.phase
Unhealthy Nodes | This query shows errors that Kubernetes nodes are experiencing. | k8s.namespace.name; k8s.pod.name; reason; severity_text
Unhealthy Volumes | This query shows volume creation and attachment failures. | k8s.namespace.name; k8s.pod.name; reason; severity_text
Unscheduled Daemonset Pods | Track cases where a pod in a daemonset is not currently running on every node in the cluster as it should be. | SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes); k8s.daemonset.current_scheduled_nodes; k8s.daemonset.desired_scheduled_nodes; k8s.daemonset.name; k8s.namespace.name
Stateful Set Pod Readiness | Track any stateful sets where pods that should be in a ready state are in a non-ready state. | SUB($k8s.statefulset.desired_pods, $k8s.statefulset.ready_pods); k8s.statefulset.desired_pods; k8s.statefulset.name; k8s.statefulset.ready_pods
Deployment Pod Status | Look for Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”. | SUB($k8s.deployment.desired, $k8s.deployment.available); k8s.deployment.available; k8s.deployment.desired; k8s.deployment.name
Job Failures | Track the number of failed pods in Kubernetes jobs. | k8s.job.failed_pods; k8s.job.name
Active Cron Jobs | Track the number of active pods in each Kubernetes cron job. | k8s.cronjob.active_jobs; k8s.cronjob.name
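
The daemonset, stateful set, and deployment queries all subtract an "actual" count from a "desired" count. As a rough illustration of how to read the result, the sketch below applies the same subtraction to hypothetical deployment data. The function name and sample values are made up for this example.

```python
def pods_not_ready(desired: int, available: int) -> int:
    """Mirror SUB($k8s.deployment.desired, $k8s.deployment.available)."""
    return desired - available


# Hypothetical sample data: deployment name -> (desired pods, available pods).
deployments = {"checkout": (3, 3), "cart": (5, 2), "search": (2, 1)}

# Keep only deployments with a result greater than zero, which means
# some pods are not yet ready.
lagging = {
    name: pods_not_ready(desired, available)
    for name, (desired, available) in deployments.items()
    if pods_not_ready(desired, available) > 0
}
print(lagging)  # {'cart': 3, 'search': 1}
```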

OpenTelemetry Collector Operations 

The OpenTelemetry Collector Operations board template includes queries with useful metrics that are emitted by the OpenTelemetry Collector during its operation.

Query Name | Query Description | Fields Required
Exporter Span Failures | Shows when errors happen during enqueueing or sending in exporters. | net.host.name; otelcol_exporter_enqueue_failed_spans; otelcol_exporter_send_failed_spans
Collector Uptime Smokestacks | Shows the uptime for different pods with LOG10 applied to make it clearer where restarts are happening. | LOG10($otelcol_process_uptime); net.host.name; otelcol_process_uptime
Exporter Metric Send Failures | Shows when errors happen during sending from exporters. | net.host.name; otelcol_exporter_enqueue_failed_metric_points; otelcol_exporter_send_failed_metric_points
Exporter Metrics Enqueue Failures | Shows when errors happen during enqueueing in exporters. | net.host.name; otelcol_exporter_send_failed_metric_points
Exporter Log Records Failures | Shows when errors happen during enqueueing or sending in exporters. | net.host.name; otelcol_exporter_enqueue_failed_log_records

Learn more about the OpenTelemetry Collector.

OpenTelemetry Java Metrics 

The OpenTelemetry Java Metrics board template includes queries that help you investigate application problems related to the Java Virtual Machine (JVM). Metrics for Java applications come from the JVM, and are reported by the OpenTelemetry Java Agent or the Honeycomb OpenTelemetry Distribution for Java.

Query Name | Query Description | Fields Required
JVM Memory Usage (Young Generation) | Eden space on the JVM heap is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents maximum usable memory. | host.name; pool; process.runtime.jvm.memory.committed; process.runtime.jvm.memory.limit; process.runtime.jvm.memory.usage; process.runtime.jvm.memory.usage_after_last_gc; service.name; type
JVM Memory Usage (Old Generation) | Tenured Gen JVM heap space stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory. | host.name; pool; process.runtime.jvm.memory.committed; process.runtime.jvm.memory.limit; process.runtime.jvm.memory.usage; process.runtime.jvm.memory.usage_after_last_gc; service.name; type
JVM GC (Garbage Collection) Activity | JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive. | process.runtime.jvm.gc.duration.count; action; gc; host.name; process.runtime.jvm.gc.duration.avg; process.runtime.jvm.gc.duration.max; service.name
JVM CPU Utilization | Shows system CPU utilization and 1-minute load average, as captured by the JVM. | host.name; process.runtime.jvm.cpu.utilization; process.runtime.jvm.system.cpu.load_1m; service.name
JVM Buffer Memory Usage | Buffer memory is provided by the OS and is outside the JVM’s heap memory allocation. It is used by Java NIO to quickly write data to network or disk. | host.name; process.runtime.jvm.buffer.limit; process.runtime.jvm.buffer.usage; service.name
JVM Non-Heap Memory Usage | JVM non-heap memory is allocated above and beyond the heap size you’ve configured. It is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected. | host.name; pool; process.runtime.jvm.memory.committed; process.runtime.jvm.memory.limit; process.runtime.jvm.memory.usage; service.name; type

Troubleshooting 

Missing Queries 

Visualizations in board templates depend on specific fields being available in your data. A query appears as missing when one or more of its required fields are not present. To eliminate missing queries and gain additional insights, send the required fields or configure them in your data. For more information, refer to our available templates and their required fields.
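
As a rough way to reason about missing queries before opening a template, the sketch below compares a set of field names against a few rows copied from the Refinery template table. It runs locally and does not call any Honeycomb API. The data structure, function names, and the sample set of available fields are hypothetical, and the matching logic is an assumption about how "required fields" can be read, not Honeycomb's actual check.

```python
from typing import Dict, List, Set, Tuple

# Each query maps to its requirements. A requirement is a tuple of acceptable
# alternatives, so ("hostname", "host.name") means "hostname or host.name".
Requirement = Tuple[str, ...]

REFINERY_QUERIES: Dict[str, List[Requirement]] = {
    "Stress Relief Status": [
        ("stress_level",),
        ("stress_relief_activated",),
        ("hostname", "host.name"),
    ],
    "Dropped From Stress": [("dropped_from_stress",), ("hostname", "host.name")],
    "Cache Ejections": [
        ("trace_send_ejected_full",),
        ("trace_send_ejected_memsize",),
        ("hostname", "host.name"),
    ],
}


def missing_fields(requirements: List[Requirement], available: Set[str]) -> List[str]:
    """Return the requirements that no available field satisfies."""
    return [
        " or ".join(alternatives)
        for alternatives in requirements
        if not any(field in available for field in alternatives)
    ]


def template_report(queries: Dict[str, List[Requirement]], available: Set[str]) -> Dict[str, List[str]]:
    """Map each query name to its unmet requirements (empty list = populated)."""
    return {name: missing_fields(reqs, available) for name, reqs in queries.items()}


# Hypothetical set of fields currently present in your data.
available = {"stress_level", "stress_relief_activated", "dropped_from_stress", "host.name"}

report = template_report(REFINERY_QUERIES, available)
populated = sum(1 for unmet in report.values() if not unmet)
print(f"{populated} of {len(report)} queries can be populated")
for name, unmet in report.items():
    if unmet:
        print(f"{name}: missing {', '.join(unmet)}")
```

Modeling each requirement as a tuple of alternatives keeps the "hostname or host.name" rows from the table readable without special-casing them.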