Get insights into your system instantly with Board templates. The out-of-the-box boards provide you with custom queries and visualizations on your data with minimal clicking.
To access Board Templates:
To create a board from a template:
When creating a board from a template, customize the template with your own fields at the point of previewing the board template:
Before completing your board template creation, use Copy Link to copy and share this customized board template with other users. Selecting your shared link directs users to the setup page of this template preview with the customized fields populated.
For teams using Refinery to sample their data, the Refinery board template gives an overview of sampling operations.
The Refinery board template includes the following queries:
Query Name | Query Description | Fields Required |
---|---|---|
Stress Relief Status | Current stress level of the Refinery cluster. | stress_level stress_relief_activated hostname or host.name |
Dropped From Stress | Total traces dropped due to stress on the Refinery cluster. | dropped_from_stress hostname or host.name |
Stress Relief Log | Reasons why Refinery is going into stress relief. | StressRelief reason msg hostname or host.name |
Cache Health | Metrics for cache health. | collect_cache_buffer_overrun memory_inuse collect_cache_entries_max or collect_cache_entries.max collect_cache_capacity num_goroutines process_uptime_seconds hostname or host.name |
Cache Ejections | Number of traces ejected from cache. | trace_send_ejected_full trace_send_ejected_memsize hostname or host.name |
Intercommunications | Total events from outside Refinery and events redirected from a peer. | incoming_router_span peer_router_batch hostname or host.name |
Receive Buffers | Receive buffers operations. | incoming_router_dropped peer_router_dropped hostname or host.name |
Peer Send Buffers | Metrics for the queue used to buffer spans to send to peer nodes. | libhoney_peer_queue_overflow libhoney_peer_send_errors hostname or host.name |
Upstream Send Buffers | Metrics for the queue used to buffer spans to send to Honeycomb. | libhoney_upstream_queue_length libhoney_upstream_enqueue_errors libhoney_upstream_response_errors libhoney_upstream_send_errors libhoney_upstream_send_retries hostname or host.name |
EMADynamicSampler Performance | EMADynamicSampler sampling effectiveness. | emadynamic_sample_rate_avg emadynamic_keyspace_size emadynamic_num_kept emadynamic_num_dropped |
EMAThroughputSampler Performance | EMAThroughputSampler sampling effectiveness. | emathroughput_sample_rate_avg emathroughput_keyspace_size emathroughput_num_kept emathroughput_num_dropped |
WindowedThroughput Performance | WindowedThroughput sampling effectiveness. | windowedthroughput_sample_rate_avg windowedthroughput_keyspace_size windowedthroughput_num_kept windowedthroughput_num_dropped |
TotalThroughputSampler Performance | TotalThroughputSampler sampling effectiveness. | totalthroughput_sample_rate_avg totalthroughput_keyspace_size totalthroughput_num_kept totalthroughput_num_dropped |
Dynamic Performance | Dynamic sampling effectiveness. | dynamic_sample_rate_avg dynamic_keyspace_size dynamic_num_kept dynamic_num_dropped |
RulesBasedSampler Performance | RulesBasedSampler sampling effectiveness. | rulesbased_sample_rate_avg rulesbased_num_kept rulesbased_num_dropped |
Trace Indicators | Total traces sent before completion and span received for a trace already sent. | trace_sent_cache_hit trace_send_no_root |
Sampling Decisions | Total traces accepted and sent or dropped. | trace_accepted trace_send_dropped trace_send_kept |
Refinery Send Event Error Logs | Errors when sending events to its peers or upstream to our API server. | msg dataset api_host error |
Refinery Handler Event Error Logs | Errors when receiving or parsing events being sent to a node. | msg dataset api_host error.err error.msg |
Refinery Events Exceeding Max Size | Errors when events are too large to be sent to Honeycomb. | msg dataset api_host error |
The Service Health board template gives an overview of the health of your services. It provides insights into request volumes, where the slowest requests are occurring, and more.
The Service Health board template includes the following queries:
Query Name | Query Description | Fields Required |
---|---|---|
Trace Counts by Service | View total trace volume by service. | Parent Span Id, Service Name |
Trace Counts by HTTP Status Code | View total trace volume by status code. | Parent Span Id, HTTP Status Code |
Trace Duration Heatmap | Heatmap of the duration for all traces in the environment. | Duration, Parent Span Id |
Duration Heatmap | View the heatmap for duration across all services. | Duration |
Duration by Service | View key duration percentiles by service. | Duration, Service Name |
Duration by Route | View duration by route or endpoint. | Duration, Route |
Duration by Name | View duration by function name. | Duration, Name |
Errors by Service | View errors by service. | Error, Service Name |
Errors by Route | View errors by route or endpoint. | Error, Route |
The RUM board template gives an overview of real user monitoring information in your frontend applications.
The RUM board template includes the following queries:
Query Name | Query Description | Fields Required |
---|---|---|
Largest Contentful Paint (LCP) | Ratings based on the render time for the largest content on a page | lcp.rating name |
Cumulative Layout Shift (CLS) | Ratings based on the stability of content layout on a page | cls.rating name |
Interaction to Next Paint (INP) | Ratings based on the responsiveness of a page | inp.rating name |
Largest Contentful Paint P75 | The 75th percentile for LCP | name lcp.value |
Cumulative Layout Shift P75 | The 75th percentile for CLS | cls.value name |
Interaction to Next Paint P75 | The 75th percentile for INP | inp.value name |
Total Events by Type | Event types ranked by occurrence | name meta.annotation_type |
Largest Resource Requests | The largest resource requests ranked by the average length of their response content | http.response_content_length http.url name |
Top 5 Endpoints by Request Count | Top 5 endpoints ranked by number of requests | http.method name http.url |
Slowest Requests by Endpoint | The slowest endpoints based on the 75th percentile of request durations | http.url duration_ms name |
Top Landing Pages by Session Count | The most visited landing pages ranked by session count | entry_page.path name |
Pages With the Most Events | Pages with the highest number of events, highlighting the most active pages | Route |
The Activity Log Security board template includes queries that show API Key activity.
Query Name | Query Description | Fields Required |
---|---|---|
API Key Added Permissions | Shows when permissions are added to an existing API key. | resource.type resource.changed_fields environment.slug |
API Key Activities by User | Displays the number of changes to API keys broken down by user. | key_type environment.slug user.email resource.action |
Authentication Type by User | Displays which type of authentication is used for each user. | authentication_method user.email |
The Activity Log Leaderboard board template includes queries that show advanced and frequent usage of Honeycomb by your team.
Query Name | Query Description | Fields Required |
---|---|---|
Queries by User | Shows which environments are being queried. | resource.type user.email |
Complex Queries by User | Shows which users frequently use Visualize, Where, and Having clauses. | resource.type SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)) user.email |
Top Query Visualizations | Shows the most commonly used visualizations. | resource.type SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)) query.visualize |
Top Tinkerers | Lists which users perform the most updates to SLOs, Triggers, and Derived Columns. | resource.type user.email |
Queries by Dataset | Shows which datasets are being queried the most. | resource.type environment.slug dataset.slug |
Queries by Environment | Shows a count of queries run, grouped by environment. | resource.type environment.slug |
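
The calculated field used by Complex Queries by User and Top Query Visualizations adds 3 points when a Having clause exists, plus the number of commas found in the Where and Visualize clauses. The Python sketch below mirrors that arithmetic outside of Honeycomb. The function name, the argument names, and the sample clause strings are hypothetical, and the real format of `query.where` and `query.visualize` in the activity log may differ.

```python
import re
from typing import Optional


def complexity_score(having: Optional[str],
                     where: Optional[str],
                     visualize: Optional[str]) -> int:
    """Approximate SUM(IF(EXISTS($query.having), 3, 0),
    REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)).

    Assumption: a missing field is represented as None here.
    """
    # IF(EXISTS($query.having), 3, 0)
    having_points = 3 if having is not None else 0
    # REG_COUNT(..., `,`) is modeled as a count of comma matches.
    where_points = len(re.findall(",", where or ""))
    visualize_points = len(re.findall(",", visualize or ""))
    return having_points + where_points + visualize_points


# Hypothetical clause strings, for illustration only.
print(complexity_score("COUNT > 10",
                       "a = 1, b = 2",
                       "COUNT, P99(duration_ms), HEATMAP(duration_ms)"))
```

Because the score counts separators rather than clauses, a single Where or Visualize clause contributes zero points under this reading.
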
The Activity Log Trigger and SLO Activity board template includes queries related to trigger and SLO activations and modifications.
Query Name | Query Description | Fields Required |
---|---|---|
Trigger State Changes | Shows instances when triggers have been triggered or resolved. | resource.type resource.action name |
Trigger Modifications | Shows creations, modifications, and deletions of triggers. | resource.type resource.action |
Most Updated Triggers | Shows triggers that received the most changes recently. | resource.type resource.action name |
Top Updated SLOs by Update Type | Shows creations, modifications, and deletions of SLOs and the supporting SLI (Derived Column). | resource.type resource.action environment.slug resource.changed_fields name user.email |
SLOs Created and Deleted | Shows creation and deletion of SLOs. | resource.type resource.action environment.slug name resource.changed_fields user.email |
SLI Expression Changes by SLO | Shows when SLIs (derived columns) related to SLOs have been changed. | resource.type resource.action resource.changed_fields environment.slug name sli.expression before.sli.expression user.email |
The Kubernetes Pod Metrics board template includes queries to help you investigate pod performance and resource usage within Kubernetes clusters.
Query Name | Query Description | Fields Required |
---|---|---|
Pod CPU Usage | The amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | k8s.pod.cpu.utilization k8s.pod.name |
Pod Memory Usage | The amount of memory being used by each Kubernetes pod. | k8s.pod.memory.usage k8s.pod.name |
Pod Uptime Smokestacks | Pod uptime only ever increases, so this query uses the smokestack method, which applies LOG10 to the Pod Uptime metric. Newly started or restarted pods stand out, while pods that have been running a long time flatten into a nearly straight line. | LOG10($k8s.pod.uptime) k8s.pod.name k8s.pod.uptime |
Unhealthy Pods | This query shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad. | k8s.namespace.name k8s.pod.name reason |
Pod CPU Utilization vs. Limit | When a CPU Limit is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that limit. | k8s.pod.cpu_limit_utilization k8s.pod.name |
Pod CPU Utilization vs. Request | When a CPU Request is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that request value. | k8s.pod.cpu_request_utilization k8s.pod.name |
Pod Memory Utilization vs. Limit | When a Memory Limit is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that limit value. | k8s.pod.memory_limit_utilization k8s.pod.name |
Pod Memory Utilization vs. Request | When a Memory Request is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that request value. | k8s.pod.memory_request_utilization k8s.pod.name |
Pod Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic (in bytes) as a stacked graph, and gives the overall network rate and the individual rate for each node. | k8s.pod.name k8s.pod.network.io.receive k8s.pod.network.io.transmit |
Pods With Low Filesystem Availability | Shows any pods where filesystem availability is below 5 GB. | k8s.pod.filesystem.available k8s.pod.name |
Pod Filesystem Usage | Shows the amount of filesystem usage per Kubernetes pod, displayed in a stack graph to show total filesystem usage of all pods. | k8s.pod.filesystem.usage k8s.pod.name |
Pods Per Namespace | Shows the number of pods currently running in each Kubernetes namespace. | k8s.namespace.name k8s.pod.name |
Pods Per Node | Shows the number of pods currently running in each Kubernetes Node. | k8s.node.name k8s.pod.name |
Pod Network Errors | Shows network errors in receive and transmit, grouped by pod. | k8s.pod.name k8s.pod.network.errors.receive k8s.pod.network.errors.transmit |
Pods Per Deployment | The number of pods currently deployed in different Kubernetes deployments. | k8s.deployment.name k8s.pod.name |
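
To see why the smokestack method in Pod Uptime Smokestacks makes restarts easy to spot, it can help to apply the same LOG10 transform to a few values by hand. The following Python sketch does that. The function name and the sample uptimes are hypothetical, and raising an error for non-positive input is a choice made for this sketch, not a statement about how Honeycomb's LOG10 behaves.

```python
import math


def smokestack(uptime_seconds: float) -> float:
    """Apply a base-10 logarithm to an uptime, as LOG10($k8s.pod.uptime) does."""
    if uptime_seconds <= 0:
        # Sketch-only guard: log10 is undefined for zero or negative input.
        raise ValueError("uptime must be positive")
    return math.log10(uptime_seconds)


# Hypothetical uptimes in seconds, for illustration only.
samples = {
    "just-restarted": 10,
    "one-hour": 3_600,
    "one-day": 86_400,
    "ten-days": 864_000,
}
for name, seconds in samples.items():
    print(f"{name}: {smokestack(seconds):.2f}")
```

Each tenfold increase in uptime adds only 1 to the transformed value, so the first minutes after a restart climb steeply while long-running pods barely move.
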
The Kubernetes Node Metrics board template includes queries to help you investigate node performance and resource usage within Kubernetes clusters.
Query Name | Query Description | Fields Required |
---|---|---|
Node CPU Usage | The amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | k8s.node.cpu.utilization k8s.node.name |
Node Memory Utilization | Shows percent of memory used on each Kubernetes node. | IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100)) k8s.node.memory.available k8s.node.memory.usage k8s.node.name |
Node Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives overall network rate and the individual rate for each node. | k8s.node.name k8s.node.network.io.receive k8s.node.network.io.transmit |
Unhealthy Nodes | This query shows errors that Kubernetes nodes are experiencing. | k8s.namespace.name k8s.node.name reason severity_text |
Node Filesystem Utilization | Shows percent of filesystem used on each node. | IF(EXISTS($k8s.node.filesystem.usage),MUL(DIV($k8s.node.filesystem.usage,$k8s.node.filesystem.capacity), 100)) k8s.node.filesystem.capacity k8s.node.filesystem.usage k8s.node.name |
Node Uptime Smokestack | Node uptime only ever increases, so this query uses the smokestack method, which applies LOG10 to the Node Uptime metric. Newly started or restarted nodes stand out, while nodes that have been running a long time flatten into a nearly straight line. | LOG10($k8s.node.uptime) k8s.node.name k8s.node.uptime |
Node Network Errors | Shows network transmit and receive errors for each node. | k8s.node.name k8s.node.network.errors.receive k8s.node.network.errors.transmit |
Pods and Containers per Node | Shows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment. | k8s.container.name k8s.node.name k8s.pod.name |
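
The utilization queries in this template use calculated fields of the form IF(EXISTS(...), MUL(DIV(...), 100)), which turn a usage value and a capacity value into a percentage only when the source field is present. Below is a minimal Python sketch of that pattern. The function name is hypothetical, and returning None for a zero capacity is an assumption of this sketch rather than documented Honeycomb behavior.

```python
from typing import Optional


def utilization_percent(used: Optional[float],
                        capacity: Optional[float]) -> Optional[float]:
    """Mirror IF(EXISTS($used), MUL(DIV($used, $capacity), 100)).

    Returns None when a field is missing, so the event contributes
    no value instead of a misleading 0%.
    """
    if used is None or capacity is None:
        return None
    if capacity == 0:
        # Sketch-only guard against division by zero.
        return None
    return used / capacity * 100


# Hypothetical values in bytes, for illustration only.
print(utilization_percent(25.0, 100.0))
print(utilization_percent(None, 100.0))
```
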
The Kubernetes Workload Health board template includes queries that help you investigate Kubernetes-related application problems.
Query Name | Query Description | Fields Required |
---|---|---|
Container Restarts | Shows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero. | k8s.container.name k8s.container.restarts k8s.namespace.name k8s.pod.name |
Unhealthy Pods | This query shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad. | k8s.namespace.name k8s.pod.name reason |
Pending Pods | Find pods in a “Pending” state. | k8s.pod.name k8s.pod.phase |
Failed Pods | Find pods in a “Failed” or “Unknown” state. | k8s.pod.name k8s.pod.phase |
Unhealthy Nodes | This query shows errors that Kubernetes nodes are experiencing. | k8s.namespace.name k8s.pod.name reason severity_text |
Unhealthy Volumes | This query shows volume creation and attachment failures. | k8s.namespace.name k8s.pod.name reason severity_text |
Unscheduled Daemonset Pods | Track cases where a pod in a daemonset is not currently running on every node in the cluster as it should be. | SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes) k8s.daemonset.current_scheduled_nodes k8s.daemonset.desired_scheduled_nodes k8s.daemonset.name k8s.namespace.name |
Stateful Set Pod Readiness | Track any stateful sets where pods that should be in a ready state are in a non-ready state. | SUB($k8s.statefulset.desired_pods,$k8s.statefulset.ready_pods) k8s.statefulset.desired_pods k8s.statefulset.name k8s.statefulset.ready_pods |
Deployment Pod Status | Look for Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”. | SUB($k8s.deployment.desired,$k8s.deployment.available) k8s.deployment.available k8s.deployment.desired k8s.deployment.name |
Job Failures | Track the number of failed pods in Kubernetes jobs. | k8s.job.failed_pods k8s.job.name |
Active Cron Jobs | Track the number of active pods in each Kubernetes cron job. | k8s.cronjob.active_jobs k8s.cronjob.name |
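
The daemonset, stateful set, and deployment queries above all rely on the same idea: subtract the actual count from the desired count with SUB, and treat any result greater than zero as a shortfall. A minimal Python sketch of that check follows. The function names, the row layout, and the sample data are hypothetical.

```python
from typing import Dict, List


def shortfall(desired: int, actual: int) -> int:
    """Mirror SUB($desired, $actual): how many pods are missing."""
    return desired - actual


def workloads_with_shortfall(rows: List[Dict]) -> List[str]:
    """Return the names of workloads whose shortfall is greater than zero."""
    return [
        row["name"]
        for row in rows
        if shortfall(row["desired"], row["actual"]) > 0
    ]


# Hypothetical sample rows, for illustration only.
rows = [
    {"name": "web", "desired": 3, "actual": 3},
    {"name": "worker", "desired": 5, "actual": 2},
]
print(workloads_with_shortfall(rows))
```
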
The OpenTelemetry Collector Operations board template includes queries with useful metrics that are emitted by the OpenTelemetry Collector during its operation.
Query Name | Query Description | Fields Required |
---|---|---|
Exporter Span Failures | Shows when errors happen during enqueueing or sending in exporters. | net.host.name, otelcol_exporter_enqueue_failed_spans, otelcol_exporter_send_failed_spans |
Collector Uptime Smokestacks | Shows the uptime for different pods with a Log10 to make it clearer where restarts are happening. | LOG10($otelcol_process_uptime), net.host.name, otelcol_process_uptime |
Exporter Metric Send Failures | Shows when errors happen during sending from exporters. | net.host.name, otelcol_exporter_enqueue_failed_metric_points, otelcol_exporter_send_failed_metric_points |
Exporter Metrics Enqueue Failures | Shows when errors happen during enqueueing in exporters. | net.host.name, otelcol_exporter_send_failed_metric_points |
Exporter Log Records Failures | Shows when errors happen during enqueueing or sending in exporters. | net.host.name, otelcol_exporter_enqueue_failed_log_records |
Learn more about the OpenTelemetry Collector.
The OpenTelemetry Java Metrics board template includes queries that help to investigate application problems related to Java Virtual Machine (JVM). Metrics for Java applications come from the JVM, and are reported by the OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.
Query Name | Query Description | Fields Required |
---|---|---|
JVM Memory Usage (Young Generation) | Eden space on the JVM heap is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents maximum usable. | host.name, pool, process.runtime.jvm.memory.committed, process.runtime.jvm.memory.limit, process.runtime.jvm.memory.usage, process.runtime.jvm.memory.usage_after_last_gc, service.name, type |
JVM Memory Usage (Old Generation) | Tenured Gen JVM heap space stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory. | host.name, pool, process.runtime.jvm.memory.committed, process.runtime.jvm.memory.limit, process.runtime.jvm.memory.usage, process.runtime.jvm.memory.usage_after_last_gc, service.name, type |
JVM GC (Garbage Collection) Activity | JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive. | process.runtime.jvm.gc.duration.count, action, gc, host.name, process.runtime.jvm.gc.duration.avg, process.runtime.jvm.gc.duration.max, service.name |
JVM CPU Utilization | Shows system CPU utilization and 1-minute load average, as captured by the JVM. | host.name, process.runtime.jvm.cpu.utilization, process.runtime.jvm.system.cpu.load_1m, service.name |
JVM Buffer Memory Usage | Buffer memory is provided by the OS and is outside the JVM’s heap memory allocation. It is used by Java NIO to quickly write data to network or disk. | host.name, process.runtime.jvm.buffer.limit, process.runtime.jvm.buffer.usage, service.name |
JVM Non-Heap Memory Usage | JVM non-heap memory is allocated above and beyond the heap size you’ve configured. It is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected. | host.name, pool, process.runtime.jvm.memory.committed, process.runtime.jvm.memory.limit, process.runtime.jvm.memory.usage, service.name, type |
The AWS Lambda Health board template includes queries for observing AWS Lambda function health via invocations, errors, throttles, and concurrency.
Query Name | Query Description | Fields Required |
---|---|---|
Duration & Execution by ID/Version | Tracks the execution time of Lambda functions, identified by their ID or version. Useful for analyzing the performance and efficiency of different versions or instances of a function over time. | duration_ms, faas.execution, faas.name, faas.version |
Lambda Invocations by Function | Displays the total number of times each Lambda function is invoked. It helps in tracking the frequency of usage of different functions, enabling a clear understanding of which functions are most or least used. | FunctionName, MetricName, Namespace |
Latency by Function/Metric | Displays the response time for each Lambda function, broken down by specific metrics. Useful for identifying functions that may be experiencing performance issues due to high latency. | FunctionName, MetricName, Namespace, amazonaws.com/AWS/Lambda/Duration.max, amazonaws.com/AWS/Lambda/PostRuntimeExtensionsDuration.max |
Function Error Count and Rate | Displays two key pieces of information: the total number of errors encountered by each Lambda function and the error rate, calculated as the ratio of errors to total invocations. Useful for pinpointing functions that are failing or experiencing issues. | FunctionName, MetricName, Namespace, amazonaws.com/AWS/Lambda/Errors.count |
Lambda Throttles | Displays the instances where Lambda invocations are being throttled, such as when the number of function calls exceeds the concurrency limits. Tracking this helps in managing and optimizing the scalability settings for each function. | FunctionName, MetricName, Namespace, amazonaws.com/AWS/Lambda/Throttles.count |
Function Concurrency | Monitors the simultaneous execution count of each Lambda function, tracking how many instances of a function are running at the same time. | FunctionName, MetricName, Namespace, amazonaws.com/AWS/Lambda/ConcurrentExecutions.avg, amazonaws.com/AWS/Lambda/UnreservedConcurrentExecutions.avg |
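
Function Error Count and Rate describes the error rate as the ratio of errors to total invocations. The short Python sketch below works through that arithmetic. The function name and the sample counts are hypothetical, and returning 0.0 when there are no invocations is a choice made for this sketch.

```python
def error_rate(errors: int, invocations: int) -> float:
    """Return errors divided by total invocations."""
    if invocations == 0:
        # Sketch-only guard: no invocations means no meaningful rate.
        return 0.0
    return errors / invocations


# Hypothetical counts, for illustration only.
rate = error_rate(errors=5, invocations=200)
print(f"{rate:.1%}")
```
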
To explore common issues when working with Board Templates, visit Common Issues with Visualization: Board Templates.