Get instant insights into your system with Board Templates.
This functionality is available only for teams using Honeycomb’s current data model. If you use Honeycomb Classic, we recommend migrating to Honeycomb Environments, so you can take advantage of its expanded data model and future product updates.

What is a Board Template?

Board Templates are pre-configured Boards that come with ready-made queries and visualizations, providing valuable insights with minimal setup. Use a template as a starting point to create a Board. Templates are designed for specific use cases and built around industry best practices, ensuring effective configurations for tracking key metrics and visualizing data accurately.

Board Templates At a Glance

Choose from a variety of templates to quickly gain insights across different areas of your system:
  • General:
    • Service Health: Insight into service health, including request volumes and where slowest requests occur.
    • Airflow: Overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.
    • Kafka: Insight into Kafka brokers, topics, partitions, and consumers.
    • Linux Host: Useful queries for monitoring Linux hosts, including CPU, memory, disk, filesystem, and network utilization on the configured hosts.
    • Spring Boot: Insight into application health and performance metrics for your Spring Boot microservices.
    • Django: Insight into application health and performance metrics for your Django application.
    • Rails: Queries to help investigate the performance and health of your Rails application.
    • RabbitMQ: Visualizations for core RabbitMQ metrics and client signals.
    • My Services: Application Performance Monitoring (APM) metrics for a variety of services and frameworks.
  • Data Stores:
    • MySQL Operations: Insight into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
    • Redis: Insight into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
    • Postgres: Insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.
    • MongoDB: Metrics-driven visualizations for monitoring MongoDB nodes.
    • SQL Server: Useful metrics for monitoring SQL Server database operations.
  • Frontend Investigation:
    • Real User Monitoring (RUM): Real user monitoring data for frontend applications, including performance and user experience insights.
    • Android Auto-Instrumentation: Auto-instrumentation data for Android applications provided by the Honeycomb OpenTelemetry Android SDK.
    • iOS Auto-Instrumentation: Auto-instrumentation data for iOS applications provided by the Honeycomb OpenTelemetry Swift SDK.
  • Kubernetes:
    • Kubernetes Pod Metrics: Queries and visualizations that help you investigate pod performance and resource usage within Kubernetes clusters.
    • Kubernetes Node Metrics: Queries and visualizations that help you investigate node performance and resource usage within Kubernetes clusters.
    • Kubernetes Workload Health: Queries and visualizations that help you investigate application problems related to Kubernetes workloads.
  • OpenTelemetry:
    • OpenTelemetry Collector Operations: Metrics emitted by the OpenTelemetry Collector during operation.
    • OpenTelemetry Java Metrics: Insights into Java Virtual Machine (JVM) health and performance via metrics reported by OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.
  • Amazon Web Services (AWS):
    • AWS Lambda Health: Information about AWS Lambda function health, including invocations, errors, throttles, and concurrency.
    • EC2 Health: Information about AWS EC2 instances, including status failures and EBS read/write operations.
    • ALB/ELB Health: Information about AWS Load Balancers, including load balancer health, status codes, active connections, and requests.
    • SQS: Insight into critical AWS SQS operations.
    • RDS: Insight for monitoring and optimizing the performance of AWS RDS databases.
  • Artificial Intelligence:
    • Anthropic Usage & Cost Monitoring: Comprehensive insights into Anthropic API usage and costs, including token consumption, feature usage, and cost attribution across models, workspaces, and API keys.
  • Honeycomb Features:
    • Refinery Operations: Overview of sampling operations, including trace throughput and sampling statistics. Automatically populated by Refinery metrics sent to Honeycomb.
    • Activity Log Security: Queries showing API Key activity.
    • Activity Log Leaderboard: Queries showing advanced and frequent Honeycomb usage by your team.
    • Activity Log Trigger and SLO Activity: Queries related to trigger and SLO activations and modifications.

General

Service Health

The Service Health Board Template offers an overview of your services’ health. It provides insights into request volumes, identifies where the slowest requests are occurring, and more.
This template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions.
Query Name | Query Description | Required Fields
Trace Counts by Service | Shows total trace volume by service.
  • Parent span ID or trace.parent_id
  • Service name or service.name or service_name
Trace Counts by HTTP Status Code | Shows total trace volume by status code.
  • Parent span ID or trace.parent_id
  • HTTP Status Code or http.response.status.code or http.status_code
Trace Duration Heatmap | Shows a heatmap of the duration for all traces.
  • Span duration or duration_ms
  • Parent span ID or trace.parent_id
Duration Heatmap | Shows a heatmap of duration across all services.
  • Span duration or duration_ms
Duration by Service | Shows key duration percentiles by service.
  • Span duration or duration_ms
  • Service name or service.name or service_name
Duration by Route | Shows duration by route or endpoint.
  • Span duration or duration_ms
  • Route or http.route
Duration by Name | Shows duration by function name.
  • Span duration or duration_ms
  • Name or name
Errors by Service | Shows a count of errors grouped by service.
  • Error or error
  • Service name or service.name or service_name
Errors by Route | Shows a count of errors grouped by route or endpoint.
  • Error or error
  • Route or http.route
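As a rough pre-flight check, the required-field lists above can be encoded as data and compared against the columns present in a dataset. The sketch below is illustrative only: `REQUIRED_FIELDS`, `available_queries`, and the sample column list are hypothetical names, only three of the queries are included, and it matches raw column names rather than consulting Honeycomb's Dataset Definitions mapping.

```python
# Hypothetical helper: which Service Health queries could a dataset populate?
# Each query maps to field groups; any one alternative in a group satisfies it.
REQUIRED_FIELDS = {
    "Trace Counts by Service": [
        {"trace.parent_id"},
        {"service.name", "service_name"},
    ],
    "Duration by Route": [
        {"duration_ms"},
        {"http.route"},
    ],
    "Errors by Service": [
        {"error"},
        {"service.name", "service_name"},
    ],
}


def available_queries(columns):
    """Return the sorted query names whose field groups are all satisfied."""
    present = set(columns)
    return sorted(
        name
        for name, groups in REQUIRED_FIELDS.items()
        if all(group & present for group in groups)
    )


if __name__ == "__main__":
    sample_columns = ["trace.parent_id", "service_name", "duration_ms", "error"]
    print(available_queries(sample_columns))
```

With the sample columns, "Duration by Route" is left out because no `http.route` column is present.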

Airflow

The Airflow Board Template gives an overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.
We derive the required fields in this template from Airflow’s support for OpenTelemetry logs, metrics, and traces. To learn more, visit our documentation about instrumenting your Python data pipelines and applications.
Query Name | Query Description | Required Fields
DAG Processing Import Errors | Shows the total number of errors from parsing DAG files, by host.name. Parsing errors prevent DAGs from being loaded. Tracking these errors helps identify configuration or syntax issues that need immediate attention.
  • airflow.dag_processing.import_errors
  • host.name
DAG Processing Import Errors by File Path | Shows the total number of errors during import and parsing of DAG files, broken out by DAG file path and host.name. Tracking these errors helps identify configuration or syntax issues with a given file or host.
  • host.name
  • import_errors
  • file_path
Duration of Tasks (AVG, P95) | Shows the average and P95 duration of a Task by DAG ID, task ID, and host.name. Execution time helps identify which specific tasks are performance bottlenecks, allowing you to optimize your workflows. Note: Uses trace signal type.
  • host.name
  • meta.signal_type
  • duration_ms
  • task_id
  • dag_id
DAG Failed Duration (AVG) | Shows the average duration in milliseconds (ms) taken for a DagRun to reach a failed state by DAG ID and host.name. Failed DAG runs consume valuable resources. Monitoring this metric helps to identify inefficient failure patterns.
  • dag_id
  • host.name
  • airflow.dagrun.duration.failed
DAG Success Duration (AVG) | Shows the average duration in milliseconds (ms) for a DagRun to reach a success state by DAG ID and host.name. Monitoring duration allows you to optimize resource allocation and set appropriate SLAs.
  • airflow.dagrun.duration.success
  • dag_id
  • host.name
Task Counts | Shows the count of Tasks grouped by DAG ID, task ID, host.name, and state. Use to gauge overall workflow health and the proportion of tasks experiencing issues, which can highlight potential problems with Airflow operations. Note: Uses trace signal type.
  • host.name
  • state
  • dag_id
  • task_id
DAG Schedule Delay | Shows the average duration in milliseconds (ms) of delay between the scheduled DagRun start date and the actual DagRun start date, grouped by DAG ID and host.name. Use to identify scheduler bottlenecks, resource constraints, or overloaded Airflow instances that prevent timely workflow execution.
  • dag_id
  • host.name
  • airflow.dagrun.schedule_delay
Scheduler Tasks | Shows the sum of Airflow Scheduler Tasks that are executable or starving by host.name. Use to understand scheduler load, identify periods when the scheduler might be overwhelmed with too many tasks, and ensure task distribution works as expected.
  • host.name
  • airflow.scheduler.tasks.executable
  • airflow.scheduler.tasks.starving
Executor Tasks | Shows the maximum count of Executor Tasks (queued, running, and open slots), grouped by host.name. Queued reflects the number of queued tasks on the executor, Running reflects the number of running tasks on the executor, and Open Slots reflects the number of open slots on the executor.
  • executor.open_slots
  • host.name
  • executor.queued_tasks
  • executor.running_tasks
Pool Task Slots by Host | Shows the maximum count of Airflow pool slots (deferred, queued, open, running, starving, and scheduled) by host. Use to monitor resource allocation, identify when pools are at capacity, and optimize your configuration to match your workflow needs.
  • airflow.pool.open_slots
  • airflow.pool.running_slots
  • airflow.pool.starving_tasks
  • host.name
  • pool_name
  • airflow.pool.queued_slots
  • airflow.pool.scheduled_slots
  • airflow.pool.deferred_slots
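To make the DAG Schedule Delay definition concrete, the sketch below computes the delay between a scheduled and an actual DagRun start in milliseconds. Airflow emits this metric itself, so nothing like this is needed in practice; `schedule_delay_ms` and the timestamps are hypothetical and only illustrate the arithmetic.

```python
from datetime import datetime, timezone


def schedule_delay_ms(scheduled_start, actual_start):
    """Milliseconds between the scheduled and the actual DagRun start."""
    return (actual_start - scheduled_start).total_seconds() * 1000.0


if __name__ == "__main__":
    # Illustrative timestamps, not real Airflow data.
    scheduled = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    actual = datetime(2024, 1, 1, 12, 0, 2, 500000, tzinfo=timezone.utc)
    print(schedule_delay_ms(scheduled, actual))
```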

Kafka

The Kafka Board Template provides insight into Kafka brokers, topics, partitions, and consumers.
This template relies on the Kafka Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. To learn how to set up this receiver, visit the Kafka Metrics receiver documentation in the OpenTelemetry Collector Contrib repo. To receive relevant Java Virtual Machine (JVM) metrics, include the OpenTelemetry Java Agent in Kafka nodes as well.
Query Name | Query Description | Required Fields
Number of Active Brokers | Shows the number of active brokers.
  • kafka.brokers
Consumer Group Membership | Shows the number of consumers per broker.
  • group
  • host.name
  • kafka.consumer_group.members
Consumer Progress Lag vs Offset Rate | Shows the average rate of Kafka consumer group lag and offsets over time, grouped by topic partitions. Use to monitor consumer progress and to detect delays by comparing offset increases to lag.
  • host.name
  • kafka.consumer_group.lag
  • kafka.consumer_group.offset
  • topic
  • group
Partition Offset Overview | Shows the rate of change in the oldest and current offsets across Kafka partitions.
  • kafka.partition.current_offset
  • topic
  • host.name
  • kafka.partition.oldest_offset
Partition Count By Topic | Shows the number of partitions for each topic. Use for capacity planning and ensuring proper topic configuration.
  • topic
  • host.name
  • kafka.partition.current_offset
  • partition
Partition Replication Health | Shows the number of in-sync replicas for each partition compared to total replicas. Use to identify under-replicated partitions.
  • kafka.partition.replicas_in_sync
  • kafka.partition.replicas
  • topic
  • partition
  • host.name
Consumer Group Lag by Topic | Shows total lag across all partitions for each consumer group and topic combination.
  • group
  • topic
  • kafka.consumer_group.lag_sum
Partition Balance Analysis | Shows distribution of offsets across partitions for each topic. Use to identify potential partition imbalances.
  • kafka.partition.current_offset
  • topic
  • partition
High Consumer Lag | Shows high consumer group lag, which may indicate potential consumer issues.
  • group
  • topic
  • host.name
  • kafka.consumer_group.lag_sum
Message Throughput | Shows the approximate message throughput for each topic by measuring the rate of change in offset over time.
  • kafka.partition.current_offset
  • topic
  • host.name
JVM Thread Count by Cluster and State | Shows the total JVM thread count across Kafka clusters, grouped by thread state. Use to identify thread contention or resource leaks.
  • host.name
  • jvm.thread.count
  • kafka.cluster.alias
  • jvm.thread.state
  • service.name
  • service.instance.id
  • jvm.thread.daemon
JVM Garbage Collection Durations | Shows the median (P50) and P90 JVM garbage collection durations. Use to understand garbage collection efficiency and memory management health.
  • jvm.gc.duration.p50
  • jvm.gc.duration.p90
  • kafka.cluster.alias
  • service.name
  • jvm.gc.action
  • jvm.gc.name
  • host.name
Max Recent JVM CPU Utilization | Shows the highest CPU utilization within the JVM over a default 30-minute window. Use to identify potential load spikes or bottlenecks that may affect your cluster.
  • kafka.cluster.alias
  • service.name
  • host.name
  • jvm.cpu.recent_utilization
JVM Memory Usage and Commitment | Shows memory usage patterns in clusters, providing a view into how memory is used and committed in the JVM. Use to track inefficient memory usage.
  • jvm.memory.used
  • jvm.memory.committed
  • kafka.cluster.alias
  • jvm.memory.type
  • jvm.memory.pool.name
  • host.name
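The Message Throughput query above approximates throughput from the rate of change in partition offsets. The arithmetic can be sketched as follows; `offset_rates` and the sample values are hypothetical, and this is not how Honeycomb computes the rate internally.

```python
def offset_rates(samples):
    """Approximate messages per second between consecutive offset samples.

    `samples` is a list of (timestamp_seconds, current_offset) tuples,
    sorted by timestamp, for a single topic partition.
    """
    rates = []
    for (t0, o0), (t1, o1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((o1 - o0) / elapsed)
    return rates


if __name__ == "__main__":
    # Illustrative samples taken one minute apart.
    print(offset_rates([(0, 100), (60, 700), (120, 1000)]))
```

A rate that falls while consumer group lag keeps growing is the kind of pattern the lag-versus-offset queries are meant to surface.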

Linux Host

The Linux Host Board Template provides useful queries for monitoring Linux hosts. It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.
This template uses the Host Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. To learn how to set up this receiver, visit the Host Metrics receiver documentation in the OpenTelemetry Collector Contrib repo. When configuring the hostmetrics receiver for this Board Template, include these scrapers:
  • CPU
  • Disk
  • Load
  • Filesystem
  • Memory
  • Network
  • Paging
  • Processes
  • Process
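The scraper list above could be turned into a receiver fragment along these lines. This is a minimal sketch under assumptions: `hostmetrics_receiver_config` is a hypothetical helper, the lowercase keys are assumed to match the scraper names in the Host Metrics receiver documentation, and exporters and pipelines are omitted. Verify the result against the receiver documentation before using it.

```python
import json

# Lowercase keys assumed to correspond to the scrapers listed above.
SCRAPERS = [
    "cpu", "disk", "load", "filesystem", "memory",
    "network", "paging", "processes", "process",
]


def hostmetrics_receiver_config(scrapers=SCRAPERS):
    """Build only the `receivers` fragment of a Collector configuration."""
    return {
        "receivers": {
            "hostmetrics": {
                # An empty mapping is assumed to leave a scraper at its defaults.
                "scrapers": {name: {} for name in scrapers},
            }
        }
    }


if __name__ == "__main__":
    # JSON is generally accepted by YAML parsers, so the printed output can
    # serve as a starting point for a YAML configuration file.
    print(json.dumps(hostmetrics_receiver_config(), indent=2))
```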
Query Name | Query Description | Required Fields
Process CPU Time Breakdown | Shows the total CPU time consumed by different processes, broken down by process owner and command. Use to identify which processes are consuming the most CPU resources over time.
  • process.owner
  • process.executable.name
  • os.type
  • process.cpu.time
  • host.name
Memory Consumption Trends | Shows the average memory usage across host, operating system, and state. Use to monitor and diagnose system memory usage trends.
  • state
  • os.type
  • system.memory.usage
  • host.name
CPU Utilization Trends | Shows the distribution of CPU time spent on user processes, system operations, and idle time. Use to identify which hosts are under load.
  • os.type
  • system.cpu.time.user
  • system.cpu.time.system
  • system.cpu.time.idle
  • host.name
Disk I/O | Shows active disk input and output by device. Use to identify high read/write rates.
  • system.disk.io.write
  • host.name
  • device
  • os.type
  • system.disk.io.read
Memory Usage by Process | Shows Linux processes by memory usage and virtual memory consumption. Use to troubleshoot resource bottlenecks and optimize memory allocation.
  • os.type
  • process.memory.usage
  • process.memory.virtual
  • host.name
  • process.command
  • process.owner
Filesystem Usage | Shows filesystem usage across different mount points, devices, and modes. Use for capacity planning and troubleshooting storage issues.
  • host.name
  • device
  • mountpoint
  • mode
  • os.type
  • system.filesystem.usage.used
Network Metrics | Shows network operations per network interface.
  • system.network.io.receive
  • system.network.io.transmit
  • host.name
  • device
  • os.type

Spring Boot

The Spring Boot Board Template provides insight into application health and performance metrics for Spring Boot microservices.
Source data for this Board Template is configured using automatic instrumentation provided by the OpenTelemetry Java Agent SDK. To learn more, visit our Java automatic instrumentation instructions.
Query Name | Query Description | Required Fields
Database Usage | Shows database performance metrics. Use to help identify slow-performing queries and connection issues.
  • db.client.connections.use_time.avg
  • db.client.connections.wait_time.avg
  • host.name
  • telemetry.sdk.language
API Endpoint Latency | Shows a heatmap of API endpoint response times. Use to highlight bottlenecks or anomalies in performance.
  • http.server.request.duration.avg
  • http.route
  • http.response.status_code
  • http.request.method
  • host.name
  • telemetry.sdk.language
Garbage Collection Performance Monitor | Shows maximum, average, and P95 duration of garbage collection metrics. Use to identify memory allocation patterns that cause application slowdowns.
  • jvm.gc.duration.avg
  • jvm.gc.duration.p95
  • jvm.gc.action
  • host.name
  • telemetry.sdk.language
  • jvm.gc.duration.max
Request Per Minute | Shows requests made per minute. Use to observe traffic patterns and to detect unexpected load or errors.
  • host.name
  • telemetry.sdk.language
  • http.route
  • http.request.method
  • http.response.status_code
Heap used vs Heap Max Limit | Shows JVM memory metrics and compares current memory usage against the maximum heap limit. Use to identify out-of-memory errors.
  • jvm.memory.used
  • jvm.memory.limit
  • host.name
  • telemetry.sdk.language
API Errors | Shows error responses with status code >= 400. Use to monitor API health.
  • http.route
  • host.name
  • telemetry.sdk.language
  • http.response.status_code
Response Size Distribution | Shows response payload size. Use to monitor data transfer efficiency and to identify unexpectedly large responses.
  • http.request.method
  • http.response.status_code
  • host.name
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
JVM CPU Time Rate | Shows CPU consumption rate metrics. Use to identify processing-intensive operations and to detect performance decline over time.
  • jvm.cpu.time
  • host.name
  • telemetry.sdk.language
  • meta.signal_type
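Two of the queries above reduce to simple arithmetic: heap usage compared against the heap limit, and the status code threshold used for API errors. The sketch below shows both. `heap_utilization` and `is_api_error` are hypothetical helpers, and treating a missing or non-positive limit as "no limit" is an assumption made for this example.

```python
def heap_utilization(used_bytes, limit_bytes):
    """Return used/limit as a fraction, or None when no usable limit exists."""
    if limit_bytes is None or limit_bytes <= 0:
        return None
    return used_bytes / limit_bytes


def is_api_error(status_code):
    """Mirror the API Errors filter: status code >= 400."""
    return status_code >= 400


if __name__ == "__main__":
    # Illustrative values: 768 MiB used of a 1 GiB limit.
    print(heap_utilization(768 * 1024 * 1024, 1024 * 1024 * 1024))
    print([code for code in (200, 302, 404, 500) if is_api_error(code)])
```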

Django

The Django Board Template provides insight into application health and performance metrics for a Django application.
This template uses the OpenTelemetry Python API for automatic instrumentation via the OpenTelemetry Python SDK. To learn more, visit the OpenTelemetry Python API documentation and its Django instrumentation instructions.
Query Name | Query Description | Required Fields
Request Count Per Minute | Shows requests made per minute. Use to observe traffic patterns and to detect unexpected load or errors.
  • telemetry.sdk.language
  • http.host
  • http.route
  • http.method
  • http.status_code
  • http.server_name
HTTP Response Duration | Shows the P95 response duration by route, status code, and server name. Highlights Django HTTP performance.
  • http.route
  • http.method
  • http.status_code
  • http.server_name
  • telemetry.sdk.language
  • http.response.body.size
  • duration_ms
HTTP Errors | Shows the count of HTTP errors by route, status code, and host.name. Use to assess the success and error rate of APIs.
  • http.status_code
  • http.server_name
  • error
  • telemetry.sdk.language
  • http.route
  • http.method
Exceptions | Shows exceptions thrown in the service. Use to assess overall health of the application.
  • http.server_name
  • exception.type
  • code.namespace
  • exception.message
  • exception.stacktrace
  • telemetry.sdk.language
AVG and P95 Request Size | Shows the average and P95 HTTP request size to monitor payload efficiency.
  • http.server_name
  • telemetry.sdk.language
  • http.request.body.size
  • http.route
  • http.method
  • http.status_code
AVG and P95 Response Size | Shows the average and P95 HTTP response size to monitor payload efficiency.
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
  • http.method
  • http.status_code
  • http.server_name
P95 and Heatmap of Job Duration | Shows the P95 and a heatmap of job duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners.
  • http.server_name
  • telemetry.sdk.language
  • duration_ms
  • messaging.destination
  • messaging.system
Jobs Executed | Shows the count of root traces with messaging system and destination. Use to assess overall performance of async job operations.
  • http.server_name
  • messaging.destination
  • messaging.system
  • telemetry.sdk.language
  • messaging.destination_kind
DB connection Count Per Min | Shows the connection count per minute where the DB connection event is “open”. Use to gain visibility into connection pooling efficiency.
  • telemetry.sdk.language
  • db.operation
  • db.system
  • db.name
  • db.connection.event

Rails

The Rails Board Template gives you visibility into Rails behavior, performance, and health. The queries and visualizations help identify slow database queries, inefficient code paths, and other performance bottlenecks.
We derive the required fields in this template from Ruby and Ruby on Rails support for OpenTelemetry logs, metrics, and traces. To learn more, visit our documentation on instrumenting your Ruby and Ruby on Rails applications.
Query Name | Query Description | Required Fields
Requests Served | Shows the count of requests served by Rails, by host.name. Provides an at-a-glance overview of traffic volume.
  • host.name
  • telemetry.sdk.language
  • http.route
HTTP Response Duration | Shows P95 response duration by route, controller namespace, controller function, status code, and host.name. Use to assess Rails HTTP performance.
  • duration_ms
  • http.route
  • code.namespace
  • code.function
  • http.status_code
  • host.name
  • telemetry.sdk.language
HTTP Duration Heatmap | Shows a heatmap of HTTP response duration by route, status code, and host.name. Use to assess and investigate outliers.
  • http.status_code
  • host.name
  • telemetry.sdk.language
  • duration_ms
  • http.route
HTTP Errors | Shows the count of HTTP errors by route, controller namespace, status code, and host.name. Use to assess the success and error rate of Rails web endpoints.
  • error
  • http.route
  • code.namespace
  • http.status_code
  • host.name
  • telemetry.sdk.language
DB Statement Duration | Shows a heatmap and the P95 of database duration per database name, operation, statement, and host.name. A heatmap provides more information to help identify outlier DB statements.
  • duration_ms
  • db.name
  • db.operation
  • db.statement
  • telemetry.sdk.language
P95 and Heatmap of Job Duration | Shows P95 and a heatmap of job duration by messaging destination, messaging system, service name, and host.name. Provides insights into the status of Rails async job runners, such as ActiveJob and Sidekiq.
  • duration_ms
  • messaging.destination
  • messaging.system
  • service.name
  • host.name
  • telemetry.sdk.language
Exceptions | Shows exceptions thrown by type, code namespace, and host.name. Use to assess overall health of your Rails application.
  • code.namespace
  • host.name
  • telemetry.sdk.language
  • error
  • exception.message
  • exception.type
Jobs Executed | Shows the count of root traces with messaging system and destination. Use to assess overall performance of Rails async job operations.
  • telemetry.sdk.language
  • host.name
  • messaging.system
  • messaging.destination

RabbitMQ

The RabbitMQ Board contains visualizations for core RabbitMQ metrics and client signals.
This Board uses the RabbitMQ receiver provided by the opentelemetry-collector-contrib distribution. To learn how to set up this receiver, visit the RabbitMQ documentation in OpenTelemetry’s Collector Contrib repo. When configuring RabbitMQ, enable the management plugin to use the receiver. By default, the RabbitMQ receiver disables several key metrics for resource and connectivity utilization. To learn more, visit the RabbitMQ metrics documentation in OpenTelemetry’s Collector Contrib repo.
Query Name | Description | Required Fields
Message Stats | Visualizes the number of messages published compared to the number of current messages on queues, per node.
  • host.name
  • rabbitmq.message.current
  • rabbitmq.message.published
  • rabbitmq.node.name
  • rabbitmq.vhost.name
Connectivity Profile | Visualizes the number of channels created against the number of channels closed, per node. Helpful for identifying channel leaks, potential resource exhaustion, or other connectivity issues.
  • host.name
  • rabbitmq.node.channel.closed
  • rabbitmq.node.channel.created
  • rabbitmq.node.connection.closed
  • rabbitmq.node.connection.created
  • rabbitmq.node.name
File Descriptor Utilization | Visualizes File Descriptors (FDs). Useful for identifying resource limitations.
  • host.name
  • rabbitmq.node.fd.total
  • rabbitmq.node.fd.used
  • rabbitmq.node.name
Consumer Activity | Visualizes the number of consumers attached to each queue.
  • host.name
  • rabbitmq.consumer.count
  • rabbitmq.node.name
  • rabbitmq.queue.name
System Resource Pressure | Visualizes core metrics for system resources, including memory and file descriptor utilization.
  • host.name
  • rabbitmq.node.fd.total
  • rabbitmq.node.fd.used
  • rabbitmq.node.mem.limit
  • rabbitmq.node.name
Queue Health | Visualizes queue lengths and counts to catch congestion.
  • host.name
  • rabbitmq.message.current
  • rabbitmq.node.name
  • rabbitmq.queue.name

My Services

The My Services template provides Application Performance Monitoring (APM) metrics for a variety of services and frameworks.
This template relies solely on semantic conventions and traces to provide a general overview of APM for HTTP-driven services. It should work with a variety of frameworks, languages, and runtimes. To learn how to generate telemetry data for this Board, visit the Honeycomb OpenTelemetry documentation.
Query Name | Description | Required Fields
Total Requests | Visualizes the number of requests for a service.
  • http.route
  • service.name
Request Distribution | Visualizes the number of requests by status code.
  • http.route
  • http.status_code
  • service.name
P95 Request Latency | Visualizes latency excluding the slowest 5% of responses.
  • duration_ms
  • http.route
  • service.name
Average Latency | Visualizes the average latency per endpoint.
  • duration_ms
  • http.route
  • service.name
Error Trend | Visualizes the number of errors by route and status code.
  • error
  • http.route
  • http.status_code
  • service.name
Successful Response Counts - 2xx | Visualizes all requests in the 2xx range.
  • http.status_code
  • service.name
Client Error Response Counts - 4xx | Visualizes 4xx HTTP status codes.
  • http.status_code
  • service.name
Server Error Response Counts - 5xx | Visualizes HTTP responses in the 5xx range.
  • http.status_code
  • service.name
Errors | Visualizes the number of errors emitted over the selected time frame.
  • error
  • http.route
  • http.status_code
  • http.status_text
  • service.name
Redirection Response Count - 3xx | Visualizes HTTP status codes in the 3xx range.
  • http.status_code
  • service.name
Duration | Visualizes request duration in a heatmap.
  • duration_ms
  • service.name
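The status-range and P95 queries above can be illustrated with a small sketch that buckets status codes into classes and computes a nearest-rank 95th percentile. The helper names are hypothetical, and nearest-rank is only one of several percentile definitions; Honeycomb's P95 may use a different estimation method.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class label, for example 404 -> '4xx'."""
    return f"{code // 100}xx"


def count_by_class(codes):
    """Count responses per status class (2xx, 3xx, 4xx, 5xx)."""
    return Counter(status_class(code) for code in codes)


def p95_nearest_rank(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Illustrative data only.
    print(count_by_class([200, 201, 301, 404, 500, 503]))
    print(p95_nearest_rank(range(1, 101)))
```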

Data Stores

MySQL Operations

The MySQL Board Template provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
This template relies on the MySQL metrics receiver provided by the OpenTelemetry Collector Contrib distribution. To learn how to set up this receiver, visit the MySQL receiver documentation in the OpenTelemetry Collector Contrib repo.
Query Name | Query Description | Required Fields
Server Status | Shows server uptime. Use to track server restarts.
  • mysql.uptime
  • mysql.instance.endpoint
Buffer Pool Pages | Shows the number of pages in the InnoDB buffer pool by type. Use to understand buffer pool utilization.
  • mysql.instance.endpoint
  • kind
  • mysql.buffer_pool.pages
Buffer Pool Data Pages | Shows the number of data pages in the InnoDB buffer pool by status (clean or dirty). Use to track page writes to disk.
  • mysql.buffer_pool.data_pages
  • mysql.instance.endpoint
  • status
Buffer Pool Page Flushes | Shows the rate of page flush requests from the InnoDB buffer pool. Use to help identify input/output pressure.
  • mysql.instance.endpoint
  • mysql.buffer_pool.page_flushes
Buffer Pool Operations | Shows buffer pool operations by type. Use to identify patterns in buffer pool usage.
  • mysql.instance.endpoint
  • operation
  • mysql.buffer_pool.operations
Row and Page Operations | Shows the rate of InnoDB row and page operations. Use to provide insight into database workload and input/output patterns.
  • mysql.row_operations
  • mysql.page_operations
  • mysql.instance.endpoint
  • operation
Doublewrite Rate | Shows the rate of writes to the InnoDB doublewrite buffer. Use to understand database durability.
  • kind
  • mysql.double_writes
  • mysql.instance.endpoint
Handler Requests and Thread Status | Shows the rate of requests to various handlers and the state of system threads. Provides insight into how the database is processing queries and allows monitoring of connection usage and thread efficiency.
  • mysql.handlers
  • mysql.threads
  • mysql.instance.endpoint
  • kind
Row and Table Locks | Shows InnoDB lock statistics and MySQL table locks. Use to help identify lock contention.
  • mysql.row_locks
  • mysql.instance.endpoint
  • kind
  • mysql.locks
Resource Usage | Shows the rate of opened resources and temporary resources. Use to help identify resource utilization and the usage of temporary tables or files.
  • mysql.tmp_resources
  • mysql.instance.endpoint
  • resource
  • mysql.opened_resources
Query Rate | Shows query throughput and slow query rates across MySQL instances. Use to pinpoint instances with the highest query load.
  • mysql.query.count
  • mysql.query.slow.count
  • mysql.instance.endpoint
Thread Count by Type | Shows thread count by type. Use to indicate operations currently being performed by the set of threads executing within the server.
  • kind
  • mysql.threads
  • mysql.instance.endpoint
Table Open Cache Efficiency | Shows table open cache efficiency. Use to monitor filesystem input/output within the instances.
  • mysql.table_open_cache
  • mysql.instance.endpoint
  • status

Redis

The Redis Board Template provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
This template uses the Redis receiver provided by the OpenTelemetry Collector Contrib distribution. To learn how to set up this receiver, visit the Redis receiver documentation in the OpenTelemetry Collector Contrib repo. The Redis receiver does not automatically publish some key server attributes, like address or port. The visualizations on this Board Template use server address to ensure that visualizing across multiple Redis instances is possible.
Query Name | Query Description | Required Fields
Cache Connections | Shows connections received and rejected per server. Use to diagnose connectivity issues.
  • redis.connections.received
  • redis.connections.rejected
  • server.address
Uptime | Shows the number of seconds since server start, by server.
  • server.address
  • redis.uptime
Server Durability | Shows the number of write operations that have happened since the last successful RDB snapshot. Use to track durability issues per server.
  • redis.rdb.changes_since_last_save
  • server.address
Key CountShows the number of keys per database and per server.
  • redis.db.keys
  • server.address
  • db
Server CPU TimeShows the CPU consumed by Redis server since server start.
  • server.address
  • redis.cpu.time
Client ActivityShows Redis client activity per server address and activity between connected and blocked clients.
  • redis.clients.connected
  • redis.clients.blocked
  • server.address
  • redis.version
Command ActivityShows the number of commands processed per second and the number of commands processed by the server. Use to track operational load of servers.
  • redis.commands.processed
  • redis.commands
  • server.address
Client I/OShows the input/output buffers of Redis clients by server. Use to diagnose or troubleshoot input/output issues with clients.
  • redis.clients.max_input_buffer
  • redis.clients.max_output_buffer
  • server.address
Network ActivityShows network input/output by server.
  • redis.net.output
  • server.address
  • redis.net.input
P99 Command LatencyShows the P99 of command latency. Use to identify anomalous commands.
  • redis.cmd.latency
  • cmd
  • server.address
  • percentile
Command Volume and Execution TimeShows the number of calls for a command and the total time for all executions of a command per server.
  • redis.cmd.calls
  • redis.cmd.usec
  • server.address
  • cmd
Average Command LatencyShows the average latency of commands by server. Use to understand the baseline latency of a command.
  • percentile
  • redis.cmd.latency
  • server.address
  • cmd
Expired KeysShows the total number of key expiration events per server.
  • redis.keys.expired
  • server.address
Keyspace Hits and MissesShows the number of successful and failed key lookups per server.
  • redis.keyspace.hits
  • redis.keyspace.misses
  • server.address
Memory ProfileShows memory metrics per server.
  • redis.memory.peak
  • redis.memory.fragmentation_ratio
  • redis.memory.rss
  • redis.memory.lua
  • server.address
  • redis.memory.used
Primary ReplicationShows the replication offsets per server.
  • redis.replication.offset
  • redis.replication.backlog_first_byte_offset
  • server.address
  • redis.slaves.connected
Follower ReplicationShows the replication offset for follower instances.
  • redis.replication.replica_offset
  • server.address
  • redis.slaves.connected

Postgres

The Postgres Board Template provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.
Query Name | Query Description | Required Fields
Active ConnectionsShows the current number of active connections.
  • host.name
  • postgresql.backends
  • postgresql.connection.max
Database SizeShows the database size over time. Use to help with capacity planning and identifying unexpected growth patterns.
  • postgresql.db_size
  • postgresql.database.name
  • host.name
Database and Table Count: Shows the number of databases and tables, which can help identify database sprawl.
  • postgresql.table.count
  • postgresql.database.name
  • host.name
  • postgresql.database.count
Transaction ThroughputShows the rate of commits and rollbacks per database, which provides insight into transaction throughput and success rates.
  • postgresql.commits
  • postgresql.rollbacks
  • postgresql.database.name
  • host.name
Block Read Performance: Shows the sources of block reads and their rates. Use to diagnose input/output performance issues.
  • postgresql.blocks_read
  • source
  • postgresql.database.name
  • postgresql.table.name
  • host.name
Index UsageShows the rate of index scans. Use to identify frequently used indexes.
  • postgresql.index.name
  • host.name
  • postgresql.index.scans
  • postgresql.table.name
Database OperationsShows database operations. Use to provide insight into workload patterns.
  • postgresql.operations
  • operation
  • postgresql.table.name
  • postgresql.database.name
  • host.name
Background Writer ActivityShows buffer writes by source. Use to identify potential input/output bottlenecks.
  • source
  • host.name
  • postgresql.bgwriter.buffers.writes
Checkpoint FrequencyShows the rate of checkpoints by type (requested versus scheduled), which can help identify if checkpoints are occurring too frequently.
  • host.name
  • postgresql.bgwriter.checkpoint.count
  • type
Checkpoint DurationShows time spent on checkpoint operations across databases and tables. Longer checkpoint durations can negatively impact database performance.
  • postgresql.bgwriter.duration
  • host.name
  • type
Table SizeShows the top 10 largest tables, which may identify tables that require optimization or partitioning.
  • postgresql.table.size
  • postgresql.table.name
Index SizeShows the top 10 largest indexes, which may identify indexes that need rebuilding or optimization.
  • postgresql.database.name
  • postgresql.table.name
  • host.name
  • postgresql.index.size
  • postgresql.index.name
Cache Hit RatioShows the sum of block reads satisfied from the buffer cache. A higher number indicates better performance.
  • postgresql.blocks_read
  • postgresql.database.name
  • postgresql.table.name
  • host.name
  • source
Replication WAL Delay: Shows the time between flushing recent WAL and receiving notification that standby servers have completed an operation on it. Use to track replication delays.
  • host.name
  • postgresql.wal.delay
  • replication_client
Replication Data DelayShows the amount of data delayed in replication, which can help identify network or performance issues affecting replication.
  • postgresql.replication.data_delay
  • replication_client
  • host.name
Database Locks by TypeShows the maximum number of database locks per type. Use for situations where multiple concurrent transactions may cause resource contention.
  • host.name
  • postgresql.database.locks
  • mode
  • lock_type
Postgres Memory Utilization: Shows memory usage and amount of committed memory for Postgres processes. Use to identify inefficient processes.
  • process.memory.usage
  • process.memory.virtual
  • process.command
  • process.executable.name
  • host.name
Postgres CPU Utilization TrendsShows CPU utilization for PostgreSQL processes. Use to identify inefficient queries, excessive index scanning, and so on.
  • process.cpu.time
  • process.command
  • host.name
Number of Postgres OperationsShows the number of PostgreSQL operations per database and table name.
  • postgresql.table.name
  • operation
  • host.name
  • postgresql.operations
  • postgresql.database.name

MongoDB

The MongoDB template contains metrics-driven visualizations for monitoring MongoDB nodes.
This Board leverages metrics collected via the MongoDB receiver provided by the OpenTelemetry Collector Contrib distribution. The MongoDB receiver enables observability into key performance, resource utilization, and replication metrics for MongoDB clusters and nodes. To configure this receiver, visit MongoDB receiver documentation in the OpenTelemetry Collector Contrib repo.
Query Name | Description | Required Fields
Server HealthShows health status by server.
  • mongod.status
  • mongodb.server.name
Count of Active ConnectionsShows current active connections. Useful for identifying leaks, connection saturation, or performance bottlenecks.
  • mongodb.connections.current
  • mongodb.server.name
  • host.name
Available ConnectionsVisualizes the number of available connections.
  • mongodb.connections.available
  • mongodb.server.name
  • host.name
Network I/OVisualizes bytes received and transmitted per server.
  • mongodb.network.bytes.in
  • mongodb.network.bytes.out
  • mongodb.server.name
Database CountVisualizes the number of databases per host.
  • mongodb.database.count
  • host.name
CollectionsShows the number of collections per server and database.
  • mongodb.collection.count
  • mongodb.database.name
  • host.name
Cache Hit RatioDisplays cache hits and misses.
  • mongodb.cache.hits
  • mongodb.cache.misses
  • host.name
Document OperationsVisualizes document operations by server and database.
  • mongodb.document.operations.rate
  • mongodb.server.name
  • host.name
Memory Usage by TypeProfiles memory usage by type. Useful for identifying low query performance or high latency due to memory utilization. High read usage coupled with low write usage generally indicates a healthy memory profile.
  • mongodb.memory.usage
  • mongodb.server.name
  • host.name
Index UtilizationTracks how indexes are being accessed across different collections.
  • mongodb.index.accesses
  • mongodb.server.name
  • host.name
Read Write OperationsShows the number of reads and writes currently being processed.
  • mongodb.operations.reads
  • mongodb.operations.writes
  • mongodb.server.name
Database LocksVisualizes locks and lock types per database.
  • mongodb.locks
  • mongodb.lock.time_ms
  • mongodb.database.name
  • host.name
Activity Overview: Visualizes database activity by server. Useful for identifying read vs. write load and throughput.
  • mongodb.commands.count
  • mongodb.server.name
  • mongodb.operations.inserts
  • mongodb.operations.updates
  • mongodb.operations.deletes
  • host.name
Replication OverviewVisualizes replication operations per server.
  • mongodb.replication.oplog.insert.count
  • mongodb.replication.oplog.update.count
  • mongodb.replication.oplog.delete.count
  • mongodb.server.name

SQL Server

The SQL Server Board template contains useful metrics for monitoring SQL Server database operations.
This template leverages metrics gathered primarily by the SQL Server receiver provided by the OpenTelemetry Collector Contrib distribution. The SQL Server receiver provides insights into query execution, connection states, memory usage, and throughput across SQL Server instances and databases. To learn how to set up this receiver, visit the SQL Server receiver documentation in the OpenTelemetry Collector Contrib repo.
Query Name | Description | Required Fields
Batch Requests RateShows total request rate (per second). Useful for diagnosing busy or idle instances.
  • host.name
  • sqlserver.batch_requests.rate
Lock Await Rate: Shows the total rate of lock requests resulting in a wait.
  • host.name
  • sqlserver.locks.await.rate
Buffer EfficiencyShows buffer efficiency of cache lookups without having to read from disk. Drops in this value indicate inefficient queries.
  • host.name
  • sqlserver.buffer.page.lookups.rate
  • sqlserver.buffer.page.reads.rate
Query Plan Activity Rate: Visualizes the rate at which SQL Server compiles new query execution plans and recompiles existing plans that were discarded.
  • host.name
  • sqlserver.batch.sql_compilations.rate
  • sqlserver.batch.sql_recompilations.rate
Read I/O ThroughputVisualizes the total I/O throughput per host. Useful for identifying I/O operations or slow queries.
  • host.name
  • sqlserver.io.read.bytes
  • sqlserver.io.read.operations.rate
Active ConnectionsVisualizes active user connections.
  • host.name
  • sqlserver.connection.count
Memory UtilizationVisualizes the average amount of memory utilized per host, in KiB.
  • host.name
  • sqlserver.memory.usage
Table CountVisualizes the number of tables per host.
  • host.name
  • sqlserver.database.name
  • sqlserver.table.count
Database LatencyVisualizes the rate of wait counts across all waits resulting in I/O.
  • host.name
  • sqlserver.io.wait.count
  • sqlserver.io.wait.time
Execution ErrorsVisualizes the number of execution errors.
  • host.name
  • sqlserver.errors.count
Rollback RateVisualizes the number of rollbacks.
  • host.name
  • sqlserver.transactions.rollback.rate
CPU and I/O DurationVisualizes CPU activity by category and queries.
  • host.name
  • sqlserver.performance.cpu
  • sqlserver.performance.io
Server OperationsVisualizes the number of operations issued.
  • host.name
  • sqlserver.batch.transactions.rate
Database By StatusVisualizes the status of databases per host. Useful for quickly diagnosing unexpected database states.
  • host.name
  • sqlserver.database.state
  • sqlserver.database.name
Top Current QueriesVisualizes the most recent queries on the host.
  • host.name
  • sqlserver.query.text

Frontend Investigation

Real User Monitoring (RUM)

The RUM Board Template provides an overview of real user monitoring data from your frontend applications.
This template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send Browser Data with Honeycomb Web Instrumentation.
Query Name | Query Description | Required Fields
Largest Contentful Paint (LCP)Shows ratings based on the render time for the largest content on a page.
  • lcp.rating
  • name
Cumulative Layout Shift (CLS)Shows ratings based on the stability of content layout on a page.
  • cls.rating
  • name
Interaction to Next Paint (INP)Shows ratings based on the responsiveness of a page.
  • inp.rating
  • name
LCP P75Shows the 75th percentile for LCP.
  • name
  • lcp.value
CLS P75Shows the 75th percentile for CLS.
  • cls.value
  • name
INP P75Shows the 75th percentile for INP.
  • inp.value
  • name
Total Events by TypeShows event types ranked by occurrence.
  • name
  • meta.annotation_type
Largest Resource RequestsShows the largest resource requests ranked by the average length of their response content.
  • http.response_content_length
  • http.url
  • name
Top 5 Endpoints by Request CountShows the top 5 endpoints ranked by number of requests.
  • http.method
  • name
  • http.url
Slowest Requests by EndpointShows the slowest endpoints based on the 75th percentile of request durations.
  • http.url
  • duration_ms
  • name
Top Landing Pages by Session CountShows the most visited landing pages ranked by session count.
  • entry_page.path
  • name
Pages With the Most EventsShows pages with the highest number of events, highlighting the most active pages.
  • Route or http.route
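
The rating and P75 queries in the table above can be related with a short sketch. It assumes the commonly published Core Web Vitals thresholds (LCP 2.5 s and 4 s, CLS 0.1 and 0.25, INP 200 ms and 500 ms) and a nearest-rank percentile. The function names are hypothetical, and this is an illustration of the idea rather than how Honeycomb or the instrumentation computes either value.

```python
import math

# Assumed Core Web Vitals thresholds: (upper bound for "good", lower bound for "poor").
# LCP and INP are in milliseconds; CLS is unitless.
THRESHOLDS = {
    "lcp": (2500.0, 4000.0),
    "cls": (0.1, 0.25),
    "inp": (200.0, 500.0),
}


def rate(metric: str, value: float) -> str:
    """Map a raw metric value to a rating bucket."""
    good, poor = THRESHOLDS[metric]
    if value > poor:
        return "poor"
    if value > good:
        return "needs-improvement"
    return "good"


def p75(values: list[float]) -> float:
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical LCP samples in milliseconds.
lcp_samples = [1200.0, 1800.0, 2600.0, 3100.0, 4500.0, 900.0, 2000.0, 2400.0]
lcp_p75 = p75(lcp_samples)
print(lcp_p75, rate("lcp", lcp_p75))
```

A P75 that lands in the "needs-improvement" bucket means at least a quarter of the sampled page loads were slower than the "good" threshold.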

Android Auto-Instrumentation

The Android Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Android SDK auto-instrumentation.
This template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send Android Data to Honeycomb.
Query Name | Query Description | Required Fields
Average App Startup TimesAverage time the application took to start up. Grouped into cold, warm, and hot startups.
  • duration_ms
  • name
  • start.type
Total Startup Times Over 1.5sNumber of instances where any startup time surpassed the threshold of 1.5 seconds.
  • duration_ms
  • name
  • start.type
App’s Memory and Heap UsageStatistics about the application’s memory and heap usage.
  • heap.free
  • storage.free
Average Network Request Time per ScreenAverage duration for a screen’s requests to successfully retrieve data.
  • duration_ms
  • http.request.method
  • http.response.status_code
  • screen.name
Screens with the Most Network RequestsScreens that have the most network activity.
  • http.request.method
  • screen.name
Top Screens by Total Network Request FailuresScreens with the highest number of failed network requests.
  • http.response.status_code
  • screen.name
Screens with Application Not Responding (ANR) ErrorsNumber of instances where the application is unresponsive for more than 5 seconds.
  • exception.stacktrace
  • name
  • screen.name
Screens with Slow/Frozen RendersScreens that take more than 16ms (slow) or more than 700ms (frozen) to render.
  • name
  • screen.name
Top App Crashes & ErrorsTotal number of times the application crashed, excluding ANR events.
  • exception.message
  • exception.stacktrace
  • exception.type
  • name
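
The thresholds quoted in the table above (1.5 s for startup, 16 ms for a slow render, 700 ms for a frozen render) can be read as simple comparisons on a duration. The sketch below assumes durations are already available in milliseconds, as in `duration_ms`; the constant and function names are hypothetical and not part of the SDK.

```python
# Thresholds taken from the query descriptions, expressed in milliseconds.
SLOW_RENDER_MS = 16
FROZEN_RENDER_MS = 700
SLOW_STARTUP_MS = 1500


def classify_render(duration_ms: float) -> str:
    """Label a render as normal, slow (> 16 ms), or frozen (> 700 ms)."""
    if duration_ms > FROZEN_RENDER_MS:
        return "frozen"
    if duration_ms > SLOW_RENDER_MS:
        return "slow"
    return "normal"


def count_slow_startups(durations_ms: list[float]) -> int:
    """Count startups that took longer than 1.5 seconds."""
    return sum(1 for d in durations_ms if d > SLOW_STARTUP_MS)


# Hypothetical sample data.
print([classify_render(d) for d in (8, 40, 900)])
print(count_slow_startups([400, 1600, 2300, 1500]))
```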

iOS Auto-Instrumentation

The iOS Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Swift SDK auto-instrumentation.
This template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send iOS Data to Honeycomb with Swift.
Query Name | Query Description | Required Fields
Monthly Active UsersTotal number of distinct users that have used the application in the past month.
  • device.id
Weekly Active UsersTotal number of distinct users that have used the application in the past week.
  • device.id
Daily Active UsersTotal number of distinct users that have used the application in the past day.
  • device.id
Average App Startup TimesAverage time the application took to start up. Grouped into cold, warm, and hot startups.
  • metrickit.app_launch.app_resume_time_average
  • metrickit.app_launch.optimized_time_to_first_draw_average
  • metrickit.app_launch.time_to_first_draw_average
  • name
Total Startup Times Over 1.5sTotal number of instances where any startup time surpassed the threshold of 1.5 seconds.
  • metrickit.app_launch.app_resume_time_average
  • metrickit.app_launch.optimized_time_to_first_draw_average
  • metrickit.app_launch.time_to_first_draw_average
  • name
Abnormal App Exit RatioRatio between abnormal application exits (foreground and background) and total application exits.
  • DIV(SUM($metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count), SUM($metrickit.app_exit.background.normal_app_exit_count, $metrickit.app_exit.foreground.normal_app_exit_count, $metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count))
Average App Performance Across All DevicesStatistics on how the resources the application is using perform on average.
  • metrickit.cpu.cpu_time
  • metrickit.gpu.time
  • metrickit.memory.peak_memory_usage
  • metrickit.memory.suspended_memory_average
  • name
Average Network Request Time per ScreenAverage duration for all the app’s screens to successfully retrieve data.
  • duration_ms
  • http.request.method
  • http.response.status_code
  • screen.name
Screens with the Most Network RequestsScreens that have the most network requests.
  • http.request.method
  • screen.name
Top Screens by Total Network Request FailuresTop screens that have failing network requests.
  • http.response.status_code
  • screen.name
Long Hanging ScreensScreens that are hanging for more than 0.5 seconds.
  • metrickit.app_responsiveness.hang_time_average
  • name
  • screen.name
Average Screen Hang TimesLength of time each screen hangs on average.
  • metrickit.app_responsiveness.hang_time_average
  • name
  • screen.name
Most Used OS VersionsOperating systems used by the most users.
  • device.id
  • os.version
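
The Abnormal App Exit Ratio expression in the table above divides the abnormal exit counts by the sum of all exit counts. A minimal sketch of the same arithmetic follows; the function name and the zero-total guard are assumptions added for illustration and are not part of the template.

```python
from typing import Optional


def abnormal_exit_ratio(
    bg_abnormal: int, fg_abnormal: int, bg_normal: int, fg_normal: int
) -> Optional[float]:
    """Abnormal exits (background + foreground) divided by all exits."""
    abnormal = bg_abnormal + fg_abnormal
    total = abnormal + bg_normal + fg_normal
    if total == 0:
        # Assumed behavior: no exits recorded, so no ratio to report.
        return None
    return abnormal / total


# Hypothetical counts: 2 abnormal exits out of 10 total.
print(abnormal_exit_ratio(1, 1, 5, 3))
```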

Kubernetes

Use the Kubernetes Quick Start to instrument the required fields for Kubernetes Board Templates.

Kubernetes Pod Metrics

The Kubernetes Pod Metrics Board Template includes queries that help you investigate pod performance and resource usage within Kubernetes clusters:
Query Name | Query Description | Required Fields
Pod CPU UsageShows the amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.
  • k8s.pod.cpu.utilization
  • k8s.pod.name
Pod Memory UsageShows the amount of memory being used by each Kubernetes pod.
  • k8s.pod.memory.usage
  • k8s.pod.name
Pod Uptime Smokestacks: Because pod uptime only ever increases, this query uses the smokestack method, which applies LOG10 to the Pod Uptime metric. Newly started or restarted pods stand out more prominently than pods that have been running a long time, whose lines eventually flatten out.
  • LOG10($k8s.pod.uptime)
  • k8s.pod.name
  • k8s.pod.uptime
Unhealthy PodsShows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
Pod CPU Utilization vs. Limit: When a CPU Limit is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that limit.
  • k8s.pod.cpu_limit_utilization
  • k8s.pod.name
Pod CPU Utilization vs. Request: When a CPU Request is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that request value.
  • k8s.pod.cpu_request_utilization
  • k8s.pod.name
Pod Memory Utilization vs. Limit: When a Memory Limit is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that limit value.
  • k8s.pod.memory_limit_utilization
  • k8s.pod.name
Pod Memory Utilization vs. Request: When a Memory Request is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that request value.
  • k8s.pod.memory_request_utilization
  • k8s.pod.name
Pod Network IO Rates: Displays Network IO RATE_MAX for Transmit and Receive network traffic (in bytes) as a stacked graph, and gives the overall network rate and the individual rate for each pod.
  • k8s.pod.name
  • k8s.pod.network.io.receive
  • k8s.pod.network.io.transmit
Pods With Low Filesystem AvailabilityShows any pods where filesystem availability is below 5 GB.
  • k8s.pod.filesystem.available
  • k8s.pod.name
Pod Filesystem Usage: Shows the amount of filesystem usage per Kubernetes pod, displayed in a stacked graph to show total filesystem usage of all pods.
  • k8s.pod.filesystem.usage
  • k8s.pod.name
Pods Per NamespaceShows the number of pods currently running in each Kubernetes namespace.
  • k8s.namespace.name
  • k8s.pod.name
Pods Per NodeShows the number of pods currently running in each Kubernetes Node.
  • k8s.node.name
  • k8s.pod.name
Pod Network ErrorsShows network errors in receive and transmit, grouped by pod.
  • k8s.pod.name
  • k8s.pod.network.errors.receive
  • k8s.pod.network.errors.transmit
Pods Per DeploymentShows the number of pods currently deployed in different Kubernetes deployments.
  • k8s.deployment.name
  • k8s.pod.name
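
The smokestack method used by the Pod Uptime Smokestacks query can be illustrated with a few numbers. The sketch below assumes uptime is sampled in seconds once a minute; the helper names and sample values are hypothetical. It shows why a logarithm makes a recent restart stand out: over the same five minutes, a freshly restarted pod moves much further on a log scale than a pod that has been up for a day.

```python
import math


def smokestack(uptime_seconds: list[float]) -> list[float]:
    """Apply log10 to each positive uptime sample."""
    return [math.log10(u) for u in uptime_seconds if u > 0]


def spread(values: list[float]) -> float:
    """Vertical distance a series covers on the graph."""
    return max(values) - min(values)


# Five one-minute samples for a pod that has been up for a day...
long_running = [86_400 + 60 * i for i in range(5)]
# ...and for a pod that restarted one minute before the first sample.
restarted = [60 * (i + 1) for i in range(5)]

print(round(spread(smokestack(long_running)), 4))  # nearly flat line
print(round(spread(smokestack(restarted)), 4))     # steep rise after restart
```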

Kubernetes Node Metrics

The Kubernetes Node Metrics Board Template includes queries that help you investigate node performance and resource usage within Kubernetes clusters:
Query Name | Query Description | Required Fields
Node CPU UsageShows the amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.
  • k8s.node.cpu.utilization
  • k8s.node.name
Node Memory UtilizationShows percent of memory used on each Kubernetes node.
  • IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100))
  • k8s.node.memory.available
  • k8s.node.memory.usage
  • k8s.node.name
Node Network IO RatesDisplays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives overall network rate and the individual rate for each node.
  • k8s.node.name
  • k8s.node.network.io.receive
  • k8s.node.network.io.transmit
Unhealthy NodesShows errors that Kubernetes nodes are experiencing.
  • k8s.namespace.name
  • k8s.node.name
  • reason
  • severity_text
Node Filesystem UtilizationShows percent of filesystem used on each node.
  • IF(EXISTS($k8s.node.filesystem.usage),MUL(DIV($k8s.node.filesystem.usage,$k8s.node.filesystem.capacity), 100))
  • k8s.node.filesystem.capacity
  • k8s.node.filesystem.usage
  • k8s.node.name
Node Uptime Smokestack: Because node uptime only ever increases, this query uses the smokestack method, which applies LOG10 to the Node Uptime metric. Newly started or restarted nodes stand out more prominently than nodes that have been running a long time, whose lines eventually flatten out.
  • LOG10($k8s.node.uptime)
  • k8s.node.name
  • k8s.node.uptime
Node Network ErrorsShows network transmit and receive errors for each node.
  • k8s.node.name
  • k8s.node.network.errors.receive
  • k8s.node.network.errors.transmit
Pods and Containers per NodeShows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment.
  • k8s.container.name
  • k8s.node.name
  • k8s.pod.name
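
The Node Filesystem Utilization expression in the table above only produces a value when the usage field exists, then divides usage by capacity and multiplies by 100. A minimal sketch of that logic follows. The function name is hypothetical, `None` stands in for a missing field, and the guard against a missing or zero capacity is an added assumption that is not in the original expression.

```python
from typing import Optional


def utilization_percent(
    usage: Optional[float], capacity: Optional[float]
) -> Optional[float]:
    """Percent of capacity in use, or None when it cannot be computed."""
    if usage is None:
        # Mirrors the EXISTS check: events without the field yield no value.
        return None
    if not capacity:
        # Assumed guard: avoid dividing by a missing or zero capacity.
        return None
    return usage / capacity * 100


# Hypothetical byte counts for one node.
print(utilization_percent(25.0, 100.0))
print(utilization_percent(None, 100.0))
```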

Kubernetes Workload Health

The Kubernetes Workload Health Board Template includes queries that help you diagnose Kubernetes-related application issues:
Query Name | Query Description | Required Fields
Container RestartsShows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero.
  • k8s.container.name
  • k8s.container.restarts
  • k8s.namespace.name
  • k8s.pod.name
Unhealthy PodsShows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
Pending PodsShows pods in a “Pending” state.
  • k8s.pod.name
  • k8s.pod.phase
Failed PodsShows pods in a “Failed” or “Unknown” state.
  • k8s.pod.name
  • k8s.pod.phase
Unhealthy NodesShows errors that Kubernetes nodes are experiencing.
  • k8s.namespace.name
  • reason
  • k8s.pod.name
  • severity_text
Unhealthy VolumesShows volume creation and attachment failures.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
  • severity_text
Unscheduled Daemonset PodsTracks cases where a pod in a daemonset is not currently running on every node in the cluster as it should be.
  • SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes)
  • k8s.daemonset.current_scheduled_nodes
  • k8s.daemonset.desired_scheduled_nodes
  • k8s.daemonset.name
  • k8s.namespace.name
Stateful Set Pod Readiness: Tracks stateful sets where pods that should be in a ready state are in a non-ready state.
  • SUB($k8s.statefulset.desired_pods,$k8s.statefulset.ready_pods)
  • k8s.statefulset.desired_pods
  • k8s.statefulset.name
  • k8s.statefulset.ready_pods
Deployment Pod StatusShows Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”.
  • SUB($k8s.deployment.desired,$k8s.deployment.available)
  • k8s.deployment.available
  • k8s.deployment.desired
  • k8s.deployment.name
Job FailuresTracks the number of failed pods in Kubernetes jobs.
  • k8s.job.failed_pods
  • k8s.job.name
Active Cron JobsTracks the number of active pods in each Kubernetes cron job.
  • k8s.cronjob.active_jobs
  • k8s.cronjob.name
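
Several queries in the table above (Unscheduled Daemonset Pods, Stateful Set Pod Readiness, Deployment Pod Status) use a SUB expression that subtracts an observed count from a desired count. The sketch below shows that pattern under the assumption that both counts are available per workload; the function name, workload names, and counts are hypothetical.

```python
def shortfall(desired: int, actual: int) -> int:
    """Number of pods that are desired but not yet available."""
    return desired - actual


# Hypothetical deployments mapped to (desired, available) pod counts.
deployments = {
    "checkout": (3, 3),
    "search": (5, 2),
}

# Keep only workloads where the shortfall is greater than zero.
not_ready = {
    name: shortfall(desired, available)
    for name, (desired, available) in deployments.items()
    if shortfall(desired, available) > 0
}
print(not_ready)
```

A value of zero means the workload has reached its desired state, which is why only values greater than zero are of interest.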

OpenTelemetry

OpenTelemetry Collector Operations

The OpenTelemetry Collector Operations Board Template includes queries with key metrics emitted by the OpenTelemetry Collector during its operation:
Query Name | Query Description | Required Fields
Exporter Span FailuresShows when errors happen during enqueueing or sending in exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_spans
  • otelcol_exporter_send_failed_spans
Collector Uptime Smokestacks: Shows the uptime for different pods with LOG10 applied to make it clearer where restarts are happening.
  • LOG10($otelcol_process_uptime)
  • net.host.name
  • otelcol_process_uptime
Exporter Metric Send FailuresShows when errors happen during sending from exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_metric_points
  • otelcol_exporter_send_failed_metric_points
Exporter Metrics Enqueue FailuresShows when errors happen during enqueueing in exporters.
  • net.host.name
  • otelcol_exporter_send_failed_metric_points
Exporter Log Records FailuresShows when errors happen during enqueueing or sending in exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_log_records

OpenTelemetry Java Metrics

The OpenTelemetry Java Metrics Board Template includes queries that help you investigate application issues related to the Java Virtual Machine (JVM).
Metrics for Java applications are sourced from the JVM and reported by the OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.
Query Name | Query Description | Required Fields
JVM Memory Usage (Young Generation): Shows memory usage for Eden space on the JVM heap, which is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents maximum usable memory.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • process.runtime.jvm.memory.usage_after_last_gc
  • service.name
  • type
JVM Memory Usage (Old Generation)Shows memory usage for tenured Gen JVM heap space, which stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • process.runtime.jvm.memory.usage_after_last_gc
  • service.name
  • type
JVM Garbage Collection (GC) ActivityShows JVM garbage collection activity. JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive.
  • process.runtime.jvm.gc.duration.count
  • action
  • gc
  • host.name
  • process.runtime.jvm.gc.duration.avg
  • process.runtime.jvm.gc.duration.max
  • service.name
JVM CPU UtilizationShows system CPU utilization and 1-minute load average, as captured by the JVM.
  • host.name
  • process.runtime.jvm.cpu.utilization
  • process.runtime.jvm.system.cpu.load_1m
  • service.name
JVM Buffer Memory UsageShows usage of buffer memory, which is provided by the OS and is outside the JVM’s heap memory allocation. Buffer memory is used by Java NIO to quickly write data to network or disk.
  • host.name
  • process.runtime.jvm.buffer.limit
  • process.runtime.jvm.buffer.usage
  • service.name
JVM Non-Heap Memory UsageShows usage of JVM non-heap memory, which is allocated above and beyond the heap size you’ve configured. JVM non-heap memory is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • service.name
  • type

AWS

AWS Lambda Health

The AWS Lambda Health Board Template includes queries that monitor the health of AWS Lambda functions, including metrics for invocations, errors, throttles, and concurrency:
Query Name | Query Description | Required Fields
Duration & Execution by ID/VersionTracks the execution time of Lambda functions, identified by their ID or version. Useful for analyzing the performance and efficiency of different versions or instances of a function over time.
  • duration_ms
  • faas.execution
  • faas.name
  • faas.version
Lambda Invocations by FunctionShows the total number of times each Lambda function is invoked. It helps in tracking the frequency of usage of different functions, enabling a clear understanding of which functions are most or least used.
  • FunctionName
  • MetricName
  • Namespace
Latency by Function/MetricShows the response time for each Lambda function, broken down by specific metrics. Useful for identifying functions that may be experiencing performance issues due to high latency.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Duration.max
  • amazonaws.com/AWS/Lambda/PostRuntimeExtensionsDuration.max
Function Error Count and RateShows two key pieces of information: the total number of errors encountered by each Lambda function and the error rate, calculated as the ratio of errors to total invocations. Useful for pinpointing functions that are failing or experiencing issues.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Errors.count
Lambda Throttles: Shows the instances where Lambda invocations are being throttled, such as when the number of function calls exceeds the concurrency limits. Tracking this helps in managing and optimizing the scalability settings for each function.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Throttles.count
Function Concurrency: Monitors the simultaneous execution count of each Lambda function, tracking how many instances of a function are running at the same time.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/ConcurrentExecutions.avg
  • amazonaws.com/AWS/Lambda/UnreservedConcurrentExecutions.avg
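The Function Error Count and Rate query describes the error rate as the ratio of errors to total invocations. A minimal Python sketch of that arithmetic follows. The function name and the sample totals are hypothetical and are not part of the Board Template; treating a window with no invocations as a rate of 0.0 is also an assumption.

```python
def error_rate(errors: float, invocations: float) -> float:
    """Return errors / invocations, or 0.0 when nothing was invoked."""
    if invocations <= 0:
        # Assumption: an idle function is reported as a 0.0 error rate
        # instead of raising a division error.
        return 0.0
    return errors / invocations


# Hypothetical per-function totals for one time window: (errors, invocations).
samples = {
    "checkout-handler": (5, 200),
    "idle-function": (0, 0),
}

rates = {name: error_rate(e, n) for name, (e, n) in samples.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Guarding the zero-invocation case keeps idle functions from breaking the calculation, though you may prefer to report them as "no data" instead.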

EC2 Health

The AWS EC2 Board Template includes queries that monitor the health of AWS EC2 instances, including status failures, disk read and write operations, and EBS operations:
Query Name: Query Description, followed by Required Fields
CPU Utilization: Shows CPU utilization per EC2 instance.
  • amazonaws.com/AWS/EC2/CPUUtilization.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
Network I/O: Shows network input and output per EC2 instance.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/NetworkIn.max
  • amazonaws.com/AWS/EC2/NetworkPacketsOut.max
  • Dimensions.InstanceId
EBS Read Operations: Shows the number of read operations committed by the instance.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/EBSReadOps.max
  • Dimensions.InstanceId
EBS Write Operations: Shows the number of write operations committed by the instance.
  • amazonaws.com/AWS/EC2/EBSWriteOps.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
EBS IO Balance: Shows available input and output per second that attached EBS volumes are utilizing. Use to monitor potential throttling on an EBS volume attached to an instance.
  • amazonaws.com/AWS/EC2/EBSIOBalance%.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
Instance Metadata Service Outliers: Shows the number of instances that are not currently using IMDSv2. Use to identify potential security issues with EC2 instances.
  • amazonaws.com/AWS/EC2/MetadataNoToken.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
EC2 Disk Read/Write: Shows read and write operations undertaken by EC2 instances. Use to monitor EBS volume usage.
  • amazonaws.com/AWS/EC2/EBSWriteBytes.max
  • amazonaws.com/AWS/EC2/EBSReadBytes.max
  • Dimensions.InstanceId
  • Namespace
EC2 Instance Status Failures: Shows any EC2 instances that have failed a status check in the provided time period.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/StatusCheckFailed.max
  • Dimensions.InstanceId
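Each query in these tables depends on a set of required fields. As a rough illustration, the following Python sketch compares the fields listed for the CPU Utilization query with the columns present in a dataset. The `missing_fields` helper and the `dataset_columns` sample are hypothetical; how you obtain the real column list from your environment is not shown here.

```python
def missing_fields(required: set[str], available: set[str]) -> list[str]:
    """Return the required fields that are absent, sorted for stable output."""
    return sorted(required - available)


# Required fields for the CPU Utilization query, copied from the table above.
CPU_UTILIZATION_FIELDS = {
    "amazonaws.com/AWS/EC2/CPUUtilization.max",
    "Dimensions.InstanceId",
    "cloud.account.id",
    "cloud.region",
}

# Hypothetical set of column names found in a dataset.
dataset_columns = {
    "amazonaws.com/AWS/EC2/CPUUtilization.max",
    "Dimensions.InstanceId",
    "cloud.region",
}

missing = missing_fields(CPU_UTILIZATION_FIELDS, dataset_columns)
print(missing)
```

An empty list would suggest that the dataset has every field the query needs.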

AWS ALB/ELB Health

The AWS ALB/ELB Board Template includes queries that monitor the Load Balancer’s health, status codes, active connections, and requests.
This template relies on AWS metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this template, you must provision a metric stream for the AWS resources that you wish to monitor.
Query Name: Query Description, followed by Required Fields
Request Count Per Target: Shows how requests are distributed across targets. Use to diagnose imbalanced traffic in the load balancer.
  • cloud.region
  • Dimensions.AvailabilityZone
  • amazonaws.com/AWS/ApplicationELB/RequestCountPerTarget.count
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
Healthy vs. Unhealthy Host Count: Shows the number of healthy versus unhealthy hosts per load balancer, segmented across target groups and availability zones. Use to quickly spot failing load balancer targets.
  • amazonaws.com/AWS/ApplicationELB/HealthyHostCount.max
  • amazonaws.com/AWS/ApplicationELB/UnHealthyHostCount.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
  • Dimensions.AvailabilityZone
Load Balancer Status Codes: Shows status codes per load balancer. Use to identify routing or traffic management issues.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_3XX_Count.count
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_4XX_Count.count
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_5XX_Count.count
  • Dimensions.LoadBalancer
Active Connections: Shows active connections per load balancer.
  • amazonaws.com/AWS/ApplicationELB/ActiveConnectionCount.count
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region
State Routing: Shows load balancer state routing. Use to identify network configuration errors, unresponsive applications, or health check delays.
  • amazonaws.com/AWS/ApplicationELB/UnhealthyStateRouting.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • Dimensions.AvailabilityZone
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/HealthyStateRouting.max
Load Balancer Capacity Units: Shows LCUs consumed during a given period of time. Use to optimize load balancer cost and detect bottlenecks.
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/PeakLCUs.max
Anomalous Host Count: Shows the number of hosts behaving abnormally. Use to detect and diagnose excessive error rates, latency issues, or inconsistent health check results.
  • amazonaws.com/AWS/ApplicationELB/AnomalousHostCount.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
DNS Target State: Shows load balancer DNS target state resolution. Use to identify failing targets and DNS misconfigurations.
  • amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.max
  • amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.count
  • amazonaws.com/AWS/ApplicationELB/UnhealthyStateDNS.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
  • Dimensions.AvailabilityZone
TLS Negotiation Errors: Shows the number of TLS negotiation errors per load balancer.
  • amazonaws.com/AWS/ApplicationELB/ClientTLSNegotiationErrorCount.count
  • Dimensions.LoadBalancer
  • Dimensions.AvailabilityZone
  • cloud.account.id
  • cloud.region
Connection Error Count: Shows errors on targets. Use to diagnose and troubleshoot misconfigured load balancer targets.
  • Dimensions.TargetGroup
  • amazonaws.com/AWS/ApplicationELB/TargetConnectionErrorCount.max
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region
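The metric fields listed in these AWS tables appear to follow the pattern `amazonaws.com/<namespace>/<metric>.<statistic>`. That pattern is inferred from the field names above, not from a documented specification. Under that assumption, the following Python sketch splits a field name into its parts; `MetricField` and `parse_metric_field` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

PREFIX = "amazonaws.com/"


class MetricField(NamedTuple):
    namespace: str   # for example "AWS/ApplicationELB"
    metric: str      # for example "RequestCountPerTarget"
    statistic: str   # for example "count" or "max"


def parse_metric_field(field: str) -> MetricField:
    """Split a field such as 'amazonaws.com/AWS/EC2/NetworkIn.max' into parts."""
    if not field.startswith(PREFIX):
        raise ValueError(f"not a CloudWatch-style metric field: {field!r}")
    # The statistic is whatever follows the final dot.
    path, _, statistic = field[len(PREFIX):].rpartition(".")
    # The metric name is the last path segment; the rest is the namespace.
    namespace, _, metric = path.rpartition("/")
    if not (namespace and metric and statistic):
        raise ValueError(f"unexpected metric field shape: {field!r}")
    return MetricField(namespace, metric, statistic)


parsed = parse_metric_field(
    "amazonaws.com/AWS/ApplicationELB/RequestCountPerTarget.count"
)
print(parsed)
```

Splitting from the right keeps multi-segment namespaces such as `AWS/ApplicationELB` intact.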

SQS

The SQS Board Template provides insight into critical AWS SQS operations.
This template relies on AWS metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this template, you must provision a metric stream for the AWS resources that you wish to monitor.
Query Name: Query Description, followed by Required Fields
Request Count Per Minute: Shows requests made per minute. Use to observe the traffic patterns and detect unexpected load or errors.
  • telemetry.sdk.language
  • http.host
  • http.route
  • http.method
  • http.status_code
  • http.server_name
HTTP Response Duration: Shows the P95 response duration by route, status code, and server name. Use for Django HTTP performance.
  • http.route
  • http.method
  • http.status_code
  • http.server_name
  • telemetry.sdk.language
  • http.response.body.size
  • duration_ms
HTTP Errors: Shows count of HTTP errors by route, status code, and host.name. Use to assess success and error rates of APIs.
  • http.status_code
  • http.server_name
  • error
  • telemetry.sdk.language
  • http.route
  • http.method
Exceptions: Shows exceptions thrown in the service. Use to assess the overall health of the application.
  • http.server_name
  • exception.type
  • code.namespace
  • exception.message
  • exception.stacktrace
  • telemetry.sdk.language
AVG and P95 Request Size: Shows the average and P95 HTTP request size. Use to monitor payload efficiency.
  • http.server_name
  • telemetry.sdk.language
  • http.request.body.size
  • http.route
  • http.method
  • http.status_code
AVG and P95 Response Size: Shows the average and P95 HTTP response size. Use to monitor payload efficiency.
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
  • http.method
  • http.status_code
  • http.server_name
P95 and Heatmap of Job Duration: Shows the P95 and a heatmap of job duration by messaging destination, messaging system, and server name. Provides insight into the status of async job runners.
  • http.server_name
  • telemetry.sdk.language
  • duration_ms
  • messaging.destination
  • messaging.system
Jobs Executed: Shows count of root traces with messaging system and destination. Use to assess overall performance of the async job operations.
  • http.server_name
  • messaging.destination
  • messaging.system
  • telemetry.sdk.language
  • messaging.destination_kind
DB connection Count Per Min: Shows the connection count per minute where the database connection event is “open”. Use to gain visibility into connection pooling efficiency.
  • telemetry.sdk.language
  • db.operation
  • db.system
  • db.name
  • db.connection.event

RDS

The RDS Board Template provides insight for monitoring and optimizing the performance of AWS RDS databases.
This template relies on AWS metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this template, you must provision a metric stream for the AWS resources that you wish to monitor.
Query Name: Query Description, followed by Required Fields
Number of Connections: Shows the number of connections to RDS instances.
  • amazonaws.com/AWS/RDS/DatabaseConnections.count
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Database Load: Shows the level of session activity on RDS instances.
  • amazonaws.com/AWS/RDS/DBLoad.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Disk Queue Depth: Shows the number of outstanding input/output requests waiting to access the disk. High queue depth can indicate that the workload is generating more read/write requests than the underlying storage can handle.
  • amazonaws.com/AWS/RDS/DiskQueueDepth.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/DiskQueueDepth.count
Freeable Memory: Shows the minimum freeable memory per database instance. Use to identify memory pressure in RDS instances.
  • amazonaws.com/AWS/RDS/FreeableMemory.min
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/FreeableMemory.count
Read/Write Operations: Shows the read and write operations per second that the RDS instance is performing. Use to diagnose bottlenecks, optimize workloads, and manage cost.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/WriteIOPS.max
  • amazonaws.com/AWS/RDS/ReadIOPS.max
CPU Utilization: Shows maximum CPU utilization across database instance identifiers.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/CPUUtilization.max
Free Storage Space: Shows the amount of free storage space per database instance.
  • amazonaws.com/AWS/RDS/FreeStorageSpace.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Burst Balance: Shows the burst capacity per database instance. Lower burst capacity can affect input/output performance. Use for capacity planning and to optimize database performance.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/BurstBalance.sum
Read/Write Latency: Shows read and write latency per database instance. Use for troubleshooting slow queries, inefficient indexes, or locking issues.
  • amazonaws.com/AWS/RDS/WriteLatency.sum
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/ReadLatency.sum
Transaction Log Disk Usage: Shows the amount of storage consumed by database transaction logs. Use to prevent storage exhaustion.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/RDS/TransactionLogsDiskUsage.max
Checkpoint Lag: Shows checkpoint lag. Use to determine latency between leader and followers in replication.
  • amazonaws.com/AWS/RDS/CheckpointLag.max
  • Dimensions.DBInstanceIdentifier
Swap Usage: Shows swap activity (from RAM to disk) per RDS instance. Use for identifying performance issues related to memory pressure.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/RDS/SwapUsage.max
  • Dimensions.DBInstanceIdentifier
Network Throughput: Shows the rate at which network data is being sent from RDS instances. Use to identify excessive data transfer or increased query latencies.
  • amazonaws.com/AWS/RDS/NetworkTransmitThroughput.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • cloud.region

Honeycomb Features

Refinery Operations

For teams using Refinery to sample their data, the Refinery Board Template provides an overview of sampling operations.
Refinery emits metrics that provide insights into its health, trace throughput, and sampling statistics. Required fields in the Refinery Board Template map to these metrics and populate automatically when sent to Honeycomb. To learn more about these fields, visit Refinery Configuration.
Query Name: Query Description, followed by Required Fields
Stress Relief Status: Shows the current stress level on the Refinery cluster.
  • stress_level
  • stress_relief_activated
  • hostname or host.name
Dropped From Stress: Shows how many traces are being dropped due to stress on the Refinery cluster.
  • dropped_from_stress
  • hostname or host.name
Stress Relief Log: Shows reasons why Refinery is going into stress relief.
  • StressRelief
  • reason
  • msg
  • hostname or host.name
Cache Health: Shows metrics for cache health.
  • collect_cache_buffer_overrun
  • memory_inuse
  • collect_cache_entries_max or collect_cache_entries.max
  • collect_cache_capacity
  • num_goroutines
  • process_uptime_seconds
  • hostname or host.name
Cache Ejections: Shows the number of traces ejected from the cache.
  • trace_send_ejected_full
  • trace_send_ejected_memsize
  • hostname or host.name
Intercommunications: Shows total events from outside Refinery and events redirected from a peer.
  • incoming_router_span
  • peer_router_batch
  • hostname or host.name
Receive Buffers: Shows receive buffer operations.
  • incoming_router_dropped
  • peer_router_dropped
  • hostname or host.name
Peer Send Buffers: Shows metrics for the queue used to buffer spans to send to peer nodes.
  • libhoney_peer_queue_overflow
  • libhoney_peer_send_errors
  • hostname or host.name
Upstream Send Buffers: Shows metrics for the queue used to buffer spans to send to Honeycomb.
  • libhoney_upstream_queue_length
  • libhoney_upstream_enqueue_errors
  • libhoney_upstream_response_errors
  • libhoney_upstream_send_errors
  • libhoney_upstream_send_retries
  • hostname or host.name
EMADynamicSampler Performance: Shows EMADynamicSampler sampling effectiveness.
  • emadynamic_sample_rate_avg
  • emadynamic_keyspace_size
  • emadynamic_num_kept
  • emadynamic_num_dropped
EMAThroughputSampler Performance: Shows EMAThroughputSampler sampling effectiveness.
  • emathroughput_sample_rate_avg
  • emathroughput_keyspace_size
  • emathroughput_num_kept
  • emathroughput_num_dropped
WindowedThroughput Performance: Shows WindowedThroughput sampling effectiveness.
  • windowedthroughput_sample_rate_avg
  • windowedthroughput_keyspace_size
  • windowedthroughput_num_kept
  • windowedthroughput_num_dropped
TotalThroughputSampler Performance: Shows TotalThroughputSampler sampling effectiveness.
  • totalthroughput_sample_rate_avg
  • totalthroughput_keyspace_size
  • totalthroughput_num_kept
  • totalthroughput_num_dropped
DynamicSampler Performance: Shows DynamicSampler sampling effectiveness.
  • dynamic_sample_rate_avg
  • dynamic_keyspace_size
  • dynamic_num_kept
  • dynamic_num_dropped
RulesBasedSampler Performance: Shows RulesBasedSampler sampling effectiveness.
  • rulesbased_sample_rate_avg
  • rulesbased_num_kept
  • rulesbased_num_dropped
Trace Indicators: Shows total traces sent before completion and spans received for a trace that was already sent.
  • trace_sent_cache_hit
  • trace_send_no_root
Sampling Decisions: Shows total traces accepted and sent or dropped.
  • trace_accepted
  • trace_send_dropped
  • trace_send_kept
Refinery Send Event Error Logs: Shows errors that occur when Refinery sends events to its peers or upstream to the Honeycomb API server.
  • msg
  • dataset
  • api_host
  • error
Refinery Handler Event Error Logs: Shows errors when receiving or parsing events being sent to a node.
  • msg
  • dataset
  • api_host
  • error.err
  • error.msg
Refinery Events Exceeding Max Size: Shows errors when events are too large to be sent to Honeycomb.
  • msg
  • dataset
  • api_host
  • error
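The sampler performance queries above pair a `*_num_kept` field with a `*_num_dropped` field. One way to read the two together is as a keep fraction, sketched below in Python. This assumes both values are totals over the same time window. The `keep_fraction` helper and the sample numbers are hypothetical and are not part of Refinery.

```python
from typing import Optional


def keep_fraction(num_kept: int, num_dropped: int) -> Optional[float]:
    """Return the share of sampling decisions that kept the trace.

    Returns None when no decisions were made in the window.
    """
    total = num_kept + num_dropped
    if total == 0:
        return None
    return num_kept / total


# Hypothetical window totals, e.g. dynamic_num_kept and dynamic_num_dropped.
kept, dropped = 50, 950

fraction = keep_fraction(kept, dropped)
if fraction is not None:
    # Keeping 50 of 1000 traces corresponds to roughly 1 trace in 20.
    print(f"kept {fraction:.1%} of traces, about 1 in {round(1 / fraction)}")
```

Comparing this fraction with the corresponding `*_sample_rate_avg` field can be a quick sanity check on whether a sampler behaves as configured.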

Activity Log Security

The Activity Log Security Board Template includes queries that track API Key activity.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
Query Name: Query Description, followed by Required Fields
API Key Added Permissions: Shows when permissions are added to an existing API key.
  • resource.type
  • resource.changed_fields
  • environment.slug
API Key Activities by User: Displays the number of changes to API keys, broken down by user.
  • key_type
  • environment.slug
  • user.email
  • resource.action
Authentication Type by User: Displays which type of authentication is used for each user.
  • authentication_method
  • user.email

Activity Log Leaderboard

The Activity Log Leaderboard Board Template includes queries that highlight advanced and frequent usage of Honeycomb by your team.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
Query Name: Query Description, followed by Required Fields
Queries by User: Shows which users are running queries.
  • resource.type
  • user.email
Complex Queries by User: Shows which users frequently use Visualize, Where, and Having clauses.
  • resource.type
  • SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`))
  • user.email
Top Query Visualizations: Shows the most commonly used visualizations.
  • resource.type
  • SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`))
  • query.visualize
Top Tinkerers: Lists which users perform the most updates to SLOs, Triggers, and Calculated Fields.
  • resource.type
  • user.email
Queries by Dataset: Shows which datasets are being queried the most.
  • resource.type
  • environment.slug
  • dataset.slug
Queries by Environment: Shows a count of queries run, grouped by environment.
  • resource.type
  • environment.slug
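The Complex Queries by User and Top Query Visualizations queries both list the expression ``SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`))`` as a required field. Read literally, it adds 3 points when a Having clause exists, plus one point per comma in the Where and Visualize values. The Python sketch below mirrors that reading. The helper names, the treatment of missing values as zero, and the sample strings are assumptions, and the real format of `query.where` and `query.visualize` may differ.

```python
import re
from typing import Optional


def reg_count(value: Optional[str], pattern: str) -> int:
    """Count non-overlapping regex matches; a missing value counts as 0."""
    if value is None:
        return 0
    return len(re.findall(pattern, value))


def complexity_score(
    having: Optional[str],
    where: Optional[str],
    visualize: Optional[str],
) -> int:
    """Approximate the score computed by the expression above."""
    having_points = 3 if having is not None else 0
    return having_points + reg_count(where, ",") + reg_count(visualize, ",")


# Hypothetical field values for one query-run event.
score = complexity_score(
    having="COUNT > 10",
    where="status = 500,route exists,duration_ms > 100",
    visualize="COUNT,P95(duration_ms)",
)
print(score)
```

Because commas separate clauses, a query with a single Where clause and a single visualization contributes nothing from those two terms under this reading.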

Activity Log Trigger and SLO Activity

The Activity Log Trigger and SLO Activity Board Template includes queries related to trigger and SLO activations and modifications.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
Query Name: Query Description, followed by Required Fields
Trigger State Changes: Shows instances when triggers have fired or resolved.
  • resource.type
  • resource.action
  • name
Trigger Modifications: Shows creations, modifications, and deletions of triggers.
  • resource.type
  • resource.action
Most Updated Triggers: Shows triggers that received the most changes recently.
  • resource.type
  • resource.action
  • name
Top Updated SLOs by Update Type: Shows creations, modifications, and deletions of SLOs and the supporting SLI (Calculated Field).
  • resource.type
  • resource.action
  • environment.slug
  • resource.changed_fields
  • name
  • user.email
SLOs Created and Deleted: Shows creation and deletion of SLOs.
  • resource.type
  • resource.action
  • environment.slug
  • name
  • resource.changed_fields
  • user.email
SLI Expression Changes by SLO: Shows when SLIs (Calculated Fields) related to SLOs have been changed.
  • resource.type
  • resource.action
  • resource.changed_fields
  • environment.slug
  • name
  • sli.expression
  • before.sli.expression
  • user.email

Artificial Intelligence

Anthropic Usage & Cost Monitoring

The Anthropic Usage & Cost Monitoring Board Template provides comprehensive insights into your Anthropic API usage and costs, including token consumption, feature usage, and cost attribution across models, workspaces, and API keys.
This Board Template requires data from the Anthropic Usage & Cost Monitoring integration. The integration uses a custom OpenTelemetry collector to collect usage metrics and cost data from the Anthropic Admin API.
Key visualizations include:
Usage Analytics:
  • Token Usage Over Time: Track input, output, and cache token consumption trends
  • Usage by Model: Compare token usage across different Claude models
  • Workspace Usage Distribution: Monitor usage patterns across different workspaces
  • API Key Activity: Track usage by individual API keys for access control insights
Cost Monitoring:
  • Daily Cost Trends: Monitor spending patterns over time
  • Cost by Model: Understand which models drive the highest costs
  • Workspace Cost Attribution: Allocate costs across different teams or projects
  • Cost per Token Analysis: Calculate cost efficiency metrics
Performance Insights:
  • Cache Hit Rate: Monitor cache utilization to optimize costs
  • Feature Usage: Track web search and other feature utilization
  • Service Tier Distribution: Analyze usage across different API service tiers
The Board Template automatically populates when the Anthropic Usage Receiver sends metrics and logs to Honeycomb. Required fields include model, workspace_id, service_tier, and cost-related attributes like amount_minor_units and description.
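To illustrate the kind of aggregation behind a visualization such as Cost by Model, the Python sketch below sums `amount_minor_units` per `model` and converts the result to major currency units. The record shape, the model names, and the assumption of 100 minor units per major unit are hypothetical; only the field names `model` and `amount_minor_units` come from the description above.

```python
from collections import defaultdict

# Assumption: costs are reported in hundredths of the major currency unit.
MINOR_UNITS_PER_MAJOR = 100


def cost_by_model(records: list[dict]) -> dict[str, float]:
    """Sum amount_minor_units per model and convert to major units."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record["model"]] += record["amount_minor_units"]
    return {
        model: total / MINOR_UNITS_PER_MAJOR
        for model, total in totals.items()
    }


# Hypothetical cost events; the model names are placeholders.
events = [
    {"model": "model-a", "amount_minor_units": 1250},
    {"model": "model-a", "amount_minor_units": 250},
    {"model": "model-b", "amount_minor_units": 500},
]

costs = cost_by_model(events)
print(costs)
```

Summing in minor units and converting once at the end avoids accumulating floating-point rounding error across many small cost events.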

Troubleshooting

To explore common issues when working with Board Templates, visit Common Issues with Visualization: Board Templates.