Use Board Templates

Get instant insights into your system with Board Templates.

Tip

This functionality is available only for teams using Honeycomb’s current data model. If you use Honeycomb Classic, we recommend migrating to Honeycomb Environments, so you can take advantage of its expanded data model and future product updates.

What is a Board Template?

Board Templates are pre-configured Boards that come with ready-made queries and visualizations, providing valuable insights with minimal setup. Use a template as a starting point to create a Board.

Templates are designed for specific use cases and built around industry best practices, ensuring effective configurations for tracking key metrics and visualizing data accurately.

Board Templates At a Glance

Choose from a variety of templates to quickly gain insights across different areas of your system:

General:
- Service Health: Provides insights into service health, including request volumes and where the slowest requests occur.
- MySQL Operations: Provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
- Redis: Provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
- Airflow: Provides an overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.
- Kafka: Provides insight into Kafka brokers, topics, partitions, and consumers.
- Linux Host: Provides useful queries for monitoring Linux hosts. It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.
- Postgres: Provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.
- Spring Boot: Provides insight into application health and performance metrics for your Spring Boot microservices.
- Django: Provides insight into application health and performance metrics for your Django application.
- Rails: Provides queries to help investigate the performance and health of your Rails application.
Frontend Investigation:
- Real User Monitoring (RUM): Displays real user monitoring data for frontend applications, including performance and user experience insights.
- Android Auto-Instrumentation: Displays auto-instrumentation data for Android applications provided by the Honeycomb OpenTelemetry Android SDK.
- iOS Auto-Instrumentation: Displays auto-instrumentation data for iOS applications provided by the Honeycomb OpenTelemetry Swift SDK.
Kubernetes:
- Kubernetes Pod Metrics: Helps you investigate pod performance and resource usage within Kubernetes clusters.
- Kubernetes Node Metrics: Helps you investigate node performance and resource usage within Kubernetes clusters.
- Kubernetes Workload Health: Helps you investigate application problems related to Kubernetes workloads.
OpenTelemetry:
- OpenTelemetry Collector Operations: Provides metrics emitted by the OpenTelemetry Collector during operation.
- OpenTelemetry Java Metrics: Offers insights into Java Virtual Machine (JVM) health and performance via metrics reported by OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.
Amazon Web Services (AWS):
- AWS Lambda Health: Provides information about AWS Lambda function health, including invocations, errors, throttles, and concurrency.
- EC2 Health: Provides information about AWS EC2 instances, status failures, and EBS read/write operations.
- ALB/ELB Health: Provides information about AWS Load Balancers, including load balancer health, status codes, active connections, and requests.
- SQS: Provides insight into critical AWS SQS operations.
- RDS: Provides insights for monitoring and optimizing the performance of AWS RDS databases.
Honeycomb Features:
- Refinery Operations: Shows an overview of sampling operations, including trace throughput and sampling statistics. Automatically populated by Refinery metrics sent to Honeycomb.
- Activity Log Security: Displays queries showing API Key activity.
- Activity Log Leaderboard: Displays queries showing advanced and frequent Honeycomb usage by your team.
- Activity Log Trigger and SLO Activity: Displays queries related to trigger and SLO activations and modifications.

General

Service Health

The Service Health Board Template offers an overview of your services’ health. It provides insights into request volumes, identifies where the slowest requests are occurring, and more.

Tip

The Service Health Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions.

The Service Health Board Template includes the following queries:

Query Name	Query Description	Required Fields
Trace Counts by Service	Shows total trace volume by service.	Parent span ID or `trace.parent_id` Service name or `service.name` or `service_name`
Trace Counts by HTTP Status Code	Shows total trace volume by status code.	Parent span ID or `trace.parent_id` HTTP Status Code or `http.response.status.code` or `http.status_code`
Trace Duration Heatmap	Shows a heatmap of the duration for all traces.	Span duration or `duration_ms` Parent span ID or `trace.parent_id`
Duration Heatmap	Shows a heatmap of duration across all services.	Span duration or `duration_ms`
Duration by Service	Shows key duration percentiles by service.	Span duration or `duration_ms` Service name or `service.name` or `service_name`
Duration by Route	Shows duration by route or endpoint.	Span duration or `duration_ms` Route or `http.route`
Duration by Name	Shows duration by function name.	Span duration or `duration_ms` Name or `name`
Errors by Service	Shows a count of errors grouped by service.	Error or `error` Service name or `service.name` or `service_name`
Errors by Route	Shows a count of errors grouped by route or endpoint.	Error or `error` Route or `http.route`
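As a rough illustration of how the Required Fields column can be read, the following Python sketch checks which Service Health queries a dataset could support, given the column names it contains. The `REQUIRED_FIELDS` mapping and the `supported_queries` helper are hypothetical names for this example and are not part of any Honeycomb API. The sketch only considers the literal column names listed in the table and does not model fields mapped through Dataset Definitions.

```python
# Hypothetical mapping: query name -> list of requirements.
# Each requirement is a tuple of acceptable column names; any one of them satisfies it.
REQUIRED_FIELDS = {
    "Trace Counts by Service": [("trace.parent_id",), ("service.name", "service_name")],
    "Duration Heatmap": [("duration_ms",)],
    "Duration by Route": [("duration_ms",), ("http.route",)],
    "Errors by Service": [("error",), ("service.name", "service_name")],
}


def supported_queries(columns):
    """Return the query names whose every requirement is met by `columns`."""
    available = set(columns)
    return [
        name
        for name, requirements in REQUIRED_FIELDS.items()
        if all(
            any(alternative in available for alternative in alternatives)
            for alternatives in requirements
        )
    ]


if __name__ == "__main__":
    # Example dataset columns (illustrative only).
    dataset_columns = ["duration_ms", "service_name", "trace.parent_id", "name"]
    print(supported_queries(dataset_columns))
```

In this example, the dataset lacks `http.route` and `error`, so only the trace count and duration heatmap queries are reported as supported.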

MySQL Operations

The MySQL Board Template provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.

Tip

This Board Template relies on the MySQL metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

The MySQL Board Template includes the following queries:

Query Name	Query Description	Required Fields
Server Status	Shows server uptime. Use to track server restarts.	`mysql.uptime` `mysql.instance.endpoint`
Buffer Pool Pages	Shows the number of pages in the InnoDB buffer pool by type. Use to understand buffer pool utilization.	`mysql.instance.endpoint` `kind` `mysql.buffer_pool.pages`
Buffer Pool Data Pages	Shows the number of data pages in the InnoDB buffer pool by status (clean or dirty). Use to track page writes to disk.	`mysql.buffer_pool.data_pages` `mysql.instance.endpoint` `status`
Buffer Pool Page Flushes	Shows the rate of page flush requests from the InnoDB buffer pool. Use to help identify input/output pressure.	`mysql.instance.endpoint` `mysql.buffer_pool.page_flushes`
Buffer Pool Operations	Shows buffer pool operations by type. Use to identify patterns in buffer pool usage.	`mysql.instance.endpoint` `operation` `mysql.buffer_pool.operations`
Row and Page Operations	Shows the rate of InnoDB row and page operations. Use to provide insight into database workload and input/output patterns.	`mysql.row_operations` `mysql.page_operations` `mysql.instance.endpoint` `operation`
Doublewrite Rate	Shows the rate of writes to the InnoDB doublewrite buffer. Use to understand database durability.	`kind` `mysql.double_writes` `mysql.instance.endpoint`
Handler Requests and Thread Status	Shows the rate of requests to various handlers and the state of system threads. Provides insight into how the database is processing queries and allows monitoring of connection usage and thread efficiency.	`mysql.handlers` `mysql.threads` `mysql.instance.endpoint` `kind`
Row and Table Locks	Shows InnoDB lock statistics and MySQL table locks. Use to help identify lock contention.	`mysql.row_locks` `mysql.instance.endpoint` `kind` `mysql.locks`
Resource Usage	Shows the rate of opened resources and temporary resources. Use to help identify resource utilization, and the usage of temporary tables or files.	`mysql.tmp_resources` `mysql.instance.endpoint` `resource` `mysql.opened_resources`
Query Rate	Shows query throughput and slow query rates across MySQL instances. Use to pinpoint instances with the highest query load.	`mysql.query.count` `mysql.query.slow.count` `mysql.instance.endpoint`
Thread Count by Type	Shows thread count by type. Use to indicate operations currently being performed by the set of threads executing within the server.	`kind` `mysql.threads` `mysql.instance.endpoint`
Table Open Cache Efficiency	Shows Table Cache Efficiency. Use to monitor filesystem input/output within the instances.	`mysql.table_open_cache` `mysql.instance.endpoint` `status`

Redis

The Redis Board Template provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.

Tip

This Board Template utilizes the Redis receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

Note that the Redis receiver does not automatically publish some key server attributes, like address or port. The visualizations on this Board Template utilize server address to ensure that visualizing across multiple Redis instances is possible.
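One possible way to supply the missing server address is to attach it with the Collector's resource processor. The sketch below builds such a configuration fragment as a Python dictionary and prints it as JSON, which YAML parsers generally accept. The helper name `redis_collector_fragment`, the processor label `resource/redis`, and the example endpoint and address values are assumptions for illustration. Exporters and pipelines are omitted, and the fragment covers a single Redis instance only. Verify the exact options against the Redis receiver and resource processor documentation for your Collector version.

```python
import json


def redis_collector_fragment(endpoint, server_address):
    """Build a Collector config fragment that tags Redis metrics with server.address.

    This is a sketch for a single Redis instance; it is not a complete configuration.
    """
    return {
        "receivers": {
            # Redis receiver scraping one instance at the given endpoint.
            "redis": {"endpoint": endpoint},
        },
        "processors": {
            # The resource processor upserts an attribute on each resource it handles.
            "resource/redis": {
                "attributes": [
                    {
                        "key": "server.address",
                        "value": server_address,
                        "action": "upsert",
                    },
                ]
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical endpoint and address values.
    fragment = redis_collector_fragment("localhost:6379", "redis-primary.internal")
    print(json.dumps(fragment, indent=2))
```

For several Redis instances, each instance would need its own receiver and processor pair wired into a metrics pipeline, so that every series carries a distinct `server.address` value.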

The Redis Board Template includes the following queries:

Query Name	Query Description	Required Fields
Cache Connections	Shows connections received and rejected per server. Use to diagnose connectivity issues.	`redis.connections.received` `redis.connections.rejected` `server.address`
Uptime	Shows the number of seconds since a server start by server.	`server.address` `redis.uptime`
Server Durability	Shows the number of write operations that have happened since the last successful RDB snapshot. Use to track durability issues per server.	`redis.rdb.changes_since_last_save` `server.address`
Key Count	Shows the number of keys per database and per server.	`redis.db.keys` `server.address` `db`
Server CPU Time	Shows the CPU consumed by Redis server since server start.	`server.address` `redis.cpu.time`
Client Activity	Shows Redis client activity per server address and activity between connected and blocked clients.	`redis.clients.connected` `redis.clients.blocked` `server.address` `redis.version`
Command Activity	Shows the number of commands processed per second and the number of commands processed by the server. Use to track operational load of servers.	`redis.commands.processed` `redis.commands` `server.address`
Client I/O	Shows the input/output buffers of Redis clients by server. Use to diagnose or troubleshoot input/output issues with clients.	`redis.clients.max_input_buffer` `redis.clients.max_output_buffer` `server.address`
Network Activity	Shows network input/output by server.	`redis.net.output` `server.address` `redis.net.input`
P99 Command Latency	Shows the `P99` of command latency. Use to identify anomalous commands.	`redis.cmd.latency` `cmd` `server.address` `percentile`
Command Volume and Execution Time	Shows the number of calls for a command and the total time for all executions of a command per server.	`redis.cmd.calls` `redis.cmd.usec` `server.address` `cmd`
Average Command Latency	Shows the average latency of commands by server. Use to understand the baseline latency of a command.	`percentile` `redis.cmd.latency` `server.address` `cmd`
Expired Keys	Shows the total number of key expiration events per server.	`redis.keys.expired` `server.address`
Keyspace Hits and Misses	Shows the number of successful and failed key lookups per server.	`redis.keyspace.hits` `redis.keyspace.misses` `server.address`
Memory Profile	Shows memory metrics per server.	`redis.memory.peak` `redis.memory.fragmentation_ratio` `redis.memory.rss` `redis.memory.lua` `server.address` `redis.memory.used`
Primary Replication	Shows the replication offsets per server.	`redis.replication.offset` `redis.replication.backlog_first_byte_offset` `server.address` `redis.slaves.connected`
Follower Replication	Shows the replication offset for follower instances.	`redis.replication.replica_offset` `server.address` `redis.slaves.connected`

Airflow

The Airflow Board Template gives an overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.

Tip

The required fields in the Airflow Board Template are derived from Airflow’s support for OpenTelemetry logs, metrics, and traces.

View our documentation about instrumenting your Python data pipelines and applications.

The Airflow Board Template includes the following queries:

Query Name	Query Description	Required Fields
DAG Processing Import Errors	Shows the sum of the number of errors from trying to parse DAG files by `host.name`. Parsing errors prevent DAGs from being loaded. Tracking these errors helps identify configuration or syntax issues that need immediate attention.	`airflow.dag_processing.import_errors` `host.name`
DAG Processing Import Errors by File Path	Shows the sum of the number of errors during import and parse of DAG files, broken out by DAG File Path and `host.name`. Tracking these errors helps identify configuration or syntax issues with a given file or host.	`host.name` `import_errors` `file_path`
Duration of Tasks (AVG, P95)	Shows the average and `P95` duration of a Task by DAG ID, task ID, and `host.name`. Execution time helps identify which specific tasks are performance bottlenecks, allowing you to optimize your workflows. Note: Uses trace signal type.	`host.name` `meta.signal_type` `duration_ms` `task_id` `dag_id`
DAG Failed Duration (AVG)	Shows the average duration in milliseconds (ms) taken for a DagRun to reach a failed state by DAG ID and `host.name`. Failed DAG runs consume valuable resources. Monitoring this metric helps to identify inefficient failure patterns.	`dag_id` `host.name` `airflow.dagrun.duration.failed`
DAG Success Duration (AVG)	Shows the average duration in milliseconds (ms) for a DagRun to reach success state by DAG ID and `host.name`. Monitoring duration allows you to optimize resource allocation and set appropriate SLAs.	`airflow.dagrun.duration.success` `dag_id` `host.name`
Task Counts	Shows the count of Tasks grouped by DAG ID, task ID, `host.name`, and `state`. Use the overall workflow health and the proportion of tasks experiencing issues to highlight potential issues with Airflow operations. Note: Uses trace signal type.	`host.name` `state` `dag_id` `task_id`
DAG Schedule Delay	Shows the average duration in milliseconds (ms) of delay between the scheduled DagRun start date and the actual DagRun start date, grouped by DAG ID and `host.name`. Use to identify scheduler bottlenecks, resource constraints, or overloaded Airflow instances that prevent timely workflow execution.	`dag_id` `host.name` `airflow.dagrun.schedule_delay`
Scheduler Tasks	Shows the sum of Airflow Scheduler Tasks that are executing or starving by host ID. Use to understand scheduler load, identify periods when the scheduler might be overwhelmed with too many tasks, and ensure task distribution works as expected.	`host.name` `airflow.scheduler.tasks.executable` `airflow.scheduler.tasks.starving`
Executor Tasks	Shows the maximum count of Executor Tasks (queued, running and open slots), grouped by `host.name`. Note that Queued reflects the number of queued tasks on executor, Running reflects the number of running tasks on executor, and Open Slots reflects the number of open slots on executor.	`executor.open_slots` `host.name` `executor.queued_tasks` `executor.running_tasks`
Pool Task Slots by Host	Shows the maximum count of Airflow Pool Slots - Deferred, Queued, Open, Running, Starving and Scheduled by Host. Can be used to monitor resource allocation, identify when pools are at capacity, and optimize your configuration to match your workflow needs.	`airflow.pool.open_slots` `airflow.pool.running_slots` `airflow.pool.starving_tasks` `host.name` `pool_name` `airflow.pool.queued_slots` `airflow.pool.scheduled_slots` `airflow.pool.deferred_slots`

Kafka

The Kafka Board Template provides insight into Kafka brokers, topics, partitions, and consumers.

Tip

This Board Template relies on the Kafka Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

For relevant Java Virtual Machine (JVM) metrics, the OpenTelemetry Java Agent should be included in Kafka nodes as well.

The Kafka Board Template includes the following queries:

Query Name	Query Description	Required Fields
Number of Active Brokers	Shows the number of active brokers.	`kafka.brokers`
Consumer Group Membership	Shows the number of consumers per broker.	`group` `host.name` `kafka.consumer_group.members`
Consumer Progress Lag vs Offset Rate	Shows the average rate of Kafka consumer group lag and offsets over time, grouped by topic partitions. Use to monitor consumer progress and to detect delays by comparing offset increases to lag.	`host.name` `kafka.consumer_group.lag` `kafka.consumer_group.offset` `topic` `group`
Partition Offset Overview	Shows the rate of change in the oldest and current offsets across Kafka partitions.	`kafka.partition.current_offset` `topic` `host.name` `kafka.partition.oldest_offset`
Partition Count By Topic	Shows the number of partitions for each topic. Use for capacity planning and ensuring proper topic configuration.	`topic` `host.name` `kafka.partition.current_offset` `partition`
Partition Replication Health	Shows the number of in-sync replicas for each partition compared to total replicas. Use to identify under-replicated partitions.	`kafka.partition.replicas_in_sync` `kafka.partition.replicas` `topic` `partition` `host.name`
Consumer Group Lag by Topic	Shows total lag across all partitions for each consumer group and topic combination.	`group` `topic` `kafka.consumer_group.lag_sum`
Partition Balance Analysis	Shows distribution of offsets across partitions for each topic. Use to identify potential partition imbalances.	`kafka.partition.current_offset` `topic` `partition`
High Consumer Lag	Shows high consumer group lag, which may indicate potential consumer issues.	`group` `topic` `host.name` `kafka.consumer_group.lag_sum`
Message Throughput	Shows the approximate message throughput for each topic by measuring the rate of change in offset over time.	`kafka.partition.current_offset` `topic` `host.name`
JVM Thread Count by Cluster and State	Shows the total JVM thread count across Kafka clusters, grouped by thread state. Use to identify thread contention or resource leaks.	`host.name` `jvm.thread.count` `kafka.cluster.alias` `jvm.thread.state` `service.name` `service.instance.id` `jvm.thread.daemon`
JVM Garbage Collection Durations	Shows the median and `P90` JVM garbage collection durations. Use to understand garbage collection efficiency and memory management health.	`jvm.gc.duration.p50` `jvm.gc.duration.p90` `kafka.cluster.alias` `service.name` `jvm.gc.action` `jvm.gc.name` `host.name`
Max Recent JVM CPU Utilization	Shows the highest CPU utilization within the JVM over a default 30-minute window. Use to identify potential load spikes or bottlenecks that may affect your cluster.	`kafka.cluster.alias` `service.name` `host.name` `jvm.cpu.recent_utilization`
JVM Memory Usage and Commitment	Shows memory usage patterns in clusters, providing a view into how memory is used and committed in the JVM. Use to track inefficient memory usage.	`jvm.memory.used` `jvm.memory.committed` `kafka.cluster.alias` `jvm.memory.type` `jvm.memory.pool.name` `host.name`
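Two of the ideas in the table above, approximating message throughput from the rate of change in offsets and spotting under-replicated partitions, can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not how Honeycomb computes these queries. The function names and the sample data are hypothetical, and the dictionary keys are shortened stand-ins for the `kafka.partition.*` fields.

```python
def offset_rates(samples):
    """Return per-interval rates (messages/second) from (timestamp_seconds, offset) samples.

    Samples are expected in time order; intervals with no elapsed time are skipped.
    """
    rates = []
    for (t0, o0), (t1, o1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((o1 - o0) / elapsed)
    return rates


def under_replicated(partitions):
    """Return (topic, partition) pairs with fewer in-sync replicas than total replicas."""
    return [
        (p["topic"], p["partition"])
        for p in partitions
        if p["replicas_in_sync"] < p["replicas"]
    ]


if __name__ == "__main__":
    # Hypothetical current-offset samples taken one minute apart.
    print(offset_rates([(0, 1000), (60, 1600), (120, 2800)]))  # [10.0, 20.0]

    # Hypothetical partition snapshot.
    snapshot = [
        {"topic": "orders", "partition": 0, "replicas": 3, "replicas_in_sync": 3},
        {"topic": "orders", "partition": 1, "replicas": 3, "replicas_in_sync": 2},
    ]
    print(under_replicated(snapshot))  # [('orders', 1)]
```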

Linux Host

The Linux Host Board Template provides useful queries for monitoring Linux hosts. It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.

This Board Template utilizes the Host Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

Tip

This Board Template requires the following scrapers to be configured in the hostmetrics receiver:

CPU
Disk
Load
Filesystem
Memory
Network
Paging
Processes
Process
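The scraper list above can be expressed as a hostmetrics receiver configuration fragment. The sketch below assembles one as a Python dictionary and prints it as JSON, which YAML parsers generally accept. It assumes that the scraper keys are the lowercase forms of the names listed above and that an empty mapping enables a scraper with its default settings. The helper name and the `60s` collection interval are illustrative choices, so check the receiver documentation before using the output.

```python
import json

# Scraper keys, assumed to be the lowercase forms of the names listed above.
SCRAPERS = [
    "cpu",
    "disk",
    "load",
    "filesystem",
    "memory",
    "network",
    "paging",
    "processes",
    "process",
]


def hostmetrics_receiver_config(collection_interval="60s"):
    """Build a receiver fragment enabling every scraper with default settings."""
    return {
        "receivers": {
            "hostmetrics": {
                "collection_interval": collection_interval,
                # An empty mapping is assumed to mean "use the scraper's defaults".
                "scrapers": {name: {} for name in SCRAPERS},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(hostmetrics_receiver_config(), indent=2))
```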

The Linux Host Board Template includes the following queries:

Query Name	Query Description	Required Fields
Process CPU Time Breakdown	Shows the total CPU time consumed by different processes, broken down by process owner and command. Use to identify which processes are consuming the most CPU resources over time.	`process.owner` `process.executable.name` `os.type` `process.cpu.time` `host.name`
Memory Consumption Trends	Shows the average memory usage across host, operating system, and state. Use to monitor and diagnose system memory usage trends.	`state` `os.type` `system.memory.usage` `host.name`
CPU Utilization Trends	Shows the distribution of CPU time spent on user processes, system operations, and idle time. Use to identify which hosts are under load.	`os.type` `system.cpu.time.user` `system.cpu.time.system` `system.cpu.time.idle` `host.name`
Disk I/O	Shows the active Disk input and output based on device. Use to identify high read/write rates.	`system.disk.io.write` `host.name` `device` `os.type` `system.disk.io.read`
Memory Usage by Process	Shows Linux processes by memory usage and virtual memory consumption. Use to troubleshoot resource bottlenecks and optimize memory allocation.	`os.type` `process.memory.usage` `process.memory.virtual` `host.name` `process.command` `process.owner`
Filesystem Usage	Shows filesystem usage across different mount points, devices, and modes. Use for capacity planning and troubleshooting storage issues.	`host.name` `device` `mountpoint` `mode` `os.type` `system.filesystem.usage.used`
Network Metrics	Shows network operations per network interface.	`system.network.io.receive` `system.network.io.transmit` `host.name` `device` `os.type`

Postgres

The Postgres Board Template provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.

The Postgres Board Template includes the following queries:

Query Name	Query Description	Required Fields
Active Connections	Shows the current number of active connections.	`host.name` `postgresql.backends` `postgresql.connection.max`
Database Size	Shows the database size over time. Use to help with capacity planning and identifying unexpected growth patterns.	`postgresql.db_size` `postgresql.database.name` `host.name`
Database and Table Count	Provides visibility into the number of databases and tables, which can help identify database sprawl.	`postgresql.table.count` `postgresql.database.name` `host.name` `postgresql.database.count`
Transaction Throughput	Shows the rate of commits and rollbacks per database, which provides insight into transaction throughput and success rates.	`postgresql.commits` `postgresql.rollbacks` `postgresql.database.name` `host.name`
Block Read Performance	Shows the sources of block reads and their rates. Use to diagnose input/output performance issues.	`postgresql.blocks_read` `source` `postgresql.database.name` `postgresql.table.name` `host.name`
Index Usage	Shows the rate of index scans. Use to identify frequently used indexes.	`postgresql.index.name` `host.name` `postgresql.index.scans` `postgresql.table.name`
Database Operations	Shows database operations. Use to provide insight into workload patterns.	`postgresql.operations` `operation` `postgresql.table.name` `postgresql.database.name` `host.name`
Background Writer Activity	Shows buffer writes by source. Use to identify potential input/output bottlenecks.	`source` `host.name` `postgresql.bgwriter.buffers.writes`
Checkpoint Frequency	Shows the rate of checkpoints by type (requested versus scheduled), which can help identify if checkpoints are occurring too frequently.	`host.name` `postgresql.bgwriter.checkpoint.count` `type`
Checkpoint Duration	Shows time spent on checkpoint operations across databases and tables. Longer checkpoint durations can negatively impact database performance.	`postgresql.bgwriter.duration` `host.name` `type`
Table Size	Shows the top 10 largest tables, which may identify tables that require optimization or partitioning.	`postgresql.table.size` `postgresql.table.name`
Index Size	Shows the top 10 largest indexes, which may identify indexes that need rebuilding or optimization.	`postgresql.database.name` `postgresql.table.name` `host.name` `postgresql.index.size` `postgresql.index.name`
Cache Hit Ratio	Shows the sum of block reads satisfied from the buffer cache. A higher number indicates better performance.	`postgresql.blocks_read` `postgresql.database.name` `postgresql.table.name` `host.name` `source`
Replication WAL Delay	Shows the time between flushing recent WAL and receiving notification that standby servers have completed operations on it. Use to track replication delays.	`host.name` `postgresql.wal.delay` `replication_client`
Replication Data Delay	Shows the amount of data delayed in replication, which can help identify network or performance issues affecting replication.	`postgresql.replication.data_delay` `replication_client` `host.name`
Database Locks by Type	Shows the maximum number of database locks per type. Use for situations where multiple concurrent transactions may cause resource contention.	`host.name` `postgresql.database.locks` `mode` `lock_type`
Postgres Memory Utilization	Shows memory usage and the amount of committed memory for Postgres processes. Use to identify inefficient processes.	`process.memory.usage` `process.memory.virtual` `process.command` `process.executable.name` `host.name`
Postgres CPU Utilization Trends	Shows CPU utilization for PostgreSQL processes. Use to identify inefficient queries, excessive index scanning, and so on.	`process.cpu.time` `process.command` `host.name`
Number of Postgres Operations	Shows the number of PostgreSQL operations per database and table name.	`postgresql.table.name` `operation` `host.name` `postgresql.operations` `postgresql.database.name`

Spring Boot

The Spring Boot Board Template provides insight into application health and performance metrics for Spring Boot microservices.

Tip

Source data for this Board Template is configured using automatic instrumentation provided by the OpenTelemetry Java Agent SDK. View our Java automatic instrumentation instructions to learn more.

The Spring Boot Board Template includes the following queries:

Query Name	Query Description	Required Fields
Database Usage	Shows database performance metrics. Use to help identify slow-performing queries and connection issues.	`db.client.connections.use_time.avg` `db.client.connections.wait_time.avg` `host.name` `telemetry.sdk.language`
API Endpoint Latency	Shows a heatmap of API endpoint response times. Use to highlight bottlenecks or anomalies in performance.	`http.server.request.duration.avg` `http.route` `http.response.status_code` `http.request.method` `host.name` `telemetry.sdk.language`
Garbage Collection Performance Monitor	Shows maximum, average, and `P95` duration of garbage collection metrics. Use to identify memory allocation patterns that cause application slowdowns.	`jvm.gc.duration.avg` `jvm.gc.duration.p95` `jvm.gc.action` `host.name` `telemetry.sdk.language` `jvm.gc.duration.max`
Request Per Minute	Shows requests made per minute. Use to observe the traffic patterns and to detect unexpected load or errors.	`host.name` `telemetry.sdk.language` `http.route` `http.request.method` `http.response.status_code`
Heap used vs Heap Max Limit	Shows JVM memory metrics and compares current memory usage against the maximum heap limit. Use to identify out-of-memory errors.	`jvm.memory.used` `jvm.memory.limit` `host.name` `telemetry.sdk.language`
API Errors	Shows error responses with `status code >= 400`. Use to monitor API health.	`http.route` `host.name` `telemetry.sdk.language` `http.response.status_code`
Response Size Distribution	Shows response payload size. Use to monitor data transfer efficiency and to identify unexpectedly large responses.	`http.request.method` `http.response.status_code` `host.name` `telemetry.sdk.language` `http.response.body.size` `http.route`
JVM CPU Time Rate	Shows CPU consumption rate metrics. Use to identify processing-intensive operations and to detect performance decline over time.	`jvm.cpu.time` `host.name` `telemetry.sdk.language` `meta.signal_type`

Django

The Django Board Template provides insight into application health and performance metrics for a Django application.

Tip

This board utilizes the OpenTelemetry Python API for automatic instrumentation via the OpenTelemetry Python SDK.

View the OpenTelemetry Python API documentation and their Django instrumentation instructions.

The Django Board Template includes the following queries:

Query Name	Query Description	Required Fields
Request Count Per Minute	Shows requests made per minute. Use to observe the traffic patterns and to detect unexpected load or errors.	`telemetry.sdk.language` `http.host` `http.route` `http.method` `http.status_code` `http.server_name`
HTTP Response Duration	Shows the `P95` response duration by route, status code and server name. Highlights Django HTTP performance.	`http.route` `http.method` `http.status_code` `http.server_name` `telemetry.sdk.language` `http.response.body.size` `duration_ms`
HTTP Errors	Shows the count of HTTP errors by route, status code, and `host.name`. Use to assess the success and error rate of APIs.	`http.status_code` `http.server_name` `error` `telemetry.sdk.language` `http.route` `http.method`
Exceptions	Shows exceptions thrown in the service. Use to assess overall health of the application.	`http.server_name` `exception.type` `code.namespace` `exception.message` `exception.stacktrace` `telemetry.sdk.language`
AVG and P95 Request Size	Shows the average and P95 HTTP request size to monitor payload efficiency.	`http.server_name` `telemetry.sdk.language` `http.request.body.size` `http.route` `http.method` `http.status_code`
AVG and P95 Response Size	Shows the average and P95 HTTP response size to monitor payload efficiency.	`telemetry.sdk.language` `http.response.body.size` `http.route` `http.method` `http.status_code` `http.server_name`
P95 and Heatmap of Job Duration	Shows the P95 and Heatmap of Job Duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners.	`http.server_name` `telemetry.sdk.language` `duration_ms` `messaging.destination` `messaging.system`
Jobs Executed	Shows the count of root traces with messaging system and destination. Can be used to assess overall performance of the async job operations.	`http.server_name` `messaging.destination` `messaging.system` `telemetry.sdk.language` `messaging.destination_kind`
DB connection Count Per Min	Shows the connection count per minute where db connection event is “open”. Helps gain visibility into connection pooling efficiency.	`telemetry.sdk.language` `db.operation` `db.system` `db.name` `db.connection.event`

Rails

The Rails Board Template gives you visibility into Rails behavior, performance, and health. The queries and visualizations help identify slow database queries, inefficient code paths, and other performance bottlenecks.

Tip

The required fields in the Rails Board Template are derived from Ruby and Ruby on Rails support for OpenTelemetry logs, metrics, and traces.

View our documentation on instrumenting your Ruby and Ruby on Rails applications.

The Rails Board Template includes the following queries:

Query Name	Query Description	Required Fields
Requests Served	Shows count of requests served by Rails by `host.name`. Use to provide an overview of traffic volume at a glance.	`host.name` `telemetry.sdk.language` `http.route`
HTTP Response Duration	Shows `P95` response duration by route, controller namespace, controller function, status code, and `host.name`. Use for Rails HTTP performance.	`duration_ms` `http.route` `code.namespace` `code.function` `http.status_code` `host.name` `telemetry.sdk.language`
HTTP Duration Heatmap	Shows a heatmap of HTTP response duration by route, status code and `host.name`. Use to assess and investigate outliers.	`http.status_code` `host.name` `telemetry.sdk.language` `duration_ms` `http.route`
HTTP Errors	Shows count of HTTP errors by route, controller namespace, status code, and `host.name`. Use to assess success and error rates of Rails web endpoints.	`error` `http.route` `code.namespace` `http.status_code` `host.name` `telemetry.sdk.language`
DB Statement Duration	Shows a heatmap and the `P95` of database duration per database name, operation, statement and `host.name`. A heatmap provides more information to help identify outlier DB statements.	`duration_ms` `db.name` `db.operation` `db.statement` `telemetry.sdk.language`
P95 and Heatmap of Job Duration	Shows `P95` and a heatmap of Job Duration by messaging destination, messaging system, service name, and `host.name`. Provides insights into status of Rails async job runners, such as ActiveJob and Sidekiq.	`duration_ms` `messaging.destination` `messaging.system` `service.name` `host.name` `telemetry.sdk.language`
Exceptions	Shows exceptions thrown by type, code namespace, and `host.name`. Use to assess overall health of your Rails application.	`code.namespace` `host.name` `telemetry.sdk.language` `error` `exception.message` `exception.type`
Jobs Executed	Shows count of root traces with messaging system and destination. Use to assess overall performance of Rails async job operations.	`telemetry.sdk.language` `host.name` `messaging.system` `messaging.destination`

Frontend Investigation

Real User Monitoring (RUM)

The RUM Board Template provides an overview of real user monitoring data from your frontend applications.

Tip

The RUM Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send Browser Data with Honeycomb Web Instrumentation.

The RUM Board Template includes the following queries:

Query Name	Query Description	Required Fields
Largest Contentful Paint (LCP)	Shows ratings based on the render time for the largest content on a page.	`lcp.rating` `name`
Cumulative Layout Shift (CLS)	Shows ratings based on the stability of content layout on a page.	`cls.rating` `name`
Interaction to Next Paint (INP)	Shows ratings based on the responsiveness of a page.	`inp.rating` `name`
LCP P75	Shows the 75th percentile for LCP.	`name` `lcp.value`
CLS P75	Shows the 75th percentile for CLS.	`cls.value` `name`
INP P75	Shows the 75th percentile for INP.	`inp.value` `name`
Total Events by Type	Shows event types ranked by occurrence.	`name` `meta.annotation_type`
Largest Resource Requests	Shows the largest resource requests ranked by the average length of their response content.	`http.response_content_length` `http.url` `name`
Top 5 Endpoints by Request Count	Shows the top 5 endpoints ranked by number of requests.	`http.method` `name` `http.url`
Slowest Requests by Endpoint	Shows the slowest endpoints based on the 75th percentile of request durations.	`http.url` `duration_ms` `name`
Top Landing Pages by Session Count	Shows the most visited landing pages ranked by session count.	`entry_page.path` `name`
Pages With the Most Events	Shows pages with the highest number of events, highlighting the most active pages.	Route or `http.route`
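
The rating and P75 queries above can be illustrated with a short sketch. It assumes the `*.rating` fields follow the publicly documented Core Web Vitals thresholds (LCP and INP in milliseconds, CLS unitless) and uses a simple nearest-rank percentile. Honeycomb's instrumentation and query engine may compute both differently, and the sample values are hypothetical.

```python
import math
from typing import List

# Assumed thresholds: (upper bound for "good", upper bound for "needs-improvement").
THRESHOLDS = {
    "lcp": (2500, 4000),  # milliseconds
    "cls": (0.1, 0.25),   # unitless layout-shift score
    "inp": (200, 500),    # milliseconds
}


def rate(metric: str, value: float) -> str:
    """Return a rating string for one Web Vitals measurement."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs-improvement"
    return "poor"


def p75(values: List[float]) -> float:
    """Nearest-rank 75th percentile (an illustrative choice, not Honeycomb's algorithm)."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


# Hypothetical LCP samples, in milliseconds, for one page.
lcp_samples = [1200, 3100, 1800, 2400]
print(p75(lcp_samples), rate("lcp", p75(lcp_samples)))
```

Reporting the 75th percentile rather than the average keeps a few very fast page loads from hiding a slow experience for a quarter of users.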

Android Auto-Instrumentation

The Android Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Android SDK auto-instrumentation.

Tip

The Android Auto-Instrumentation Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send Android Data to Honeycomb.

The Android Auto-Instrumentation Board Template includes the following queries:

Query Name	Query Description	Required Fields
Average App Startup Times	Average time the application took to start up. Grouped into cold, warm, and hot startups.	`duration_ms` `name` `start.type`
Total Startup Times Over 1.5s	Number of instances where any startup time surpassed the threshold of 1.5 seconds.	`duration_ms` `name` `start.type`
App’s Memory and Heap Usage	Statistics about the application’s memory and heap usage.	`heap.free` `storage.free`
Average Network Request Time per Screen	Average duration for a screen’s requests to successfully retrieve data.	`duration_ms` `http.request.method` `http.response.status_code` `screen.name`
Screens with the Most Network Requests	Screens that have the most network activity.	`http.request.method` `screen.name`
Top Screens by Total Network Request Failures	Screens with the highest number of failed network requests.	`http.response.status_code` `screen.name`
Screens with Application Not Responding (ANR) Errors	Number of instances where the application is unresponsive for more than 5 seconds.	`exception.stacktrace` `name` `screen.name`
Screens with Slow/Frozen Renders	Screens that take more than 16ms (slow) or more than 700ms (frozen) to render.	`name` `screen.name`
Top App Crashes & Errors	Total number of times the application crashed, excluding ANR events.	`exception.message` `exception.stacktrace` `exception.type` `name`
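
As a rough illustration of the slow and frozen render thresholds described above, the sketch below classifies render durations and counts them per screen. The event shape, the `render_ms` key, and the screen names are hypothetical; only the 16 ms and 700 ms thresholds come from the table.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

SLOW_RENDER_MS = 16     # more than 16 ms counts as slow
FROZEN_RENDER_MS = 700  # more than 700 ms counts as frozen


def classify_render(duration_ms: float) -> str:
    """Label a single render as ok, slow, or frozen."""
    if duration_ms > FROZEN_RENDER_MS:
        return "frozen"
    if duration_ms > SLOW_RENDER_MS:
        return "slow"
    return "ok"


def count_by_screen(events: Iterable[Dict]) -> Counter:
    """Count slow and frozen renders per (screen, label) pair."""
    counts: Counter = Counter()
    for event in events:
        label = classify_render(event["render_ms"])  # hypothetical field name
        if label != "ok":
            key: Tuple[str, str] = (event["screen.name"], label)
            counts[key] += 1
    return counts


# Hypothetical sample events.
events = [
    {"screen.name": "Checkout", "render_ms": 12},
    {"screen.name": "Checkout", "render_ms": 40},
    {"screen.name": "Feed", "render_ms": 950},
    {"screen.name": "Feed", "render_ms": 22},
]
print(count_by_screen(events))
```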

iOS Auto-Instrumentation

The iOS Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Swift SDK auto-instrumentation.

Tip

The iOS Auto-Instrumentation Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send iOS Data to Honeycomb with Swift.

The iOS Auto-Instrumentation Board Template includes the following queries:

Query Name	Query Description	Required Fields
Monthly Active Users	Total number of distinct users that have used the application in the past month.	`device.id`
Weekly Active Users	Total number of distinct users that have used the application in the past week.	`device.id`
Daily Active Users	Total number of distinct users that have used the application in the past day.	`device.id`
Average App Startup Times	Average time the application took to start up. Grouped into cold, warm, and hot startups.	`metrickit.app_launch.app_resume_time_average` `metrickit.app_launch.optimized_time_to_first_draw_average` `metrickit.app_launch.time_to_first_draw_average` `name`
Total Startup Times Over 1.5s	Total number of instances where any startup time surpassed the threshold of 1.5 seconds.	`metrickit.app_launch.app_resume_time_average` `metrickit.app_launch.optimized_time_to_first_draw_average` `metrickit.app_launch.time_to_first_draw_average` `name`
Abnormal App Exit Ratio	Ratio between abnormal application exits (foreground and background) and total application exits.	`DIV(SUM($metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count), SUM($metrickit.app_exit.background.normal_app_exit_count, $metrickit.app_exit.foreground.normal_app_exit_count, $metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count))`
Average App Performance Across All Devices	Average statistics for the resources the application uses, such as CPU time, GPU time, and memory.	`metrickit.cpu.cpu_time` `metrickit.gpu.time` `metrickit.memory.peak_memory_usage` `metrickit.memory.suspended_memory_average` `name`
Average Network Request Time per Screen	Average duration for all the app’s screens to successfully retrieve data.	`duration_ms` `http.request.method` `http.response.status_code` `screen.name`
Screens with the Most Network Requests	Screens that have the most network requests.	`http.request.method` `screen.name`
Top Screens by Total Network Request Failures	Top screens that have failing network requests.	`http.response.status_code` `screen.name`
Long Hanging Screens	Screens that are hanging for more than 0.5 seconds.	`metrickit.app_responsiveness.hang_time_average` `name` `screen.name`
Average Screen Hang Times	Length of time each screen hangs on average.	`metrickit.app_responsiveness.hang_time_average` `name` `screen.name`
Most Used OS Versions	Operating systems used by the most users.	`device.id` `os.version`
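
The Abnormal App Exit Ratio formula in the table above divides abnormal exits by all exits. A minimal sketch of the same arithmetic for a single event follows. Treating missing fields as zero and returning `None` when there are no exits are assumptions made for illustration, not documented Honeycomb behavior.

```python
from typing import Dict, Optional

PREFIX = "metrickit.app_exit."


def abnormal_exit_ratio(event: Dict[str, float]) -> Optional[float]:
    """Mirror DIV(SUM(abnormal counts), SUM(normal counts, abnormal counts))."""
    abnormal = (
        event.get(PREFIX + "background.abnormal_exit_count", 0)
        + event.get(PREFIX + "foreground.abnormal_exit_count", 0)
    )
    normal = (
        event.get(PREFIX + "background.normal_app_exit_count", 0)
        + event.get(PREFIX + "foreground.normal_app_exit_count", 0)
    )
    total = normal + abnormal
    if total == 0:
        return None  # assumption: no exits recorded, so no ratio to report
    return abnormal / total


# Hypothetical event: 3 abnormal exits out of 20 total.
sample = {
    PREFIX + "background.abnormal_exit_count": 1,
    PREFIX + "foreground.abnormal_exit_count": 2,
    PREFIX + "background.normal_app_exit_count": 5,
    PREFIX + "foreground.normal_app_exit_count": 12,
}
print(abnormal_exit_ratio(sample))
```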

Kubernetes

Tip

Use the Kubernetes Quick Start to instrument the required fields for Kubernetes Board Templates.

Kubernetes Pod Metrics

The Kubernetes Pod Metrics Board Template includes queries that help you investigate pod performance and resource usage within Kubernetes clusters:

Query Name	Query Description	Required Fields
Pod CPU Usage	Shows the amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.	`k8s.pod.cpu.utilization` `k8s.pod.name`
Pod Memory Usage	Shows the amount of memory being used by each Kubernetes pod.	`k8s.pod.memory.usage` `k8s.pod.name`
Pod Uptime Smokestacks	Because pod uptime only ever increases, this query uses the smokestack method, which applies LOG10 to the pod uptime metric. Newly started or restarted pods stand out more than pods that have been running a long time, whose lines eventually flatten out.	`LOG10($k8s.pod.uptime)` `k8s.pod.name` `k8s.pod.uptime`
Unhealthy Pods	Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.	`k8s.namespace.name` `k8s.pod.name` `reason`
Pod CPU Utilization vs. Limit	When a CPU Limit is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that limit.	`k8s.pod.cpu_limit_utilization` `k8s.pod.name`
Pod CPU Utilization vs. Request	When a CPU Request is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that request value.	`k8s.pod.cpu_request_utilization` `k8s.pod.name`
Pod Memory Utilization vs. Limit	When a Memory Limit is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that limit value.	`k8s.pod.memory_limit_utilization` `k8s.pod.name`
Pod Memory Utilization vs. Request	When a Memory Request is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that request value.	`k8s.pod.memory_request_utilization` `k8s.pod.name`
Pod Network IO Rates	Displays Network IO RATE_MAX for Transmit and Receive network traffic (in bytes) as a stacked graph, and gives the overall network rate and the individual rate for each node.	`k8s.pod.name` `k8s.pod.network.io.receive` `k8s.pod.network.io.transmit`
Pods With Low Filesystem Availability	Shows any pods where filesystem availability is below 5 GB.	`k8s.pod.filesystem.available` `k8s.pod.name`
Pod Filesystem Usage	Shows the amount of filesystem usage per Kubernetes pod, displayed in a stack graph to show total filesystem usage of all pods.	`k8s.pod.filesystem.usage` `k8s.pod.name`
Pods Per Namespace	Shows the number of pods currently running in each Kubernetes namespace.	`k8s.namespace.name` `k8s.pod.name`
Pods Per Node	Shows the number of pods currently running in each Kubernetes Node.	`k8s.node.name` `k8s.pod.name`
Pod Network Errors	Shows network errors in receive and transmit, grouped by pod.	`k8s.pod.name` `k8s.pod.network.errors.receive` `k8s.pod.network.errors.transmit`
Pods Per Deployment	Shows the number of pods currently deployed in different Kubernetes deployments.	`k8s.deployment.name` `k8s.pod.name`
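
To see why the smokestack method in Pod Uptime Smokestacks makes restarts stand out, consider how `LOG10` compresses large uptimes. The sketch below assumes uptime is reported in seconds; the sample values are hypothetical.

```python
import math


def smokestack(uptime_seconds: float) -> float:
    """Apply LOG10 to an uptime value, as the derived column does."""
    return math.log10(uptime_seconds)


DAY = 86_400

# A freshly restarted pod climbs steeply during its first hour...
first_hour_rise = smokestack(3_600) - smokestack(60)

# ...while a long-running pod barely moves over a whole extra day.
day_30_rise = smokestack(30 * DAY) - smokestack(29 * DAY)

print(f"rise from 1 minute to 1 hour: {first_hour_rise:.3f}")
print(f"rise from day 29 to day 30:  {day_30_rise:.3f}")
```

On a graph, the steep early climb appears as a vertical "stack" for each restart, while stable pods trace a nearly flat line.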

Kubernetes Node Metrics

The Kubernetes Node Metrics Board Template includes queries that help you investigate node performance and resource usage within Kubernetes clusters:

Query Name	Query Description	Required Fields
Node CPU Usage	Shows the amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.	`k8s.node.cpu.utilization` `k8s.node.name`
Node Memory Utilization	Shows percent of memory used on each Kubernetes node.	`IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100))` `k8s.node.memory.available` `k8s.node.memory.usage` `k8s.node.name`
Node Network IO Rates	Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives overall network rate and the individual rate for each node.	`k8s.node.name` `k8s.node.network.io.receive` `k8s.node.network.io.transmit`
Unhealthy Nodes	Shows errors that Kubernetes nodes are experiencing.	`k8s.namespace.name` `k8s.node.name` `reason` `severity_text`
Node Filesystem Utilization	Shows percent of filesystem used on each node.	`IF(EXISTS($k8s.node.filesystem.usage),MUL(DIV($k8s.node.filesystem.usage,$k8s.node.filesystem.capacity), 100))` `k8s.node.filesystem.capacity` `k8s.node.filesystem.usage` `k8s.node.name`
Node Uptime Smokestack	Because node uptime only ever increases, this query uses the smokestack method, which applies LOG10 to the node uptime metric. Newly started or restarted nodes stand out more than nodes that have been running a long time, whose lines eventually flatten out.	`LOG10($k8s.node.uptime)` `k8s.node.name` `k8s.node.uptime`
Node Network Errors	Shows network transmit and receive errors for each node.	`k8s.node.name` `k8s.node.network.errors.receive` `k8s.node.network.errors.transmit`
Pods and Containers per Node	Shows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment.	`k8s.container.name` `k8s.node.name` `k8s.pod.name`
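
The Node Filesystem Utilization formula above only produces a value when the usage field exists, then expresses usage as a percentage of capacity. A minimal sketch of that logic, with a zero-capacity guard added as an assumption and hypothetical sample values:

```python
from typing import Dict, Optional


def filesystem_utilization(event: Dict[str, float]) -> Optional[float]:
    """Mirror IF(EXISTS($usage), MUL(DIV($usage, $capacity), 100))."""
    usage = event.get("k8s.node.filesystem.usage")
    capacity = event.get("k8s.node.filesystem.capacity")
    if usage is None:
        return None  # the IF(EXISTS(...)) branch: no usage field, no value
    if not capacity:
        return None  # assumption: avoid dividing by a missing or zero capacity
    return usage / capacity * 100


# Hypothetical node reporting 25 GiB used of 100 GiB.
GIB = 1024 ** 3
node_event = {
    "k8s.node.name": "node-a",
    "k8s.node.filesystem.usage": 25 * GIB,
    "k8s.node.filesystem.capacity": 100 * GIB,
}
print(filesystem_utilization(node_event))
```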

Kubernetes Workload Health

The Kubernetes Workload Health Board Template includes queries that help you diagnose Kubernetes-related application issues:

Query Name	Query Description	Required Fields
Container Restarts	Shows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero.	`k8s.container.name` `k8s.container.restarts` `k8s.namespace.name` `k8s.pod.name`
Unhealthy Pods	Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.	`k8s.namespace.name` `k8s.pod.name` `reason`
Pending Pods	Shows pods in a “Pending” state.	`k8s.pod.name` `k8s.pod.phase`
Failed Pods	Shows pods in a “Failed” or “Unknown” state.	`k8s.pod.name` `k8s.pod.phase`
Unhealthy Nodes	Shows errors that Kubernetes nodes are experiencing.	`k8s.namespace.name` `k8s.pod.name` `reason` `severity_text`
Unhealthy Volumes	Shows volume creation and attachment failures.	`k8s.namespace.name` `k8s.pod.name` `reason` `severity_text`
Unscheduled Daemonset Pods	Tracks cases where a pod in a daemonset is not currently running on every node in the cluster as it should be.	`SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes)` `k8s.daemonset.current_scheduled_nodes` `k8s.daemonset.desired_scheduled_nodes` `k8s.daemonset.name` `k8s.namespace.name`
Stateful Set Pod Readiness	Tracks stateful sets where pods that should be in a ready state are in a non-ready state.	`SUB($k8s.statefulset.desired_pods,$k8s.statefulset.ready_pods)` `k8s.statefulset.desired_pods` `k8s.statefulset.name` `k8s.statefulset.ready_pods`
Deployment Pod Status	Shows Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”.	`SUB($k8s.deployment.desired,$k8s.deployment.available)` `k8s.deployment.available` `k8s.deployment.desired` `k8s.deployment.name`
Job Failures	Tracks the number of failed pods in Kubernetes jobs.	`k8s.job.failed_pods` `k8s.job.name`
Active Cron Jobs	Tracks the number of active pods in each Kubernetes cron job.	`k8s.cronjob.active_jobs` `k8s.cronjob.name`
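
Several queries above use `SUB` to compare a desired count with an actual count, so any value greater than zero signals a shortfall. The sketch below applies the Deployment Pod Status formula to hypothetical deployments; the deployment names and counts are illustrative only.

```python
from typing import Dict, List


def shortfall(desired: int, actual: int) -> int:
    """Mirror SUB($desired, $actual): how many pods are still missing."""
    return desired - actual


# Hypothetical metric events, one per deployment.
deployments: List[Dict] = [
    {"k8s.deployment.name": "checkout", "k8s.deployment.desired": 3, "k8s.deployment.available": 3},
    {"k8s.deployment.name": "search", "k8s.deployment.desired": 5, "k8s.deployment.available": 2},
]

not_ready = {
    d["k8s.deployment.name"]: shortfall(d["k8s.deployment.desired"], d["k8s.deployment.available"])
    for d in deployments
}

# Keep only deployments whose pods have not fully deployed.
lagging = {name: gap for name, gap in not_ready.items() if gap > 0}
print(lagging)  # {'search': 3}
```

The same subtraction pattern applies to the daemonset and stateful set queries, using their respective desired and current fields.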

OpenTelemetry

OpenTelemetry Collector Operations

The OpenTelemetry Collector Operations Board Template includes queries with key metrics emitted by the OpenTelemetry Collector during its operation:

Query Name	Query Description	Required Fields
Exporter Span Failures	Shows when errors happen during enqueueing or sending in exporters.	`net.host.name` `otelcol_exporter_enqueue_failed_spans` `otelcol_exporter_send_failed_spans`
Collector Uptime Smokestacks	Shows the uptime for different pods with a `LOG10` to make it clearer where restarts are happening.	`LOG10($otelcol_process_uptime)` `net.host.name` `otelcol_process_uptime`
Exporter Metric Send Failures	Shows when errors happen during sending from exporters.	`net.host.name` `otelcol_exporter_enqueue_failed_metric_points` `otelcol_exporter_send_failed_metric_points`
Exporter Metrics Enqueue Failures	Shows when errors happen during enqueueing in exporters.	`net.host.name` `otelcol_exporter_send_failed_metric_points`
Exporter Log Records Failures	Shows when errors happen during enqueueing or sending in exporters.	`net.host.name` `otelcol_exporter_enqueue_failed_log_records`

OpenTelemetry Java Metrics

The OpenTelemetry Java Metrics Board Template includes queries that help you investigate application issues related to the Java Virtual Machine (JVM).

Metrics for Java applications are sourced from the JVM and reported by the OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.

Query Name	Query Description	Required Fields
JVM Memory Usage (Young Generation)	Shows memory usage for Eden space on the JVM heap, which is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents maximum usable memory.	`host.name` `pool` `process.runtime.jvm.memory.committed` `process.runtime.jvm.memory.limit` `process.runtime.jvm.memory.usage` `process.runtime.jvm.memory.usage_after_last_gc` `service.name` `type`
JVM Memory Usage (Old Generation)	Shows memory usage for tenured Gen JVM heap space, which stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory.	`host.name` `pool` `process.runtime.jvm.memory.committed` `process.runtime.jvm.memory.limit` `process.runtime.jvm.memory.usage` `process.runtime.jvm.memory.usage_after_last_gc` `service.name` `type`
JVM Garbage Collection (GC) Activity	Shows JVM garbage collection activity. JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive.	`process.runtime.jvm.gc.duration.count` `action` `gc` `host.name` `process.runtime.jvm.gc.duration.avg` `process.runtime.jvm.gc.duration.max` `service.name`
JVM CPU Utilization	Shows system CPU utilization and 1-minute load average, as captured by the JVM.	`host.name` `process.runtime.jvm.cpu.utilization` `process.runtime.jvm.system.cpu.load_1m` `service.name`
JVM Buffer Memory Usage	Shows usage of buffer memory, which is provided by the OS and is outside the JVM’s heap memory allocation. Buffer memory is used by Java NIO to quickly write data to network or disk.	`host.name` `process.runtime.jvm.buffer.limit` `process.runtime.jvm.buffer.usage` `service.name`
JVM Non-Heap Memory Usage	Shows usage of JVM non-heap memory, which is allocated above and beyond the heap size you’ve configured. JVM non-heap memory is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected.	`host.name` `pool` `process.runtime.jvm.memory.committed` `process.runtime.jvm.memory.limit` `process.runtime.jvm.memory.usage` `service.name` `type`

AWS

AWS Lambda Health

The AWS Lambda Health Board Template includes queries that monitor the health of AWS Lambda functions, including metrics for invocations, errors, throttles, and concurrency:

Query Name	Query Description	Required Fields
Duration & Execution by ID/Version	Tracks the execution time of Lambda functions, identified by their ID or version. Useful for analyzing the performance and efficiency of different versions or instances of a function over time.	`duration_ms` `faas.execution` `faas.name` `faas.version`
Lambda Invocations by Function	Shows the total number of times each Lambda function is invoked. It helps in tracking the frequency of usage of different functions, enabling a clear understanding of which functions are most or least used.	`FunctionName` `MetricName` `Namespace`
Latency by Function/Metric	Shows the response time for each Lambda function, broken down by specific metrics. Useful for identifying functions that may be experiencing performance issues due to high latency.	`FunctionName` `MetricName` `Namespace` `amazonaws.com/AWS/Lambda/Duration.max` `amazonaws.com/AWS/Lambda/PostRuntimeExtensionsDuration.max`
Function Error Count and Rate	Shows two key pieces of information: the total number of errors encountered by each Lambda function and the error rate, calculated as the ratio of errors to total invocations. Useful for pinpointing functions that are failing or experiencing issues.	`FunctionName` `MetricName` `Namespace` `amazonaws.com/AWS/Lambda/Errors.count`
Lambda Throttles	Shows the instances where Lambda invocations are being throttled, such as when the number of function calls exceeds the concurrency limits. Tracking this helps in managing and optimizing the scalability settings for each function.	`FunctionName` `MetricName` `Namespace` `amazonaws.com/AWS/Lambda/Throttles.count`
Function Concurrency	Monitors the simultaneous execution count of each Lambda function, tracking how many instances of a function are running at the same time.	`FunctionName` `MetricName` `Namespace` `amazonaws.com/AWS/Lambda/ConcurrentExecutions.avg` `amazonaws.com/AWS/Lambda/UnreservedConcurrentExecutions.avg`
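
The error rate in Function Error Count and Rate is described above as the ratio of errors to total invocations. A minimal sketch of that calculation, aggregated per function, follows. The tuple layout, function names, and counts are hypothetical, and returning `None` for functions with no invocations is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def error_rates(points: Iterable[Tuple[str, float, float]]) -> Dict[str, Optional[float]]:
    """Sum errors and invocations per function, then divide errors by invocations."""
    errors: Dict[str, float] = defaultdict(float)
    invocations: Dict[str, float] = defaultdict(float)
    for function_name, error_count, invocation_count in points:
        errors[function_name] += error_count
        invocations[function_name] += invocation_count
    return {
        name: (errors[name] / invocations[name] if invocations[name] else None)
        for name in invocations
    }


# Hypothetical data points: (FunctionName, errors, invocations).
sample_points = [
    ("resize-image", 2, 100),
    ("resize-image", 3, 100),
    ("send-email", 0, 50),
    ("idle-fn", 0, 0),
]
rates = error_rates(sample_points)
print(rates)
```

Looking at the rate alongside the raw count helps separate a busy function with a few errors from a rarely invoked function that fails most of the time.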

EC2 Health

The AWS EC2 Board Template includes queries that monitor the health of AWS EC2 instances, including status failures, disk Read and Write operations, and EBS operations.

The AWS EC2 Board Template includes the following queries:

Query Name	Query Description	Required Fields
CPU Utilization	Shows CPU utilization per EC2 instance.	`amazonaws.com/AWS/EC2/CPUUtilization.max` `Dimensions.InstanceId` `cloud.account.id` `cloud.region`
Network I/O	Shows network input and output per EC2 instance.	`cloud.account.id` `cloud.region` `amazonaws.com/AWS/EC2/NetworkIn.max` `amazonaws.com/AWS/EC2/NetworkPacketsOut.max` `Dimensions.InstanceId`
EBS Read Operations	Shows the number of read operations committed by the instance.	`cloud.account.id` `cloud.region` `amazonaws.com/AWS/EC2/EBSReadOps.max` `Dimensions.InstanceId`
EBS Write Operations	Shows the number of write operations committed by the instance.	`amazonaws.com/AWS/EC2/EBSWriteOps.max` `Dimensions.InstanceId` `cloud.account.id` `cloud.region`
EBS IO Balance	Shows available input and output per second that attached EBS volumes are utilizing. Use to monitor potential throttling on an EBS volume attached to an instance.	`amazonaws.com/AWS/EC2/EBSIOBalance%.max` `Dimensions.InstanceId` `cloud.account.id` `cloud.region`
Instance Metadata Service Outliers	Shows the number of instances that are not currently using IMDSv2. Use to identify potential security issues with EC2 instances.	`amazonaws.com/AWS/EC2/MetadataNoToken.max` `Dimensions.InstanceId` `cloud.account.id` `cloud.region`
EC2 Disk Read/Write	Shows Write and Read operations undertaken by EC2 instances. Use to monitor EBS volume usage.	`amazonaws.com/AWS/EC2/EBSWriteBytes.max` `amazonaws.com/AWS/EC2/EBSReadBytes.max` `Dimensions.InstanceId` `Namespace`
EC2 Instance Status Failures	Shows any EC2 instances that have failed a status check in the provided time period.	`cloud.account.id` `cloud.region` `amazonaws.com/AWS/EC2/StatusCheckFailed.max` `Dimensions.InstanceId`

AWS ALB/ELB Health

The AWS ALB/ELB Board Template includes queries that monitor the Load Balancer’s health, status codes, active connections, and requests.

Tip

This Board Template relies on AWS metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this Board Template, you need to provision a metric stream for the EC2 instances that you wish to monitor.

The AWS ALB/ELB Board Template includes the following queries:

Query Name	Query Description	Required Fields
Request Count Per Target	Shows how requests are distributed across targets. Use to diagnose imbalanced traffic in the load balancer.	`cloud.region` `Dimensions.AvailabilityZone` `amazonaws.com/AWS/ApplicationELB/RequestCountPerTarget.count` `Dimensions.LoadBalancer` `Dimensions.TargetGroup` `cloud.account.id`
Healthy vs. Unhealthy Host Count	Shows the number of healthy versus unhealthy hosts per load balancer, which is segmented across target groups and availability zones. Use to quickly spot failing load balancer targets.	`amazonaws.com/AWS/ApplicationELB/HealthyHostCount.max` `amazonaws.com/AWS/ApplicationELB/UnHealthyHostCount.max` `Dimensions.LoadBalancer` `Dimensions.TargetGroup` `cloud.account.id` `Dimensions.AvailabilityZone`
Load Balancer Status Codes	Shows status codes per load balancer. Use to identify routing or traffic management issues.	`cloud.account.id` `cloud.region` `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_3XX_Count.count` `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_4XX_Count.count` `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_5XX_Count.count` `Dimensions.LoadBalancer`
Active Connections	Shows active connections per load balancer.	`amazonaws.com/AWS/ApplicationELB/ActiveConnectionCount.count` `Dimensions.LoadBalancer` `cloud.account.id` `cloud.region`
State Routing	Shows load balancer state routing. Use to identify network configuration errors, unresponsive applications, or health check delays.	`amazonaws.com/AWS/ApplicationELB/UnhealthyStateRouting.max` `Dimensions.LoadBalancer` `Dimensions.TargetGroup` `Dimensions.AvailabilityZone` `cloud.account.id` `cloud.region` `amazonaws.com/AWS/ApplicationELB/HealthyStateRouting.max`
Load Balancer Capacity Units	Shows LCUs consumed during a given period of time. Use to optimize load balancer cost and detect bottlenecks.	`Dimensions.LoadBalancer` `cloud.account.id` `cloud.region` `amazonaws.com/AWS/ApplicationELB/PeakLCUs.max`
Anomalous Host Count	Shows the number of hosts behaving abnormally. Use to detect and diagnose excessive error rates, latency issues, or inconsistent health check results.	`amazonaws.com/AWS/ApplicationELB/AnomalousHostCount.max` `Dimensions.LoadBalancer` `Dimensions.TargetGroup` `cloud.account.id`
DNS Target State	Shows load balancer DNS target state resolution. Use to identify failing targets and DNS misconfigurations.	`amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.max` `amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.count` `amazonaws.com/AWS/ApplicationELB/UnhealthyStateDNS.max` `Dimensions.LoadBalancer` `Dimensions.TargetGroup` `cloud.account.id` `Dimensions.AvailabilityZone`
TLS Negotiation Errors	Shows the number of TLS negotiation errors per load balancer.	`amazonaws.com/AWS/ApplicationELB/ClientTLSNegotiationErrorCount.count` `Dimensions.LoadBalancer` `Dimensions.AvailabilityZone` `cloud.account.id` `cloud.region`
Connection Error Count	Shows errors on targets. Use to diagnose and troubleshoot misconfigured load balancer targets.	`Dimensions.TargetGroup` `amazonaws.com/AWS/ApplicationELB/TargetConnectionErrorCount.max` `Dimensions.LoadBalancer` `cloud.account.id` `cloud.region`

SQS

The SQS Board Template provides insight into critical AWS SQS operations.

The SQS Board Template includes the following queries:

Query Name	Query Description	Required Fields
Request Count Per Minute	Shows requests made per minute. Use to observe the traffic patterns and detect unexpected load or errors.	`telemetry.sdk.language` `http.host` `http.route` `http.method` `http.status_code` `http.server_name`
HTTP Response Duration	Shows the `P95` response duration by route, status code, and server name. Use for Django HTTP performance.	`http.route` `http.method` `http.status_code` `http.server_name` `telemetry.sdk.language` `http.response.body.size` `duration_ms`
HTTP Errors	Shows count of HTTP errors by route, status code, and `host.name`. Use to assess success and error rates of APIs.	`http.status_code` `http.server_name` `error` `telemetry.sdk.language` `http.route` `http.method`
Exceptions	Shows exceptions thrown in the service. Use to assess the overall health of the application.	`http.server_name` `exception.type` `code.namespace` `exception.message` `exception.stacktrace` `telemetry.sdk.language`
AVG and P95 Request Size	Shows the average and `P95` HTTP request size. Use to monitor payload efficiency.	`http.server_name` `telemetry.sdk.language` `http.request.body.size` `http.route` `http.method` `http.status_code`
AVG and P95 Response Size	Shows the average and `P95` HTTP response size. Use to monitor payload efficiency.	`telemetry.sdk.language` `http.response.body.size` `http.route` `http.method` `http.status_code` `http.server_name`
P95 and Heatmap of Job Duration	Shows the `P95` and a heatmap of job duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners.	`http.server_name` `telemetry.sdk.language` `duration_ms` `messaging.destination` `messaging.system`
Jobs Executed	Shows count of root traces with messaging system and destination. Use to assess overall performance of the async job operations.	`http.server_name` `messaging.destination` `messaging.system` `telemetry.sdk.language` `messaging.destination_kind`
DB connection Count Per Min	Shows the connection count per minute where database connection event is “open”. Use to gain visibility into connection pooling efficiency.	`telemetry.sdk.language` `db.operation` `db.system` `db.name` `db.connection.event`

RDS

The RDS Board Template provides insights for monitoring and optimizing the performance of AWS RDS databases.

The RDS Board Template includes the following queries:

Query Name	Query Description	Required Fields
Number of Connections	Shows the number of connections to RDS instances.	`amazonaws.com/AWS/RDS/DatabaseConnections.count` `Dimensions.DBInstanceIdentifier` `cloud.account.id`
Database Load	Shows the level of session activity on RDS instances.	`amazonaws.com/AWS/RDS/DBLoad.max` `Dimensions.DBInstanceIdentifier` `cloud.account.id`
Disk Queue Depth	Shows the number of outstanding input/output requests waiting to access the disk. A high queue depth can indicate that the workload is generating more read/write requests than the underlying storage can handle.	`amazonaws.com/AWS/RDS/DiskQueueDepth.max` `Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/DiskQueueDepth.count`
Freeable Memory	Shows the minimum freeable memory per database instance. Use to identify memory pressure in RDS instances.	`amazonaws.com/AWS/RDS/FreeableMemory.min` `Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/FreeableMemory.count`
Read/Write Operations	Shows the read and write operations per second that the RDS instance is performing. Use to diagnose bottlenecks, optimize workloads, and manage cost.	`Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/WriteIOPS.max` `amazonaws.com/AWS/RDS/ReadIOPS.max`
CPU Utilization	Shows maximum CPU utilization across database instance identifiers.	`Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/CPUUtilization.max`
Free Storage Space	Shows the amount of free storage space per database instance.	`amazonaws.com/AWS/RDS/FreeStorageSpace.max` `Dimensions.DBInstanceIdentifier` `cloud.account.id`
Burst Balance	Shows the burst capacity per database instance. Lower burst capacity can affect input/output performance. Use for capacity planning and to optimize database performance.	`Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/BurstBalance.sum`
Read/Write Latency	Visualizes Read/Write latency per database instance. Use for troubleshooting slow queries, inefficient indexes, or locking issues.	`amazonaws.com/AWS/RDS/WriteLatency.sum` `Dimensions.DBInstanceIdentifier` `cloud.account.id` `amazonaws.com/AWS/RDS/ReadLatency.sum`
Transaction Log Disk Usage	Shows the amount of storage consumed by database transaction logs. Use to prevent storage exhaustion.	`Dimensions.DBInstanceIdentifier` `cloud.account.id` `cloud.region` `amazonaws.com/AWS/RDS/TransactionLogsDiskUsage.max`
Checkpoint Lag	Shows checkpoint lag. Use to determine latency between leader and followers in replication.	`amazonaws.com/AWS/RDS/CheckpointLag.max` `Dimensions.DBInstanceIdentifier`
Swap Usage	Shows swap activity (from RAM to disk) per RDS instance. Use for identifying performance issues related to memory pressure.	`cloud.account.id` `cloud.region` `amazonaws.com/AWS/RDS/SwapUsage.max` `Dimensions.DBInstanceIdentifier`
Network Throughput	Shows the rate at which network data is being sent from RDS instances. Use to identify excessive data transfer or increased query latencies.	`amazonaws.com/AWS/RDS/NetworkTransmitThroughput.max` `Dimensions.DBInstanceIdentifier` `cloud.account.id` `cloud.region`
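
The Required Fields column lists the fields each query expects, so it can be useful to compare them against the columns your dataset actually contains. The following is a minimal sketch, not part of Honeycomb's tooling: `RDS_REQUIRED_FIELDS` and `missing_fields` are hypothetical names, the mapping copies only three rows from the table above, and the list of dataset columns is assumed to have been gathered separately.

```python
# Hypothetical mapping copied from three rows of the RDS table above.
RDS_REQUIRED_FIELDS = {
    "Number of Connections": [
        "amazonaws.com/AWS/RDS/DatabaseConnections.count",
        "Dimensions.DBInstanceIdentifier",
        "cloud.account.id",
    ],
    "CPU Utilization": [
        "Dimensions.DBInstanceIdentifier",
        "cloud.account.id",
        "amazonaws.com/AWS/RDS/CPUUtilization.max",
    ],
    "Checkpoint Lag": [
        "amazonaws.com/AWS/RDS/CheckpointLag.max",
        "Dimensions.DBInstanceIdentifier",
    ],
}


def missing_fields(dataset_columns, required=RDS_REQUIRED_FIELDS):
    """Return {query name: [missing fields]} for queries lacking any field."""
    present = set(dataset_columns)
    report = {}
    for query, fields in required.items():
        missing = [field for field in fields if field not in present]
        if missing:
            report[query] = missing
    return report


if __name__ == "__main__":
    # Assumed example: a dataset that receives connection counts only.
    columns = [
        "amazonaws.com/AWS/RDS/DatabaseConnections.count",
        "Dimensions.DBInstanceIdentifier",
        "cloud.account.id",
    ]
    for query, fields in missing_fields(columns).items():
        print(f"{query}: missing {', '.join(fields)}")
```

Queries that do not appear in the returned report have every listed field present in the supplied column list.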

Honeycomb Features

Refinery Operations

For teams using Refinery to sample their data, the Refinery Board Template provides an overview of sampling operations.

Tip

Refinery emits metrics that provide insights into its health, trace throughput, and sampling statistics. Required fields in the Refinery Board Template map to these metrics and populate automatically when sent to Honeycomb. To learn more about these fields, visit Refinery Configuration.

The Refinery Board Template includes the following queries:

Query Name	Query Description	Required Fields
Stress Relief Status	Shows the current stress level on the Refinery cluster.	`stress_level` `stress_relief_activated` `hostname` or `host.name`
Dropped From Stress	Shows how many traces are being dropped due to stress on the Refinery cluster.	`dropped_from_stress` `hostname` or `host.name`
Stress Relief Log	Shows reasons why Refinery is going into stress relief.	`StressRelief` `reason` `msg` `hostname` or `host.name`
Cache Health	Shows metrics for cache health.	`collect_cache_buffer_overrun` `memory_inuse` `collect_cache_entries_max` or `collect_cache_entries.max` `collect_cache_capacity` `num_goroutines` `process_uptime_seconds` `hostname` or `host.name`
Cache Ejections	Shows number of traces ejected from cache.	`trace_send_ejected_full` `trace_send_ejected_memsize` `hostname` or `host.name`
Intercommunications	Shows total events from outside Refinery and events redirected from a peer.	`incoming_router_span` `peer_router_batch` `hostname` or `host.name`
Receive Buffers	Shows receive buffer operations.	`incoming_router_dropped` `peer_router_dropped` `hostname` or `host.name`
Peer Send Buffers	Shows metrics for the queue used to buffer spans to send to peer nodes.	`libhoney_peer_queue_overflow` `libhoney_peer_send_errors` `hostname` or `host.name`
Upstream Send Buffers	Shows metrics for the queue used to buffer spans to send to Honeycomb.	`libhoney_upstream_queue_length` `libhoney_upstream_enqueue_errors` `libhoney_upstream_response_errors` `libhoney_upstream_send_errors` `libhoney_upstream_send_retries` `hostname` or `host.name`
EMADynamicSampler Performance	Shows EMADynamicSampler sampling effectiveness.	`emadynamic_sample_rate_avg` `emadynamic_keyspace_size` `emadynamic_num_kept` `emadynamic_num_dropped`
EMAThroughputSampler Performance	Shows EMAThroughputSampler sampling effectiveness.	`emathroughput_sample_rate_avg` `emathroughput_keyspace_size` `emathroughput_num_kept` `emathroughput_num_dropped`
WindowedThroughput Performance	Shows WindowedThroughput sampling effectiveness.	`windowedthroughput_sample_rate_avg` `windowedthroughput_keyspace_size` `windowedthroughput_num_kept` `windowedthroughput_num_dropped`
TotalThroughputSampler Performance	Shows TotalThroughputSampler sampling effectiveness.	`totalthroughput_sample_rate_avg` `totalthroughput_keyspace_size` `totalthroughput_num_kept` `totalthroughput_num_dropped`
DynamicSampler Performance	Shows DynamicSampler sampling effectiveness.	`dynamic_sample_rate_avg` `dynamic_keyspace_size` `dynamic_num_kept` `dynamic_num_dropped`
RulesBasedSampler Performance	Shows RulesBasedSampler sampling effectiveness.	`rulesbased_sample_rate_avg` `rulesbased_num_kept` `rulesbased_num_dropped`
Trace Indicators	Shows total traces sent before completion and spans received for a trace already sent.	`trace_sent_cache_hit` `trace_send_no_root`
Sampling Decisions	Shows total traces accepted and sent or dropped.	`trace_accepted` `trace_send_dropped` `trace_send_kept`
Refinery Send Event Error Logs	Shows errors that occur when Refinery sends events to its peers or upstream to the Honeycomb API server.	`msg` `dataset` `api_host` `error`
Refinery Handler Event Error Logs	Shows errors when receiving or parsing events being sent to a node.	`msg` `dataset` `api_host` `error.err` `error.msg`
Refinery Events Exceeding Max Size	Shows errors when events are too large to be sent to Honeycomb.	`msg` `dataset` `api_host` `error`
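
Each sampler row above pairs a `*_num_kept` field with a `*_num_dropped` field. One way to read those two counts together is as the share of traces kept. This is a rough sketch under stated assumptions, not a description of how the Board Template computes anything: `kept_fraction` and `effective_sample_rate` are hypothetical helpers, and both counts are assumed to be totals over the same time window.

```python
def kept_fraction(num_kept, num_dropped):
    """Fraction of traces kept; None when no traces were seen."""
    total = num_kept + num_dropped
    if total == 0:
        return None
    return num_kept / total


def effective_sample_rate(num_kept, num_dropped):
    """Approximate "1 in N" rate implied by the counts; None when nothing was kept."""
    if num_kept == 0:
        return None
    return (num_kept + num_dropped) / num_kept


if __name__ == "__main__":
    # Assumed example values, such as dynamic_num_kept and dynamic_num_dropped.
    kept, dropped = 50, 450
    print(kept_fraction(kept, dropped))          # 0.1
    print(effective_sample_rate(kept, dropped))  # 10.0
```

The derived rate can be compared with the corresponding `*_sample_rate_avg` field, although the two are not guaranteed to match exactly.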

Activity Log Security

Tip

Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Security Board Template includes queries that track API Key activity:

Query Name	Query Description	Required Fields
API Key Added Permissions	Shows when permissions are added to an existing API key.	`resource.type` `resource.changed_fields` `environment.slug`
API Key Activities by User	Displays the number of changes to API keys broken down by user.	`key_type` `environment.slug` `user.email` `resource.action`
Authentication Type by User	Displays which type of authentication is used for each user.	`authentication_method` `user.email`

Activity Log Leaderboard

Tip

Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Leaderboard Board Template includes queries that highlight advanced and frequent usage of Honeycomb by your team:

Query Name	Query Description	Required Fields
Queries by User	Shows how many queries each user runs.	`resource.type` `user.email`
Complex Queries by User	Shows which users frequently use Visualize, Where, and Having clauses.	`resource.type` SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)) `user.email`
Top Query Visualizations	Shows the most commonly used visualizations.	`resource.type` SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)) `query.visualize`
Top Tinkerers	Lists which users perform the most updates to SLOs, Triggers, and Calculated Fields.	`resource.type` `user.email`
Queries by Dataset	Shows which datasets are being queried the most.	`resource.type` `environment.slug` `dataset.slug`
Queries by Environment	Shows the count of queries run, grouped by environment.	`resource.type` `environment.slug`
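
The `SUM(...)` expression listed for Complex Queries by User and Top Query Visualizations adds 3 when `query.having` exists, plus the number of commas in `query.where` and in `query.visualize`. The sketch below mirrors that arithmetic in Python to show how the score behaves. It is illustrative only: `complexity_score` is a hypothetical function, events are assumed to be dictionaries with string values, and the sample values are invented.

```python
import re


def complexity_score(event):
    """Mirror the expression: 3 if having exists, plus comma counts."""
    # IF(EXISTS($query.having), 3, 0)
    having_points = 3 if event.get("query.having") is not None else 0
    # REG_COUNT($query.where, `,`): number of commas in the where value
    where_commas = len(re.findall(",", event.get("query.where") or ""))
    # REG_COUNT($query.visualize, `,`): number of commas in the visualize value
    visualize_commas = len(re.findall(",", event.get("query.visualize") or ""))
    return having_points + where_commas + visualize_commas


if __name__ == "__main__":
    # Invented sample event; real Activity Log values may be formatted differently.
    event = {
        "query.where": "service.name = api,error exists",
        "query.visualize": "COUNT,P95(duration_ms),HEATMAP(duration_ms)",
        "query.having": "COUNT > 10",
    }
    print(complexity_score(event))  # 6
```

Because the score counts commas rather than clauses, a query with a single Where clause and a single visualization scores 0 unless it also has a Having clause.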

Activity Log Trigger and SLO Activity

Tip

Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Trigger and SLO Activity Board Template includes queries related to trigger and SLO activations and modifications:

Query Name	Query Description	Required Fields
Trigger State Changes	Shows instances when triggers have been triggered or resolved.	`resource.type` `resource.action` `name`
Trigger Modifications	Shows creations, modifications, and deletions of triggers.	`resource.type` `resource.action`
Most Updated Triggers	Shows triggers that received the most changes recently.	`resource.type` `resource.action` `name`
Top Updated SLOs by Update Type	Shows creations, modifications, and deletions of SLOs and the supporting SLI (Calculated Field).	`resource.type` `resource.action` `environment.slug` `resource.changed_fields` `name` `user.email`
SLOs Created and Deleted	Shows creation and deletion of SLOs.	`resource.type` `resource.action` `environment.slug` `name` `resource.changed_fields` `user.email`
SLI Expression Changes by SLO	Shows when SLIs (Calculated Fields) related to SLOs have been changed.	`resource.type` `resource.action` `resource.changed_fields` `environment.slug` `name` `sli.expression` `before.sli.expression` `user.email`

Troubleshooting

To explore common issues when working with Board Templates, visit Common Issues with Visualization: Board Templates.
