Use Board Templates

Get instant insights into your system with Board Templates.

Tip
This functionality is available only for teams using Honeycomb’s current data model. If you use Honeycomb Classic, we recommend migrating to Honeycomb Environments, so you can take advantage of its expanded data model and future product updates.

What is a Board Template? 

Board Templates are pre-configured Boards that come with ready-made queries and visualizations, providing valuable insights with minimal setup. Use a template as a starting point to create a Board.

Templates are designed for specific use cases and built around industry best practices, ensuring effective configurations for tracking key metrics and visualizing data accurately.

Board Templates At a Glance 

Choose from a variety of templates to quickly gain insights across different areas of your system:

  • General:
    • Service Health: Provides insights into service health, including request volumes and where slowest requests occur.
    • Real User Monitoring (RUM): Displays real user monitoring data for frontend applications, including performance and user experience insights.
    • MySQL Operations: Provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
    • Redis: Provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
    • Airflow: Provides an overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.
    • Kafka: Provides insight into Kafka brokers, topics, partitions, and consumers.
    • Linux Host: Provides useful queries for monitoring Linux hosts. It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.
    • Postgres: Provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.
    • Spring Boot: Provides insight into application health and performance metrics for your Spring Boot microservices.
    • Django: Provides insight into application health and performance metrics for your Django application.
    • Rails: Provides queries to help investigate the performance and health of your Rails application.
  • Kubernetes:
    • Kubernetes Pod Metrics: Helps you investigate pod performance and resource usage within Kubernetes clusters.
    • Kubernetes Node Metrics: Helps you investigate node performance and resource usage within Kubernetes clusters.
    • Kubernetes Workload Health: Helps you investigate application problems related to Kubernetes workloads.
  • OpenTelemetry:
    • OpenTelemetry Collector Operations: Provides metrics emitted by the OpenTelemetry Collector during operation.
    • OpenTelemetry Java Metrics: Offers insights into Java Virtual Machine (JVM) health and performance via metrics reported by the OpenTelemetry Java Agent or the Honeycomb OpenTelemetry Distribution for Java.
  • Amazon Web Services (AWS):
    • AWS Lambda Health: Provides information about AWS Lambda function health, including invocations, errors, throttles, and concurrency.
    • EC2 Health: Provides information about AWS EC2 instances, status failures, and EBS read/write operations.
    • ALB/ELB Health: Provides information about AWS load balancers, including load balancer health, status codes, active connections, and requests.
    • SQS: Provides insight into critical AWS SQS operations.
    • RDS: Provides insights for monitoring and optimizing the performance of AWS RDS databases.
  • Honeycomb Features:
    • Refinery Operations: Shows an overview of sampling operations, including trace throughput and sampling statistics. Automatically populated by Refinery metrics sent to Honeycomb.
    • Activity Log Security: Displays queries showing API Key activity.
    • Activity Log Leaderboard: Displays queries showing advanced and frequent Honeycomb usage by your team.
    • Activity Log Trigger and SLO Activity: Displays queries related to trigger and SLO activations and modifications.

General 

Service Health 

The Service Health Board Template offers an overview of your services’ health. It provides insights into request volumes, identifies where the slowest requests are occurring, and more.

Tip
The Service Health Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions.

The Service Health Board Template includes the following queries:

Query Name Query Description Required Fields
Trace Counts by Service Shows total trace volume by service.
  • Parent span ID or trace.parent_id
  • Service name or service.name or service_name
Trace Counts by HTTP Status Code Shows total trace volume by status code.
  • Parent span ID or trace.parent_id
  • HTTP Status Code or http.response.status.code or http.status_code
Trace Duration Heatmap Shows a heatmap of the duration for all traces.
  • Span duration or duration_ms
  • Parent span ID or trace.parent_id
Duration Heatmap Shows a heatmap of duration across all services.
  • Span duration or duration_ms
Duration by Service Shows key duration percentiles by service.
  • Span duration or duration_ms
  • Service name or service.name or service_name
Duration by Route Shows duration by route or endpoint.
  • Span duration or duration_ms
  • Route or http.route
Duration by Name Shows duration by function name.
  • Span duration or duration_ms
  • Name or name
Errors by Service Shows a count of errors grouped by service.
  • Error or error
  • Service name or service.name or service_name
Errors by Route Shows a count of errors grouped by route or endpoint.
  • Error or error
  • Route or http.route
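
The Required Fields column can be read as a checklist against a dataset's columns. The sketch below is a hypothetical helper, not part of any Honeycomb SDK or API. It assumes you already have the set of column names for a dataset, covers only three of the queries above, and treats a required field as satisfied when any one of its listed alternative names is present.

```python
# Hypothetical helper: reports which Service Health queries a dataset can feed.
# Each required field is a tuple of acceptable column names from the table above.
REQUIRED_FIELDS = {
    "Trace Counts by Service": [
        ("trace.parent_id",),
        ("service.name", "service_name"),
    ],
    "Duration by Route": [
        ("duration_ms",),
        ("http.route",),
    ],
    "Errors by Service": [
        ("error",),
        ("service.name", "service_name"),
    ],
}


def missing_fields(columns, query_name):
    """Return the required fields (as tuples of alternatives) absent from columns."""
    return [
        alternatives
        for alternatives in REQUIRED_FIELDS[query_name]
        if not any(name in columns for name in alternatives)
    ]


def supported_queries(columns):
    """Return the names of queries whose required fields are all present."""
    return [name for name in REQUIRED_FIELDS if not missing_fields(columns, name)]


# Assumed column names for an example dataset.
dataset_columns = {"service_name", "duration_ms", "http.route", "trace.parent_id"}

print(supported_queries(dataset_columns))
print(missing_fields(dataset_columns, "Errors by Service"))
```

A query reported as unsupported here suggests a field that still needs to be sent or mapped in Dataset Definitions.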

Real User Monitoring (RUM) 

The RUM Board Template provides an overview of real user monitoring data from your frontend applications.

Tip
The RUM Board Template relies on your source data fields being mapped to Honeycomb standard fields. To learn how to map your fields, visit Dataset Definitions. To learn more about instrumenting your frontend application, visit Send Browser Data with Honeycomb Web Instrumentation.

The RUM Board Template includes the following queries:

Query Name Query Description Required Fields
Largest Contentful Paint (LCP) Shows ratings based on the render time for the largest content on a page.
  • lcp.rating
  • name
Cumulative Layout Shift (CLS) Shows ratings based on the stability of content layout on a page.
  • cls.rating
  • name
Interaction to Next Paint (INP) Shows ratings based on the responsiveness of a page.
  • inp.rating
  • name
LCP P75 Shows the 75th percentile for LCP.
  • name
  • lcp.value
CLS P75 Shows the 75th percentile for CLS.
  • cls.value
  • name
INP P75 Shows the 75th percentile for INP.
  • inp.value
  • name
Total Events by Type Shows event types ranked by occurrence.
  • name
  • meta.annotation_type
Largest Resource Requests Shows the largest resource requests ranked by the average length of their response content.
  • http.response_content_length
  • http.url
  • name
Top 5 Endpoints by Request Count Shows the top 5 endpoints ranked by number of requests.
  • http.method
  • name
  • http.url
Slowest Requests by Endpoint Shows the slowest endpoints based on the 75th percentile of request durations.
  • http.url
  • duration_ms
  • name
Top Landing Pages by Session Count Shows the most visited landing pages ranked by session count.
  • entry_page.path
  • name
Pages With the Most Events Shows pages with the highest number of events, highlighting the most active pages.
  • Route or http.route
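
To make the P75 queries concrete, the following sketch computes a 75th percentile over a handful of made-up lcp.value samples. It uses the nearest-rank method purely for illustration; this is an assumption, and Honeycomb's own percentile calculation may use a different estimation method and return slightly different values.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical lcp.value samples in milliseconds.
lcp_values = [1200, 900, 4100, 1800, 2500, 1600, 2200, 3000]

print(percentile_nearest_rank(lcp_values, 75))
```

With these eight samples, three quarters of the page loads have an LCP at or below the printed value.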

MySQL Operations 

The MySQL Board Template provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.

Tip
This Board Template relies on the MySQL metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

The MySQL Board Template includes the following queries:

Query Name Query Description Required Fields
Server Status Shows server uptime. Use to track server restarts.
  • mysql.uptime
  • mysql.instance.endpoint
Buffer Pool Pages Shows the number of pages in the InnoDB buffer pool by type. Use to understand buffer pool utilization.
  • mysql.instance.endpoint
  • kind
  • mysql.buffer_pool.pages
Buffer Pool Data Pages Shows the number of data pages in the InnoDB buffer pool by status (clean or dirty). Use to track page writes to disk.
  • mysql.buffer_pool.data_pages
  • mysql.instance.endpoint
  • status
Buffer Pool Page Flushes Shows the rate of page flush requests from the InnoDB buffer pool. Use to help identify input/output pressure.
  • mysql.instance.endpoint
  • mysql.buffer_pool.page_flushes
Buffer Pool Operations Shows buffer pool operations by type. Use to identify patterns in buffer pool usage.
  • mysql.instance.endpoint
  • operation
  • mysql.buffer_pool.operations
Row and Page Operations Shows the rate of InnoDB row and page operations. Use to provide insight into database workload and input/output patterns.
  • mysql.row_operations
  • mysql.page_operations
  • mysql.instance.endpoint
  • operation
Doublewrite Rate Shows the rate of writes to the InnoDB doublewrite buffer. Use to understand database durability.
  • kind
  • mysql.double_writes
  • mysql.instance.endpoint
Handler Requests and Thread Status Shows the rate of requests to various handlers and the state of system threads. Provides insight into how the database is processing queries and allows monitoring of connection usage and thread efficiency.
  • mysql.handlers
  • mysql.threads
  • mysql.instance.endpoint
  • kind
Row and Table Locks Shows InnoDB lock statistics and MySQL table locks. Use to help identify lock contention.
  • mysql.row_locks
  • mysql.instance.endpoint
  • kind
  • mysql.locks
Resource Usage Shows the rate of opened resources and temporary resources. Use to help identify resource utilization, and the usage of temporary tables or files.
  • mysql.tmp_resources
  • mysql.instance.endpoint
  • resource
  • mysql.opened_resources
Query Rate Shows query throughput and slow query rates across MySQL instances. Use to pinpoint instances with the highest query load.
  • mysql.query.count
  • mysql.query.slow.count
  • mysql.instance.endpoint
Thread Count by Type Shows thread count by type. Use to indicate operations currently being performed by the set of threads executing within the server.
  • kind
  • mysql.threads
  • mysql.instance.endpoint
Table Open Cache Efficiency Shows Table Cache Efficiency. Use to monitor filesystem input/output within the instances.
  • mysql.table_open_cache
  • mysql.instance.endpoint
  • status

Redis 

The Redis Board Template provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.

Tip

This Board Template utilizes the Redis receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

Note that the Redis receiver does not automatically publish some key server attributes, like address or port. The visualizations on this Board Template utilize server address to ensure that visualizing across multiple Redis instances is possible.

The Redis Board Template includes the following queries:

Query Name Query Description Required Fields
Cache Connections Shows connections received and rejected per server. Use to diagnose connectivity issues.
  • redis.connections.received
  • redis.connections.rejected
  • server.address
Uptime Shows the number of seconds since server start, per server.
  • server.address
  • redis.uptime
Server Durability Shows the number of write operations that have happened since the last successful RDB snapshot. Use to track durability issues per server.
  • redis.rdb.changes_since_last_save
  • server.address
Key Count Shows the number of keys per database and per server.
  • redis.db.keys
  • server.address
  • db
Server CPU Time Shows the CPU time consumed by the Redis server since server start.
  • server.address
  • redis.cpu.time
Client Activity Shows Redis client activity per server address and activity between connected and blocked clients.
  • redis.clients.connected
  • redis.clients.blocked
  • server.address
  • redis.version
Command Activity Shows the number of commands processed per second and the number of commands processed by the server. Use to track operational load of servers.
  • redis.commands.processed
  • redis.commands
  • server.address
Client I/O Shows the input/output buffers of Redis clients by server. Use to diagnose or troubleshoot input/output issues with clients.
  • redis.clients.max_input_buffer
  • redis.clients.max_output_buffer
  • server.address
Network Activity Shows network input/output by server.
  • redis.net.output
  • server.address
  • redis.net.input
P99 Command Latency Shows the P99 of command latency. Use to identify anomalous commands.
  • redis.cmd.latency
  • cmd
  • server.address
  • percentile
Command Volume and Execution Time Shows the number of calls for a command and the total time for all executions of a command per server.
  • redis.cmd.calls
  • redis.cmd.usec
  • server.address
  • cmd
Average Command Latency Shows the average latency of commands by server. Use to understand the baseline latency of a command.
  • percentile
  • redis.cmd.latency
  • server.address
  • cmd
Expired Keys Shows the total number of key expiration events per server.
  • redis.keys.expired
  • server.address
Keyspace Hits and Misses Shows the number of successful and failed key lookups per server.
  • redis.keyspace.hits
  • redis.keyspace.misses
  • server.address
Memory Profile Shows memory metrics per server.
  • redis.memory.peak
  • redis.memory.fragmentation_ratio
  • redis.memory.rss
  • redis.memory.lua
  • server.address
  • redis.memory.used
Primary Replication Shows the replication offsets per server.
  • redis.replication.offset
  • redis.replication.backlog_first_byte_offset
  • server.address
  • redis.slaves.connected
Follower Replication Shows the replication offset for follower instances.
  • redis.replication.replica_offset
  • server.address
  • redis.slaves.connected

Airflow 

The Airflow Board Template gives an overview of data workflow performance. Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.

Tip

The required fields in the Airflow Board Template are derived from Airflow’s support for OpenTelemetry logs, metrics, and traces.

View our documentation about instrumenting your Python data pipelines and applications.

The Airflow Board Template includes the following queries:

Query Name Query Description Required Fields
DAG Processing Import Errors Shows the sum of the number of errors from trying to parse DAG files by host.name. Parsing errors prevent DAGs from being loaded. Tracking these errors helps identify configuration or syntax issues that need immediate attention.
  • airflow.dag_processing.import_errors
  • host.name
DAG Processing Import Errors by File Path Shows the sum of the number of errors during import and parse of DAG files, broken out by DAG File Path and host.name. Tracking these errors helps identify configuration or syntax issues with a given file or host.
  • host.name
  • import_errors
  • file_path
Duration of Tasks (AVG, P95) Shows the average and P95 duration of a Task by DAG ID, task ID, and host.name. Execution time helps identify which specific tasks are performance bottlenecks, allowing you to optimize your workflows. Note: Uses trace signal type.
  • host.name
  • meta.signal_type
  • duration_ms
  • task_id
  • dag_id
DAG Failed Duration (AVG) Shows the average duration in milliseconds (ms) taken for a DagRun to reach a failed state by DAG ID and host.name. Failed DAG runs consume valuable resources. Monitoring this metric helps to identify inefficient failure patterns.
  • dag_id
  • host.name
  • airflow.dagrun.duration.failed
DAG Success Duration (AVG) Shows the average duration in milliseconds (ms) for a DagRun to reach success state by DAG ID and host.name. Monitoring duration allows you to optimize resource allocation and set appropriate SLAs.
  • airflow.dagrun.duration.success
  • dag_id
  • host.name
Task Counts Shows the count of Tasks grouped by DAG ID, task ID, host.name, and state. Use the overall workflow health and the proportion of tasks experiencing issues to highlight potential issues with Airflow operations. Note: Uses trace signal type.
  • host.name
  • state
  • dag_id
  • task_id
DAG Schedule Delay Shows the average delay in milliseconds (ms) between the scheduled DagRun start date and the actual DagRun start date, grouped by DAG ID and host.name. Use to identify scheduler bottlenecks, resource constraints, or overloaded Airflow instances that prevent timely workflow execution.
  • dag_id
  • host.name
  • airflow.dagrun.schedule_delay
Scheduler Tasks Shows the sum of Airflow Scheduler Tasks that are executable or starving by host.name. Use to understand scheduler load, identify periods when the scheduler might be overwhelmed with too many tasks, and ensure task distribution works as expected.
  • host.name
  • airflow.scheduler.tasks.executable
  • airflow.scheduler.tasks.starving
Executor Tasks Shows the maximum count of Executor Tasks (queued, running and open slots), grouped by host.name. Note that Queued reflects the number of queued tasks on executor, Running reflects the number of running tasks on executor, and Open Slots reflects the number of open slots on executor.
  • executor.open_slots
  • host.name
  • executor.queued_tasks
  • executor.running_tasks
Pool Task Slots by Host Shows the maximum count of Airflow pool slots (deferred, queued, open, running, starving, and scheduled) by host. Use to monitor resource allocation, identify when pools are at capacity, and optimize your configuration to match your workflow needs.
  • airflow.pool.open_slots
  • airflow.pool.running_slots
  • airflow.pool.starving_tasks
  • host.name
  • pool_name
  • airflow.pool.queued_slots
  • airflow.pool.scheduled_slots
  • airflow.pool.deferred_slots
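
The DAG Schedule Delay query above reports the gap between when a DagRun was scheduled to start and when it actually started. As a rough illustration of that arithmetic, the sketch below computes the delay in milliseconds and averages it per DAG ID. The function names, DAG IDs, and timestamps are invented for the example; Airflow emits the airflow.dagrun.schedule_delay metric itself, so this is not how the Board Template obtains its data.

```python
from collections import defaultdict
from datetime import datetime, timezone


def schedule_delay_ms(scheduled_start, actual_start):
    """Return the delay between scheduled and actual start in milliseconds."""
    return (actual_start - scheduled_start).total_seconds() * 1000


def average_delay_by_dag(runs):
    """Average the delay per dag_id. runs is a list of (dag_id, scheduled, actual)."""
    delays = defaultdict(list)
    for dag_id, scheduled, actual in runs:
        delays[dag_id].append(schedule_delay_ms(scheduled, actual))
    return {dag_id: sum(values) / len(values) for dag_id, values in delays.items()}


def utc(hour, minute, second, microsecond=0):
    """Build a UTC timestamp on an arbitrary example date."""
    return datetime(2024, 1, 1, hour, minute, second, microsecond, tzinfo=timezone.utc)


# Hypothetical DagRuns: (dag_id, scheduled start, actual start).
runs = [
    ("etl_daily", utc(12, 0, 0), utc(12, 0, 2, 500000)),
    ("etl_daily", utc(13, 0, 0), utc(13, 0, 1, 500000)),
    ("reports", utc(12, 0, 0), utc(12, 0, 0, 400000)),
]

print(average_delay_by_dag(runs))
```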

Kafka 

The Kafka Board Template provides insight into Kafka brokers, topics, partitions, and consumers.

Tip

This Board Template relies on the Kafka Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

For relevant Java Virtual Machine (JVM) metrics, the OpenTelemetry Java Agent should be included in Kafka nodes as well.

The Kafka Board Template includes the following queries:

Query Name Query Description Required Fields
Number of Active Brokers Shows the number of active brokers.
  • kafka.brokers
Consumer Group Membership Shows the number of consumers per broker.
  • group
  • host.name
  • kafka.consumer_group.members
Consumer Progress Lag vs Offset Rate Shows the average rate of Kafka consumer group lag and offsets over time, grouped by topic partitions. Use to monitor consumer progress and to detect delays by comparing offset increases to lag.
  • host.name
  • kafka.consumer_group.lag
  • kafka.consumer_group.offset
  • topic
  • group
Partition Offset Overview Shows the rate of change in the oldest and current offsets across Kafka partitions.
  • kafka.partition.current_offset
  • topic
  • host.name
  • kafka.partition.oldest_offset
Partition Count By Topic Shows the number of partitions for each topic. Use for capacity planning and ensuring proper topic configuration.
  • topic
  • host.name
  • kafka.partition.current_offset
  • partition
Partition Replication Health Shows the number of in-sync replicas for each partition compared to total replicas. Use to identify under-replicated partitions.
  • kafka.partition.replicas_in_sync
  • kafka.partition.replicas
  • topic
  • partition
  • host.name
Consumer Group Lag by Topic Shows total lag across all partitions for each consumer group and topic combination.
  • group
  • topic
  • kafka.consumer_group.lag_sum
Partition Balance Analysis Shows distribution of offsets across partitions for each topic. Use to identify potential partition imbalances.
  • kafka.partition.current_offset
  • topic
  • partition
High Consumer Lag Shows high consumer group lag, which may indicate potential consumer issues.
  • group
  • topic
  • host.name
  • kafka.consumer_group.lag_sum
Message Throughput Shows the approximate message throughput for each topic by measuring the rate of change in offset over time.
  • kafka.partition.current_offset
  • topic
  • host.name
JVM Thread Count by Cluster and State Shows the total JVM thread count across Kafka clusters, grouped by thread state. Use to identify thread contention or resource leaks.
  • host.name
  • jvm.thread.count
  • kafka.cluster.alias
  • jvm.thread.state
  • service.name
  • service.instance.id
  • jvm.thread.daemon
JVM Garbage Collection Durations Shows the median (P50) and P90 JVM garbage collection durations. Use to understand garbage collection efficiency and memory management health.
  • jvm.gc.duration.p50
  • jvm.gc.duration.p90
  • kafka.cluster.alias
  • service.name
  • jvm.gc.action
  • jvm.gc.name
  • host.name
Max Recent JVM CPU Utilization Shows the highest CPU utilization within the JVM over a default 30-minute window. Use to identify potential load spikes or bottlenecks that may affect your cluster.
  • kafka.cluster.alias
  • service.name
  • host.name
  • jvm.cpu.recent_utilization
JVM Memory Usage and Commitment Shows memory usage patterns in clusters, providing a view into how memory is used and committed in the JVM. Use to track inefficient memory usage.
  • jvm.memory.used
  • jvm.memory.committed
  • kafka.cluster.alias
  • jvm.memory.type
  • jvm.memory.pool.name
  • host.name
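
The Message Throughput query above approximates throughput from the rate of change in a partition's current offset. The sketch below shows that idea on a few made-up samples of kafka.partition.current_offset. The function name and the sample values are assumptions for illustration, and the Board Template itself computes the rate inside Honeycomb rather than with code like this.

```python
def offset_rates(samples):
    """Return messages per second between consecutive (timestamp_s, offset) samples.

    samples must be sorted by timestamp, with timestamps in seconds.
    """
    rates = []
    for (t0, offset0), (t1, offset1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((offset1 - offset0) / elapsed)
    return rates


# Hypothetical samples for one topic partition: (seconds, current offset).
samples = [(0, 1000), (60, 1600), (120, 2500)]

print(offset_rates(samples))
```

Because offsets only advance when messages are appended, a rising rate points to increased producer traffic on that partition.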

Linux Host 

The Linux Host Board Template provides useful queries for monitoring Linux hosts. It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.

This Board Template utilizes the Host Metrics receiver provided by the OpenTelemetry Collector Contrib distribution. View OpenTelemetry documentation for setup instructions.

Tip

Configuration of the hostmetrics receiver for this Board Template requires specific scrapers to be configured, namely:

  • CPU
  • Disk
  • Load
  • Filesystem
  • Memory
  • Network
  • Paging
  • Processes
  • Process
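
As a rough sketch of what enabling these scrapers might look like, the following snippet builds a receiver fragment as a Python dictionary and prints it as JSON. The lowercase scraper keys are assumed to match the names used by the hostmetrics receiver, and the fragment omits exporters, pipelines, and per-scraper options; check the OpenTelemetry documentation before using any of it in a real Collector configuration.

```python
import json

# Scraper keys assumed to correspond to the list above.
SCRAPERS = [
    "cpu",
    "disk",
    "load",
    "filesystem",
    "memory",
    "network",
    "paging",
    "processes",
    "process",
]


def hostmetrics_receiver_config(scrapers):
    """Build a receivers fragment that enables each scraper with default options."""
    return {
        "receivers": {
            "hostmetrics": {
                "scrapers": {name: {} for name in scrapers},
            }
        }
    }


config = hostmetrics_receiver_config(SCRAPERS)
print(json.dumps(config, indent=2))
```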

The Linux Host Board Template includes the following queries:

Query Name Query Description Required Fields
Process CPU Time Breakdown Shows the total CPU time consumed by different processes, broken down by process owner and command. Use to identify which processes are consuming the most CPU resources over time.
  • process.owner
  • process.executable.name
  • os.type
  • process.cpu.time
  • host.name
Memory Consumption Trends Shows the average memory usage across host, operating system, and state. Use to monitor and diagnose system memory usage trends.
  • state
  • os.type
  • system.memory.usage
  • host.name
CPU Utilization Trends Shows the distribution of CPU time spent on user processes, system operations, and idle time. Use to identify which hosts are under load.
  • os.type
  • system.cpu.time.user
  • system.cpu.time.system
  • system.cpu.time.idle
  • host.name
Disk I/O Shows the active Disk input and output based on device. Use to identify high read/write rates.
  • system.disk.io.write
  • host.name
  • device
  • os.type
  • system.disk.io.read
Memory Usage by Process Shows Linux processes by memory usage and virtual memory consumption. Use to troubleshoot resource bottlenecks and optimize memory allocation.
  • os.type
  • process.memory.usage
  • process.memory.virtual
  • host.name
  • process.command
  • process.owner
Filesystem Usage Shows filesystem usage across different mount points, devices, and modes. Use for capacity planning and troubleshooting storage issues.
  • host.name
  • device
  • mountpoint
  • mode
  • os.type
  • system.filesystem.usage.used
Network Metrics Shows network operations per network interface.
  • system.network.io.receive
  • system.network.io.transmit
  • host.name
  • device
  • os.type
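
The CPU Utilization Trends query above describes how CPU time is split between user processes, system operations, and idle time. The sketch below illustrates that split for a single host and interval. It is a simplified example with invented numbers: it considers only the three states named in the query description, whereas real hosts report additional CPU states.

```python
def cpu_time_shares(user, system, idle):
    """Return each CPU state's share of the total CPU time as a fraction."""
    total = user + system + idle
    if total <= 0:
        raise ValueError("total CPU time must be positive")
    return {
        "user": user / total,
        "system": system / total,
        "idle": idle / total,
    }


# Hypothetical CPU time deltas in seconds for one host over one interval.
shares = cpu_time_shares(user=30.0, system=10.0, idle=60.0)
busy = 1.0 - shares["idle"]

print(shares)
print(f"busy fraction: {busy:.2f}")
```

A host whose idle share stays low across intervals is a candidate for the "under load" investigation the query is meant to support.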

Postgres 

The Postgres Board Template provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.

The Postgres Board Template includes the following queries:

Query Name Query Description Required Fields
Active Connections Shows the current number of active connections.
  • host.name
  • postgresql.backends
  • postgresql.connection.max
Database Size Shows the database size over time. Use to help with capacity planning and identifying unexpected growth patterns.
  • postgresql.db_size
  • postgresql.database.name
  • host.name
Database and Table Count Shows the number of databases and tables, which can help identify database sprawl.
  • postgresql.table.count
  • postgresql.database.name
  • host.name
  • postgresql.database.count
Transaction Throughput Shows the rate of commits and rollbacks per database, which provides insight into transaction throughput and success rates.
  • postgresql.commits
  • postgresql.rollbacks
  • postgresql.database.name
  • host.name
Block Read Performance Shows the sources of block reads and their rates. Use to diagnose input/output performance issues.
  • postgresql.blocks_read
  • source
  • postgresql.database.name
  • postgresql.table.name
  • host.name
Index Usage Shows the rate of index scans. Use to identify frequently used indexes.
  • postgresql.index.name
  • host.name
  • postgresql.index.scans
  • postgresql.table.name
Database Operations Shows database operations. Use to provide insight into workload patterns.
  • postgresql.operations
  • operation
  • postgresql.table.name
  • postgresql.database.name
  • host.name
Background Writer Activity Shows buffer writes by source. Use to identify potential input/output bottlenecks.
  • source
  • host.name
  • postgresql.bgwriter.buffers.writes
Checkpoint Frequency Shows the rate of checkpoints by type (requested versus scheduled), which can help identify if checkpoints are occurring too frequently.
  • host.name
  • postgresql.bgwriter.checkpoint.count
  • type
Checkpoint Duration Shows time spent on checkpoint operations across databases and tables. Longer checkpoint durations can negatively impact database performance.
  • postgresql.bgwriter.duration
  • host.name
  • type
Table Size Shows the top 10 largest tables, which may identify tables that require optimization or partitioning.
  • postgresql.table.size
  • postgresql.table.name
Index Size Shows the top 10 largest indexes, which may identify indexes that need rebuilding or optimization.
  • postgresql.database.name
  • postgresql.table.name
  • host.name
  • postgresql.index.size
  • postgresql.index.name
Cache Hit Ratio Shows the sum of block reads satisfied from the buffer cache. A higher number indicates better performance.
  • postgresql.blocks_read
  • postgresql.database.name
  • postgresql.table.name
  • host.name
  • source
Replication WAL Delay Shows the time between flushing recent WAL and receiving notification that standby servers have completed operations on it. Use to track replication delays.
  • host.name
  • postgresql.wal.delay
  • replication_client
Replication Data Delay Shows the amount of data delayed in replication, which can help identify network or performance issues affecting replication.
  • postgresql.replication.data_delay
  • replication_client
  • host.name
Database Locks by Type Shows the maximum number of database locks per type. Use for situations where multiple concurrent transactions may cause resource contention.
  • host.name
  • postgresql.database.locks
  • mode
  • lock_type
Postgres Memory Utilization Shows memory usage and the amount of committed memory for Postgres processes. Use to identify inefficient processes.
  • process.memory.usage
  • process.memory.virtual
  • process.command
  • process.executable.name
  • host.name
Postgres CPU Utilization Trends Shows CPU utilization for PostgreSQL processes. Use to identify inefficient queries, excessive index scanning, and so on.
  • process.cpu.time
  • process.command
  • host.name
Number of Postgres Operations Shows the number of PostgreSQL operations per database and table name.
  • postgresql.table.name
  • operation
  • host.name
  • postgresql.operations
  • postgresql.database.name
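
The Cache Hit Ratio query above sums the block reads that were served from the buffer cache, using the source attribute to tell cache hits from disk reads. The sketch below illustrates the idea on made-up rows. It assumes that source values ending in "_hit" (for example heap_hit or idx_hit) mark cache hits and that values ending in "_read" mark reads from disk; confirm the actual attribute values in your own data. The ratio calculation is an extra illustration and is not described by the query itself.

```python
def cache_hits(rows):
    """Sum postgresql.blocks_read for rows whose source marks a buffer cache hit."""
    return sum(
        row["postgresql.blocks_read"]
        for row in rows
        if row["source"].endswith("_hit")  # assumed naming convention
    )


def cache_hit_ratio(rows):
    """Return cache hits divided by all block reads, or None if there are no reads."""
    total = sum(row["postgresql.blocks_read"] for row in rows)
    if total == 0:
        return None
    return cache_hits(rows) / total


# Hypothetical rows for one table; values are block counts.
rows = [
    {"source": "heap_hit", "postgresql.blocks_read": 900},
    {"source": "heap_read", "postgresql.blocks_read": 100},
    {"source": "idx_hit", "postgresql.blocks_read": 450},
    {"source": "idx_read", "postgresql.blocks_read": 50},
]

print(cache_hits(rows))
print(cache_hit_ratio(rows))
```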

Spring Boot 

The Spring Boot Board Template provides insight into application health and performance metrics for Spring Boot microservices.

Tip
Source data for this Board Template is configured using automatic instrumentation provided by the OpenTelemetry Java Agent SDK. View our Java automatic instrumentation instructions to learn more.

The Spring Boot Board Template includes the following queries:

Query Name Query Description Required Fields
Database Usage Shows database performance metrics. Use to help identify slow-performing queries and connection issues.
  • db.client.connections.use_time.avg
  • db.client.connections.wait_time.avg
  • host.name
  • telemetry.sdk.language
API Endpoint Latency Shows a heatmap of API endpoint response times. Use to highlight bottlenecks or anomalies in performance.
  • http.server.request.duration.avg
  • http.route
  • http.response.status_code
  • http.request.method
  • host.name
  • telemetry.sdk.language
Garbage Collection Performance Monitor Shows maximum, average, and P95 duration of garbage collection metrics. Use to identify memory allocation patterns that cause application slowdown.
  • jvm.gc.duration.avg
  • jvm.gc.duration.p95
  • jvm.gc.action
  • host.name
  • telemetry.sdk.language
  • jvm.gc.duration.max
Request Per Minute Shows requests made per minute. Use to observe the traffic patterns and to detect unexpected load or errors.
  • host.name
  • telemetry.sdk.language
  • http.route
  • http.request.method
  • http.response.status_code
Heap used vs Heap Max Limit Shows JVM memory metrics and compares current memory usage against the maximum heap limit. Use to identify out-of-memory errors.
  • jvm.memory.used
  • jvm.memory.limit
  • host.name
  • telemetry.sdk.language
API Errors Shows error responses with status code >= 400. Use to monitor API health.
  • http.route
  • host.name
  • telemetry.sdk.language
  • http.response.status_code
Response Size Distribution Shows response payload size. Use to monitor data transfer efficiency and to identify any unexpectedly large responses.
  • http.request.method
  • http.response.status_code
  • host.name
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
JVM CPU Time Rate Shows CPU consumption rate metrics. Use to identify processing-intensive operations and to detect performance decline over time.
  • jvm.cpu.time
  • host.name
  • telemetry.sdk.language
  • meta.signal_type
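
The API Errors query above counts responses with a status code of 400 or higher. As a minimal sketch of that filter, the snippet below counts such responses per route over a few invented events. The function name and the events are hypothetical, and the Board Template performs this filtering and grouping inside Honeycomb rather than in application code.

```python
from collections import Counter


def error_counts_by_route(events):
    """Count events per http.route where the status code is 400 or higher."""
    counts = Counter()
    for event in events:
        status = event.get("http.response.status_code")
        if status is not None and status >= 400:
            counts[event.get("http.route", "unknown")] += 1
    return counts


# Hypothetical events using the field names from the table above.
events = [
    {"http.route": "/orders", "http.response.status_code": 200},
    {"http.route": "/orders", "http.response.status_code": 500},
    {"http.route": "/orders/{id}", "http.response.status_code": 404},
    {"http.route": "/orders", "http.response.status_code": 503},
]

print(error_counts_by_route(events).most_common())
```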

Django 

The Django Board Template provides insight into application health and performance metrics for a Django application.

Tip

This board utilizes the OpenTelemetry Python API for automatic instrumentation via the OpenTelemetry Python SDK.

View the OpenTelemetry Python API documentation and their Django instrumentation instructions.

The Django Board Template includes the following queries:

Query Name Query Description Required Fields
Request Count Per Minute Shows requests made per minute. Use to observe the traffic patterns and to detect unexpected load or errors.
  • telemetry.sdk.language
  • http.host
  • http.route
  • http.method
  • http.status_code
  • http.server_name
HTTP Response Duration Shows the P95 response duration by route, status code and server name. Highlights Django HTTP performance.
  • http.route
  • http.method
  • http.status_code
  • http.server_name
  • telemetry.sdk.language
  • http.response.body.size
  • duration_ms
HTTP Errors Shows the count of HTTP errors by route, status code, and host.name. Use to assess the success and error rate of APIs.
  • http.status_code
  • http.server_name
  • error
  • telemetry.sdk.language
  • http.route
  • http.method
Exceptions Shows exceptions thrown in the service. Use to assess overall health of the application.
  • http.server_name
  • exception.type
  • code.namespace
  • exception.message
  • exception.stacktrace
  • telemetry.sdk.language
AVG and P95 Request Size Shows the average and P95 HTTP request size to monitor payload efficiency.
  • http.server_name
  • telemetry.sdk.language
  • http.request.body.size
  • http.route
  • http.method
  • http.status_code
AVG and P95 Response Size Shows the average and P95 HTTP response size to monitor payload efficiency.
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
  • http.method
  • http.status_code
  • http.server_name
P95 and Heatmap of Job Duration Shows the P95 and a heatmap of Job Duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners.
  • http.server_name
  • telemetry.sdk.language
  • duration_ms
  • messaging.destination
  • messaging.system
Jobs Executed Shows the count of root traces with messaging system and destination. Can be used to assess overall performance of the async job operations.
  • http.server_name
  • messaging.destination
  • messaging.system
  • telemetry.sdk.language
  • messaging.destination_kind
DB connection Count Per Min Shows the connection count per minute where db connection event is “open”. Helps gain visibility into connection pooling efficiency.
  • telemetry.sdk.language
  • db.operation
  • db.system
  • db.name
  • db.connection.event
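
To make the P95 aggregation used by HTTP Response Duration more concrete, the following Python sketch groups hypothetical events by `http.route` and computes a nearest-rank 95th percentile of `duration_ms`. The event values and the helper names `p95` and `p95_by_route` are assumptions for illustration. Honeycomb computes P95 inside its query engine, and its percentile method may differ from the nearest-rank approach shown here.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("p95 requires at least one value")
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_by_route(events):
    """Group events by http.route and compute the P95 of duration_ms per route."""
    durations = defaultdict(list)
    for event in events:
        durations[event["http.route"]].append(event["duration_ms"])
    return {route: p95(values) for route, values in durations.items()}


# Hypothetical events that use field names from the table above.
events = [{"http.route": "/orders", "duration_ms": d} for d in range(1, 101)]
events += [{"http.route": "/health", "duration_ms": d} for d in (2, 3, 4, 250)]

result = p95_by_route(events)
print(result)
```

With only four samples, the `/health` route's P95 is its single slow request, which is one reason to read P95 alongside request counts on low-traffic routes.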

Rails 

The Rails Board Template gives you visibility into Rails behavior, performance, and health. The queries and visualizations help identify slow database queries, inefficient code paths, and other performance bottlenecks.

Tip

The required fields in the Rails Board Template are derived from Ruby and Ruby on Rails support for OpenTelemetry logs, metrics, and traces.

View our documentation on instrumenting your Ruby and Ruby on Rails applications.

The Rails Board Template includes the following queries:

Query Name Query Description Required Fields
Requests Served Shows count of requests served by Rails by host.name. Use to provide an overview of traffic volume at a glance.
  • host.name
  • telemetry.sdk.language
  • http.route
HTTP Response Duration Shows P95 response duration by route, controller namespace, controller function, status code, and host.name. Use for Rails HTTP performance.
  • duration_ms
  • http.route
  • code.namespace
  • code.function
  • http.status_code
  • host.name
  • telemetry.sdk.language
HTTP Duration Heatmap Shows a heatmap of HTTP response duration by route, status code and host.name. Use to assess and investigate outliers.
  • http.status_code
  • host.name
  • telemetry.sdk.language
  • duration_ms
  • http.route
HTTP Errors Shows count of HTTP errors by route, Controller namespace, status code, and host.name. Use to assess success and error rate of Rails web endpoints.
  • error
  • http.route
  • code.namespace
  • http.status_code
  • host.name
  • telemetry.sdk.language
DB Statement Duration Shows a heatmap and the P95 of database duration per database name, operation, statement and host.name. A heatmap provides more information to help identify outlier DB statements.
  • duration_ms
  • db.name
  • db.operation
  • db.statement
  • telemetry.sdk.language
P95 and Heatmap of Job Duration Shows P95 and a heatmap of Job Duration by messaging destination, messaging system, service name, and host.name. Provides insights into status of Rails async job runners, such as ActiveJob and Sidekiq.
  • duration_ms
  • messaging.destination
  • messaging.system
  • service.name
  • host.name
  • telemetry.sdk.language
Exceptions Shows exceptions thrown by type, code namespace, and host.name. Use to assess overall health of your Rails application.
  • code.namespace
  • host.name
  • telemetry.sdk.language
  • error
  • exception.message
  • exception.type
Jobs Executed Shows count of root traces with messaging system and destination. Use to assess overall performance of Rails async job operations.
  • telemetry.sdk.language
  • host.name
  • messaging.system
  • messaging.destination

Kubernetes 

Tip
Use the Kubernetes Quick Start to instrument the required fields for Kubernetes Board Templates.

Kubernetes Pod Metrics 

The Kubernetes Pod Metrics Board Template includes queries that help you investigate pod performance and resource usage within Kubernetes clusters:

Query Name Query Description Required Fields
Pod CPU Usage Shows the amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.
  • k8s.pod.cpu.utilization
  • k8s.pod.name
Pod Memory Usage Shows the amount of memory being used by each Kubernetes pod.
  • k8s.pod.memory.usage
  • k8s.pod.name
Pod Uptime Smokestacks Shows pod uptime using the smokestack method, which applies LOG10 to the Pod Uptime metric. Because uptime only ever increases, newly started or restarted pods stand out more clearly than long-running pods, which eventually flatten into a straight line.
  • LOG10($k8s.pod.uptime)
  • k8s.pod.name
  • k8s.pod.uptime
Unhealthy Pods Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
Pod CPU Utilization vs. Limit When a CPU Limit is present in a pod configuration, this query shows each pod's CPU usage as a percentage of that limit.
  • k8s.pod.cpu_limit_utilization
  • k8s.pod.name
Pod CPU Utilization vs. Request When a CPU Request is present in a pod configuration, this query shows each pod's CPU usage as a percentage of that request value.
  • k8s.pod.cpu_request_utilization
  • k8s.pod.name
Pod Memory Utilization vs. Limit When a Memory Limit is present in a pod configuration, this query shows each pod's memory usage as a percentage of that limit value.
  • k8s.pod.memory_limit_utilization
  • k8s.pod.name
Pod Memory Utilization vs. Request When a Memory Request is present in a pod configuration, this query shows each pod's memory usage as a percentage of that request value.
  • k8s.pod.memory_request_utilization
  • k8s.pod.name
Pod Network IO Rates Displays Network IO RATE_MAX for Transmit and Receive network traffic (in bytes) as a stacked graph, and gives the overall network rate and the individual rate for each pod.
  • k8s.pod.name
  • k8s.pod.network.io.receive
  • k8s.pod.network.io.transmit
Pods With Low Filesystem Availability Shows any pods where filesystem availability is below 5 GB.
  • k8s.pod.filesystem.available
  • k8s.pod.name
Pod Filesystem Usage Shows the amount of filesystem usage per Kubernetes pod, displayed in a stacked graph to show total filesystem usage of all pods.
  • k8s.pod.filesystem.usage
  • k8s.pod.name
Pods Per Namespace Shows the number of pods currently running in each Kubernetes namespace.
  • k8s.namespace.name
  • k8s.pod.name
Pods Per Node Shows the number of pods currently running in each Kubernetes Node.
  • k8s.node.name
  • k8s.pod.name
Pod Network Errors Shows network errors in receive and transmit, grouped by pod.
  • k8s.pod.name
  • k8s.pod.network.errors.receive
  • k8s.pod.network.errors.transmit
Pods Per Deployment Shows the number of pods currently deployed in different Kubernetes deployments.
  • k8s.deployment.name
  • k8s.pod.name
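
The smokestack method used by Pod Uptime Smokestacks can be sketched in a few lines of Python. The example applies a base-10 logarithm to hypothetical uptime values in seconds, mirroring the `LOG10($k8s.pod.uptime)` field. The uptime values, the function name `smokestack`, and the choice to return `None` for non-positive input are assumptions made for this sketch.

```python
import math


def smokestack(uptime_seconds):
    """Mirror LOG10($k8s.pod.uptime): place uptime on a log scale."""
    if uptime_seconds <= 0:
        return None  # sketch's choice: the logarithm is undefined here
    return math.log10(uptime_seconds)


# Hypothetical uptimes: a pod restarted one minute ago, and pods
# that have been running for 10 and 11 days.
fresh = smokestack(60)
ten_days = smokestack(10 * 86_400)
eleven_days = smokestack(11 * 86_400)

print(f"fresh pod:        {fresh:.3f}")
print(f"10-day-old pod:   {ten_days:.3f}")
print(f"11-day-old pod:   {eleven_days:.3f}")
```

An extra day of uptime barely moves the value for the long-running pod, while a restart drops the line by several units. That contrast is what makes restarts easy to spot on the graph.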

Kubernetes Node Metrics 

The Kubernetes Node Metrics Board Template includes queries that help you investigate node performance and resource usage within Kubernetes clusters:

Query Name Query Description Required Fields
Node CPU Usage Shows the amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors.
  • k8s.node.cpu.utilization
  • k8s.node.name
Node Memory Utilization Shows percent of memory used on each Kubernetes node.
  • IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100))
  • k8s.node.memory.available
  • k8s.node.memory.usage
  • k8s.node.name
Node Network IO Rates Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives overall network rate and the individual rate for each node.
  • k8s.node.name
  • k8s.node.network.io.receive
  • k8s.node.network.io.transmit
Unhealthy Nodes Shows errors that Kubernetes nodes are experiencing.
  • k8s.namespace.name
  • k8s.node.name
  • reason
  • severity_text
Node Filesystem Utilization Shows percent of filesystem used on each node.
  • IF(EXISTS($k8s.node.filesystem.usage),MUL(DIV($k8s.node.filesystem.usage,$k8s.node.filesystem.capacity), 100))
  • k8s.node.filesystem.capacity
  • k8s.node.filesystem.usage
  • k8s.node.name
Node Uptime Smokestack Shows node uptime using the smokestack method, which applies LOG10 to the Node Uptime metric. Because uptime only ever increases, newly started or restarted nodes stand out more clearly than long-running nodes, which eventually flatten into a straight line.
  • LOG10($k8s.node.uptime)
  • k8s.node.name
  • k8s.node.uptime
Node Network Errors Shows network transmit and receive errors for each node.
  • k8s.node.name
  • k8s.node.network.errors.receive
  • k8s.node.network.errors.transmit
Pods and Containers per Node Shows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment.
  • k8s.container.name
  • k8s.node.name
  • k8s.pod.name
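
The Node Filesystem Utilization field is a derived expression of the form `IF(EXISTS($usage), MUL(DIV($usage, $capacity), 100))`. The following Python sketch shows the same arithmetic on hypothetical events. The function name `filesystem_utilization`, the sample values, and the handling of a missing or zero capacity are assumptions for illustration and are not taken from Honeycomb's implementation.

```python
def filesystem_utilization(event):
    """Return filesystem usage as a percentage of capacity, or None if unavailable."""
    usage = event.get("k8s.node.filesystem.usage")
    capacity = event.get("k8s.node.filesystem.capacity")
    if usage is None:
        return None  # mirrors the EXISTS check: no usage field, no value
    if not capacity:
        return None  # sketch's choice: skip missing or zero capacity
    return usage / capacity * 100


# Hypothetical events: one with both fields, one without usage data.
reporting_node = {
    "k8s.node.name": "node-a",
    "k8s.node.filesystem.usage": 40,
    "k8s.node.filesystem.capacity": 160,
}
silent_node = {"k8s.node.name": "node-b", "k8s.node.filesystem.capacity": 160}

print(filesystem_utilization(reporting_node))
print(filesystem_utilization(silent_node))
```

Returning no value when the usage field is absent keeps events that do not carry the metric from being plotted as 0 percent.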

Kubernetes Workload Health 

The Kubernetes Workload Health Board Template includes queries that help you diagnose Kubernetes-related application issues:

Query Name Query Description Required Fields
Container Restarts Shows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero.
  • k8s.container.name
  • k8s.container.restarts
  • k8s.namespace.name
  • k8s.pod.name
Unhealthy Pods Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved so the presence of a count isn’t necessarily bad.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
Pending Pods Shows pods in a “Pending” state.
  • k8s.pod.name
  • k8s.pod.phase
Failed Pods Shows pods in a “Failed” or “Unknown” state.
  • k8s.pod.name
  • k8s.pod.phase
Unhealthy Nodes Shows errors that Kubernetes nodes are experiencing.
  • k8s.namespace.name
  • reason
  • k8s.pod.name
  • severity_text
Unhealthy Volumes Shows volume creation and attachment failures.
  • k8s.namespace.name
  • k8s.pod.name
  • reason
  • severity_text
Unscheduled Daemonset Pods Tracks cases where a pod in a daemonset is not currently running on every node in the cluster as it should be.
  • SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes)
  • k8s.daemonset.current_scheduled_nodes
  • k8s.daemonset.desired_scheduled_nodes
  • k8s.daemonset.name
  • k8s.namespace.name
Stateful Set Pod Readiness Tracks any stateful sets where pods that should be in a ready state are in a non-ready state.
  • SUB($k8s.statefulset.desired_pods,$k8s.statefulset.ready_pods)
  • k8s.statefulset.desired_pods
  • k8s.statefulset.name
  • k8s.statefulset.ready_pods
Deployment Pod Status Shows Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”.
  • SUB($k8s.deployment.desired,$k8s.deployment.available)
  • k8s.deployment.available
  • k8s.deployment.desired
  • k8s.deployment.name
Job Failures Tracks the number of failed pods in Kubernetes jobs.
  • k8s.job.failed_pods
  • k8s.job.name
Active Cron Jobs Tracks the number of active pods in each Kubernetes cron job.
  • k8s.cronjob.active_jobs
  • k8s.cronjob.name
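
Several queries in this table rely on a `SUB(desired, actual)` expression, where a result greater than zero indicates missing pods. As a rough sketch, the Python below applies the same subtraction to hypothetical daemonset events and keeps only the daemonsets with a positive gap. The daemonset names, counts, and the function name `unscheduled_daemonset_pods` are invented for this example.

```python
def unscheduled_daemonset_pods(events):
    """Return {daemonset name: gap} for daemonsets missing pods on some nodes."""
    gaps = {}
    for event in events:
        # Same arithmetic as SUB($desired_scheduled_nodes, $current_scheduled_nodes).
        gap = (
            event["k8s.daemonset.desired_scheduled_nodes"]
            - event["k8s.daemonset.current_scheduled_nodes"]
        )
        if gap > 0:
            gaps[event["k8s.daemonset.name"]] = gap
    return gaps


# Hypothetical events that use field names from the table above.
events = [
    {
        "k8s.daemonset.name": "log-agent",
        "k8s.daemonset.desired_scheduled_nodes": 6,
        "k8s.daemonset.current_scheduled_nodes": 4,
    },
    {
        "k8s.daemonset.name": "node-exporter",
        "k8s.daemonset.desired_scheduled_nodes": 6,
        "k8s.daemonset.current_scheduled_nodes": 6,
    },
]

gaps = unscheduled_daemonset_pods(events)
print(gaps)
```

The same pattern applies to the stateful set and deployment queries, which subtract ready or available pods from the desired count.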

OpenTelemetry 

OpenTelemetry Collector Operations 

The OpenTelemetry Collector Operations Board Template includes queries with key metrics emitted by the OpenTelemetry Collector during its operation:

Query Name Query Description Required Fields
Exporter Span Failures Shows when errors happen during enqueueing or sending in exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_spans
  • otelcol_exporter_send_failed_spans
Collector Uptime Smokestacks Shows the uptime for different pods with a Log10 to make it clearer where restarts are happening.
  • LOG10($otelcol_process_uptime)
  • net.host.name
  • otelcol_process_uptime
Exporter Metric Send Failures Shows when errors happen during sending from exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_metric_points
  • otelcol_exporter_send_failed_metric_points
Exporter Metrics Enqueue Failures Shows when errors happen during enqueueing in exporters.
  • net.host.name
  • otelcol_exporter_send_failed_metric_points
Exporter Log Records Failures Shows when errors happen during enqueueing or sending in exporters.
  • net.host.name
  • otelcol_exporter_enqueue_failed_log_records

OpenTelemetry Java Metrics 

The OpenTelemetry Java Metrics Board Template includes queries that help you investigate application issues related to the Java Virtual Machine (JVM).

Metrics for Java applications are sourced from the JVM and reported by the OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.

Query Name Query Description Required Fields
JVM Memory Usage (Young Generation) Shows memory usage for Eden space on the JVM heap, which is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents maximum usable.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • process.runtime.jvm.memory.usage_after_last_gc
  • service.name
  • type
JVM Memory Usage (Old Generation) Shows memory usage for tenured Gen JVM heap space, which stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • process.runtime.jvm.memory.usage_after_last_gc
  • service.name
  • type
JVM Garbage Collection (GC) Activity Shows JVM garbage collection activity. JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive.
  • process.runtime.jvm.gc.duration.count
  • action
  • gc
  • host.name
  • process.runtime.jvm.gc.duration.avg
  • process.runtime.jvm.gc.duration.max
  • service.name
JVM CPU Utilization Shows system CPU utilization and 1-minute load average, as captured by the JVM.
  • host.name
  • process.runtime.jvm.cpu.utilization
  • process.runtime.jvm.system.cpu.load_1m
  • service.name
JVM Buffer Memory Usage Shows usage of buffer memory, which is provided by the OS and is outside the JVM’s heap memory allocation. Buffer memory is used by Java NIO to quickly write data to network or disk.
  • host.name
  • process.runtime.jvm.buffer.limit
  • process.runtime.jvm.buffer.usage
  • service.name
JVM Non-Heap Memory Usage Shows usage of JVM non-heap memory, which is allocated above and beyond the heap size you’ve configured. JVM non-heap memory is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected.
  • host.name
  • pool
  • process.runtime.jvm.memory.committed
  • process.runtime.jvm.memory.limit
  • process.runtime.jvm.memory.usage
  • service.name
  • type

AWS 

AWS Lambda Health 

The AWS Lambda Health Board Template includes queries that monitor the health of AWS Lambda functions, including metrics for invocations, errors, throttles, and concurrency:

Query Name Query Description Required Fields
Duration & Execution by ID/Version Tracks the execution time of Lambda functions, identified by their ID or version. Useful for analyzing the performance and efficiency of different versions or instances of a function over time.
  • duration_ms
  • faas.execution
  • faas.name
  • faas.version
Lambda Invocations by Function Shows the total number of times each Lambda function is invoked. It helps in tracking the frequency of usage of different functions, enabling a clear understanding of which functions are most or least used.
  • FunctionName
  • MetricName
  • Namespace
Latency by Function/Metric Shows the response time for each Lambda function, broken down by specific metrics. Useful for identifying functions that may be experiencing performance issues due to high latency.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Duration.max
  • amazonaws.com/AWS/Lambda/PostRuntimeExtensionsDuration.max
Function Error Count and Rate Shows two key pieces of information: the total number of errors encountered by each Lambda function and the error rate, calculated as the ratio of errors to total invocations. Useful for pinpointing functions that are failing or experiencing issues.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Errors.count
Lambda Throttles Shows the instances where Lambda invocations are being throttled, such as when the number of function calls exceeds the concurrency limits. Tracking this helps in managing and optimizing the scalability settings for each function.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/Throttles.count
Function Concurrency Monitors the simultaneous execution count of each Lambda function, tracking how many instances of a function are running at the same time.
  • FunctionName
  • MetricName
  • Namespace
  • amazonaws.com/AWS/Lambda/ConcurrentExecutions.avg
  • amazonaws.com/AWS/Lambda/UnreservedConcurrentExecutions.avg
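
The error rate in Function Error Count and Rate is described as the ratio of errors to total invocations. A minimal Python sketch of that calculation follows. The function names, the per-function counts, and the decision to report a rate of 0.0 when there are no invocations are assumptions made for this example.

```python
def error_rate(errors, invocations):
    """Ratio of errors to total invocations for one Lambda function."""
    if invocations == 0:
        return 0.0  # sketch's choice: no invocations means no observed errors
    return errors / invocations


# Hypothetical per-function totals: (errors, invocations).
totals = {
    "resize-image": (5, 200),
    "send-receipt": (0, 1_000),
    "nightly-report": (0, 0),
}

rates = {name: error_rate(errs, calls) for name, (errs, calls) in totals.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Reading the rate next to the raw error count helps separate a busy function with a few failures from a rarely invoked function that fails most of the time.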

EC2 Health 

The AWS EC2 Board Template includes queries that monitor the health of AWS EC2 instances, including status failures, disk Read and Write operations, and EBS operations.

The AWS EC2 Board Template includes the following queries:

Query Name Query Description Required Fields
CPU Utilization Shows CPU utilization per EC2 instance.
  • amazonaws.com/AWS/EC2/CPUUtilization.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
Network I/O Shows network input and output per EC2 instance.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/NetworkIn.max
  • amazonaws.com/AWS/EC2/NetworkPacketsOut.max
  • Dimensions.InstanceId
EBS Read Operations Shows the number of read operations committed by the instance.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/EBSReadOps.max
  • Dimensions.InstanceId
EBS Write Operations Shows the number of write operations committed by the instance.
  • amazonaws.com/AWS/EC2/EBSWriteOps.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
EBS IO Balance Shows available input and output per second that attached EBS volumes are utilizing. Use to monitor potential throttling on an EBS volume attached to an instance.
  • amazonaws.com/AWS/EC2/EBSIOBalance%.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
Instance Metadata Service Outliers Shows the number of instances that are not currently using IMDSv2. Use to identify potential security issues with EC2 instances.
  • amazonaws.com/AWS/EC2/MetadataNoToken.max
  • Dimensions.InstanceId
  • cloud.account.id
  • cloud.region
EC2 Disk Read/Write Shows Write and Read operations undertaken by EC2 instances. Use to monitor EBS volume usage.
  • amazonaws.com/AWS/EC2/EBSWriteBytes.max
  • amazonaws.com/AWS/EC2/EBSReadBytes.max
  • Dimensions.InstanceId
  • Namespace
EC2 Instance Status Failures Shows any EC2 instances that have failed a status check in the provided time period.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/EC2/StatusCheckFailed.max
  • Dimensions.InstanceId

AWS ALB/ELB Health 

The AWS ALB/ELB Board Template includes queries that monitor the Load Balancer’s health, status codes, active connections, and requests.

Tip
This Board Template relies on metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this Board Template, you will need to provision a metrics stream for EC2 instances that you wish to monitor.

The AWS ALB/ELB Board Template includes the following queries:

Query Name Query Description Required Fields
Request Count Per Target Shows how requests are distributed across targets. Use to diagnose imbalanced traffic in the load balancer.
  • cloud.region
  • Dimensions.AvailabilityZone
  • amazonaws.com/AWS/ApplicationELB/RequestCountPerTarget.count
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
Healthy vs. Unhealthy Host Count Shows the number of healthy versus unhealthy hosts per load balancer, which is segmented across target groups and availability zones. Use to quickly spot failing load balancer targets.
  • amazonaws.com/AWS/ApplicationELB/HealthyHostCount.max
  • amazonaws.com/AWS/ApplicationELB/UnHealthyHostCount.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
  • Dimensions.AvailabilityZone
Load Balancer Status Codes Shows status codes per load balancer. Use to identify routing or traffic management issues.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_3XX_Count.count
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_4XX_Count.count
  • amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_5XX_Count.count
  • Dimensions.LoadBalancer
Active Connections Shows active connections per load balancer.
  • amazonaws.com/AWS/ApplicationELB/ActiveConnectionCount.count
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region
State Routing Shows load balancer state routing. Use to identify network configuration errors, unresponsive applications, or health check delays.
  • amazonaws.com/AWS/ApplicationELB/UnhealthyStateRouting.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • Dimensions.AvailabilityZone
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/HealthyStateRouting.max
Load Balancer Capacity Units Shows LCUs consumed during a given period of time. Use to optimize load balancer cost and detect bottlenecks.
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/ApplicationELB/PeakLCUs.max
Anomalous Host Count Shows the number of hosts behaving abnormally. Use to detect and diagnose excessive error rates, latency issues, or inconsistent health check results.
  • amazonaws.com/AWS/ApplicationELB/AnomalousHostCount.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
DNS Target State Shows load balancer DNS target state resolution. Use to identify failing targets and DNS misconfigurations.
  • amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.max
  • amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.count
  • amazonaws.com/AWS/ApplicationELB/UnhealthyStateDNS.max
  • Dimensions.LoadBalancer
  • Dimensions.TargetGroup
  • cloud.account.id
  • Dimensions.AvailabilityZone
TLS Negotiation Errors Shows the number of TLS negotiation errors per load balancer.
  • amazonaws.com/AWS/ApplicationELB/ClientTLSNegotiationErrorCount.count
  • Dimensions.LoadBalancer
  • Dimensions.AvailabilityZone
  • cloud.account.id
  • cloud.region
Connection Error Count Shows errors on targets. Use to diagnose and troubleshoot misconfigured load balancer targets.
  • Dimensions.TargetGroup
  • amazonaws.com/AWS/ApplicationELB/TargetConnectionErrorCount.max
  • Dimensions.LoadBalancer
  • cloud.account.id
  • cloud.region

SQS 

The SQS Board Template provides insight into critical AWS SQS operations.

Tip
This Board Template relies on metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this Board Template, you will need to provision a metrics stream for EC2 instances that you wish to monitor.

The SQS Board Template includes the following queries:

Query Name Query Description Required Fields
Request Count Per Minute Shows requests made per minute. Use to observe the traffic patterns and detect unexpected load or errors.
  • telemetry.sdk.language
  • http.host
  • http.route
  • http.method
  • http.status_code
  • http.server_name
HTTP Response Duration Shows the P95 response duration by route, status code, and server name. Use for Django HTTP performance.
  • http.route
  • http.method
  • http.status_code
  • http.server_name
  • telemetry.sdk.language
  • http.response.body.size
  • duration_ms
HTTP Errors Shows count of HTTP errors by route, status code, and host.name. Use to assess success and error rates of APIs.
  • http.status_code
  • http.server_name
  • error
  • telemetry.sdk.language
  • http.route
  • http.method
Exceptions Shows exceptions thrown in the service. Use to assess the overall health of the application.
  • http.server_name
  • exception.type
  • code.namespace
  • exception.message
  • exception.stacktrace
  • telemetry.sdk.language
AVG and P95 Request Size Shows the average and P95 HTTP request size. Use to monitor payload efficiency.
  • http.server_name
  • telemetry.sdk.language
  • http.request.body.size
  • http.route
  • http.method
  • http.status_code
AVG and P95 Response Size Shows the average and P95 HTTP response size. Use to monitor payload efficiency.
  • telemetry.sdk.language
  • http.response.body.size
  • http.route
  • http.method
  • http.status_code
  • http.server_name
P95 and Heatmap of Job Duration Shows the P95 and a heatmap of Job Duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners.
  • http.server_name
  • telemetry.sdk.language
  • duration_ms
  • messaging.destination
  • messaging.system
Jobs Executed Shows count of root traces with messaging system and destination. Use to assess overall performance of the async job operations.
  • http.server_name
  • messaging.destination
  • messaging.system
  • telemetry.sdk.language
  • messaging.destination_kind
DB connection Count Per Min Shows the connection count per minute where database connection event is “open”. Use to gain visibility into connection pooling efficiency.
  • telemetry.sdk.language
  • db.operation
  • db.system
  • db.name
  • db.connection.event

RDS 

The RDS Board Template provides insights for monitoring and optimizing the performance of AWS RDS databases.

Tip
This Board Template relies on metric streams provided by AWS CloudWatch. Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams. To use this Board Template, you will need to provision a metrics stream for EC2 instances that you wish to monitor.

The RDS Board Template includes the following queries:

Query Name Query Description Required Fields
Number of Connections Shows the number of connections to RDS instances.
  • amazonaws.com/AWS/RDS/DatabaseConnections.count
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Database Load Shows the level of session activity on RDS instances.
  • amazonaws.com/AWS/RDS/DBLoad.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Disk Queue Depth Shows the number of outstanding input/output requests waiting to access the disk. High queue depth can indicate that the workload is generating more read/write requests than the underlying storage can handle.
  • amazonaws.com/AWS/RDS/DiskQueueDepth.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/DiskQueueDepth.count
Freeable Memory Shows the minimum freeable memory per database instance. Use to identify memory pressure in RDS instances.
  • amazonaws.com/AWS/RDS/FreeableMemory.min
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/FreeableMemory.count
Read/Write Operations Shows the read and write operations per second that the RDS instance is performing. Use to diagnose bottlenecks, optimize workloads, and manage cost.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/WriteIOPS.max
  • amazonaws.com/AWS/RDS/ReadIOPS.max
CPU Utilization Shows maximum CPU utilization across database instance identifiers.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/CPUUtilization.max
Free Storage Space Shows the amount of free storage space per database instance.
  • amazonaws.com/AWS/RDS/FreeStorageSpace.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
Burst Balance Shows the burst capacity per database instance. Lower burst capacity can affect input/output performance. Use for capacity planning and to optimize database performance.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/BurstBalance.sum
Read/Write Latency Visualizes Read/Write latency per database instance. Use for troubleshooting slow queries, inefficient indexes, or locking issues.
  • amazonaws.com/AWS/RDS/WriteLatency.sum
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • amazonaws.com/AWS/RDS/ReadLatency.sum
Transaction Log Disk Usage Shows the amount of storage consumed by database transaction logs. Use to prevent storage exhaustion.
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/RDS/TransactionLogsDiskUsage.max
Checkpoint Lag Shows checkpoint lag. Use to determine latency between leader and followers in replication.
  • amazonaws.com/AWS/RDS/CheckpointLag.max
  • Dimensions.DBInstanceIdentifier
Swap Usage Shows swap activity (from RAM to disk) per RDS instance. Use for identifying performance issues related to memory pressure.
  • cloud.account.id
  • cloud.region
  • amazonaws.com/AWS/RDS/SwapUsage.max
  • Dimensions.DBInstanceIdentifier
Network Throughput Shows the rate at which network data is being sent from RDS instances. Use to identify excessive data transfer or increased query latencies.
  • amazonaws.com/AWS/RDS/NetworkTransmitThroughput.max
  • Dimensions.DBInstanceIdentifier
  • cloud.account.id
  • cloud.region

Honeycomb Features 

Refinery Operations 

For teams using Refinery to sample their data, the Refinery Board Template provides an overview of sampling operations.

Tip
Refinery emits metrics that provide insights into its health, trace throughput, and sampling statistics. Required fields in the Refinery Board Template map to these metrics and populate automatically when sent to Honeycomb. To learn more about these fields, visit Refinery Configuration.

The Refinery Board Template includes the following queries:

Query Name Query Description Required Fields
Stress Relief Status Shows the current stress level on the Refinery cluster.
  • stress_level
  • stress_relief_activated
  • hostname or host.name
Dropped From Stress Shows how many traces are being dropped due to stress on the Refinery cluster.
  • dropped_from_stress
  • hostname or host.name
Stress Relief Log Shows reasons why Refinery is going into stress relief.
  • StressRelief
  • reason
  • msg
  • hostname or host.name
Cache Health Shows metrics for cache health.
  • collect_cache_buffer_overrun
  • memory_inuse
  • collect_cache_entries_max or collect_cache_entries.max
  • collect_cache_capacity
  • num_goroutines
  • process_uptime_seconds
  • hostname or host.name
Cache Ejections Shows the number of traces ejected from the cache.
  • trace_send_ejected_full
  • trace_send_ejected_memsize
  • hostname or host.name
Intercommunications Shows total events from outside Refinery and events redirected from a peer.
  • incoming_router_span
  • peer_router_batch
  • hostname or host.name
Receive Buffers Shows receive buffer operations.
  • incoming_router_dropped
  • peer_router_dropped
  • hostname or host.name
Peer Send Buffers Shows metrics for the queue used to buffer spans to send to peer nodes.
  • libhoney_peer_queue_overflow
  • libhoney_peer_send_errors
  • hostname or host.name
Upstream Send Buffers Shows metrics for the queue used to buffer spans to send to Honeycomb.
  • libhoney_upstream_queue_length
  • libhoney_upstream_enqueue_errors
  • libhoney_upstream_response_errors
  • libhoney_upstream_send_errors
  • libhoney_upstream_send_retries
  • hostname or host.name
EMADynamicSampler Performance Shows EMADynamicSampler sampling effectiveness.
  • emadynamic_sample_rate_avg
  • emadynamic_keyspace_size
  • emadynamic_num_kept
  • emadynamic_num_dropped
EMAThroughputSampler Performance Shows EMAThroughputSampler sampling effectiveness.
  • emathroughput_sample_rate_avg
  • emathroughput_keyspace_size
  • emathroughput_num_kept
  • emathroughput_num_dropped
WindowedThroughput Performance Shows WindowedThroughput sampling effectiveness.
  • windowedthroughput_sample_rate_avg
  • windowedthroughput_keyspace_size
  • windowedthroughput_num_kept
  • windowedthroughput_num_dropped
TotalThroughputSampler Performance Shows TotalThroughputSampler sampling effectiveness.
  • totalthroughput_sample_rate_avg
  • totalthroughput_keyspace_size
  • totalthroughput_num_kept
  • totalthroughput_num_dropped
DynamicSampler Performance Shows DynamicSampler sampling effectiveness.
  • dynamic_sample_rate_avg
  • dynamic_keyspace_size
  • dynamic_num_kept
  • dynamic_num_dropped
RulesBasedSampler Performance Shows RulesBasedSampler sampling effectiveness.
  • rulesbased_sample_rate_avg
  • rulesbased_num_kept
  • rulesbased_num_dropped
Trace Indicators Shows total traces sent before completion and spans received for a trace that was already sent.
  • trace_sent_cache_hit
  • trace_send_no_root
Sampling Decisions Shows total traces accepted and sent or dropped.
  • trace_accepted
  • trace_send_dropped
  • trace_send_kept
Refinery Send Event Error Logs Shows errors that occur when Refinery sends events to its peers or upstream to the Honeycomb API server.
  • msg
  • dataset
  • api_host
  • error
Refinery Handler Event Error Logs Shows errors when receiving or parsing events being sent to a node.
  • msg
  • dataset
  • api_host
  • error.err
  • error.msg
Refinery Events Exceeding Max Size Shows errors when events are too large to be sent to Honeycomb.
  • msg
  • dataset
  • api_host
  • error
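
If some Refinery queries stay empty, one way to narrow down the cause is to compare the required fields listed above against the columns that actually exist in the dataset receiving Refinery metrics. The following is a minimal sketch under stated assumptions: the `REQUIRED` mapping copies only three of the queries from the table above, the `missing_fields` helper is hypothetical, and you are assumed to have already obtained the set of column names by some other means. It is not part of Honeycomb or Refinery.

```python
from typing import Dict, List, Set, Tuple

# Each required field is a tuple of acceptable alternatives.
# ("hostname", "host.name") means either column satisfies the requirement.
# Only three queries from the table are shown; extend as needed.
REQUIRED: Dict[str, List[Tuple[str, ...]]] = {
    "Stress Relief Status": [
        ("stress_level",),
        ("stress_relief_activated",),
        ("hostname", "host.name"),
    ],
    "Dropped From Stress": [
        ("dropped_from_stress",),
        ("hostname", "host.name"),
    ],
    "Cache Ejections": [
        ("trace_send_ejected_full",),
        ("trace_send_ejected_memsize",),
        ("hostname", "host.name"),
    ],
}


def missing_fields(columns: Set[str]) -> Dict[str, List[str]]:
    """Return, per query, the required fields that no column satisfies."""
    report: Dict[str, List[str]] = {}
    for query, fields in REQUIRED.items():
        missing = [
            " or ".join(alternatives)
            for alternatives in fields
            if not any(name in columns for name in alternatives)
        ]
        if missing:
            report[query] = missing
    return report


if __name__ == "__main__":
    # Hypothetical column set, for illustration only.
    columns = {
        "stress_level",
        "stress_relief_activated",
        "dropped_from_stress",
        "host.name",
    }
    for query, fields in missing_fields(columns).items():
        print(f"{query}: missing {', '.join(fields)}")
```

Queries that do not appear in the report have every required field available, so an empty result for them points to something other than missing fields.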

Activity Log Security 

Tip
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Security Board Template includes queries that track API Key activity:

Query Name Query Description Required Fields
API Key Added Permissions Shows when permissions are added to an existing API key.
  • resource.type
  • resource.changed_fields
  • environment.slug
API Key Activities by User Displays the number of changes to API keys broken down by user.
  • key_type
  • environment.slug
  • user.email
  • resource.action
Authentication Type by User Displays which type of authentication is used for each user.
  • authentication_method
  • user.email

Activity Log Leaderboard 

Tip
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Leaderboard Board Template includes queries that highlight advanced and frequent usage of Honeycomb by your team:

Query Name Query Description Required Fields
Queries by User Shows which users are running queries.
  • resource.type
  • user.email
Complex Queries by User Shows which users frequently use Visualize, Where, and Having clauses.
  • resource.type
  • SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`))
  • user.email
Top Query Visualizations Shows the most commonly used visualizations.
  • resource.type
  • SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`))
  • query.visualize
Top Tinkerers Lists which users perform the most updates to SLOs, Triggers, and Calculated Fields.
  • resource.type
  • user.email
Queries by Dataset Shows which datasets are being queried the most.
  • resource.type
  • environment.slug
  • dataset.slug
Queries by Environment Shows a count of queries run, grouped by environment.
  • resource.type
  • environment.slug
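
The calculated expression required by Complex Queries by User and Top Query Visualizations can be read as a simple score: 3 points if the query has a Having clause, plus the number of commas in the Where clause, plus the number of commas in the Visualize clause. The Python sketch below mirrors that arithmetic for illustration. It assumes that `query.where` and `query.visualize` arrive as comma-separated strings and that a missing field counts as zero; the function name and the sample values are hypothetical.

```python
import re
from typing import Optional


def complexity_score(
    having: Optional[str],
    where: Optional[str],
    visualize: Optional[str],
) -> int:
    """Mirror SUM(IF(EXISTS(having), 3, 0), REG_COUNT(where, ","), REG_COUNT(visualize, ","))."""
    # IF(EXISTS($query.having), 3, 0): a flat 3 points when the field is present.
    having_points = 3 if having is not None else 0
    # REG_COUNT(..., `,`): number of comma matches. Missing fields are
    # treated as empty strings here, which is an assumption of this sketch.
    where_points = len(re.findall(",", where or ""))
    visualize_points = len(re.findall(",", visualize or ""))
    return having_points + where_points + visualize_points


if __name__ == "__main__":
    # Hypothetical field values, for illustration only.
    score = complexity_score(
        having="COUNT > 10",
        where="status >= 500, service = api",
        visualize="COUNT, P99(duration_ms), HEATMAP(duration_ms)",
    )
    print(score)  # 3 + 1 + 2 = 6
```

Because the score counts commas rather than clauses, a query with a single Where clause and a single visualization scores 0 unless it also uses Having.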

Activity Log Trigger and SLO Activity 

Tip
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.

The Activity Log Trigger and SLO Activity Board Template includes queries related to trigger and SLO activations and modifications:

Query Name Query Description Required Fields
Trigger State Changes Shows instances when triggers have been triggered or resolved.
  • resource.type
  • resource.action
  • name
Trigger Modifications Shows creations, modifications, and deletions of triggers.
  • resource.type
  • resource.action
Most Updated Triggers Shows triggers that received the most changes recently.
  • resource.type
  • resource.action
  • name
Top Updated SLOs by Update Type Shows creations, modifications, and deletions of SLOs and the supporting SLI (Calculated Field).
  • resource.type
  • resource.action
  • environment.slug
  • resource.changed_fields
  • name
  • user.email
SLOs Created and Deleted Shows creation and deletion of SLOs.
  • resource.type
  • resource.action
  • environment.slug
  • name
  • resource.changed_fields
  • user.email
SLI Expression Changes by SLO Shows when SLIs (Calculated Fields) related to SLOs have been changed.
  • resource.type
  • resource.action
  • resource.changed_fields
  • environment.slug
  • name
  • sli.expression
  • before.sli.expression
  • user.email

Troubleshooting 

To explore common issues when working with Board Templates, visit Common Issues with Visualization: Board Templates.