Get instant insights into your system with Board Templates.
This functionality is available only for teams using Honeycomb’s current data model.
If you use Honeycomb Classic, we recommend migrating to Honeycomb Environments, so you can take advantage of its expanded data model and future product updates.
What is a Board Template?
Board Templates are pre-configured Boards that come with ready-made queries and visualizations, providing valuable insights with minimal setup.
Use a template as a starting point to create a Board.
Templates are designed for specific use cases and built around industry best practices, ensuring effective configurations for tracking key metrics and visualizing data accurately.
Board Templates At a Glance
Choose from a variety of templates to quickly gain insights across different areas of your system:
- General:
  - Service Health: Insight into service health, including request volumes and where the slowest requests occur.
  - Airflow: Overview of data workflow performance. Monitoring Airflow operations can highlight problems that occur while running data pipelines.
  - Kafka: Insight into Kafka brokers, topics, partitions, and consumers.
  - Linux Host: Useful queries for monitoring Linux hosts, including CPU, memory, disk, filesystem, and network utilization on the configured hosts.
  - Spring Boot: Insight into application health and performance metrics for your Spring Boot microservices.
  - Django: Insight into application health and performance metrics for your Django application.
  - Rails: Queries to help investigate the performance and health of your Rails application.
  - RabbitMQ: Visualizations for core RabbitMQ metrics and client signals.
  - My Services: Application Performance Monitoring (APM) metrics for a variety of services and frameworks.
- Data Stores:
  - MySQL Operations: Insight into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
  - Redis: Insight into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
  - Postgres: Insight into Postgres operations, including active connections, database size, table count, and transaction throughput.
  - MongoDB: Metrics-driven visualizations for monitoring MongoDB nodes.
  - SQL Server: Useful metrics for monitoring SQL Server database operations.
- Frontend Investigation:
  - Real User Monitoring (RUM): Real user monitoring data for frontend applications, including performance and user experience insights.
  - Android Auto-Instrumentation: Auto-instrumentation data for Android applications provided by the Honeycomb OpenTelemetry Android SDK.
  - iOS Auto-Instrumentation: Auto-instrumentation data for iOS applications provided by the Honeycomb OpenTelemetry Swift SDK.
- Kubernetes:
  - Kubernetes Pod Metrics: Queries and visualizations that help you investigate pod performance and resource usage within Kubernetes clusters.
  - Kubernetes Node Metrics: Queries and visualizations that help you investigate node performance and resource usage within Kubernetes clusters.
  - Kubernetes Workload Health: Queries and visualizations that help you investigate application problems related to Kubernetes workloads.
- OpenTelemetry:
  - OpenTelemetry Collector Operations: Metrics emitted by the OpenTelemetry Collector during operation.
  - OpenTelemetry Java Metrics: Insights into Java Virtual Machine (JVM) health and performance via metrics reported by the OpenTelemetry Java Agent or the Honeycomb OpenTelemetry Distribution for Java.
- Amazon Web Services (AWS):
  - AWS Lambda Health: Information about AWS Lambda function health, including invocations, errors, throttles, and concurrency.
  - EC2 Health: Information about AWS EC2 instances, status failures, and EBS read/write operations.
  - ALB/ELB Health: Information about AWS Load Balancers, including load balancer health, status codes, active connections, and requests.
  - SQS: Insight into critical AWS SQS operations.
  - RDS: Insights for monitoring and optimizing the performance of AWS RDS databases.
- Artificial Intelligence:
  - Anthropic Usage & Cost Monitoring: Comprehensive insights into Anthropic API usage and costs, including token consumption, feature usage, and cost attribution across models, workspaces, and API keys.
- Honeycomb Features:
  - Refinery Operations: Overview of sampling operations, including trace throughput and sampling statistics. Automatically populated by Refinery metrics sent to Honeycomb.
  - Activity Log Security: Queries showing API Key activity.
  - Activity Log Leaderboard: Queries showing advanced and frequent Honeycomb usage by your team.
  - Activity Log Trigger and SLO Activity: Queries related to trigger and SLO activations and modifications.
General
Service Health
The Service Health Board Template offers an overview of your services’ health.
It provides insights into request volumes, identifies where the slowest requests are occurring, and more.
This template relies on your source data fields being mapped to Honeycomb standard fields.
To learn how to map your fields, visit Dataset Definitions.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Trace Counts by Service | Shows total trace volume by service. | Parent span ID or trace.parent_id; Service name or service.name or service_name |
| Trace Counts by HTTP Status Code | Shows total trace volume by status code. | Parent span ID or trace.parent_id; HTTP Status Code or http.response.status.code or http.status_code |
| Trace Duration Heatmap | Shows a heatmap of the duration for all traces. | Span duration or duration_ms; Parent span ID or trace.parent_id |
| Duration Heatmap | Shows a heatmap of duration across all services. | Span duration or duration_ms |
| Duration by Service | Shows key duration percentiles by service. | Span duration or duration_ms; Service name or service.name or service_name |
| Duration by Route | Shows duration by route or endpoint. | Span duration or duration_ms; Route or http.route |
| Duration by Name | Shows duration by function name. | Span duration or duration_ms; Name or name |
| Errors by Service | Shows a count of errors grouped by service. | Error or error; Service name or service.name or service_name |
| Errors by Route | Shows a count of errors grouped by route or endpoint. | Error or error; Route or http.route |
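As a rough illustration of how the Required Fields column can be used, the following Python sketch checks a list of dataset column names against the Service Health requirements and reports which queries would have nothing to draw from. The function name `missing_fields` and the sample column list are hypothetical. The check only compares literal column names, so it cannot see a custom column that has been mapped to a standard field through Dataset Definitions.

```python
# Required fields per Service Health query, transcribed from the table above.
# Each inner tuple lists alternative column names; any one of them satisfies
# that requirement. A column mapped through Dataset Definitions would also
# satisfy it, which this simple name check cannot detect.
SERVICE = ("service.name", "service_name")
STATUS = ("http.response.status.code", "http.status_code")

REQUIRED_FIELDS = {
    "Trace Counts by Service": [("trace.parent_id",), SERVICE],
    "Trace Counts by HTTP Status Code": [("trace.parent_id",), STATUS],
    "Trace Duration Heatmap": [("duration_ms",), ("trace.parent_id",)],
    "Duration Heatmap": [("duration_ms",)],
    "Duration by Service": [("duration_ms",), SERVICE],
    "Duration by Route": [("duration_ms",), ("http.route",)],
    "Duration by Name": [("duration_ms",), ("name",)],
    "Errors by Service": [("error",), SERVICE],
    "Errors by Route": [("error",), ("http.route",)],
}


def missing_fields(columns, requirements=REQUIRED_FIELDS):
    """Return {query name: [unmet requirements]} for queries lacking a field."""
    present = set(columns)
    report = {}
    for query, groups in requirements.items():
        # A group is unmet when none of its alternative names is present.
        unmet = [" or ".join(group) for group in groups if not present.intersection(group)]
        if unmet:
            report[query] = unmet
    return report


if __name__ == "__main__":
    # Hypothetical dataset columns, for illustration only.
    sample_columns = ["trace.parent_id", "service.name", "duration_ms", "name"]
    for query, unmet in missing_fields(sample_columns).items():
        print(f"{query}: missing {', '.join(unmet)}")
```

A query listed in the report is a candidate for either new instrumentation or a field mapping in Dataset Definitions.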
Airflow
The Airflow Board Template gives an overview of data workflow performance.
Monitoring Airflow operations can highlight problems which may occur in the process of running data pipelines.
| Query Name | Query Description | Required Fields |
|---|---|---|
| DAG Processing Import Errors | Shows the total number of errors from parsing DAG files, by host.name. Parsing errors prevent DAGs from being loaded. Tracking these errors helps identify configuration or syntax issues that need immediate attention. | airflow.dag_processing.import_errors, host.name |
| DAG Processing Import Errors by File Path | Shows the total number of errors during import and parsing of DAG files, broken out by DAG file path and host.name. Tracking these errors helps identify configuration or syntax issues with a given file or host. | host.name, import_errors, file_path |
| Duration of Tasks (AVG, P95) | Shows the average and P95 duration of a Task by DAG ID, task ID, and host.name. Execution time helps identify which specific tasks are performance bottlenecks, allowing you to optimize your workflows. Note: Uses trace signal type. | host.name, meta.signal_type, duration_ms, task_id, dag_id |
| DAG Failed Duration (AVG) | Shows the average duration in milliseconds (ms) taken for a DagRun to reach a failed state by DAG ID and host.name. Failed DAG runs consume valuable resources. Monitoring this metric helps to identify inefficient failure patterns. | dag_id, host.name, airflow.dagrun.duration.failed |
| DAG Success Duration (AVG) | Shows the average duration in milliseconds (ms) for a DagRun to reach a success state by DAG ID and host.name. Monitoring duration allows you to optimize resource allocation and set appropriate SLAs. | airflow.dagrun.duration.success, dag_id, host.name |
| Task Counts | Shows the count of Tasks grouped by DAG ID, task ID, host.name, and state. Use to gauge overall workflow health and the proportion of tasks experiencing issues, which can highlight potential problems with Airflow operations. Note: Uses trace signal type. | host.name, state, dag_id, task_id |
| DAG Schedule Delay | Shows the average delay in milliseconds (ms) between the scheduled DagRun start date and the actual DagRun start date, grouped by DAG ID and host.name. Use to identify scheduler bottlenecks, resource constraints, or overloaded Airflow instances that prevent timely workflow execution. | dag_id, host.name, airflow.dagrun.schedule_delay |
| Scheduler Tasks | Shows the sum of Airflow Scheduler Tasks that are executing or starving by host ID. Use to understand scheduler load, identify periods when the scheduler might be overwhelmed with too many tasks, and ensure task distribution works as expected. | host.name, airflow.scheduler.tasks.executable, airflow.scheduler.tasks.starving |
| Executor Tasks | Shows the maximum count of Executor Tasks (queued, running, and open slots), grouped by host.name. Queued is the number of queued tasks on the executor, Running is the number of running tasks on the executor, and Open Slots is the number of open slots on the executor. | executor.open_slots, host.name, executor.queued_tasks, executor.running_tasks |
| Pool Task Slots by Host | Shows the maximum count of Airflow Pool Slots (Deferred, Queued, Open, Running, Starving, and Scheduled) by host. Use to monitor resource allocation, identify when pools are at capacity, and optimize your configuration to match your workflow needs. | airflow.pool.open_slots, airflow.pool.running_slots, airflow.pool.starving_tasks, host.name, pool_name, airflow.pool.queued_slots, airflow.pool.scheduled_slots, airflow.pool.deferred_slots |
Kafka
The Kafka Board Template provides insight into Kafka brokers, topics, partitions, and consumers.
This template relies on the Kafka Metrics receiver provided by the OpenTelemetry Collector Contrib distribution.
To learn how to set up this receiver, visit the Kafka Metrics receiver documentation in the OpenTelemetry Collector Contrib repo.
To receive relevant Java Virtual Machine (JVM) metrics, include the OpenTelemetry Java Agent in Kafka nodes as well.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Number of Active Brokers | Shows the number of active brokers. | |
| Consumer Group Membership | Shows the number of consumers per broker. | group, host.name, kafka.consumer_group.members |
| Consumer Progress Lag vs Offset Rate | Shows the average rate of Kafka consumer group lag and offsets over time, grouped by topic partitions. Use to monitor consumer progress and to detect delays by comparing offset increases to lag. | host.name, kafka.consumer_group.lag, kafka.consumer_group.offset, topic, group |
| Partition Offset Overview | Shows the rate of change in the oldest and current offsets across Kafka partitions. | kafka.partition.current_offset, topic, host.name, kafka.partition.oldest_offset |
| Partition Count By Topic | Shows the number of partitions for each topic. Use for capacity planning and ensuring proper topic configuration. | topic, host.name, kafka.partition.current_offset, partition |
| Partition Replication Health | Shows the number of in-sync replicas for each partition compared to total replicas. Use to identify under-replicated partitions. | kafka.partition.replicas_in_sync, kafka.partition.replicas, topic, partition, host.name |
| Consumer Group Lag by Topic | Shows total lag across all partitions for each consumer group and topic combination. | group, topic, kafka.consumer_group.lag_sum |
| Partition Balance Analysis | Shows distribution of offsets across partitions for each topic. Use to identify potential partition imbalances. | kafka.partition.current_offset, topic, partition |
| High Consumer Lag | Shows high consumer group lag, which may indicate potential consumer issues. | group, topic, host.name, kafka.consumer_group.lag_sum |
| Message Throughput | Shows the approximate message throughput for each topic by measuring the rate of change in offset over time. | kafka.partition.current_offset, topic, host.name |
| JVM Thread Count by Cluster and State | Shows the total JVM thread count across Kafka clusters, grouped by thread state. Use to identify thread contention or resource leaks. | host.name, jvm.thread.count, kafka.cluster.alias, jvm.thread.state, service.name, service.instance.id, jvm.thread.daemon |
| JVM Garbage Collection Durations | Shows the median and P90 JVM garbage collection durations. Use to understand garbage collection efficiency and memory management health. | jvm.gc.duration.p50, jvm.gc.duration.p90, kafka.cluster.alias, service.name, jvm.gc.action, jvm.gc.name, host.name |
| Max Recent JVM CPU Utilization | Shows the highest CPU utilization within the JVM over a default 30-minute window. Use to identify potential load spikes or bottlenecks that may affect your cluster. | kafka.cluster.alias, service.name, host.name, jvm.cpu.recent_utilization |
| JVM Memory Usage and Commitment | Shows memory usage patterns in clusters, providing a view of how memory is used and committed in the JVM. Use to track inefficient memory usage. | jvm.memory.used, jvm.memory.committed, kafka.cluster.alias, jvm.memory.type, jvm.memory.pool.name, host.name |
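The Message Throughput query approximates throughput from the rate of change in a partition's current offset. The arithmetic behind that idea can be sketched in a few lines of Python. The function name `offset_rates` and the sample values are hypothetical, and Honeycomb's own rate calculation may differ in detail.

```python
def offset_rates(samples):
    """Return (timestamp, messages_per_second) for consecutive offset samples.

    `samples` is a list of (unix_seconds, current_offset) tuples sorted by time,
    for example successive readings of kafka.partition.current_offset.
    """
    rates = []
    for (t0, offset0), (t1, offset1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((t1, (offset1 - offset0) / elapsed))
    return rates


if __name__ == "__main__":
    # Hypothetical readings taken one minute apart.
    readings = [(0, 1000), (60, 1600), (120, 2800)]
    for timestamp, rate in offset_rates(readings):
        print(f"t={timestamp}s: {rate:.1f} messages/s")
```

Because the result is derived from offsets rather than counted messages, it is an estimate, which matches the "approximate" wording in the query description.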
Linux Host
The Linux Host Board Template provides useful queries for monitoring Linux hosts.
It provides insights into CPU, memory, disk, filesystem, and network utilization on the configured hosts.
This template uses the Host Metrics receiver provided by the OpenTelemetry Collector Contrib distribution.
To learn how to set up this receiver, visit the Host Metrics receiver documentation in the OpenTelemetry Collector Contrib repo.
When configuring the hostmetrics receiver for this Board Template, include these scrapers:
- CPU
- Disk
- Load
- Filesystem
- Memory
- Network
- Paging
- Processes
- Process
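As a minimal sketch, the Python snippet below renders a `receivers` fragment for the Collector configuration with every scraper from the list enabled. The helper name `hostmetrics_receiver_yaml` and the 30-second collection interval are assumptions for illustration. Pipelines, exporters, and any per-scraper options are left out, so check the Host Metrics receiver documentation before using the output.

```python
# Scraper keys for the hostmetrics receiver, matching the list above.
SCRAPERS = [
    "cpu",
    "disk",
    "load",
    "filesystem",
    "memory",
    "network",
    "paging",
    "processes",
    "process",
]


def hostmetrics_receiver_yaml(collection_interval="30s"):
    """Render a receivers fragment that enables each scraper with default settings."""
    lines = [
        "receivers:",
        "  hostmetrics:",
        f"    collection_interval: {collection_interval}",  # assumed interval
        "    scrapers:",
    ]
    # An empty value after the key enables the scraper with its defaults.
    lines += [f"      {name}:" for name in SCRAPERS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(hostmetrics_receiver_yaml(), end="")
```

The fragment still has to be referenced from a metrics pipeline in the Collector's `service` section before any data is collected.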
| Query Name | Query Description | Required Fields |
|---|---|---|
| Process CPU Time Breakdown | Shows the total CPU time consumed by different processes, broken down by process owner and command. Use to identify which processes are consuming the most CPU resources over time. | process.owner, process.executable.name, os.type, process.cpu.time, host.name |
| Memory Consumption Trends | Shows the average memory usage across host, operating system, and state. Use to monitor and diagnose system memory usage trends. | state, os.type, system.memory.usage, host.name |
| CPU Utilization Trends | Shows the distribution of CPU time spent on user processes, system operations, and idle time. Use to identify which hosts are under load. | os.type, system.cpu.time.user, system.cpu.time.system, system.cpu.time.idle, host.name |
| Disk I/O | Shows the active disk input and output by device. Use to identify high read/write rates. | system.disk.io.write, host.name, device, os.type, system.disk.io.read |
| Memory Usage by Process | Shows Linux processes by memory usage and virtual memory consumption. Use to troubleshoot resource bottlenecks and optimize memory allocation. | os.type, process.memory.usage, process.memory.virtual, host.name, process.command, process.owner |
| Filesystem Usage | Shows filesystem usage across different mount points, devices, and modes. Use for capacity planning and troubleshooting storage issues. | host.name, device, mountpoint, mode, os.type, system.filesystem.usage.used |
| Network Metrics | Shows network operations per network interface. | system.network.io.receive, system.network.io.transmit, host.name, device, os.type |
Spring Boot
The Spring Boot Board Template provides insight into application health and performance metrics for Spring Boot microservices.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Database Usage | Shows database performance metrics. Use to help identify slow-performing queries and connection issues. | db.client.connections.use_time.avg, db.client.connections.wait_time.avg, host.name, telemetry.sdk.language |
| API Endpoint Latency | Shows a heatmap of API endpoint response times. Use to highlight bottlenecks or anomalies in performance. | http.server.request.duration.avg, http.route, http.response.status_code, http.request.method, host.name, telemetry.sdk.language |
| Garbage Collection Performance Monitor | Shows maximum, average, and P95 duration of garbage collection metrics. Use to identify memory allocation patterns that cause application slowdowns. | jvm.gc.duration.avg, jvm.gc.duration.p95, jvm.gc.action, host.name, telemetry.sdk.language, jvm.gc.duration.max |
| Request Per Minute | Shows requests made per minute. Use to observe traffic patterns and to detect unexpected load or errors. | host.name, telemetry.sdk.language, http.route, http.request.method, http.response.status_code |
| Heap used vs Heap Max Limit | Shows JVM memory metrics and compares current memory usage against the maximum heap limit. Use to identify out-of-memory errors. | jvm.memory.used, jvm.memory.limit, host.name, telemetry.sdk.language |
| API Errors | Shows error responses with status code >= 400. Use to monitor API health. | http.route, host.name, telemetry.sdk.language, http.response.status_code |
| Response Size Distribution | Shows response payload size. Use to monitor data transfer efficiency and to identify unexpectedly large responses. | http.request.method, http.response.status_code, host.name, telemetry.sdk.language, http.response.body.size, http.route |
| JVM CPU Time Rate | Shows CPU consumption rate metrics. Use to identify processing-intensive operations and to detect performance decline over time. | jvm.cpu.time, host.name, telemetry.sdk.language, meta.signal_type |
Django
The Django Board Template provides insight into application health and performance metrics for a Django application.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Request Count Per Minute | Shows requests made per minute. Use to observe traffic patterns and to detect unexpected load or errors. | telemetry.sdk.language, http.host, http.route, http.method, http.status_code, http.server_name |
| HTTP Response Duration | Shows the P95 response duration by route, status code, and server name. Highlights Django HTTP performance. | http.route, http.method, http.status_code, http.server_name, telemetry.sdk.language, http.response.body.size, duration_ms |
| HTTP Errors | Shows the count of HTTP errors by route, status code, and host.name. Use to assess the success and error rate of APIs. | http.status_code, http.server_name, error, telemetry.sdk.language, http.route, http.method |
| Exceptions | Shows exceptions thrown in the service. Use to assess overall health of the application. | http.server_name, exception.type, code.namespace, exception.message, exception.stacktrace, telemetry.sdk.language |
| AVG and P95 Request Size | Shows the average and P95 HTTP request size to monitor payload efficiency. | http.server_name, telemetry.sdk.language, http.request.body.size, http.route, http.method, http.status_code |
| AVG and P95 Response Size | Shows the average and P95 HTTP response size to monitor payload efficiency. | telemetry.sdk.language, http.response.body.size, http.route, http.method, http.status_code, http.server_name |
| P95 and Heatmap of Job Duration | Shows the P95 and a heatmap of job duration by messaging destination, messaging system, and server name. Provides insights into the status of async job runners. | http.server_name, telemetry.sdk.language, duration_ms, messaging.destination, messaging.system |
| Jobs Executed | Shows the count of root traces with messaging system and destination. Use to assess overall performance of async job operations. | http.server_name, messaging.destination, messaging.system, telemetry.sdk.language, messaging.destination_kind |
| DB connection Count Per Min | Shows the connection count per minute where the db connection event is “open”. Helps gain visibility into connection pooling efficiency. | telemetry.sdk.language, db.operation, db.system, db.name, db.connection.event |
Rails
The Rails Board Template gives you visibility into Rails behavior, performance, and health.
The queries and visualizations help identify slow database queries, inefficient code paths, and other performance bottlenecks.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Requests Served | Shows count of requests served by Rails by host.name. Use to provide an overview of traffic volume at a glance. | host.name, telemetry.sdk.language, http.route |
| HTTP Response Duration | Shows P95 response duration by route, controller namespace, controller function, status code, and host.name. Use for Rails HTTP performance. | duration_ms, http.route, code.namespace, code.function, http.status_code, host.name, telemetry.sdk.language |
| HTTP Duration Heatmap | Shows a heatmap of HTTP response duration by route, status code, and host.name. Use to assess and investigate outliers. | http.status_code, host.name, telemetry.sdk.language, duration_ms, http.route |
| HTTP Errors | Shows count of HTTP errors by route, controller namespace, status code, and host.name. Use to assess success and error rate of Rails web endpoints. | error, http.route, code.namespace, http.status_code, host.name, telemetry.sdk.language |
| DB Statement Duration | Shows a heatmap and the P95 of database duration per database name, operation, statement, and host.name. A heatmap provides more information to help identify outlier DB statements. | duration_ms, db.name, db.operation, db.statement, telemetry.sdk.language |
| P95 and Heatmap of Job Duration | Shows P95 and a heatmap of job duration by messaging destination, messaging system, service name, and host.name. Provides insights into status of Rails async job runners, such as ActiveJob and Sidekiq. | duration_ms, messaging.destination, messaging.system, service.name, host.name, telemetry.sdk.language |
| Exceptions | Shows exceptions thrown by type, code namespace, and host.name. Use to assess overall health of your Rails application. | code.namespace, host.name, telemetry.sdk.language, error, exception.message, exception.type |
| Jobs Executed | Shows count of root traces with messaging system and destination. Use to assess overall performance of Rails async job operations. | telemetry.sdk.language, host.name, messaging.system, messaging.destination |
RabbitMQ
The RabbitMQ Board contains visualizations for core RabbitMQ metrics and client signals.
This Board uses the RabbitMQ receiver provided by the opentelemetry-collector-contrib distribution.
To learn how to set up this receiver, visit the RabbitMQ documentation in OpenTelemetry’s Collector Contrib repo.
When configuring RabbitMQ, enable the management plugin to use the receiver.
By default, the RabbitMQ receiver disables several key metrics for resource and connectivity utilization.
To learn more, visit the RabbitMQ metrics documentation in OpenTelemetry’s Collector Contrib repo.
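As a hedged sketch, the Python helper below renders a `metrics` override block that turns on a set of metrics by name. It assumes the RabbitMQ receiver accepts the per-metric `enabled: true` override that Collector Contrib receivers commonly expose, and it assumes the node-level metrics listed in this Board's Required Fields are among those disabled by default. Both assumptions should be confirmed against the RabbitMQ metrics documentation. The function name `metrics_overrides_yaml` is hypothetical.

```python
# Node-level metric names taken from this Board's Required Fields column.
# Whether each one is disabled by default is an assumption to verify.
NODE_METRICS = [
    "rabbitmq.node.channel.closed",
    "rabbitmq.node.channel.created",
    "rabbitmq.node.connection.closed",
    "rabbitmq.node.connection.created",
    "rabbitmq.node.fd.total",
    "rabbitmq.node.fd.used",
    "rabbitmq.node.mem.limit",
]


def metrics_overrides_yaml(metric_names, indent=4):
    """Render a `metrics:` block enabling each named metric.

    `indent` is the number of spaces before `metrics:`, chosen so the block
    can sit under `receivers: / rabbitmq:` in a Collector configuration.
    """
    pad = " " * indent
    lines = [f"{pad}metrics:"]
    for name in sorted(set(metric_names)):
        lines.append(f"{pad}  {name}:")
        lines.append(f"{pad}    enabled: true")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print("receivers:")
    print("  rabbitmq:")
    print(metrics_overrides_yaml(NODE_METRICS), end="")
```

The printed fragment omits connection settings such as the endpoint and credentials, which the receiver also needs.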
| Query Name | Description | Required Fields |
|---|---|---|
| Message Stats | Visualizes the number of messages published against the number of current messages on queues, per node. | host.name, rabbitmq.message.current, rabbitmq.message.published, rabbitmq.node.name, rabbitmq.vhost.name |
| Connectivity Profile | Visualizes the number of channels created against the number of channels closed, per node. Helpful for identifying channel leaks, potential resource exhaustion, or other connectivity issues. | host.name, rabbitmq.node.channel.closed, rabbitmq.node.channel.created, rabbitmq.node.connection.closed, rabbitmq.node.connection.created, rabbitmq.node.name |
| File Descriptor Utilization | Visualizes File Descriptors (FDs). Useful for identifying resource limitations. | host.name, rabbitmq.node.fd.total, rabbitmq.node.fd.used, rabbitmq.node.name |
| Consumer Activity | Visualizes the number of consumers attached to each queue. | host.name, rabbitmq.consumer.count, rabbitmq.node.name, rabbitmq.queue.name |
| System Resource Pressure | Visualizes core metrics for system resources, including memory and file descriptor utilization. | host.name, rabbitmq.node.fd.total, rabbitmq.node.fd.used, rabbitmq.node.mem.limit, rabbitmq.node.name |
| Queue Health | Visualizes queue lengths and counts to catch congestion. | host.name, rabbitmq.message.current, rabbitmq.node.name, rabbitmq.queue.name |
My Services
The My Services template provides Application Performance Monitoring (APM) metrics for a variety of services and frameworks.
This template relies solely on semantic conventions and traces to provide a general overview of APM for HTTP-driven services.
It should work with a variety of frameworks, languages, and runtimes.
To learn how to generate telemetry data for this Board, visit the Honeycomb OpenTelemetry documentation.
| Query Name | Description | Required Fields |
|---|---|---|
| Total Requests | Visualizes the number of requests for a service. | |
| Request Distribution | Visualizes the number of requests by status code. | http.route, http.status_code, service.name |
| P95 Request Latency | Visualizes latency excluding the slowest 5% of responses. | duration_ms, http.route, service.name |
| Average Latency | Visualizes the average latency per endpoint. | duration_ms, http.route, service.name |
| Error Trend | Visualizes the number of errors by route and status code. | error, http.route, http.status_code, service.name |
| Successful Response Counts - 2xx | Visualizes all requests in the 2xx range. | http.status_code, service.name |
| Client Error Response Counts - 4xx | Visualizes 4xx HTTP status codes. | http.status_code, service.name |
| Server Error Response Counts - 5xx | Visualizes HTTP responses in the 5xx range. | http.status_code, service.name |
| Errors | Visualizes the number of errors emitted over the selected time frame. | error, http.route, http.status_code, http.status_text, service.name |
| Redirection Response Count - 3xx | Visualizes HTTP status codes in the 3xx range. | http.status_code, service.name |
| Duration | Visualizes request duration in a heatmap. | |
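The P95 Request Latency and Average Latency queries answer different questions, and a small worked example may make the difference concrete. The Python sketch below uses a nearest-rank percentile, which is one common definition and not necessarily the method Honeycomb uses. The sample `duration_ms` values are invented for illustration.

```python
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Smallest value with at least pct percent of values at or below it.

    `pct` must be an integer from 1 to 100.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Twenty hypothetical request durations with a single slow outlier.
    durations_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13,
                    12, 11, 15, 14, 13, 16, 12, 14, 13, 950]
    print("P95:", nearest_rank_percentile(durations_ms, 95))  # 16
    print("Average:", mean(durations_ms))                     # 60.25
```

In this sample the single 950 ms request pulls the average to 60.25 ms while the P95 stays at 16 ms, which is why the two queries are useful side by side.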
Data Stores
MySQL Operations
The MySQL Board Template provides insights into MySQL database operations, including thread count by type, query rate, resource usage, and row/table locks.
This template relies on the MySQL metrics receiver provided by the OpenTelemetry Collector Contrib distribution.
To learn how to set up this receiver, visit the MySQL receiver documentation in the OpenTelemetry Collector Contrib repo.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Server Status | Shows server uptime. Use to track server restarts. | mysql.uptime, mysql.instance.endpoint |
| Buffer Pool Pages | Shows the number of pages in the InnoDB buffer pool by type. Use to understand buffer pool utilization. | mysql.instance.endpoint, kind, mysql.buffer_pool.pages |
| Buffer Pool Data Pages | Shows the number of data pages in the InnoDB buffer pool by status (clean or dirty). Use to track page writes to disk. | mysql.buffer_pool.data_pages, mysql.instance.endpoint, status |
| Buffer Pool Page Flushes | Shows the rate of page flush requests from the InnoDB buffer pool. Use to help identify input/output pressure. | mysql.instance.endpoint, mysql.buffer_pool.page_flushes |
| Buffer Pool Operations | Shows buffer pool operations by type. Use to identify patterns in buffer pool usage. | mysql.instance.endpoint, operation, mysql.buffer_pool.operations |
| Row and Page Operations | Shows the rate of InnoDB row and page operations. Use to provide insight into database workload and input/output patterns. | mysql.row_operations, mysql.page_operations, mysql.instance.endpoint, operation |
| Doublewrite Rate | Shows the rate of writes to the InnoDB doublewrite buffer. Use to understand database durability. | kind, mysql.double_writes, mysql.instance.endpoint |
| Handler Requests and Thread Status | Shows the rate of requests to various handlers and the state of system threads. Provides insight into how the database is processing queries and allows monitoring of connection usage and thread efficiency. | mysql.handlers, mysql.threads, mysql.instance.endpoint, kind |
| Row and Table Locks | Shows InnoDB lock statistics and MySQL table locks. Use to help identify lock contention. | mysql.row_locks, mysql.instance.endpoint, kind, mysql.locks |
| Resource Usage | Shows the rate of opened resources and temporary resources. Use to help identify resource utilization and the usage of temporary tables or files. | mysql.tmp_resources, mysql.instance.endpoint, resource, mysql.opened_resources |
| Query Rate | Shows query throughput and slow query rates across MySQL instances. Use to pinpoint instances with the highest query load. | mysql.query.count, mysql.query.slow.count, mysql.instance.endpoint |
| Thread Count by Type | Shows thread count by type. Use to see which operations are currently being performed by the threads executing within the server. | kind, mysql.threads, mysql.instance.endpoint |
| Table Open Cache Efficiency | Shows table open cache efficiency. Use to monitor filesystem input/output within the instances. | mysql.table_open_cache, mysql.instance.endpoint, status |
Redis
The Redis Board Template provides insights into Redis primary and replica nodes, including command activity, latency/volume and execution time, expired keys, and CPU consumption.
This template uses the Redis receiver provided by the OpenTelemetry Collector Contrib distribution.
To learn how to set up this receiver, visit the Redis receiver documentation in the OpenTelemetry Collector Contrib repo.
The Redis receiver does not automatically publish some key server attributes, like address or port.
The visualizations on this Board Template use server address to ensure that visualizing across multiple Redis instances is possible.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Cache Connections | Shows connections received and rejected per server. Use to diagnose connectivity issues. | redis.connections.received, redis.connections.rejected, server.address |
| Uptime | Shows the number of seconds since server start, by server. | server.address, redis.uptime |
| Server Durability | Shows the number of write operations that have happened since the last successful RDB snapshot. Use to track durability issues per server. | redis.rdb.changes_since_last_save, server.address |
| Key Count | Shows the number of keys per database and per server. | redis.db.keys, server.address, db |
| Server CPU Time | Shows the CPU consumed by the Redis server since server start. | server.address, redis.cpu.time |
| Client Activity | Shows Redis client activity per server address and activity between connected and blocked clients. | redis.clients.connected, redis.clients.blocked, server.address, redis.version |
| Command Activity | Shows the number of commands processed per second and the number of commands processed by the server. Use to track operational load of servers. | redis.commands.processed, redis.commands, server.address |
| Client I/O | Shows the input/output buffers of Redis clients by server. Use to diagnose or troubleshoot input/output issues with clients. | redis.clients.max_input_buffer, redis.clients.max_output_buffer, server.address |
| Network Activity | Shows network input/output by server. | redis.net.output, server.address, redis.net.input |
| P99 Command Latency | Shows the P99 of command latency. Use to identify anomalous commands. | redis.cmd.latency, cmd, server.address, percentile |
| Command Volume and Execution Time | Shows the number of calls for a command and the total time for all executions of a command per server. | redis.cmd.calls, redis.cmd.usec, server.address, cmd |
| Average Command Latency | Shows the average latency of commands by server. Use to understand the baseline latency of a command. | percentile, redis.cmd.latency, server.address, cmd |
| Expired Keys | Shows the total number of key expiration events per server. | redis.keys.expired, server.address |
| Keyspace Hits and Misses | Shows the number of successful and failed key lookups per server. | redis.keyspace.hits, redis.keyspace.misses, server.address |
| Memory Profile | Shows memory metrics per server. | redis.memory.peak, redis.memory.fragmentation_ratio, redis.memory.rss, redis.memory.lua, server.address, redis.memory.used |
| Primary Replication | Shows the replication offsets per server. | redis.replication.offset, redis.replication.backlog_first_byte_offset, server.address, redis.slaves.connected |
| Follower Replication | Shows the replication offset for follower instances. | redis.replication.replica_offset, server.address, redis.slaves.connected |
Postgres
The Postgres Board Template provides insight into Postgres’s operations, including active connections, database size, table count, and transaction throughput.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Active Connections | Shows the current number of active connections. | `host.name`, `postgresql.backends`, `postgresql.connection.max` |
| Database Size | Shows the database size over time. Use to help with capacity planning and identifying unexpected growth patterns. | `postgresql.db_size`, `postgresql.database.name`, `host.name` |
| Database and Table Count | Shows the number of databases and tables, which can help identify database sprawl. | `postgresql.table.count`, `postgresql.database.name`, `host.name`, `postgresql.database.count` |
| Transaction Throughput | Shows the rate of commits and rollbacks per database, which provides insight into transaction throughput and success rates. | `postgresql.commits`, `postgresql.rollbacks`, `postgresql.database.name`, `host.name` |
| Block Read Performance | Shows the sources of block reads and their rates. Use to diagnose input/output performance issues. | `postgresql.blocks_read`, `source`, `postgresql.database.name`, `postgresql.table.name`, `host.name` |
| Index Usage | Shows the rate of index scans. Use to identify frequently used indexes. | `postgresql.index.name`, `host.name`, `postgresql.index.scans`, `postgresql.table.name` |
| Database Operations | Shows database operations. Use to provide insight into workload patterns. | `postgresql.operations`, `operation`, `postgresql.table.name`, `postgresql.database.name`, `host.name` |
| Background Writer Activity | Shows buffer writes by source. Use to identify potential input/output bottlenecks. | `source`, `host.name`, `postgresql.bgwriter.buffers.writes` |
| Checkpoint Frequency | Shows the rate of checkpoints by type (requested versus scheduled), which can help identify if checkpoints are occurring too frequently. | `host.name`, `postgresql.bgwriter.checkpoint.count`, `type` |
| Checkpoint Duration | Shows time spent on checkpoint operations across databases and tables. Longer checkpoint durations can negatively impact database performance. | `postgresql.bgwriter.duration`, `host.name`, `type` |
| Table Size | Shows the top 10 largest tables, which may identify tables that require optimization or partitioning. | `postgresql.table.size`, `postgresql.table.name` |
| Index Size | Shows the top 10 largest indexes, which may identify indexes that need rebuilding or optimization. | `postgresql.database.name`, `postgresql.table.name`, `host.name`, `postgresql.index.size`, `postgresql.index.name` |
| Cache Hit Ratio | Shows the sum of block reads satisfied from the buffer cache. A higher number indicates better performance. | `postgresql.blocks_read`, `postgresql.database.name`, `postgresql.table.name`, `host.name`, `source` |
| Replication WAL Delay | Shows the time between flushing recent WAL and receiving notification that standby servers have completed operations on it. Use to track replication delays. | `host.name`, `postgresql.wal.delay`, `replication_client` |
| Replication Data Delay | Shows the amount of data delayed in replication, which can help identify network or performance issues affecting replication. | `postgresql.replication.data_delay`, `replication_client`, `host.name` |
| Database Locks by Type | Shows the maximum number of database locks per type. Use for situations where multiple concurrent transactions may cause resource contention. | `host.name`, `postgresql.database.locks`, `mode`, `lock_type` |
| Postgres Memory Utilization | Shows memory usage and amount of committed memory for Postgres processes. Use to identify inefficient processes. | `process.memory.usage`, `process.memory.virtual`, `process.command`, `process.executable.name`, `host.name` |
| Postgres CPU Utilization Trends | Shows CPU utilization for PostgreSQL processes. Use to identify inefficient queries, excessive index scanning, and so on. | `process.cpu.time`, `process.command`, `host.name` |
| Number of Postgres Operations | Shows the number of PostgreSQL operations per database and table name. | `postgresql.table.name`, `operation`, `host.name`, `postgresql.operations`, `postgresql.database.name` |
MongoDB
The MongoDB template contains metrics-driven visualizations for monitoring MongoDB nodes.
This Board leverages metrics collected via the MongoDB receiver provided by the OpenTelemetry Collector Contrib distribution.
The MongoDB receiver enables observability into key performance, resource utilization, and replication metrics for MongoDB clusters and nodes.
To configure this receiver, visit the MongoDB receiver documentation in the OpenTelemetry Collector Contrib repo.
| Query Name | Description | Required Fields |
|---|---|---|
| Server Health | Shows health status by server. | `mongod.status`, `mongodb.server.name` |
| Count of Active Connections | Shows current active connections. Useful for identifying leaks, connection saturation, or performance bottlenecks. | `mongodb.connections.current`, `mongodb.server.name`, `host.name` |
| Available Connections | Visualizes the number of available connections. | `mongodb.connections.available`, `mongodb.server.name`, `host.name` |
| Network I/O | Visualizes bytes received and transmitted per server. | `mongodb.network.bytes.in`, `mongodb.network.bytes.out`, `mongodb.server.name` |
| Database Count | Visualizes the number of databases per host. | `mongodb.database.count`, `host.name` |
| Collections | Shows the number of collections per server and database. | `mongodb.collection.count`, `mongodb.database.name`, `host.name` |
| Cache Hit Ratio | Displays cache hits and misses. | `mongodb.cache.hits`, `mongodb.cache.misses`, `host.name` |
| Document Operations | Visualizes document operations by server and database. | `mongodb.document.operations.rate`, `mongodb.server.name`, `host.name` |
| Memory Usage by Type | Profiles memory usage by type. Useful for identifying low query performance or high latency due to memory utilization. High read usage coupled with low write usage generally indicates a healthy memory profile. | `mongodb.memory.usage`, `mongodb.server.name`, `host.name` |
| Index Utilization | Tracks how indexes are being accessed across different collections. | `mongodb.index.accesses`, `mongodb.server.name`, `host.name` |
| Read Write Operations | Shows the number of reads and writes currently being processed. | `mongodb.operations.reads`, `mongodb.operations.writes`, `mongodb.server.name` |
| Database Locks | Visualizes locks and lock types per database. | `mongodb.locks`, `mongodb.lock.time_ms`, `mongodb.database.name`, `host.name` |
| Activity Overview | Visualizes DB activity by server. Useful for identifying read vs. load and throughput. | `mongodb.commands.count`, `mongodb.server.name`, `mongodb.operations.inserts`, `mongodb.operations.updates`, `mongodb.operations.deletes`, `host.name` |
| Replication Overview | Visualizes replication operations per server. | `mongodb.replication.oplog.insert.count`, `mongodb.replication.oplog.update.count`, `mongodb.replication.oplog.delete.count`, `mongodb.server.name` |
SQL Server
The SQL Server Board Template contains useful metrics for monitoring SQL Server database operations.
This template leverages metrics gathered primarily by the SQL Server Receiver provided by the opentelemetry-collector-contrib distribution.
The SQL Server receiver provides insights into query execution, connection states, memory usage, and throughput across SQL Server instances and databases.
To learn how to set up this receiver, visit the SQL Server Receiver documentation in the OpenTelemetry Collector Contrib repo.
| Query Name | Description | Required Fields |
|---|---|---|
| Batch Requests Rate | Shows total request rate (per second). Useful for diagnosing busy or idle instances. | `host.name`, `sqlserver.batch_requests.rate` |
| Lock Await Rate | Shows the total rate of lock requests resulting in a wait. | `host.name`, `sqlserver.locks.await.rate` |
| Buffer Efficiency | Shows buffer efficiency of cache lookups without having to read from disk. Drops in this value indicate inefficient queries. | `host.name`, `sqlserver.buffer.page.lookups.rate`, `sqlserver.buffer.page.reads.rate` |
| Query Plan Activity Rate | Visualizes the rate at which SQL Server generates new query execution plans when other existing plans are discarded and regenerated. | `host.name`, `sqlserver.batch.sql_compilations.rate`, `sqlserver.batch.sql_recompilations.rate` |
| Read I/O Throughput | Visualizes the total I/O throughput per host. Useful for identifying I/O operations or slow queries. | `host.name`, `sqlserver.io.read.bytes`, `sqlserver.io.read.operations.rate` |
| Active Connections | Visualizes active user connections. | `host.name`, `sqlserver.connection.count` |
| Memory Utilization | Visualizes the average amount of memory utilized per host, in KiB. | `host.name`, `sqlserver.memory.usage` |
| Table Count | Visualizes the number of tables per host. | `host.name`, `sqlserver.database.name`, `sqlserver.table.count` |
| Database Latency | Visualizes the rate of wait counts across all waits resulting in I/O. | `host.name`, `sqlserver.io.wait.count`, `sqlserver.io.wait.time` |
| Execution Errors | Visualizes the number of execution errors. | `host.name`, `sqlserver.errors.count` |
| Rollback Rate | Visualizes the number of rollbacks. | `host.name`, `sqlserver.transactions.rollback.rate` |
| CPU and I/O Duration | Visualizes CPU activity by category and queries. | `host.name`, `sqlserver.performance.cpu`, `sqlserver.performance.io` |
| Server Operations | Visualizes the number of operations issued. | `host.name`, `sqlserver.batch.transactions.rate` |
| Database By Status | Visualizes the status of databases per host. Useful for quickly diagnosing unexpected database states. | `host.name`, `sqlserver.database.state`, `sqlserver.database.name` |
| Top Current Queries | Visualizes the most recent queries on the host. | `host.name`, `sqlserver.query.text` |
Frontend Investigation
Real User Monitoring (RUM)
The RUM Board Template provides an overview of real user monitoring data from your frontend applications.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Largest Contentful Paint (LCP) | Shows ratings based on the render time for the largest content on a page. | |
| Cumulative Layout Shift (CLS) | Shows ratings based on the stability of content layout on a page. | |
| Interaction to Next Paint (INP) | Shows ratings based on the responsiveness of a page. | |
| LCP P75 | Shows the 75th percentile for LCP. | |
| CLS P75 | Shows the 75th percentile for CLS. | |
| INP P75 | Shows the 75th percentile for INP. | |
| Total Events by Type | Shows event types ranked by occurrence. | |
| Largest Resource Requests | Shows the largest resource requests ranked by the average length of their response content. | `http.response_content_length`, `http.url`, `name` |
| Top 5 Endpoints by Request Count | Shows the top 5 endpoints ranked by number of requests. | |
| Slowest Requests by Endpoint | Shows the slowest endpoints based on the 75th percentile of request durations. | |
| Top Landing Pages by Session Count | Shows the most visited landing pages ranked by session count. | |
| Pages With the Most Events | Shows pages with the highest number of events, highlighting the most active pages. | |
Android Auto-Instrumentation
The Android Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Android SDK auto-instrumentation.
This template relies on your source data fields being mapped to Honeycomb standard fields.
To learn how to map your fields, visit Dataset Definitions.
To learn more about instrumenting your frontend application, visit Send Android Data to Honeycomb.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Average App Startup Times | Average time the application took to start up. Grouped into cold, warm, and hot startups. | `duration_ms`, `name`, `start.type` |
| Total Startup Times Over 1.5s | Number of instances where any startup time surpassed the threshold of 1.5 seconds. | `duration_ms`, `name`, `start.type` |
| App’s Memory and Heap Usage | Statistics about the application’s memory and heap usage. | |
| Average Network Request Time per Screen | Average duration for a screen’s requests to successfully retrieve data. | `duration_ms`, `http.request.method`, `http.response.status_code`, `screen.name` |
| Screens with the Most Network Requests | Screens that have the most network activity. | `http.request.method`, `screen.name` |
| Top Screens by Total Network Request Failures | Screens with the highest number of failed network requests. | `http.response.status_code`, `screen.name` |
| Screens with Application Not Responding (ANR) Errors | Number of instances where the application is unresponsive for more than 5 seconds. | `exception.stacktrace`, `name`, `screen.name` |
| Screens with Slow/Frozen Renders | Screens that take more than 16ms (slow) or more than 700ms (frozen) to render. | |
| Top App Crashes & Errors | Total number of times the application crashed, excluding ANR events. | `exception.message`, `exception.stacktrace`, `exception.type`, `name` |
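The slow and frozen render thresholds quoted in the table (more than 16ms and more than 700ms) can be read as a simple classification rule. The following Python sketch illustrates that rule only; the function name, constant names, and the `normal` label are assumptions for this example and are not part of the Android SDK or the template.

```python
# Thresholds taken from the table: >16 ms is slow, >700 ms is frozen.
SLOW_MS = 16
FROZEN_MS = 700


def classify_render(duration_ms: float) -> str:
    """Label a render duration as 'frozen', 'slow', or 'normal' (hypothetical labels)."""
    if duration_ms > FROZEN_MS:
        return "frozen"
    if duration_ms > SLOW_MS:
        return "slow"
    return "normal"


# Hypothetical render durations in milliseconds.
for sample in (12, 40, 950):
    print(sample, classify_render(sample))
```

Checking the frozen threshold first keeps the two ranges from overlapping, so a render over 700ms is counted once as frozen instead of also as slow.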
iOS Auto-Instrumentation
The iOS Auto-Instrumentation Board Template provides an overview of the Honeycomb OpenTelemetry Swift SDK auto-instrumentation.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Monthly Active Users | Total number of distinct users that have used the application in the past month. | |
| Weekly Active Users | Total number of distinct users that have used the application in the past week. | |
| Daily Active Users | Total number of distinct users that have used the application in the past day. | |
| Average App Startup Times | Average time the application took to start up. Grouped into cold, warm, and hot startups. | `metrickit.app_launch.app_resume_time_average`, `metrickit.app_launch.optimized_time_to_first_draw_average`, `metrickit.app_launch.time_to_first_draw_average`, `name` |
| Total Startup Times Over 1.5s | Total number of instances where any startup time surpassed the threshold of 1.5 seconds. | `metrickit.app_launch.app_resume_time_average`, `metrickit.app_launch.optimized_time_to_first_draw_average`, `metrickit.app_launch.time_to_first_draw_average`, `name` |
| Abnormal App Exit Ratio | Ratio between abnormal application exits (foreground and background) and total application exits. | `DIV(SUM($metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count), SUM($metrickit.app_exit.background.normal_app_exit_count, $metrickit.app_exit.foreground.normal_app_exit_count, $metrickit.app_exit.background.abnormal_exit_count, $metrickit.app_exit.foreground.abnormal_exit_count))` |
| Average App Performance Across All Devices | Average statistics for the resources the application uses. | `metrickit.cpu.cpu_time`, `metrickit.gpu.time`, `metrickit.memory.peak_memory_usage`, `metrickit.memory.suspended_memory_average`, `name` |
| Average Network Request Time per Screen | Average duration for all the app’s screens to successfully retrieve data. | `duration_ms`, `http.request.method`, `http.response.status_code`, `screen.name` |
| Screens with the Most Network Requests | Screens that have the most network requests. | `http.request.method`, `screen.name` |
| Top Screens by Total Network Request Failures | Top screens that have failing network requests. | `http.response.status_code`, `screen.name` |
| Long Hanging Screens | Screens that are hanging for more than 0.5 seconds. | `metrickit.app_responsiveness.hang_time_average`, `name`, `screen.name` |
| Average Screen Hang Times | Length of time each screen hangs on average. | `metrickit.app_responsiveness.hang_time_average`, `name`, `screen.name` |
| Most Used OS Versions | Operating systems used by the most users. | |
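The Abnormal App Exit Ratio expression in the table divides the abnormal exits by all exits, normal and abnormal. As a rough illustration of that arithmetic outside Honeycomb, the Python sketch below computes the same ratio from four counts. The function name, the argument names, and the choice to return `None` when there are no exits at all are assumptions made for this example.

```python
from typing import Optional


def abnormal_exit_ratio(
    background_abnormal: int,
    foreground_abnormal: int,
    background_normal: int,
    foreground_normal: int,
) -> Optional[float]:
    """Mirror DIV(SUM(abnormal exits), SUM(normal and abnormal exits))."""
    abnormal = background_abnormal + foreground_abnormal
    total = background_normal + foreground_normal + abnormal
    if total == 0:
        # Assumption for this sketch: no exits means there is no ratio to report.
        return None
    return abnormal / total


# Hypothetical counts: 2 abnormal exits out of 20 exits in total.
print(abnormal_exit_ratio(1, 1, 10, 8))
```

A value close to 0 means almost every exit was normal, while a value close to 1 means almost every exit was abnormal.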
Kubernetes
Kubernetes Pod Metrics
The Kubernetes Pod Metrics Board Template includes queries that help you investigate pod performance and resource usage within Kubernetes clusters:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Pod CPU Usage | Shows the amount of CPU used by each pod in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | `k8s.pod.cpu.utilization`, `k8s.pod.name` |
| Pod Memory Usage | Shows the amount of memory being used by each Kubernetes pod. | `k8s.pod.memory.usage`, `k8s.pod.name` |
| Pod Uptime Smokestacks | Because pod uptime always increases, this query uses the smokestack method, which applies LOG10 to the Pod Uptime metric. Newly started or restarted pods stand out more than pods that have been running a long time, which eventually flatten into a straight line. | `LOG10($k8s.pod.uptime)`, `k8s.pod.name`, `k8s.pod.uptime` |
| Unhealthy Pods | Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved, so the presence of a count isn’t necessarily bad. | `k8s.namespace.name`, `k8s.pod.name`, `reason` |
| Pod CPU Utilization vs. Limit | When a CPU Limit is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that limit. | `k8s.pod.cpu_limit_utilization`, `k8s.pod.name` |
| Pod CPU Utilization vs. Request | When a CPU Request is present in a pod configuration, this query shows how much CPU each pod uses as a percentage of that request value. | `k8s.pod.cpu_request_utilization`, `k8s.pod.name` |
| Pod Memory Utilization vs. Limit | When a Memory Limit is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that limit value. | `k8s.pod.memory_limit_utilization`, `k8s.pod.name` |
| Pod Memory Utilization vs. Request | When a Memory Request is present in a pod configuration, this query shows how much memory each pod uses as a percentage of that request value. | `k8s.pod.memory_request_utilization`, `k8s.pod.name` |
| Pod Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic (in bytes) as a stacked graph, and gives the overall network rate and the individual rate for each pod. | `k8s.pod.name`, `k8s.pod.network.io.receive`, `k8s.pod.network.io.transmit` |
| Pods With Low Filesystem Availability | Shows any pods where filesystem availability is below 5 GB. | `k8s.pod.filesystem.available`, `k8s.pod.name` |
| Pod Filesystem Usage | Shows the amount of filesystem usage per Kubernetes pod, displayed in a stacked graph to show total filesystem usage of all pods. | `k8s.pod.filesystem.usage`, `k8s.pod.name` |
| Pods Per Namespace | Shows the number of pods currently running in each Kubernetes namespace. | `k8s.namespace.name`, `k8s.pod.name` |
| Pods Per Node | Shows the number of pods currently running in each Kubernetes Node. | `k8s.node.name`, `k8s.pod.name` |
| Pod Network Errors | Shows network errors in receive and transmit, grouped by pod. | `k8s.pod.name`, `k8s.pod.network.errors.receive`, `k8s.pod.network.errors.transmit` |
| Pods Per Deployment | Shows the number of pods currently deployed in different Kubernetes deployments. | `k8s.deployment.name`, `k8s.pod.name` |
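To see why the smokestack method makes restarts stand out, it can help to apply a base-10 logarithm to a few uptimes by hand. The Python sketch below does that for some hypothetical uptime values; it assumes uptime is reported in seconds, and the `smokestack` function name is made up for this illustration.

```python
import math


def smokestack(uptime_seconds: float) -> float:
    """Apply LOG10 to an uptime value, as the smokestack queries do."""
    return math.log10(uptime_seconds)


# Hypothetical uptimes, assumed to be in seconds.
samples = {
    "1 minute": 60,
    "1 hour": 3_600,
    "1 day": 86_400,
    "2 days": 172_800,
}

for label, seconds in samples.items():
    print(f"{label:>8}: {smokestack(seconds):.2f}")

# The first hour moves the curve far more than the whole second day does.
early_rise = smokestack(3_600) - smokestack(60)
late_rise = smokestack(172_800) - smokestack(86_400)
print(early_rise > late_rise)
```

Because the logarithm compresses large values, a long-running pod drifts toward a nearly flat line, and a restart shows up as a sharp drop followed by a steep climb.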
Kubernetes Node Metrics
The Kubernetes Node Metrics Board Template includes queries that help you investigate node performance and resource usage within Kubernetes clusters:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Node CPU Usage | Shows the amount of CPU used on each node in the cluster. CPU is reported as the average core usage measured in cpu units. One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers, and 1 hyper-thread on bare-metal Intel processors. | `k8s.node.cpu.utilization`, `k8s.node.name` |
| Node Memory Utilization | Shows percent of memory used on each Kubernetes node. | `IF(EXISTS($k8s.node.memory.available), MUL(DIV($k8s.node.memory.working_set, $k8s.node.memory.available), 100))`, `k8s.node.memory.available`, `k8s.node.memory.usage`, `k8s.node.name` |
| Node Network IO Rates | Displays Network IO RATE_MAX for Transmit and Receive network traffic as a stacked graph, and gives overall network rate and the individual rate for each node. | `k8s.node.name`, `k8s.node.network.io.receive`, `k8s.node.network.io.transmit` |
| Unhealthy Nodes | Shows errors that Kubernetes nodes are experiencing. | `k8s.namespace.name`, `k8s.node.name`, `reason`, `severity_text` |
| Node Filesystem Utilization | Shows percent of filesystem used on each node. | `IF(EXISTS($k8s.node.filesystem.usage),MUL(DIV($k8s.node.filesystem.usage,$k8s.node.filesystem.capacity), 100))`, `k8s.node.filesystem.capacity`, `k8s.node.filesystem.usage`, `k8s.node.name` |
| Node Uptime Smokestack | Because node uptime always increases, this query uses the smokestack method, which applies LOG10 to the Node Uptime metric. Newly started or restarted nodes stand out more than nodes that have been running a long time, which eventually flatten into a straight line. | `LOG10($k8s.node.uptime)`, `k8s.node.name`, `k8s.node.uptime` |
| Node Network Errors | Shows network transmit and receive errors for each node. | `k8s.node.name`, `k8s.node.network.errors.receive`, `k8s.node.network.errors.transmit` |
| Pods and Containers per Node | Shows the number of pods and the number of containers per node as stacked graphs, and also shows total number of pods and containers across the environment. | `k8s.container.name`, `k8s.node.name`, `k8s.pod.name` |
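The utilization expressions in the table follow one pattern: if the metric exists, divide one value by another and multiply by 100. The Python sketch below illustrates that pattern with the filesystem fields. The helper name, the sample events, and the decision to return `None` for a zero capacity are assumptions for this example, not Honeycomb behavior.

```python
from typing import Optional


def utilization_percent(used: Optional[float], total: Optional[float]) -> Optional[float]:
    """Return used/total as a percentage, or None when an input is missing."""
    # Mirrors IF(EXISTS(...), MUL(DIV(used, total), 100)): no value without inputs.
    if used is None or total is None or total == 0:
        return None  # assumption: a zero total is treated as "no value"
    return used / total * 100


# Hypothetical node events; the keys reuse the field names from the table.
nodes = [
    {
        "k8s.node.name": "node-a",
        "k8s.node.filesystem.usage": 40e9,
        "k8s.node.filesystem.capacity": 100e9,
    },
    {
        "k8s.node.name": "node-b",
        "k8s.node.filesystem.capacity": 100e9,  # usage was not reported
    },
]

for node in nodes:
    percent = utilization_percent(
        node.get("k8s.node.filesystem.usage"),
        node.get("k8s.node.filesystem.capacity"),
    )
    print(node["k8s.node.name"], percent)
```

The existence check matters because events that lack the metric would otherwise contribute misleading values to the graph.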
Kubernetes Workload Health
The Kubernetes Workload Health Board Template includes queries that help you diagnose Kubernetes-related application issues:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Container Restarts | Shows the total number of restarts per pod, and the rate of restarts of pods where the restart count is greater than zero. | `k8s.container.name`, `k8s.container.restarts`, `k8s.namespace.name`, `k8s.pod.name` |
| Unhealthy Pods | Shows trouble that pods may be experiencing during their operating lifecycle. Many of these events are present during start-up and get resolved, so the presence of a count isn’t necessarily bad. | `k8s.namespace.name`, `k8s.pod.name`, `reason` |
| Pending Pods | Shows pods in a “Pending” state. | `k8s.pod.name`, `k8s.pod.phase` |
| Failed Pods | Shows pods in a “Failed” or “Unknown” state. | `k8s.pod.name`, `k8s.pod.phase` |
| Unhealthy Nodes | Shows errors that Kubernetes nodes are experiencing. | `k8s.namespace.name`, `k8s.pod.name`, `reason`, `severity_text` |
| Unhealthy Volumes | Shows volume creation and attachment failures. | `k8s.namespace.name`, `k8s.pod.name`, `reason`, `severity_text` |
| Unscheduled Daemonset Pods | Tracks cases where a pod in a daemonset is not currently running on every node in the cluster as it should be. | `SUB($k8s.daemonset.desired_scheduled_nodes, $k8s.daemonset.current_scheduled_nodes)`, `k8s.daemonset.current_scheduled_nodes`, `k8s.daemonset.desired_scheduled_nodes`, `k8s.daemonset.name`, `k8s.namespace.name` |
| Stateful Set Pod Readiness | Tracks stateful sets where pods that should be in a ready state are in a non-ready state. | `SUB($k8s.statefulset.desired_pods,$k8s.statefulset.ready_pods)`, `k8s.statefulset.desired_pods`, `k8s.statefulset.name`, `k8s.statefulset.ready_pods` |
| Deployment Pod Status | Shows Deployments where Pods have not fully deployed. Numbers greater than zero show pods in a deployment that are not yet “ready”. | `SUB($k8s.deployment.desired,$k8s.deployment.available)`, `k8s.deployment.available`, `k8s.deployment.desired`, `k8s.deployment.name` |
| Job Failures | Tracks the number of failed pods in Kubernetes jobs. | `k8s.job.failed_pods`, `k8s.job.name` |
| Active Cron Jobs | Tracks the number of active pods in each Kubernetes cron job. | `k8s.cronjob.active_jobs`, `k8s.cronjob.name` |
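The daemonset, stateful set, and deployment queries all subtract an actual count from a desired count, so any result above zero points at pods that are missing or not ready. The Python sketch below shows that subtraction on some hypothetical deployments; the `shortfall` helper and the sample data are assumptions for illustration.

```python
def shortfall(desired: int, actual: int) -> int:
    """Mirror SUB(desired, actual): how many pods are missing or not ready."""
    return desired - actual


# Hypothetical deployments with desired and available pod counts.
deployments = [
    {"name": "web", "desired": 5, "available": 5},
    {"name": "worker", "desired": 4, "available": 1},
]

# Keep only the deployments that have not fully rolled out.
lagging = {}
for deployment in deployments:
    missing = shortfall(deployment["desired"], deployment["available"])
    if missing > 0:
        lagging[deployment["name"]] = missing

print(lagging)
```

A fully rolled-out workload yields zero, which is why the Deployment Pod Status query treats numbers greater than zero as the signal to investigate.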
OpenTelemetry
OpenTelemetry Collector Operations
The OpenTelemetry Collector Operations Board Template includes queries with key metrics emitted by the OpenTelemetry Collector during its operation:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Exporter Span Failures | Shows when errors happen during enqueueing or sending in exporters. | `net.host.name`, `otelcol_exporter_enqueue_failed_spans`, `otelcol_exporter_send_failed_spans` |
| Collector Uptime Smokestacks | Shows the uptime for different pods with a LOG10 to make it clearer where restarts are happening. | `LOG10($otelcol_process_uptime)`, `net.host.name`, `otelcol_process_uptime` |
| Exporter Metric Send Failures | Shows when errors happen during sending from exporters. | `net.host.name`, `otelcol_exporter_enqueue_failed_metric_points`, `otelcol_exporter_send_failed_metric_points` |
| Exporter Metrics Enqueue Failures | Shows when errors happen during enqueueing in exporters. | `net.host.name`, `otelcol_exporter_send_failed_metric_points` |
| Exporter Log Records Failures | Shows when errors happen during enqueueing or sending in exporters. | `net.host.name`, `otelcol_exporter_enqueue_failed_log_records` |
OpenTelemetry Java Metrics
The OpenTelemetry Java Metrics Board Template includes queries that help you investigate application issues related to the Java Virtual Machine (JVM).
Metrics for Java applications are sourced from the JVM and reported by the OpenTelemetry Java Agent or Honeycomb OpenTelemetry Distribution for Java.
| Query Name | Query Description | Required Fields |
|---|---|---|
| JVM Memory Usage (Young Generation) | Shows memory usage for Eden space on the JVM heap, which is where newly created objects are stored. When it fills, a minor Garbage Collection (GC) occurs, moving all “live” objects to the Survivor space. In addition to current memory usage, committed represents the guaranteed available memory, and limit represents the maximum usable memory. | `host.name`, `pool`, `process.runtime.jvm.memory.committed`, `process.runtime.jvm.memory.limit`, `process.runtime.jvm.memory.usage`, `process.runtime.jvm.memory.usage_after_last_gc`, `service.name`, `type` |
| JVM Memory Usage (Old Generation) | Shows memory usage for tenured Gen JVM heap space, which stores long-lived objects. When a Full or Major GC is performed, it is expensive and may pause app execution. Committed represents guaranteed available memory, and limit represents maximum usable memory. | `host.name`, `pool`, `process.runtime.jvm.memory.committed`, `process.runtime.jvm.memory.limit`, `process.runtime.jvm.memory.usage`, `process.runtime.jvm.memory.usage_after_last_gc`, `service.name`, `type` |
| JVM Garbage Collection (GC) Activity | Shows JVM garbage collection activity. JVM GC actions occur periodically to reclaim memory but consume CPU cycles to do so. In the worst cases, a GC can cause the entire JVM to pause, making the application appear unresponsive. | `process.runtime.jvm.gc.duration.count`, `action`, `gc`, `host.name`, `process.runtime.jvm.gc.duration.avg`, `process.runtime.jvm.gc.duration.max`, `service.name` |
| JVM CPU Utilization | Shows system CPU utilization and 1-minute load average, as captured by the JVM. | `host.name`, `process.runtime.jvm.cpu.utilization`, `process.runtime.jvm.system.cpu.load_1m`, `service.name` |
| JVM Buffer Memory Usage | Shows usage of buffer memory, which is provided by the OS and is outside the JVM’s heap memory allocation. Buffer memory is used by Java NIO to quickly write data to network or disk. | `host.name`, `process.runtime.jvm.buffer.limit`, `process.runtime.jvm.buffer.usage`, `service.name` |
| JVM Non-Heap Memory Usage | Shows usage of JVM non-heap memory, which is allocated above and beyond the heap size you’ve configured. JVM non-heap memory is a section of memory in the JVM that stores class information (Metaspace), compiled code cache, thread stack, and so on. It cannot be garbage collected. | `host.name`, `pool`, `process.runtime.jvm.memory.committed`, `process.runtime.jvm.memory.limit`, `process.runtime.jvm.memory.usage`, `service.name`, `type` |
AWS
AWS Lambda Health
The AWS Lambda Health Board Template includes queries that monitor the health of AWS Lambda functions, including metrics for invocations, errors, throttles, and concurrency:
| Query Name | Query Description | Required Fields |
|---|---|---|
| Duration & Execution by ID/Version | Tracks the execution time of Lambda functions, identified by their ID or version. Useful for analyzing the performance and efficiency of different versions or instances of a function over time. | `duration_ms`, `faas.execution`, `faas.name`, `faas.version` |
| Lambda Invocations by Function | Shows the total number of times each Lambda function is invoked. It helps in tracking the frequency of usage of different functions, enabling a clear understanding of which functions are most or least used. | `FunctionName`, `MetricName`, `Namespace` |
| Latency by Function/Metric | Shows the response time for each Lambda function, broken down by specific metrics. Useful for identifying functions that may be experiencing performance issues due to high latency. | `FunctionName`, `MetricName`, `Namespace`, `amazonaws.com/AWS/Lambda/Duration.max`, `amazonaws.com/AWS/Lambda/PostRuntimeExtensionsDuration.max` |
| Function Error Count and Rate | Shows two key pieces of information: the total number of errors encountered by each Lambda function and the error rate, calculated as the ratio of errors to total invocations. Useful for pinpointing functions that are failing or experiencing issues. | `FunctionName`, `MetricName`, `Namespace`, `amazonaws.com/AWS/Lambda/Errors.count` |
| Lambda Throttles | Shows the instances where Lambda invocations are being throttled, such as when the number of function calls exceeds the concurrency limits. Tracking this helps in managing and optimizing the scalability settings for each function. | `FunctionName`, `MetricName`, `Namespace`, `amazonaws.com/AWS/Lambda/Throttles.count` |
| Function Concurrency | Monitors the simultaneous execution count of each Lambda function, tracking how many instances of a function are running at the same time. | `FunctionName`, `MetricName`, `Namespace`, `amazonaws.com/AWS/Lambda/ConcurrentExecutions.avg`, `amazonaws.com/AWS/Lambda/UnreservedConcurrentExecutions.avg` |
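The Function Error Count and Rate query describes the error rate as the ratio of errors to total invocations. The Python sketch below works through that ratio for a few hypothetical functions; the function names, the counts, and the choice to return `None` when a function has no invocations are all assumptions for this example.

```python
from typing import Optional


def error_rate(errors: int, invocations: int) -> Optional[float]:
    """Return errors divided by invocations, or None when nothing was invoked."""
    if invocations == 0:
        # Assumption for this sketch: no invocations means no rate to report.
        return None
    return errors / invocations


# Hypothetical per-function counts: (errors, invocations).
functions = {
    "resize-image": (5, 200),
    "send-email": (0, 1_000),
    "nightly-report": (0, 0),
}

for name, (errors, invocations) in functions.items():
    print(name, errors, error_rate(errors, invocations))
```

Showing the raw count next to the rate helps separate a busy function with a few failures from a rarely used function that fails most of the time.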
EC2 Health
The AWS EC2 Board Template includes queries that monitor the health of AWS EC2 instances, including status failures, disk Read and Write operations, and EBS operations:
| Query Name | Query Description | Required Fields |
|---|---|---|
| CPU Utilization | Shows CPU utilization per EC2 instance. | `amazonaws.com/AWS/EC2/CPUUtilization.max`, `Dimensions.InstanceId`, `cloud.account.id`, `cloud.region` |
| Network I/O | Shows network input and output per EC2 instance. | `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/EC2/NetworkIn.max`, `amazonaws.com/AWS/EC2/NetworkPacketsOut.max`, `Dimensions.InstanceId` |
| EBS Read Operations | Shows the number of read operations committed by the instance. | `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/EC2/EBSReadOps.max`, `Dimensions.InstanceId` |
| EBS Write Operations | Shows the number of write operations committed by the instance. | `amazonaws.com/AWS/EC2/EBSWriteOps.max`, `Dimensions.InstanceId`, `cloud.account.id`, `cloud.region` |
| EBS IO Balance | Shows available input and output per second that attached EBS volumes are utilizing. Use to monitor potential throttling on an EBS volume attached to an instance. | `amazonaws.com/AWS/EC2/EBSIOBalance%.max`, `Dimensions.InstanceId`, `cloud.account.id`, `cloud.region` |
| Instance Metadata Service Outliers | Shows the number of instances that are not currently using IMDSv2. Use to identify potential security issues with EC2 instances. | `amazonaws.com/AWS/EC2/MetadataNoToken.max`, `Dimensions.InstanceId`, `cloud.account.id`, `cloud.region` |
| EC2 Disk Read/Write | Shows Write and Read operations undertaken by EC2 instances. Use to monitor EBS volume usage. | `amazonaws.com/AWS/EC2/EBSWriteBytes.max`, `amazonaws.com/AWS/EC2/EBSReadBytes.max`, `Dimensions.InstanceId`, `Namespace` |
| EC2 Instance Status Failures | Shows any EC2 instances that have failed a status check in the provided time period. | `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/EC2/StatusCheckFailed.max`, `Dimensions.InstanceId` |
AWS ALB/ELB Health
The AWS ALB/ELB Board Template includes queries that monitor the Load Balancer’s health, status codes, active connections, and requests.
This template relies on metric streams provided by AWS CloudWatch.
Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams.
To use this template, you must provision a metrics stream for EC2 instances that you wish to monitor.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Request Count Per Target | Shows how requests are distributed across targets. Use to diagnose imbalanced traffic in the load balancer. | `cloud.region`, `Dimensions.AvailabilityZone`, `amazonaws.com/AWS/ApplicationELB/RequestCountPerTarget.count`, `Dimensions.LoadBalancer`, `Dimensions.TargetGroup`, `cloud.account.id` |
| Healthy vs. Unhealthy Host Count | Shows the number of healthy versus unhealthy hosts per load balancer, which is segmented across target groups and availability zones. Use to quickly spot failing load balancer targets. | `amazonaws.com/AWS/ApplicationELB/HealthyHostCount.max`, `amazonaws.com/AWS/ApplicationELB/UnHealthyHostCount.max`, `Dimensions.LoadBalancer`, `Dimensions.TargetGroup`, `cloud.account.id`, `Dimensions.AvailabilityZone` |
| Load Balancer Status Codes | Shows status codes per load balancer. Use to identify routing or traffic management issues. | `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_3XX_Count.count`, `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_4XX_Count.count`, `amazonaws.com/AWS/ApplicationELB/HTTPCode_ELB_5XX_Count.count`, `Dimensions.LoadBalancer` |
| Active Connections | Shows active connections per load balancer. | `amazonaws.com/AWS/ApplicationELB/ActiveConnectionCount.count`, `Dimensions.LoadBalancer`, `cloud.account.id`, `cloud.region` |
| State Routing | Shows load balancer state routing. Use to identify network configuration errors, unresponsive applications, or health check delays. | `amazonaws.com/AWS/ApplicationELB/UnhealthyStateRouting.max`, `Dimensions.LoadBalancer`, `Dimensions.TargetGroup`, `Dimensions.AvailabilityZone`, `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/ApplicationELB/HealthyStateRouting.max` |
| Load Balancer Capacity Units | Shows LCUs consumed during a given period of time. Use to optimize load balancer cost and detect bottlenecks. | `Dimensions.LoadBalancer`, `cloud.account.id`, `cloud.region`, `amazonaws.com/AWS/ApplicationELB/PeakLCUs.max` |
| Anomalous Host Count | Shows the number of hosts behaving abnormally. Use to detect and diagnose excessive error rates, latency issues, or inconsistent health check results. | `amazonaws.com/AWS/ApplicationELB/AnomalousHostCount.max`, `Dimensions.LoadBalancer`, `Dimensions.TargetGroup`, `cloud.account.id` |
| DNS Target State | Shows load balancer DNS target state resolution. Use to identify failing targets and DNS misconfigurations. | amazonaws.com/AWS/ApplicationELB/HealthyStateDNS.maxamazonaws.com/AWS/ApplicationELB/HealthyStateDNS.countamazonaws.com/AWS/ApplicationELB/UnhealthyStateDNS.maxDimensions.LoadBalancerDimensions.TargetGroupcloud.account.idDimensions.AvailabilityZone
|
| TLS Negotiation Errors | Shows the number of TLS negotiation errors per load balancer. | amazonaws.com/AWS/ApplicationELB/ClientTLSNegotiationErrorCount.countDimensions.LoadBalancerDimensions.AvailabilityZonecloud.account.idcloud.region
|
| Connection Error Count | Shows errors on targets. Use to diagnose and troubleshoot misconfigured load balancer targets. | Dimensions.TargetGroupamazonaws.com/AWS/ApplicationELB/TargetConnectionErrorCount.maxDimensions.LoadBalancercloud.account.idcloud.region
|
SQS
The SQS Board Template provides insight into critical AWS SQS operations.
This template relies on metric streams provided by AWS CloudWatch.
Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams.
To use this template, you must provision a metric stream for the SQS queues that you wish to monitor.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Request Count Per Minute | Shows requests made per minute. Use to observe traffic patterns and detect unexpected load or errors. | telemetry.sdk.language, http.host, http.route, http.method, http.status_code, http.server_name |
| HTTP Response Duration | Shows the P95 response duration by route, status code, and server name. Use for Django HTTP performance. | http.route, http.method, http.status_code, http.server_name, telemetry.sdk.language, http.response.body.size, duration_ms |
| HTTP Errors | Shows count of HTTP errors by route, status code, and host.name. Use to assess success and error rates of APIs. | http.status_code, http.server_name, error, telemetry.sdk.language, http.route, http.method |
| Exceptions | Shows exceptions thrown in the service. Use to assess the overall health of the application. | http.server_name, exception.type, code.namespace, exception.message, exception.stacktrace, telemetry.sdk.language |
| AVG and P95 Request Size | Shows the average and P95 HTTP request size. Use to monitor payload efficiency. | http.server_name, telemetry.sdk.language, http.request.body.size, http.route, http.method, http.status_code |
| AVG and P95 Response Size | Shows the average and P95 HTTP response size. Use to monitor payload efficiency. | telemetry.sdk.language, http.response.body.size, http.route, http.method, http.status_code, http.server_name |
| P95 and Heatmap of Job Duration | Shows the P95 and a heatmap of job duration by messaging destination, messaging system, and server name. Provides insight into the status of async job runners. | http.server_name, telemetry.sdk.language, duration_ms, messaging.destination, messaging.system |
| Jobs Executed | Shows count of root traces with messaging system and destination. Use to assess overall performance of the async job operations. | http.server_name, messaging.destination, messaging.system, telemetry.sdk.language, messaging.destination_kind |
| DB connection Count Per Min | Shows the connection count per minute where the database connection event is “open”. Use to gain visibility into connection pooling efficiency. | telemetry.sdk.language, db.operation, db.system, db.name, db.connection.event |
RDS
The RDS Board Template provides insight for monitoring and optimizing the performance of AWS RDS databases.
This template relies on metric streams provided by AWS CloudWatch.
Data is streamed from an AWS Kinesis Data Firehose to an endpoint compatible with CloudWatch Metric Streams.
To use this template, you must provision a metric stream for the RDS instances that you wish to monitor.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Number of Connections | Shows the number of connections to RDS instances. | amazonaws.com/AWS/RDS/DatabaseConnections.count, Dimensions.DBInstanceIdentifier, cloud.account.id |
| Database Load | Shows the level of session activity on RDS instances. | amazonaws.com/AWS/RDS/DBLoad.max, Dimensions.DBInstanceIdentifier, cloud.account.id |
| Disk Queue Depth | Shows the number of outstanding I/O requests waiting to access the disk. High queue depth can indicate that the workload is generating more read/write requests than the underlying storage can handle. | amazonaws.com/AWS/RDS/DiskQueueDepth.max, Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/DiskQueueDepth.count |
| Freeable Memory | Shows the minimum freeable memory per database instance. Use to identify memory pressure in RDS instances. | amazonaws.com/AWS/RDS/FreeableMemory.min, Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/FreeableMemory.count |
| Read/Write Operations | Shows the read and write operations per second that the RDS instance is performing. Use to diagnose bottlenecks, optimize workloads, and manage cost. | Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/WriteIOPS.max, amazonaws.com/AWS/RDS/ReadIOPS.max |
| CPU Utilization | Shows maximum CPU utilization across database instance identifiers. | Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/CPUUtilization.max |
| Free Storage Space | Shows the amount of free storage space per database instance. | amazonaws.com/AWS/RDS/FreeStorageSpace.max, Dimensions.DBInstanceIdentifier, cloud.account.id |
| Burst Balance | Shows the burst capacity per database instance. Lower burst capacity can affect input/output performance. Use for capacity planning and to optimize database performance. | Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/BurstBalance.sum |
| Read/Write Latency | Shows read and write latency per database instance. Use for troubleshooting slow queries, inefficient indexes, or locking issues. | amazonaws.com/AWS/RDS/WriteLatency.sum, Dimensions.DBInstanceIdentifier, cloud.account.id, amazonaws.com/AWS/RDS/ReadLatency.sum |
| Transaction Log Disk Usage | Shows the amount of storage consumed by database transaction logs. Use to prevent storage exhaustion. | Dimensions.DBInstanceIdentifier, cloud.account.id, cloud.region, amazonaws.com/AWS/RDS/TransactionLogsDiskUsage.max |
| Checkpoint Lag | Shows checkpoint lag. Use to determine latency between leader and followers in replication. | amazonaws.com/AWS/RDS/CheckpointLag.max, Dimensions.DBInstanceIdentifier |
| Swap Usage | Shows swap activity (from RAM to disk) per RDS instance. Use for identifying performance issues related to memory pressure. | cloud.account.id, cloud.region, amazonaws.com/AWS/RDS/SwapUsage.max, Dimensions.DBInstanceIdentifier |
| Network Throughput | Shows the rate at which network data is being sent from RDS instances. Use to identify excessive data transfer or increased query latencies. | amazonaws.com/AWS/RDS/NetworkTransmitThroughput.max, Dimensions.DBInstanceIdentifier, cloud.account.id, cloud.region |
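Because each query depends on its required fields, it can help to compare those fields against the columns that actually exist in your dataset before applying a template. The following is a minimal sketch of that comparison for two of the RDS queries. The `missing_fields` helper and the sample column list are hypothetical, and how you obtain your dataset's column names is left open.

```python
# Required fields for two RDS queries, taken from the RDS table.
RDS_REQUIRED_FIELDS = {
    "Number of Connections": {
        "amazonaws.com/AWS/RDS/DatabaseConnections.count",
        "Dimensions.DBInstanceIdentifier",
        "cloud.account.id",
    },
    "Checkpoint Lag": {
        "amazonaws.com/AWS/RDS/CheckpointLag.max",
        "Dimensions.DBInstanceIdentifier",
    },
}


def missing_fields(required, available_columns):
    """Return {query name: sorted missing fields} for queries with gaps."""
    available = set(available_columns)
    report = {}
    for query_name, fields in required.items():
        missing = sorted(fields - available)
        if missing:
            report[query_name] = missing
    return report


# Hypothetical list of columns present in a dataset.
columns = [
    "amazonaws.com/AWS/RDS/DatabaseConnections.count",
    "Dimensions.DBInstanceIdentifier",
    "cloud.account.id",
    "cloud.region",
]
print(missing_fields(RDS_REQUIRED_FIELDS, columns))
```

An empty result would suggest that every listed query has the fields it needs.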
Honeycomb Features
Refinery Operations
For teams using Refinery to sample their data, the Refinery Board Template provides an overview of sampling operations.
Refinery emits metrics that provide insights into its health, trace throughput, and sampling statistics.
Required fields in the Refinery Board Template map to these metrics and populate automatically when sent to Honeycomb.
To learn more about these fields, visit Refinery Configuration.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Stress Relief Status | Shows the current stress level on the Refinery cluster. | stress_level, stress_relief_activated, hostname or host.name |
| Dropped From Stress | Shows how many traces are being dropped due to stress on the Refinery cluster. | dropped_from_stress, hostname or host.name |
| Stress Relief Log | Shows reasons why Refinery is going into stress relief. | StressRelief, reason, msg, hostname or host.name |
| Cache Health | Shows metrics for cache health. | collect_cache_buffer_overrun, memory_inuse, collect_cache_entries_max or collect_cache_entries.max, collect_cache_capacity, num_goroutines, process_uptime_seconds, hostname or host.name |
| Cache Ejections | Shows the number of traces ejected from cache. | trace_send_ejected_full, trace_send_ejected_memsize, hostname or host.name |
| Intercommunications | Shows total events from outside Refinery and events redirected from a peer. | incoming_router_span, peer_router_batch, hostname or host.name |
| Receive Buffers | Shows receive buffer operations. | incoming_router_dropped, peer_router_dropped, hostname or host.name |
| Peer Send Buffers | Shows metrics for the queue used to buffer spans to send to peer nodes. | libhoney_peer_queue_overflow, libhoney_peer_send_errors, hostname or host.name |
| Upstream Send Buffers | Shows metrics for the queue used to buffer spans to send to Honeycomb. | libhoney_upstream_queue_length, libhoney_upstream_enqueue_errors, libhoney_upstream_response_errors, libhoney_upstream_send_errors, libhoney_upstream_send_retries, hostname or host.name |
| EMADynamicSampler Performance | Shows EMADynamicSampler sampling effectiveness. | emadynamic_sample_rate_avg, emadynamic_keyspace_size, emadynamic_num_kept, emadynamic_num_dropped |
| EMAThroughputSampler Performance | Shows EMAThroughputSampler sampling effectiveness. | emathroughput_sample_rate_avg, emathroughput_keyspace_size, emathroughput_num_kept, emathroughput_num_dropped |
| WindowedThroughput Performance | Shows WindowedThroughput sampling effectiveness. | windowedthroughput_sample_rate_avg, windowedthroughput_keyspace_size, windowedthroughput_num_kept, windowedthroughput_num_dropped |
| TotalThroughputSampler Performance | Shows TotalThroughputSampler sampling effectiveness. | totalthroughput_sample_rate_avg, totalthroughput_keyspace_size, totalthroughput_num_kept, totalthroughput_num_dropped |
| DynamicSampler Performance | Shows DynamicSampler sampling effectiveness. | dynamic_sample_rate_avg, dynamic_keyspace_size, dynamic_num_kept, dynamic_num_dropped |
| RulesBasedSampler Performance | Shows RulesBasedSampler sampling effectiveness. | rulesbased_sample_rate_avg, rulesbased_num_kept, rulesbased_num_dropped |
| Trace Indicators | Shows total traces sent before completion and spans received for traces already sent. | trace_sent_cache_hit, trace_send_no_root |
| Sampling Decisions | Shows total traces accepted and sent or dropped. | trace_accepted, trace_send_dropped, trace_send_kept |
| Refinery Send Event Error Logs | Shows errors when Refinery sends events to its peers or upstream to the Honeycomb API server. | |
| Refinery Handler Event Error Logs | Shows errors when receiving or parsing events being sent to a node. | msg, dataset, api_host, error.err, error.msg |
| Refinery Events Exceeding Max Size | Shows errors when events are too large to be sent to Honeycomb. | |
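One way to read the sampler performance queries is to compare how many traces a sampler kept against how many it dropped. The sketch below computes that kept fraction, assuming the `*_num_kept` and `*_num_dropped` values are counts over the same time window. That interpretation and the sample numbers are assumptions for illustration, not a description of how the board itself computes anything.

```python
def kept_fraction(num_kept, num_dropped):
    """Return the share of sampling decisions that kept a trace, or None if idle."""
    total = num_kept + num_dropped
    if total == 0:
        return None  # no sampling decisions in the window
    return num_kept / total


# Hypothetical counts for one sampler over one time window,
# for example dynamic_num_kept and dynamic_num_dropped.
fraction = kept_fraction(num_kept=50, num_dropped=950)
print(f"kept {fraction:.1%} of traces")
```

A kept fraction that drifts far from what the configured sample rate implies may be worth a closer look at the sampler's settings.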
Activity Log Security
The Activity Log Security Board Template includes queries that track API Key activity.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
| Query Name | Query Description | Required Fields |
|---|---|---|
| API Key Added Permissions | Shows when permissions are added to an existing API key. | resource.type, resource.changed_fields, environment.slug |
| API Key Activities by User | Displays the number of changes to API keys broken down by user. | key_type, environment.slug, user.email, resource.action |
| Authentication Type by User | Displays which type of authentication is used for each user. | authentication_method, user.email |
Activity Log Leaderboard
The Activity Log Leaderboard Board Template includes queries that highlight advanced and frequent usage of Honeycomb by your team.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Queries by User | Shows which environments are being queried. | |
| Complex Queries by User | Shows which users frequently use Visualize, Where, and Having clauses. | resource.type, SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)), user.email |
| Top Query Visualizations | Shows the most commonly used visualizations. | resource.type, SUM( IF(EXISTS($query.having), 3, 0), REG_COUNT($query.where, `,`), REG_COUNT($query.visualize, `,`)), query.visualize |
| Top Tinkerers | Lists which users perform the most updates to SLOs, Triggers, and Calculated Fields. | |
| Queries by Dataset | Shows which datasets are being queried the most. | resource.type, environment.slug, dataset.slug |
| Queries by Environment | Shows a count of queries run, grouped by environment. | resource.type, environment.slug |
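The expression listed for Complex Queries by User and Top Query Visualizations adds three terms: 3 points when `query.having` exists, plus the number of commas in `query.where`, plus the number of commas in `query.visualize`. The Python sketch below mirrors that arithmetic for a single event. The event shape and sample values are hypothetical, and it assumes `EXISTS` means the field is present and non-null and that `REG_COUNT` counts regex matches.

```python
import re


def complexity_score(event):
    """Mirror the SUM(IF(EXISTS(...), 3, 0), REG_COUNT(...), REG_COUNT(...)) expression."""
    # IF(EXISTS($query.having), 3, 0)
    having_points = 3 if event.get("query.having") is not None else 0
    # REG_COUNT($query.where, `,`)
    where_commas = len(re.findall(",", event.get("query.where") or ""))
    # REG_COUNT($query.visualize, `,`)
    visualize_commas = len(re.findall(",", event.get("query.visualize") or ""))
    return having_points + where_commas + visualize_commas


# Hypothetical Activity Log event; the clause formats are illustrative only.
event = {
    "query.having": "COUNT > 10",
    "query.where": "a = 1, b = 2, c exists",
    "query.visualize": "COUNT, P95(duration_ms)",
}
print(complexity_score(event))  # 3 + 2 + 1 = 6
```

Under this reading, each additional comma-separated clause raises the score by one, and using a Having clause at all is weighted as heavily as three extra clauses.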
Activity Log Trigger and SLO Activity
The Activity Log Trigger and SLO Activity Board Template includes queries related to trigger and SLO activations and modifications.
Honeycomb automatically creates the required fields for the Activity Log Board Templates when it generates Activity Log events.
| Query Name | Query Description | Required Fields |
|---|---|---|
| Trigger State Changes | Shows instances when triggers have been triggered or resolved. | resource.type, resource.action, name |
| Trigger Modifications | Shows creations, modifications, and deletions of triggers. | resource.type, resource.action |
| Most Updated Triggers | Shows triggers that received the most changes recently. | resource.type, resource.action, name |
| Top Updated SLOs by Update Type | Shows creations, modifications, and deletions of SLOs and the supporting SLI (Calculated Field). | resource.type, resource.action, environment.slug, resource.changed_fields, name, user.email |
| SLOs Created and Deleted | Shows creation and deletion of SLOs. | resource.type, resource.action, environment.slug, name, resource.changed_fields, user.email |
| SLI Expression Changes by SLO | Shows when SLIs (Calculated Fields) related to SLOs have been changed. | resource.type, resource.action, resource.changed_fields, environment.slug, name, sli.expression, before.sli.expression, user.email |
Artificial Intelligence
Anthropic Usage & Cost Monitoring
The Anthropic Usage & Cost Monitoring Board Template provides comprehensive insights into your Anthropic API usage and costs, including token consumption, feature usage, and cost attribution across models, workspaces, and API keys.
Key visualizations include:
Usage Analytics:
- Token Usage Over Time: Track input, output, and cache token consumption trends
- Usage by Model: Compare token usage across different Claude models
- Workspace Usage Distribution: Monitor usage patterns across different workspaces
- API Key Activity: Track usage by individual API keys for access control insights
Cost Monitoring:
- Daily Cost Trends: Monitor spending patterns over time
- Cost by Model: Understand which models drive the highest costs
- Workspace Cost Attribution: Allocate costs across different teams or projects
- Cost per Token Analysis: Calculate cost efficiency metrics
Performance Insights:
- Cache Hit Rate: Monitor cache utilization to optimize costs
- Feature Usage: Track web search and other feature utilization
- Service Tier Distribution: Analyze usage across different API service tiers
The Board Template automatically populates when the Anthropic Usage Receiver sends metrics and logs to Honeycomb.
Required fields include model, workspace_id, service_tier, and cost-related attributes like amount_minor_units and description.
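To make the cost-per-token idea concrete, the sketch below converts an `amount_minor_units` value into major currency units and expresses it per million tokens. It assumes that `amount_minor_units` is a cost in minor units (such as cents, at 100 per major unit) and that a total token count is available from elsewhere. Both are assumptions for illustration, so check them against the data you actually receive.

```python
from decimal import Decimal


def cost_per_million_tokens(amount_minor_units, total_tokens, minor_units_per_major=100):
    """Return cost in major currency units per one million tokens, or None."""
    if total_tokens == 0:
        return None  # avoid dividing by zero when no tokens were used
    major_units = Decimal(amount_minor_units) / Decimal(minor_units_per_major)
    return major_units / Decimal(total_tokens) * Decimal(1_000_000)


# Hypothetical values: 450 minor units (4.50 major units) for 1.5 million tokens.
rate = cost_per_million_tokens(amount_minor_units=450, total_tokens=1_500_000)
print(rate)
```

`Decimal` is used here so that currency arithmetic avoids binary floating-point rounding.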
Troubleshooting
To explore common issues when working with Board Templates, visit Common Issues with Visualization: Board Templates.