Several factors affect the exact number of events created by your metric data: the number of metrics captured, the capture interval, and the number and cardinality of the labels you apply to your metrics. Because metrics are captured at regular intervals, the event volume used for metrics is consistently predictable over time, and controllable.
To control event volume, you can change how requests are grouped, which attributes your metrics share, the number of distinct timeseries you capture, and the capture interval.
In addition, Honeycomb automatically compacts some event volume based on the data point attributes of certain metrics, including `system.cpu.time`.
Every metric data point is associated with a resource, representing the system that it describes, and any number of attributes, providing additional context about the meaning of that data point. Honeycomb stores these data points, and all associated metadata (the resources and attributes), in events within our columnar data store.
Honeycomb will combine data points into the same event if they arrive in the same request and share the same complete set of resource and data point attributes (both keys and values).
See some examples of metric-to-event mapping.
Any system that produces OpenTelemetry metrics will send repeated OTLP metrics requests. The more metrics contained in any OTLP request to Honeycomb, the more opportunity Honeycomb has to combine their data points into the same set of events.
Requests can be grouped by time or size using OpenTelemetry Collector’s Batch Processor, which can be added to any preexisting OpenTelemetry Collector pipeline.
Requests can also be grouped across hosts by sending them through a single OpenTelemetry Collector processor before forwarding them to Honeycomb. (OpenTelemetry Collector can receive OTLP requests from other servers using the OTLP Receiver.)
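As a rough sketch (assuming the standard OTLP receiver and exporter, and a hypothetical `HONEYCOMB_API_KEY` environment variable), a Collector pipeline that batches metrics before forwarding them to Honeycomb might look like this:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
    # Group telemetry for up to 5 seconds or 8192 items, whichever comes first.
    timeout: 5s
    send_batch_size: 8192

exporters:
  otlp:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

Larger batches give Honeycomb more data points to combine into each set of events; exact field names and defaults may vary by Collector version.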
For any metrics request, data points from distinct metrics can be combined into the same event if they share the same complete set of attributes (both keys and values) across all resources and data points.
For this reason, it is generally good practice to share sets of attribute values across as many metrics as possible.
For instance, if two distinct metrics are broken out by `process.pid`, their data points can share the same events. But if one metric has a `process.pid` attribute and the other does not, each data point will end up in a distinct event.
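For illustration (the metric names and values here are hypothetical), if `process.cpu.time` and `process.memory.usage` are both captured with only a `process.pid` attribute, a single capture can produce one combined event:

- process.pid: 1234, process.cpu.time: 5.2, process.memory.usage: 104857600

If only one of the two metrics carried `process.pid`, the same capture would instead produce two separate events.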
Resource attributes can be set or changed using the OpenTelemetry SDK, or by using the OpenTelemetry Collector Resource Processor. Labels can be set or changed using the OpenTelemetry SDK, or by using the OpenTelemetry Collector Metrics Transform Processor. Note that this processor lives in the “contrib” build of OpenTelemetry Collector.
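For example, here is a hedged sketch of a Collector `resource` processor configuration that adds a resource attribute to every data point passing through the pipeline (the attribute key and value are illustrative):

```yaml
processors:
  resource:
    attributes:
      # Add or overwrite this resource attribute on all telemetry in the pipeline.
      - key: deployment.environment
        value: production
        action: upsert
```

Because the added attribute has the same value everywhere, it does not split data points into additional events.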
Metrics instrumentation can separate any individual metric (for example, `http.server.active_requests`) into any number of distinct timeseries that can be distinguished from one another by resource attributes (for example, `host.name`) or data point attributes (for example, `http.method`).
The larger the cardinality of any of these attributes, the more distinct timeseries the system will be capturing.
(Cardinality is the number of distinct values that exist for any individual attribute. For example, if `http.method` is sometimes `GET` and sometimes `POST`, the cardinality of this attribute would be 2.)
The number of timeseries can grow combinatorially.
For example, if a system had 100 distinct `host.name` values, 2 distinct `http.method` values, and 4 distinct `http.host` values, it could produce up to 100 × 2 × 4 = 800 distinct timeseries just for the `http.server.active_requests` metric.
(And given that all of these would have distinct sets of attributes, this means Honeycomb would create 800 events at every capture interval for this metric.)
Here is an example of what this kind of combinatoric cardinality explosion can look like:
| host.name | measurements (for http.server.active_requests, measured every 60s for 10 minutes) |
|---|---|
| host1 | 46, 20, 36, 11, 38, 25, 5, 32, 57, 14 |
| host2 | 16, 48, 1, 46, 29, 15, 53, 49, 33, 40 |

cardinality of host.name = 2
2 timeseries generated 20 events over 10 minutes

At minute 1, your dataset would contain the following 2 events:

- host.name: host1, http.server.active_requests: 46
- host.name: host2, http.server.active_requests: 16
| host.name | http.method | measurements (for http.server.active_requests, measured every 60s for 10 minutes) |
|---|---|---|
| host1 | GET | 9, 4, 15, 6, 26, 11, 5, 4, 19, 9 |
| host1 | POST | 37, 16, 21, 5, 12, 14, 0, 28, 38, 5 |
| host2 | GET | 15, 33, 1, 45, 17, 6, 19, 12, 14, 19 |
| host2 | POST | 1, 15, 0, 1, 12, 9, 34, 37, 19, 21 |

cardinality of host.name = 2
cardinality of http.method = 2
2 × 2 = 4 timeseries generated 40 events over 10 minutes

At minute 1, your dataset would contain the following 4 events:

- host.name: host1, http.method: GET, http.server.active_requests: 9
- host.name: host1, http.method: POST, http.server.active_requests: 37
- host.name: host2, http.method: GET, http.server.active_requests: 15
- host.name: host2, http.method: POST, http.server.active_requests: 1
| host.name | http.method | http.host | measurements (for http.server.active_requests, measured every 60s for 10 minutes) |
|---|---|---|---|
| host1 | GET | public | 8, 2, 14, 5, 25, 9, 3, 3, 18, 8 |
| host1 | GET | internal | 1, 2, 1, 1, 1, 2, 2, 1, 1, 1 |
| host1 | POST | public | 37, 16, 20, 5, 11, 13, 0, 27, 37, 4 |
| host1 | POST | internal | 0, 0, 1, 0, 1, 1, 0, 1, 1, 1 |
| host2 | GET | public | 14, 31, 0, 44, 14, 5, 18, 11, 13, 18 |
| host2 | GET | internal | 1, 2, 1, 1, 3, 1, 1, 1, 1, 1 |
| host2 | POST | public | 1, 14, 0, 1, 11, 8, 33, 37, 19, 20 |
| host2 | POST | internal | 0, 1, 0, 0, 1, 1, 1, 0, 0, 1 |

cardinality of host.name = 2
cardinality of http.method = 2
cardinality of http.host = 2
2 × 2 × 2 = 8 timeseries generated 80 events over 10 minutes

At minute 1, your dataset would contain the following 8 events:

- host.name: host1, http.method: GET, http.host: public, http.server.active_requests: 8
- host.name: host1, http.method: GET, http.host: internal, http.server.active_requests: 1
- host.name: host1, http.method: POST, http.host: public, http.server.active_requests: 37
- host.name: host1, http.method: POST, http.host: internal, http.server.active_requests: 0
- host.name: host2, http.method: GET, http.host: public, http.server.active_requests: 14
- host.name: host2, http.method: GET, http.host: internal, http.server.active_requests: 1
- host.name: host2, http.method: POST, http.host: public, http.server.active_requests: 1
- host.name: host2, http.method: POST, http.host: internal, http.server.active_requests: 0
The set of timeseries can be reduced or changed using the OpenTelemetry SDK, or by using OpenTelemetry Collector’s Filter Processor or Metrics Transform Processor. Note that the Metrics Transform Processor lives in the “contrib” build of OpenTelemetry Collector.
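Depending on your Collector version the exact configuration syntax differs, but as a sketch, the `filter` processor can drop a high-cardinality metric entirely before it reaches Honeycomb (the metric name below is illustrative):

```yaml
processors:
  filter:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          # Drop this metric from the pipeline entirely.
          - http.server.active_requests
```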
Every metrics stream is configured with a capture interval, which determines how frequently individual data points are captured. More frequent capture intervals allow for finer granularity in any timeseries graph. Less frequent capture intervals generate proportionally fewer events. The capture interval can be modified directly at the point of capture; generally, this setting lives in the OpenTelemetry SDK or in an OpenTelemetry Collector receiver.
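For instance, here is a hedged sketch of setting the capture interval on the Collector’s `hostmetrics` receiver (the 60-second interval and the set of scrapers are illustrative):

```yaml
receivers:
  hostmetrics:
    # Capture one data point per timeseries every 60 seconds.
    collection_interval: 60s
    scrapers:
      cpu:
      memory:
      disk:
      network:
```

Doubling the interval to `120s` would roughly halve the event volume produced by these metrics.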
As noted above, metrics normally include all data point attributes as key-value pairs on the metric event. However, Honeycomb has found that certain standard attributes from the OpenTelemetry Semantic Conventions can be combined, or compacted, even when their values differ, because those attributes have only a small number of possible values. This compaction occurs automatically.
For example, the metric `system.disk.io` has an attribute called `direction`. The only two values of `direction` for this metric are `read` and `write`, so Honeycomb distributes these two values into a single event with two fields: `system.disk.io.read` and `system.disk.io.write`.
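As an illustration with hypothetical values, two data points that differ only in `direction`:

- system.disk.io: 1048576, direction: read
- system.disk.io: 524288, direction: write

are compacted into a single event:

- system.disk.io.read: 1048576, system.disk.io.write: 524288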
The full set of metric names and data point attributes that are distributed in this way is:
| Metric Name | Data Point Attribute Name |
|---|---|
| `system.disk.io` | `direction` |
| `system.filesystem.usage` | `state` |
| `system.processes.count` | `status` |
| `system.network.connections` | `protocol` |
| `system.network.dropped` | `direction` |
| `system.network.dropped_packets` | `direction` |
| `system.network.errors` | `direction` |
| `system.network.io` | `direction` |
| `k8s.node.network.errors` | `direction` |
| `k8s.node.network.io` | `direction` |
| `k8s.pod.network.errors` | `direction` |
| `k8s.pod.network.io` | `direction` |
There is one more metric that is treated specially: `system.cpu.time`. This metric has two key data point attributes that are compacted automatically: `state` and `logical_number`. The `state` attribute is distributed as above, generating fields like `system.cpu.time.idle`. In addition, the `logical_number` attribute, which indicates which CPU core is used on a multi-core CPU, is dropped, and its different values are summed into the appropriate `state`. Thus, `system.cpu.time.idle` is the sum of the `idle` value of the `state` attribute over all values of `logical_number`.
The result of this manipulation is that up to 128 individual data points are compacted into a single Honeycomb event.
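As an illustration with hypothetical values, a 2-core host reporting the `idle` and `user` states produces four data points:

- state: idle, logical_number: 0, system.cpu.time: 1000
- state: idle, logical_number: 1, system.cpu.time: 1200
- state: user, logical_number: 0, system.cpu.time: 300
- state: user, logical_number: 1, system.cpu.time: 250

which compact into a single event:

- system.cpu.time.idle: 2200, system.cpu.time.user: 550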