Manage Metrics Events

Several factors affect the exact number of events created by your metric data: the number of metrics captured, the capture interval, and the number of labels you apply to your metrics (that is, their cardinality). Because metrics are captured on a regular schedule, you can expect the event volume used for metrics to be predictable over time, and controllable.

To control event volume, you can change:

  • how OTLP metrics requests are grouped
  • the distinct attributes in any individual metrics request
  • the number of captured timeseries
  • the capture interval

In addition, Honeycomb automatically compacts some event volume based on the data point attributes of certain metrics, such as system.cpu.time.

Events Generated by Each Metric Capture 

Every metric data point is associated with a resource, which represents the system that it describes, and any number of attributes, which provide additional context about the meaning of that data point. Honeycomb stores these data points, and all associated metadata (the resources and attributes), in events within our columnar data store.

Honeycomb will combine data points into the same event if:

  • they were received as part of the same OTLP request
  • their timestamps are equivalent when truncated to the second (we truncate metric timestamps to the second for improved compaction)
  • they have the same set of resource attribute keys and values
  • they have the same set of data point attribute keys and values (sometimes these are also called “tags” or “labels”)

See some examples of metric-to-event mapping.
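
For illustration, suppose two data points arrive in the same OTLP request, in the same second, and with identical resource and data point attributes (the metric names queue.depth and queue.oldest_item_age, and all values, are hypothetical):

  - metric: queue.depth,            host.name: host1, value: 12
  - metric: queue.oldest_item_age,  host.name: host1, value: 3.2

Because all four conditions hold, Honeycomb stores a single event:

  - host.name: host1, queue.depth: 12, queue.oldest_item_age: 3.2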

Grouping OTLP Metrics Requests 

Any system that produces OpenTelemetry metrics will send repeated OTLP metrics requests. The more metrics contained in any one OTLP request to Honeycomb, the greater the opportunity Honeycomb has to combine their data points into the same set of events.

Requests can be grouped by time or size using OpenTelemetry Collector’s Batch Processor, which can be added to any preexisting OpenTelemetry Collector pipeline.
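
For example, a minimal Batch Processor entry in a Collector configuration might look like the following; the thresholds shown are illustrative, not recommendations:

processors:
  batch:
    timeout: 10s            # send a batch at least every 10 seconds
    send_batch_size: 8192   # or as soon as 8192 data points have accumulated

Larger batches give Honeycomb more data points to consider combining into each event.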

Requests can also be grouped across hosts by sending them through a single OpenTelemetry Collector instance before forwarding them to Honeycomb. (The OpenTelemetry Collector can receive OTLP requests from other servers using its OTLP Receiver.)
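
A minimal sketch of such a gateway configuration, assuming Honeycomb's documented OTLP endpoint and headers; the API key variable and dataset name are placeholders to adjust for your environment:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317        # accept OTLP metrics from other hosts

processors:
  batch:

exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}   # set this environment variable on the Collector host
      x-honeycomb-dataset: metrics                 # dataset name is illustrative

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/honeycomb]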

Adjusting the Distinct Attributes in Any Individual Metrics Request 

For any metrics request, data points from distinct metrics can be combined into the same event if they share the same complete set of attributes (both keys and values) across all resources and data points. For this reason, it is generally good practice to share sets of attribute values across as many metrics as possible. For instance, if two distinct metrics are broken out by process.pid, their data points can share the same events. But if one metric has a process.pid attribute and the other does not, each data point will end up in a distinct event.

Resource attributes can be set or changed using the OpenTelemetry SDK, or by using the OpenTelemetry Collector Resource Processor. Labels can be set or changed using the OpenTelemetry SDK, or by using the OpenTelemetry Collector Metrics Transform Processor. Note that this processor lives in the “contrib” build of OpenTelemetry Collector.
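
As a hedged sketch of the process.pid example above, the following uses the Resource Processor to set a shared resource attribute and the Metrics Transform Processor to sum away the process.pid dimension; the metric name queue.depth and the label queue.name are hypothetical:

processors:
  # Set a resource attribute consistently so all metrics from this Collector share it.
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert
  # Sum away the process.pid label so queue.depth data points share the
  # same attribute set as metrics that never had process.pid.
  metricstransform:
    transforms:
      - include: queue.depth              # hypothetical metric name
        action: update
        operations:
          - action: aggregate_labels
            label_set: [ queue.name ]     # keep only queue.name; process.pid is dropped
            aggregation_type: sum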

Adjusting the Number of Captured Timeseries 

Metrics instrumentation can separate any individual metric (for example, http.server.active_requests) into any number of distinct timeseries that can be distinguished from one another by resource attributes (for example, host.name) or data point attributes (for example, http.method).

The larger the cardinality of any of these attributes, the more distinct timeseries the system will be capturing. (Cardinality is the number of distinct values that exist for any individual attribute. For example, if http.method is sometimes GET and sometimes POST, the cardinality of this attribute would be 2.)

Timeseries can multiply combinatorially. For example, if a system had 100 distinct host.name values, 2 distinct http.method values, and 4 distinct http.host values, it could consist of up to 100 × 2 × 4 = 800 distinct timeseries just for the http.server.active_requests metric. (And because all of these would have distinct sets of attributes, Honeycomb would create up to 800 events at every capture interval for this metric.)

Here is an example of what this kind of combinatoric cardinality explosion can look like:

host.name  measurements (for http.server.active_requests, measured every 60s for 10 minutes)
---------  ---------------------------------------------------------------------------------
host1      46, 20, 36, 11, 38, 25,  5, 32, 57, 14
host2      16, 48,  1, 46, 29, 15, 53, 49, 33, 40

cardinality of host.name = 2
2 timeseries, generated 20 events over 10 minutes
at minute 1, your dataset would contain the following 2 events:
  - host.name: host1, http.server.active_requests: 46
  - host.name: host2, http.server.active_requests: 16

host.name  http.method  measurements (for http.server.active_requests, measured every 60s for 10 minutes)
---------  -----------  ---------------------------------------------------------------------------------
host1      GET           9,  4, 15,  6, 26, 11,  5,  4, 19,  9
host1      POST         37, 16, 21,  5, 12, 14,  0, 28, 38,  5
host2      GET          15, 33,  1, 45, 17,  6, 19, 12, 14, 19
host2      POST          1, 15,  0,  1, 12,  9, 34, 37, 19, 21

cardinality of host.name = 2
cardinality of http.method = 2
2*2=4 timeseries, generated 40 events over 10 minutes
at minute 1, your dataset would contain the following 4 events:
  - host.name: host1, http.method: GET,  http.server.active_requests: 9
  - host.name: host1, http.method: POST, http.server.active_requests: 37
  - host.name: host2, http.method: GET,  http.server.active_requests: 15
  - host.name: host2, http.method: POST, http.server.active_requests: 1

host.name  http.method  http.host  measurements (for http.server.active_requests, measured every 60s for 10 minutes)
---------  -----------  ---------  ---------------------------------------------------------------------------------
host1      GET          public      8,  2, 14,  5, 25,  9,  3,  3, 18,  8
host1      GET          internal    1,  2,  1,  1,  1,  2,  2,  1,  1,  1
host1      POST         public     37, 16, 20,  5, 11, 13,  0, 27, 37,  4
host1      POST         internal    0,  0,  1,  0,  1,  1,  0,  1,  1,  1
host2      GET          public     14, 31,  0, 44, 14,  5, 18, 11, 13, 18
host2      GET          internal    1,  2,  1,  1,  3,  1,  1,  1,  1,  1
host2      POST         public      1, 14,  0,  1, 11,  8, 33, 37, 19, 20
host2      POST         internal    0,  1,  0,  0,  1,  1,  1,  0,  0,  1

cardinality of host.name = 2
cardinality of http.method = 2
cardinality of http.host = 2
2*2*2=8 timeseries, generated 80 events over 10 minutes
at minute 1, your dataset would contain the following 8 events:
  - host.name: host1, http.method: GET,  http.host: public,   http.server.active_requests: 8
  - host.name: host1, http.method: GET,  http.host: internal, http.server.active_requests: 1
  - host.name: host1, http.method: POST, http.host: public,   http.server.active_requests: 37
  - host.name: host1, http.method: POST, http.host: internal, http.server.active_requests: 0
  - host.name: host2, http.method: GET,  http.host: public,   http.server.active_requests: 14
  - host.name: host2, http.method: GET,  http.host: internal, http.server.active_requests: 1
  - host.name: host2, http.method: POST, http.host: public,   http.server.active_requests: 1
  - host.name: host2, http.method: POST, http.host: internal, http.server.active_requests: 0

The set of captured timeseries can be adjusted using the OpenTelemetry SDK, or by using OpenTelemetry Collector’s Filter Processor or Metrics Transform Processor. Note that the Metrics Transform Processor lives in the “contrib” build of OpenTelemetry Collector.
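
As a hedged sketch, a Collector pipeline could drop unneeded metrics outright and collapse a high-cardinality attribute; the metric chosen for exclusion is illustrative:

processors:
  # Drop whole metrics that you do not need.
  filter/drop-metrics:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - system.paging.operations     # illustrative; list the metrics to drop
  # Collapse the http.host dimension of http.server.active_requests,
  # reducing the 8 timeseries in the example above back to 4.
  metricstransform:
    transforms:
      - include: http.server.active_requests
        action: update
        operations:
          - action: aggregate_labels
            label_set: [ http.method ]   # keep http.method; sum values across http.host
            aggregation_type: sum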

Modifying Capture Interval 

Every metrics stream is configured with a capture interval, which determines how frequently individual data points are captured. More frequent capture intervals allow for finer granularity in any timeseries graph; less frequent capture intervals generate proportionally fewer events. The capture interval can be modified directly at the point of capture: generally, this setting lives in the OpenTelemetry SDK or in an OpenTelemetry Collector receiver.
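
For example, the Collector’s Host Metrics Receiver exposes the capture interval as collection_interval (a minimal sketch; the scrapers listed are illustrative):

receivers:
  hostmetrics:
    collection_interval: 60s   # capture one data point per timeseries every 60 seconds
    scrapers:
      cpu:
      memory: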

Data Point Attribute Compaction 

As noted above, metrics normally include all data point attributes as key-value pairs on the metric event. However, Honeycomb has found that certain standard attributes from the OpenTelemetry Semantic Conventions can be combined, or compacted, even when their values differ, because these attributes have only a small number of possible values. This compaction occurs automatically.

For example, the metric system.network.io has an attribute called direction. The only two values of direction are transmit and receive, so Honeycomb distributes these two values into a single event with two fields: system.network.io.transmit and system.network.io.receive.

The full set of metric names and data point attributes that are distributed in this way is:

Metric Name                     Data Point Attribute Name
------------------------------  -------------------------
system.disk.io                  direction
system.filesystem.usage         state
system.processes.count          status
system.network.connections      protocol
system.network.dropped          direction
system.network.dropped_packets  direction
system.network.errors           direction
system.network.io               direction
k8s.node.network.errors         direction
k8s.node.network.io             direction
k8s.pod.network.errors          direction
k8s.pod.network.io              direction

Compaction of system.cpu.time 

There is one more metric that is treated specially: system.cpu.time.

This metric has two key data point attributes that are compacted automatically: state and logical_number. The state attribute is distributed as described above, generating fields like system.cpu.time.idle. In addition, the logical_number attribute, which indicates which CPU core is measured on a multi-core CPU, is dropped, and its values are summed into the appropriate state field. Thus, system.cpu.time.idle is the sum of the idle state’s values over all values of logical_number.
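
For illustration, one capture from a multi-core host might compact into a single event with one field per CPU state, each summed across all logical cores (the field names follow the state values; the numbers are invented):

  - host.name: host1, system.cpu.time.idle: 35124.2, system.cpu.time.user: 2017.3, system.cpu.time.system: 981.5, system.cpu.time.wait: 44.0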

The result of this compaction is that up to 128 individual data points (for example, 8 CPU states across 16 logical cores) can be compacted into a single Honeycomb event.