Scale and Size Honeycomb Refinery

Use Refinery Board Template to create boards and provide an overview of sampling operations.

Refinery provides a range of configuration options that let operators tune it for different volumes and shapes of telemetry data.

Refinery is a stateful service and is not optimized for dynamic auto-scaling. Changes in cluster membership can result in temporarily inconsistent sampling decisions and dropped traces. As such, we recommend provisioning Refinery for your anticipated peak load.

Sizing The Cache 

In an ideal world with consistent, steady traffic and no bursts, the proper Refinery cache configuration would be to set AvailableMemory to use all of the available system RAM. Unfortunately, we do not live in an ideal world. Instead, we provide an exploratory approach to sizing Refinery, based on experimentation with your actual traffic pattern and volume.

As a rough starting point, set MaxMemoryPercentage to 75 so that Refinery uses 75% of AvailableMemory, and set CacheCapacity to that 75% share of AvailableMemory, in bytes, divided by 10,000. For example, if the system’s RAM is 4GB, 75% of that value is approximately 3GB (3,000,000,000 bytes), so set CacheCapacity to 300_000.
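The arithmetic in this rule of thumb can be sketched in a few lines of Python. This is only an illustration of the calculation described above: the function name `starting_cache_capacity` is hypothetical and not part of Refinery, and the example uses decimal gigabytes (1GB = 1,000,000,000 bytes) to match the figures in the text.

```python
def starting_cache_capacity(system_ram_bytes: int, max_memory_percentage: int = 75) -> int:
    """Return a rough starting CacheCapacity for a host with the given RAM.

    Hypothetical helper: it applies the rule of thumb from the text,
    (RAM * MaxMemoryPercentage / 100) / 10,000, using integer arithmetic.
    """
    usable_bytes = system_ram_bytes * max_memory_percentage // 100
    return usable_bytes // 10_000


# A 4GB host (decimal gigabytes, as in the example above).
capacity = starting_cache_capacity(4_000_000_000)
print(capacity)  # 300000
```

Treat the result as a first guess to refine by experiment, not as a final value.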

To tune the MaxMemoryPercentage value, monitor process_uptime_seconds and look for restarts. If Refinery restarts due to out-of-memory errors or the host’s Out Of Memory Killer, decrease MaxMemoryPercentage to give Refinery more headroom on the system.
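A restart shows up in an uptime metric as the value dropping instead of increasing. As a minimal sketch, assuming you have exported process_uptime_seconds as an ordered list of samples, the hypothetical helper below counts those drops. It says nothing about why the process restarted, so confirm the cause in logs before changing MaxMemoryPercentage.

```python
def count_restarts(uptime_samples):
    """Count how often an ordered series of uptime readings goes backwards.

    Hypothetical helper: each decrease between consecutive samples is
    treated as one restart of the process.
    """
    restarts = 0
    for previous, current in zip(uptime_samples, uptime_samples[1:]):
        if current < previous:
            restarts += 1
    return restarts


# Made-up readings: the process restarted once, between 240s and 15s.
samples = [120, 180, 240, 15, 75, 135]
print(count_restarts(samples))  # 1
```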

Sizing The Receive Buffers 

Monitor incoming_router_dropped and peer_router_dropped, and look for values above 0. If either metric is consistently above 0, increase CacheCapacity: the receive buffers are always sized at three times CacheCapacity, so raising it enlarges them.
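Because the receive buffers follow CacheCapacity, it can help to see the resulting size when choosing a new value. The sketch below only restates the three-times relationship from the text; the function and constant names are hypothetical.

```python
RECEIVE_BUFFER_MULTIPLIER = 3  # from the text: three times CacheCapacity


def receive_buffer_size(cache_capacity: int) -> int:
    """Return the receive buffer size implied by a given CacheCapacity."""
    return cache_capacity * RECEIVE_BUFFER_MULTIPLIER


print(receive_buffer_size(300_000))  # 900000
```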

Sizing The Send Buffers 

Monitor libhoney_peer_queue_overflow and look for values above 0. If it is consistently above 0, increase PeerBufferSize. The default PeerBufferSize is 100,000.

Monitor libhoney_upstream_queue_length and check that its values stay under the UpstreamBufferSize value. If the queue length reaches UpstreamBufferSize, Refinery will block while waiting to send upstream to the Honeycomb API. Adjust UpstreamBufferSize as needed. The default UpstreamBufferSize is 10,000.
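One way to read the upstream queue metric is to compare its peak against the configured buffer size. The following is a rough sketch, assuming the libhoney_upstream_queue_length samples are available as a list of integers. The helper name and the shape of the returned dictionary are illustrative choices, not part of Refinery.

```python
DEFAULT_UPSTREAM_BUFFER_SIZE = 10_000  # default stated in the text


def upstream_queue_report(queue_length_samples, upstream_buffer_size=DEFAULT_UPSTREAM_BUFFER_SIZE):
    """Summarize how close the upstream queue came to its configured limit.

    Hypothetical helper: expects a non-empty list of queue length samples.
    """
    peak = max(queue_length_samples)
    return {
        "peak": peak,
        "peak_utilization": peak / upstream_buffer_size,
        "hit_limit": peak >= upstream_buffer_size,
    }


# Made-up samples: the queue peaked at a quarter of the default buffer.
report = upstream_queue_report([100, 900, 2_500, 1_200])
print(report["peak_utilization"], report["hit_limit"])  # 0.25 False
```

A peak utilization that creeps toward 1.0 suggests raising UpstreamBufferSize before the queue actually fills.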

Scaling The CPU 

Monitor CPU usage on the host(s), and target 80% CPU usage. Spikes to 90% are acceptable, but avoid spikes to 100%. If CPU utilization is too high, add more cores or more hosts as needed.
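These thresholds can be turned into a simple check over a series of CPU readings. The sketch below is one possible interpretation, not an official rule. It treats the 80% target as a limit on the average and flags any reading at 100%. How to treat peaks between 90% and 100% is not specified in the text, so the "watch" result is an assumption, as are the function name and return strings.

```python
def cpu_assessment(cpu_percent_samples):
    """Classify a non-empty series of CPU usage percentages.

    Hypothetical helper. Assumed thresholds: average above 80% or any
    reading at 100% means more capacity is needed; peaks above 90% are
    only flagged for attention.
    """
    average = sum(cpu_percent_samples) / len(cpu_percent_samples)
    peak = max(cpu_percent_samples)
    if peak >= 100 or average > 80:
        return "add cores or hosts"
    if peak > 90:
        return "watch"
    return "ok"


print(cpu_assessment([70, 75, 85, 78]))  # ok
print(cpu_assessment([70, 95, 72]))      # watch
print(cpu_assessment([85, 88, 90]))      # add cores or hosts
```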

Scaling The RAM 

Monitor collect_cache_buffer_overrun and look for values above 0. If it is consistently above 0, add more RAM or more hosts as needed. Note that occasional blips are acceptable (see collector metrics). If you add more RAM, do not forget to re-size the cache.
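The difference between an occasional blip and a consistently non-zero metric can be made concrete with a small check. This sketch assumes that "consistently" means every sample in a recent window is above 0. The window length of 5 and the helper name are arbitrary illustrative choices, so pick a window that matches your scrape interval.

```python
def consistently_above_zero(samples, window=5):
    """Return True only if the last `window` samples are all above 0.

    Hypothetical helper: an isolated non-zero sample (a blip) does not
    trigger it, and neither does a series shorter than the window.
    """
    recent = samples[-window:]
    return len(recent) == window and all(value > 0 for value in recent)


# Made-up collect_cache_buffer_overrun samples.
print(consistently_above_zero([0, 0, 7, 0, 0, 0]))  # False: a single blip
print(consistently_above_zero([0, 3, 1, 4, 2, 6]))  # True: sustained overruns
```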