Refinery offers a range of configuration options to help operators tune it for varying volumes and shapes of telemetry data.
After your initial setup, we recommend increasing RAM and CPU cores as needed. Use the guidance on this page for scaling, and consult our troubleshooting documentation for additional support.
Refinery includes a built-in mechanism called Stress Relief that activates when the system is under heavy load.
Frequent or prolonged activations indicate that Refinery is under-provisioned for the current load.
You can monitor this via the stress_relief_activated field in Refinery internal metrics.
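As a rough illustration of what "frequent or prolonged" means in practice, the sketch below summarizes a series of stress_relief_activated samples. It assumes the metric is exported as 1 while Stress Relief is active and 0 otherwise, and that you have already pulled (timestamp, value) pairs from your metrics backend. The function name and input shape are hypothetical, not part of Refinery.

```python
from typing import List, Tuple


def summarize_activations(samples: List[Tuple[float, int]]) -> Tuple[int, float]:
    """Return (activation count, longest activation in seconds).

    samples: (unix_timestamp_seconds, stress_relief_activated) pairs in
    time order. Assumption: the value is 1 while active and 0 otherwise.
    """
    count = 0
    longest = 0.0
    start = None  # timestamp at which the current activation began
    for ts, active in samples:
        if active and start is None:
            # Transition from inactive to active: a new activation starts.
            start = ts
            count += 1
        elif not active and start is not None:
            # Transition from active to inactive: close the activation.
            longest = max(longest, ts - start)
            start = None
    if start is not None:
        # Still active at the last sample: measure up to that sample.
        longest = max(longest, samples[-1][0] - start)
    return count, longest


if __name__ == "__main__":
    series = [(0, 0), (60, 1), (120, 1), (180, 0), (240, 1), (300, 0)]
    print(summarize_activations(series))
```

What counts as too many or too long depends on your traffic. The sketch only produces the two numbers to compare against your own thresholds.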
To determine which resources need to be increased, check the activation reasons in Refinery logs.
Find the log message StressRelief has been activated, then check the reason field in that message to understand what triggered the activation.
For example, a reason of MaxAlloc indicates a sudden memory usage spike.

Scaling Refinery effectively involves choosing the right balance between vertical and horizontal scaling.
We recommend prioritizing vertical scaling (adding resources to existing nodes) over horizontal scaling (adding more nodes) whenever possible.
Focus on ensuring that fewer nodes can handle your peak load effectively before you consider adding instances.
Queues control how spans are buffered before sampling. Proper queue configuration ensures that Refinery can handle peak load efficiently.
IncomingQueueSize
The IncomingQueueSize value sets the maximum number of spans that a Refinery host can receive and queue for sampling.
Monitor the current queue size using the collector_incoming_queue_length metric and watch for incoming_router_dropped values above 0.
Understand what queue behavior tells you about Refinery’s ability to handle incoming traffic.
Use these steps to decide how to adjust queues, CPU, and cluster size for optimal performance.
If memory_inuse is within 80% of allocated memory, try increasing IncomingQueueSize to absorb load.

PeerQueueSize
The PeerQueueSize value sets the maximum number of spans that can be received from peer Refinery hosts and queued for sampling.
Apply the same scaling strategy as IncomingQueueSize, but note that adding instances to reduce peer queue length has diminishing returns: more peers increase overall cluster communication overhead, reinforcing the preference for vertical scaling.
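The queue check described above can be sketched as a small helper that applies to either queue. It reads "within 80% of allocated memory" as memory_inuse being at or below 80% of the allocation, which is an interpretation rather than something the text states outright. The function name, parameters, and return strings are hypothetical, and the "needs review" branch is a placeholder for the remaining decision steps, which are not reproduced here.

```python
def queue_advice(dropped: int,
                 memory_inuse: float,
                 allocated_memory: float,
                 threshold: float = 0.80) -> str:
    """Suggest a next step from queue and memory readings.

    dropped: the dropped-span counter for the queue
             (for example, incoming_router_dropped).
    memory_inuse / allocated_memory: same unit, for example bytes.
    threshold: assumed reading of "within 80% of allocated memory".
    """
    if dropped <= 0:
        # Nothing is being dropped, so the queue is keeping up.
        return "ok"
    if memory_inuse <= threshold * allocated_memory:
        # There is memory headroom left, so a larger queue can absorb load.
        return "increase queue size"
    # Drops with little memory headroom: resize memory, CPU, or the cluster.
    return "needs review"


if __name__ == "__main__":
    print(queue_advice(dropped=5, memory_inuse=7.0, allocated_memory=10.0))
```

Keeping the threshold as a parameter makes it easy to tighten or relax the check as you learn how your cluster behaves at peak load.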
AvailableMemory
The AvailableMemory value sets the maximum amount of RAM that Refinery can use for processing and queues.
Set memory values to ensure Refinery has enough headroom for normal operation.
Set AvailableMemory to roughly 85% of total system memory.
Set MaxMemoryPercentage to 75, indicating that Refinery can use up to 75% of AvailableMemory.
For example, set AvailableMemory to ~3.4GB and MaxMemoryPercentage to 75% (~2.5GB usable).

Adjust memory allocations to prevent restarts and handle peak load safely.
Monitor process_uptime_seconds for unexpected restarts.
If Refinery restarts due to Out-of-Memory exceptions or the host’s Out-of-Memory Killer, either increase the memory made available to the Refinery host or reduce MaxMemoryPercentage to provide more headroom.
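The memory guidance in this section reduces to simple arithmetic, sketched below. The 4 GB host is a hypothetical example, chosen because 85% of it gives the ~3.4GB figure used earlier. The function name and parameters are illustrative and are not Refinery settings.

```python
from typing import Tuple


def memory_budget(total_gb: float,
                  available_fraction: float = 0.85,
                  max_memory_percentage: float = 75) -> Tuple[float, float]:
    """Return (AvailableMemory, usable memory) in GB.

    available_fraction: share of total system memory given to
                        AvailableMemory (roughly 85% per the guidance).
    max_memory_percentage: value for MaxMemoryPercentage (0-100).
    """
    available = total_gb * available_fraction
    usable = available * max_memory_percentage / 100
    return available, usable


if __name__ == "__main__":
    # Hypothetical 4 GB host with the suggested defaults.
    available, usable = memory_budget(4.0)
    print(f"AvailableMemory ~{available:.2f} GB, usable ~{usable:.2f} GB")

    # Lowering MaxMemoryPercentage leaves more headroom below AvailableMemory.
    _, usable_lower = memory_budget(4.0, max_memory_percentage=70)
    print(f"Headroom gained: ~{usable - usable_lower:.2f} GB")
```

Running the same calculation with a lower MaxMemoryPercentage shows how much extra headroom that change buys before you resort to adding memory to the host.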