Fastly supports streaming logs to provide more insight into the behavior of its content delivery network. This data can be sent to Honeycomb for querying.
Basic details on setting this up can be found in the Fastly documentation.
Details about sampling the streamed Fastly data are included below.
Sampling can be used to reduce the data volume in the Honeycomb datasets where you gather Fastly data.
To implement sampling, we recommend the following configuration:
First, update your configuration to create a logging rule that forwards requests to Honeycomb only if the req.http.log_request header is set to "1".
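In the Fastly UI or API, this is typically done by attaching a condition to the Honeycomb logging endpoint (the exact steps are covered in the Fastly documentation). The condition itself is just a VCL boolean expression, for example:

  req.http.log_request == "1"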
Next, create two VCL snippets:
The first snippet should be similar to the following; adjust the rates to suit your production traffic.
table codes {
  "200s": "20",
  "300s": "5",
  "400s": "3",
  "500s": "1",
}
This table maps each status code class to a sample rate: the number of events represented by each event that is kept (a rate of 1 does not sample at all, 20 keeps roughly every 20th event, and so on). For example, at a rate of 20, about 500 of every 10,000 responses with a 200 status are forwarded to Honeycomb.
The second VCL snippet should be exactly as follows:
set req.http.samplerate = table.lookup(codes, regsub(resp.status, "^([1-5])..", "\100s"), "1");
if (randombool(1, std.atoi(req.http.samplerate))) {
  set req.http.log_request = "1";
} else {
  set req.http.log_request = "0";
}
This sets the req.http.log_request header (mentioned above) according to whether the request should be logged or sampled out.
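Because resp.status is only available late in the request lifecycle, this second snippet must run in a stage such as deliver or log. As a rough sketch only (the snippet type and surrounding boilerplate here are assumptions; adapt them to your service), the logic could sit in vcl_deliver like this:

  sub vcl_deliver {
  #FASTLY deliver

    # Look up the sample rate for this response's status class, defaulting to "1".
    set req.http.samplerate = table.lookup(codes, regsub(resp.status, "^([1-5])..", "\100s"), "1");

    # Keep roughly 1 in N requests, where N is the sample rate.
    if (randombool(1, std.atoi(req.http.samplerate))) {
      set req.http.log_request = "1";
    } else {
      set req.http.log_request = "0";
    }

    return(deliver);
  }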
Lastly, ensure that the sample rate is included as a property of the JSON event sent to Honeycomb. In the formatted JSON provided as part of the streaming log configuration, add this line at the same level as the time and data keys:
"samplerate": %{req.http.samplerate}V,
This encodes samplerate as a top-level key sent to the Honeycomb API, so visualizations rendered by Honeycomb appear as if all of the events, even those which were sampled out, had been sent.
This basic configuration can be extended to sample based on cache status or other fields as desired.
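As one hypothetical variation (the fastly_info.state check and the rates below are illustrative assumptions, not part of the configuration above), you could choose the sample rate from the cache state instead of the status code before the randombool() check:

  # Sample cache hits more aggressively than misses or passes; tune the rates for your traffic.
  if (fastly_info.state ~ "^HIT") {
    set req.http.samplerate = "50";
  } else {
    set req.http.samplerate = "5";
  }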