Send Other Webserver Logs

To support a variety of webservers and related technologies, our honeytail agent has an nginx parser that can be easily tricked into parsing other webservers’ logs. You create a config file that contains the log format of your web server and pass it to the honeytail nginx parser.

As an example, this page describes how to consume HAProxy and Apache logs using the honeytail nginx parser.

Overview 

To use the nginx parser to consume a non-nginx log file, we will create a config file that looks something like an nginx config and use it to define the log format. We will then run honeytail on the log, pointing it at the config file containing the format.

The config file will have one statement, log_format name '<format>'; (possibly broken up into multiple lines). The format is a series of labels identifying each field; the character following each label is the field separator. For example, to collect the HAProxy process name, pid, IP address, and port from a log snippet of haproxy[291] 127.0.0.1:4715, you would use $process[$pid] $ip:$port as your format string. You can use any names you like for the labels (subject to the restrictions described under Case and Allowed Characters); they will be used as the column names in Honeycomb.
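To make the label-and-separator rule concrete, the sketch below hand-expands the example format string into an equivalent Python regular expression. It only illustrates the idea and is not honeytail's actual implementation; the variable names are chosen for illustration.

```python
import re

# Hand-written equivalent of the format string `$process[$pid] $ip:$port`.
# Each label becomes a named group that captures everything up to the
# character that follows it in the format (its separator).
pattern = re.compile(
    r"^(?P<process>[^\[]*)\["   # $process runs until '['
    r"(?P<pid>[^\]]*)\] "       # $pid runs until ']'
    r"(?P<ip>[^:]*):"           # $ip runs until ':'
    r"(?P<port>.*)$"            # $port is last, so it takes the rest
)

fields = pattern.match("haproxy[291] 127.0.0.1:4715").groupdict()
print(fields)
# {'process': 'haproxy', 'pid': '291', 'ip': '127.0.0.1', 'port': '4715'}
```

Each key in the resulting dictionary corresponds to a column name that would appear in Honeycomb.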

Below are two examples: HAProxy’s HTTP log format and the default Apache log format.

You will likely have to tailor these examples to your specific config depending on the version of the web server you are running and other options you may have in their configs.

HAProxy HTTP Format

HAProxy’s HTTP log format packs a wealth of detail into a very compact form. Here is a sample log line (from the HAProxy docs):

Feb  6 12:12:56 localhost \
  haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 \
  10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} \
  "GET /index.html HTTP/1.1"

Here is the description of those fields (again, from the HAProxy docs):

| Field | Format | Extract from the example above |
|-------|--------|--------------------------------|
| 1 | process_name '[' pid ']:' | haproxy[14389]: |
| 2 | client_ip ':' client_port | 10.0.1.2:33317 |
| 3 | '[' accept_date ']' | [06/Feb/2009:12:14:14.655] |
| 4 | frontend_name | http-in |
| 5 | backend_name '/' server_name | static/srv1 |
| 6 | Tq '/' Tw '/' Tc '/' Tr '/' Tt* | 10/0/30/69/109 |
| 7 | status_code | 200 |
| 8 | bytes_read* | 2750 |
| 9 | captured_request_cookie | - |
| 10 | captured_response_cookie | - |
| 11 | termination_state | ---- |
| 12 | actconn '/' feconn '/' beconn '/' srv_conn '/' retries* | 1/1/1/1/0 |
| 13 | srv_queue '/' backend_queue | 0/0 |
| 14 | '{' captured_request_headers* '}' | {1wt.eu} |
| 15 | '{' captured_response_headers* '}' | {} |
| 16 | '"' http_request '"' | "GET /index.html HTTP/1.1" |

Here is the config snippet used to match that log line. The line contains two date fields; we will use the one in square brackets because it is easier to parse and is in the format the parser expects (d/m/y:h:m:s.sss). We stub out the syslog-provided date at the beginning of the line with dots, each of which matches any single character. The log_format statement can be split across multiple lines for easier editing; make sure the last line ends with a semicolon. For this example, let us call this file hny-haproxy.conf.

log_format haproxy '... .. ..:..:.. $hostname $process[$pid]: '
    '$client_ip:$client_port [$time_local] $frontend $backend/$backend_server '
    '$time_client_connect/$time_queued/$time_backend_conn/$time_backend_resp/$time_total '
    '$status_code $bytes_read $request_cookie $response_cookie $termination_state '
    '$act_conn/$fe_conn/$be_conn/$srv_conn/$retries $srv_queue/$backend_queue '
    '{$request_headers} {$response_headers} "$request"';
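One way to sanity-check a format like this before pointing honeytail at real logs is to emulate the label-and-separator rule in a few lines of Python and run it against the sample line. This is a rough sketch, not honeytail's parser: the helper name format_to_regex is hypothetical, and the handling of dots as wildcards is an assumption based on the description above.

```python
import re

def format_to_regex(fmt):
    """Build a regex from an nginx-style format string (illustrative only)."""
    def literal(text):
        # Assumption: a dot in the format matches any single character;
        # every other literal character must match exactly.
        return "".join("." if ch == "." else re.escape(ch) for ch in text)

    parts, pos = [], 0
    for m in re.finditer(r"\$([a-z_]+)", fmt):
        parts.append(literal(fmt[pos:m.start()]))
        sep = fmt[m.end():m.end() + 1]  # character right after the label
        body = f"[^{re.escape(sep)}]*" if sep else ".*"
        parts.append(f"(?P<{m.group(1)}>{body})")
        pos = m.end()
    parts.append(literal(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")

# The format from hny-haproxy.conf, joined into one string.
HAPROXY_FORMAT = (
    '... .. ..:..:.. $hostname $process[$pid]: '
    '$client_ip:$client_port [$time_local] $frontend $backend/$backend_server '
    '$time_client_connect/$time_queued/$time_backend_conn/$time_backend_resp/$time_total '
    '$status_code $bytes_read $request_cookie $response_cookie $termination_state '
    '$act_conn/$fe_conn/$be_conn/$srv_conn/$retries $srv_queue/$backend_queue '
    '{$request_headers} {$response_headers} "$request"'
)

# The sample log line from the HAProxy docs, joined into one string.
SAMPLE = (
    'Feb  6 12:12:56 localhost '
    'haproxy[14389]: 10.0.1.2:33317 [06/Feb/2009:12:14:14.655] http-in static/srv1 '
    '10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {1wt.eu} {} '
    '"GET /index.html HTTP/1.1"'
)

match = format_to_regex(HAPROXY_FORMAT).match(SAMPLE)
event = match.groupdict() if match else {}
print(event.get("time_local"), event.get("status_code"), event.get("request"))
```

If the format and the log line disagree anywhere, match is None and event is empty, which is a quick signal that the format needs adjusting.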

To use this config, you would run our agent honeytail like this:

honeytail \
    -k YOUR_API_KEY \
    -p nginx \
    -d haproxy \
    -f /path/to/haproxy.log \
    --nginx.conf hny-haproxy.conf \
    --nginx.format haproxy

Apache Log Format 

Apache’s configuration can truly go as far as you want to take it. For this example, let us just stick with the default log format.

Here is an example log line (split into two for readability):

207.46.1.2 - - [03/Nov/2016:16:11:43 -0700] "GET /robots.txt HTTP/1.1" 200 334 \
  "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

From an Apache logging config like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

There is not nearly as much there as in the HAProxy log, but let us pull out what we can, taking a hint from the Apache docs to decipher the fields. Let us call this file hny-apache.conf.

log_format apache '$remote_ip $identd $user [$time_local] "$request" $status_code '
  '$bytes_sent "$referrer" "$user_agent"';

To use this config, you would run our agent honeytail like this:

honeytail \
    -k YOUR_API_KEY \
    -p nginx \
    -d apache \
    -f /path/to/apache/access.log \
    --nginx.conf hny-apache.conf \
    --nginx.format apache

Details on the Log Format

Timestamps 

The nginx parser can only interpret timestamps in one of the two formats that nginx itself uses. The field in the log format description must be named correctly in order for Honeycomb to use the timestamp for the event instead of considering it a normal string field.

  • $time_local: Time in the Common Log Format, for example 06/Feb/2009:12:14:14.655
  • $time_iso8601: Time in the ISO 8601 standard format, for example 2009-02-06T12:14:14+00:00
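To show the shape of the two timestamp formats, the sketch below parses both example values with Python's standard library. It says nothing about how honeytail parses them internally; the strptime pattern is our own reading of the example value.

```python
from datetime import datetime

# $time_local shape from this page's HAProxy example (day/month/year with
# fractional seconds); %b assumes English month abbreviations.
local = datetime.strptime("06/Feb/2009:12:14:14.655", "%d/%b/%Y:%H:%M:%S.%f")

# $time_iso8601 shape; fromisoformat needs Python 3.7 or newer.
iso = datetime.fromisoformat("2009-02-06T12:14:14+00:00")

print(local.isoformat())  # 2009-02-06T12:14:14.655000
print(iso.isoformat())    # 2009-02-06T12:14:14+00:00
```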

Case and Allowed Characters 

The field names in the log_format specification have some restrictions: they must contain only characters in the set [a-z_]. In other words, they:

  • Must be all lower case
  • Must not contain spaces
  • May only contain letters and underscores (no numbers or other symbols)

Any other characters will be considered a field delimiter in the log format.
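The consequence of that last rule is easy to miss, so here is a small Python sketch of it. It assumes labels are scanned exactly as described above (a dollar sign followed by characters from [a-z_]); the format string is a made-up example.

```python
import re

LABEL = re.compile(r"\$([a-z_]+)")  # labels may only use a-z and underscore

# Hypothetical format with two invalid names: one has a digit, one a capital.
fmt = '$bytes_sent2 "$userAgent"'
labels = LABEL.findall(fmt)
print(labels)  # ['bytes_sent', 'user']
```

The digit and the capital letter end each label early, so the columns would be named bytes_sent and user, and the leftover 2 and Agent would be treated as literal text expected in the log line.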

Suggested Queries in Honeycomb 

Just to whet your appetite, we would like to suggest a few graphs to explore with your HAProxy dataset (field names are the ones defined in hny-haproxy.conf above):

  • Slowest endpoints: break down by request, calculate p95(time_total), order by p95(time_total) descending
  • Average connection time per backend server: break down by backend, backend_server, calculate avg(time_backend_conn)

Scrubbing Personally Identifiable Information 

While we believe strongly in the value of being able to track down the precise query causing a problem, we understand the concerns of exporting log data, which may contain sensitive user information.

With that in mind, we recommend running honeytail’s nginx parser with a --scrub_field=sensitive_field_name flag to hash the value of sensitive_field_name, or a --drop_field=sensitive_field_name flag to drop the field altogether so that it is never sent to Honeycomb’s servers.

More information about dropping or scrubbing sensitive fields can be found in honeytail’s documentation.
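To illustrate the difference between the two flags, here is a small Python sketch of what scrubbing and dropping mean for a single event. The use of SHA-256 and the helper names are assumptions made for illustration; honeytail's actual hashing may differ.

```python
import hashlib

def scrub_field(event, name):
    """Return a copy of event with the named field replaced by a hash."""
    out = dict(event)
    if name in out:
        # Assumption: a SHA-256 hex digest stands in for honeytail's hash.
        out[name] = hashlib.sha256(str(out[name]).encode("utf-8")).hexdigest()
    return out

def drop_field(event, name):
    """Return a copy of event without the named field."""
    return {k: v for k, v in event.items() if k != name}

event = {"remote_ip": "207.46.1.2", "status_code": "200"}
scrubbed = scrub_field(event, "remote_ip")
dropped = drop_field(event, "remote_ip")
print(scrubbed)
print(dropped)
```

Because the hash is deterministic, identical values still hash to the same string, so a scrubbed column can still be used to group events even though the original value is not recoverable from Honeycomb.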

Parsing URL Patterns 

honeytail can break URLs up into their component parts, storing extra information in additional columns. This behavior is turned on by default for the request field on nginx datasets, but can become more useful with a little bit of guidance from you.

See [honeytail’s documentation](/send-data/logs/structured/honeytail/#parsing-url-patterns) for details on configuring our agent to parse URL strings.
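As a rough picture of what "component parts" means here, the Python sketch below splits a request line into pieces using the standard library. The request line and the column names are hypothetical; the columns honeytail actually produces are described in its documentation.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical request line, in the shape captured by the $request label.
request = "GET /index.html?lang=en&page=2 HTTP/1.1"
method, target, protocol = request.split(" ")
url = urlsplit(target)

# Illustrative column names only, not honeytail's real output columns.
columns = {
    "method": method,
    "path": url.path,
    "query": parse_qs(url.query),
    "protocol": protocol,
}
print(columns)
```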