Getting Other Webserver Logs into Honeycomb | Honeycomb

Getting Other Webserver Logs into Honeycomb

Our agent’s nginx parser can easily be adapted to parse other web servers’ logs. You create a config file that contains the log format of your web server and pass it to the nginx parser.

As an example, this page describes how to consume HAProxy and Apache logs using the honeytail nginx parser.

Overview

To use the nginx parser to consume a non-nginx log file, we will create a config file that looks something like an nginx config and use it to define the log format. We’ll then run honeytail on the log using the config file containing the format. The config file will have one statement, log_format name '<format>'; (possibly broken up into multiple lines). The format is a series of labels identifying each field; the character following each label is the field separator. For example, to collect the HAProxy process name, pid, IP address, and port from a log snippet that begins haproxy[291] and continues with the client’s ip:port pair, you would use $process[$pid] $ip:$port as your format string. You can choose the label names yourself, subject to the restrictions described under “Case and allowed characters”; they will be used as the column names in Honeycomb.
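To make the label-and-separator idea concrete, here is a minimal Python sketch of how a format string like the one above could be turned into a pattern that splits a line into named fields. It is an illustration only, not honeytail’s actual implementation: the function name format_to_regex is hypothetical, the address 192.0.2.10:51234 is a made-up documentation value, and the sketch treats every separator character literally.

```python
import re

def format_to_regex(log_format):
    """Build a regex with one named group per $label in log_format.

    Hypothetical helper for illustration; not part of honeytail.
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\$([a-z_]+)", log_format):
        # Literal text between two labels acts as the field separator.
        parts.append(re.escape(log_format[pos:m.start()]))
        # Non-greedy capture: the field ends where the next separator begins.
        parts.append("(?P<%s>.*?)" % m.group(1))
        pos = m.end()
    parts.append(re.escape(log_format[pos:]))
    return re.compile("^" + "".join(parts) + "$")

pattern = format_to_regex("$process[$pid] $ip:$port")
# The IP address and port below are invented sample values.
fields = pattern.match("haproxy[291] 192.0.2.10:51234").groupdict()
print(fields)
```

Each label becomes a key in the resulting dictionary, which mirrors how the labels end up as column names in Honeycomb.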

Below are two examples: HAProxy’s HTTP log format and the default Apache log format.

You will likely have to tailor these examples to your specific config depending on the version of the web server you’re running and other options you may have in their configs.

HAProxy http format

HAProxy’s HTTP log format packs a wealth of detail into a very compact form. Here’s a sample log line (from the HAProxy docs):

Feb  6 12:12:56 localhost \
  haproxy[14389]: [06/Feb/2009:12:14:14.655] http-in static/srv1 \
  10/0/30/69/109 200 2750 - - ---- 1/1/1/1/0 0/0 {} {} \
  "GET /index.html HTTP/1.1"

Here’s the description of those fields (again, from the HAProxy docs):

  Field   Format                                Extract from the example above
      1   process_name '[' pid ']:'                            haproxy[14389]:
      2   client_ip ':' client_port                   
      3   '[' accept_date ']'                       [06/Feb/2009:12:14:14.655]
      4   frontend_name                                                http-in
      5   backend_name '/' server_name                             static/srv1
      6   Tq '/' Tw '/' Tc '/' Tr '/' Tt*                       10/0/30/69/109
      7   status_code                                                      200
      8   bytes_read*                                                     2750
      9   captured_request_cookie                                            -
     10   captured_response_cookie                                           -
     11   termination_state                                               ----
     12   actconn '/' feconn '/' beconn '/' srv_conn '/' retries*    1/1/1/1/0
     13   srv_queue '/' backend_queue                                      0/0
     14   '{' captured_request_headers* '}'                                 {}
     15   '{' captured_response_headers* '}'                                {}
     16   '"' http_request '"'                      "GET /index.html HTTP/1.1"

Here’s the config snippet used to match that log line. There are two date fields; we’re going to use the one in square brackets [] because it’s easier to parse and it’s in the correct format (d/m/y:h:m:s.sss). We’ll stub out the syslog-provided date at the beginning of the line by using dots (to match any character). We can split up the log_format line into multiple lines for easier editing. Make sure the last line ends with a semicolon. For this example, let’s call this file hny-haproxy.conf.

log_format haproxy '... .. ..:..:.. $hostname $process[$pid]: '
    '$client_ip:$client_port [$time_local] $frontend $backend/$backend_server '
    '$time_client_connect/$time_queued/$time_backend_conn/$time_backend_resp/$time_total '
    '$status_code $bytes_read $request_cookie $response_cookie $termination_state '
    '$act_conn/$fe_conn/$be_conn/$srv_conn/$retries $srv_queue/$backend_queue '
    '{$request_headers} {$response_headers} "$request"';

To use this config, you’d run our agent honeytail like this:

honeytail \
    -k YOUR_API_KEY \
    -p nginx \
    -d haproxy \
    -f /path/to/haproxy.log \
    --nginx.conf hny-haproxy.conf \
    --nginx.format haproxy

Apache log format

Apache’s configuration can truly go as far as you want to take it. For this example, let’s just stick with the default log format.

Here’s an example line (split into two for readability):

- - [03/Nov/2016:16:11:43 -0700] "GET /robots.txt HTTP/1.1" 200 334 \
  "-" "Mozilla/5.0 (compatible; bingbot/2.0; +"

There’s not nearly as much there as in the HAProxy log, but let’s pull out what we can, taking a hint from the Apache docs to decipher the fields. Let’s call this file hny-apache.conf.

log_format apache '$remote_ip $identd $user [$time_local] "$request" $status_code '
  '$bytes_sent "$referrer" "$user_agent"';

To use this config, you’d run our agent honeytail like this:

honeytail \
    -k YOUR_API_KEY \
    -p nginx \
    -d apache \
    -f /path/to/apache/access.log \
    --nginx.conf hny-apache.conf \
    --nginx.format apache

Details on the log format

Timestamps

The nginx parser can only interpret timestamps in one of the two formats that nginx itself uses. The field in the log format description must be named correctly in order for Honeycomb to use the timestamp for the event instead of considering it a normal string field.

  • $time_local: Time in the Common Log Format, e.g. 06/Feb/2009:12:14:14.655
  • $time_iso8601: Time in the ISO 8601 standard format, e.g. 2009-02-06T12:14:14+00:00
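As a quick way to check whether a timestamp in your own logs has one of these two shapes, the following Python sketch parses the examples above with the standard library. It only illustrates the layouts; honeytail does its own parsing, and the strptime patterns here are our reading of the examples rather than anything taken from the agent.

```python
from datetime import datetime

# $time_local shape, with fractional seconds as in the HAProxy example.
local = datetime.strptime("06/Feb/2009:12:14:14.655", "%d/%b/%Y:%H:%M:%S.%f")

# $time_local shape with a UTC offset, as in the Apache example.
local_tz = datetime.strptime("03/Nov/2016:16:11:43 -0700", "%d/%b/%Y:%H:%M:%S %z")

# $time_iso8601 shape.
iso = datetime.fromisoformat("2009-02-06T12:14:14+00:00")

print(local.isoformat())
print(local_tz.isoformat())
print(iso.isoformat())
```

If a timestamp from your log fails to parse with patterns like these, it probably needs a different field name so that it is kept as a plain string.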

Case and allowed characters

The field names in the log_format specification are restricted: they may contain only characters in the set [a-z_]. In other words, they:

  • Must be all lower case
  • Must not contain spaces
  • May only contain letters and underscores (no numbers or other symbols)

Any other characters will be considered a field delimiter in the log format.
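The rule can be sketched in a few lines of Python. This is an illustration of the restriction as stated above, assuming a label is simply the longest run of [a-z_] after a dollar sign; the helper names are hypothetical and this is not honeytail’s code.

```python
import re

LABEL = re.compile(r"\$([a-z_]+)")

def is_valid_label(name):
    """True if name uses only lower-case letters and underscores."""
    return re.fullmatch(r"[a-z_]+", name) is not None

def labels_in(log_format):
    """Return the labels as the [a-z_] rule would read them."""
    return LABEL.findall(log_format)

print(is_valid_label("client_ip"))
print(is_valid_label("time2"))
# The digit ends the label, so "$time2" is read as the label "time"
# followed by a literal "2" acting as a delimiter.
print(labels_in("$time2 $bytes_sent"))
```

A name such as time2 therefore does not produce a column called time2; it is worth checking your labels before running the agent.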

Suggested queries in Honeycomb

Just to whet your appetite, we’d like to suggest a few graphs to explore with your haproxy dataset:

  • Slowest endpoints: breakdown by request, calculate p95(time_total), order by p95(time_total) descending
  • Average connection time per backend server: breakdown by backend, backend_server, calculate avg(time_backend_conn)

Scrubbing personally identifiable information

While we believe strongly in the value of being able to track down the precise query causing a problem, we understand the concerns of exporting log data which may contain sensitive user information.

With that in mind, we recommend using honeytail’s nginx parser with the --scrub_field=sensitive_field_name flag to hash the value of sensitive_field_name, or the --drop_field=sensitive_field_name flag to drop the field altogether and prevent it from being sent to Honeycomb’s servers.

More information about dropping or scrubbing sensitive fields can be found here.
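To show the difference between the two options, here is a small Python sketch of what scrubbing and dropping do to a parsed event. It is a conceptual illustration only: the function names are hypothetical, the event values are made up, and SHA-256 is an assumption, since this page does not say which hash honeytail applies.

```python
import hashlib

def scrub_field(event, name):
    """Return a copy of event with event[name] replaced by a hash.

    SHA-256 is assumed here purely for illustration.
    """
    out = dict(event)
    if name in out:
        out[name] = hashlib.sha256(str(out[name]).encode("utf-8")).hexdigest()
    return out

def drop_field(event, name):
    """Return a copy of event without the named field."""
    return {k: v for k, v in event.items() if k != name}

# Invented sample event; the address is a documentation-only value.
event = {"remote_ip": "192.0.2.10", "status_code": "200"}
scrubbed = scrub_field(event, "remote_ip")
dropped = drop_field(event, "remote_ip")
print(scrubbed)
print(dropped)
```

Scrubbing keeps the column, so you can still group by it and tell distinct values apart, while dropping removes the column entirely.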

Parsing URL patterns

honeytail can break URLs up into their component parts, storing extra information in additional columns. This behavior is turned on by default for the request field on nginx datasets, but can become more useful with a little bit of guidance from you.

See honeytail’s documentation for details on configuring our agent to parse URL strings.
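As a rough picture of what breaking a request into component parts means, the Python sketch below splits a request string of the kind captured by the request field. The dictionary keys are illustrative and are not honeytail’s column names, the query string is an invented example, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def split_request(request):
    """Split 'METHOD URI PROTOCOL' into parts (illustrative keys only)."""
    method, uri, protocol = request.split(" ", 2)
    url = urlsplit(uri)
    return {
        "method": method,
        "path": url.path,
        "query": parse_qs(url.query),
        "protocol": protocol,
    }

# The ?user=42 query string is a made-up addition to the sample request.
parts = split_request("GET /index.html?user=42 HTTP/1.1")
print(parts)
```

Having the path and query in separate columns makes it possible to group requests by endpoint without the query parameters fragmenting the results.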