Setup
Capturing web logs for Honeycomb requires:
- installing our agent, honeytail
- configuring it to parse your NGINX logs correctly
- launching honeytail
Install the Agent
Download and install the latest honeytail by running:
- deb-amd64
- deb-arm64
- rpm
- bin-linux-amd64
- bin-linux-arm64
- bin-darwin-amd64
- source
Download the honeytail_1.10.0_amd64.deb package. Verify the package. Install the package.

The packages install honeytail, its config file /etc/honeytail/honeytail.conf, and some start scripts.
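The exact commands depend on your package source, but the deb flow might look like this sketch (the download URL lives on the Honeycomb site and is not reproduced here):

```shell
# After downloading honeytail_1.10.0_amd64.deb from the downloads page:
sha256sum honeytail_1.10.0_amd64.deb   # compare against the published checksum to verify
sudo dpkg -i honeytail_1.10.0_amd64.deb
```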
Build honeytail from source if you need it in an unpackaged form or for ad-hoc use.

In /etc/honeytail/honeytail.conf, set:
- ParserName to nginx
- WriteKey to your API key, available from the account page
- LogFiles to the path for the log file you want to ingest. For NGINX, this is typically /var/log/nginx/access.log.
- Dataset to the name of the dataset you wish to create with this log file.
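Putting those options together, the relevant honeytail.conf entries might look like this sketch (the section name and the nginx-access dataset name are assumptions; substitute your own values):

```ini
; Sketch of /etc/honeytail/honeytail.conf -- section name is an assumption
[Required Options]
ParserName = nginx
WriteKey = YOUR_API_KEY
LogFiles = /var/log/nginx/access.log
Dataset = nginx-access
```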
Identify Log Locations + Formats
Make sure to run through Optional Configuration below before running honeytail, in order to get the richest metadata out of your web traffic and into your logs.
In addition to the standard configuration captured in /etc/honeytail/honeytail.conf, you will want to set the two options in the Nginx Parser Options section:
- ConfigFile: the path to your NGINX config file, or whichever part of it contains the definition for the log format
- LogFormatName: the name of the log format used to produce the NGINX access log file
For example, say your NGINX config file lives at /etc/nginx/nginx.conf and has the following snippet:
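A log_format definition of that shape might look like this sketch (field list abbreviated):

```nginx
http {
    log_format my_favorite_format '$remote_addr - $remote_user [$time_local] '
                                  '"$request" $status $bytes_sent';
    access_log /var/log/nginx/access.log my_favorite_format;
}
```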
ConfigFile should be set to /etc/nginx/nginx.conf and your LogFormatName value should be set to my_favorite_format.
Alternatively, configure honeytail to read the NGINX logs using command-line parameters:
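A one-shot invocation might look like this sketch (--nginx.conf is assumed to be the flag for the parser's config-file option, and the nginx-access dataset name is an example):

```shell
honeytail --parser=nginx \
  --writekey=YOUR_API_KEY \
  --file=/var/log/nginx/access.log \
  --dataset=nginx-access \
  --nginx.conf=/etc/nginx/nginx.conf \
  --nginx.format=my_favorite_format
```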
Launch the Agent
Start up a honeytail process using upstart or systemd, or by launching the process by hand.
- upstart
- systemd
- manual
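Assuming the start scripts installed by the packages, launching might look like the following sketch (the -c config-file flag for the manual invocation is an assumption):

```shell
# systemd
sudo systemctl start honeytail

# upstart
sudo start honeytail

# manual, reading the standard config file (-c flag assumed)
honeytail -c /etc/honeytail/honeytail.conf
```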
Backfilling Archived Logs
In addition to getting current logs flowing, you can backfill old logs into Honeycomb to kickstart your dataset. By running honeytail from the command line, you can import old logs separately from tailing your current logs.
Adding the --backfill flag to honeytail adjusts a number of settings to make it appropriate for backfilling old data, such as stopping when it gets to the end of the log file instead of the default behavior of waiting for new content (like tail).
The specific locations on your system may vary from ours, but once you fill in your system’s values instead of our examples, you can backfill using this command:
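A backfill invocation might look like this sketch (the rotated-log path and the nginx-access dataset name are examples):

```shell
honeytail --parser=nginx \
  --writekey=YOUR_API_KEY \
  --file=/var/log/nginx/access.log.1 \
  --dataset=nginx-access \
  --nginx.conf=/etc/nginx/nginx.conf \
  --nginx.format=my_favorite_format \
  --backfill
```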
honeytail does not unzip log files, so you will need to uncompress them before backfilling.
The easiest way is to pipe the uncompressed logs to STDIN: zcat *.gz | honeytail --file - --backfill --all-the-other-flags.

Troubleshooting
Check out honeytail Troubleshooting for debugging tips.
Optional Configuration
Nginx logs can be an incredibly powerful, high-level view of your system, especially if they are configured correctly and enriched with custom, application-specific information about each request. Below are two simple ways to pack those logs with more useful metadata.

Missing Default Options
Nginx comes with some fairly powerful optional log fields that are not included by default. This is the log_format we recommend for any configuration file (note the extra quotes around some fields):
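Based on the field list below, such a log_format might look like this sketch. (The combined name follows the text below; note that nginx rejects redefining its built-in combined format, so in practice you may need to pick a different name.)

```nginx
log_format combined '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $bytes_sent $request_length $request_time '
                    '"$http_referer" "$http_user_agent" '
                    '"$http_x_forwarded_for" "$http_x_forwarded_proto" '
                    '"$http_authorization" $host $server_name $request_id';
```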
These fields are not part of the default access_log line, but by defining a log_format (combined, in the example above) and specifying the format name (--nginx.format=combined), you will be able to take advantage of all of these additional fields.
Make sure that all fields that start with $http_ are quoted in your log_format:
- $bytes_sent: the size of the response sent back to the client, including headers
- $host: the requested Host header, identifying how your server was addressed
- $http_authorization: authorization headers, for associating logs with individual users (must be quoted)
- $http_referer: the referring site, if the client followed a link to your site (must be quoted)
- $http_user_agent: the User-Agent header, useful in identifying your clients (must be quoted)
- $http_x_forwarded_for: the origin IP address, if running behind a load balancer (must be quoted)
- $http_x_forwarded_proto: the origin protocol, if terminating TLS in front of nginx (must be quoted)
- $remote_addr: the IP address of the host making the connection to nginx
- $remote_user: the user name supplied when/if using basic authentication
- $request_id: an nginx-generated unique ID for every request (only available in nginx version 1.11 and later)
- $request_length: the length of the client's request, including headers and body
- $request_time: the time (in seconds, with millisecond resolution) the server took to respond to the request
- $request: the HTTP method, request path, and protocol version
- $server_name: the hostname of the machine accepting the request
- $status: the HTTP status code returned for this request
Embedding Custom Response Headers
Nginx can also be configured to extract custom request and response headers. Of the two, response headers are the more powerful in this case: they can carry application-specific IDs or timers back through to the nginx log. Having all of the information pertinent to a single request available in a single log line can be an incredibly powerful tool in diagnosing the origin of a problem in your system. To include a specific response header in your access.log, add an $upstream_http_ variable to your log_format; the response header values will be written out and ingested by our nginx parser!
Make sure to put quotes around these variables to capture any embedded spaces.
For example, an X-RateLimit-Remaining header can be output by adding $upstream_http_x_ratelimit_remaining to the log_format line.
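That example, sketched as a log_format (other fields abbreviated):

```nginx
log_format my_favorite_format '$remote_addr - $remote_user [$time_local] '
                              '"$request" $status $bytes_sent '
                              '"$upstream_http_x_ratelimit_remaining"';
```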
See the nginx docs for more about extracting metadata from the HTTP response or request.
As with other fields which may output strings (for example, $http_user_agent), be careful when logging strings: add an extra set of double quotes around values which might contain spaces, in order to ensure correct parsing.
A final trick: sometimes, response headers may be set for logging that should not be exposed back to the user.
In this case, the proxy_hide_header directive may be used to strip out specific headers by name:
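A sketch of that configuration (the app_backend upstream name is hypothetical):

```nginx
location / {
    proxy_pass http://app_backend;
    # $upstream_http_x_ratelimit_remaining remains available to the log_format,
    # but the header is stripped from the response sent back to the client
    proxy_hide_header X-RateLimit-Remaining;
}
```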
Scrubbing Personally Identifiable Information
While we believe strongly in the value of being able to track down the precise query causing a problem, we understand the concerns of exporting log data which may contain sensitive user information. With that in mind, we recommend using honeytail's nginx parser, but adding a --scrub_field=sensitive_field_name flag to hash the concrete sensitive_field_name value, or --drop_field=sensitive_field_name to drop it altogether and prevent it from being sent to Honeycomb's servers.
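For example, hashing the remote_user field (chosen here purely as an illustration) might look like this sketch:

```shell
honeytail --parser=nginx \
  --writekey=YOUR_API_KEY \
  --file=/var/log/nginx/access.log \
  --dataset=nginx-access \
  --scrub_field=remote_user
```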
Find more information about dropping or scrubbing sensitive fields.
Parsing URL Patterns
honeytail can break URLs up into their component parts, storing extra information in additional columns.
This behavior is turned on by default for the request field on nginx datasets, but can become more useful with a little bit of guidance from you.
See honeytail’s documentation for details on configuring our agent to parse URL strings.