We have written a lightweight tool called honeytail. Honeytail will tail your existing log files, parse the content, and send it up to Honeycomb.
If you already have structured data in an existing log, this is the easiest method to get that data into Honeycomb.
The quality of your dataset within Honeycomb depends entirely upon the quality of the data going into the log file. To get the most useful insight out of Honeycomb, provide high quality data in your log file: include as much detail about each event as you can, and always add some host-level information to give each event context, such as the host on which the log exists.
Honeytail is designed to run as a daemon so that it can continuously consume new content as it appears in the log files as well as detect when a log file rotates. It must be configured with your team API key and the name of the Dataset to which you want to write data. You specify one of the available parser modules depending on how your log data is structured. Once running, honeytail will take care of uploading all the data in your log file and picking up new data as it comes in.
Honeytail is open source—we encourage auditing the software you will run on your servers. We also happily consider pull requests with new log format parsers and other improvements.
To see an example of Honeytail in action, try out the Honeytail-Dockerd Example App.
honeytail will tail existing log files, parse the content, and send it up to Honeycomb. You can view its source here.
Download and install the latest honeytail by running:
# Download and install the AMD64 debian package
wget -q https://honeycomb.io/download/honeytail/v1.2.0/honeytail_1.2.0_amd64.deb && \
echo '0406ea25b9d1bcffb6a1b7067deebc79fa8006ca40de13d5e01661f4b5fef598  honeytail_1.2.0_amd64.deb' | sha256sum -c && \
sudo dpkg -i honeytail_1.2.0_amd64.deb
# Download and install the ARM64 debian package
wget -q https://honeycomb.io/download/honeytail/v1.2.0/honeytail_1.2.0_arm64.deb && \
echo 'db18bc5f7f3b62e01323f6f3def9eb56f485557f32d4331bc692c2d42dd63c04  honeytail_1.2.0_arm64.deb' | sha256sum -c && \
sudo dpkg -i honeytail_1.2.0_arm64.deb
# Download and install the rpm package
wget -q https://honeycomb.io/download/honeytail/v1.2.0/honeytail-1.2.0-1.x86_64.rpm && \
echo 'a53e5f9071f7b3f0a6cbee83f523b19d6d68c6961d2fbad7ecfc23567056e97f  honeytail-1.2.0-1.x86_64.rpm' | sha256sum -c && \
sudo rpm -i honeytail-1.2.0-1.x86_64.rpm
# Download the Linux AMD64 standalone binary
wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.2.0/honeytail-linux-amd64 && \
echo 'd830774a620f6ecc4b39898bd5349f75fc86f4852314f42e0ac80c0e8b735677  honeytail' | sha256sum -c && \
chmod 755 ./honeytail
# Download the Linux ARM64 standalone binary
wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.2.0/honeytail-linux-arm64 && \
echo '99097a7c8f57a9639572bddc118c0d23470726beffc280a7843498d3a4aedf3b  honeytail' | sha256sum -c && \
chmod 755 ./honeytail
# Download the macOS (darwin) AMD64 standalone binary
wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.2.0/honeytail-darwin-amd64 && \
echo '5d14ed3535efaa678657d74a0cfbe50e3aada778fa1a6bbd32159ed8e692ca2f  honeytail' | shasum -a 256 -c && \
chmod 755 ./honeytail
# Build from latest source after setting up go
git clone https://github.com/honeycombio/honeytail
cd honeytail; go install
The packages install honeytail, its config file /etc/honeytail/honeytail.conf, and some start scripts. Build honeytail from source if you need it in an unpackaged form or for ad-hoc use.
An example Dockerfile is also available on GitHub.
You should modify the config file and uncomment and set:

- ParserName to the appropriate one of json, nginx, mongo, mysql, arangodb, or regex
- WriteKey to your API key, available from the account page
- LogFiles to the path for the log file you want to ingest, or - for stdin
- Dataset to the name of the dataset you wish to create with this log file

The docs pages for JSON, NGINX, MongoDB, MySQL, and regex have more detail on additional options to set for each parser. The other available options are all described in the config file and below.
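For reference, a minimal sketch of /etc/honeytail/honeytail.conf with just these required options set (the parser, log path, and dataset name here are illustrative; substitute your own values):
[Required Options]
; Parser module to use. Use --list to list available options.
ParserName = json
; Team API key
WriteKey = YOUR_API_KEY
; Log file(s) to parse. Use '-' for STDIN, use this flag multiple times to tail multiple files, or use a glob (/path/to/foo-*.log)
LogFiles = /var/log/app/myapp.log
; Name of the dataset
Dataset = My App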
Launch honeytail by hand with honeytail -c /etc/honeytail/honeytail.conf, or using the standard sudo initctl start honeytail (upstart) or sudo systemctl start honeytail (systemd) commands.
honeytail will automatically start back up after rebooting your system. To disable this, put the word manual in /etc/init/honeytail.override (upstart) or run systemctl disable honeytail (systemd).
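For example, assuming the packaged start scripts are installed, either of the following disables the automatic start:
# upstart: create an override file containing the word "manual"
echo manual | sudo tee /etc/init/honeytail.override
# systemd: disable the unit
sudo systemctl disable honeytail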
Start up a honeytail process using upstart or systemd, or by launching the process by hand. This will tail the log file specified in the config and leave the process running as a daemon.
$ sudo initctl start honeytail
$ sudo systemctl start honeytail
$ honeytail -c /etc/honeytail/honeytail.conf
Note: In order to start successfully sending data, you will need to update the config file included with these packages to specify the parser (e.g., nginx), its associated options (such as where the log files are found), and the API key.
Note: We enforce a rate limit in order to protect our servers from abuse. This can be raised on a case-by-case basis; please contact us to lift your limit.
Honeytail can handle the following formats: JSON, NGINX access logs, MongoDB logs, MySQL slow query logs, ArangoDB logs, and arbitrary line-based formats via the regex parser.
If you have events in older log files you’d like to load into Honeycomb, use honeytail with the --backfill option.
Note: honeytail does not unzip log files, so you’ll need to do this before backfilling.
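For example, if your rotated logs are gzipped (the path below is illustrative), decompress them before running the backfill:
gunzip /var/log/app/myapp.log.*.gz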
Here’s an example honeytail invocation to pull in multiple existing logs and as much of the current log as possible.
honeytail \
-c /etc/honeytail/honeytail.conf \
--file=/var/log/app/myapp.log.* \
--file=/var/log/app/myapp.log \
--backfill
Let’s break down the various parts of this command.
- --parser=json (in this example, set via ParserName = json in the config file): for the purposes of this example, all logs are already JSON formatted. Take a look at the timestamp section of the JSON connector to make sure your historical logs have their times interpreted correctly.
- --file=/var/log/app/myapp.log.*: honeytail understands file globs and will ingest all of the matching files in series.
- --file=/var/log/app/myapp.log: specify --file (or its short form, -f) as many times as necessary to include additional files that don’t match a glob. This ingests as much of the current file as exists.
- --backfill: this flag tells honeytail to read the specified files in their entirety, stop when finished reading, and respond to rate-limited responses (HTTP 429) by slowing down the rate at which it sends events.

Honeytail will read all the content in all the old logs and then stop. When it finishes, you’re ready to send new log lines. By default, honeytail will keep track of its progress through a file, and if interrupted, will pick back up where it left off. When you then launch honeytail pointing at the main app log, it will find the state file it created while reading in the backlog and start where it left off.
Here’s the second honeytail invocation, where it will tail the current log file and send in recent entries:
honeytail \
--writekey=YOUR_API_KEY \
--parser=json \
--dataset='My App' \
--file=/var/log/app/myapp.log
Below, find some general debugging tips when trying to send data to Honeycomb. As always, we’re happy to help with any additional problems you might have.
“Datasets” are created when we first begin receiving data under a new “Dataset Name” (used/specified by all of our SDKs and agents).
If you don’t see an expected dataset yet, our servers most likely haven’t yet received anything from you.
To figure out why, the simplest step is to add a --debug flag to your honeytail call. This should output information about whether lines are being parsed, whether they are failing to send to our servers, or whether honeytail is receiving any input at all.
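For example, a one-off debug run against the packaged config might look like:
honeytail -c /etc/honeytail/honeytail.conf --debug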
Another useful thing to try may be to add --status_interval=1 to your flags, which will output a line like the one below each second (newlines added for legibility):
INFO[0002] Summary of sent events avg_duration=295.783µs
count_per_status=map[400:10]
errors=map[]
fastest=259.689µs
response_bodies=map[request body is too large:10]
slowest=348.297µs
total=10
The total here is the number of events sent to Honeycomb; the rest are stats characterizing how those events were sent and received. (A total=0 value would clue us into the fact that honeytail just isn’t sending any events at all.) In the line above, we see that events were, in fact, invalid and being rejected by the server.
When using honeytail, the --dataset (-d for short) argument will determine the name of the dataset created on Honeycomb’s servers. If you’re writing into an existing dataset, the quickest way to check for new data is to run a COUNT query over the last 30 minutes:
If you don’t see your new events appear, try the --debug or --status_interval=1 flags (change 1 to 5 to see the summary every 5 seconds).
honeytail doesn’t seem to be progressing on my log file
Are you trying to send data from an existing file? honeytail’s default behavior is to watch files and process newly-appended data. If you’re attempting to send data from an existing file, make sure to use the --backfill flag: this makes sure honeytail begins reading the file from the beginning and exits when finished.
Our JSON parser makes a best-effort attempt to parse and understand timestamps in your JSON logs. Take a look at the Timestamp parsing section of the JSON docs to see timestamp formats understood by default.
If you suspect your timestamp format is unconventional, or the time field is keyed by an unconventional field name, providing --json.timefield and --json.format arguments will nudge honeytail in the right direction.
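As a sketch, assuming your events carry their timestamp in a field named ts (a hypothetical field name) and that the format string below matches your logs (illustrative; see the JSON parser docs for the exact format syntax):
honeytail \
  --writekey=YOUR_API_KEY \
  --parser=json \
  --dataset='My App' \
  --file=/var/log/app/myapp.log \
  --json.timefield='ts' \
  --json.format='%Y-%m-%dT%H:%M:%SZ'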
Let’s say you have an incredible volume of log content and your website gets hit frequently enough that you will still get excellent data quality even if you’re only looking at 1/20th the traffic. Honeytail can sample the log file and for each 20 lines, only send one of them. It does so randomly, so you won’t see every 20th line being sent - instead each line will have a 5% chance of being sent.
When these log lines reach Honeycomb, they will include metadata indicating that each one represents 20 similar lines, so all your graphs will show accurate total counts.
honeytail \
--writekey=YOUR_API_KEY \
--dataset='Webtier' \
--parser=nginx \
--file=/var/log/nginx/access.log \
--samplerate 20 \
--nginx.conf /etc/nginx/nginx.conf \
--nginx.format main
Adjusting the sample rate based on the content of your events can allow you to keep important infrequent events while discarding less important higher volume traffic. Honeytail has a dynamic sampler that will vary the sample rate based on the contents of the fields of your choice - more frequent occurrences of the content of the field will be sampled more heavily.
For example, suppose that successful web traffic (HTTP status codes in the 200 range) is much more frequent than errored traffic (status codes in the 500s) - you might want to discard more of the successful traffic and keep more of the errored traffic. Applying the dynamic sampler to the status field in your nginx traffic will have this effect. The actual sample rate applied will vary based on the cardinality of the chosen field and the frequency of each value, but it will be in the ballpark of the samplerate specified.
honeytail \
--writekey=YOUR_API_KEY \
--dataset='Webtier' \
--parser=nginx \
--file=/var/log/nginx/access.log \
--samplerate 20 \
--nginx.conf /etc/nginx/nginx.conf \
--nginx.format main \
--dynsampling status
You can specify the --dynsampling flag multiple times, and it will sample traffic based on the frequency and uniqueness of the concatenation of all the values of the fields you specify.
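For instance (a sketch), to key the dynamic sampler on the combination of HTTP status and request path, pass the flag twice (request_path here assumes the URL-shaping fields generated for nginx datasets, described further below):
honeytail \
  --writekey=YOUR_API_KEY \
  --dataset='Webtier' \
  --parser=nginx \
  --file=/var/log/nginx/access.log \
  --samplerate 20 \
  --nginx.conf /etc/nginx/nginx.conf \
  --nginx.format main \
  --dynsampling status \
  --dynsampling request_path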
In addition to static and dynamic sampling support, Honeytail also has support
for sampling data deterministically based on the value of a field. This is
useful for making sampling decisions based on properties like a request ID or
trace ID. Approximately 1/N events will be sampled (where N is the sample rate),
and any events which have the same value for the field passed to the
--deterministic_sampling
flag will be sampled consistently. This flag must be
used with --samplerate
to specify the sampling rate.
honeytail \
--writekey=YOUR_API_KEY \
--dataset='Deterministically Sampled Nginx Logs' \
--parser=nginx \
--file=/var/log/nginx/access.log \
--deterministic_sampling request_id \
--samplerate 2
For instance, in the above example, about half of the requests would be sampled, and if another Honeytail instance were running elsewhere with the same settings for --samplerate and --deterministic_sampling, it would sample the same subset of requests.
It’s not unusual for a log to omit interesting information like the name of the machine on which the process is running. After all, you’re on that machine, right? Why would you add the hostname? Log transports like rsyslog will prepend logs with the hostname sending them, but if you’re sending logs from each host, this data may not exist. Honeytail lets you add extra fields to each event sent up to Honeycomb with the --add_field flag.
For this example, let’s assume that you have nginx running as a web server in both your production and staging environments. Your shell sets $ENV with the environment (prod or staging). Here is how to run honeytail to consume your nginx log and insert the hostname and environment along with each log line:
honeytail \
--writekey=YOUR_API_KEY \
--dataset='Webtier' \
--parser=nginx \
--file=/var/log/nginx/access.log \
--nginx.conf /etc/nginx/nginx.conf \
--nginx.format main \
--add_field hostname=$(hostname) \
--add_field env=$ENV
To add fields based on the content of your log file, use the data augmentation flag, --da_map_file. As an example, your log file might contain the IP address of the host connecting to this service, but you would really like to include the hostname in your events. Or your log file contains a user ID and you would like to add the user name and group. If you can build a map of source values to new fields, then you can use the --da_map_file flag to augment your data.
As our example, let’s add a hostname and AWS availability zone to a log file that contains IP addresses. The IP address is stored in a field called ip_addr in the events we’re processing.
The first step is to build a JSON file containing the name of the source column (ip_addr) and a map of values to new fields (e.g., 10.0.0.6 should add a field hostname with the value app21 and aws_az with the value c, 10.0.0.7 has different fields, and so on).
{
"ip_addr": {
"10.0.0.6": { "hostname": "app21", "aws_az": "c" },
"10.0.0.7": { "hostname": "app32", "aws_az": "b" }
}
}
When the log parser comes across an event that has "ip_addr":"10.0.0.6", it will add the two additional fields "hostname":"app21","aws_az":"c". Note that additional source column names may be specified (each with their own translation map) by extending the content of the JSON map file.
The recommended method to deploy this is to generate a map of all the values to the new fields that should be added and distribute it to all the hosts that will be running honeytail. Example use cases include mapping host IP addresses to hostnames and availability zones, or mapping user IDs to user names and groups.
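A sketch of the corresponding invocation, assuming the map above is saved as /etc/honeytail/da_map.json (the path is hypothetical):
honeytail \
  --writekey=YOUR_API_KEY \
  --dataset='My App' \
  --parser=json \
  --file=/var/log/app/myapp.log \
  --da_map_file /etc/honeytail/da_map.json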
Sometimes you will have fields in your log file that you don’t want to send to Honeycomb, or that you want to obscure before letting them leave your servers. For this example, let’s say that your log has a large text field with the contents of an email. It is large enough that you don’t want it sent up to Honeycomb. Also in this log you have some sensitive information, like a person’s birthday. You want to be able to ask questions about the most common birthdays, but you don’t want to expose the actual birthdays outside your infrastructure.
Honeytail has two flags that will help you accomplish these goals. --drop_field will remove a field before sending the event to Honeycomb, and --scrub_field will subject the value of a field to a SHA256 hash before sending it along. You will still be able to do inclusion and frequency analysis on the hashed fields (as there will be a 1-1 mapping of value to hashed value), but the actual value will be obscured.
Here is your honeytail invocation:
honeytail \
--writekey=YOUR_API_KEY \
--dataset='My App' \
--parser=json \
--file=/var/log/app/myapp.log \
--drop_field email_content \
--scrub_field birthday
honeytail config
The honeytail binary supports reading its config from a config file as well as command line arguments. To get started, if you’ve already been using a few command line arguments, add an additional flag: --write_current_config. This will write your current config to STDOUT so you can use it as a starting point.
$ honeytail \
-p mysql \
-k YOUR_API_KEY \
-d YOUR_DATASET \
-f ./mysql-slow.log \
--write_current_config
[Required Options]
; Parser module to use. Use --list to list available options.
ParserName = mysql
; Team API key
WriteKey = YOUR_API_KEY
; Log file(s) to parse. Use '-' for STDIN, use this flag multiple times to tail multiple files, or use a glob (/path/to/foo-*.log)
LogFiles = ./mysql-slow.log
; Name of the dataset
Dataset = YOUR_DATASET
This can be particularly useful for versioning or productionizing honeytail use, or for providing additional configuration when using advanced honeytail features like scrubbing sensitive fields or parsing custom URL structures.
Once the config file is saved, simply run honeytail with a -c argument in lieu of all of the other flags:
$ honeytail \
-p mysql \
-k YOUR_API_KEY \
-d YOUR_DATASET \
-f ./mysql-slow.log \
--scrub_field=field_name_1 --scrub_field=field_name_2 \
--write_current_config > ./scrubbed_mysql.conf
$ honeytail -c ./scrubbed_mysql.conf
honeytail can break URLs up into their component parts, storing extra information in additional columns. This behavior is turned on by default for the request field on nginx datasets, but can become more useful with a little bit of guidance from you.
There are several flags that adjust the behavior of honeytail as it breaks apart URLs.
When using the nginx parser, honeytail looks for a field named request. When using a different parser (such as the JSON parser), you should specify the name of the field that contains the URL with the --request_shape flag. Using this flag creates a few generated fields. Given a request field containing a value like:
GET /alpha/beta/gamma?foo=1&bar=2 HTTP/1.1
… will produce nginx events for Honeycomb that look like:
| field name | value | description |
|---|---|---|
| request | GET /alpha/beta/gamma?foo=1&bar=2 HTTP/1.1 | the full original request |
| request_method | GET | the HTTP method, if it exists |
| request_protocol_version | HTTP/1.1 | the HTTP version string |
| request_uri | /alpha/beta/gamma?foo=1&bar=2 | the unmodified URL (not including the method or version) |
| request_path | /alpha/beta/gamma | just the path portion of the URL |
| request_query | foo=1&bar=2 | just the query string portion of the URL |
| request_shape | /alpha/beta/gamma?foo=?&bar=? | a normalized version of the URL |
| request_pathshape | /alpha/beta/gamma | a normalized version of the path portion of the URL |
| request_queryshape | foo=?&bar=? | a normalized version of the query portion of the URL |
(The generated fields will all be prefixed by the field name specified by --request_shape, request in the above example. Use the --shape_prefix flag to prepend an additional string to these generated fields.)
If the URL field contains just the URL, the request_method and request_protocol_version fields will be omitted.
The path portion of the URL (from the beginning / up to the ? that separates the path from the query) can be grouped by common patterns, as is common for REST interfaces.
For example, given URL fragments like:
/books/978-0812536362
/books/978-9995788940
We can break the fragments into a field containing the generic endpoint (/books/:isbn) and a separate field for the ISBN itself by specifying a --request_pattern flag:
honeytail ... \ # other arguments
--parser=nginx \
--request_pattern=/books/:isbn
This will produce, among other fields:
| request_path | request_shape | request_path_isbn | (other fields) |
|---|---|---|---|
| /books/978-0812536362 | /books/:isbn | 978-0812536362 | … |
| /books/978-9995788940 | /books/:isbn | 978-9995788940 | … |
You can specify multiple --request_pattern flags, and they’ll be considered in order. The first one to match a URL will be used. Patterns should represent the entire path portion of the URL; include a “*” at the end to match arbitrary additional segments.
For example, if we have a wider variety of URL fragments, like:
/books/978-0812536362
/books/978-3161484100/borrow
/books/978-9995788940
/books/978-9995788940/borrow
We can provide our additional --request_pattern flags and track a wider variety of request_shapes:
honeytail ... \ # other arguments
--parser=nginx \
--request_pattern=/books/:isbn/borrow --request_pattern=/books/:isbn
We’ll see our request_path_isbn populated as before, as the :isbn parameter is respected in both patterns:
| request_path | request_shape | request_path_isbn | (other fields) |
|---|---|---|---|
| /books/978-0812536362 | /books/:isbn | 978-0812536362 | … |
| /books/978-3161484100/borrow | /books/:isbn/borrow | 978-3161484100 | … |
| /books/978-9995788940 | /books/:isbn | 978-9995788940 | … |
| /books/978-9995788940/borrow | /books/:isbn/borrow | 978-9995788940 | … |
A URL’s query string can be broken apart similarly with the --request_query_keys flag; the generated fields are named like <field>_query_<keyname>.
If, on top of our previous examples, our URL fragments had query strings like:
/books/978-0812536362?borrower_id=23597
Providing --request_query_keys=borrower_id would return us a Honeycomb event with a request_query_borrower_id field with a value of 23597.
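Concretely, a sketch of the flag added to an invocation (other arguments elided, as in the earlier examples):
honeytail ... \
  --parser=nginx \
  --request_query_keys=borrower_id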
If you would like to automatically create a field for every key in the query string, you can use the flag --request_parse_query=all. This will automatically create a new field <field>_query_<key> for every query parameter encountered in the query string. For any publicly accessible web server, it is likely that this will quickly create many useless columns because of all the random traffic on the internet.
For more detail and examples see our urlshaper package on GitHub.