Getting Unstructured Logs into Honeycomb with Custom Regexes

Installation 

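Download and install the latest honeytail by following the steps for your platform: a Debian package (amd64 or arm64), an RPM package, a standalone binary (Linux or macOS), or a build from source.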

Download the honeytail_1.9.0_amd64.deb package.

wget -q https://honeycomb.io/download/honeytail/v1.9.0/honeytail_1.9.0_amd64.deb

Verify the package.

echo '16bd171d495f73a2dc2c0d2a7eaaa36fc2c57446c9f8e5c5dfe10c1a8442241f  honeytail_1.9.0_amd64.deb' | sha256sum -c

Install the package.

sudo dpkg -i honeytail_1.9.0_amd64.deb

The packages install honeytail, its config file /etc/honeytail/honeytail.conf, and some start scripts. Build honeytail from source if you need it in an unpackaged form or for ad-hoc use.

Download the honeytail_1.9.0_arm64.deb package.

wget -q https://honeycomb.io/download/honeytail/v1.9.0/honeytail_1.9.0_arm64.deb

Verify the package.

echo '70b3ec1a0a748556ef5f1d8decc83d47160ae30f5df55bd3ab666009d4f6dc4b  honeytail_1.9.0_arm64.deb' | sha256sum -c

Install the package.

sudo dpkg -i honeytail_1.9.0_arm64.deb

Download the honeytail-1.9.0-1.x86_64.rpm package.

wget -q https://honeycomb.io/download/honeytail/v1.9.0/honeytail-1.9.0-1.x86_64.rpm

Verify the package.

echo 'f4bc1a5c87e240d641c090f014793098b2e92e0fc58c0c0d0e592ab9e71f717a  honeytail-1.9.0-1.x86_64.rpm' | sha256sum -c

Install the package.

sudo rpm -i honeytail-1.9.0-1.x86_64.rpm

Download the 1.9.0 binary for Linux (amd64).

wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.9.0/honeytail-linux-amd64

Verify the binary.

echo 'bc26b2dc4ae003e18e600c4157121f626f9292201d0b96280c9298df81df1852  honeytail' | shasum -a 256 -c

Set the permissions to allow execution.

chmod 755 ./honeytail

Download the 1.9.0 binary for Linux (arm64).

wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.9.0/honeytail-linux-arm64

Verify the binary.

echo '36d21063e656b380d6435003a56b21007289ab047b157c5618d2b94b6962a0fd  honeytail' | shasum -a 256 -c

Set the permissions to allow execution.

chmod 755 ./honeytail

Download the 1.9.0 binary for macOS (darwin, amd64).

wget -q -O honeytail https://honeycomb.io/download/honeytail/v1.9.0/honeytail-darwin-amd64

Verify the binary.

echo '04de45c55bb340a4c3e54e45b68b2b95ef8cae3fbc0c70d86db75d13d0aaeb2c  honeytail' | shasum -a 256 -c

Set the permissions to allow execution.

chmod 755 ./honeytail

Clone the Honeytail repository.

git clone https://github.com/honeycombio/honeytail

Install from source.

cd honeytail && go install

After installing, edit the config file, then uncomment and set the following options:

  • WriteKey to your API key, available from the account page
  • LogFiles to the path for the log file you want to ingest, or - for stdin
  • Dataset to the name of the dataset you want to create from this log file
  • ParserName to regex
  • LineRegex to a regular expression with named capture groups.

Launch the Agent 

Start a honeytail process with upstart, with systemd, or by hand, using whichever one of the commands below matches your system. Honeytail then tails the log file specified in the config and keeps running as a daemon.

sudo initctl start honeytail
sudo systemctl start honeytail
honeytail -c /etc/honeytail/honeytail.conf

Backfilling Archived Logs 

To backfill existing data, run honeytail with --backfill the first time:

honeytail -c /etc/honeytail/honeytail.conf \
  --file /var/log/myapp/log12.log \
  --backfill

This command can also be used at any point to backfill from older, rotated log files. You can read more about our backfill behavior here.

Note
If you backfill from old logs, remember to switch back to the default streaming behavior afterward so that live logs continue to reach Honeycomb.

Regexes 

Honeytail uses Go's regexp package, which implements RE2 syntax.

Specifying Regexes 

Command line: use the --regex.line_regex flag to tell honeytail how to extract data from a log line.

You must provide at least one regex. You may optionally specify multiple regexes. Lines will be parsed by the first regex to find a match. Precedence is based on the order you pass in line_regex, so specify your regexes from most-specific to least-specific.

On the command line, you will need to wrap the regex in quotes.

honeytail \
    --writekey YOUR_API_KEY \
    --file PATH/FILE.LOG \
    --parser regex \
    --dataset "MY_TEST_SET" \
    --backfill \
    --regex.line_regex="\[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>\w+)" \
    --regex.line_regex="(?P<field1>\w+) (?P<field2>\w+)"

The equivalent configuration file specification is shown below. Do not wrap the regex in quotes here.

[Regex Parser Options]
; a regular expression with named capture groups representing the fields you want parsed
LineRegex = \[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>\w+)
LineRegex = (?P<field1>\w+) (?P<field2>\w+)
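
The first-match rule described above can be illustrated with a short Go sketch that tries the two example patterns in the order they are listed. This is not honeytail's own implementation; parseFirstMatch and the sample log lines are hypothetical and only mirror the documented behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRegexes holds the two example patterns in the order they are passed:
// most specific first, least specific last.
var lineRegexes = []*regexp.Regexp{
	regexp.MustCompile(`\[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>\w+)`),
	regexp.MustCompile(`(?P<field1>\w+) (?P<field2>\w+)`),
}

// parseFirstMatch (hypothetical helper) returns the named groups captured by
// the first pattern that matches line, or nil if no pattern matches.
func parseFirstMatch(line string) map[string]string {
	for _, re := range lineRegexes {
		m := re.FindStringSubmatch(line)
		if m == nil {
			continue // this pattern does not match; try the next one
		}
		fields := make(map[string]string)
		for i, name := range re.SubexpNames() {
			if name != "" { // skip the whole match and unnamed groups
				fields[name] = m[i]
			}
		}
		return fields
	}
	return nil
}

func main() {
	fmt.Println(parseFirstMatch("[22:59:46] started worker"))
	fmt.Println(parseFirstMatch("hello world"))
}
```

The first sample line would also satisfy the second, more general pattern (as field1 and field2), which is why the more specific pattern is listed first.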

Regex Syntax 

Regexes must contain at least one named capture group. Use the (?P<name>re) syntax for named groups. Example:

A log file containing

[2017/11/07 22:59:46] 200 ...
[2017/11/07 22:59:48] 500 ...
[2017/11/07 23:01:02] 404 ...

with

--regex.line_regex="\[(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (?P<status>\d+)"

will yield rows like this:

{
  time: "2017/11/07 22:59:46",
  status: "200"
}
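
Because honeytail relies on Go's regexp package, a candidate pattern can be checked with a few lines of Go before it goes into the config. The sketch below is an illustration under that assumption, not honeytail's own validation code; checkLineRegex is a hypothetical helper that rejects patterns that fail to compile or that contain no named capture group.

```go
package main

import (
	"fmt"
	"regexp"
)

// checkLineRegex (hypothetical helper) compiles pattern using Go's RE2-based
// regexp package and returns an error if the pattern is invalid or contains
// no named capture group.
func checkLineRegex(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	for _, name := range re.SubexpNames() {
		if name != "" {
			return re, nil // at least one named group is present
		}
	}
	return nil, fmt.Errorf("pattern %q has no named capture group", pattern)
}

func main() {
	re, err := checkLineRegex(`\[(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (?P<status>\d+)`)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	if m := re.FindStringSubmatch("[2017/11/07 22:59:46] 200 ..."); m != nil {
		fmt.Println("time:", m[re.SubexpIndex("time")])
		fmt.Println("status:", m[re.SubexpIndex("status")])
	}

	// Unnamed groups alone are not enough.
	_, err = checkLineRegex(`(\d+) (\d+)`)
	fmt.Println("rejected:", err)
}
```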

Nested Regex Grouping 

Nested groups are supported. For example:

--regex.line_regex="(?P<outer>[^ ]* (?P<inner1>[^ ]*) (?P<inner2>[^ ]*))"

will parse a log line “A B C” into { outer: "A B C", inner1: "B", inner2: "C" }.
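
To see how the nested groups relate to each other, the following Go sketch reports the byte range each named group covers in the line "A B C". It uses Go's regexp package directly and is only an illustration; groupSpans is a hypothetical helper, not part of honeytail.

```go
package main

import (
	"fmt"
	"regexp"
)

// nested is the pattern from the example above.
var nested = regexp.MustCompile(`(?P<outer>[^ ]* (?P<inner1>[^ ]*) (?P<inner2>[^ ]*))`)

// groupSpans (hypothetical helper) returns the [start, end) byte offsets of
// each named group in the first match, or nil if line does not match.
func groupSpans(re *regexp.Regexp, line string) map[string][2]int {
	loc := re.FindStringSubmatchIndex(line)
	if loc == nil {
		return nil
	}
	spans := make(map[string][2]int)
	for i, name := range re.SubexpNames() {
		// loc[2*i] is -1 when a group did not participate in the match.
		if name != "" && loc[2*i] >= 0 {
			spans[name] = [2]int{loc[2*i], loc[2*i+1]}
		}
	}
	return spans
}

func main() {
	line := "A B C"
	for name, s := range groupSpans(nested, line) {
		fmt.Printf("%s = %q (bytes %d-%d)\n", name, line[s[0]:s[1]], s[0], s[1])
	}
}
```

The spans of inner1 and inner2 fall inside the span of outer, so the same text is emitted both as part of the outer field and as its own field.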

Timestamp Parsing 

Honeycomb expects all events to contain a timestamp field; if one is not provided, the server will associate the current time of ingest with the given payload.

Use the --regex.timefield and --regex.time_format flags to help honeytail understand where and how to extract the event’s timestamp.

For example, given a log file like the following:

[08/Oct/2015:00:26:26 +0000] 200 174 0.099

A command to consume those log lines (retaining the "local_time" field as the event’s timestamp) would look like:

honeytail \
    --parser=regex \
    --writekey=YOUR_API_KEY \
    --file=server.log  \
    --dataset='MY_DATASET' \
    --backfill \
    --regex.line_regex=SOME_REGEX \
    --regex.timefield="local_time" \
    --regex.time_format="%d/%b/%Y:%H:%M:%S %z"

The --regex.timefield="local_time" argument tells honeytail to consider the "local_time" value to be the canonical timestamp for the events in the specified file.

The --regex.time_format argument specifies the timestamp format to be used while parsing. (It understands common strftime formats.)
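
As a rough illustration of what the timefield and time_format settings describe, the Go sketch below captures local_time from the sample line and parses it. Several pieces are assumptions: the pattern stands in for SOME_REGEX and is hypothetical (the status group name is invented), extractTimestamp is a hypothetical helper, and the Go layout string is a hand-written equivalent of the strftime format. How honeytail handles the strftime format internally is not covered here.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// lineRe is a hypothetical stand-in for SOME_REGEX. Only the local_time
// group name comes from the example command.
var lineRe = regexp.MustCompile(`\[(?P<local_time>[^\]]+)\] (?P<status>\d+)`)

// goLayout is a hand-translated Go equivalent of the strftime format
// %d/%b/%Y:%H:%M:%S %z (an assumption made for this sketch).
const goLayout = "02/Jan/2006:15:04:05 -0700"

// extractTimestamp (hypothetical helper) pulls local_time out of line and
// parses it into a time.Time.
func extractTimestamp(line string) (time.Time, error) {
	m := lineRe.FindStringSubmatch(line)
	if m == nil {
		return time.Time{}, fmt.Errorf("line does not match: %q", line)
	}
	return time.Parse(goLayout, m[lineRe.SubexpIndex("local_time")])
}

func main() {
	ts, err := extractTimestamp("[08/Oct/2015:00:26:26 +0000] 200 174 0.099")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.UTC().Format(time.RFC3339))
}
```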