Getting unstructured logs into Honeycomb with custom regexes

Installation

Download and install the latest honeytail by running:

wget -q https://honeycomb.io/download/honeytail/linux/honeytail_1.762_amd64.deb && \
      echo 'd7bed8a005cbc6a34b232c54f0f84b945f0bb90905c67f85cceaedee9bbbad1e  honeytail_1.762_amd64.deb' | sha256sum -c && \
      sudo dpkg -i honeytail_1.762_amd64.deb

The package installs honeytail, its config file /etc/honeytail/honeytail.conf, and some start scripts. The honeytail binary is also available on its own if you need it in unpackaged form or for ad-hoc use.

You should modify the config file, uncommenting and setting the required options (the parser to use, your write key, the log files to tail, and the dataset name).
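
For example, the uncommented entries might end up looking like the sketch below. The option names shown here are illustrative; the installed config file documents the exact names in its comments.

[Required Options]
; Parser module to use (regex, in this case)
ParserName = regex
; Your Honeycomb API key
WriteKey = YOUR_API_KEY
; Log file(s) to tail
LogFiles = PATH/FILE.LOG
; Name of the dataset to send events to
Dataset = MY_TEST_SET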

Launch the agent

Start a honeytail process using upstart or systemd, or launch the process by hand. This will tail the log file specified in the config and leave the process running as a daemon.

$ sudo initctl start honeytail
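
On hosts that use systemd rather than upstart, the equivalent would be something like the following (this assumes the unit installed by the package is named honeytail):

$ sudo systemctl start honeytail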

Backfilling archived logs

To backfill existing data, run honeytail with --backfill the first time:

honeytail -c /etc/honeytail/honeytail.conf \
  --file /var/log/myapp/log12.log \
  --backfill

This command can also be used at any point to backfill from older, rotated log files. You can read more about our backfill behavior here.

Note: If you’ve chosen to backfill from old logs, don’t forget to transition back to the default streaming behavior to send live logs to Honeycomb!

Regexes

We use golang’s regexp package, which uses RE2 syntax.

Specifying regexes

Command line: use the --regex.line_regex flag to tell honeytail how to extract data from a log line.

You must provide at least one regex, and you may specify multiple. Each line is parsed by the first regex that matches it. Precedence follows the order in which you pass --regex.line_regex flags, so specify your regexes from most specific to least specific.

On the command line, you’ll need to wrap the regex in quotes.

honeytail \
    --writekey YOUR_API_KEY \
    --file PATH/FILE.LOG \
    --parser regex \
    --dataset "MY_TEST_SET" \
    --backfill \
    --regex.line_regex="\[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>\w+)" \
    --regex.line_regex="(?P<field1>\w+) (?P<field2>\w+)"

The equivalent config file specification is shown below. Note that you should not wrap the regexes in quotes here.

[Regex Parser Options]
; a regular expression with named capture groups representing the fields you want parsed
LineRegex = \[(?P<time>\d{2}:\d{2}:\d{2})\] (?P<message>\w+)
LineRegex = (?P<field1>\w+) (?P<field2>\w+)

Regex syntax

Regexes must contain at least one named capture group. Use the (?P<name>re) syntax for named groups. Example:

Log file

[2017/11/07 22:59:46] 200 ...
[2017/11/07 22:59:48] 500 ...
[2017/11/07 23:01:02] 404 ...

with

--regex.line_regex="\[(?P<time>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\] (?P<status>\d+)"

will yield rows like this:

{
	time: "2017/11/07 22:59:46",
	status: "200"
}

Nested regex grouping

Nested groups are supported. For example,

--regex.line_regex="(?P<outer>[^ ]* (?P<inner1>[^ ]*) (?P<inner2>[^ ]*))"

will parse a log line “A B C” into { outer: "A B C", inner1: "B", inner2: "C" }.
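
Because honeytail uses the same RE2 syntax as Go's regexp package, you can sanity-check a regex locally before pointing honeytail at it. Here is a minimal, standalone Go sketch (not part of honeytail) that extracts the named groups from the nested example above:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The nested-group regex from the example above, using RE2 named capture groups.
	re := regexp.MustCompile(`(?P<outer>[^ ]* (?P<inner1>[^ ]*) (?P<inner2>[^ ]*))`)

	line := "A B C"
	match := re.FindStringSubmatch(line)
	if match == nil {
		fmt.Println("no match")
		return
	}

	// Map each named capture group to the text it matched; these are the
	// fields honeytail would attach to the event.
	fields := map[string]string{}
	for i, name := range re.SubexpNames() {
		if name != "" {
			fields[name] = match[i]
		}
	}
	fmt.Println(fields) // map[inner1:B inner2:C outer:A B C]
}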

Timestamp parsing

Honeycomb expects all events to contain a timestamp field; if one is not provided, the server will associate the current time of ingest with the given payload.

Use the --regex.timefield and --regex.time_format flags to help honeytail understand where and how to extract the event’s timestamp.

For example, given a log file like the following:

[08/Oct/2015:00:26:26 +0000] 200 174 0.099

A command to consume those log lines (retaining the "local_time" field as the event’s timestamp) would look like:

honeytail \
    --parser=regex \
    --writekey=YOUR_API_KEY \
    --file=server.log  \
    --dataset='MY_DATASET' \
    --backfill \
    --regex.line_regex=SOME_REGEX \
    --regex.timefield="local_time" \
    --regex.time_format="%d/%b/%Y:%H:%M:%S %z"
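
For the sample line above, the regex passed in place of SOME_REGEX might look like the line below. The field names other than local_time are purely illustrative; local_time must match whatever you pass to --regex.timefield.

--regex.line_regex="\[(?P<local_time>[^\]]+)\] (?P<status>\d+) (?P<bytes>\d+) (?P<duration>[\d.]+)"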

The --regex.timefield="local_time" argument tells honeytail to consider the "local_time" value to be the canonical timestamp for the events in the specified file.

The --regex.time_format argument specifies the timestamp format to be used while parsing. (It understands common strftime formats.)
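
For the sample timestamp above, "%d/%b/%Y:%H:%M:%S %z" matches "08/Oct/2015:00:26:26 +0000": %d is the day of the month, %b the abbreviated month name, %Y the four-digit year, %H:%M:%S the time, and %z the numeric UTC offset.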