Derived Column Reference | Honeycomb

Derived Column Reference

Syntax 

An expression performs functions, mathematical and/or logical operations on other primitives and field’s values to return a result; similar to expressions in a spreadsheet. A function’s name is all-capitalized. A function accepts one or more arguments enclosed in parenthesis. Functions support column names, primitive types, and other functions as arguments.

Column
Reference columns by their name prefixed with a dollar sign $. Column names that start with numbers or contain spaces requires surrounding the column name with double quotes.
Column Name Example Syntax
durationMs $durationMs
Context Key Length $"Context Key Length"
1st token $"1st token"
String Literal
Text enclosed in double quotes ("foo") that supports intepretation. Within the quotes, any character may appear except newline ("\n") and unescaped double quote ("\\") which require the use of the backslash character.
Raw String Literal
Text enclosed in back quotes (`foo`). Within the quotes, any character may appear except a back quote. This is useful for expression of text that use the backslash character. For example, file paths and regular expressions.
Integer
A positive or negative whole number
Floating Point
A positive or negative number with a decimal point
Boolean
A truthy value represented with true and false false.
Null
An empty, missing value represented with null.

Function Reference 

The following functions can be used in derived column expressions. Functions operate within the context of a single event, meaning that each function takes column values from an event, and produces a new “derived” column attached to that event.

List of Operators 

Conditional Operators 

IF 

The IF statement takes two or more arguments. Every pair of arguments is evaluated as a condition; the final argument is the default: IF( condition, then-val [, condition2, then-val2]... [, else-val]).

  • If condition evaluates to true, evaluates to then-val.
  • If condition2 is specified, then if it is true, evaluates to then-val2.
  • If else-val is not specified, evaluates to null.
  • Otherwise evaluates to else-val.

All non-zero numbers, as well as the string true, evaluate to true.

# Usage: `IF( condition, then-val [, condition2, then-val2]... [, else-val])`
# Examples
IF(CONTAINS($team_name, "acme"), "important-customer", "everyone else")
IF(GTE($http_status, 400), 1, 0)
IF(GTE($duration_ms, 500), "slow")

# The multiple-value form can be used as a type of case statement,
# but SWITCH is more efficient if the same expression is being tested for strict equality.
IF(EQUALS($path,"/login"),"login",
   CONTAINS($path,"/browse/"),"browsing",
   "other"
  )

Note that the multi-argument IF is equivalent to nested IF statements:

# nested IF
IF(condition, value,
  IF(condition2, value2,
    IF(condition3, value3, default)
  )
)

# equivalent with multicondition IF
IF(condition, value,
   condition2, value2,
   condition3, value3, default)

SWITCH 

The SWITCH statement takes three or more arguments. The first argument is the value to test the cases against. Every pair of arguments is evaluated as a case and the value to return. If there is an unpaired final argument, it will be returned as the default if none of the cases match. If no default value exists, then null will be returned. SWITCH( expression, case1, val1[, case2, val2]... [, default-val]).

  • If expression equals case1, evaluates to val1.
  • If case2 is specified and equals expression, evaluates to val2.
  • If default-val is not specified, evaluates to null.
  • Otherwise evaluates to default-val.

The rules for EQUALS apply for testing cases.

# Usage: `SWITCH( expression, case1, val1[, case2, val2]... [, default-val])`
# Examples
SWITCH(REG_VALUE($service.name, "^([a-z]+)-shard-"),
  "alpha", "blue-owls",
  "beta", "violet-octopi",
  "delta", "violet-octopi",
  "gamma", "orange",
  "platform"
)

COALESCE 

Evaluates to the first non-empty argument. This is useful for similar fields where fallback values are preferable to null.

# Usage: COALESCE(arg1, arg2, ...)
# Examples
COALESCE($full_name, $email)
COALESCE($container_id, $hostname, $availability_zone, "unknown")
COALESCE($service.name, $service_name, "unknown")

# This can be used to approximate a case statement. For some cases, this is easier to write than the multi-factor IF statement.
COALESCE(
  IF(GTE($duration_ms, 200), "slow"),
  IF(GTE($status, 500), $error_message),
  IF(CONTAINS($team, "acme"), "high priority"),
  "normal"
)

Comparison Operators 

LT 

If both arguments are numbers, returns true if the first provided value is less than the second. If both arguments are strings, returns true if the first provided value falls lexicographically before the second. Always false if either argument is empty, or if the types of the columns do not match.

# Usage: LT(left, right)
# Examples
LT($roundtrip_us, 500)
LT($mysql_read_ms, $mysql_write_ms)

LTE 

If both arguments are numbers, returns true if the first provided value is less than or equal to the second. If both arguments are strings, returns true if the first provided value is the same as, or falls lexicographically before, the second. Always false if either argument is empty, or if the types of the columns do not match.

# Usage: LTE(left, right)
# Examples
LTE($roundtrip_ms, 0.5)
LTE($get_schema_ms, $persist_schema_ms)

GT 

If both arguments are numbers, returns true if the first provided value is greater than the second. If both arguments are strings, returns true if the first provided value falls lexicographically after the second. Always false if either argument is empty, or if the types of the columns do not match.

# Usage: GT(left, right)
# Examples
GT($payload_size_kb, 300)
GT($num_invalid_columns, $num_valid_columns)

GTE 

If both arguments are numbers, returns true if the first provided value is greater than or equal to the second. If both arguments are strings, returns true if the first provided value is the same as, or falls lexicographically after, the second. Always false if either argument is empty, or if the types of the columns do not match.

# Usage: GTE(left, right)
# Examples
GTE($payload_size_mb, 0.3)
GTE($memory_inuse, MUL($max_memory_process, 0.75))

EQUALS 

True if the two provided arguments are equal. (Arguments of different types, such as the integer 200 and the string "200", are not considered equal.)

# Usage: EQUALS(arg1, arg2)
# Examples
EQUALS($remote_addr, "216.3.123.12")
EQUALS($gzipped, true)
EQUALS($oversize_num_columns, 0)

IN 

True if the first provided argument is equal to any of the subsequent arguments. IN can be thought of as a more compact form of a series of OR equality checks.

# Usage: IN(arg1, compare1, ...)
# Examples
IN($method, "DELETE", "POST", "PUT")
IN($build_id, "9051", "9052")
IN($num_invalid_payloads, 0, 1, -1)

EXISTS 

True when the supplied argument has a defined value, false where it does not.

# Usage: EXISTS(arg1)
# Examples
EXISTS($batch_size)
EXISTS($team_name)
EXISTS($json_serialization_ms)

Boolean Operators 

NOT 

Evaluates the provided argument to a boolean value, then inverts that value.

# Usage: NOT(arg1)
# Examples
NOT(EXISTS($batch))
NOT(IN($build_id, "175", "176"))
NOT(IN($company, "acme", "globex", "soylent"))

AND 

Takes a variable number of arguments and returns true if all arguments are truthy.

# Usage: AND(arg1, arg2, ...)
# Examples
AND(GTE($roundtrip_ms, 100), EQUALS($method, "POST"))
AND(NOT(IN($method, "GET", "DELETE")), EXISTS($batch))
AND(EQUALS($api_version, "v3"), OR(LT($request_ms, 30), GT($request_ms, 300)))

OR 

Takes a variable number of arguments and returns true if any arguments are truthy.

# Usage: OR(arg1, arg2, ...)
# Examples
OR(EQUALS($company, "acme"), GTE($priority, 5))
OR(GTE($mysql_latency_ms, 20), AND(GTE($s3_latency_ms, 100), EQUALS($method, "GET")))

Math Operators 

MIN 

Evaluates to the smallest argument of the same type as the first non-empty argument. “Smallest” means the smaller, if numeric, or lexicographically first, if a string.

# Usage: MIN(arg1, arg2, ...)
# Examples
MIN($memory_inuse_local, $memory_inuse_merge, $memory_inuse_fetch)

MAX 

Evaluates to the largest argument of the same type as the first non-empty argument. “Largest” means the larger, if numeric, or lexicographically last, if string.

# Usage: MAX(arg1, arg2, ...)
# Examples
MAX($mysql_latency_ms, $redis_latency_ms)
MAX(1, DIV($total_volume, $count))

SUM 

Evaluates to the sum of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: SUM(arg1, arg2, ...)
# Examples
SUM($serialize_ms, $scan_ms, $publish_ms)
SUM(1.0, 5, "2.3")

SUB 

Evaluates to the first argument subtracted by the second, or arg1 - arg2. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: SUB(arg1, arg2)
# Examples
SUB($serialization_ms, 100)
SUB($total_ms, SUM($local_ms, $merge_ms, $serialize_ms))

MUL 

Evaluates to the product of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.

# Usage: MUL(arg1, arg2, ...)
# Examples
MUL($count, $time_per_item)
MUL(100, DIV($json_decode_ms, $total_ms))

DIV 

Divides the first argument by the second, or arg1 / arg2. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero. A division by zero evaluates to null.

# Usage: DIV(arg1, arg2)
# Examples
DIV($io_bytes_read, 1024)
DIV($total_ms, $rows_examined)
DIV($json_parse_time_ms, $total_request_time_ms)

MOD 

Computes the remainder of arg1 / arg2. Strings are parsed into floats if possible, unparseable strings or other values evaluate to zero. Evaluates to null if arg2 is zero.

# Usage: MOD(arg1, arg2)
# Examples
MOD(15, 12)  # 3
MOD(5.5, 4)  # 1.5
MOD(6, 5.5)  # 0.5

LOG10 

Computes the base-10 logarithm of the argument. Strings are parsed into numbers if possible. Unparseable strings, and arguments less than or equal to 0, evaluate to null.

# Usage: LOG10(arg1)
# Examples
LOG10($duration_ms)

BUCKET 

Computes discrete (categorical) bins, or buckets, to transform continuous fields into categorical ones. This can be useful to group data into groups. The syntax is BUCKET( $column, size, [min, [max]]). The function returns size-sized buckets from min to max. For example, BUCKET( $column, 10, 0, 30) will return groups named < 0, 0 - 10, 10 - 20, 20 - 30, and > 30.

If only the min is specified, then the function will run without an upper bound; if neither a min nor max is specified, then the function will run without either bound, starting from a bucket 0 - size.

In all versions, points on the boundary between two bins will fall into the lower bin. The size, min, and max may not be columns. If the column is not a float value, the function returns null.

# Usage: BUCKET( $column, size, [min, [max]])
# Examples
BUCKET($duration_ms, 500, 0, 3000) # size, min and max
BUCKET($current_budget, 10, -5) # min only
BUCKET($num_users, 10) # size only

Cast Operators 

INT 

Casts the argument to an integer, truncating the value if necessary. The argument is first coerced to a float if possible. Non-numeric values return 0.

# Usage: INT(arg1)
# Examples
INT($price_dollars)
INT(DIV($seconds, 3600))

FLOAT 

Casts the argument to a float. Non-numeric values return 0.0.

# Usage: FLOAT(arg1)
# Examples
FLOAT($price_dollars) # For example, 300.5
FLOAT("3.1415926535") # 3.1415926535

BOOL 

Casts the argument to a bool. Evaluates to true if the argument is truthy:

source type value BOOL($value)
int 0 false
int (anything else) true
float 0.0 false
float (anything else) true
string "true" true
string (anything else) false
bool true true
bool false false
nil false
# Usage: BOOL(arg1)
# Examples
BOOL($price_dollars) # For example, true
BOOL("")             # false
BOOL(true)           # true

STRING 

Casts the argument to a string. Empty arguments are converted to "".

# Usage: STRING(arg1)
# Examples
STRING($price_dollars) # "300.5", for example
STRING(true)           # "true"
STRING($empty_column)  # ""

String Operators 

CONCAT 

Concatenates string representations of all arguments into a single string result. Non-string arguments are converted to strings, empty arguments are ignored.

# Usage: CONCAT(arg1, arg2, ...)
# Examples
CONCAT($api_version, $sdk)
IF($is_batch, CONCAT($url, "-batch"), $url)

STARTS_WITH 

True if the first argument starts with the second argument. False if either argument is not a string.

# Usage: STARTS_WITH(string, prefix)
# Examples
STARTS_WITH($url, "https")
STARTS_WITH($user_agent, "ELB-")

CONTAINS 

True if the first argument contains the second argument. False if either argument is not a string.

# Usage: CONTAINS(string, substr)`
# Examples
CONTAINS($email, "@honeycomb.io")
CONTAINS($header_accept_encoding, "gzip")
IF(CONTAINS($url, "/v1/"), "api_v1", "api_v2")

REG_MATCH 

True if the first argument matches the second argument, which must be a defined regular expression. False if the first argument is not a string or is empty. The provided regex must be a string literal containing a valid regular expression.

Note
Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks` so that it is treated as a raw string literal.
# Usage: REG_MATCH(string, regex)
# Examples
REG_MATCH($error_msg, `^[a-z]+\[[0-9]+\]$`)
REG_MATCH($referrer, `[\w-_]+\.(s3\.)?amazonaws.com`)

REG_VALUE 

Evaluates to the first regex submatch found in the first argument. Evaluates to an empty value if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note
Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks` so that it is treated as a raw string literal.
# Usage: REG_VALUE(string, regex)
# Examples
REG_VALUE($user_agent, `Chrome/[\d.]+`)
REG_VALUE($source, `^(ui-\d+|log|app-\d+)`)

The first example above yields a string like Chrome/1.2.3 and the second could be any one of ui-123, log, or app-456.

REG_VALUE is most effective when combined with other functions. As an example, the honeytail agent sets its User-Agent header to a string like libhoney-go/1.3.0 honeytail/1.378 (nginx), but there are also User-Agents like "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36.... In order to extract only the name of the parser used and not get caught up with other things in parentheses (such as the Macintosh... bit), we use this as a derived column:

IF(CONTAINS($user_agent, "honeytail"), REG_VALUE($user_agent, `\([a-z]+\)`), null)`

This results in fields that contain (nginx), (mysql), and so on. Combining CONTAINS or REG_MATCH with REG_VALUE is a way to limit the total number of strings available to the match and more effectively grab only the values you are expecting.

REG_COUNT 

Returns the number of non-overlapping successive matches yielded by the provided regex. Returns 0 if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note
Golang regex syntax can be tested here. If your regular expression contains character classes such as \s, \d or \w, enclose the regular expression in `backticks `so that it is treated as a raw string literal.
# Usage: REG_COUNT(string, regex)
# Examples
REG_COUNT($sql, `JOIN`)
REG_COUNT($ip, `19.`)

LENGTH 

Returns the length of a string in either bytes, or user-perceived characters. The second argument must be either “bytes” or “chars”. Returns 0 if the first argument is not a string, or if the first argument is not valid utf8 when second argument is “chars”.

# Usage: LENGTH(string[, "bytes" | "chars"])
# Examples
LENGTH($hostname, "bytes")      # returns the number of bytes that make up the string.
LENGTH($hostname, "chars")      # returns the number of user-perceived characters that make up the string.
Note
“User-perceived characters” are also known as “grapheme clusters” and represent a basic unit of a writing system for a language.

To show the difference between the two units, take the single character 🏳️‍🌈 (unicode rainbow flag) in the example below:

LENGTH("🏳️‍🌈", "bytes")      # == 14
LENGTH("🏳️‍🌈", "chars")      # == 1

Time Functions 

UNIX_TIMESTAMP 

Converts a date string in RFC3339 format (for example, 2017-07-20T11:22:44.888Z) to a Unix timestamp (1500549764.888). This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp.

# Usage: UNIX_TIMESTAMP(string)
# Examples
UNIX_TIMESTAMP($timestamp)

EVENT_TIMESTAMP 

Returns the Unix timestamp of the current event as a float (1500549764.888, for example). This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp. This function takes no arguments.

# Usage: EVENT_TIMESTAMP()
# Examples
EVENT_TIMESTAMP()

INGEST_TIMESTAMP 

Returns the Unix timestamp indicating when Honeycomb’s servers received the current event, as a float ( 1500549764.888, for example). This is useful for debugging event latency by comparing the event timestamp to the ingestion time. This function takes no arguments.

# Usage: INGEST_TIMESTAMP()
# Examples
# Event latency, in seconds. May be negative for future-dated events.
SUB(INGEST_TIMESTAMP(),EVENT_TIMESTAMP())
# Event latency, accounting for span duration.
SUB(INGEST_TIMESTAMP(),SUM(EVENT_TIMESTAMP(),DIV($duration_ms,1000)))

FORMAT_TIME 

Formats a Unix timestamp as a string. The first argument is a format specifier string compatible with POSIX strftime, and the second argument is the numeric timestamp. Does not support not-UTC timezones or locale-modified specifiers. Also note this formatting is more expensive than other derived column functions and may slow down queries, especially when using a complex format.

# Usage: FORMAT_TIME(format, timestamp)
# Examples
FORMAT_TIME("%A", 1626810584)     # Tuesday
FORMAT_TIME("%FT%TZ", 1626810584) # 2021-07-20T19:49:44Z