An expression performs functions, mathematical and/or logical operations on other primitives and field’s values to return a result; similar to expressions in a spreadsheet. A function’s name is all-capitalized. A function accepts one or more arguments enclosed in parenthesis. Functions support column names, primitive types, and other functions as arguments.
$
.
Column names that start with numbers or contain spaces requires surrounding the column name with double quotes.
Column Name | Example Syntax |
---|---|
durationMs | $durationMs |
Context Key Length | $"Context Key Length" |
1st token | $"1st token" |
"foo"
) that supports intepretation.
Within the quotes, any character may appear except newline ("\n"
) and unescaped double quote ("\\"
) which require the use of the backslash character.`foo`
).
Within the quotes, any character may appear except a back quote.
This is useful for expression of text that use the backslash character.
For example, file paths and regular expressions.true
and false false
.null
.The following functions can be used in derived column expressions. Functions operate within the context of a single event, meaning that each function takes column values from an event, and produces a new “derived” column attached to that event.
IF
The IF
statement takes two or more arguments.
Every pair of arguments is evaluated as a condition; the final argument is the default: IF( condition, then-val [, condition2, then-val2]... [, else-val])
.
condition
evaluates to true, evaluates to then-val
.condition2
is specified, then if it is true, evaluates to then-val2
.else-val
is not specified, evaluates to null.else-val
.All non-zero numbers, as well as the string true
, evaluate to true
.
# Usage: `IF( condition, then-val [, condition2, then-val2]... [, else-val])`
# Examples
IF(CONTAINS($team_name, "acme"), "important-customer", "everyone else")
IF(GTE($http_status, 400), 1, 0)
IF(GTE($duration_ms, 500), "slow")
# The multiple-value form can be used as a type of case statement,
# but SWITCH is more efficient if the same expression is being tested for strict equality.
IF(EQUALS($path,"/login"),"login",
CONTAINS($path,"/browse/"),"browsing",
"other"
)
Note that the multi-argument IF
is equivalent to nested IF
statements:
# nested IF
IF(condition, value,
IF(condition2, value2,
IF(condition3, value3, default)
)
)
# equivalent with multicondition IF
IF(condition, value,
condition2, value2,
condition3, value3, default)
SWITCH
The SWITCH
statement takes three or more arguments.
The first argument is the value to test the cases against.
Every pair of arguments is evaluated as a case and the value to return.
If there is an unpaired final argument, it will be returned as the default if none of the cases match.
If no default value exists, then null will be returned.
SWITCH( expression, case1, val1[, case2, val2]... [, default-val])
.
expression
equals case1
, evaluates to val1
.case2
is specified and equals expression
, evaluates to val2
.default-val
is not specified, evaluates to null.default-val
.The rules for EQUALS
apply for testing cases.
# Usage: `SWITCH( expression, case1, val1[, case2, val2]... [, default-val])`
# Examples
SWITCH(REG_VALUE($service.name, "^([a-z]+)-shard-"),
"alpha", "blue-owls",
"beta", "violet-octopi",
"delta", "violet-octopi",
"gamma", "orange",
"platform"
)
COALESCE
Evaluates to the first non-empty argument. This is useful for similar fields where fallback values are preferable to null.
# Usage: COALESCE(arg1, arg2, ...)
# Examples
COALESCE($full_name, $email)
COALESCE($container_id, $hostname, $availability_zone, "unknown")
COALESCE($service.name, $service_name, "unknown")
# This can be used to approximate a case statement. For some cases, this is easier to write than the multi-factor IF statement.
COALESCE(
IF(GTE($duration_ms, 200), "slow"),
IF(GTE($status, 500), $error_message),
IF(CONTAINS($team, "acme"), "high priority"),
"normal"
)
LT
If both arguments are numbers, returns true if the first provided value is less than the second. If both arguments are strings, returns true if the first provided value falls lexicographically before the second. Always false if either argument is empty, or if the types of the columns do not match.
# Usage: LT(left, right)
# Examples
LT($roundtrip_us, 500)
LT($mysql_read_ms, $mysql_write_ms)
LTE
If both arguments are numbers, returns true if the first provided value is less than or equal to the second. If both arguments are strings, returns true if the first provided value is the same as, or falls lexicographically before, the second. Always false if either argument is empty, or if the types of the columns do not match.
# Usage: LTE(left, right)
# Examples
LTE($roundtrip_ms, 0.5)
LTE($get_schema_ms, $persist_schema_ms)
GT
If both arguments are numbers, returns true if the first provided value is greater than the second. If both arguments are strings, returns true if the first provided value falls lexicographically after the second. Always false if either argument is empty, or if the types of the columns do not match.
# Usage: GT(left, right)
# Examples
GT($payload_size_kb, 300)
GT($num_invalid_columns, $num_valid_columns)
GTE
If both arguments are numbers, returns true if the first provided value is greater than or equal to the second. If both arguments are strings, returns true if the first provided value is the same as, or falls lexicographically after, the second. Always false if either argument is empty, or if the types of the columns do not match.
# Usage: GTE(left, right)
# Examples
GTE($payload_size_mb, 0.3)
GTE($memory_inuse, MUL($max_memory_process, 0.75))
EQUALS
True if the two provided arguments are equal.
(Arguments of different types, such as the integer 200
and the string "200"
, are not considered equal.)
# Usage: EQUALS(arg1, arg2)
# Examples
EQUALS($remote_addr, "216.3.123.12")
EQUALS($gzipped, true)
EQUALS($oversize_num_columns, 0)
IN
True if the first provided argument is equal to any of the subsequent arguments.
IN
can be thought of as a more compact form of a series of OR
equality checks.
# Usage: IN(arg1, compare1, ...)
# Examples
IN($method, "DELETE", "POST", "PUT")
IN($build_id, "9051", "9052")
IN($num_invalid_payloads, 0, 1, -1)
EXISTS
True when the supplied argument has a defined value, false where it does not.
# Usage: EXISTS(arg1)
# Examples
EXISTS($batch_size)
EXISTS($team_name)
EXISTS($json_serialization_ms)
NOT
Evaluates the provided argument to a boolean value, then inverts that value.
# Usage: NOT(arg1)
# Examples
NOT(EXISTS($batch))
NOT(IN($build_id, "175", "176"))
NOT(IN($company, "acme", "globex", "soylent"))
AND
Takes a variable number of arguments and returns true if all arguments are truthy.
# Usage: AND(arg1, arg2, ...)
# Examples
AND(GTE($roundtrip_ms, 100), EQUALS($method, "POST"))
AND(NOT(IN($method, "GET", "DELETE")), EXISTS($batch))
AND(EQUALS($api_version, "v3"), OR(LT($request_ms, 30), GT($request_ms, 300)))
OR
Takes a variable number of arguments and returns true if any arguments are truthy.
# Usage: OR(arg1, arg2, ...)
# Examples
OR(EQUALS($company, "acme"), GTE($priority, 5))
OR(GTE($mysql_latency_ms, 20), AND(GTE($s3_latency_ms, 100), EQUALS($method, "GET")))
MIN
Evaluates to the smallest argument of the same type as the first non-empty argument. “Smallest” means the smaller, if numeric, or lexicographically first, if a string.
# Usage: MIN(arg1, arg2, ...)
# Examples
MIN($memory_inuse_local, $memory_inuse_merge, $memory_inuse_fetch)
MAX
Evaluates to the largest argument of the same type as the first non-empty argument. “Largest” means the larger, if numeric, or lexicographically last, if string.
# Usage: MAX(arg1, arg2, ...)
# Examples
MAX($mysql_latency_ms, $redis_latency_ms)
MAX(1, DIV($total_volume, $count))
SUM
Evaluates to the sum of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.
# Usage: SUM(arg1, arg2, ...)
# Examples
SUM($serialize_ms, $scan_ms, $publish_ms)
SUM(1.0, 5, "2.3")
SUB
Evaluates to the first argument subtracted by the second, or arg1 - arg2
.
Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.
# Usage: SUB(arg1, arg2)
# Examples
SUB($serialization_ms, 100)
SUB($total_ms, SUM($local_ms, $merge_ms, $serialize_ms))
MUL
Evaluates to the product of all numeric arguments. Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.
# Usage: MUL(arg1, arg2, ...)
# Examples
MUL($count, $time_per_item)
MUL(100, DIV($json_decode_ms, $total_ms))
DIV
Divides the first argument by the second, or arg1 / arg2
.
Strings are parsed into numbers if possible, unparseable strings or other values evaluate to zero.
A division by zero evaluates to null.
# Usage: DIV(arg1, arg2)
# Examples
DIV($io_bytes_read, 1024)
DIV($total_ms, $rows_examined)
DIV($json_parse_time_ms, $total_request_time_ms)
MOD
Computes the remainder of arg1 / arg2
.
Strings are parsed into floats if possible, unparseable strings or other values evaluate to zero.
Evaluates to null if arg2
is zero.
# Usage: MOD(arg1, arg2)
# Examples
MOD(15, 12) # 3
MOD(5.5, 4) # 1.5
MOD(6, 5.5) # 0.5
LOG10
Computes the base-10 logarithm of the argument. Strings are parsed into numbers if possible. Unparseable strings, and arguments less than or equal to 0, evaluate to null.
# Usage: LOG10(arg1)
# Examples
LOG10($duration_ms)
BUCKET
Computes discrete (categorical) bins, or buckets, to transform continuous fields into categorical ones.
This can be useful to group data into groups.
The syntax is BUCKET( $column, size, [min, [max]])
.
The function returns size
-sized buckets from min
to max
.
For example, BUCKET( $column, 10, 0, 30)
will return groups named < 0
, 0 - 10
, 10 - 20
, 20 - 30
, and > 30
.
If only the min
is specified, then the function will run without an upper bound; if neither a min
nor max
is specified, then the function will run without either bound, starting from a bucket 0 - size
.
In all versions, points on the boundary between two bins will fall into the lower bin. The size, min, and max may not be columns. If the column is not a float value, the function returns null.
# Usage: BUCKET( $column, size, [min, [max]])
# Examples
BUCKET($duration_ms, 500, 0, 3000) # size, min and max
BUCKET($current_budget, 10, -5) # min only
BUCKET($num_users, 10) # size only
INT
Casts the argument to an integer, truncating the value if necessary. The argument is first coerced to a float if possible. Non-numeric values return 0.
# Usage: INT(arg1)
# Examples
INT($price_dollars)
INT(DIV($seconds, 3600))
FLOAT
Casts the argument to a float. Non-numeric values return 0.0.
# Usage: FLOAT(arg1)
# Examples
FLOAT($price_dollars) # For example, 300.5
FLOAT("3.1415926535") # 3.1415926535
BOOL
Casts the argument to a bool.
Evaluates to true
if the argument is truthy:
source type | value |
BOOL($value) |
---|---|---|
int | 0 |
false |
int | (anything else) | true |
float | 0.0 |
false |
float | (anything else) | true |
string | "true" |
true |
string | (anything else) | false |
bool | true |
true |
bool | false |
false |
nil |
false |
# Usage: BOOL(arg1)
# Examples
BOOL($price_dollars) # For example, true
BOOL("") # false
BOOL(true) # true
STRING
Casts the argument to a string.
Empty arguments are converted to ""
.
# Usage: STRING(arg1)
# Examples
STRING($price_dollars) # "300.5", for example
STRING(true) # "true"
STRING($empty_column) # ""
CONCAT
Concatenates string representations of all arguments into a single string result. Non-string arguments are converted to strings, empty arguments are ignored.
# Usage: CONCAT(arg1, arg2, ...)
# Examples
CONCAT($api_version, $sdk)
IF($is_batch, CONCAT($url, "-batch"), $url)
STARTS_WITH
True if the first argument starts with the second argument. False if either argument is not a string.
# Usage: STARTS_WITH(string, prefix)
# Examples
STARTS_WITH($url, "https")
STARTS_WITH($user_agent, "ELB-")
CONTAINS
True if the first argument contains the second argument. False if either argument is not a string.
# Usage: CONTAINS(string, substr)`
# Examples
CONTAINS($email, "@honeycomb.io")
CONTAINS($header_accept_encoding, "gzip")
IF(CONTAINS($url, "/v1/"), "api_v1", "api_v2")
REG_MATCH
True if the first argument matches the second argument, which must be a defined regular expression.
False if the first argument is not a string or is empty.
The provided regex
must be a string literal containing a valid regular expression.
\s
, \d
or \w
, enclose the regular expression in `backticks`
so that it is treated as a raw string literal.# Usage: REG_MATCH(string, regex)
# Examples
REG_MATCH($error_msg, `^[a-z]+\[[0-9]+\]$`)
REG_MATCH($referrer, `[\w-_]+\.(s3\.)?amazonaws.com`)
REG_VALUE
Evaluates to the first regex submatch found in the first argument.
Evaluates to an empty value if the first argument contains no matches or is not a string.
The provided regex
must be a string literal containing a valid regular expression.
\s
, \d
or \w
, enclose the regular expression in `backticks`
so that it is treated as a raw string literal.# Usage: REG_VALUE(string, regex)
# Examples
REG_VALUE($user_agent, `Chrome/[\d.]+`)
REG_VALUE($source, `^(ui-\d+|log|app-\d+)`)
The first example above yields a string like Chrome/1.2.3
and the second could be any one of ui-123
, log
, or app-456
.
REG_VALUE
is most effective when combined with other functions.
As an example, the honeytail
agent sets its User-Agent
header to a string like libhoney-go/1.3.0 honeytail/1.378 (nginx)
, but there are also User-Agent
s like "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36...
.
In order to extract only the name of the parser used and not get caught up with other things in parentheses (such as the Macintosh...
bit), we use this as a derived column:
IF(CONTAINS($user_agent, "honeytail"), REG_VALUE($user_agent, `\([a-z]+\)`), null)`
This results in fields that contain (nginx)
, (mysql)
, and so on.
Combining CONTAINS
or REG_MATCH
with REG_VALUE
is a way to limit the total number of strings available to the match and more effectively grab only the values you are expecting.
REG_COUNT
Returns the number of non-overlapping successive matches yielded by the provided regex.
Returns 0 if the first argument contains no matches or is not a string.
The provided regex
must be a string literal containing a valid regular expression.
\s
, \d
or \w
, enclose the regular expression in `backticks `
so that it is treated as a raw string literal.# Usage: REG_COUNT(string, regex)
# Examples
REG_COUNT($sql, `JOIN`)
REG_COUNT($ip, `19.`)
LENGTH
Returns the length of a string in either bytes, or user-perceived characters.
The second argument must be either “bytes” or “chars”.
Returns 0
if the first argument is not a string, or if the first argument is not valid utf8 when second argument is “chars”.
# Usage: LENGTH(string[, "bytes" | "chars"])
# Examples
LENGTH($hostname, "bytes") # returns the number of bytes that make up the string.
LENGTH($hostname, "chars") # returns the number of user-perceived characters that make up the string.
To show the difference between the two units, take the single character 🏳️🌈 (unicode rainbow flag) in the example below:
LENGTH("🏳️🌈", "bytes") # == 14
LENGTH("🏳️🌈", "chars") # == 1
UNIX_TIMESTAMP
Converts a date string in RFC3339 format (for example, 2017-07-20T11:22:44.888Z
) to a Unix timestamp (1500549764.888
).
This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp.
# Usage: UNIX_TIMESTAMP(string)
# Examples
UNIX_TIMESTAMP($timestamp)
EVENT_TIMESTAMP
Returns the Unix timestamp of the current event as a float (1500549764.888
, for example).
This is useful for comparing two timestamps in an event; for example, to calculate a duration from a start and an end timestamp.
This function takes no arguments.
# Usage: EVENT_TIMESTAMP()
# Examples
EVENT_TIMESTAMP()
INGEST_TIMESTAMP
Returns the Unix timestamp indicating when Honeycomb’s servers received the current event, as a float ( 1500549764.888
, for example).
This is useful for debugging event latency by comparing the event timestamp to the ingestion time.
This function takes no arguments.
# Usage: INGEST_TIMESTAMP()
# Examples
# Event latency, in seconds. May be negative for future-dated events.
SUB(INGEST_TIMESTAMP(),EVENT_TIMESTAMP())
# Event latency, accounting for span duration.
SUB(INGEST_TIMESTAMP(),SUM(EVENT_TIMESTAMP(),DIV($duration_ms,1000)))
FORMAT_TIME
Formats a Unix timestamp as a string. The first argument is a format specifier string compatible with POSIX strftime, and the second argument is the numeric timestamp. Does not support not-UTC timezones or locale-modified specifiers. Also note this formatting is more expensive than other derived column functions and may slow down queries, especially when using a complex format.
# Usage: FORMAT_TIME(format, timestamp)
# Examples
FORMAT_TIME("%A", 1626810584) # Tuesday
FORMAT_TIME("%FT%TZ", 1626810584) # 2021-07-20T19:49:44Z