Derived Column String Operators

String operators build, transform, and test string values.

CONCAT 

Concatenates the string representations of all arguments into a single string result. Non-string arguments are converted to strings, and empty arguments are ignored.

# Usage: CONCAT(arg1, arg2, ...)
# Examples
CONCAT($api_version, $sdk)
IF($is_batch, CONCAT($url, "-batch"), $url)

TO_LOWER 

Converts an input string to be all lower-case.

# Usage: TO_LOWER(string)
# Examples
TO_LOWER($service.name)
IF(CONTAINS(TO_LOWER($app.user.name), "bob"), "bob!", "not bob!")

STARTS_WITH 

Returns true if the first argument starts with the second argument. Returns false if either argument is not a string.

# Usage: STARTS_WITH(string, prefix)
# Examples
STARTS_WITH($url, "https")
STARTS_WITH($user_agent, "ELB-")

CONTAINS 

Returns true if the first argument contains the second argument. Returns false if either argument is not a string.

# Usage: CONTAINS(string, substr)
# Examples
CONTAINS($email, "@honeycomb.io")
CONTAINS($header_accept_encoding, "gzip")
IF(CONTAINS($url, "/v1/"), "api_v1", "api_v2")

REG_MATCH 

Returns true if the first argument matches the regular expression given as the second argument. Returns false if the first argument is not a string or is empty. The provided regex must be a string literal containing a valid regular expression.

Note
The regular expression uses Golang regex syntax. If your regular expression contains character classes such as \s, \d, or \w, enclose the regular expression in `backticks` so that it is treated as a raw string literal.
# Usage: REG_MATCH(string, regex)
# Examples
REG_MATCH($error_msg, `^[a-z]+\[[0-9]+\]$`)
REG_MATCH($referrer, `[\w-_]+\.(s3\.)?amazonaws\.com`)
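
To try a pattern outside of a derived column, the matching rules described above can be approximated with Go's standard regexp package. This is only a sketch: regMatch is a hypothetical helper, not the actual implementation behind REG_MATCH, and the sample inputs are made up. It also shows why backticks help: a raw string literal keeps \d as written, while a double-quoted Go string needs the backslash doubled.

```go
package main

import (
	"fmt"
	"regexp"
)

// regMatch is a hypothetical stand-in for REG_MATCH: it reports whether the
// pattern matches anywhere in input, and returns false for an empty input.
func regMatch(input, pattern string) bool {
	if input == "" {
		return false
	}
	return regexp.MustCompile(pattern).MatchString(input)
}

func main() {
	pattern := `^[a-z]+\[[0-9]+\]$`

	fmt.Println(regMatch("worker[42]", pattern)) // true
	fmt.Println(regMatch("Worker[42]", pattern)) // false: upper-case W is outside [a-z]
	fmt.Println(regMatch("", `.*`))              // false: empty input never matches

	// A raw (backtick) literal and a double-quoted literal with an escaped
	// backslash describe the same pattern.
	fmt.Println(`\d+` == "\\d+") // true
}
```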

REG_VALUE 

Evaluates to the first regex submatch found in the first argument. Evaluates to an empty value if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note
The regular expression uses Golang regex syntax. If your regular expression contains character classes such as \s, \d, or \w, enclose the regular expression in `backticks` so that it is treated as a raw string literal.
# Usage: REG_VALUE(string, regex)
# Examples
REG_VALUE($user_agent, `Chrome/[\d.]+`)
REG_VALUE($source, `^(ui-\d+|log|app-\d+)`)

The first example above yields a string like Chrome/1.2.3 and the second could be any one of ui-123, log, or app-456.

REG_VALUE is most effective when combined with other functions. As an example, the honeytail agent sets its User-Agent header to a string like libhoney-go/1.3.0 honeytail/1.378 (nginx), but there are also User-Agents like "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36...". To extract only the name of the parser used, without getting caught up on other text in parentheses (such as the Macintosh... bit), we use this as a derived column:

IF(CONTAINS($user_agent, "honeytail"), REG_VALUE($user_agent, `\([a-z]+\)`), null)

This results in fields that contain (nginx), (mysql), and so on. Combining CONTAINS or REG_MATCH with REG_VALUE is a way to limit the total number of strings available to the match and more effectively grab only the values you are expecting.
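
The extraction above can be reproduced locally with Go's regexp package. The sketch below makes assumptions: regValue and parserName are hypothetical helpers, and regValue assumes that a pattern with a capture group yields the first group while a pattern without one yields the whole match. The real REG_VALUE may differ in edge cases.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// regValue is a hypothetical stand-in for REG_VALUE. Assumption: when the
// pattern has a capture group, the first group is returned; otherwise the
// whole match is returned. ok is false when nothing matches.
func regValue(input, pattern string) (value string, ok bool) {
	m := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if m == nil {
		return "", false
	}
	if len(m) > 1 {
		return m[1], true
	}
	return m[0], true
}

// parserName mirrors IF(CONTAINS($user_agent, "honeytail"), REG_VALUE(...), null):
// the regex only runs for honeytail user agents.
func parserName(userAgent string) (string, bool) {
	if !strings.Contains(userAgent, "honeytail") {
		return "", false
	}
	return regValue(userAgent, `\([a-z]+\)`)
}

func main() {
	v, _ := regValue("Mozilla/5.0 Chrome/1.2.3 Safari/537.36", `Chrome/[\d.]+`)
	fmt.Println(v) // Chrome/1.2.3

	v, _ = regValue("ui-123", `^(ui-\d+|log|app-\d+)`)
	fmt.Println(v) // ui-123

	v, ok := parserName("libhoney-go/1.3.0 honeytail/1.378 (nginx)")
	fmt.Println(v, ok) // (nginx) true

	_, ok = parserName("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36")
	fmt.Println(ok) // false
}
```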

REG_COUNT 

Returns the number of non-overlapping successive matches yielded by the provided regex. Returns 0 if the first argument contains no matches or is not a string. The provided regex must be a string literal containing a valid regular expression.

Note
The regular expression uses Golang regex syntax. If your regular expression contains character classes such as \s, \d, or \w, enclose the regular expression in `backticks` so that it is treated as a raw string literal.
# Usage: REG_COUNT(string, regex)
# Examples
REG_COUNT($sql, `JOIN`)
REG_COUNT($ip, `19.`)
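
As a rough illustration of what "non-overlapping successive matches" means, the counting can be sketched with Go's regexp package. regCount is a hypothetical helper and the inputs are invented. Note that an unescaped . matches any character, so `19.` also counts 192; use `19\.` to count only a literal "19.".

```go
package main

import (
	"fmt"
	"regexp"
)

// regCount is a hypothetical stand-in for REG_COUNT: it counts the
// non-overlapping matches of pattern in input, scanning left to right.
func regCount(input, pattern string) int {
	return len(regexp.MustCompile(pattern).FindAllStringIndex(input, -1))
}

func main() {
	sql := "SELECT * FROM a JOIN b ON a.id = b.id LEFT JOIN c ON b.id = c.id"
	fmt.Println(regCount(sql, `JOIN`)) // 2

	// Matches never overlap: "aaaa" holds two "aa" matches, not three.
	fmt.Println(regCount("aaaa", `aa`)) // 2

	// An unescaped dot matches any character, so "192" counts as well.
	fmt.Println(regCount("192.168.19.1", `19.`))  // 2
	fmt.Println(regCount("192.168.19.1", `19\.`)) // 1
}
```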

LENGTH 

Returns the length of a string in either bytes or user-perceived characters. The optional second argument must be either “bytes” or “chars”. Returns 0 if the first argument is not a string, or if the second argument is “chars” and the first argument is not valid UTF-8.

# Usage: LENGTH(string[, "bytes" | "chars"])
# Examples
LENGTH($hostname, "bytes")      # returns the number of bytes that make up the string.
LENGTH($hostname, "chars")      # returns the number of user-perceived characters that make up the string.
Note
“User-perceived characters” are also known as “grapheme clusters” and represent a basic unit of a writing system for a language.

To show the difference between the two units, consider the single character 🏳️‍🌈 (the Unicode rainbow flag) in the example below:

LENGTH("🏳️‍🌈", "bytes")      # == 14
LENGTH("🏳️‍🌈", "chars")      # == 1
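
The 14-byte figure can be checked by hand: the rainbow flag is a sequence of four code points (U+1F3F3, U+FE0F, U+200D, U+1F308) that encode to 4 + 3 + 3 + 4 bytes in UTF-8. The Go sketch below verifies this with the standard library only. The names rainbowFlag, byteLen, and codePoints are made up for this illustration. Go's standard library counts bytes and code points but not grapheme clusters, so reproducing the "chars" result of 1 would need a Unicode text-segmentation library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// rainbowFlag is the flag emoji written out as its four code points:
// white flag, variation selector-16, zero-width joiner, rainbow.
const rainbowFlag = "\U0001F3F3\uFE0F\u200D\U0001F308"

// byteLen returns the number of bytes in the UTF-8 encoding of s.
func byteLen(s string) int {
	return len(s)
}

// codePoints returns the number of Unicode code points in s. This is not
// the same as the number of user-perceived characters.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(byteLen(rainbowFlag))          // 14
	fmt.Println(codePoints(rainbowFlag))       // 4
	fmt.Println(utf8.ValidString(rainbowFlag)) // true
}
```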