New to running queries in Honeycomb? Check out the introduction to building queries!
Derived columns allow you to run queries based on the value of an expression that’s derived from the columns in an event, without having to modify the structure of the event itself.
For example, let’s say you have events from a web server. You want to compare latency for small requests and large requests, where a request is “small” if the value of its
content_length field is less than 1000. Adding
content_length to the “Group By” clause of a query would create a separate time series for every single value of
content_length. That’s not ideal.
Instead, you can use a derived column. Derived columns are similar to functions in a spreadsheet. In this case, we’ll want to group by the expression
which evaluates to
true if the value of the
content_length column is less than 1000, and
You can add a derived column by clicking “Create derived column” in the dropdown of the Group By clause:
That will open a modal dialog where you can craft your expression:
and see sample values computed from recent data. After you click “create column”, the query builder will update to include your new derived column:
You can now run the updated query to get the breakdown you wanted:
After you create a derived column, it’ll remain available for reuse in later queries. You can view the list of derived columns, add new ones, and delete old derived columns by going to your dataset’s schema page (from the query page, click “Overview”, then “Schema”).
Derived columns are composed of functions with some number of arguments enclosed in parenthesis. All function names are all-capital, case-sensitive:
Arguments can be any of the following:
$column_name, which will be populated with values from the given column.
"foo"). Escaped special characters such as “\n” will be interpreted.
`bar`). This is useful for regular expressions which include sequences such as
null, which is equivalent to an empty (missing) value.
A list of all functions is at the Derived Column Function Reference
Many derived column functions are sensitive to column types. In confirming that a derived column function is working the way you expect, check the types of the inputs and outputs. It can be helpful to coerce column types to the expected type:
GTE(INT($duration_ms), 50) will definitely do the expected behavior;
GTE($duration_ms, 50) fails if
$duration_ms is a string column.
The derived column creator accepts constant input; it can be useful to get to know functions by entering sample values: