Explore with BubbleUp

About BubbleUp

BubbleUp helps explain how a subset of data points differs from the other points returned by a query. It surfaces potential places to look for signal within your data.

For example, consider the graph below, which shows the distribution of roundtrip_dur for an application’s requests over the selected time period.

An analyst looking at this set of points might want to isolate the unusual group of events that have a surprisingly high roundtrip_dur:

Screenshot illustrating use of BubbleUp

In this screenshot:

This helps your analysis by pointing out which fields are the most likely starting points for further investigation. In this case, it seems clear that one particular endpoint had a transient period of slow requests.

Using BubbleUp

Currently, BubbleUp mode is only supported for heatmaps.

Learn how to make a heatmap in “Using Heatmaps.” Learn about interpreting heatmaps from “Heatmaps make Ops Better.”

To access BubbleUp mode:

BubbleUp mode works based on a selection you make within a heatmap. Create a Heatmap, and then click on BubbleUp.

Click within the heatmap to select one corner, and drag to cover the opposite corner. Ensure your selection covers some or all of the points that you want to investigate.

The selected area is called the selection and is highlighted in orange; the entire area of the displayed heatmap is the baseline and is shown in blue.

The BubbleUp charts are displayed below the heatmap.
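
As a rough illustration of the selection and baseline described above, the sketch below splits a list of events using a rectangle over time and value. The `Selection` class, the `split_events` helper, and the `timestamp` field name are hypothetical. The sketch also assumes, taking the wording above literally, that the baseline contains every event shown in the heatmap, including the selected ones. It is not a description of how BubbleUp is implemented.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """Hypothetical rectangle drawn on a heatmap: a time range and a value range."""
    t_start: float
    t_end: float
    v_min: float
    v_max: float

    def contains(self, timestamp: float, value: float) -> bool:
        # Both bounds are inclusive in this sketch.
        return (self.t_start <= timestamp <= self.t_end
                and self.v_min <= value <= self.v_max)


def split_events(events, selection, time_key="timestamp", value_key="roundtrip_dur"):
    """Return (selected, baseline) for a list of event dicts.

    Assumption: the baseline is every event plotted in the heatmap (every event
    that has the plotted column); the selection is the subset in the rectangle.
    """
    baseline = [e for e in events if value_key in e]
    selected = [e for e in baseline
                if selection.contains(e[time_key], e[value_key])]
    return selected, baseline
```

A variant that excludes the selected events from the baseline would only need one extra filter on the `baseline` list.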

Interpreting BubbleUp Charts

A BubbleUp is based on a selection of points queried from the dataset. It shows every (non-empty) column in the dataset. For each column, it shows a histogram of the values within the baseline in blue and of those within the selection in orange. Each histogram shows the distribution of that column’s values, and the height of each bar is proportional to the number of times the value occurs in the results of the query.

A BubbleUp shows a series of miniature histograms, one for each column in the dataset. The columns are divided into two groups, for categorical dimensions and continuous measures.
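
The per-column histograms described above can be approximated with a simple count of values. This is a minimal sketch, assuming events are plain dictionaries; the `column_histograms` helper and its return shape are hypothetical and are not part of any Honeycomb API.

```python
from collections import Counter


def column_histograms(baseline, selected):
    """Count how often each value occurs per column, for both groups.

    Returns {column: {"baseline": Counter, "selection": Counter}}.
    Columns with no values in either group are skipped, mirroring the
    "non-empty" rule in the text.
    """
    columns = {key for event in baseline for key in event}
    columns |= {key for event in selected for key in event}

    result = {}
    for column in sorted(columns):
        # Events that lack the column (or hold None) contribute nothing.
        base_counts = Counter(
            e[column] for e in baseline if e.get(column) is not None)
        sel_counts = Counter(
            e[column] for e in selected if e.get(column) is not None)
        if base_counts or sel_counts:
            result[column] = {"baseline": base_counts, "selection": sel_counts}
    return result
```

Each `Counter` corresponds to one set of bars: the count for a value plays the role of the bar height.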

Dimensions

A dimension is a column that can be used to group, separate, or filter data items. In BubbleUp, categorical and ordinal data are visualized together. Categorical columns are those in which the values do not fall in a meaningful order. Examples of categorical columns include user_id, hostname, and is_responding.

Screenshot A low-cardinality, categorical dimension. In BubbleUp, categorical dimensions are shown captioned with the relevant value. The field platform has five distinct values; in both the baseline and selection, there are more “android” and “ios” values than “js” and “rest”.

Screenshot A high-cardinality, categorical dimension. When there are many distinct values, only the top fifty are shown, including some from both the baseline and the selection. In hostname, the baseline set is truncated.

Screenshot In endpoint, the one bar of the selection stands out as a visible outlier. It can be interpreted to mean “there is only one value for endpoint within the selection.”

Screenshot An ordinal dimension is one that has a meaningful order. In status_code, the values are numeric, and so are arranged in ascending order. The value 200 occurs frequently in both baseline and selection. Code 500 occurs less frequently in the selection, but it almost never occurs in the baseline. Conversely, code 400 is rare in the baseline and never appears in the selection.

Large differences in bar heights between the baseline and the selection can indicate that a column is worth investigating. For example, it could be valuable to learn how status_code differs, or what happens with the one specific endpoint.
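
One way to put a number on "how different" two sets of bars look is the total variation distance between the normalized value distributions. This is a swapped-in technique for illustration only; the text does not say how BubbleUp itself compares columns, and the `total_variation` and `rank_columns` helpers are hypothetical.

```python
from collections import Counter


def total_variation(base_counts: Counter, sel_counts: Counter) -> float:
    """Total variation distance between two value distributions (0.0 to 1.0)."""
    base_total = sum(base_counts.values())
    sel_total = sum(sel_counts.values())
    if base_total == 0 or sel_total == 0:
        return 0.0  # Nothing to compare in this sketch.
    values = set(base_counts) | set(sel_counts)
    # A Counter returns 0 for values it has not seen.
    return 0.5 * sum(
        abs(base_counts[v] / base_total - sel_counts[v] / sel_total)
        for v in values)


def rank_columns(histograms):
    """Sort columns from most to least different.

    `histograms` maps a column name to a (baseline Counter, selection Counter) pair.
    """
    scored = [(column, total_variation(base, sel))
              for column, (base, sel) in histograms.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a column whose selection holds a single value that makes up only half of the baseline scores 0.5, while a column with identical proportions in both groups scores 0.0.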

Screenshot A tooltip is displayed when you hover your mouse over a pair of histogram bars, displaying the field value they represent.

Measures

Measures are continuous, numerical columns for which the overall distribution matters more than individual values. In the screenshot below, the baseline and selection are very different for mysql_dur and roundtrip_dur, but they look very similar for fraud_dur. This can help validate hypotheses: for example, the fact that mysql_dur differs as much as roundtrip_dur might suggest that roundtrip time is being driven by MySQL time.

Screenshot
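
For continuous measures, comparing distributions means placing both groups into the same bins. The sketch below shows one simple way to do that with equal-width bins; the `shared_bin_histograms` helper and the default of ten bins are assumptions made for illustration, not BubbleUp's actual binning.

```python
def shared_bin_histograms(baseline_values, selection_values, bins=10):
    """Bucket two lists of numbers into the same equal-width bins.

    Returns (baseline_counts, selection_counts), two lists of length `bins`.
    """
    all_values = list(baseline_values) + list(selection_values)
    if not all_values:
        raise ValueError("need at least one value to build bins")

    lo, hi = min(all_values), max(all_values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / bins or 1.0

    def histogram(values):
        counts = [0] * bins
        for v in values:
            # The maximum value is clamped into the last bin.
            index = min(int((v - lo) / width), bins - 1)
            counts[index] += 1
        return counts

    return histogram(baseline_values), histogram(selection_values)
```

Because both lists share the same bin edges, the two count lists can be drawn as overlaid bars and compared position by position.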

Tips and Tricks for Using BubbleUp

  • After you perform a categorical drilldown, the next step is often to add a filter to examine the field where the values between the selection and baseline look most different. For example, in the mysql_replset above, it might be valuable to filter on db_shard_1. Other times, it can be valuable to pursue a breakdown to understand better how fields vary from each other.
  • BubbleUp is best at differentiating data that is distinctly separated from the main body. The more separate the data is, the easier it is to distinguish. For example, it was very easy to pick out the blue points in area A in the figure above. In contrast, it is far more difficult to tell how a dataset is different when the selection is inside the body, as in the following screenshot:

Screenshot

  • If you can’t find differentiating fields in your BubbleUp, zoom your graph into a narrower time range. Bigger time ranges mean more possible variance and more possible causes in the baseline; a more isolated time range can help reduce the unrelated variation that accumulates in your data over time.

Troubleshooting

  • BubbleUp ignores the Break Down, Order, and Limit fields.
  • Sometimes, fields like user_id are stored as numbers, and so are shown as continuous measures. The easiest way to fix this is to go to the Dataset Settings page and adjust the data type to “string.”
  • Some derived columns may display as numbers but are strings underneath. This manifests in a drilldown that shows no results. For example, a derived column that tries to match a number, “REG_VALUE('$field', '*([0-9]+).*')”, will not show correctly. To fix this, coerce the result of the regular expression to a float with the function FLOAT().
  • You might see (no value) or (empty) as values in a field. A value of (no value) means that this key was not populated in the event payload, which is possible because Honeycomb events are schema-less. A value of (empty) is likely to occur in string-typed columns, where it represents an empty string.
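
Two of the points above can be illustrated in Python. This is a loose analogy rather than Honeycomb's own derived-column language: `display_value` and `reg_value_as_float` are hypothetical helpers, and the regular expression shown is an assumed example pattern.

```python
import re


def display_value(event: dict, column: str) -> str:
    """Label a field the way the text describes (no value) and (empty)."""
    if column not in event:
        return "(no value)"   # The key was never populated in the payload.
    if event[column] == "":
        return "(empty)"      # The key exists but holds an empty string.
    return str(event[column])


def reg_value_as_float(text: str, pattern: str):
    """Extract the first capture group and coerce it to a float.

    A regular-expression match is always a string, even when it looks like
    a number, so it is converted explicitly here.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return float(match.group(1))
```

For instance, `reg_value_as_float("took 42 ms", r"([0-9]+)")` yields a float rather than the string `"42"`, which is the same kind of coercion the FLOAT() fix performs.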