Explore with BubbleUp | Honeycomb

About BubbleUp

BubbleUp helps explain how a subset of data points differs from the other points returned by a query. It surfaces potential places to look for signal within your data.

For example, consider the graph below, which shows the statistical distribution of durationMs of an application’s requests over the selected time period.

In this set of points, the analyst might want to distinguish the strange group of events that have a surprisingly high durationMs:

Use of BubbleUp

In this screenshot:

  • A shows that the user has selected the area ranging from about 14:10 to 14:40, and on the y-axis from about 700 to 1200.
  • The selected events are very different from the rest of the events on several dimensions (B). They differ strongly in the values of user_id, endpoint_shape, and name; in the other fields, like platform and build_id, the proportions are fairly similar.
  • The events in the selection are very different on two of the measures (C). They are different on mysql_dur. They are also very different on durationMs (as we might expect, as that was the initial selection).

This feature can help your analysis because it helps you figure out which fields are the most likely next starting points. In this case, it seems clear that requests to one particular endpoint slowed down for a transient period.

Using BubbleUp

Currently, BubbleUp mode is only supported for heatmaps.

Learn how to make a heatmap in “Using Heatmaps.” Learn about interpreting heatmaps from “Heatmaps make Ops Better.”

To access BubbleUp mode:

BubbleUp mode works based on a selection you make within a heatmap. Create a heatmap, and then click on BubbleUp.

Click within the heatmap to select one corner, and drag to cover the opposite corner. Ensure your selection covers some or all of the points that you want to investigate.

The selected area is called the selection, and is highlighted in orange; the remaining area of the heatmap is the baseline, and is shown in blue. BubbleUp separates events in the baseline from events in the selection, showing them as distinct groups.
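The split into selection and baseline can be pictured with a short sketch. This is only an illustration of the idea, not Honeycomb's implementation: the `Box` class, the `split_events` function, and the `timestamp` field (here, minutes since midnight) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical rectangle dragged on the heatmap."""
    t_start: float
    t_end: float
    y_min: float
    y_max: float

    def contains(self, t, y):
        return self.t_start <= t <= self.t_end and self.y_min <= y <= self.y_max


def split_events(events, box, time_key="timestamp", value_key="durationMs"):
    """Return (selection, baseline): events inside the box, and all the rest."""
    selection, baseline = [], []
    for event in events:
        t = event.get(time_key)
        y = event.get(value_key)
        # Events missing either coordinate cannot fall inside the box.
        inside = t is not None and y is not None and box.contains(t, y)
        (selection if inside else baseline).append(event)
    return selection, baseline


# Roughly 14:10 to 14:40 on the x-axis, 700 to 1200 on the y-axis.
box = Box(t_start=850, t_end=880, y_min=700, y_max=1200)
events = [
    {"timestamp": 860, "durationMs": 950, "endpoint": "/export"},
    {"timestamp": 860, "durationMs": 120, "endpoint": "/home"},
    {"timestamp": 900, "durationMs": 800, "endpoint": "/home"},
    {"timestamp": 870},  # no durationMs recorded
]
selection, baseline = split_events(events, box)
```

Every event lands in exactly one of the two groups, which is what lets the charts compare them side by side.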

The BubbleUp charts are displayed below the heatmap.

Interpreting BubbleUp Charts

A BubbleUp is based on a selection of points queried from the dataset. It shows every (non-empty) column in the dataset. For each column, it shows a histogram of values within the baseline in blue, and those from the selection in orange. The histogram shows the distribution of different values for the dataset. The height of each bar is proportional to the number of times the value occurs in the results of the query.
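As a rough mental model, the per-column histograms amount to counting how often each value occurs in each group. The sketch below assumes events are plain dictionaries; `column_histograms` is a hypothetical helper for illustration and says nothing about how Honeycomb computes its charts internally.

```python
from collections import Counter


def column_histograms(selection, baseline):
    """For every column seen in either group, count values per group."""
    columns = sorted({key for event in selection + baseline for key in event})
    histograms = {}
    for column in columns:
        histograms[column] = {
            # Events that lack the column contribute nothing to its histogram.
            "selection": Counter(e[column] for e in selection if column in e),
            "baseline": Counter(e[column] for e in baseline if column in e),
        }
    return histograms


selection = [{"platform": "ios", "endpoint": "/export"}]
baseline = [{"platform": "ios"}, {"platform": "android"}]
histograms = column_histograms(selection, baseline)
```

Each count corresponds to the height of one bar: orange for the selection, blue for the baseline.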

A BubbleUp shows a series of miniature histograms, one for each column in the dataset. The columns are divided into two groups, for categorical dimensions and continuous measures.

Dimensions

A dimension is a column that can be used to group, separate, or filter data items. In BubbleUp, categorical and ordinal data are visualized together. Categorical columns are those in which the values do not fall in a meaningful order. Examples of categorical columns include user_id, hostname, and is_responding.

Screenshot A low-cardinality, categorical dimension. In BubbleUp, categorical dimensions are shown captioned with the relevant value. The field platform has five distinct values; in both the baseline and selection, there are more “android” and “ios” values than “js” and “rest”. The donut charts in the top right show that most events in the selection have a platform, hostname, or endpoint; only a few events in the baseline do.

Screenshot A high-cardinality, categorical dimension. When there are many distinct values, only the top fifty are shown, including some from each of baseline and selection. In hostname, both sets are truncated.

Screenshot In endpoint, the one bar of the selection stands out as a visible outlier. It can be interpreted to mean “there is only one value for endpoint within the selection.”

Screenshot An ordinal dimension is one that has a meaningful order. In status_code, the values are numeric, and so are arranged in ascending order. The value 200 occurs frequently in both baseline and selection. Code 500 occurs less frequently in the selection, but almost never occurs in the baseline.

Very different heights of bars in the baseline and selection can be indications that this column is unusual. For example, it could be valuable to learn how status_code differs, or what happens with the one specific endpoint.
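One way to put a number on "very different heights" is to compare the two groups' value proportions. The sketch below uses total variation distance, which is our choice for illustration; the source does not say how BubbleUp scores or orders columns, and `proportions` and `total_variation` are hypothetical helpers.

```python
from collections import Counter


def proportions(values):
    """Map each distinct value to its share of the list (empty list -> {})."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()} if total else {}


def total_variation(selection_values, baseline_values):
    """0.0 for identical distributions, 1.0 when no values are shared."""
    p = proportions(selection_values)
    q = proportions(baseline_values)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


# status_code: 500 is common in the selection and rare in the baseline.
selection_codes = [200, 200, 500, 500]
baseline_codes = [200, 200, 200, 200, 200, 200, 200, 200, 200, 500]
score = total_variation(selection_codes, baseline_codes)
```

A column with a score near zero looks alike in both groups; a score near one suggests a field worth filtering or breaking down by.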

Measures

Measures are continuous, numerical columns, where individual values are not as important as the distribution. In the screenshot below, the baseline and selection are very different for durationMs and mysql_dur; they seem very similar for fraud_dur. This can help validate hypotheses. For example, the fact that mysql_dur is as different as roundtrip_dur might suggest that roundtrip time is being driven by mysql time.

The donut charts in the top right show that all rows in both the baseline and the selection have a durationMs field; most events in the selection have a mysql_dur, and just under half of the events in the selection have a fraud_dur.

Screenshot

Tooltips

Screenshot

A tooltip is displayed when you hover your mouse over a pair of histogram bars, displaying the field value they represent. Hovering over the top bar reveals additional information about the number of events with that field. Click on the “actions” menu below the tooltip to create a new query that filters or breaks down by the field. In this case, the user_id with value 20109 is in 67% of the selection, and just 2% of the baseline.

Screenshot

Hovering over the top bar displays a tooltip that shows the complete title of the field. The field availability_zone appears in 61% of the events in the selection, and just 10% of the baseline. In other words, most baseline events do not have an availability_zone.
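The percentages in these tooltips can be read as simple shares within each group. The following sketch shows the arithmetic, assuming events are dictionaries; `share_with_value` and `share_with_field` are hypothetical helper names, and the sample data is made up for the example.

```python
def share_with_value(events, field, value):
    """Percentage of events whose field equals the given value."""
    if not events:
        return 0.0
    return 100.0 * sum(1 for e in events if e.get(field) == value) / len(events)


def share_with_field(events, field):
    """Percentage of events that carry the field at all."""
    if not events:
        return 0.0
    return 100.0 * sum(1 for e in events if field in e) / len(events)


selection = [{"user_id": 20109}, {"user_id": 20109}, {"user_id": 7}]
baseline = [{"user_id": 1, "availability_zone": "a"}, {"user_id": 2}]

value_share = share_with_value(selection, "user_id", 20109)
field_share = share_with_field(baseline, "availability_zone")
```

The first function corresponds to hovering over a pair of bars, the second to hovering over the field title.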

Tips and Tricks for Using BubbleUp

  • After you perform a categorical drilldown, the next step is often to add a filter, or where clause, to examine the field where the values in the selection and baseline look most different. For example, in the mysql_replset above, it might be valuable to filter on db_shard_1. Other times, it can be valuable to pursue a breakdown to understand better how fields vary from each other.
  • BubbleUp is best at differentiating data that is distinctly separated from the main body. The more separate the data is, the easier it is to distinguish. For example, it was very easy to pick out the blue points in area A in the figure above. In contrast, it is far more difficult to tell how a dataset is different when the selection is inside the body, as in the following screenshot:

Screenshot

  • If you can’t find differentiating fields in your BubbleUp, zoom your graph into a narrower time range. A bigger time range means more possible variance and more possible causes in the baseline. Sometimes a more isolated time range helps reduce unrelated variation that accumulates in your data over time.

Troubleshooting

  • BubbleUp ignores the Break Down, Order, and Limit fields.
  • Sometimes, fields like user_id are stored as numbers, and so are shown as continuous measures. The easiest way to fix this is to go to the Dataset Settings page and adjust the data type to “string.”
  • Some derived columns may display as numbers but are strings underneath. This will manifest in a drilldown by not showing any results. (For example, a derived column that tries to match a number, REG_VALUE('$field', '*([0-9]+).*'), will not show correctly.) To fix this, coerce the result of the regular expression to a float with the function FLOAT().
  • You might see (no value) or (empty) as values in a field. A value of (no value) means that this key was not populated in the event payload, which is possible because Honeycomb events are schema-less. A value of (empty) is likely to occur in string-typed columns, representing an empty string.
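Two of the points above can be illustrated outside Honeycomb with a small Python analogy. It shows why a regular-expression match yields a string until it is explicitly converted, and how a missing key differs from an empty string. `extract_number` and `display_value` are hypothetical helpers, and the pattern is a simplified stand-in, not the derived-column expression quoted above.

```python
import re


def extract_number(text, pattern=r"([0-9]+)"):
    """Regex captures are strings; convert explicitly to get a number."""
    match = re.search(pattern, text)
    if match is None:
        return None
    captured = match.group(1)   # e.g. "42", a str that only looks numeric
    return float(captured)      # analogous to wrapping the result in FLOAT()


def display_value(event, field):
    """Distinguish a key that was never sent from one sent as ''."""
    if field not in event:
        return "(no value)"
    if event[field] == "":
        return "(empty)"
    return event[field]


shard = extract_number("db_shard_42")
missing = display_value({"hostname": "app-1"}, "availability_zone")
empty = display_value({"availability_zone": ""}, "availability_zone")
```

Without the explicit conversion, the captured text would sort and compare as a string rather than as a number.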