Dataset best practices

As with a well-organized closet, a little dataset planning up front can make a big difference in day-to-day use later. The recommendations below reflect practices we have found make that easier.

Use Datasets to group data  🔗

Datasets partition your data into separate, queryable sets. In general, all events in the same Dataset should be considered equivalent, either in their frequency and scope or in the system layer in which they occur. Separate events into different Datasets when you cannot establish equivalency between them (e.g., data gathered from a dev environment vs. prod).

You may, for example, find it useful to capture API and batch-processing events in the same Dataset if they share some request_id field. By contrast, events from two different environments with only one differentiator (like the value of some “environment” column) might appear highly similar and, as a result, be more easily confused. Relying on consistent application of some “environment” filter is risky and can create misleading results.
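To illustrate the risk, here is a minimal sketch (plain Python, with illustrative field names rather than anything Honeycomb-specific) of what happens when dev and prod events share a dataset and a query forgets the "environment" filter:

```python
# Hypothetical events: dev and prod mixed in one dataset, differing only
# by an "environment" field. Field names are illustrative.
events = [
    {"environment": "prod", "duration_ms": 120},
    {"environment": "prod", "duration_ms": 80},
    {"environment": "dev",  "duration_ms": 4000},  # slow dev box
]

# Correct query: filter to prod before aggregating.
prod = [e["duration_ms"] for e in events if e["environment"] == "prod"]
print(sum(prod) / len(prod))  # 100.0

# Forgetting the filter silently mixes dev noise into the result.
all_durations = [e["duration_ms"] for e in events]
print(sum(all_durations) / len(all_durations))  # 1400.0
```

One forgotten filter inflates the average by an order of magnitude, which is exactly why keeping the environments in separate Datasets is safer.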

Here is another example from one of our customers. They’ve put API and web requests in the same Dataset, because—for them—an API request is really one type of web request that has more fields. Our customer adds the extra API fields (even though the web requests don’t have them) because Honeycomb supports sparse data and provides filters that enable our customer to look at web or API requests, and so on. Our customer does not want to filter out web requests, however, when looking at something like overall traffic.
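The sparse-data pattern above can be sketched in plain Python (field names here are illustrative, not Honeycomb's schema): API events simply carry extra fields that web events omit, and filters can narrow to one type or include everything.

```python
# Web and API requests in one dataset; API events carry extra fields
# that web events omit (sparse data). Field names are illustrative.
events = [
    {"type": "web", "request_id": "r1", "path": "/home"},
    {"type": "api", "request_id": "r2", "path": "/v1/users", "api_key_id": "k42"},
    {"type": "api", "request_id": "r3", "path": "/v1/jobs",  "api_key_id": "k42"},
]

# Filter down to API traffic when needed...
api_only = [e for e in events if e["type"] == "api"]
print(len(api_only))  # 2

# ...or look at overall traffic without excluding web requests.
print(len(events))  # 3
```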

For this same company, SQL queries reside in a different Dataset because SQL queries are not in any way equivalent to API data: There can be multiple (or no) SQL queries for a single API query, for instance.

Set the default granularity for a Dataset  🔗

Some Datasets, such as those that contain metric data, are periodic: data is captured at a regular, known interval, or granularity. For these Datasets, it’s helpful to ensure that all queries default to using that granularity or higher, which avoids spiky or confusing graphs.

The Default Granularity setting allows you to specify the expected interval for a periodic Dataset. Queries in this Dataset won’t drop below the default granularity. You can still override the default on any individual query, if needed.
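Conceptually, the default acts as a floor on the query interval. A minimal sketch of that behavior (names are illustrative, not Honeycomb internals):

```python
# Sketch: the effective bucket size is the larger of what the query
# requests and the dataset's default granularity.
DEFAULT_GRANULARITY_S = 60  # e.g., metrics arrive once a minute

def effective_granularity(requested_s: int) -> int:
    """Clamp a requested query interval to the dataset default."""
    return max(requested_s, DEFAULT_GRANULARITY_S)

print(effective_granularity(10))   # 60 -- can't drop below the default
print(effective_granularity(300))  # 300 -- coarser intervals pass through
```

Clamping to the capture interval is what prevents the spiky graphs: buckets finer than the data's actual cadence would alternate between full and empty.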

To modify the Default Granularity setting:

  1. Log in to Honeycomb.
  2. Navigate to the Datasets tab.
  3. Select Settings on the right side of your dataset’s row.
  4. Under Overview > Default Granularity, use the dropdown to select the minimum interval for this dataset.