A Guide to Exploring RubyGems.org Traffic

RubyGems.org is the Ruby community’s gem hosting service. Gem developers can publish their gems for anyone to install, and Ruby developers can browse gem pages to learn more about dependencies and revision histories. The open-source site is fronted by Fastly, whose CDN logs are easy to send straight to Honeycomb.

Sifting through CDN traffic for a site like RubyGems.org surfaces plenty of interesting details: which gems the Ruby community downloads most, which gems have the largest number of actively downloaded versions, and how the Fastly cache status affects download times.

Launch the Dataset

What are some things we can learn from this dataset?

Below, find a few examples of interesting tidbits we discovered by exploring their Fastly data.

Note: Each of these questions links directly to a graph that attempts to answer it. The graph is a permalink to a previously run (and permanently preserved) execution of the query. To run it again against recent data, simply hit “Run Query”.

A few particularly interesting fields

But don’t stop there! You can find a full description of each field in the right-hand sidebar under the Details tab.

Now, go explore on your own!

We’ve described some fun starter queries for you to begin exploring RubyGems.org’s traffic.

A couple of Honeycomb-specific notes: if you’re struggling with the query builder, you can find some helpful documentation here. And note that RubyGems.org serves quite a bit of traffic, so we recommend constraining queries to the Fast Query Window while you experiment. This way you can iterate quickly while exploring, then expand the time window once you find a query worth running over a longer period.

Some technical fine print

Playing with data is all well and good, but where did it all come from?

RubyGems.org is first and foremost open source, and supported by a fantastic crew of folks who were willing to expose their realtime data to the public in the interest of a learning opportunity for the community. Connecting their traffic to Honeycomb was simply a matter of configuring their CDN logs to output a structured format for ingestion by Honeycomb.

To protect client privacy while preserving uniqueness, the client_ip field described in the Log Streaming to Honeycomb docs has been replaced with a client_ip_hash field, which contains a hash of each client_ip value.
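To illustrate why hashing keeps uniqueness while hiding the address, here is a minimal Python sketch. The source does not say which hash function or salt is used for client_ip_hash, so the keyed SHA-256 (HMAC), the `SECRET_SALT` value, and the `hash_client_ip` helper below are all assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; the real dataset's hashing scheme
# is not documented here, and a real salt would be kept private.
SECRET_SALT = b"example-salt-not-the-real-one"


def hash_client_ip(client_ip: str, salt: bytes = SECRET_SALT) -> str:
    """Return a stable, hex-encoded keyed hash of an IP address.

    The same input always yields the same output, so distinct clients can
    still be counted, but the original address is not stored in the event.
    """
    return hmac.new(salt, client_ip.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), not real clients.
    for ip in ["203.0.113.7", "203.0.113.7", "198.51.100.42"]:
        print(ip, "->", hash_client_ip(ip)[:16])
```

Because the hash is deterministic, counting distinct client_ip_hash values gives the same answer as counting distinct client_ip values would, which is the property the dataset relies on.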

Extracting the gem name + version

On the Honeycomb side, a handful of extra columns were created to extract some particularly useful values from the standard HTTP fields: bundler_version, bundler_minor_version, downloaded_gem_name, and downloaded_gem_version. The first two operate on the user_agent field, while the latter two operate on the served URL. (You can see the definition of each derived column by expanding the Details sidebar and clicking on the field name in question.)
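As a rough picture of what the two user_agent-based columns compute, the Python sketch below pulls a full and a minor Bundler version out of a user agent string. It is not the actual derived column definition (that lives in the Details sidebar). The `bundler/X.Y.Z` token format, the `BUNDLER_UA` pattern, and the `bundler_versions` helper are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed user agent shape: a "bundler/<version>" token somewhere in the string.
# Group 1 is the full version, group 2 is the major.minor prefix.
BUNDLER_UA = re.compile(r"\bbundler/((\d+\.\d+)(?:\.[0-9A-Za-z]+)*)")


def bundler_versions(user_agent: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (bundler_version, bundler_minor_version), or (None, None)."""
    match = BUNDLER_UA.search(user_agent)
    if match is None:
        return None, None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = "bundler/1.16.1 rubygems/2.7.6 ruby/2.5.1"
    print(bundler_versions(sample))
```

Requests that do not come from Bundler simply yield empty values, which keeps them out of any breakdown by Bundler version.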

While these fields could certainly be populated in the Fastly config, using derived columns to extract values from other fields on the fly allows a bit more flexibility in how columns are defined and how they evolve. The URL pattern for gem downloads from RubyGems.org is /gems/NAME-VERSION.gem, and we were able to use the NAME_PATTERN and Gem::Version::VERSION_PATTERN to confidently match the values we were interested in.
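The /gems/NAME-VERSION.gem split can be sketched in Python as follows. The `GEM_PATH` regex is a simplified approximation written for this example, not a copy of NAME_PATTERN or Gem::Version::VERSION_PATTERN, and `parse_gem_path` is a hypothetical helper. It assumes the version starts with a digit and does not handle platform suffixes that contain underscores.

```python
import re
from typing import Optional, Tuple

# Simplified stand-ins for the real patterns (assumptions for illustration):
#   name:    letters, digits, dots, underscores, hyphens (matched lazily)
#   version: digits, then dot-separated segments, then an optional "-suffix"
GEM_PATH = re.compile(
    r"^/gems/"
    r"(?P<name>[A-Za-z0-9._-]+?)"
    r"-"
    r"(?P<version>[0-9]+(?:\.[0-9A-Za-z]+)*(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?)"
    r"\.gem$"
)


def parse_gem_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (gem_name, gem_version) for a gem download path, else None."""
    match = GEM_PATH.match(path)
    if match is None:
        return None
    return match.group("name"), match.group("version")


if __name__ == "__main__":
    for path in ["/gems/rails-5.2.0.gem", "/gems/rspec-core-3.8.0.gem", "/versions"]:
        print(path, "->", parse_gem_path(path))
```

Matching the name lazily matters because gem names can themselves contain hyphens: the name ends at the first hyphen that is followed by something that parses as a version through to the .gem suffix.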