As we build out the Rill Data team, we often meet people who are new to Apache Druid and looking for ways to get up to speed quickly. For this reason, we maintain this “Guide to Apache Druid.” It’s meant to be a balanced list of the articles, customer stories, and architectural diagrams that best helped us answer questions like:
- Who uses Apache Druid?
- What are they using it for?
- How does it fit in with the other pieces of the Modern Data Stack?
This is a living document, so we’ll update this page when particularly relevant pieces come up.
The Apache Druid site
It’s always best to hear directly from users exactly how they’re using Druid inside their company. While the community-maintained list of companies above is fairly exhaustive, the stories below are some of our favorites (in no particular order):
- Salesforce, 2020: Delivering High-Quality Insights Interactively Using Apache Druid at Salesforce
- Netflix, 2020: How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience
- Walmart, 2017: Event Stream Analytics at Walmart with Druid
- eBay, 2019: Monitoring at eBay with Druid
- Airbnb, 2018: How Druid enables analytics at Airbnb
- Lyft, 2018: Data modeling tradeoffs with Druid
- Lyft, 2018: Streaming SQL and Druid at Lyft
- Snap, 2018: Druid at Snap (meetup presentation from Charles Allen, formerly of Metamarkets)
- Pinterest, 2020: Powering Pinterest ad analytics with Apache Druid
- Pinterest, 2019: Druid at Pinterest
- Naver, 2018: Web analytics at scale
What is the internal architecture of Apache Druid and how is it different from other OLAP databases?
- Apache Druid (part 1): A Scalable Timeseries OLAP Database System
- An introduction to Druid, your Interactive Analytics at (big) Scale (This is one of the best reviews)
- Comparison of the Open Source OLAP Systems for Big Data: ClickHouse, Druid and Pinot
- Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
- Druid: A Real-Time Analytical Data Store
- The challenges of running Druid at large scale, and future directions, part 1
- The challenges of running Druid at large scale, and future directions, part 2
- The anatomy of a Druid segment file (Engineers @ Optimizely)
Reference Architecture Diagrams
OLAP databases like Apache Druid are enjoying a resurgence in popularity amidst increasingly complex data operations. As the “Modern Data Stack” matures, it’s fascinating to see the variety of ways companies arrange the pieces of their data stack. Each of the videos and articles below includes an architectural diagram, redrawn here to make comparison easier, along with a link to the original piece.
Did we miss something great?
We're always on the lookout for smart write-ups. Let us know if you found something that we overlooked.