Apache Druid is an open-source data store designed for high-performance (sub-second) OLAP queries on large (terabyte-scale) datasets. It is most commonly used for operational analytics use cases, where quick decisions must be made on data that is being streamed in.
Common Druid use cases include clickstream analytics, anomaly detection, network monitoring, customer behavior analysis, and digital marketing, but it is applicable in any environment where you need real-time data ingestion, fast aggregation, and low-latency queries. In today’s competitive business landscape, analytics is core to understanding both user and product behavior, and fast decision-making on real-time data is an increasingly critical component of your data strategy.
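To make "fast aggregation" concrete, here is a minimal sketch of a clickstream-style aggregation sent to Druid's SQL endpoint (`/druid/v2/sql`). The router address, the `clickstream` datasource, and the `page` column are hypothetical placeholders chosen for illustration; substitute the names from your own cluster.

```python
import json
import urllib.request

# Hypothetical values: the router address, the "clickstream" datasource,
# and the "page" column are placeholders, not names from a real deployment.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Top pages per minute over the last hour, bucketed on Druid's __time column.
QUERY = """
SELECT TIME_FLOOR(__time, 'PT1M') AS "minute", page, COUNT(*) AS views
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1, 2
ORDER BY views DESC
LIMIT 10
"""


def build_request(sql: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Wrap a SQL string in the JSON body that the Druid SQL endpoint expects."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str) -> list:
    """Send the query and return the decoded JSON rows (needs a running cluster)."""
    with urllib.request.urlopen(build_request(sql)) as response:
        return json.load(response)
```

Calling `run_query(QUERY)` requires a reachable Druid cluster; `build_request` on its own only constructs the HTTP request, which makes it easy to inspect the payload first.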
Druid’s query performance is typically at least 10x faster than common second-generation cloud warehouses such as Snowflake and BigQuery. But what about the cost? With Druid, the more queries you perform, the better your price performance will be. In other words, if you perform fewer than 1,000 queries per month, your cost to achieve speed might be comparable to other data warehouse solutions, but if you perform 10 million queries a month, your cost will be a fraction (less than 1%) of what you would pay on other data warehouse solutions. With Druid, the price performance advantage when doing time series analysis on large streamed data allows you to accomplish far more analysis than you could on the same budget using a non-operational database such as Snowflake, BigQuery, or Redshift.
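The arithmetic behind "more queries, better price performance" can be sketched with a deliberately simple model. It assumes, purely for illustration, that a Druid cluster has a fixed monthly cost regardless of query count, that the comparison warehouse bills a flat price per query, and that the two bills are equal at 1,000 queries per month. None of these assumptions comes from an actual price list, and the function name is hypothetical.

```python
# Illustrative pricing model only; every assumption below is hypothetical.
# Assumption 1: the cluster has a fixed monthly cost, independent of query count.
# Assumption 2: the comparison warehouse bills a flat price per query.
# Assumption 3: both bills are equal at `break_even_queries` per month.


def relative_cost(queries_per_month: int, break_even_queries: int = 1_000) -> float:
    """Return the fixed-cost bill as a fraction of the per-query bill."""
    if queries_per_month <= 0 or break_even_queries <= 0:
        raise ValueError("query counts must be positive")
    # fixed bill = price * break_even_queries
    # per-query bill = price * queries_per_month
    # The price cancels out, leaving a ratio of query counts.
    return break_even_queries / queries_per_month


if __name__ == "__main__":
    for volume in (1_000, 100_000, 10_000_000):
        share = relative_cost(volume)
        print(f"{volume:>10,} queries/month -> {share:.4%} of the per-query bill")
```

Under these assumptions the two bills match at 1,000 queries per month, and at 10 million queries per month the fixed-cost bill is 0.01% of the per-query bill, which is consistent with the "less than 1%" figure above. Real pricing is more complicated, since a cluster usually has to grow with query load.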
Druid’s powerful performance comes from an architecture that leverages ideas from data warehouses, time series databases, and search systems. Key characteristics from each of these architectures are brought together to create a highly performant, scalable, and self-healing database that supports high ingestion rates and low-latency queries.
By leveraging these key features from data warehouses, time series databases, and search systems, Druid provides a highly cost-effective and scalable solution for time series analysis and aggregation of very large-scale data.
But nothing comes for free, and despite its name, Druid cannot perform magic! With open source Apache Druid, you download the Druid software, create your own data cluster, and then tune it based on your needs. Achieving performance is easy, but achieving price performance requires an intimate understanding of your performance needs and the ability to scale your cluster up and down based on ingestion and query demand.
Cluster tuning for price performance involves:
Apache Druid is open source, and detailed configuration information is available at the Apache site. You’ll find the above information and much, much more at https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html.
The obvious question is: do you want to manage your own Druid cluster? Many companies have gone this route and it is feasible with a dedicated DevOps team. If you do have DevOps resources to spare, managing your own Druid cluster may be the right approach, but for most companies, that overhead is a burden.
Alternatively, many companies offload some of the complexity by using a service provider to manage their cluster. Druid service providers either offer professional services to manage your cluster for you or provide tools that ease the burden of configuring, monitoring, and healing your clusters.
If you want the price performance of Apache Druid without the DevOps or maintenance overhead, Rill is the perfect solution for you. Our fully managed SaaS offering leverages Kubernetes auto-scaling to remove the burden of configuration. Your team simply logs into Rill to access your Druid cluster. All scaling and configuration are performed by Rill, and the cluster is dynamically adjusted based on your team’s ingestion and query needs. User and group management and enterprise-level security allow you to share your analytics with appropriate team members or customers. Ultimately, you and your team can focus on your core business values rather than configuring servers.
Rill supports interoperability, both on the front end with your ETL tools and for your data visualization needs. Ingest data from Kafka or from common data warehouses, and visualize your data using Tableau, Looker, or your favorite data visualization tool.
At Rill, our goal is to encapsulate the complexity of Druid behind the curtain of a seamless and secure cloud service. If you are looking to access mission-critical operational analytics capabilities with a fast time to value, give Rill a try. We are happy to give you a hand bringing your data into Rill and Druid and getting you the fast access your business use cases demand.