Druid Summit 2021
Learn best practices and real-world use cases for Apache Druid with the Rill team—operators of the first and longest continuously running Apache Druid service.
Deep Dive into Druid Metrics
It is critical to set up proper monitoring for a large distributed system. Even a single slow node can slow the entire cluster down. Apache Druid generates hundreds of metrics related to queries, ingestion, and coordination at each node, which need to be analyzed to assess query performance, identify bad nodes, find bottlenecks, and pinpoint slow queries.
This talk covers commonly used reference architectures and open-source tools, including Prometheus, as well as dogfooding by sending Druid metrics to a separate Druid cluster. We will also share sample pipelines and dashboards that show how best to store and analyze Druid metrics.
Nishant Bangarwa is Co-founder and Head of Engineering at Rill Data. He is an active open-source contributor and a PMC member of Apache Druid and Apache Superset. He is also a committer in Apache Calcite and Apache Hive.
Neil Buesing is a Principal Solutions Architect at Rill Data, helping customers build out their streaming pipelines. A Confluent Community Catalyst for '21–'22, he focuses his development work on Apache Kafka and is a frequent presenter at Kafka Summit conferences and meetups.
The Importance of Exactly Once Semantics for Analytics Processing
Apache Druid and Apache Kafka work well together for time-series analytics. Leveraging Kafka Streams (and ksqlDB as an implementation of Kafka Streams) with Druid leads to rich datasets. Apache Druid’s Kafka consumer maintains offsets within Druid; however, this alone is not enough when your additional stream processing happens outside of Druid.
This talk includes a demonstration showcasing the importance of exactly-once semantics (EOS), along with design considerations for where to put stream processing and how to leverage Kafka Streams for aggregations and enrichments.