Data Talks on the Rocks

How ClickHouse became one of the fastest-growing databases in the world

By Michael Driscoll · January 20, 2026 · 5 minute read

In 2009, Alexey Milovidov started building a database he didn't want to build.

The task sounded insane. Build something faster than everything else in existence. Build it for a workload that existing systems couldn't handle. Build it when the entire industry had already decided that Hadoop and MapReduce were the future of analytics.

But as Data Talks on the Rocks host Michael Driscoll explores in this week's conversation with the ClickHouse co-founder and CTO, sometimes the most unreasonable ideas turn out to be the only reasonable solutions.

And sometimes refusing to accept industry consensus is exactly what creates breakthrough technology.

The Real-Time Problem Nobody Else Solved

There's a specific pain point where ClickHouse stands alone.

You cannot use Snowflake as a backend for real-time applications serving user requests directly. BigQuery can handle massive data volumes but fails at interactive latency. Redshift doesn't provide the feature depth or attention to detail that production applications demand. Postgres isn't an analytical database, and the extensions trying to make it one fall far short of what's actually required.

Alexey is blunt about this landscape. When asked what differentiates ClickHouse from alternatives, his answer cuts through the usual vendor diplomacy:

There are some scenarios when ClickHouse is the only working technology.

The category is OLAP databases. But the real distinction is simpler. ClickHouse lets you run complex analytical queries with latencies measured in milliseconds, not seconds. It handles both massive scale and interactive speed simultaneously. Every other system makes you choose.

This combination unlocked use cases that previously required architectural gymnastics or simply didn't exist.

Why AI Companies Keep Choosing ClickHouse

The AI boom revealed something unexpected about ClickHouse's design.

Feature stores turn out to be databases with a fancy name. Data preparation, profiling, and cleaning for machine learning models require fast analytical queries over large datasets. ClickHouse handles this naturally.

MLOps observability generates enormous volumes of telemetry from both training and inference. Companies like Weights & Biases and Braintrust chose ClickHouse because it ingests this firehose while remaining queryable in real time.

Vector search represents the third category. When embedding datasets grow too large to fit in memory, specialized vector databases stop working. ClickHouse continues functioning because it gets the fundamentals right: compression, query optimization, storage layout, and how data is sorted on disk.

Interestingly, ClickHouse has supported array data types since 2012, long before anyone cared about vector embeddings. The feature was built to collect identifiers and categories. It accidentally became perfect for AI workloads a decade later.
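To make that concrete, here is a minimal sketch of what vector search looks like when embeddings are just another Array(Float32) column. The table and column names are illustrative, not from the episode:

    CREATE TABLE docs
    (
        id        UInt64,
        embedding Array(Float32)
    )
    ENGINE = MergeTree
    ORDER BY id;

    -- Brute-force nearest neighbors: score every row on the fly, keep the closest ten.
    SELECT id, cosineDistance(embedding, [0.1, 0.2, 0.3]) AS dist
    FROM docs
    ORDER BY dist ASC
    LIMIT 10;

No separate vector engine, just compression, storage layout, and a distance function over an ordinary array column.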

The pattern repeats across use cases. ClickHouse has five different functions for calculating quantiles because web analytics and telemetry applications genuinely need that level of precision. The UInt256 data type was designed for large integers but maps perfectly to Ethereum addresses. Features built for specific problems keep solving unexpected new ones.
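A couple of hedged examples of what that looks like in practice. The tables and columns here are made up for illustration:

    -- Different quantile functions trade accuracy for speed and memory.
    SELECT
        quantile(0.5)(latency_ms)         AS p50_sampled,   -- approximate, sampling-based
        quantileExact(0.99)(latency_ms)   AS p99_exact,     -- exact, keeps all values
        quantileTDigest(0.99)(latency_ms) AS p99_tdigest,   -- t-digest sketch
        quantileTiming(0.95)(latency_ms)  AS p95_timing     -- tuned for timing data
    FROM requests;

    -- UInt256 holds very large integers, which happens to fit blockchain workloads.
    CREATE TABLE transfers
    (
        tx_hash UInt256,
        amount  UInt256
    )
    ENGINE = MergeTree
    ORDER BY tx_hash;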

The Developer Experience That Actually Matters

Most analytical databases require you to manage multiple instance types in your cluster topology just to ingest data.

ClickHouse doesn't work that way.

Getting started takes minutes, not days. There's no complex YAML configuration to navigate. No separate ingestion layer to provision. No aggregation nodes to tune. The architecture stays simple even as scale increases.

Alexey's explanation for this simplicity is revealing:

I'm a ClickHouse user. And I'm a really bad user. I notice when anything is slightly not right.

He uses the product constantly. On trains between European cities, he opens production logs and notices small things that need improvement. This obsessive attention to detail from the primary author creates a different kind of quality than you get from teams building products they don't personally use.

The documentation exists and is comprehensive. But the design goal is that you shouldn't need to read walls of text just to get basic functionality working. Features should feel natural. The common path should be obvious.

This philosophy extends to how ClickHouse handles complexity. Other systems like Apache Druid and Apache Kylin assemble multiple components that each handle different responsibilities. You write complex configurations just to get data into the system. Alexey describes starting to read their documentation and getting bored in the first paragraph.

ClickHouse takes the opposite approach. Treat it like a relational database with flat tables. Calculate everything on the fly. Use brute force where appropriate. Surprisingly, this works.
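In practice that can be as plain as the query below, with hypothetical table and column names, where nothing is pre-aggregated and everything is computed at query time:

    SELECT
        toStartOfHour(event_time) AS hour,
        count()                   AS events,
        uniq(user_id)             AS users
    FROM events
    GROUP BY hour
    ORDER BY hour;

One flat table, one scan, and the aggregates are brute-forced on the fly.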

The Two-Year Journey to Build a Better Analyzer

Some problems take years to solve correctly.

The analyzer, ClickHouse's infrastructure for query analysis and planning, took two years from alpha to becoming the default in production. Not because the team moved slowly, but because the requirements were uncompromising.

ClickHouse SQL contains significant extensions. You can define an alias for an expression and reuse it inside other expressions anywhere in the query. Array joins and lambda functions make SQL feel almost functional. Higher-order functions let you compose operations in ways standard SQL doesn't support.
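A small, hedged illustration of those extensions. Nothing here is specific to the episode, just generic ClickHouse SQL:

    -- An alias defined in one expression is reused inside the next one.
    SELECT
        number * 2 AS doubled,
        doubled + 1 AS shifted
    FROM numbers(3);

    -- ARRAY JOIN unrolls an array into rows.
    SELECT tag
    FROM (SELECT ['a', 'b', 'c'] AS tags)
    ARRAY JOIN tags AS tag;

    -- A lambda passed to a higher-order function.
    SELECT arrayFilter(x -> x % 2 = 0, range(10)) AS evens;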

Alexey could have used Apache Calcite or another existing query analyzer. Many databases written in C++ somehow plug in these Java projects. But adopting an external analyzer would have meant losing ClickHouse's extensions or creating compatibility headaches.

The team chose a harder path. Build it natively in C++. Support every corner case. Maintain 100% backward compatibility. They refused to create a Python 2 to Python 3 situation where subtle incompatibilities cause migration nightmares lasting years.

When the analyzer finally shipped, users got better performance and access to features like recursive CTEs that previously weren't possible. The switch happened transparently. Queries just got faster.
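Recursive CTEs, for instance, can now be written in the familiar standard form in recent ClickHouse releases. A toy sketch, not from the episode:

    WITH RECURSIVE seq AS
    (
        SELECT 1 AS n
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 5
    )
    SELECT n FROM seq;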

This matters most for classic data warehousing scenarios. Ad hoc queries written by users or generated by business intelligence tools can be enormous, with nested subqueries and complex joins. The database has to optimize and rewrite these queries intelligently. Before the analyzer, ClickHouse's ability to do that was limited. Now it competes directly with systems like Oracle and Snowflake in handling verbose BI-generated SQL.

The 3 AM Problem That Keeps the Work Alive

More than fifteen years into building ClickHouse, Alexey remains the most prolific contributor.

Mike asks how he stays inspired working on such a long-term, complex project. The answer is simpler than expected.

Sometimes I'm thinking about some problem and I see that I know exactly how I should do it. I want to do it right now at 3 AM on Saturday, and then I just go and do it.

The joy comes from clarity. Seeing the solution fully formed and being unable to resist implementing it immediately. Not because of roadmaps or sprint planning, but because the problem demands to be solved.

This drive built a database that now powers some of the fastest-growing companies in AI, handles petabytes of data daily, and continues expanding into use cases its creator never anticipated.

If you want to understand how unreasonable ideas become industry standards, this conversation is essential.

Watch the full episode on Data Talks on the Rocks.

Ready for faster dashboards?

Try for free today.