Data Talks on the Rocks 8 - ClickHouse, StarTree, MotherDuck, and Tobiko Data

Michael Driscoll
May 20, 2025
5 minute read
Data Talks on the Rocks is a series of interviews from thought leaders and founders discussing the latest trends in data and analytics.

Data Talks on the Rocks 1 features: 

  • Edo Liberty, founder & CEO of Pinecone
  • Erik Bernhardsson, founder & CEO of Modal Labs
  • Katrin Ribant, founder & CEO of Ask-Y, and the former founder of Dataroma

Data Talks on the Rocks 2 features: 

  • Guillermo Rauch, founder & CEO of Vercel
  • Ryan Blue, founder & CEO of Tabular which was recently acquired by Databricks 

Data Talks on the Rocks 3 features:

  • Lloyd Tabb, creator of Malloy, and the former founder of Looker

Data Talks on the Rocks 4 features:

  • Alexey Milovidov, co-founder & CTO of ClickHouse

Data Talks on the Rocks 5 features:

  • Hannes Mühleisen, creator of DuckDB

Data Talks on the Rocks 6 features:

  • Simon Späti, technical author & data engineer

Data Talks on the Rocks 7 features:

  • Kishore Gopalakrishna, co-founder & CEO of StarTree

Data Talks on the Rocks 8 features:

  • Joe Reis, author

Data Talks on the Rocks 9 features:

  • Matthaus Krzykowski, co-founder & CEO of dltHub

Data Talks on the Rocks 10 features:

  • Wes McKinney, creator of Pandas & Arrow

On April 23, 2025, 200+ data practitioners and technical leaders joined our all-star event during Data Council. We discussed real-time databases and next-generation ETL with a legendary panel of technical founders: Yury Izrailevsky (co-founder of ClickHouse; ex-Google, Netflix, Yahoo), Kishore Gopalakrishna (founder of StarTree, creator of Apache Pinot at LinkedIn), Jordan Tigani (co-founder of MotherDuck; ex-SingleStore and Google), and Tobias (Toby) Mao (founder of Tobiko, creators of SQLMesh and SQLGlot).

I’ve noted some of my favorite highlights below:

"dbt is great, don't get me wrong, but it's really stateless. There's no state and everything is just a script and you just run your Jinja SQL. So if you're familiar with the DevOps space, it's kind of like Chef, where you can tell it exactly what to do and it just does it blindly. SQLMesh on the other hand, is stateful. It understands SQL and kind of is more declarative. So it's more like Terraform. And so there are two very different approaches for transformation." - Toby Mao, founder of Tobiko.
"I went to Hannes and Mark, who are the creators of DuckDB and I said, hey, you know what? You guys have built something amazing, will you give me a job? I'd love to sort of work with you and build a SaaS service. You're probably building a SaaS service. And they said, we don't want to build a SaaS service. We just want to focus on building this amazing database. But we'll partner with you." - Jordan Tigani, co-founder of MotherDuck.
"User facing analytics - that's the term that we coined, which is really very different when you actually get into the weeds of it. Your users don't want to wait for a page to load for seconds, right? I mean, it's okay in the data warehouse, internal use cases because you don't have the option. But when you sell something to your end users, you have to be in milliseconds." - Kishore Gopalakrishna, founder of StarTree.
"In the cloud, when we first created the company and decided to build a cloud product, we chose a very different architecture under the hood. We moved away from shared-nothing to compute and storage separation where the data sits in object storage like S3 or GCS, and the compute nodes are completely stateless. You can do idling, you can even do things like compute-compute separation where you have different compute clusters for different use cases, like separating reads and writes, and because it's serverless, it scales up and down. You only pay for what you use, and that actually ends up being cheaper. And on top of it, in our cloud product we've added a whole bunch of security features: cloud-native identity and access management, RBAC, bring your own key." - Yury Izrailevsky, co-founder of ClickHouse.

Real-time database roundup with the founders of StarTree, MotherDuck, and ClickHouse

At a recent panel hosted by Rill Data, leaders from ClickHouse, StarTree, and MotherDuck shared how they think about real-time analytics, cloud database architecture, pricing models, lakehouse adoption, and AI. Their discussion revealed where modern analytics infrastructure is heading next.

Real-time analytics is having another defining moment.

For the last decade, analytics infrastructure has been shaped by cloud data warehouses, open source query engines, and lakehouse architectures. But a new shift is underway: analytics is moving closer to the end user. It is no longer just for internal dashboards and analyst workflows. Increasingly, it powers product experiences, embedded applications, operational decision-making, and AI agents.

That shift changes what matters.

At a recent discussion moderated by Rill Data CEO Michael Driscoll, three database leaders — Yury Izrailevsky of ClickHouse, Kishore Gopalakrishna of StarTree, and Jordan Tigani of MotherDuck — explored the future of the real-time analytics database market and why performance, pricing, and developer experience are all being reevaluated.

Their conversation offered a useful snapshot of where the data infrastructure market stands in 2025, especially for teams building user-facing analytics.

Why build another real-time analytics database?

The database market is famously crowded. Yet ClickHouse, StarTree, and MotherDuck have all found strong momentum by focusing on gaps that traditional data platforms did not address well.

For Jordan Tigani, the opportunity behind MotherDuck came from a simple observation: many organizations do not actually have “big data” workloads in the way the industry often assumes. Even when they do, many teams query only small slices of that data at a time. That makes it possible to design for smaller, more interactive workloads first, rather than optimizing everything for extreme distributed scale.

For Kishore Gopalakrishna, the need came from a very different direction. Apache Pinot was created at LinkedIn to support user-facing analytics at massive scale. This was not traditional business intelligence. It was analytics delivered directly into end-user product experiences, where latency expectations are much stricter and concurrency can be enormous.

For Yury Izrailevsky, ClickHouse stood out because the technology was already proving itself in demanding production environments. Companies were adopting it because it was dramatically faster than alternatives for high-performance analytical workloads. That created a clear opening to build a managed cloud product around battle-tested open source software.

The common thread is clear: these companies were not trying to build another generic database. They were responding to modern analytical workloads that traditional systems struggled to serve well.

User-facing analytics is driving the next wave of data infrastructure

One of the most important ideas from the panel was that user-facing analytics is now a major design center for analytics infrastructure.

Traditional BI systems were built for internal users. In that world, seconds of latency might be tolerable. Queries might be expensive. Concurrency might be limited. Dashboards were mostly consumed by analysts, operators, or executives.

That model breaks down when analytics becomes part of the product.

If a user opens an application and expects instant filtering, drill-down, or natural language answers, the system behind that interaction has to behave more like application infrastructure than warehouse infrastructure. That means low latency, high concurrency, predictable cost, and high freshness.

This is exactly the category that platforms like ClickHouse, Apache Pinot, and DuckDB/MotherDuck increasingly address. While each takes a different architectural approach, all three are helping define the modern real-time analytics stack.

For teams building customer-facing dashboards, operational apps, internal tools, or embedded analytics products, this is a major shift. The question is no longer just where data is stored. The question is whether the system can power experiences people actively use.

ClickHouse vs StarTree vs MotherDuck: three different bets on real-time analytics

The panel also highlighted how differently modern analytics companies are approaching the same market.

ClickHouse: cloud elasticity on top of high-performance analytics

ClickHouse Cloud is built around a separation of compute and storage, with stateless compute nodes and data stored in object storage. That architecture makes it easier to scale workloads up and down, isolate different use cases, and simplify operations compared with self-managed clusters.

For teams comparing ClickHouse vs traditional cloud data warehouses, the value proposition is speed, operational flexibility, and the ability to support demanding real-time use cases without permanently overprovisioning.

StarTree: high-concurrency user-facing analytics with Apache Pinot

StarTree’s position is rooted in Apache Pinot’s origins at LinkedIn. The focus is on serving analytical queries in milliseconds under heavy concurrency, especially for user-facing analytics.

That is a very specific workload profile. It favors architectures optimized for random access, freshness, and speed, rather than general-purpose warehouse scanning. For applications where analytics is embedded directly into the product, StarTree is betting that these performance characteristics matter more than broad warehouse-style flexibility.

MotherDuck: better analytics for the common workload

MotherDuck’s argument is that the industry has overbuilt for rare, massive workloads and underbuilt for the common case. With DuckDB as the foundation, the emphasis is on excellent performance for smaller-scale and interactive workloads, plus a better developer experience.

That perspective makes MotherDuck especially interesting in conversations about small data, local-first analytics, and hybrid execution models. It is also a reminder that modern hardware has changed the economics of analytics architecture in ways many legacy assumptions do not fully capture.

Why pricing models matter in real-time analytics

Modern analytics pricing is often treated as a procurement issue. The panel made clear that pricing is really a product issue.

Kishore Gopalakrishna offered a direct critique of credit-based consumption models common in cloud data warehouses. His core argument was that these models can discourage usage and create incentive misalignment. If every query feels expensive, product teams and end users naturally become more cautious. That limits adoption, experimentation, and embedded use cases.

StarTree’s response has been to favor a core-based model, where customers can run large volumes of queries without worrying about being charged on every interaction.

That matters because the future of real-time analytics depends on broad usage. If analytics is going to sit behind applications, APIs, and AI interfaces, it cannot feel prohibitively expensive every time someone clicks, filters, or asks a question.

For buyers evaluating real-time analytics databases, pricing should be seen as part of the architecture decision. It determines not just cost, but what kinds of user experiences are feasible.

Lakehouse, Apache Iceberg, and the performance tradeoff

The panel also tackled one of the biggest infrastructure topics in analytics: the rise of the lakehouse and Apache Iceberg.

There was broad agreement that Iceberg and open table formats are strategically important. They improve interoperability, reduce lock-in, and make it easier for different engines to access the same data.

But there was also skepticism about whether lakehouse architectures can fully replace database-native storage for demanding real-time workloads.

Jordan Tigani described himself as a lakehouse skeptic in some respects, arguing that decoupling storage and compute introduces tradeoffs in performance, security, metadata handling, and updates.

Kishore Gopalakrishna emphasized that for the millisecond-latency workloads StarTree targets, remote object storage still adds too much overhead. Iceberg support is useful, but it is not the right default for use cases that depend on extremely low latency and high concurrency.

Yury Izrailevsky took a hybrid view. ClickHouse supports Iceberg and data lake interoperability, but still argues that native ClickHouse storage provides superior performance and compression for many use cases.

The takeaway is not that lakehouse architectures are failing. It is that Apache Iceberg is becoming foundational for openness and interoperability, while specialized execution engines still matter when speed, freshness, and concurrency are critical.

That is likely to define the next stage of the market: open formats underneath, differentiated performance on top.

AI and analytics: what changes when data becomes conversational?

No conversation about the future of data is complete without AI, but the panel’s discussion was notably grounded.

The speakers did not suggest that AI has solved analytics. In fact, they highlighted the opposite: text-to-SQL remains difficult in messy environments, especially when schemas are inconsistent, semantics are unclear, or metrics definitions are not well modeled.

At the same time, all three speakers described real momentum around AI-powered interfaces.

In constrained environments, natural language interfaces can work surprisingly well. In some user-facing analytics applications, conversational experiences are already replacing or complementing dashboards. In other cases, AI is helping with query generation, autocomplete, semantic interpretation, and exploration.

The most important point was this: AI interfaces only work well when the backend is fast.

If an AI agent sits on top of a slow warehouse or batch-oriented engine, the experience breaks down quickly. But if the backend is a real-time analytics database, conversational interfaces become much more viable.

This is one reason real-time systems are becoming more strategically important in the age of AI. The future of analytics may not be only dashboards, and it may not be only chat. It will likely be a blend of visual interfaces, metrics-driven modeling, and conversational access — all powered by low-latency data infrastructure.

The future of real-time analytics is closer to the user

The most useful conclusion from the discussion is that analytics infrastructure is being reshaped by proximity to the user.

That means:

  • lower latency requirements,
  • more product-embedded use cases,
  • higher concurrency,
  • more pressure for predictable pricing,
  • stronger demand for interoperability,
  • and new interface expectations driven by AI.

In other words, the market is moving beyond the classic warehouse-centric model.

The next generation of analytics platforms will not win only by storing more data or scaling more clusters. They will win by helping companies deliver fast, intuitive, cost-effective data experiences directly inside products and workflows.

That is the larger meaning behind the rise of ClickHouse, StarTree, MotherDuck, and similar platforms. They are not just alternative databases. They are part of a broader reset in how we think about analytics itself.

And for teams building modern products, that reset is arriving quickly.

Post-modern ETL chat with the founder of Tobiko Data

The modern data stack is evolving—fast.

New tools like SQLMesh, SQLGlot, DuckDB, and ClickHouse are reshaping how teams build data pipelines, model transformations, and power analytics.

This shift points to what we’re calling the post-modern data stack: a more composable, developer-friendly approach to data infrastructure—where SQL is central, systems are stateful, and compute is interchangeable.

In a recent conversation with Toby Mao (Founder of Tobiko Data), we explored how this new stack is emerging—and why it matters for data teams today.

From Metrics Layers to Modern Data Transformation

Many data teams first encountered structure through metrics layers—tools designed to standardize definitions like MAU, revenue, or retention.

At Airbnb, this took shape in a system called Minerva.

But the real lesson wasn’t about metrics—it was about data transformation.

Without reliable transformations, metrics break down.

Early approaches relied heavily on YAML and templating, which created friction:

  • Limited flexibility
  • Poor performance
  • Lack of SQL awareness

The solution? Move toward SQL-native transformation frameworks.

This shift laid the groundwork for tools like SQLMesh.

SQLGlot: Cross-Database SQL for the Modern Data Stack

One of the biggest challenges in data engineering is SQL dialect fragmentation.

Teams often run queries across:

  • Snowflake
  • BigQuery
  • Spark
  • Trino / Presto
  • ClickHouse

But SQL isn’t truly portable across these systems.

SQLGlot solves this problem by enabling:

  • SQL parsing and validation
  • Translation across 20+ SQL dialects
  • Engine-agnostic query execution

This makes it possible to write SQL once and run it anywhere—a foundational capability for the post-modern data stack.

SQLMesh vs dbt: Stateless vs Stateful Data Pipelines

A common question for data teams:

How does SQLMesh compare to dbt?

The key difference comes down to architecture.

dbt (Stateless)

  • Executes SQL scripts as written
  • Relies on user-defined logic for incremental models
  • Limited awareness of past runs

SQLMesh (Stateful)

  • Tracks what data has already been processed
  • Understands time and dependencies
  • Enables declarative pipeline definitions

This has major implications for incremental data processing.

Instead of manually managing timestamps and backfills, SQLMesh allows you to define a time range—and handles the rest.

This approach is closer to modern infrastructure tools like Terraform than traditional scripting.

Incremental Data Pipelines Done Right

At scale, rebuilding entire datasets isn’t practical.

Multi-terabyte tables require incremental processing—but doing this correctly is hard.

In many tools, users must:

  • Write custom logic to track changes
  • Manage edge cases manually
  • Reprocess large amounts of data unnecessarily

SQLMesh introduces a better model:

  • Native understanding of time and state
  • Automatic backfills and forward fills
  • Precise control over data ranges

This leads to:

  • Faster pipelines
  • Lower compute costs
  • More reliable data workflows

DuckDB and the Shift to Local-First Data Development

Another key trend in the post-modern data stack is local-first development.

SQLMesh leverages DuckDB to enable:

  • Zero-config local environments
  • Fast unit testing for data transformations
  • Offline development and CI/CD

This means developers can:

  • Build pipelines locally
  • Test transformations instantly
  • Deploy to production warehouses later

Combined with SQLGlot, this creates a powerful workflow:

👉 Develop locally → Test quickly → Deploy anywhere

The Rise of Composable Data Architecture

The post-modern data stack is modular and composable.

Instead of relying on a single platform, teams combine best-in-class tools:

Core Components

  • Transformation: SQLMesh, dbt
  • Orchestration: Airflow, Dagster, Kestra
  • Databases: DuckDB, ClickHouse, Snowflake, BigQuery
  • Real-time analytics: Apache Pinot, StarTree
  • BI / dashboards: Rill Data

These tools work together—not in isolation.

This composability allows teams to:

  • Reduce vendor lock-in
  • Optimize for specific use cases
  • Scale more efficiently

Open Source vs Cloud: A New Balance

Open source remains a critical foundation of the modern data stack.

Projects like SQLMesh and SQLGlot demonstrate a strong commitment to:

  • Fully featured open source cores
  • Transparent development
  • Community-driven innovation

Cloud offerings (like Tobiko Cloud) build on top with:

  • Managed infrastructure
  • Observability and monitoring
  • Security and access control
  • Enterprise-grade reliability

This hybrid model is becoming the standard for data platforms.

Why the Post-Modern Data Stack Matters

This shift isn’t just about new tools—it’s about a new philosophy:

  • SQL as the universal language
  • Stateful systems over stateless scripts
  • Local development with cloud deployment
  • Composable architectures instead of monoliths
  • Open source as the foundation

For data teams, this means:

  • Faster iteration cycles
  • Lower infrastructure costs
  • More reliable pipelines
  • Better collaboration between engineers and analysts

How Rill Fits Into the Post-Modern Data Stack

At Rill, we’re building a real-time BI tool designed for this new world.

We work seamlessly with:

  • DuckDB
  • ClickHouse
  • Apache Pinot
  • Modern transformation frameworks

Our goal is simple:

👉 Make it easy to explore, visualize, and act on data—without complexity.

Final Thoughts

The modern data stack isn’t disappearing—it’s evolving.

The post-modern data stack represents the next phase:

  • More flexible
  • More developer-friendly
  • More aligned with how teams actually work

Tools like SQLMesh and SQLGlot are pushing the ecosystem forward.

And the best part?

We’re still early.

Ready for faster dashboards?

Try for free today.