
The Semantic Layer Problem Nobody Wants to Talk About

For years, the data industry has been chasing the same dream: a universal semantic layer that makes data accessible to everyone.
Dozens of startups have raised hundreds of millions on this promise. Every major vendor now claims to have one.
But as Data Talks on the Rocks host Michael Driscoll explores in this week's conversation with Lloyd Tabb, who founded Looker and now leads Malloy at Meta, most teams building semantic layers are fundamentally misunderstanding the problem.
And the misunderstanding starts with treating data people like anything other than developers.
The Developer Respect Problem
Watch how most semantic layer tools actually work. Drag-and-drop interfaces. Point-and-click configuration. YAML files that promise "no code required."
The implicit message is clear: data practitioners aren't real developers. They need training wheels.
Lloyd learned this lesson across eight iterations of the same tool, starting at LiveOps in the early 2000s, through multiple failed startups, then with Looker, and now with Malloy. Each time, the breakthrough came from respecting the craft.
"Yet another language without a debugger," Lloyd notes, describing failed attempts by others. The development experience matters. The tooling around the language matters. The cycle time for iteration matters.
Malloy runs as a VS Code extension for exactly this reason. Data developers deserve the same quality tooling that software engineers expect.
Why Rectangles Aren't Enough
Most semantic layer attempts treat data transformation as rectangle-in, rectangle-out. Group by, aggregate, maybe add a join. Stack some materialized views if things get complex.
We learned this pattern in kindergarten, as Lloyd puts it. Sort the coins into piles. Count them. Sum their values.
But business data doesn't actually work this way.
The hard part arrives when you start joining. When you need a dimension from one table and a measure from three tables away, your carefully constructed rectangle suddenly produces wildly wrong numbers: each one-to-many join repeats rows on the "one" side, so join fan-out quietly inflates your sums and counts.
Looker solved one version of this problem through a key innovation: join relationships don't affect aggregate calculations in Looker's semantic model. You can pick any dimension, any measure, from anywhere in the graph, and it produces the correct result.
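Malloy, Lloyd's successor language, carries the same property forward. Here is a minimal sketch (the tables and fields are hypothetical, not from the conversation): because `revenue` is defined against `orders`, grouping by a dimension from the joined `order_items` table still counts each order total exactly once.

```malloy
source: order_items is duckdb.table('order_items.parquet')

source: orders is duckdb.table('orders.parquet') extend {
  primary_key: id
  join_many: order_items on order_items.order_id = id
  measure: order_count is count()
  -- Defined at the orders level: symmetric aggregates sum order_total
  -- once per order, even after the one-to-many join. A naive SQL SUM
  -- over the joined rows would count it once per matching item.
  measure: revenue is order_total.sum()
}

run: orders -> {
  group_by: order_items.product_id  -- dimension from the joined table
  aggregate:
    revenue                         -- measure from the root table
    item_count is order_items.count()
}
```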
This gave analysts dimensional freedom. But Looker still ended with rectangles. Visual dashboards. URLs you could share. The semantic model served the UI, which meant it served business users, which meant data engineers and data scientists largely ignored it.
They kept transforming data with SQL. The semantic layer became something for "business people."
Data Lives in Graphs, Not Tables
The shift that led to Malloy came from recognizing what SQL actually obscures.
Most modern data arrives as JSON. Nested structures. Arrays of objects with their own arrays. Event streams where relationships are implicit in the structure itself.
Every ETL tool immediately flattens this into rectangles. Dozens of normalized tables. Carefully constructed foreign keys. The graph gets shredded, then painstakingly reconstructed through JOIN statements.
Malloy refuses to flatten. It preserves the graph structure through the entire pipeline. The semantic model understands nesting. Queries can return nested results. The output matches how the data actually exists.
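As a minimal sketch of what that looks like (assuming a local `flights.parquet` with `carrier` and `destination` columns), a single Malloy query can return one row per carrier, each carrying its own nested table of top destinations:

```malloy
source: flights is duckdb.table('flights.parquet') extend {
  measure: flight_count is count()
}

run: flights -> {
  group_by: carrier
  aggregate: flight_count
  -- Each carrier row carries its own small table of destinations,
  -- mirroring the nested shape the data had to begin with
  nest: top_destinations is {
    group_by: destination
    aggregate: flight_count
    limit: 3
  }
}
```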
BigQuery pioneered this technically. They ingested Protocol Buffers directly into a columnar store and queried them without joins. It took years to make the SQL dialect work properly, but the core insight was correct: stop destroying structure just to recreate it later.
The Infrastructure Shift Nobody Expected
When Redshift launched, Lloyd remembers the exact thought: "We won." Looker bet on SQL when everyone else was writing MapReduce jobs on Hadoop. When Redshift appeared and ran a hundred times faster than Postgres, the entire market validated their architecture.
Now Lloyd sees a similar inflection point, but inverted.
"My laptop with DuckDB running on it is faster than Redshift was then."
You don't need a cluster anymore for most analytical workloads. You can embed a query engine directly into your application. You can link DuckDB into Malloy running in VS Code and get sub-second responses on datasets that would have required serious infrastructure five years ago.
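Concretely, the `duckdb` connection built into the Malloy tooling runs in-process, so a model like this hypothetical one executes entirely on the laptop, no warehouse in sight:

```malloy
-- No cluster required: the built-in duckdb connection evaluates
-- this in-process against a local file
source: events is duckdb.table('events.parquet') extend {
  measure: event_count is count()
}

run: events -> {
  group_by: event_type
  aggregate: event_count
}
```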
This doesn't eliminate cloud warehouses. Privacy and access control still matter. Some data absolutely needs to live in a tightly secured warehouse with sophisticated governance.
But the default assumption has shifted. Most data doesn't need that level of infrastructure anymore.
The AI Trap Everyone's Walking Into
Lloyd's perspective on AI and semantic layers cuts against the current hype cycle.
Simple translation works fine. Getting a model to answer "which carrier has the most flights to JFK?" is trivial. An AI can absolutely handle that level of question-answering, especially with a well-structured semantic model providing named dimensions and measures.
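With a model that already names `carrier`, `destination`, and `flight_count` (reusing the hypothetical `flights` source sketched above), the question maps almost word-for-word onto a query:

```malloy
-- "Which carrier has the most flights to JFK?"
run: flights -> {
  where: destination = 'JFK'
  group_by: carrier
  aggregate: flight_count
  order_by: flight_count desc
  limit: 1
}
```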
But then people leap to the fantasy: "I have 10,000 tables in my warehouse, AI, do something useful with them."
Lloyd's response: "Good luck with that."
The reality of enterprise data warehouses demolishes these dreams immediately. Half the tables have overlapping columns. Nobody remembers what most of them contain. The documentation is either missing or misleading.
Structure first. Intelligence second. The teams building AI data tools without requiring semantic modeling are setting themselves up for spectacular failures. You can't reason over chaos.
Why SQL Probably Won't Die (But Maybe It Should)
Lloyd hears the same objection constantly: nobody will give up SQL. Too entrenched. Too familiar. Too much existing code.
Tools like SQLGlot are betting on this permanence, translating between SQL dialects and accepting that SQL itself is the immutable foundation.
Malloy makes the opposite bet.
Every SQL dialect handles arrays differently. Unnesting looks completely different across engines. Window functions work the same for basic cases, then diverge wildly for anything complex.
Malloy abstracts this chaos. Write once, run anywhere. The same model executes identically on Snowflake, Redshift, BigQuery, DuckDB, and Postgres.
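Array handling shows why. In the hypothetical model below, `line_items` is a repeated record stored inside each order row; Malloy treats it as an implicit join, and the compiler emits whatever unnesting syntax the target engine requires while the query text stays unchanged:

```malloy
-- Hypothetical file: each order row holds a repeated line_items
-- record. Malloy unnests it implicitly; the generated SQL differs
-- per engine, the Malloy text does not.
source: orders_nested is duckdb.table('orders_nested.parquet') extend {
  measure: order_count is count()
}

run: orders_nested -> {
  group_by: line_items.sku  -- steps into the array, no UNNEST written
  aggregate:
    order_count
    units is line_items.quantity.sum()
}
```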
The adoption case is harder. SQL has forty years of momentum. But Lloyd has played this game before. Everyone said visual BI tools would replace programmatic analytics. Looker bet that data developers wanted to write code, not click buttons.
Now the bet is that a better language, with better tooling, solving real problems that SQL makes unnecessarily hard, will eventually win.
What Actually Makes a Semantic Layer Work
Lloyd is on version 4.0 of Malloy because they've thrown the language away three times.
This is the part nobody talks about when they announce their semantic layer product six months after founding the company.
Getting the semantics right is brutally difficult. Making it learnable is harder. Building tooling that makes developers productive requires years of iteration.
A professor at the University of Washington now teaches business students Malloy instead of SQL, and the class is succeeding. This matters more to Lloyd than anything else: "Is it learnable? Can somebody be productive with this tool?"
Most semantic layers fail this test silently. The documentation looks good. The demos are impressive. Then teams try to actually build with them and hit walls everywhere.
Lloyd's answer is to treat semantic layer development like compiler development. Rigorous testing. Consistent semantics. Debuggers that actually work. Fast iteration cycles.
The tools that succeed aren't the ones with the slickest marketing. They're the ones built by people who genuinely understand the craft and respect the practitioners.
If you want to understand where semantic layers are actually heading, beyond the vendor pitches and conference keynotes, this conversation is essential.
Watch the full episode on Data Talks on the Rocks.
Ready for faster dashboards?
Try for free today.
