We recently hosted an in-person event in San Francisco to discuss the Modern Data Stack—ETL pipelines, quality monitoring, operational analytics, fast OLAP stores, and headless BI.
The panel included:
For many, it was their first event post-Covid. The ability to gather in person once again with like-minded individuals made for a truly special night. This is the first post in our blog series where we will share the perspective of five founders on trending topics in the data ecosystem.
The easy way to define it is cloud-based SaaS tools and the transition from legacy sales-oriented companies like Oracle and SAP towards these much more modern ways of software. But the longer I think about it, what it really is becoming is something that is much more horizontally integrated vs vertically integrated.
We used to have a bunch of data tools that did a bunch of things. You had BI tools that were also your data stores, that would also do transformation, and data ingestion too. If you wanted to do product analytics, that would live in Google Analytics, and it was end-to-end tracking all the way through your dashboards and reporting. If you wanted to do sales analytics, it lived in Salesforce, and you were logging in constantly to see the source of truth for data.
To me, the Modern Data Stack really is turning that on its side so that you have one unified layer for each part of that process: one layer for ingestion, one layer for your database, one layer for transformation, one layer for monitoring, one layer for operational analytics, etc. And all of those things are offered across all different domains of business. Ultimately that's what the Modern Data Stack is: these individual layers that can be integrated horizontally, as opposed to each tool solving the problem vertically.
Data itself is turning into a product that serves all these various use cases, specifically because it is horizontal. So now the only way for a team to reach all the various stakeholders, if it's going to be horizontal, is to think and deliver like a product team. So to me, the Modern Data Stack is the “software-ification” of data organizations, turning what they do into an agile, business-like operational model.
That’s why you see the emergence of not just tools for every layer in the stack, but also all of the tools and practices around DevOps starting to appear in the world of data. You need to know if your stats or your metrics are correct, right? So you need monitoring. You need to know what metrics you have, so you have a catalog. And you need to deploy those. A lot of the Modern Data Stack is about that transition in how the data team delivers for the rest of the company.
I think the biggest thing that I see in the last four or five years is that data is just becoming unmanageable. You have smaller and smaller teams, working with more and more disparate data sets, where no one actually knows the state of any single data set at any point in time. Sure, maybe I'm the expert on ETL, but now we have Fivetran, and now we have dbt. Also, everyone's writing their own pipelines, and then everyone is ingesting their own data. All of a sudden, you have a team of maybe five or six data engineers and a couple of data scientists who are managing 500 or 600 datasets. Who actually knows what's in any of those? Probably nobody.
And so we're past the age of tribal knowledge and of asking your neighbor, “Hey, what's in this table? How do I pull the latest trip count numbers?” And we're in an age where you need tooling to be able to manage data at that scale. Whether that's cataloging, operational analytics, monitoring, access at high speeds, or visualization, the tooling needs to support the scale of the data and empower the smaller teams to actually manage and work with that amount of very, very disparate data.
I think the two salient changes with the Modern Data Stack are:
We've started to create a set of organs, like a large-scale organism that has a heart, a liver, and a pancreas. And we all understand there's separation between these things. I think organizations have figured out that we have data engineering, we have data quality, etc. Instead of data being one-size-fits-all, we now have data teams that have broken themselves up into little pieces.
And the Modern Data Stack in some ways is the productization of all of those pieces. Naturally, big companies like Lyft, Uber, and Pinterest faced these problems first and built these pieces, and now many of the companies here are the “productization” of what we saw internally. So there's certainly more to be built, but when I think of the Modern Data Stack, it's a new set of organs for a new planet of cloud and a new scale of data.
For me, in organizations in the past, the crown jewels, meaning the ability to do certain things with data, whether that's ingestion, storage, processing, or generating reports, were held by a select pool of individuals, right? To me, the Modern Data Stack is anything that democratizes any one of those components.
That could be ingestion of data, so the marketing team can ingest third-party data into the data warehouse. It could be storage like Snowflake, which stores and processes this data. It could even be the consumption level, asking questions like, "How do I use this data?" So the Modern Data Stack is anything that really democratizes this ingestion, processing, storage, and flow of data.
Click here to watch the video, and be on the lookout for additional blogs and videos from the event including the real ROI on data, authentic community building, and more. In the meantime, we invite you to try Druid on Rill to see if it fits your needs for operational analytics.