5 Founders on the biggest unsolved problems of the Modern Data Stack

Author: Marianne Jarvis
Date: July 14, 2021

In a previous blog post, we shared perspectives on "What is the Modern Data Stack?" from five founders of data companies. Another hot topic from the same event was the biggest unsolved problems in the modern data stack, a conversation that quickly turned to the ROI of data: can you really measure the value of a decision based on data? Today, we share the founders' varying responses.

The panelists included:

  • Census - Founder, CEO Boris Jabes. Census is a data automation platform that provides an easy way to sync your data warehouse with the apps you use.
  • Bigeye - Co-founder, CTO Egor Gryaznov. Bigeye is the leading data quality engineering platform designed to help data teams build trust in data.
  • Stemma - Co-founder, CEO Mark Grover. Stemma is a fully managed data catalog with automated metadata, a personalized experience, and enterprise management, powered by Amundsen.
  • Mode - Co-founder, Chief Analytics Officer Benn Stancil. Mode is an analytics platform designed to help data analysts and data scientists analyze, visualize, and share data.
  • Rill Data - Co-founder, CEO Michael E. Driscoll. Rill is the only fully managed cloud service for Apache Druid that enables real-time, operational analytics and insights into your business. 
  • base case capital - Founder, Managing Partner Alana Anderson. base case capital is an early-stage venture capital firm focused on the next generation of enterprise software.

What are the largest unsolved problems in the modern data stack?

Boris Jabes (Census)

The inevitable main problem is that we must find a way to think fast and think slow. That seems to be the eternal thing everyone wants to talk about now. How do you do everything we've talked about at any arbitrary latency? And can these two worlds ever meet? I think I spend most of my time telling people “you're overthinking the problem.” Maybe you should use the data before you worry about making it crazy fast, because most people are not that sophisticated. But I do think it's definitely something on most people's minds.

Mike Driscoll (Rill Data)

If you remember the first time you used Google Maps after using MapQuest or other tools, it was a mind-blowing moment. You're like “Oh my gosh! I can see the world outside of this 200 by 200 pixel square with a bunch of reloads.” I think we're still waiting for that Google Maps moment for Big Data, where you can have a dialogue, a fluid connection with data that lets you move through it interactively. We're living in this weird world where we're solving problems, but we're just not solving them fast enough.

The other analogy I would make: imagine if Zoom transmitted video and insisted on full fidelity for every frame sent over the internet. Of course everything would be super laggy. In the world of data, we're crunching so many rows, events, and values that nobody actually looks at. No one is really looking at that value out in the long tail of a high-cardinality column that was actually a mistake someone entered. So I think techniques like codecs for data, much better compression, and sketches and approximate statistics are what will let us get to that Google Maps moment for Big Data.
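To make the idea of sketches and approximate statistics a little more concrete, here is a minimal, hypothetical Python sketch of a K-minimum-values (KMV) distinct-count sketch. It is not Rill's or Druid's implementation. Instead of remembering every value, it hashes each value into [0, 1) and keeps only the k smallest hashes, then estimates the number of distinct values from the k-th smallest one. The class name `KMVSketch`, the SHA-1 hashing choice, and the default k=256 are illustrative assumptions.

```python
import hashlib
import heapq


def hash_to_unit(value: str) -> float:
    """Map a value to a pseudo-uniform float in [0, 1) using the first 8 bytes of SHA-1."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Hypothetical K-minimum-values sketch for approximate distinct counts.

    Keeps only the k smallest distinct hash values seen so far, so memory
    stays bounded no matter how many rows are added.
    """

    def __init__(self, k: int = 256) -> None:
        self.k = k
        self._heap: list[float] = []  # negated hashes: a max-heap over the k smallest hashes
        self._members: set[float] = set()  # the same hashes, for O(1) duplicate checks

    def add(self, value: str) -> None:
        h = hash_to_unit(value)
        if h in self._members:
            return  # duplicate value whose hash is already kept
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: evict the largest kept hash.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes seen: count is exact
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    sketch = KMVSketch(k=256)
    for i in range(500_000):
        sketch.add(f"user-{i % 100_000}")  # 500k rows, 100k distinct users
    print(round(sketch.estimate()))  # an estimate near 100,000, from only 256 stored hashes
```

Memory stays bounded at k hashes however many rows stream through, and the relative error shrinks roughly as 1/sqrt(k): you give up exact fidelity on every row in exchange for speed, much like the video codec analogy. Apache DataSketches' Theta sketch, which Druid supports through an extension, is a production-grade relative of this idea.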

Egor Gryaznov (Bigeye)

We talked about the horizontalization of the data stack, but I think we've swung the pendulum a little too far in the other direction. We are now, and I’m guilty of this, in a mindset where every problem gets its own tool. So what data teams have to go and do is say, well, we are using Mode for visualization, Stemma for data cataloging, Druid for the database, and then something for our analytics, and so on. You get it. Every team goes to the grocery store and says “here’s my credit card, now give me this SaaS product, and this SaaS product, and this SaaS product too,” and then all of those products still have to work together. And the more products that come out, the bigger the spider web grows.

The problem with this is that teams are now building their own solutions just to make the products work together. What Fivetran is doing is amazing. They say, “look, we're going to figure this out for you. We’re going to be that middle piece, and anything you want on the left and anything you want on the right, don't worry about it, we'll figure it out somehow.” It's black magic: a box that you don't have to worry about. I think that is going to start happening at all of those interconnects. It doesn't matter what your database is, what your catalog is, or what your querying tool is; there are going to be interconnects between all of them. Either every tool is going to have to build them, or someone is going to come in and become the Fivetran of every single interconnect. I think that's what is really missing.

Benn Stancil (Mode)

I feel obligated to say that what is missing has something to do with metrics and what Transform and Supergrain are building. But I think in a lot of cases we're making this too complicated. The thing that's actually missing is that people build a bunch of stuff with data and don't really get any value out of it. They build a bunch of dashboards that nobody looks at. Then they do a bunch of analysis, which is the thing that actually helps people make decisions, but they forget it as soon as the decision is made. To me, what's missing is this: we have all of these governance tools for dashboards, but the real value of our data is in what analysts do with it, and we don't keep track of that at all. We don't learn from it beyond the individual decision it informed. I don't know what the solution is, though it's very adjacent to what Mode does.

We can talk about all of the technology that is missing, but fundamentally people invest a lot of money in data and probably don't get that much value out of the other end. They expect to get all this stuff and make brilliant decisions, but do we make that many better decisions now than we did 20 years ago? Probably not. We can keep track of what's happening better, but I don't know that we're actually making many better decisions, and given the promise of all this, that feels like the real piece we're missing.

Egor Gryaznov (Bigeye)

I 100% agree that data is a cost center at every single business. I think the hardest part is measuring the value of the decisions you're making based on your data. If I’m an executive and I look at a dashboard, and that dashboard tells me to change something about my business, what was the ROI? I don't know. But as the executive, I get all the kudos for that, and all of the work that the analysts, data scientists, and data engineers did to get that data to me goes unrecognized. So I think we are making better decisions now than we were 20 years ago. I just think the respect, and the credit for the ROI, is pointed in the wrong direction, and if we can somehow start measuring that, it would really turn things around.

Benn Stancil (Mode)

One thing I want to point out is that execs don't look at dashboards and make decisions. Execs look at dashboards and ask someone else to help them make the decision. I think that is actually an important difference. There is a lot of work that goes into any decision process. The dashboard is kind of the starting point of what we do now, and the follow-up work is what gets lost in the ether. That's one of the things we're really missing, and one of the things we really should be capturing better. It's probably one reason the ROI is so hard to figure out: the dashboard produced a question, not the decision about what to do.

Egor Gryaznov (Bigeye)

The decisions and ROI I'm thinking about are at the macro level, where you are a marketing executive and your team presents you with a dashboard. The decision that needs to be made is: do I spend more money on Facebook ads or on Instagram ads? If I can look at a dashboard, trust my data, and say “yes, I should really just stop spending money on Facebook because it's costing me $250k a week and I'm not getting anything out of it,” then that is serious ROI for the business. There’s a delta, but we need to come full circle and get better. It really is about those macro decisions being made in business.

Watch the full video of the panel, and be on the lookout for additional blog posts and videos from the event, including authentic community building and more. In the meantime, we invite you to try Druid on Rill to see if it fits your needs for operational analytics.
