Data Talks on the Rocks - SF

Michael Driscoll
Author
December 5, 2023
Date

On November 15, 2023, we held a lively fireside chat at Northern Ducks in SF where Guillermo Rauch (founder & CEO of Vercel) and Ryan Blue (founder & CEO of Tabular) talked about serverless technologies and next-gen data infrastructure.

Below is the video and full transcript of the event where you will hear about:

  • Philosophies of no-code and low-code vs codeful architectures and frameworks
  • Positions on open vs closed source models
  • Next-generation infrastructure alternatives for data teams running at scale

Michael Driscoll

A lot of founders talk about their early days and getting that hockey stick-like growth. This was not like how Airbnb’s founder talks about how it was an overnight success that took four years. What was the inflection point for Vercel? When did you start to see things really working, and what would you attribute that change to?

Guillermo Rauch 00:30

Nowadays we call this magical thing that became the inflection point of Vercel “framework-defined infrastructure”. It’s interesting because our CTO who is here today, Malte, and I kind of reverse-engineered our success in many ways. We asked: how did we get here? How are we growing so fast? What things are working well? How do we explain the combination of NextJS, the framework we created, the 35+ frameworks that we support, and our infrastructure? How are they cooperating? What we figured out, in a nutshell, is the idea of giving the customer the boundaries of your software, in this case NextJS, the frontend framework that we created, and the other frameworks that we support, and then creating almost a cloud compiler that turns the application source into an intermediate representation that then becomes the infrastructure. And this was not just so that we could make serverless easy and fix a lot of problems with serverless. When you think about serverless from our view of the world, it was always that perpetual dream that everyone wanted to get to. But there were a lot of obvious obstacles. For example, one was local development, trying to do local development against the cloud primitives of serverless. And it basically took me 45 minutes of actually downloading the Docker containers to spin up some simulation of the cloud. And then I was like, “Holy crap, I'm glad we exist.” It was like that moment that founders sometimes describe: I'm glad I'm here.
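
As a rough illustration of the "cloud compiler" idea described above, the sketch below turns a list of application routes into an intermediate list of infrastructure resources. It is a toy under stated assumptions, not Vercel's implementation: the `Route` and `Resource` types, the route kinds, and the resource names (`cdn_asset`, `serverless_function`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path: str
    kind: str  # hypothetical categories: "static", "dynamic", or "api"


@dataclass(frozen=True)
class Resource:
    type: str
    target: str


# Hypothetical mapping from what the framework knows about a route
# to the infrastructure primitive that could serve it.
KIND_TO_RESOURCE = {
    "static": "cdn_asset",
    "dynamic": "serverless_function",
    "api": "serverless_function",
}


def compile_to_infrastructure(routes):
    """Turn framework-level routes into an intermediate list of resources."""
    plan = []
    for route in routes:
        if route.kind not in KIND_TO_RESOURCE:
            raise ValueError(f"unknown route kind: {route.kind}")
        plan.append(Resource(KIND_TO_RESOURCE[route.kind], route.path))
    return plan


app_routes = [
    Route("/", "static"),
    Route("/products/[id]", "dynamic"),
    Route("/api/checkout", "api"),
]
plan = compile_to_infrastructure(app_routes)
for resource in plan:
    print(f"{resource.type:20} -> {resource.target}")
```

The point of the sketch is only that the developer writes routes, and the infrastructure plan is derived from them rather than configured by hand.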

Michael Driscoll 02:17

There's that meme that says “it works on my laptop”.

Guillermo Rauch 02:20

But it’s the opposite with serverless. It works in the cloud.

Michael Driscoll 02:25

And Docker is that we shipped your laptop to us.

Guillermo Rauch 02:30

We tried that. We tried the whole “simulate the entirety of the operating system that's running in the cloud and everything about the environment” approach. There were all these developer experience issues, but then there was also the issue of how do you map the application that product engineering is ultimately working on to globally distributed infrastructure? And that really was a very challenging problem. The solution that we provided was twofold. This is where I always encourage entrepreneurs to not just think about developer experience because that can be limiting. We solve for the developer experience and we also solve for something very enticing, which is to actually make the application faster. So in a nutshell, and honestly the reality of the real world is more complex than this, what it allowed us to do was go to the VP, the CTO, the person that just wants a faster website that doesn't crash on Black Friday, and tell them that's the value of Vercel. And simultaneously I can go to the developer and say, and your life just became a lot easier. I talked to one such developer at a very large corporation today about this, and he was almost physically ill when describing the current infra, the homegrown frameworks and the homegrown infrastructure. So having that two-pronged approach to the market was the unlocker for us.

Michael Driscoll 03:57

So you and I were talking just before we came in here about the philosophies of no-code and low-code versus codeful architectures and frameworks. How has Vercel thought about that? What is your perspective on this today, looking at it through the lens of 2023?

Guillermo Rauch 04:19

It is a great question, which was prompted by an observation: when I went to the RillData.com landing page, I loved that it said all my BI dashboards are the output of code, and my framework-defined-infrastructure brain immediately kicked in: oh, I can use Rill as a framework, I can push it up to the cloud, it defines some really cool views of my data, which is already connected to all my data integrations.

Michael Driscoll 04:44

We're going to put you into product marketing, right?

Guillermo Rauch 04:45

Put me in.

Guillermo Rauch 04:50

Obviously I like it because it resonates with our story. So local dev is really important, but even more important is what I call the unreasonable effectiveness of hyperlinks. For any atomic unit of work that you do on a day-to-day basis, a fundamental principle of mine is that you should have a hyperlink that you can share. So if I make a fix to a front end, I should have an ephemeral preview URL that I can share. A lot of companies whose success we might take for granted are basically glorified hyperlink collaboration systems on top of more boring infrastructure. GitHub: here's a PR hyperlink on top of Git storage and hosting. Figma for design files. Vercel for front-end deployments, etc. I am glad to hear that someone is doing this for BI as well. So on low code and no code: whenever I would explain the value of Vercel in terms of increasing your iteration velocity and making your operations really lean, folks would almost immediately react with “so it's a low-code and no-code platform”. And for a founder that’s a battle, when someone kind of gets you, but then they stab you with a comment like that. And this is something I credit Malte with: he would respond to this question with “No, no, no, we love code. We're the opposite”. Just make it really clear that we love code, but we're almost taking away the bad parts. I respect the folks that are really trying to make no code and low code work because it's almost like the OG serverless. It's almost like an opinionated runtime that's running in the cloud, for which they define sharp corners and boundaries that they can then reason about, and that gives them the scale that I wanted as well. So a no-code or low-code system almost gets that for free because it's so limited.
Here's where all the amazing evolutions that have happened in the world of virtualized compute have helped Vercel: things like Firecracker, where I can now expand the boundary of my runtime, I can now give you a much bigger playing field for the Turing-complete infinity of logic. Right? So bringing it back, I think the fact that we make code get some of those properties of no-code and low-code systems means the entry point is very low. Literally, my wife today deployed her website in Brazil and she's never written a line of code. I can then take the same system and sell it to folks who on Black Friday are going to do tens of billions of requests in a week. So that's the other property: if you have not thought deeply about that serverless scalability system, you may end up too far on one side of the spectrum, which reshapes your growth strategy. If we want it to be PLG, we also need that property where the 0 to 1 is very, very fast and cost-efficient.

Michael Driscoll 08:08

I've heard that one of the reasons why The Simpsons is a popular show is that their humor works on many levels, and clearly great products work for many personas. What I see in the evolution of Vercel is that you have made it more accessible to folks like your wife. But clearly, there's still a billboard on the 101 that has a command that is not designed. So let's talk a little bit here about AI. One of the folks I had at the Data Talks on the Rocks event in New York City was Edo Liberty, founder of Pinecone. And you had mentioned that you're doing work with vector databases. We've got Simon here from TurboPuffer who made an appearance, working on a serverless vector database. We think of Vercel as a place where people can build, create and publish applications, but ultimately these applications need to talk to infrastructure, and AI of course comes to mind. We need to be able to pull these vectors and embeddings back. How is Vercel working in the AI context with stateful infrastructure like Pinecone, and more broadly other stateful infrastructure out there?

Guillermo Rauch 09:40

When I explain our contributions to the AI landscape, they are on two levels. One is that we want to continue to lower the barrier to entry by creating a universal platform. And I think a lot of great companies have been on a journey similar to ours, where Bill Gates started with a BASIC compiler and then he created Windows, which we can have reservations about, but which reaches a huge number of people. Our contribution there is v0. It's this text-to-UI generation program. You enter some English text prompts and we give you working React code that you can throw into your project to make a UI very, very simply. This is one of those technologies where I think, with AI, we will broaden access to creation. I hear from a lot of folks that do backend that they still cannot center a div in 2023. So in this era, you can say, “Hey, create a rectangle that is centered in the middle of the screen.” So that's the whole lower-the-barrier-to-entry thing, which is huge for us. It also makes the existing engineers, even the pros, more efficient. The other level that you're talking about is providing the infrastructure to create AI-native products. And I'm really big on this because I think that we can now speak of the cloud in two chapters: Cloud 1.0 and Cloud 2.0. The stack of Cloud 1.0 is the things that we know really well now: object storage, RDS, SQS, SNS, etc. Every cloud vendor has created a version of these primitives, and they've become the standard building blocks. Going back to the compiler idea, it's like the instruction set of the cloud, the things that you output to use this cloud computer. Now I think we're seeing this Cloud 2.0, which introduces new instructions, new primitives. We have vector databases as one such primitive. I also think it elevates the importance of certain primitives that are moving outside of Cloud 1.0. For example, on the Vercel side we've gotten really good at streaming user interfaces.
We can server render and then we can stream as the data and the rendering of that data become available, to create as smooth a web experience as possible. Turns out it’s very necessary in the world of AI, when you have these machines that are thinking and they take their time to output their thoughts. So to create a very slick UI, you need a web infrastructure such as Vercel, but you also have the introduction of these things that previously didn’t exist or weren't as widely distributed. Those are the vector database and, of course, the LLM. And now we're seeing, I think, even higher-level services come to market, like the Assistants API, and even services that aggregate some of those primitives and make them even more attractive to developers. And that's the area where Vercel is very interested in investing. Can I give you a very simple API to leverage AI without having to understand too deeply all the nuances of how all the systems connect together, which are very difficult engineering challenges, right? You have an unreliable LLM, and you have to try to capture that craziness as it's going on. You may have to retry it. The services themselves are unreliable, so you might have to do failover. So it's about simplifying that whole landscape of Cloud 2.0 and keeping it really easy for developers to create these AI products.
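
To make the "retry it" and "do failover" points concrete, here is a minimal sketch of calling an unreliable model provider with retries and then falling back to a second provider. Everything in it is an assumption for illustration: `ProviderError`, `call_with_retry_and_failover`, and the stand-in providers are hypothetical and do not correspond to any real SDK.

```python
import time


class ProviderError(Exception):
    """Raised when a (hypothetical) model provider call fails."""


def call_with_retry_and_failover(prompt, providers, max_attempts=3, base_delay=0.0):
    """Try each provider in order, retrying each one with exponential backoff."""
    errors = []
    for name, call in providers:
        for attempt in range(max_attempts):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append((name, attempt, str(exc)))
                # Back off before the next attempt (0 seconds by default here).
                time.sleep(base_delay * (2 ** attempt))
    raise ProviderError(f"all providers failed: {errors}")


def make_flaky(fail_times):
    """Build a stand-in provider that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def call(prompt):
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise ProviderError("temporary failure")
        return f"echo: {prompt}"

    return call


def always_down(prompt):
    """Stand-in provider that never succeeds."""
    raise ProviderError("service unavailable")


providers = [("primary", always_down), ("backup", make_flaky(1))]
used, answer = call_with_retry_and_failover("hello", providers)
print(used, answer)
```

In a real system the delay would be non-zero and the retryable error types would be narrower; the sketch only shows the control flow of retrying first and failing over second.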

Michael Driscoll 13:32

We've seen here in Silicon Valley, investors lining up on both sides of this debate, open versus closed, regulated versus unregulated. Where does Guillermo stand on these questions of open versus closed models, open source foundation models? And secondly, whether or not we should regulate it?

Guillermo Rauch 14:10

So on the first one, it’s a little bit of both because right now it's obvious that some of the proprietary systems have strong reasoning advantages that are undeniable. And you might want to build an application that needs very general, very powerful reasoning and conversational skills. However, that's not the case for every single application. What I have seen with our developer audience is that when they don't need the pinnacle of reasoning performance, they go open source all the way. Why? It makes the most business sense. It makes the most sense from a community and ecosystem perspective. We're even talking to chip manufacturers that are creating new kinds of chip accelerators for LLMs. And what are they testing against? They're testing whatever is available. And what is available? Llama and Mistral. So I think the tailwinds of how fast open source can evolve and the rate at which it can get better cannot be underestimated. And I think that's what over time makes me extremely bullish about open source. On regulation, it's very clear to me that AI, when misused, can present dangers to society, very much like every other disruptive technology. But I’m very much pro let's get this technology in the hands of everyone, because the good guys will vastly outnumber the bad guys. I talked to a founder yesterday who was building an AI firewall inspired by the concept that when you think of Vercel as a multi-tier system, the ingress layer is layered: first of all you need TCP termination, TLS, DDoS mitigation, then the web application firewall (WAF), then routing. So you have all these layers. I think what we're going to see is folks adding layers that you can sort of slot into these LLMs to make them more predictable, to make them safer, to detect potential attacks from abusers.
It’s really fascinating to keep track of this because just like when we had the LAMP stack, people started figuring out, my God, little Bobby Tables again, I can sneak in a SQL injection. We lived and we learned, I think hopefully most of us lived and learned, and we identified all these categories of attacks in the Cloud 1.0 world: XSS, all kinds of cookie issues and third-party cookie issues, and SQL injections. I think we're now finding out which ones of these apply to AI models. We are the ones that have to have all of the tools in our hands to create these mitigations, to create a framework, to create the security solutions. That's what I'm excited about.
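
For readers who have not met the "little Bobby Tables" reference, the sketch below shows the standard mitigation that came out of that era: passing user input as a bound parameter instead of splicing it into the SQL string. It uses Python's built-in `sqlite3` module with an in-memory database; the table and the input value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students (name) VALUES ('Alice')")

# Input in the style of the "Bobby Tables" joke.
malicious = "Robert'); DROP TABLE students;--"

# Unsafe pattern, built here only to show the problem and never executed:
# the input would become part of the SQL text itself.
unsafe_sql = f"INSERT INTO students (name) VALUES ('{malicious}')"

# Safe pattern: the "?" placeholder keeps the input as data, not SQL.
conn.execute("INSERT INTO students (name) VALUES (?)", (malicious,))

names = [row[0] for row in conn.execute("SELECT name FROM students ORDER BY rowid")]
print(names)
```

The table survives and the hostile string is stored as an ordinary name. Which of these mitigations carry over to AI models is, as the speaker says, still being worked out.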

Michael Driscoll 18:10

You follow a lot of new technologies, new frameworks, new tools, and a lot of people here are working on new frameworks, new tools. What are you most excited about right now? Not to play favorites, but what comes top of mind for a few tools and technologies that you're paying attention to and checking out?

Guillermo Rauch 18:47 

One controversial thought that I have is that folks might have overindexed a little bit on foundation models, and not enough on what we can do with them. So I actually encourage the experimentation of GPT wrapping, and I'm always blown away by how many successful businesses are being created in very specific verticals. So if I have to solve problems for folks editing video, editing podcasts, asking financial questions, support, even at Vercel we haven't found a home run solution for searching documentation. Someone is asking today the tax rate community, what are the solutions that are actually working so they can easily deploy the ability to search my documentation, and I think folks are underestimating that. The example that I like is self-driving cars: from the moment that we conceived that a self-driving car leveraging deep learning and neural nets was possible, to Waymo, so from the moment that we got excited to the 95% where Waymo can drive in Arizona, that was pretty quick. Going from the controlled environments to actually getting it to work here in SF was another matter. I remember we took it to North Beach and we hit every single possible edge case. This is a big city and everything is curated and normal around these blocks, but we had every possible adversarial situation happen. So that last 5% is what makes me very bullish that the general-purpose “here's an app that does absolutely everything” is not the way to go. Entrepreneurs are going to tackle more specific verticals and they're going to have a lot of challenges solving that last 5%. They're going to find security problems, edge cases, custom UI on top of the streaming tag so that things are not just text, because that's not how we use computers. But that last 5% is where a lot of the value occurs and it’s really freaking hard. That's what creates this complete product that you can actually ship to large numbers of people. And there are not a lot of those, and the ones that are there are extremely successful businesses.

Michael Driscoll 21:21 

Thank you so much for sharing your thoughts, wisdom, experiences and predictions with us. I'm going to do this talk show style and have you shift to the left of the couch. Now let’s bring up our friend Ryan Blue, founder of Tabular and creator of Iceberg. Ryan, first of all, how did I convince you to come here tonight? I think we sort of crossed paths back in your Netflix days. I was aware of your team working on some of the most serverful and scalable architecture on the planet; many of us enjoy the fruits of your labor every night when we watch movies on our iPads. I would like to start with, for those who don't know, tell us a bit about Apache Iceberg, Tabular, and the vision for the stateful, durable storage complement to everything that's been built at Vercel.

Ryan Blue 22:50

We started with the open-source project, which is Apache Iceberg, and essentially we wanted to fix the data lake world. We had all of these problems that were really stemming from the fact that none of us, and I say us because I was at Cloudera at the time, knew what we were doing in the database world when we started building these distributed systems that would eventually become real databases or be thought of as real databases. And we learned a lot and we wanted to essentially fix that and bring the guarantees of the database back to the data lake space. It's funny because as you guys were talking about AI and everything, I was like, I'm trying to get people to 1992 forever. So we basically built ACID guarantees and the SQL behavior of real data warehouses into the data lake stack. And then we sort of went with that, and that's actually really important. We inadvertently solved this challenge of sharing database tables underneath large-scale databases, which I guess hadn't been done. We'd been sharing data, but no one took it seriously because it had terrible performance and terrible side effects and you couldn't rename a column safely and things like that. But once we actually applied and stole a lot of ideas from the database world, we came up with something really special, which is that Databricks and Snowflake can actually work at the same time on the same data sets.

Michael Driscoll 24:31

So let's talk about that for a moment. Databricks and Snowflake, these are kind of the Biggie vs Tupac, West Coast vs East Coast; this is the defining rivalry right now of the data infrastructure space. You and I talked a couple of weeks ago about how in theory they should be very reluctant to work with an open standard, because if I'm Snowflake, why would I want my data ever to leave the perimeter of the Snowflake service? And likewise, if I'm Databricks, why would I ever want to leave the confines of Delta Lake? So I guess mine is a two-part question. One is why are they embracing Apache Iceberg as a standard? And what do you predict is the endgame here?

Ryan Blue 25:24

So, yeah, you're right. The business model of the database in the last 30 years is to basically sit on your data, you put your data into this database and they release it back to you as long as you're paying bills. That's completely flipped with a shared storage model where you control the data and you let vendors talk to you. So I don't think that database vendors are really happy about this situation. But they are following the data where it is. So if you want access to run compute workloads on someone's data, you have to build a case and go there. In the end, what we're moving towards is the sort of independent data layer where you let the vendors out and everyone wants to play in that space if they can because they have to have your data.

Michael Driscoll 26:26

What is going to stop Tabular from going the other way? What's stopping Tabular from ultimately just building a database? If Tabular is providing the consistency, the table schemas, the lineage, what's to stop Tabular from just going up a layer and actually providing compute in addition to this storage layer?

Ryan Blue 27:13

I think the temptation is there because compute is the lucrative part of this. So for anyone who doesn't know, I sort of skirted your question earlier about what Tabular is. Tabular is building on Apache Iceberg to build that lower shared storage layer and to think through what we need to do in order to share storage underneath these databases. Well, it turns out we need to move security from the query layer down into the storage layer so that no matter what accesses the data, you get the same access controls. So we're building that system and trying to make it so that you can use your data in any of the tools that you want, including Databricks and Snowflake and all of the AWS tools like Athena. If you want to use your data with it, it should be secure and you should be able to do that. To answer your latest question, it's tempting to provide compute, but our business model is built around making sure that you can use your data however you want. We want to make sure that we're still aligned with your incentives, as opposed to the incentive structure where you put your data in Snowflake and, if you misconfigure it, you might have a higher Snowflake bill. We want to be the provider that you trust to be pointing things out and saying, hey, we can re-cluster that and you're going to save $5 million. So, you know, we think that setting that alignment of incentives is more important than winning those compute workloads. But hey, we haven't pivoted yet.
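
The idea of moving access control from the query layer down into the storage layer can be sketched in a few lines. The toy below is not Tabular's design: the `POLICY` table, the `read_table` function, and the two stand-in "engines" are hypothetical. It only shows that when every engine reads through the same storage entry point, every engine gets the same access decision.

```python
# Hypothetical policy: which principals may read which tables.
POLICY = {
    "sales.orders": {"analyst", "finance"},
    "hr.salaries": {"finance"},
}

# Toy table contents standing in for files in shared storage.
TABLES = {
    "sales.orders": [{"id": 1, "total": 40}],
    "hr.salaries": [{"employee": "a", "salary": 100}],
}


def read_table(principal, table):
    """Single storage-level entry point that every engine must go through."""
    if principal not in POLICY.get(table, set()):
        raise PermissionError(f"{principal} may not read {table}")
    return TABLES[table]


def engine_a_count(principal, table):
    """Stand-in for one query engine: counts rows."""
    return len(read_table(principal, table))


def engine_b_first_row(principal, table):
    """Stand-in for a different query engine: returns the first row."""
    return read_table(principal, table)[0]


print(engine_a_count("analyst", "sales.orders"))

denied = []
for engine in (engine_a_count, engine_b_first_row):
    try:
        engine("analyst", "hr.salaries")
    except PermissionError as exc:
        denied.append(str(exc))
print(denied)
```

Both engines are refused for the same reason, because neither of them owns the check.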

Michael Driscoll 29:00

So we could talk, of course, about Snowflake and Databricks, those are the big guys in the space. But many of us who are playing in the data world have been paying a lot of attention to things like DuckDB. And then, just as soon as we get things working with DuckDB, we've got folks who are working on new things like the Polars project, we got folks rewriting Python to be more performant. What are you seeing in terms of the heterogeneity of compute that people are running on top of Iceberg, even if Tabular doesn't move in that direction, what are the alternatives in terms of next-generation infrastructure to that compute infrastructure that you see data teams and data engineering teams running at scale?

Ryan Blue 29:55 

So I definitely see a lot of interest in basically C++ and other non-JVM frameworks. Those are seeing a ton of contributions these days. But you know, at least in our enterprise customer space, we still see a lot of the traditional ones. I'm really excited about the move to make it more flexible, whether you want to use DuckDB and do everything on a single node and really fast, or you do need those big compute workloads. We want that mobility primarily so that you can use whatever tool is right for the job and not be stuck or pulled by your data into one tool or another.

Michael Driscoll 30:43

Is there a way for us to run Iceberg locally?

Ryan Blue 30:47

Yeah, of course. Iceberg scales from very, very tiny all the way up.

Michael Driscoll 30:53

All right, I'm going to shift to questions. I’m actually going to put Guillermo on the spot, then we’re going to get into Q&A. We have a lot of founders here. And so while we're talking tech, infrastructure, philosophies of open source, one of the things that I always find most valuable when talking to founders that have been through the gauntlet is the advice they would give if they were starting over. So Guillermo, you've been at it for a while now. What would be a couple of pieces of advice for some of the folks who are just starting? I met a few people on the way in here who just got their first seed check. They're at the beginning of their journey, and maybe I'll give you a quote, which people have seen on Twitter: we do these things not because they are easy, but because we thought they would be easy. That’s the open question: advice for these early stage founders here.

Guillermo Rauch 32:00

We were just talking about this earlier. I think if you're building infrastructure, you have to be aware that it's a multi-year journey and it's really, really, really difficult. I actually think that's because we can't always perfectly visualize these systems as being as monumentally obvious as they actually become. I'm kind of obsessed right now, I'm having a phase, with the Vegas Sphere. You look at it and you always forget it’s taken all this money and time and power, and look at how bright it is. Infrastructure is a lot like that. But it's just not obvious, you don't see it. So this is similar to the Jensen advice: it's very difficult and you have to have this multi-year, very long-term vision. The thing that's also very difficult and is very necessary is that you have to incrementally deliver value. Open source is a great way to be incrementally delivering value. You're incrementally delivering value to a huge set of stakeholders. We spent years working on NextJS, fixing issues, and prioritizing issues that were coming from larger companies, even if we weren't sure that those companies were going to be immediately our customers. But it was a way of incrementally delivering value, knowing that you have this very difficult Catch-22 type of problem where you have to build up this huge structure and at the same time point to it and be like, look, it's going up. And I think for us, open source was that tool.

Michael Driscoll 33:33

That’s great, and the same question for Ryan but I'm going to be more specific, what lessons have you taken in terms of commercializing an open-source project and advice for other founders? Again, we've got open-source founders out here trying to go to market. What have you learned as you've gone from Apache Iceberg to a commercial company, Tabular that is promoting that? 

Ryan Blue 34:05

I think I came to it with a lot of opinions about this because I had seen it go wrong. And I think really, just be honest with your community. Talk about the things that are important to you and still value the things that are important to them. They're a huge asset, and open source works on human capital, the relationships, the trust, and you cannot lose the trust of your community. If you do, you're just like, I don't know what you do. So it's an enormous benefit, but you have to treat them with respect and just be honest. Where we have conflicts of interest, we talk to the community about it, like, “Hey, we actually sell this thing”. I guess we don't actually have a conflict of interest there because we actually use a lot of the open source directly and contribute it back, so that's a bad example. But you know, where we do, we say, we don't think that this is part of the project, and we have a conflict of interest.

Michael Driscoll 35:20

Okay. I'm going to ask the same question to you, Guillermo, about managing open source communities.

Guillermo Rauch 35:24

The beauty of open source is that it's a constant stream of feedback. So for me, it's just all about engaging with that feedback, and seeking it out. One flipside of that, really quick, is that there are a lot of passive users of open source and also a lot of very vocal users, the kind who are very reactive to every GitHub issue with five reactions. And you have to be aware that both exist. It's amazing. The other day I heard one of the top three largest banks in the world is massively embracing NextJS. They never reached out to us until the very end of the project: oh, and by the way, on Monday it's going live. And of course, that’s the beauty of open source, you can do whatever the heck you want. But I was like, wow, it's amazing how self-serve this thing is, you can just figure it out with Copilot or ChatGPT, it's amazing. So it's about not overreacting too much to any particular source of feedback, but there's just tons, and I think it's about acting very transparently and, with my background in real-time, in real time whenever possible. Whether it's that the NextJS template doesn't have the right height for mobile devices or there are opportunities to improve documentation, there are just constant opportunities for feedback. I think what X is doing in terms of the open source dialog is also a really interesting way we can receive feedback and act really fast and make huge numbers of people really happy. And so that's a small thing. So feedback to me is king and that's the essence of open source.

Michael Driscoll 37:04

I do notice that you're quite active on Twitter and engaging with that community, which is sometimes quite vocal and can be quite critical. So I've noticed keeping a very calm demeanor is critical, as people really go for blood sometimes. Okay, so I do have some questions here from the field. The first question is going to go to Ryan because that's how it works here. So, Ryan, what was a strongly held belief, it could be about a technology or startups or this sort of infrastructure, that you changed your mind on? What was something that you learned that surprised you on the journey with Tabular?

Ryan Blue 37:48

I'm a little unprepared for that, and so I think one of the things that we learned early on was, we thought, hey, we're going to build data security later like that sounds hard. And we realized very quickly that this was a very critical piece because no one is going to trust the startup when the security person comes into the room and like, you want us to just open up the data lake to these people, no. So that was one thing where we really didn't think that we would be going on a multi-month project to figure out, how we're going to do this and really design it properly. A little stressful to do it right on the runway, but it happened, and I'm glad that we did.

Michael Driscoll 38:45

That's what the market is for, to change your mind. The same question to you.

Guillermo Rauch 38:48

I think for me, the biggest change was that I had a strong belief that a general-purpose tool was going to be very useful, and that is not the way it translates in the market. One of the things you asked earlier was what are the inflection points in growth, and I think one was when we narrowed focus and attached ourselves to some very well-understood waves. So obviously TypeScript was on the rise. Obviously React was on the rise. I'd rather do an amazing job, or try to do an amazing job, for 100 people to love me than have 100,000 people that kind of like it. And that kind of has become a mantra and I’m trying to recommend it to everybody. But that was not very obvious in the beginning. And I think as a founder, you're always like, I need to make everybody the happiest. And in that process, you make everybody kind of mediocre happy. So focusing and then expanding from there.

Michael Driscoll 40:00

We're almost out of time here. It's almost 7:00. And I do want to make sure we have time for mingling and also to respect the time of our panelists. I am going to, in the last 4 minutes, provide a quick word from the sponsor Rill Data, though frankly Guillermo probably did a better job than I could. We're not here to pitch any of our businesses. The goal here is to share wisdom and thoughts with founders. But since we've had this banner ad up behind us the entire time, I will just say that Rill certainly is a beneficiary of the work that folks like Ryan are doing on data infrastructure. We are leaning heavily into the world that you're building with data lakes, and Guillermo and Vercel have been an inspiration for our BI-as-code approach to the world. So I'm not going to do a demo, but I would ask all of you, in exchange for spending time here tonight having a few drinks, please do go check out RillData.com and our operational business intelligence tool. And most importantly, I want to thank our guests here. And thank all of you for coming to an in-person event here at the X headquarters. And I hope you all get a chance to chat with each other. This is what in-person events are all about.
