
Summer 2025 Edition
Insights for your beach reading ☀️
This is your curated vacation companion if you're curious about data engineering. It ranges from quick podcast episodes to deep-dive papers for those lazy afternoons when you're soaking up the sun.
Podcasts
Keep updated on the go - perfect for beach walks or poolside listening
- The Data Engineering Podcast by Tobias Macey Latest news about data engineering with practitioners and creators of DE tooling
- The Data Stack Show by RudderStack About the modern data stack and its ecosystem
- Data Talks on the Rocks by Rill Casual, insightful conversations with data leaders
Newsletters
Stay up to date with the latest trends and insights in data engineering
- Practical Data Modeling by Joe Reis Data modeling community and modeling techniques reimagined
- Data Engineering Weekly by Ananth Packkildurai One of the first and most comprehensive weekly data engineering newsletters
- Data News by Christophe Blefari High-quality curated data engineering news and insights from Europe
Blogs
Light reading with deep insights from industry experts
- Start Data Engineering by Joseph Machado Practical tutorials and guides for building data engineering systems
- SeattleDataGuy's Substack by Ben Rogojan Long-standing blog with deep thoughts about the world of data
- Benn's Substack by Benn Stancil Prolific writer and storyteller, all about data and BI
YouTube Channels
Visual learning for when you want to see concepts in action
- Seattle Data Guy by Ben Rogojan One of the first YouTube channels dedicated to data engineering content
- Kahan Data Solutions by Michael Kahan Clear explanations of data engineering concepts, especially data modeling
- Darshil Parmar Channel by Darshil Parmar Open source data engineering projects and hands-on tutorials
GitHub Handbooks
Comprehensive handbooks to reference for any data engineering topic
- Data Engineer Handbook by Zach Wilson Comprehensive repo covering almost everything you'd want to learn about data engineering
- Data Engineering Handbook by David Gasquez (Obsidian) Interconnected knowledge base built with Obsidian covering data engineering topics
- The Data Engineering Cookbook by Andreas Kretz The original data engineering handbook tackling the complete lifecycle
Whitepapers
Deep dives into research - save these for those long summer evenings
- Spanner: Becoming a SQL System Google Research How Google Spanner evolved from a distributed key-value store to a full SQL system
- Dremel: Interactive Analysis of Web-Scale Datasets Google Research (2011) The foundational paper that inspired Apache Parquet and columnar analytics
- Lakehouse: A New Generation of Open Platforms Databricks Founders Unifying data warehousing and advanced analytics on top of open file and table formats
Glossaries
Deepen your knowledge with interconnected digital gardens and comprehensive glossaries
- Data Engineering Wiki by Reddit Community Community-driven knowledge base covering data engineering fundamentals
- Modern Data Definitions Glossary by Secoda Comprehensive glossary of modern data terminology and concepts
- Data Engineering Vault by me Interconnected second brain with curated data engineering resources
Courses
Perfect for learning new skills during your summer downtime
- Efficient Data Processing in Spark by Joseph Machado Practical course on learning Spark basics and beyond
- Free Data Engineering Course by DataTalksClub 9-week hands-on bootcamp covering the fundamentals of data engineering
- Learn Data Engineering by Andreas Kretz Comprehensive learning platform and academy covering the full stack (paid)
Getting Started
Essential resources to begin your data engineering journey
- Data Engineering Trilogy by Maxime Beauchemin Start with "The Rise of the Data Engineer" and "The Downfall of the Data Engineer" (both from 2017), then continue with "Functional Data Engineering"
- Free DE Course for Beginners by freeCodeCamp A complete 3-hour course to learn the essentials
- People of Data Engineering by me Follow data engineering people through their newsletters, social accounts, or blogs
Books
Essential reading for those lazy beach days and poolside relaxation
- The Data Warehouse Toolkit by Ralph Kimball (1996) The definitive guide to data warehouse modeling, still relevant today
- Fundamentals of Data Engineering by Reis & Housley First book about fundamentals and concepts behind data engineering systems
- Designing Data-Intensive Applications by Kleppmann The best book about working with distributed data and computing principles
Open Source Projects
Hands-on experience with open-source data engineering projects
- Opendata Stack Platform Scalable data platform Collection of projects and pipelines built with open data stack tools
- Finnhub Streaming Pipeline Spark, Kafka, Kubernetes Real-time financial data streaming pipeline with modern tools
- Curated List of Open-Source Projects From simple to advanced End-to-end projects from web scraping to visualization with any data stack
Communities
Connect with fellow data engineers and expand your network
- Practical Data Engineering Discord by Joe Reis Super active and welcoming community for data engineering practitioners
- Technical Freelancer Academy by Ben Rogojan Community for technical freelancing that also includes many active practitioners from the data space
- Locally Optimistic Slack by Locally Optimistic Aspiring analytics leaders discuss and share lessons and challenges from their experience
Trends
What's heating up this summer in the data engineering world
- Lakehouse-like Architectures Iceberg, Delta, DuckDB Build warehouses on distributed object storage with open table formats
- Rise of Declarative Data Stack Modular open data stacks A set of tools and their configs can be defined in a single function
- Code-First and Conversational BI Integration MCP frameworks, configurable tools LLM-contextual tools that automate data workflows and fulfill self-serve BI
Rising Tools
Lightning-fast tools to beat the summer heat in your data stack
- DuckDB Lightweight OLAP Super fast and lightweight (just MBs) analytical database
- Rill Developer Code-first BI tool The BI tool that supports both code-first and fast data exploration
- ClickHouse Petabyte-scale analytics Fast column-oriented OLAP database that, unlike the embedded DuckDB, runs as a distributed server and scales to petabytes and beyond
🏖️ Ready to dive deeper?
Check out the deep dives that originate from real business use-cases and challenges.