The Data Engineering Show

Hosted by The Firebolt Data Bros

Technology · Interviews guests

Latest episode: May 2026

About the show

The Data Engineering Show is a podcast for data engineering and BI practitioners who want to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS: Eldad and Boaz Farkash shared the same stuffed toys growing up, as well as a big passion for data. After founding Sisense and building it into a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS: In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries, contact tamar@firebolt.io
Website: https://www.firebolt.io

Listen to episodes

59 recent

May 7, 2026 · Episode 56 · 18 min

AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme

What if you could build AI features with confidence while moving at the pace of innovation? In this episode, Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how to evaluate generative AI in production, why breaking down complex problems into smaller chunks accelerates development, and the key strategies for scaling AI-powered products beyond zero-to-one. Whether you're shipping AI features or transforming your engineering workflow, this conversation offers practical insights on building reliable AI systems, leveraging LLMs as orchestration tools, and the future of software development. Tune in to discover why humans remain essential in the scaling phase and how your team can move faster without sacrificing quality.

April 28, 2026 · Episode 55 · 22 min

The Framework Canva Uses for 200M+ Designers with Paul Tune

In this episode of The Data Engineering Show, Benjamin sits down with Paul Tune, Staff Research Scientist at Canva, to explore the advancement of machine learning at one of the world's leading design platforms. Learn how Canva is transitioning from traditional ML like recommendation engines for templates to cutting-edge agentic workflows that allow users and AI to collaborate on complex design tasks. Whether you're interested in the infrastructure behind distributed training or the nuances of post-training LLMs for aesthetic tasks, this deep dive offers a masterclass in scaling ML for millions of creative users.

April 8, 2026 · Episode 54 · 22 min

Llama 2 & 3 Safety: Soumya Batra on Agentic AI Training

What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively agentic AI means for the next frontier of AI development. Whether you're curious about the model training lifecycle or wondering what comes after large language models, this conversation unpacks the technical strategies and vision shaping tomorrow's AI systems.

March 24, 2026 · Episode 53 · 18 min

The Data Fusion Secret & Why Custom Query Engines Fail with Nikita Lapkov

What if building a distributed SQL engine meant rethinking everything about how query execution works at scale? In this episode, Benjamin sits down with Nikita Lapkov, Senior Software Engineer at Cloudflare, to explore how R2 SQL leverages object storage and distributed computing to power analytics across 300 global locations, why backward compatibility becomes critical when you can't control infrastructure rollouts, and the key strategies for handling joins and adaptive query execution in a stateless, point-to-point network architecture. Whether you're designing distributed systems or curious about how Cloudflare processes petabytes of data, this conversation reveals the real-world engineering challenges and innovations shaping the future of cloud data platforms.

March 10, 2026 · Episode 52 · 24 min

How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha

What if you could deploy ML features and real-time data pipelines without building complex infrastructure from scratch? In this episode, host Benjamin sits down with Nikhil Simha, CTO at Zipline AI and co-author of Chronon AI, to explore how Chronon, an open-source system that generates data infrastructure from simple queries, is transforming feature engineering at companies like OpenAI and Airbnb. Learn why iteration speed matters for fraud detection, how to serve thousands of signals at a massive scale, and what the future of analytical databases looks like in an AI-first world. Whether you're scaling real-time ML systems or building customer-facing analytics, this conversation is packed with practical insights on bridging the gap between data scientists and ML engineers.

February 19, 2026 · Episode 51 · 16 min

The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck

What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how to balance FinOps costs while processing billions of IoT events daily. Whether you're building data infrastructure for a high-growth company or rethinking how your organization consumes data, this conversation is packed with practical strategies for unlocking data value and preparing your platform for AI. Tune in to discover how Voi ditched traditional BI tools and revolutionized their approach to enterprise analytics.

February 3, 2026 · Episode 50 · 29 min

Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That

What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowflake pipelines, and why "theoretically correct" architectures often fail in practice.

December 16, 2025 · Episode 49 · 25 min

The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft

What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh Varyani, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. Whether you're architecting data systems at scale or integrating AI into your analytics workflow, this conversation delivers actionable insights into reliability, modernization, and the future of data engineering. Tune in to discover how Lyft is balancing open-source investments with cutting-edge AI capabilities to unlock better insights from data.

November 19, 2025 · Episode 48 · 19 min

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu

What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: The move from "Information" to "Agency."

October 7, 2025 · Episode 47 · 20 min

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, moves data quality checks ahead of the write to the final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail jobs, drop bad records, or alert teams, ultimately saving huge costs on recompute and engineering effort in mission-critical data pipelines.