The MLOps Podcast

Hosted by Dean Pleban @ DagsHub

Technology · Interviews guests


Latest episode: May 2025

About the show

A podcast from DagsHub about bringing machine learning into the real world. Each episode features a conversation with top data science and machine learning practitioners, who'll share their thoughts, best practices, and tips for promoting machine learning to production.

Listen to episodes

35 recent

May 13, 2025 · Episode 12 · 52 min

👏 A Practical Approach to Building LLM Applications with Liron Itzhaki Allerhand

Dean Pleban and Liron Itzhaki Allerhand explore what it really takes to move LLMs into production. They cover how to define clear requirements, prep data for RAG, engineer effective prompts, and evaluate model performance using concrete metrics. The conversation dives into managing sensitive data, avoiding leakage, and why crisp outputs and clear user intent matter. Plus: future trends like in-context learning and the decoupling of foundation models from vertical apps.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction
01:48 Phases of LLM Project Development
03:32 Defining the Problem
09:35 Data Preparation and Understanding
23:59 Multimodal RAG
26:28 Prompt Engineering & Model Selection
27:58 Model Fine-tuning & Customization
33:18 LLM as a Judge
38:58 Evaluating Model Performance and Handling Hallucinations
41:02 Using LLMs with sensitive data
45:24 Other ideas for model evaluation and guardrails
49:28 Recommendations for the audience

➡️ Liron Itzhaki Allerhand on LinkedIn – https://www.linkedin.com/in/liron-izhaki-allerhand-16579b4/
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn

December 16, 2024 · Episode 11 · 35 min

📡 Building Scalable ML Models with Natanel Davidovits

In this episode, Dean and Natanel Davidovits explore the intricacies of AI and machine learning, focusing on model efficiency, the use of APIs versus self-hosting, and the importance of defining success metrics in real-world applications. They discuss the challenges of data quality and labeling, the evolving role of data scientists in the age of LLMs, and the significance of effective communication between data science and product teams. The conversation also touches on the future of robotics in AI and the need for specialization in a rapidly changing landscape.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction to Natanel Davidovits
02:10 Optimizing AI Models for Real-World Tasks
03:47 Success Metrics in Industry vs. Academia
07:52 The Importance of Communication Between Teams
11:33 Handling Data Quality and Labeling Challenges
12:11 The Impact of LLMs on Data Science Careers
16:29 Navigating Specialized Domain Data
22:15 Trends in Machine Learning and AI
27:27 The Future of AI and Robotics
28:28 The Role of AI in Physics
33:36 Controversial Views on AI and Machine Learning
34:05 Final Thoughts and Recommendations

➡️ Natanel Davidovits on LinkedIn – https://www.linkedin.com/in/natanel-davidovits-28695312/
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn

October 31, 2024 · Episode 10 · 50 min

💼 AI in the Enterprise with Jeremie Dreyfuss

In this episode, Dean speaks with Jeremie Dreyfuss, Head of AI Research and Development at Intel, about the evolving role of AI in the enterprise. Jeremie shares insights into scaling machine learning solutions, the challenges of building AI infrastructure, and the future of AI-driven innovation in large organizations. Learn how enterprises are leveraging AI for efficiency, the latest advancements in AI research, and the strategies for staying competitive in a rapidly changing landscape.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction and Overview
00:55 Challenges of Data Collection and Infrastructure
05:00 Optimizing Test Recommendations
14:42 Tips for Deploying Entire ML Pipelines
21:19 The Impact of Large Language Models (LLMs)
25:30 How to Decide About LLM Investment in the Enterprise
29:29 Evaluating Models and Using Synthetic Data
35:34 Choosing the Right Tools for ML and LLM Projects
45:21 The Beauty of Small Data in Machine Learning
48:22 Recommendations for the Audience

➡️ Jeremie Dreyfuss on LinkedIn – https://www.linkedin.com/in/jeremie-dreyfuss/
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn

September 15, 2024 · Episode 9 · 50 min

🌲 Machine Learning in Agriculture: Scaling AI for Crop Management with Dror Haor

In this episode, Dean speaks with Dror Haor, CTO at SeeTree, about the challenges of deploying AI in agriculture at scale. They explore how SeeTree integrates AI and sensor fusion to manage vast amounts of remote sensing data, helping farmers improve crop yields with high accuracy at low costs. Dror shares insights on handling data drift, customizing models for different regions, and balancing the trade-offs between cost and performance. This conversation dives deep into practical machine learning applications in agriculture, offering valuable lessons for anyone working with large-scale data and AI.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction
00:32 Production in machine learning at SeeTree
07:34 Sensor fusion in machine learning
16:26 Balancing accuracy and cost in agriculture
20:09 Customizing models for different customers and crops
24:19 Dealing with data in different domains
30:10 Tools and processes for ML at SeeTree
35:58 Building for scale
40:17 Collecting user feedback and self-improving products
42:45 Exciting developments in ML & AI
45:12 Hot takes in ML - Overfitting is good
46:34 Recommendations for the Audience

➡️ Dror Haor on LinkedIn – https://www.linkedin.com/in/dror-haor-phd-77152322/
➡️ Dror Haor on Twitter – https://x.com/DrorHaor
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn

August 15, 2024 · Episode 8 · 39 min

📊 Data-Driven Decisions: ML in E-Commerce Forecasting with Federico Bacci

In this episode, Dean speaks with Federico Bacci, a data scientist and ML engineer at Bol, the largest e-commerce company in the Netherlands and Belgium. Federico shares valuable insights into the intricacies of deploying machine learning models in production, particularly for forecasting problems. He discusses the challenges of model explainability, the importance of feature engineering over model complexity, and the critical role of stakeholder feedback in improving ML systems. Federico also offers a compelling perspective on why LLMs aren't always the answer in AI applications, emphasizing the need for tailored solutions. This conversation provides a wealth of practical knowledge for data scientists and ML engineers looking to enhance their understanding of real-world ML operations and challenges in e-commerce.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction and Background
01:59 Owning the ML Pipeline
02:56 Deployment Process
05:58 Testing and Feedback
07:40 Different Deployment Strategies
11:19 Explainability and Feature Importance
13:46 Challenges in Forecasting
22:33 ML Stack and Tools
26:47 Orchestrating Data Pipelines with Airflow
31:27 Exciting Developments in ML
35:58 Recommendations and Closing

Links
Dwarkesh podcast with Anthropic and Gemini team members – https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken
➡️ Federico Bacci on LinkedIn – https://www.linkedin.com/in/federico-bacci/
➡️ Federico Bacci on Twitter – https://x.com/fedebyes
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://x.com/TheRealDAGsHub
➡️ Dean Pleban: https://x.com/DeanPlbn

July 15, 2024 · Episode 7 · 39 min

🚗 Driving Innovation: Machine Learning in Auto Claims Processing

In this episode, Dean speaks with Michał Oleszak, an ML engineering manager at Solera. Michał shares insights into how his team is using machine learning to transform the automotive claims process, from recognizing vehicle damages in images to estimating repair costs. The conversation covers the challenges of deploying ML pipelines in production, managing data quality for computer vision tasks, and balancing technical implementation with business needs. Michał also discusses his approach to model evaluation, the benefits of monorepo architecture, and his views on exciting developments in self-supervised learning for computer vision.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction
00:42 Production for Machine Learning at Solera
03:49 Transitioning from Images to Structured Data
04:58 Combining Deep Learning and Non-Deep Learning Models
05:15 Deployment Process for Machine Learning Models
08:01 Challenges and Solutions in Monorepo Adoption
12:57 Evaluating Model and Pipeline Versions
21:57 Tools for ML Projects: Monorepo, Pants, GitHub Actions
24:04 Data Management and Data Quality
30:14 Challenges in ML Efforts: Data Quality
30:37 Excitement about Self-Supervised Learning and JEPA Architectures
34:45 Controversial Opinion: Importance of Statistics for ML
36:40 Recommendations

Links
🌎 Prisoners of Geography by Tim Marshall: https://www.amazon.com/Prisoners-Geography-Explain-Everything-Politics/dp/1501121472
➡️ Michał Oleszak on LinkedIn – https://www.linkedin.com/in/michal-oleszak/
➡️ Michał Oleszak on Twitter – https://x.com/MichalOleszak
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://twitter.com/TheRealDAGsHub
➡️ Dean Pleban: https://twitter.com/DeanPlbn

June 10, 2024 · Episode 6 · 50 min

🚑 ML in the Emergency Room with Ljubomir Buturovic

In this episode, I chat with Ljubomir Buturovic, VP of ML and Informatics at Inflammatix. We discuss using ML to diagnose infections and blood tests in the emergency room. We dive into the challenges of building diagnostic (classification) and prognostic (predictive) models, with takeaways related to building datasets for production use cases.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 What is Inflammatix and how do they use ML
7:32 Edge Device Deployment: The Future of Model Deployment
21:16 Navigating Regulatory Submission for Medical Products
26:01 Evolution of Regulatory Processes in ML for Medical Applications
30:18 Challenges and Solutions in ML for Medical Applications
34:00 The Future of AI in Clinical Care
40:25 The Overrated Concept of Interpretability in AI and ML
45:32 Recommendations

Links
🌎📈 Our world in data: https://ourworldindata.org/
🚀 Profiles of the future: https://www.amazon.com/Profiles-Future-Arthur-C-Clarke-ebook/dp/B00BY7GITK
➡️ Ljubomir Buturovic on LinkedIn – https://www.linkedin.com/in/ljubomir-buturovic-798156/
➡️ Ljubomir Buturovic on Twitter – https://x.com/ljbuturovic
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://twitter.com/TheRealDAGsHub
➡️ Dean Pleban: https://twitter.com/DeanPlbn

May 16, 2024 · Episode 5 · 1 hr 2 min

🌊 AI-Native with Idan Gazit – The future of AI products and interfaces + Getting AI to production

In this episode, Idan Gazit, Senior Director of Research at GitHub Next, discusses his role in exploring strategic technologies and incubating long bet projects. He explains how the GitHub Next team chooses research projects and the process of exploration and theme selection. Idan also shares insights into the ML focus at GitHub Next and the challenges of evaluating the impact of AI products. He reflects on his journey into the AI space and provides advice for testing AI products in smaller organizations. Finally, he shares his thoughts on the future of AI interfaces.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction and Background
00:56 Choosing Research Projects at GitHub Next
06:09 ML Focus in GitHub Next
10:52 ML Work and the Leaky Abstraction
13:16 Idan's Journey into the AI Space
17:54 Evaluating the Impact of AI Products
24:36 Testing AI Products in Smaller Organizations
32:52 The Future of AI Interfaces
40:01 Transitioning from Prototype to Product
46:45 Challenges in the ML/AI Space
56:03 Recommendations

➡️ Idan Gazit on LinkedIn – https://www.linkedin.com/in/idangazit/
➡️ Idan Gazit on Twitter – https://twitter.com/idangazit
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://twitter.com/TheRealDAGsHub
➡️ Dean Pleban: https://twitter.com/DeanPlbn

April 18, 2024 · Episode 4 · 32 min

🍪 Machine Learning in the cookie-less era with Uri Goren

In this episode, I chatted with Uri Goren, founder and CEO of Argmax, about Machine Learning and the future of digital advertising in a world moving away from cookies due to privacy laws like GDPR and CCPA. We chat about challenges in maintaining personalized ads while respecting user privacy, and new methods like probabilistic models and contextual features to cover some of the gap left by removing cookies.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Introduction
00:35 The Rise of Privacy Regulations
1:40 The Impact of Losing Cookies
2:48 Understanding Cookies
4:33 Reasons for the Decline of Cookies
8:47 ML Leveraging Cookies in Advertising
10:32 The Shift to Contextual Features
12:53 The Future of ML without Cookies
15:23 New and Old Ways of Generating Contextual Features
20:33 Regulatory Conspiracies
22:33 Unsolved Problems in ML and AI
24:39 Predictions for the Next Year in AI and ML
26:17 Controversial Take: Overuse of LLMs
28:03 Recommendations

➡️ Uri Goren on LinkedIn – https://www.linkedin.com/in/ugoren/
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://twitter.com/TheRealDAGsHub
➡️ Dean Pleban: https://twitter.com/DeanPlbn

March 18, 2024 · Episode 3 · 1 hr 5 min

🛰️ Modern & Realistic MLOps with Han-chung Lee

In this episode, I speak with Han-Chung Lee, a machine learning engineer with a lot of interesting takes on ML and AI. We dive into the buzz around natural language processing and the big waves in generative AI. We chat about how newcomers are racing through NLP’s history, mixing old school and new tech, and the shift towards smarter databases. Han-Chung breaks it down with his straightforward takes, making complex AI trends feel like coffee chat topics. It’s a perfect listen for anyone keen on where AI’s headed, minus the jargon.

Join our Discord community: https://discord.gg/tEYvqxwhah

---

Timestamps:
00:00 Intro
0:41 State of NLP and LLMs
1:33 Repeating the past in NLP
3:29 Vector databases vs. classical databases
8:49 Choosing the right LLM for an application
12:13 Advantages and disadvantages of LLMs
16:10 Where LLMs are most useful
21:13 The dark side of LLMs and can we detect it?
25:19 Thoughts on LLM leaderboard metrics
31:19 Using LLMs in regulated industries
36:40 Creating a moat in the LLM world
40:20 Evaluating LLMs
44:20 Impact of LLM on non-English languages
48:35 Thoughts on MLOps and getting ML into production
56:48 The Hardest Unsolved Problem in ML and AI
59:09 Predictions for the Future of ML and AI
1:03:25 Recommendations and Conclusion

➡️ Han Lee on Twitter – https://twitter.com/HanchungLee
➡️ Han Lee on LinkedIn – https://www.linkedin.com/in/hanchunglee/
🌐 Check Out Our Website! https://dagshub.com

Social Links:
➡️ LinkedIn: https://www.linkedin.com/company/dagshub
➡️ Twitter: https://twitter.com/TheRealDAGsHub
➡️ Dean Pleban: https://twitter.com/DeanPlbn