Ensuring Data Quality in Metals Manufacturing: Techniques and Challenges with SCADA and Databricks
In this episode of the Smart Metals Podcast, hosts Luke van Enkhuizen and Denis Gontcharov explore the critical topic of data quality in metals manufacturing, with a strong focus on SCADA systems and modern cloud platforms like Databricks. Denis kicks off with a big announcement: his business is now refocused on integrating legacy SCADA architectures with scalable cloud-native environments such as Azure Databricks. Together, Luke and Denis dive into the key challenges of aligning SCADA data with business use cases, the erosion of trust caused by bad data, and the urgent need for automated monitoring.

The discussion emphasizes how companies, from SMBs to enterprises, can implement robust data quality testing using open-source frameworks like Soda and Great Expectations. You'll learn how to embed testing into ETL pipelines, use Databricks to store and analyze data reliably, and ensure high-quality inputs within a Unified Namespace (UNS).

Timestamps:
00:00 Introduction to the Smart Metals Podcast
00:44 Big Announcement: Refocusing Business Activities
01:12 Understanding SCADA and Data Quality Challenges
04:37 Importance of Data Quality in Manufacturing
07:22 Real-World Data Quality Issues and Consequences
11:04 Steps to Ensure High Data Quality
27:00 Open Source Solutions for Data Quality Testing

Notable Quotes:

"SCADA is essentially the second layer of the automation pyramid: supervisory control and data acquisition. It collects data from PLCs and individual machines. The challenge is moving this high-frequency, millisecond-level time series data to the cloud. Data quality is one of the key problems in this area." - Denis Gontcharov

"My new focus is helping companies integrate legacy SCADA systems into modern platforms like Azure Databricks, where they can finally get control over their industrial data." - Denis Gontcharov

"Almost any factory using modern machinery has multiple layers: sensors, PLCs, SCADA, MES, ERP, and eventually the cloud. Much of this may be hidden inside vendor-specific solutions, but understanding these layers is essential." - Luke van Enkhuizen

"Bad data completely erodes trust. If your dashboard shows an off number and you can't explain it, users stop trusting your data platform, no matter if it's SCADA or Databricks behind the scenes." - Denis Gontcharov

"You can't manually verify data coming from hundreds of time series across SCADA systems. You need an automated application watching your data 24/7 and flagging anomalies before they affect operations." - Denis Gontcharov

"Where should you do data quality checks? Ideally, inside your pipeline, after transformations, whether you're using SCADA historians or sending data into Databricks. This prevents dirty data from entering your clean system." - Denis Gontcharov

"ETL stands for extract, transform, load. As you bring SCADA data into Databricks or your UNS, every step must be monitored and tested." - Denis Gontcharov

"Just like raw ore needs refining before it becomes usable gold, raw SCADA data must be cleaned, structured, and tested, often inside platforms like Databricks, to unlock its real business value." - Luke van Enkhuizen

Relevant Links:
Follow the show: https://smartmetals.transistor.fm/
About Denis Gontcharov: https://gontcharov.eu/
About Luke van Enkhuizen: https://vanenkhuizen.com/





