Satellite image deep learning

Hosted by Robin Cole

Technology, Science, Interviews guests

Episodes

Latest episode: Jun 2026

About the show

Dive into the world of deep learning for satellite images with your host, Robin Cole. Robin meets with experts in the field to discuss their research, products, and careers in the space of satellite image deep learning. Stay up to date on the latest trends and advancements in the industry - whether you’re an expert in the field or just starting to learn about satellite image deep learning, this is a podcast for you. Head to https://www.satellite-image-deep-learning.com/ to learn more about this fascinating domain.

Listen to episodes

45 recent

June 17, 2026 (32 min)

A Single GPU Is All You Need for Self-Supervised Pretraining

In this episode I sat down with Lakshay Sharma, a machine learning scientist at Instacart and former member of Microsoft’s geospatial AI team, to discuss self-supervised learning for remote sensing and his recent research on efficient pretraining for semantic segmentation. Lakshay explains the evolution of self-supervised learning, covering predictive, generative, and contrastive approaches, and discusses how foundation models such as DINO have transformed computer vision and geospatial machine learning. We explore the unique challenges of applying these techniques to remote sensing imagery, where assumptions that work for natural images often break down.

We then dive into Lakshay’s recent paper, Sub-Image Overlap Prediction: Task-Aligned Self-Supervised Pretraining for Semantic Segmentation in Remote Sensing Imagery, presented at the Computer Vision for Earth Observation Workshop at WACV 2026. He walks through the intuition behind the method, which trains models to localize extracted sub-images within larger scenes as a proxy task for semantic segmentation. We discuss the experimental setup, comparisons against established self-supervised learning approaches, and the surprising finding that the method achieves competitive or superior results using only thousands of pretraining images rather than millions. Along the way, we explore transfer learning across datasets, the growing importance of data efficiency, and why targeted pretraining may offer a compelling alternative to increasingly resource-intensive foundation model development for niche geospatial applications.

* 📺 Video of this conversation on YouTube
* 👤 Lakshay on LinkedIn
* 🖥️ Personal website of Lakshay
* 📖 Paper

Bio: Lakshay Sharma is a Senior Machine Learning Scientist / Engineer at Instacart. His research spans Computer Vision (CV) and Vision-Language Models (VLMs) with a focus on Self-Supervised and Semi-Supervised Learning. He previously worked at Microsoft on multi-modal representation learning and on using aerial/satellite and streetside imagery for maps and geospatial applications. He has also worked at Amazon, where he focused on representation learning for videos. Based in New York City, Lakshay is an avid fan of soccer, snowboarding, and cricket. He often daydreams of some day applying his computer vision chops to sports.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.satellite-image-deep-learning.com

June 10, 2026 (34 min)

Mapping The World at Taylor Geospatial

In this episode I sat down with Jennifer Marcus and Isaac Corley from Taylor Geospatial to explore Fields of the World - an open initiative to create globally consistent agricultural field boundary datasets from satellite imagery using AI and cloud-native geospatial infrastructure. Taylor Geospatial, a newly formed research organization, is building openly licensed global datasets as foundational public goods. Jen and Isaac explain the motivation behind the project, the challenges of scaling machine learning beyond well-labelled regions, and why openness in datasets, tooling, and intermediate model outputs is central to their approach.

We dive into the technical details behind the first global release: assembling noisy and uneven benchmark datasets from around the world, training models that generalise across diverse agricultural systems, and releasing everything from Sentinel-2 mosaics and raw segmentation probabilities to polygonised field boundaries through Source Cooperative. Along the way, we discuss community-driven improvement loops inspired by OpenStreetMap, the limitations of 10 m imagery for smallholder agriculture, and the importance of pairing academic researchers with engineering teams to rapidly operationalise new methods. Finally, we look ahead to Taylor Geospatial’s next phase - richer agricultural datasets, “Features of the World,” and a benchmarking initiative aimed at improving evaluation standards and reproducibility across geospatial foundation models.

* 📺 Video of this conversation on YouTube
* 🖥️ Taylor Geospatial website
* 🖥️ FTW website

Bio: Jennifer Marcus is Vice President of Strategic Innovation Programs at Taylor Geospatial, where she advances partnerships and programs that translate breakthrough geospatial AI research into real-world impact. With deep experience across defence, federal government, and open-source geospatial ecosystems, Jennifer brings decades of expertise translating emerging technologies into mission-critical impact. She previously served as the inaugural Executive Director of Taylor Geospatial Engine, which in 2024 launched what would become Fields of The World, and has held leadership roles at Planet, Boundless Spatial, and Northrop Grumman.

Bio: Isaac Corley is Director of AI/ML Research at Taylor Geospatial, where he leads a team to build the models behind earth observation research and to create open data products that elevate the geospatial market and community as a whole. Isaac builds and publishes geospatial AI from research through production, including the RasterFlow platform at Wherobots, which was used to run Fields of The World. He has served as PI on the IARPA SMART program at BlackSky and maintains widely-used open-source projects, including TorchGeo and SMP. Check out his blog with Caleb Robinson at geospatialml.com.

April 29, 2026 (26 min)

BetaEarth: Open Embeddings of Sentinel-2 and Sentinel-1 with a Little Help of AlphaEarth

In this episode I sat down with Mikolaj (Miko) Czerkawski from Asterisk Labs to explore BetaEarth, an experimental open-source emulator trained on AlphaEarth Foundations' public embedding archive. AEF, released by Google and Google DeepMind as a global 10 m embedding product derived from a wide range of Earth-observation modalities, is what makes BetaEarth possible: its openness lets the community build lightweight independent emulators that approximate AEF's pixelwise outputs from standard Sentinel inputs, and use them to probe how much of a model's behaviour is captured in its public embeddings. Miko walks through BetaEarth's design (compact architectures based on SegFormer-B2 with separate per-modality encoders, and a shared DINOv3 backbone over 3-band spectral primitives) and the surprising finding that reasonably strong approximations can be achieved even from simple RGB inputs.

We then dive into a live demo: generating BetaEarth embeddings for arbitrary regions and time ranges using Sentinel-1, Sentinel-2, and COP-DEM data. Along the way, we cover practical considerations such as cloud contamination, modality trade-offs, tiling artefacts, and strategies for merging multi-temporal signals. Finally, we discuss what this complementary tooling enables for the geospatial ML community: embeddings as pretraining or regularisation signals, lightweight local inference alongside AEF's global annual rasters, and what the combination of large proprietary archives and open emulator-style tools could unlock next.

* 📺 Video of this conversation & demo on YouTube
* 🖥️ BetaEarth GitHub page
* 🖥️ BetaEarth demo on Hugging Face

Bio: Miko is a researcher specialising in AI, computer vision, signal processing and Earth observation. Before co-founding Asterisk Labs he was a postdoctoral research fellow at the European Space Agency. His research interests include data-centric analyses of large-scale Earth observation data, dataset curation, generative modelling, and restoration tasks for satellite imagery. He is a co-founder of the Major TOM community project, a platform for collaborating and reusing Earth observation datasets designed specifically for AI pipelines. He received the B.Eng. degree in electronic and electrical engineering in 2019 from the University of Strathclyde in Glasgow, United Kingdom, and the Ph.D. degree in 2023 at the same institution, specialising in applications of computer vision to Earth observation data.

April 23, 2026 (11 min)

Geospatial Annotation with LabelMe and Segment Anything

In this episode I sat down with Kentaro Wada, a computer vision engineer at Mujin and creator of LabelMe, to explore the evolution of image annotation workflows. We discuss how his need to label data for a robotics challenge led to building one of the most widely used open-source annotation tools, and how it has evolved alongside the shift from traditional computer vision to deep learning. Kentaro explains the impact of foundation models like Segment Anything (SAM), and how annotation is rapidly moving toward a prompt-and-verify paradigm where models do the heavy lifting and humans focus on quality control. We also dive into his recent work integrating SAM into LabelMe, the challenges of applying these models to satellite imagery, and why approaches like bounding-box prompting outperform text in that domain. Finally, we cover new support for large, multi-channel geospatial data, practical deployment considerations, and what this means for scaling annotation in real-world machine learning systems. Note that a recording of this conversation, along with a demonstration of geospatial annotation using LabelMe, is available on YouTube via the links below:

* 🖥️ LabelMe website
* 🖥️ Kentaro’s personal website
* 📺 Video of this conversation on YouTube
* 📺 Demo video on YouTube

Bio: Kentaro Wada was born in Japan in 1994. He received his B.Sc. (2016) and M.Sc. (2018) from the Mechanical Engineering and Computer Science Department at The University of Tokyo (UTokyo). In his research at UTokyo, he worked on learning-based scene understanding for robotic manipulation at JSK Laboratory, supervised by Prof. Masayuki Inaba and Prof. Kei Okada. He received his PhD in 2022 at the Dyson Robotics Laboratory at Imperial College London, supervised by Prof. Andrew Davison. During his PhD, he worked on object-level semantic scene understanding, a general scene representation useful for robotic manipulation, and showed several novel capabilities of robots. He joined Mujin, Inc. in 2022 as a computer vision engineer, and is working on advancing robots' capabilities in the real-world environment.

April 1, 2026 (30 min)

Mapping South America and Beyond with Fields of The World V2

In this episode I sat down with Hannah Kerner and Tristan Grupp to discuss Fields of The World (FTW), an open-source benchmark and ecosystem for global field boundary segmentation from satellite imagery. We explore the core challenge of building models that generalise across vastly different agricultural systems, and why data diversity, rather than model architecture, is often the limiting factor. Hannah and Tristan explain how targeted annotation in underperforming regions can dramatically improve results, how combining global and local training data avoids catastrophic forgetting, and what they learned from large-scale model experimentation. We also dig into practical evaluation beyond standard IoU metrics, including consistency and throughput, and how small modelling choices like boundary loss weighting can have outsized impact on usability. Finally, we cover the growing tooling ecosystem, real-world user feedback, and what’s coming next, including improved models and a global map of predicted field boundaries.

* 🖥️ FTW website
* 📺 Recording of this conversation on YouTube

Bio Hannah: Hannah Kerner is an Assistant Professor in the School of Computing and Augmented Intelligence at Arizona State University. Her research focuses on advancing the foundations and applications of machine learning to foster a more sustainable, responsible, and fair future for all. Her lab’s research topics include machine learning for remote sensing, algorithmic bias, and machine learning theory. She translates research advances to real-world impact through her roles as the AI/Machine Learning Lead for NASA Harvest and NASA Acres, Center Faculty for the ASU Center for Global Discovery and Conservation Science (GDCS), and Research Director for Taylor Geospatial. She has been recognised with multiple research awards including NSF CAREER (2025), Schmidt Sciences AI2050 Early Career Fellowship (2025), and Forbes 30 Under 30 in Science (2021).

Bio Tristan: Tristan Grupp is an Agricultural Data Scientist in the Food, Land, and Water Program and Data Lab at the World Resources Institute. He collaborates closely with Land and Carbon Lab. His current research focuses on applying remote sensing and machine learning to monitor deforestation and natural land conversion driven by agricultural supply chains, supporting commodity traceability and corporate sustainability compliance, including under the EU Deforestation Regulation (EUDR). His work spans forest change monitoring, climate adaptation, and the intersections of food systems and natural landscapes. Beyond WRI, Grupp has contributed to research on climate change adaptation tracking in support of national adaptation planning under the UNFCCC, protected area policy evaluation in the EU, and tropical forest dynamics in the Peruvian Amazon. He has presented his work at international venues including AGU, COP, and the UN National Adaptation Planning Conference.

February 4, 2026 (30 min)

State Of The Art Object Detection

In this episode I sat down with Isaac Robinson to discuss RF-DETR, a new state-of-the-art family of real-time object detection and segmentation models from Roboflow. We cover the motivation for building models that are not just accurate but also fast, cost-efficient, and deployable across diverse hardware and data regimes, and why moving beyond fixed architectures is key to achieving that. Isaac explains how RF-DETR combines strong foundation backbones like DINOv2 with efficient neural architecture search to unlock novel speed–accuracy trade-offs, including dropping decoder layers and queries after training. We also discuss the model’s strong transfer performance on domains far from COCO, the introduction of a memory-efficient instance segmentation head, and the team’s unusually rigorous benchmarking approach, before closing on the challenges of open-source research and upcoming improvements to inference and platform integration.

* 👤 Isaac on LinkedIn
* 🖥️ RF-DETR on GitHub
* 📖 Paper
* 📺 Video of this conversation on YouTube

Bio: Isaac Robinson is a Machine Learning Research Engineer at Roboflow. He’s worked across the field of computer vision, from real-time stereo depth estimation on household robots to biomedical research at the NIH to founding a zero-shot computer vision infrastructure startup. Isaac focusses on the intersection of low latency and high performance, with the goal of helping people unlock new capabilities through vision.

January 21, 2026 (23 min)

Tessera: A Temporal Foundation Model for Earth Observation

In this episode I caught up with Sadiq Jaffer and Frank Feng to discuss Tessera, a large-scale foundation model for Earth observation that produces annual, pixel-level temporal embeddings from multi-sensor satellite data. They explain why moving beyond single-date imagery is essential for understanding phenology, land cover, and environmental change, and how aggregating a full year of Sentinel-1 and Sentinel-2 observations enables far richer representations of the Earth’s surface. We dive into the unique engineering challenges behind Tessera, including its unusual cost profile where inference is more expensive than training, the need to ingest petabyte-scale archives, and the design choices required to scale a pixel-based model without representation collapse. Frank walks through their self-supervised training strategy based on redundancy reduction (Barlow Twins), while Sadiq highlights how downstream evaluations, from wildfire analysis to land-cover mapping, demonstrate that the embeddings already encode meaningful temporal and semantic structure. We also discuss the practical impact for ecology and conservation, where Tessera dramatically accelerates research workflows and reduces label requirements, and look ahead to Tessera v2, which will incorporate Landsat data to extend embeddings back to the 1970s and unlock new capabilities in change detection and forecasting.

* 📺 This conversation on YouTube
* 🖥️ Tessera on GitHub
* 📖 Paper
* 🖥️ Frank’s website
* 🖥️ Sadiq’s website

Slides discussed in the episode

December 12, 2025 (20 min)

AutoML for Spaceborne AI

In this episode I caught up with Roberto del Prete to learn about his work on AutoML for in-orbit model deployment, and how it enables satellites to run highly efficient AI models under severe power and hardware constraints. Roberto explains why traditional computer-vision architectures, optimised for ImageNet or COCO, are a poor fit for narrow, mission-specific tasks like wildfire or vessel detection, and why models must be co-designed with the actual edge devices flying in space. He describes PyNAS, his neural architecture search framework, in which a genetic algorithm drives the optimisation process, evolving compact, hardware-aware neural networks and profiling them directly on representative onboard processors such as Intel Myriad and NVIDIA Jetson. We discuss the multi-objective challenge of balancing accuracy and latency, the domain gap between training data and new sensor imagery, and how lightweight models make post-launch fine-tuning and updates far more practical. Roberto also outlines the rapidly changing ecosystem of spaceborne AI hardware and why efficient optimisation will remain central to future AI-enabled satellite constellations.

* 🖥️ PyNAS on GitHub
* 📖 Nature paper
* 📺 Video of this conversation on YouTube
* 👤 Roberto on LinkedIn

Bio: Roberto is an Internal Research Fellow at ESA Φ-lab specialising in deep learning and edge computing for remote sensing. He focuses on improving time-critical decision-making through advanced AI solutions for space missions and Earth monitoring. He holds a Ph.D. from the University of Naples Federico II, where he also earned his Master’s and Bachelor’s degrees in Aerospace Engineering. His notable work includes the development of “FederNet,” a terrain relative navigation system. Del Prete’s professional experience includes roles as a Visiting Researcher at the European Space Agency’s Φ-Lab and SmartSat CRC in Australia. He has contributed to key projects like Kanyini Mission, and developed AI algorithms for real-time maritime monitoring and thermal anomaly detection. He co-developed the award-winning P³ANDA project, a compact AI-powered imaging system, earning the 2024 Telespazio Technology Contest prototype prize. Co-author of more than 30 scientific publications, Del Prete is dedicated to leveraging advanced technologies to address global challenges in remote sensing and AI.

December 5, 2025 (16 min)

Methane Plume Detection with AutoML

In this episode I caught up with Julia Wąsala to learn about methane plume detection using AutoML, and how her research bridges atmospheric science and machine learning. Julia explains the unique challenges of working with TROPOMI data: extremely coarse spatial resolution, single-channel methane measurements, and complex auxiliary fields that sometimes create plume-like artefacts leading to false detections. She walks through how her approach generalises a traditional two-stage detection pipeline to multiple gases using AutoMergeNet, a neural architecture search framework that automatically designs multimodal CNNs tailored to different atmospheric gases. We discuss why methane matters, how model performance shifts dramatically between curated test sets and real-world global data, and the ongoing effort to understand sampling bias and improve operational precision.

* 📖 AutoMergeNet paper
* 🖥️ Code on GitHub
* 🖥️ Julia’s homepage
* 📺 Recording of this conversation on YouTube

Bio: Julia Wąsala is currently working toward the Ph.D. degree in automated machine learning for Earth observation with the Leiden Institute for Advanced Computer Science, Leiden University, Leiden, The Netherlands, and with Space Research Organisation Netherlands, Leiden, The Netherlands. Her research in the field of automated machine learning for Earth observation focuses on designing new methods and validating them in real-world applications, such as atmospheric plume detection.

November 26, 2025 (23 min)

Democratising access to GeoAI with InstaGeo

In this episode, I caught up with Ibrahim Salihu Yusuf from InstaDeep’s AI for Social Good team to hear the story behind InstaGeo, an open-source geospatial machine learning framework built to make multispectral satellite data easy to use for real-world applications. Ibrahim explains how the 2019–2020 locust outbreak exposed a gap between freely available satellite imagery, existing machine learning models, and the lack of tools to turn raw data into model-ready inputs. He walks through how InstaGeo bridges this gap - fetching, processing, and preparing multispectral data; fine-tuning models such as NASA and IBM’s Prithvi; and delivering end-to-end inference and visualisation in a unified app. The conversation also covers practical use cases, from locust breeding ground detection to damage assessment, air quality, and biomass estimation, as well as the team’s efforts to partner with field organisations to drive on-the-ground impact.

* 👤 Ibrahim on LinkedIn
* 🖥️ InstaGeo on GitHub
* 📖 Paper on InstaGeo
* 📺 Video of this conversation on YouTube
* 📺 Demo of InstaGeo on YouTube

Bio: Ibrahim is a Senior Research Engineer and Technical Lead of the AI for Social Good team at InstaDeep’s Kigali office, where he applies artificial intelligence to address real-world challenges and drive social impact across Africa and beyond. With expertise spanning geospatial machine learning, computer vision, and computational biology, he has led high-impact projects in food security, disaster response, and immunology research. He also leads the development of InstaGeo, a platform designed to democratise access to AI-powered insights from open-source satellite imagery, reflecting his commitment to using cutting-edge AI for meaningful societal benefit.