Talking Papers Podcast

Hosted by Itzik Ben-Shabat

Technology, Science, Education

Interviews guests


Episodes

Latest episode

Feb 2025

Language

EN-US

About the show

Talking Papers Podcast: deep dives into research papers in computer vision, 3D, machine learning, and AI, with the authors who wrote them. Where research meets conversation. By researchers, for researchers. Each episode is structured like the paper itself: a TL;DR / abstract to set the stage, then related work, approach, results, conclusions, and future work. We close with a bonus segment called "What did Reviewer 2 say?", where the authors share the candid peer-review story behind the publication. Hosted by Itzik Ben-Shabat. Guests are PhD students, postdocs, and faculty from leading labs across academia and industry. Aimed at fellow researchers and graduate students who want the candid version of the work, not a polished press release.

Listen to episodes

36 recent

February 17, 2025 - Episode 1 - 1 hr 37 min

The PhD Advisor Hunt - A Student's Perspective w Derek Liu

Choosing the right PhD advisor is a game-changer for your academic journey. In this episode, Derek and Itzik break down everything you need to know when selecting the perfect advisor, from assessing their mentoring style to understanding lab culture and research alignment. 🧠✨

Key Takeaways:
- How to evaluate an advisor's support, availability, and career guidance 🤝
- The importance of finding the right fit for your research interests 🔍
- Red flags to avoid when making your decision 🚩

We share tips on talking to current and past students, how to spot potential issues, and how to make sure your advisor helps you thrive. Whether you're just starting your PhD or planning a change, this checklist will help you make an informed decision! ✅

Timestamps:
00:00 - Our personal academic journey ✨
04:23 - Why is Choosing the Right Advisor SO Important? 💡
11:06 - Pre vs Post Tenure Advisor 🔄
20:42 - Does (group) Size Matter? 👥
26:24 - The Lab 🔬
30:29 - Infrastructure and Resources 🏗️
33:38 - Collaborations 🤝
37:48 - Money Money Money 💰
49:33 - References 📚
1:02:39 - The Research Topic 🔬
1:10:40 - Success Perspective 🚀
1:18:01 - How to find and impress a supervisor? 💼
1:27:17 - The Ultimate Checklist ✅
1:30:47 - Red Flag List 🚩
1:34:17 - Closing Remarks 🎤

Listen to the full podcast! Follow us on social media: Twitter 🐦, BlueSky 🌐. You can also check out the full blog post. Don't miss the episode on YouTube!

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP

July 11, 2024 - Episode 35 - 57 min

3D Paintbrush - Dale Decatur

🎙️ Welcome to the latest episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Dale Decatur, a third-year PhD student at the University of Chicago's 3DL lab, where he studies computer graphics, 3D computer vision, and deep learning.

📄 In this episode, we delved into Dale's paper, "3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation," which was recently published at CVPR 2024. The paper introduces a novel technique, 3D Paintbrush, that enables automatic texturing of local semantic regions on meshes through text descriptions. By creating texture maps that integrate directly into standard graphics pipelines, Dale's method not only streamlines the texturing process but also improves the quality of localization and stylization.

🌟 The Cascaded Score Distillation (CSD) technique developed in this paper leverages multiple stages of a cascaded diffusion model to supervise local editing with generative priors learned from images at varying resolutions. This approach gives users control over both granularity and global understanding of the editing process, opening up new possibilities for simplifying the editing of 3D assets.

💡 My insights: This paper marks a significant advance in democratizing 3D asset editing through text prompts, a trend gaining traction in the research community. The meticulous approach taken by Dale and his collaborators sets a new standard for local editing and paves the way for more accessible content creation in the 3D space.

🔍 Dale's journey to developing 3D Paintbrush is truly inspiring. Our paths first crossed at CVPR 2023, when he presented his 3D Highlighter paper. Although I did not feature it on the podcast back then, our mutual acquaintance, Itai Lang, reintroduced us at CVPR 2024 and showcased Dale's remarkable progress with 3D Paintbrush. It was evident that having Dale join us on the podcast was a must, and I'm thrilled to share his insights with our audience.

🔗 Don't miss out on this enlightening discussion about the future of 3D asset editing! Subscribe to the Talking Papers Podcast for more captivating conversations with emerging academics and PhD students. Let me know your thoughts in the comments below! Thanks for tuning in and stay tuned for more groundbreaking research discussions! 🚀

All links and resources are available in the blogpost: https://www.itzikbs.com/3dpaintbrush

June 3, 2024 - Episode 34 - 30 min

3DInAction - Yizhak Ben-Shabat

🎙️ **Unveiling 3DInAction with Yizhak Ben-Shabat | Talking Papers Podcast** 🎙️

📚 *Title:* 3DInAction: Understanding Human Actions in 3D Point Clouds
📅 *Published In:* CVPR 2024
👤 *Guest:* Yizhak (Itzik) Ben-Shabat

Welcome back to another exciting episode of the Talking Papers Podcast, where we bring you the latest breakthroughs in academic research directly from early career academics and PhD students! This week, we have the pleasure of hosting Itzik Ben-Shabat to discuss his groundbreaking paper *3DInAction: Understanding Human Actions in 3D Point Clouds*, published in CVPR 2024 as a highlight.

In this episode, we delve into a novel method for 3D point cloud action recognition. Itzik explains how this innovative pipeline addresses the major limitations of point cloud data, such as lack of structure, permutation invariance, and varying number of points. With patches moving in time (t-patches) and a hierarchical architecture, 3DInAction significantly enhances spatio-temporal representation learning, achieving superior performance on datasets like DFAUST and IKEA ASM.

**Main Contributions:**
1. Introduction of the 3DInAction pipeline for 3D point cloud action recognition.
2. Detailed explanation of t-patches as a key building block.
3. Presentation of a hierarchical architecture for improved spatio-temporal representations.
4. Demonstration of enhanced performance on existing benchmarks.

**Host Insights:** Given my involvement in the project, I can share that when I embarked on this journey, there were only a handful of studies tackling the intricate task of 3D action recognition from point cloud data. Today, this has burgeoned into an active and evolving field of research, showing just how pivotal and timely this work is.

**Anecdotes and Behind the Scenes:** The title "3DInAction" signifies the culmination of three years of passionate research coinciding with my fellowship's theme. This episode is unique as it's hosted by an AI avatar created by Synthesia. Itzik was looking for an exciting way to share this story using the latest technology. While there is no sponsorship, the use of AI avatars adds an innovative twist to our discussion.

Don't miss this intellectually stimulating conversation with Itzik Ben-Shabat. Be sure to leave your thoughts and questions in the comments section below. We'd love to hear from you! And if you haven't already, hit that subscribe button to stay updated with our latest episodes.

🔗 **Links and References:**
- Watch the full episode: [Podcast Link]
- Read the full paper: [Paper Link]

📢 **Engage with Us:**
- What are your thoughts on 3D point cloud action recognition? Drop a comment below!
- Don't forget to like, subscribe, and hit the notification bell for more insightful episodes!

Join us in pushing the boundaries of what's possible in research and technology!

---

Ready to be part of this journey? Click play and let's dive deep into the world of 3D action recognition! 🚀

All links and resources are available in the blogpost: https://www.itzikbs.com/3dinaction

Note that the host of this episode is not a real person. It is an AI generated avatar and everything she said in the episode was fully scripted.

March 13, 2024 - Episode 33 - 42 min

Cameras as Rays - Jason Y. Zhang

Talking Papers Podcast Episode: "Cameras as Rays: Pose Estimation via Ray Diffusion" with Jason Zhang

Welcome to the latest episode of the Talking Papers Podcast! This week's guest is Jason Zhang, a PhD student at the Robotics Institute at Carnegie Mellon University, who joined us to discuss his paper, "Cameras as Rays: Pose Estimation via Ray Diffusion". The paper was published at ICLR 2024.

Jason's research homes in on the pivotal task of estimating camera poses for 3D reconstruction, a challenge that becomes harder with sparse views. His paper proposes an inventive representation that treats a camera pose as a bundle of rays. This perspective has a substantial impact on the problem, demonstrating promising results even with sparse views.

What's particularly exciting is that both variants of his method, regression-based and diffusion-based, achieve top-notch camera pose estimation performance on CO3D and generalize to unseen object categories as well as in-the-wild captures. Throughout our conversation, Jason explained his approach and how the denoising diffusion model and set-level transformers come into play to yield these results. I found his technique a breath of fresh air in the field of camera pose estimation, notably in the formulation of both the regression and diffusion models.

On a more personal note, Jason and I didn't know each other before this podcast, so it was fantastic learning about his journey from the Bay Area to Pittsburgh. His experiences truly enriched our discussion and made for one of our most memorable episodes yet.

We hope you find this podcast as enlightening as we did creating it. If you enjoyed our chat, don't forget to subscribe for more thought-provoking discussions with early career academics and PhD students. Leave a comment below sharing your thoughts on Jason's paper! Until next time, keep following your curiosity and questioning the status quo.

#TalkingPapersPodcast #ICLR2024 #CameraPoseEstimation #3DReconstruction #RayDiffusion #PhDResearchers #AcademicResearch #CarnegieMellonUniversity #BayArea #Pittsburgh

All links and resources are available in the blogpost: https://www.itzikbs.com/cameras-as-rays
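To make the "camera as a bundle of rays" idea from the Cameras as Rays episode concrete, here is a minimal sketch that converts a single pinhole camera into one ray per pixel. Several things here are assumptions for illustration and are not taken from the episode notes: the function name `camera_to_rays`, the convention `x_cam = R @ x_world + t`, and the choice to store each ray as a unit direction plus a moment (Plücker-style coordinates). It shows only the representation, not the regression or diffusion models discussed in the episode.

```python
import numpy as np


def camera_to_rays(K, R, t, pixels):
    """Convert one pinhole camera into a bundle of rays, one per pixel.

    Assumed convention (illustrative): x_cam = R @ x_world + t.
    Each ray is returned as 6 numbers: a unit direction d followed by
    the moment m = c x d, where c is the camera center in world space.
    """
    center = -R.T @ t                        # camera center in the world frame
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])        # (N, 3) homogeneous pixel coordinates
    dirs_cam = homog @ np.linalg.inv(K).T    # back-project pixels into the camera frame
    dirs_world = dirs_cam @ R                # row-wise equivalent of R.T @ d
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    moments = np.cross(center, dirs_world)   # every ray passes through the same center
    return np.hstack([dirs_world, moments])


# Hypothetical camera: focal length 100 px, principal point (50, 50),
# no rotation, translated so that its center sits at (0, 0, -2).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])
pixels = np.array([[50.0, 50.0], [150.0, 50.0]])

rays = camera_to_rays(K, R, t, pixels)
print(rays.shape)  # (2, 6)
```

Because every ray of one camera shares the same center, the pose can in principle be recovered from the bundle, which is what makes a per-ray representation a plausible prediction target.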

February 16, 2024 - Episode 32 - 52 min

Instant3D - Jiahao Li

Welcome to another exciting episode of the Talking Papers Podcast! In this episode, I had the pleasure of hosting Jiahao Li, a talented PhD student at Toyota Technological Institute at Chicago (TTIC), who discussed his research paper titled "Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model". This paper, published at ICLR 2024, introduces a novel method for text-to-3D generation.

Instant3D addresses the limitations of existing methods with a two-stage approach. First, a fine-tuned 2D text-to-image diffusion model generates a set of four structured and consistent views from the given text prompt. Then, a transformer-based sparse-view reconstructor directly regresses the NeRF from the generated images. The results are stunning: high-quality and diverse 3D assets are produced within a mere 20 seconds, a hundred times faster than previous optimization-based methods.

As a 3D enthusiast myself, I found the outcomes of Instant3D truly captivating, especially considering how little time it takes to generate them. While it's unusual for a 3D person like me to experience these creations through a 2D projection, the results make it impossible to ignore the potential of this approach. This paper underscores the importance of obtaining more and better 3D data, paving the way for exciting advancements in the field.

Let me share a little anecdote about our guest, Jiahao Li. We were initially introduced through Yicong Hong, another brilliant guest on our podcast. Yicong, who was a PhD student at ANU during my postdoc, and Jiahao interned together at Adobe while working on this very paper. Yicong is also a coauthor of Instant3D. It's incredible to see such brilliant minds coming together on groundbreaking research projects.

Unfortunately, the model developed in this paper is not publicly available. However, given the computational resources required to train these advanced models and the obvious copyright issues, it's understandable that Adobe has chosen to keep it proprietary. Not all of us have a hundred GPUs lying around, right?

Remember to hit that subscribe button and join the conversation in the comments section. Let's delve into the exciting world of Instant3D with Jiahao Li on this episode of Talking Papers Podcast!

#TalkingPapersPodcast #ICLR2024 #Instant3D #TextTo3D #ResearchPapers #PhDStudents #AcademicResearch

All links and resources are available in the blogpost: https://www.itzikbs.com/instant3d

December 14, 2023 - Episode 31 - 41 min

Variational Barycentric Coordinates - Ana Dodik

In this exciting episode of #TalkingPapersPodcast, we have the pleasure of hosting Ana Dodik, a second-year PhD student at MIT. We delve into her research paper titled "Variational Barycentric Coordinates," published at SIGGRAPH Asia 2023, which advances the optimization of generalized barycentric coordinates.

The paper introduces a robust variational technique that offers more control than existing methods. Traditional approaches are restrictive because they represent barycentric coordinates using meshes or closed-form formulae. Dodik's research removes these limits by using a neural field to directly parameterize the continuous function that maps any point in a polytope's interior to its barycentric coordinates. A theoretical characterization of barycentric coordinates is the backbone of this innovation. The research demonstrates the versatility of the model by deploying a variety of objective functions, and it also suggests a practical acceleration strategy.

My take: this tool can be very useful for artists, and I look forward to hearing their feedback on its performance. By melding classical geometry processing methods with newer Neural-X methods, this research stands as a testament to the significant advances in today's technology landscape.

My talk with Ana was delightfully enriching. In a unique online setting, we discussed how the current times are the perfect opportunity to pursue a PhD, thanks to improvements in technology.

Remember to hit the subscribe button and leave a comment about your thoughts on Ana's research. We'd love to hear your insights and engage in discussions to further this fascinating discourse in academia.

All links and resources are available in the blogpost: https://www.itzikbs.com/variational-barycentric-coordinates
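For readers new to the topic of the Variational Barycentric Coordinates episode, the sketch below computes ordinary barycentric coordinates for a triangle and checks the properties that generalized coordinates must also satisfy: the weights are non-negative inside the shape, they sum to one, and they reproduce the query point. This is the classical triangle case solved as a small linear system, not the neural-field method from the paper; the function name `triangle_barycentric` and the example values are hypothetical.

```python
import numpy as np


def triangle_barycentric(vertices, point):
    """Barycentric coordinates of a 2D point with respect to a triangle.

    Solves for weights w such that sum_i w_i * v_i = point and sum_i w_i = 1.
    """
    A = np.vstack([vertices.T, np.ones(3)])  # two reproduction rows plus the sum-to-one row
    b = np.append(point, 1.0)
    return np.linalg.solve(A, b)


vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
point = np.array([0.25, 0.25])

weights = triangle_barycentric(vertices, point)
print(weights)             # one weight per triangle vertex
print(weights @ vertices)  # reproduces the query point
```

For a triangle these weights are unique. For a polygon or polytope with more vertices, many weight functions satisfy the same constraints, and that freedom is what an optimization-based approach can exploit.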

November 22, 2023 - Episode 30 - 1 hr 8 min

Reverse Engineering SSL - Ravid Shwartz-Ziv

Welcome to another exciting episode of the Talking Papers Podcast! In this episode, we delve into the fascinating world of self-supervised learning with our special guest, Ravid Shwartz-Ziv. Together, we explore and dissect their research paper titled "Reverse Engineering Self-Supervised Learning," published in NeurIPS 2023.

Self-supervised learning (SSL) has emerged as a game-changing technique in the field of machine learning. However, understanding the learned representations and their underlying mechanisms has remained a challenge - until now. Ravid Shwartz-Ziv's paper provides an in-depth empirical analysis of SSL-trained representations, encompassing various models, architectures, and hyperparameters.

The study uncovers a captivating aspect of the SSL training process - its inherent ability to facilitate the clustering of samples based on semantic labels. Surprisingly, this clustering is driven by the regularization term in the SSL objective. Not only does this process enhance downstream classification performance, but it also exhibits a remarkable power of data compression. The paper further establishes that SSL-trained representations align more closely with semantic classes than random classes, even across different hierarchical levels. What's more, this alignment strengthens during training and as we venture deeper into the network.

Join us as we discuss the insights gained from this exceptional research. One remarkable aspect of the paper is its departure from the trend of focusing solely on outperforming competitors. Instead, it dives deep into understanding the semantic clustering effect of SSL techniques, shedding light on the underlying capabilities of the tools we commonly use. It is truly a genre of research that holds immense value.

During our conversation, Ravid Shwartz-Ziv - a CDS Faculty Fellow at NYU Center for Data Science - shares their perspectives and insights, providing an enriching layer to our exploration. Interestingly, despite both of us being in Israel at the time of recording, we had never met in person, highlighting the interconnectedness and collaborative nature of the academic world.

Don't miss this thought-provoking episode that promises to expand your understanding of self-supervised learning and its impact on representation learning mechanisms. Subscribe to our channel now, join the discussion, and let us know your thoughts in the comments below!

All links and resources are available in the blogpost: https://www.itzikbs.com/revenge_ssl

November 9, 2023 - Episode 29 - 59 min

CSG on Neural SDFs - Zoë Marschner

Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting the brilliant Zoë Marschner as we delved into the fascinating world of Constructive Solid Geometry on Neural Signed Distance Fields. This exceptional research paper, published in SIGGRAPH Asia 2023, explores the cutting-edge potential of neural networks in shaping geometric representations.

In our conversation, Zoë enlightened us on the challenges surrounding the editing of shapes encoded by neural Signed Distance Fields (SDFs). While common geometric operators seem like a promising solution, they often result in incorrect outputs known as Pseudo-SDFs, rendering them unusable for downstream tasks. However, fear not! Zoë and her team have galvanized this field with groundbreaking insights.

They characterize the space of Pseudo-SDFs and proffer a novel regularizer called the closest point loss. This ingenious technique encourages the output to be an exact SDF, ensuring accurate shape representation. Their findings have profound implications for operations like CSG (Constructive Solid Geometry) and swept volumes, revolutionizing their applications in fields such as computer-aided design (CAD).

As a former mechanical engineer, I find the concept of combining CSGs with Neural Signed Distance fields to be immensely empowering. The potential for creating intricate and precise designs is mind-boggling!

On a personal note, I couldn't be more thrilled about this episode. Not only were two of the co-authors, Derek and Silvia, previous guests on the podcast, but I also had the pleasure of virtually meeting Zoë for the first time. Recording this episode with her was an absolute blast, and I must say, her enthusiasm and expertise shine through, despite being in the early stages of her career. It's worth mentioning that she has even collaborated with some of the most senior figures in the field!

Join us on this captivating journey into the world of Neural Signed Distance Fields. Don't forget to subscribe and leave your thoughts in the comments section below. We would love to hear your take on this groundbreaking research!

All links and resources are available in the blogpost: https://www.itzikbs.com/CSG_on_NSDF

#TalkingPapersPodcast #SIGGRAPHAsia2023 #SDFs #CSG #shapeediting #neuralnetworks #CAD #research
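The Pseudo-SDF problem mentioned in the CSG on Neural SDFs episode can be seen in a tiny analytic example, with no neural network involved. The sketch below takes the union of two overlapping unit circles with the common pointwise-minimum operator and compares the result at one interior point against a brute-force distance to the true boundary of the union. The setup (circle centers, query point, sampling density) is hypothetical and chosen only for illustration; it demonstrates the problem and does not implement the paper's closest point loss.

```python
import numpy as np


def circle_sdf(p, center, radius=1.0):
    """Exact signed distance to a circle: negative inside, positive outside."""
    return np.linalg.norm(p - center, axis=-1) - radius


c1 = np.array([-0.5, 0.0])
c2 = np.array([0.5, 0.0])
query = np.array([0.0, 0.0])  # lies inside both circles

# CSG union via the pointwise minimum of the two exact SDFs.
pseudo = min(circle_sdf(query, c1), circle_sdf(query, c2))

# Brute-force reference: sample both circles and keep only the points that are
# not inside the other circle, i.e. points on the boundary of the union.
angles = np.linspace(0.0, 2.0 * np.pi, 200_000, endpoint=False)
unit = np.stack([np.cos(angles), np.sin(angles)], axis=1)
boundary = []
for own, other in ((c1, c2), (c2, c1)):
    pts = own + unit
    boundary.append(pts[circle_sdf(pts, other) >= 0.0])
boundary = np.concatenate(boundary)

# The query point is inside the union, so the exact signed distance is negative.
exact = -np.min(np.linalg.norm(boundary - query, axis=1))

print(pseudo, exact)
```

The minimum still has the correct zero level set and sign, so the shape itself is right, but its value at the query point underestimates the depth inside the union. That mismatch in distance values is the kind of error that distance-dependent downstream tasks are sensitive to.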

September 29, 2023 - Episode 28 - 35 min

HMD-NeMo - Sadegh Aliakbarian

🎙️ Join us on this exciting episode of the Talking Papers Podcast as we sit down with the talented Sadegh Aliakbarian to explore his ICCV 2023 paper, "HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations". Our guest will take us on a journey through this pivotal research, which addresses a crucial aspect of immersive mixed reality experiences.

🌟 The quality of these experiences hinges on generating plausible and precise full-body avatar motion, a challenge given the limited input signals provided by Head-Mounted Devices (HMDs), typically head and hands 6-DoF. While recent approaches have made strides in generating full-body motion from such inputs, they assume full hand visibility. This assumption does not hold in scenarios without motion controllers, which rely instead on egocentric hand tracking and can therefore see the hands only partially because of the HMD's limited field of view.

🧠 "HMD-NeMo" offers a unified approach to generating realistic full-body motion even when hands are only partially visible. This lightweight neural network operates in real time and incorporates a spatio-temporal encoder with adaptable mask tokens, ensuring plausible motion in the absence of complete hand observations.

👤 Sadegh is currently a senior research scientist at Microsoft Mixed Reality and AI Lab-Cambridge (UK), where he's at the forefront of Microsoft Mesh and avatar motion generation. He holds a PhD from the Australian National University, where he specialized in generative modeling of human motion. His research journey includes internships at Amazon AI, Five AI, and Qualcomm AI Research, focusing on generative models, representation learning, and adversarial examples.

🤝 We first crossed paths during our time at the Australian Centre for Robotic Vision (ACRV), where Sadegh was pursuing his PhD and I was embarking on my postdoctoral journey. During this time, I had the privilege of collaborating with another co-author of the paper, Fatemeh Saleh, who also happens to be Sadegh's life partner. It's been incredible to witness their continued growth.

🚀 Join us as we uncover the critical advancements brought by "HMD-NeMo" and their implications for the future of mixed reality experiences. Stay tuned for the episode release!

All links and resources are available in the blogpost: https://www.itzikbs.com/hmdnemo

September 28, 2023 - Episode 27 - 56 min

CC3D - Jeong Joon Park

Join us on this exciting episode of the Talking Papers Podcast as we sit down with the brilliant Jeong Joon Park to explore his paper, "CC3D: Layout-Conditioned Generation of Compositional 3D Scenes," just published at ICCV 2023.

Discover CC3D, a conditional generative model for 3D scene synthesis. Unlike traditional 3D GANs, CC3D generates complex scenes with multiple objects, guided by 2D semantic layouts. With a novel 3D field representation, CC3D delivers efficiency and superior scene quality. Get ready for a deep dive into the future of 3D scene generation.

My journey with Jeong Joon Park began with his influential SDF paper at CVPR 2019. We met in person at CVPR 2022, thanks to Despoina, a co-author of this paper who was also a guest on our podcast. Now, as Assistant Professor at the University of Michigan CSE, JJ leads research in realistic 3D content generation, offering opportunities for students to contribute to the frontiers of computer vision and AI.

Don't miss this insightful exploration of this ICCV 2023 paper and the future of 3D scene synthesis.

CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

Authors
Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

Abstract
In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach for 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we have created a 3D GAN that is both efficient and of high quality, while allowing for a more controllable generation process. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality in comparison to previous works.

All links and resources are available on the blog post: https://www.itzikbs.com/cc3d

Subscribe and stay tuned! 🚀🔍