TechcraftingAI NLP

Hosted by Brad Edwards

Episodes

271

Latest episode

Jun 2024

Language

EN-US

About the show

TechcraftingAI NLP brings you daily summaries of the latest arXiv Computation and Language research.

Listen to episodes

60 recent
June 15, 2024 · 34 min

Ep. 263 - Part 2 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024. 00:20: Chain-of-Though (CoT) prompting strategies for medical error detection and correction 01:31: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature 02:52: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL 04:01: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs 05:24: Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models 06:38: Investigating the translation capabilities of Large Language Models trained on parallel data only 07:56: LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks 09:09: DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation 11:20: Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning 12:46: Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations 13:53: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't 14:47: ReadCtrl: Personalizing text generation with readability-controlled instruction learning 16:32: Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models 17:49: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs 19:18: End-to-end Streaming model for Low-Latency Speech Anonymization 20:22: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback 22:25: On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models 23:33: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models 24:35: Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech 25:47: AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models 27:15: Transformers meet 
Neural Algorithmic Reasoners 28:32: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space 30:02: Learning from Natural Language Explanations for Generalizable Entity Matching 31:14: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models 32:29: DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding 33:43: Improving Autoregressive Training with Dynamic Oracles

June 15, 2024 · 37 min

Ep. 263 - Part 1 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024. 00:20: Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning 01:53: Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models 03:26: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory 04:33: Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination 06:05: DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage 07:26: Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning 08:41: ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions 10:07: An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants 11:42: Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation 12:42: No perspective, no perception!! 
Perspective-aware Healthcare Answer Summarization 14:28: Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models 16:02: An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios 17:21: Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors 18:48: Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning 19:52: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation 21:12: Multi-Agent Software Development through Cross-Team Collaboration 22:55: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models 24:14: Bayesian Statistical Modeling with Predictors from LLMs 25:39: ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models 27:28: Language Models are Crossword Solvers 28:32: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning 29:51: CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts 31:29: Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning? 32:59: 3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection 34:08: Modeling Comparative Logical Relation with Contrastive Learning for Text Generation 35:42: SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

June 13, 2024 · 54 min

Ep. 262 - June 12, 2024

ArXiv NLP research for Wednesday, June 12, 2024. 00:19: VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment 02:05: BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain 03:15: Designing a Dashboard for Transparency and Control of Conversational AI 04:46: Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection 05:51: Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions 06:53: Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations 07:52: Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation 08:55: DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning 10:20: Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA 11:35: Large Language Model Unlearning via Embedding-Corrupted Prompts 13:17: Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation 14:46: Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling 16:02: LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning 17:18: Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation 18:37: It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF 20:02: Adversarial Evasion Attack Efficiency against Large Language Models 21:06: Learning Job Title Representation from Job Description Aggregation Network 21:59: Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey 23:35: AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection 24:38: Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation 25:56: Multimodal Table 
Understanding 27:20: CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems 28:51: Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling 30:36: Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets 31:57: Semi-Supervised Spoken Language Glossification 33:16: Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction 34:37: A Dialogue Game for Eliciting Balanced Collaboration 35:23: Transformer-based Model for ASR N-Best Rescoring and Rewriting 36:16: SumHiS: Extractive Summarization Exploiting Hidden Structure 36:53: Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling 38:08: Leveraging Large Language Models for Web Scraping 39:51: M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation 41:15: Is Programming by Example solved by LLMs? 42:29: Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques 43:42: Towards Unsupervised Speech Recognition Without Pronunciation Models 44:50: cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers 45:57: Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models 47:02: Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm 48:12: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL 49:56: TasTe: Teaching Large Language Models to Translate through Self-Reflection 51:28: OLMES: A Standard for Language Model Evaluations 52:47: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

June 13, 2024 · 38 min

Ep. 261 - Part 2 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024. 00:20: Scientific Computing with Large Language Models 01:08: Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication 02:19: Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning 03:51: Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models 05:26: Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? 07:03: Joint Learning of Context and Feedback Embeddings in Spoken Dialogue 07:57: BertaQA: How Much Do Language Models Know About Local Culture? 09:17: MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting 10:20: CTC-based Non-autoregressive Textless Speech-to-Speech Translation 11:21: Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities 13:27: GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews 14:40: BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction 16:32: When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models 18:01: Limited Out-of-Context Knowledge Reasoning in Large Language Models 19:36: MINERS: Multilingual Language Models as Semantic Retrievers 20:42: Learning Domain-Invariant Features for Out-of-Context News Detection 22:03: Textual Similarity as a Key Metric in Machine Translation Quality Estimation 23:02: On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations 24:31: Multimodal Belief Prediction 25:29: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing 26:56: Paraphrasing in Affirmative Terms Improves Negation Understanding 27:37: CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization 29:38: TextGrad: Automatic "Differentiation" via Text 
31:35: Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices 32:35: THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report 33:51: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling 35:22: Simple and Effective Masked Diffusion Language Models 36:35: Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

June 13, 2024 · 38 min

Ep. 261 - Part 1 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024. 00:20: A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation 01:41: Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges 02:32: A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation 04:08: Evolving Subnetwork Training for Large Language Models 05:31: Missingness-resilient Video-enhanced Multimodal Disfluency Detection 06:37: Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models 08:14: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference 09:33: Delving into ChatGPT usage in academic writing through excess vocabulary 10:53: Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model 12:12: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation 13:26: Effectively Compress KV Heads for LLM 15:00: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study 16:54: Reading Miscue Detection in Primary School through Automatic Speech Recognition 18:09: HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation 20:01: DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs 21:15: Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning 22:35: Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees 24:42: Translating speech with just images 25:35: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement 26:51: Teaching Language Models to Self-Improve by Learning from Language Feedback 28:25: Merging Improves Self-Critique Against Jailbreak 
Attacks 29:18: Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models 30:11: Improving Autoformalization using Type Checking 31:37: Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms 33:19: Decipherment-Aware Multilingual Learning in Jointly Trained Language Models 34:20: DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms 35:20: On the Hallucination in Simultaneous Machine Translation 36:07: MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs 37:42: Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

June 11, 2024 · 49 min

Ep. 260 - June 10, 2024

ArXiv NLP research for Monday, June 10, 2024. 00:19: Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research 00:59: HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs 02:29: The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models 03:24: MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models 04:51: A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications 05:49: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text 07:10: Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval 09:08: Recurrent Context Compression: Efficiently Expanding the Context Window of LLM 10:35: Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation 11:26: Verifiable Generation with Subsentence-Level Fine-Grained Citations 12:36: Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems 13:55: Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German 15:28: Can I understand what I create? 
Self-Knowledge Evaluation of Large Language Models 16:28: Language Models Resist Alignment 17:58: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages 19:27: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning 20:27: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection 21:37: MaskLID: Code-Switching Language Identification through Iterative Masking 22:49: Multi-Prompting Decoder Helps Better Language Understanding 24:22: Tx-LLM: A Large Language Model for Therapeutics 26:21: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching 27:43: A Parameter-efficient Language Extension Framework for Multilingual ASR 29:06: MedExQA: Medical Question Answering Benchmark with Multiple Explanations 30:36: Sustained Vowels for Pre- vs Post-Treatment COPD Classification 31:49: MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows 33:40: Symmetric Dot-Product Attention for Efficient Training of BERT Language Models 35:00: Annotation alignment: Comparing LLM and human annotations of conversational safety 36:07: mHuBERT-147: A Compact Multilingual HuBERT Model 37:27: Should We Fine-Tune or RAG? 
Evaluating Different Techniques to Adapt LLMs for Dialogue 39:00: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition 40:06: Meta Learning Text-to-Speech Synthesis in over 7000 Languages 40:59: Controlling Emotion in Text-to-Speech with Natural Language Prompts 41:55: Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain 43:29: Multimodal Contextualized Semantic Parsing from Speech 44:25: Interpretability of Language Models via Task Spaces 45:45: Evaluating the Retrieval Component in LLM-Based Question Answering Systems 46:52: Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies 48:08: Can Language Models Serve as Text-Based World Simulators?

June 11, 2024 · 37 min

Ep. 259 - June 9, 2024

ArXiv NLP research for Sunday, June 09, 2024. 00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States 01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation 03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses 05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations 06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models 08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions 09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation 11:20: QGEval: A Benchmark for Question Generation Evaluation 12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model 13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization 14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models 16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation 18:14: Hidden Holes: topological aspects of language models 19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper 20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models 22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering 23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models 25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain 26:27: Are Large Language Models Actually Good at Text Style Transfer? 
27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator 28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction 30:12: Why Don't Prompt-Based Fairness Metrics Correlate? 31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue 33:12: Semisupervised Neural Proto-Language Reconstruction 34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization 35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training 36:07: ThaiCoref: Thai Coreference Resolution Dataset

June 11, 2024 · 30 min

Ep. 258 - June 8, 2024

ArXiv NLP research for Saturday, June 08, 2024. 00:19: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention 01:44: Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets 02:30: Flexible and Adaptable Summarization via Expertise Separation 04:18: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization 06:07: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation 07:23: Venn Diagram Prompting : Accelerating Comprehension with Scaffolding Effect 08:45: VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers 10:19: Planning Like Human: A Dual-process Framework for Dialogue Planning 11:48: Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas 12:57: Recent advancements in computational morphology : A comprehensive survey 14:01: MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature 15:41: Design of reliable technology valuation model with calibrated machine learning of patent indicators 17:08: Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition 18:59: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation 20:25: Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities 21:47: ThatiAR: Subjectivity Detection in Arabic News Sentences 23:07: Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts 24:49: Creativity Has Left the Chat: The Price of Debiasing Language Models 25:57: CERET: Cost-Effective Extrinsic Refinement for Text Generation 27:05: GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge? 
28:07: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives 29:03: ATLAS: Improving Lay Summarisation with Attribute-based Control

June 10, 2024 · 52 min

Ep. 257 - June 7, 2024

ArXiv NLP research for Friday, June 07, 2024. 00:19: Key-Element-Informed sLLM Tuning for Document Summarization 01:22: Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models 02:42: Large Language Model-guided Document Selection 04:13: More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play 05:24: DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization 06:43: MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources 08:01: Mixture-of-Agents Enhances Large Language Model Capabilities 09:09: AICoderEval: Improving AI Domain Code Generation of Large Language Models 11:00: CRAG -- Comprehensive RAG Benchmark 13:04: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models 14:52: Think out Loud: Emotion Deducing Explanation in Dialogues 16:43: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild 18:46: SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals 19:58: BERTs are Generative In-Context Learners 20:43: Annotating FrameNet via Structure-Conditioned Language Generation 21:49: Revisiting Catastrophic Forgetting in Large Language Model Tuning 22:43: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models 24:33: Do Language Models Exhibit Human-like Structural Priming Effects? 
25:27: Uncertainty Aware Learning for Language Model Alignment 26:50: The Russian Legislative Corpus 27:24: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering 28:53: HateDebias: On the Diversity and Variability of Hate Speech Debiasing 30:29: A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques 32:00: Sexism Detection on a Data Diet 33:18: XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model 34:21: Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models 35:32: LLM-based speaker diarization correction: A generalizable approach 36:52: TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models 38:10: BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense 39:10: Quantifying Geospatial in the Common Crawl Corpus 40:14: MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter 41:47: Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences 43:19: Compositional Generalization with Grounded Language Models 44:26: Scenarios and Approaches for Situated Natural Language Explanations 46:04: Are Large Language Models More Empathetic than Humans? 47:38: SUMIE: A Synthetic Benchmark for Incremental Entity Summarization 48:52: Multi-Head RAG: Solving Multi-Aspect Problems with LLMs 50:33: An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

June 7, 2024 · 38 min

Ep. 256 - Part 2 - June 6, 2024

ArXiv NLP research for Thursday, June 06, 2024. 00:20: The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses 02:17: Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models 03:39: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster 04:36: Intention and Face in Dialog 05:48: Uncovering Limitations of Large Language Models in Information Seeking from Tables 07:15: Are We Done with MMLU? 08:41: Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts 09:53: Do Language Models Understand Morality? Towards a Robust Detection of Moral Content 11:47: Every Answer Matters: Evaluating Commonsense with Probabilistic Measures 12:49: Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness 14:26: Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness 15:35: Confabulation: The Surprising Value of Large Language Model Hallucinations 16:42: DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning 18:25: Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model 19:32: ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models 20:50: mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans 22:21: What Do Language Models Learn in Context? The Structured Task Hypothesis 23:38: Rethinking LLM and Linguistic Steganalysis: An Efficient Detection of Strongly Concealed Stego 24:58: BEADs: Bias Evaluation Across Domains 26:41: FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages 28:03: Benchmark Data Contamination of Large Language Models: A Survey 29:02: Transformers need glasses! 
Information over-squashing in language tasks 30:26: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models 31:58: Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People 33:44: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions 35:19: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages 36:41: PaCE: Parsimonious Concept Engineering for Large Language Models
