TechcraftingAI NLP

Hosted by Brad Edwards

Technology

Episodes: 271

Latest episode: Jun 2024

Language: EN-US

About the show

TechcraftingAI NLP brings you daily summaries of the latest arXiv Computation and Language research.

Listen to episodes

June 15, 2024 · 34 min

Ep. 263 - Part 2 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024.
00:20: Chain-of-Thought (CoT) prompting strategies for medical error detection and correction
01:31: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature
02:52: RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL
04:01: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
05:24: Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models
06:38: Investigating the translation capabilities of Large Language Models trained on parallel data only
07:56: LASER: Learning by Aligning Self-supervised Representations of Speech for Improving Content-related Tasks
09:09: DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation
11:20: Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
12:46: Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations
13:53: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't
14:47: ReadCtrl: Personalizing text generation with readability-controlled instruction learning
16:32: Self-Training for Sample-Efficient Active Learning for Text Classification with Pre-Trained Language Models
17:49: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
19:18: End-to-end Streaming model for Low-Latency Speech Anonymization
20:22: Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback
22:25: On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models
23:33: Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
24:35: Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
25:47: AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
27:15: Transformers meet Neural Algorithmic Reasoners
28:32: REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
30:02: Learning from Natural Language Explanations for Generalizable Entity Matching
31:14: ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
32:29: DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
33:43: Improving Autoregressive Training with Dynamic Oracles

June 15, 2024 · 37 min

Ep. 263 - Part 1 - June 13, 2024

ArXiv NLP research for Thursday, June 13, 2024.
00:20: Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
01:53: Mixture-of-Skills: Learning to Optimize Data Usage for Fine-Tuning Large Language Models
03:26: Automated Essay Scoring Using Grammatical Variety and Errors with Multi-Task Learning and Item Response Theory
04:33: Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
06:05: DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
07:26: Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning
08:41: ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions
10:07: An Approach to Build Zero-Shot Slot-Filling System for Industry-Grade Conversational Assistants
11:42: Plan, Generate and Complicate: Improving Low-resource Dialogue State Tracking via Easy-to-Difficult Zero-shot Data Augmentation
12:42: No perspective, no perception!! Perspective-aware Healthcare Answer Summarization
14:28: Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
16:02: An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios
17:21: Navigating the Shadows: Unveiling Effective Disturbances for Modern AI Content Detectors
18:48: Exploring Multilingual Unseen Speaker Emotion Recognition: Leveraging Co-Attention Cues in Multitask Learning
19:52: Word Order in English-Japanese Simultaneous Interpretation: Analyses and Evaluation using Chunk-wise Monotonic Translation
21:12: Multi-Agent Software Development through Cross-Team Collaboration
22:55: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models
24:14: Bayesian Statistical Modeling with Predictors from LLMs
25:39: ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
27:28: Language Models are Crossword Solvers
28:32: MiLoRA: Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning
29:51: CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts
31:29: Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?
32:59: 3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
34:08: Modeling Comparative Logical Relation with Contrastive Learning for Text Generation
35:42: SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models

June 13, 2024 · 54 min

Ep. 262 - June 12, 2024

ArXiv NLP research for Wednesday, June 12, 2024.
00:19: VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
02:05: BookSQL: A Large Scale Text-to-SQL Dataset for Accounting Domain
03:15: Designing a Dashboard for Transparency and Control of Conversational AI
04:46: Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection
05:51: Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
06:53: Exploring Self-Supervised Multi-view Contrastive Learning for Speech Emotion Recognition with Limited Annotations
07:52: Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation
08:55: DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning
10:20: Automated Information Extraction from Thyroid Operation Narrative: A Comparative Study of GPT-4 and Fine-tuned KoELECTRA
11:35: Large Language Model Unlearning via Embedding-Corrupted Prompts
13:17: Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation
14:46: Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
16:02: LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
17:18: Guiding In-Context Learning of LLMs through Quality Estimation for Machine Translation
18:37: It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
20:02: Adversarial Evasion Attack Efficiency against Large Language Models
21:06: Learning Job Title Representation from Job Description Aggregation Network
21:59: Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
23:35: AustroTox: A Dataset for Target-Based Austrian German Offensive Language Detection
24:38: Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation
25:56: Multimodal Table Understanding
27:20: CoXQL: A Dataset for Parsing Explanation Requests in Conversational XAI Systems
28:51: Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling
30:36: Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
31:57: Semi-Supervised Spoken Language Glossification
33:16: Underneath the Numbers: Quantitative and Qualitative Gender Fairness in LLMs for Depression Prediction
34:37: A Dialogue Game for Eliciting Balanced Collaboration
35:23: Transformer-based Model for ASR N-Best Rescoring and Rewriting
36:16: SumHiS: Extractive Summarization Exploiting Hidden Structure
36:53: Figuratively Speaking: Authorship Attribution via Multi-Task Figurative Language Modeling
38:08: Leveraging Large Language Models for Web Scraping
39:51: M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation
41:15: Is Programming by Example solved by LLMs?
42:29: Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
43:42: Towards Unsupervised Speech Recognition Without Pronunciation Models
44:50: cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers
45:57: Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
47:02: Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm
48:12: Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL
49:56: TasTe: Teaching Large Language Models to Translate through Self-Reflection
51:28: OLMES: A Standard for Language Model Evaluations
52:47: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

June 13, 2024 · 38 min

Ep. 261 - Part 2 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024.
00:20: Scientific Computing with Large Language Models
01:08: Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication
02:19: Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning
03:51: Fine-tuning with HED-IT: The impact of human post-editing for dialogical language models
05:26: Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
07:03: Joint Learning of Context and Feedback Embeddings in Spoken Dialogue
07:57: BertaQA: How Much Do Language Models Know About Local Culture?
09:17: MM-KWS: Multi-modal Prompts for Multilingual User-defined Keyword Spotting
10:20: CTC-based Non-autoregressive Textless Speech-to-Speech Translation
11:21: Toxic Memes: A Survey of Computational Perspectives on the Detection and Explanation of Meme Toxicities
13:27: GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
14:40: BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction
16:32: When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
18:01: Limited Out-of-Context Knowledge Reasoning in Large Language Models
19:36: MINERS: Multilingual Language Models as Semantic Retrievers
20:42: Learning Domain-Invariant Features for Out-of-Context News Detection
22:03: Textual Similarity as a Key Metric in Machine Translation Quality Estimation
23:02: On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations
24:31: Multimodal Belief Prediction
25:29: Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing
26:56: Paraphrasing in Affirmative Terms Improves Negation Understanding
27:37: CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization
29:38: TextGrad: Automatic "Differentiation" via Text
31:35: Just Because We Camp, Doesn't Mean We Should: The Ethics of Modelling Queer Voices
32:35: THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report
33:51: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
35:22: Simple and Effective Masked Diffusion Language Models
36:35: Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena

June 13, 2024 · 38 min

Ep. 261 - Part 1 - June 11, 2024

ArXiv NLP research for Tuesday, June 11, 2024.
00:20: A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation
01:41: Post-Hoc Answer Attribution for Grounded and Trustworthy Long Document Comprehension: Task, Insights, and Challenges
02:32: A Probabilistic Framework for LLM Hallucination Detection via Belief Tree Propagation
04:08: Evolving Subnetwork Training for Large Language Models
05:31: Missingness-resilient Video-enhanced Multimodal Disfluency Detection
06:37: Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models
08:14: Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference
09:33: Delving into ChatGPT usage in academic writing through excess vocabulary
10:53: Paying More Attention to Source Context: Mitigating Unfaithful Translations from Large Language Model
12:12: CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation
13:26: Effectively Compress KV Heads for LLM
15:00: Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
16:54: Reading Miscue Detection in Primary School through Automatic Speech Recognition
18:09: HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation
20:01: DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs
21:15: Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning
22:35: Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
24:42: Translating speech with just images
25:35: Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement
26:51: Teaching Language Models to Self-Improve by Learning from Language Feedback
28:25: Merging Improves Self-Critique Against Jailbreak Attacks
29:18: Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models
30:11: Improving Autoformalization using Type Checking
31:37: Improving Commonsense Bias Classification by Mitigating the Influence of Demographic Terms
33:19: Decipherment-Aware Multilingual Learning in Jointly Trained Language Models
34:20: DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms
35:20: On the Hallucination in Simultaneous Machine Translation
36:07: MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
37:42: Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

June 11, 2024 · 49 min

Ep. 260 - June 10, 2024

ArXiv NLP research for Monday, June 10, 2024.
00:19: Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research
00:59: HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs
02:29: The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models
03:24: MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
04:51: A Multidimensional Framework for Evaluating Lexical Semantic Change with Social Science Applications
05:49: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
07:10: Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval
09:08: Recurrent Context Compression: Efficiently Expanding the Context Window of LLM
10:35: Enhancing Long-Term Memory using Hierarchical Aggregate Tree for Retrieval Augmented Generation
11:26: Verifiable Generation with Subsentence-Level Fine-Grained Citations
12:36: Comparing Data Augmentation Methods for End-to-End Task-Oriented Dialog Systems
13:55: Building Bridges: A Dataset for Evaluating Gender-Fair Machine Translation into German
15:28: Can I understand what I create? Self-Knowledge Evaluation of Large Language Models
16:28: Language Models Resist Alignment
17:58: LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages
19:27: Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
20:27: Combining Embeddings and Domain Knowledge for Job Posting Duplicate Detection
21:37: MaskLID: Code-Switching Language Identification through Iterative Masking
22:49: Multi-Prompting Decoder Helps Better Language Understanding
24:22: Tx-LLM: A Large Language Model for Therapeutics
26:21: Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
27:43: A Parameter-efficient Language Extension Framework for Multilingual ASR
29:06: MedExQA: Medical Question Answering Benchmark with Multiple Explanations
30:36: Sustained Vowels for Pre- vs Post-Treatment COPD Classification
31:49: MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows
33:40: Symmetric Dot-Product Attention for Efficient Training of BERT Language Models
35:00: Annotation alignment: Comparing LLM and human annotations of conversational safety
36:07: mHuBERT-147: A Compact Multilingual HuBERT Model
37:27: Should We Fine-Tune or RAG? Evaluating Different Techniques to Adapt LLMs for Dialogue
39:00: INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition
40:06: Meta Learning Text-to-Speech Synthesis in over 7000 Languages
40:59: Controlling Emotion in Text-to-Speech with Natural Language Prompts
41:55: Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
43:29: Multimodal Contextualized Semantic Parsing from Speech
44:25: Interpretability of Language Models via Task Spaces
45:45: Evaluating the Retrieval Component in LLM-Based Question Answering Systems
46:52: Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
48:08: Can Language Models Serve as Text-Based World Simulators?

June 11, 2024 · 37 min

Ep. 259 - June 9, 2024

ArXiv NLP research for Sunday, June 09, 2024.
00:19: How Alignment and Jailbreak Work: Explain LLM Safety through Intermediate Hidden States
01:40: DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation
03:25: Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses
05:08: MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations
06:17: SinkLoRA: Enhanced Efficiency and Chat Capabilities for Long-Context Large Language Models
08:11: Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
09:54: MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
11:20: QGEval: A Benchmark for Question Generation Evaluation
12:44: MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model
13:43: Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization
14:46: The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
16:30: RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation
18:14: Hidden Holes: topological aspects of language models
19:46: Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
20:40: Seventeenth-Century Spanish American Notary Records for Fine-Tuning Spanish Large Language Models
22:02: MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
23:12: II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models
25:17: Zero-Shot End-To-End Spoken Question Answering In Medical Domain
26:27: Are Large Language Models Actually Good at Text Style Transfer?
27:32: Feriji: A French-Zarma Parallel Corpus, Glossary & Translator
28:56: TTM-RE: Memory-Augmented Document-Level Relation Extraction
30:12: Why Don't Prompt-Based Fairness Metrics Correlate?
31:27: Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
33:12: Semisupervised Neural Proto-Language Reconstruction
34:12: Prompting Large Language Models with Audio for General-Purpose Speech Summarization
35:14: A Dual-View Approach to Classifying Radiology Reports by Co-Training
36:07: ThaiCoref: Thai Coreference Resolution Dataset

June 11, 2024 · 30 min

Ep. 258 - June 8, 2024

ArXiv NLP research for Saturday, June 08, 2024.
00:19: MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
01:44: Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets
02:30: Flexible and Adaptable Summarization via Expertise Separation
04:18: Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization
06:07: CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation
07:23: Venn Diagram Prompting : Accelerating Comprehension with Scaffolding Effect
08:45: VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
10:19: Planning Like Human: A Dual-process Framework for Dialogue Planning
11:48: Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
12:57: Recent advancements in computational morphology : A comprehensive survey
14:01: MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature
15:41: Design of reliable technology valuation model with calibrated machine learning of patent indicators
17:08: Fighting Against the Repetitive Training and Sample Dependency Problem in Few-shot Named Entity Recognition
18:59: Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
20:25: Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
21:47: ThatiAR: Subjectivity Detection in Arabic News Sentences
23:07: Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts
24:49: Creativity Has Left the Chat: The Price of Debiasing Language Models
25:57: CERET: Cost-Effective Extrinsic Refinement for Text Generation
27:05: GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?
28:07: Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
29:03: ATLAS: Improving Lay Summarisation with Attribute-based Control

June 10, 2024 · 52 min

Ep. 257 - June 7, 2024

ArXiv NLP research for Friday, June 07, 2024.
00:19: Key-Element-Informed sLLM Tuning for Document Summarization
01:22: Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models
02:42: Large Language Model-guided Document Selection
04:13: More Victories, Less Cooperation: Assessing Cicero's Diplomacy Play
05:24: DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization
06:43: MATTER: Memory-Augmented Transformer Using Heterogeneous Knowledge Sources
08:01: Mixture-of-Agents Enhances Large Language Model Capabilities
09:09: AICoderEval: Improving AI Domain Code Generation of Large Language Models
11:00: CRAG -- Comprehensive RAG Benchmark
13:04: CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
14:52: Think out Loud: Emotion Deducing Explanation in Dialogues
16:43: WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
18:46: SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals
19:58: BERTs are Generative In-Context Learners
20:43: Annotating FrameNet via Structure-Conditioned Language Generation
21:49: Revisiting Catastrophic Forgetting in Large Language Model Tuning
22:43: FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
24:33: Do Language Models Exhibit Human-like Structural Priming Effects?
25:27: Uncertainty Aware Learning for Language Model Alignment
26:50: The Russian Legislative Corpus
27:24: ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering
28:53: HateDebias: On the Diversity and Variability of Hate Speech Debiasing
30:29: A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques
32:00: Sexism Detection on a Data Diet
33:18: XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model
34:21: Through the Thicket: A Study of Number-Oriented LLMs derived from Random Forest Models
35:32: LLM-based speaker diarization correction: A generalizable approach
36:52: TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models
38:10: BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense
39:10: Quantifying Geospatial in the Common Crawl Corpus
40:14: MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter
41:47: Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences
43:19: Compositional Generalization with Grounded Language Models
44:26: Scenarios and Approaches for Situated Natural Language Explanations
46:04: Are Large Language Models More Empathetic than Humans?
47:38: SUMIE: A Synthetic Benchmark for Incremental Entity Summarization
48:52: Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
50:33: An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

June 7, 2024 · 38 min

Ep. 256 - Part 2 - June 6, 2024

ArXiv NLP research for Thursday, June 06, 2024.
00:20: The syntax-semantics interface in a child's path: A study of 3- to 11-year-olds' elicited production of Mandarin recursive relative clauses
02:17: Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models
03:39: Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster
04:36: Intention and Face in Dialog
05:48: Uncovering Limitations of Large Language Models in Information Seeking from Tables
07:15: Are We Done with MMLU?
08:41: Legal Judgment Reimagined: PredEx and the Rise of Intelligent AI Interpretation in Indian Courts
09:53: Do Language Models Understand Morality? Towards a Robust Detection of Moral Content
11:47: Every Answer Matters: Evaluating Commonsense with Probabilistic Measures
12:49: Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
14:26: Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness
15:35: Confabulation: The Surprising Value of Large Language Model Hallucinations
16:42: DICE: Detecting In-distribution Contamination in LLM's Fine-tuning Phase for Math Reasoning
18:25: Legal Documents Drafting with Fine-Tuned Pre-Trained Large Language Model
19:32: ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models
20:50: mCSQA: Multilingual Commonsense Reasoning Dataset with Unified Creation Strategy by Language Models and Humans
22:21: What Do Language Models Learn in Context? The Structured Task Hypothesis
23:38: Rethinking LLM and Linguistic Steganalysis: An Efficient Detection of Strongly Concealed Stego
24:58: BEADs: Bias Evaluation Across Domains
26:41: FairytaleQA Translated: Enabling Educational Question and Answer Generation in Less-Resourced Languages
28:03: Benchmark Data Contamination of Large Language Models: A Survey
29:02: Transformers need glasses! Information over-squashing in language tasks
30:26: Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
31:58: Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People
33:44: ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions
35:19: What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
36:41: PaCE: Parsimonious Concept Engineering for Large Language Models