Ars Technica AI | May 28, 2026, 9:29 PM | Kyle Orland | Importance: 74
LLMs believe false statements even after explicit warnings that they're false
Fine-tuning on negated false claims can still make LLMs learn those claims as true.
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
Summary compiled by AI; the original Ars Technica AI article is authoritative.