Ars Technica AI | May 28, 2026, 9:29 PM | Kyle Orland

LLMs believe false statements even after explicit warnings that they're false

Fine-tuning on negated false claims can still make LLMs learn those claims as true.

A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.

Summary compiled by AI; the original article is authoritative.