Latest in AI


Notes from a Tired Egyptian Whose Job Is Explaining That Humans Built the Pyramids
Hacker News (AI keywords) · 42 days ago · Commentary
A McSweeney's humor essay adopts the weary first-person voice of an Egyptian professional whose entire career is devoted to correcting the persistent myth that aliens, not humans, constructed the pyramids. The piece surfaced on Hacker News under AI keywords, which suggests readers connected it to the role of large language models and AI chatbots in amplifying this and similar pseudoscientific claims. Read that way, it works as cultural commentary on how AI-generated content can entrench misinformation that human experts must then repeatedly refute.
KPMG Pulls AI Usage Report Due to Apparent Hallucinations
TechCrunch AI · 44 days ago · Incident
KPMG, one of the world's largest professional services firms, withdrew a published report on AI usage after it was found to contain apparent hallucinations — errors likely introduced by an AI system used in its preparation. The incident highlights a sharp irony: AI proving unreliable as a source of information about AI itself. It adds to a growing list of high-profile cases where AI-generated content has undermined the credibility of professional and institutional outputs.
Judge Learns Both Sides Used AI, Cancels Trial, Kicks Everyone Off the Case
Hacker News (AI keywords) · 48 days ago · Incident
In a rare legal incident, a judge found that attorneys on both sides of a case had used AI tools in their legal work. The judge responded by canceling the trial entirely and dismissing all lawyers involved. The case highlights growing judicial frustration with unchecked AI use in court filings and the serious professional consequences that can follow.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day · 50 days ago · Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
Claude’s new model is more ‘honest’ when it messes up
The Verge AI · 60 days ago · Release
Anthropic is releasing Claude Opus 4.8 and highlighting the model’s “honesty” as a key improvement. The company says it trains its models to avoid unsupported claims, addressing a broader issue where AI systems sometimes jump to conclusions. The update is positioned around reliability and uncertainty handling rather than a specific new tool or benchmark result.
AI Fabricated Quotes in a Book, but the Author Insists on Continuing to Write with AI Assistance
Ars Technica AI · 66 days ago · Opinion
In an era of rapidly growing AI-assisted writing, the collaboration between writers and AI is undergoing unprecedented tests. Author and documentary filmmaker…
Legal Faceplant: Don't Use AI to Sue People Who Complained on Facebook That You Were a "Bad Date"
Ars Technica AI · 70 days ago · Incident
This incident originated in the highly popular private Facebook group "Are We Dating the Same Guy?" (commonly abbreviated AWDTSG). Groups of this kind…
arXiv Introduces New Policy: Submitting AI-Generated Junk Papers or Hallucinated Content Will Bring a One-Year Submission Ban ★ 75
Ars Technica AI · 73 days ago · Incident
The well-known academic preprint platform arXiv has recently introduced strict new rules regarding AI-generated content. According to the latest policy…
Google DeepMind Launches the FACTS Benchmark Suite: Systematically Evaluating the Factuality of Large Language Models ★ 80
Google DeepMind Blog · 231 days ago · Release
As large language models (LLMs) are deployed across a wide range of industries, ensuring the "factuality" of model outputs and reducing "hallucination" has…
Hugging Face Launches the "Enterprise Scenarios Leaderboard": An LLM Evaluation Benchmark Designed for Real-World Applications ★ 75
Hugging Face Blog · 909 days ago · Release
Hugging Face has partnered with Patronus AI, a startup focused on LLM evaluation and defense, to officially launch the Enterprise Scenarios Leaderboard…
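Hugging Face Launches the "Hallucinations Leaderboard": An Open-Source Effort to Quantify the Hallucination Rates of Large Language Models ★ 75
Hugging Face Blog · 911 days ago · Release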
While large language models (LLMs) have demonstrated remarkable generative capabilities across many domains, "hallucination" — where a model confidently…