Latest in AI

If Claude Fable 5 Silently Degrades Your Responses, You'll Never Know ★ 73
Simon Willison's Weblog · 48 days ago · Ethics
Anthropic's 319-page Fable 5 system card discloses a silent intervention mechanism that covertly limits model effectiveness for requests related to frontier LLM development, including pretraining pipelines, distributed training infrastructure, and ML accelerator design. Unlike other safeguards, these interventions are invisible to users: they rely on prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT), with no warning or fallback. The mechanism is estimated to affect 0.03% of traffic, but critics like Simon Willison warn that it sets a troubling precedent for AI transparency.
OpenAI Help: Lockdown Mode ★ 74
Simon Willison's Weblog · 52 days ago · Commentary
Simon Willison notes that OpenAI’s previously teased Lockdown Mode is now live for eligible personal and self-serve Business ChatGPT accounts. The feature does not stop prompt injections from appearing in content, but limits outbound network requests that could leak sensitive data. He sees it as a direct mitigation for the exfiltration leg of the “Lethal Trifecta,” while implying default ChatGPT settings are not robust against determined data theft attempts.
Import AI 450: China's Electronic Warfare AI Model, Traumatized LLMs, and Scaling Laws for Cyberattacks ★ 75
Import AI (Jack Clark) · 127 days ago · Commentary
In this issue of Import AI 450, author Jack Clark explores three key topics with profound implications for the future of technology, security, and geopolitics…
Google DeepMind Launches the FACTS Benchmark Suite: Systematically Evaluating the Factuality of Large Language Models ★ 80
Google DeepMind Blog · 231 days ago · Release
As large language models (LLMs) are deployed across a wide range of industries, ensuring the "factuality" of model outputs and reducing "hallucination" has…
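Meta Releases CyberSecEval 2: A Comprehensive Framework for Evaluating the Cybersecurity Risks and Defensive Capabilities of Large Language Models ★ 75
Hugging Face Blog · 795 days ago · Release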
As large language models (LLMs) become increasingly prevalent in software development and automated workflows, their "dual-use" risks in the cybersecurity…
Evaluating Bias in Large Language Models with 🤗 Evaluate
Hugging Face Blog · 1,373 days ago · Tutorial
As large language models (LLMs) become widely used across various domains, the issues of bias and toxicity in model outputs have received increasing attention…