Latest in AI

Showing: guardrails

Anthropic Apologizes for Hidden Claude Fable Guardrails
The Verge AI · 47 days ago · Incident
Anthropic apologized for launching Claude Fable 5 with hidden safeguards that silently altered or degraded answers when the system suspected model-distillation attempts. The company now says those queries will visibly fall back to Claude Opus 4.8, matching how Fable handles other high-risk areas. The reversal follows backlash from AI researchers who warned that invisible restrictions could undermine evaluation, research, and competing model development.
Security Researchers Criticize Anthropic Fable Safeguards as Too Strict
Hacker News (AI keywords) · 47 days ago · Ethics
Anthropic released Fable as a public but limited version of its cybersecurity-focused Mythos model. Security researchers say its guardrails trigger on broad cyber-related wording, blocking tasks like blog analysis, secure coding, and code review. The restrictions aim to reduce malware, software compromise, and biology-related misuse, but the current implementation may frustrate legitimate security work.
Cybersecurity Researchers Criticize Anthropic's Fable for Overly Strict Guardrails
TechCrunch AI · 47 days ago · Incident
Anthropic's latest model Fable is drawing complaints from the cybersecurity research community over guardrails deemed excessively restrictive. Researchers say the model's content filters block even legitimate security tasks, hampering professional workflows. The incident highlights a persistent tension between AI safety measures and the practical needs of security professionals who must engage with offensive techniques defensively.
Anthropic Releases Claude Fable 5, Its First Public Mythos-Class Model, With Guardrails for High-Risk Domains ★ 76
TechCrunch AI · 48 days ago · Release
Anthropic has released Claude Fable 5, marking the first time a model from its high-capability Mythos family is available to the general public. The model includes built-in guardrails that restrict responses in high-risk domains such as cybersecurity and biology to mitigate misuse potential. The launch comes just days after Anthropic publicly warned that AI technology is becoming increasingly and alarmingly dangerous.
ZeroDrift Raises $10 Million to Protect AI Models From Themselves
TechCrunch AI · 56 days ago · Business
ZeroDrift raised $10 million for an AI compliance service. The service sits between AI models and end users, checking messages before delivery. When an output might create a compliance problem, the system flags and replaces it, adding an intermediary control layer for AI applications.
ServiceNow AI Launches AprielGuard: A Guardrail Model for Improving Safety and Adversarial Robustness in Modern LLM Systems ★ 75
Hugging Face Blog · 217 days ago · Release
As large language models (LLMs) are widely deployed across enterprises and various applications, ensuring the safety of their outputs and defending against…
OpenAI's GPT-OSS-Safeguard-20B Safety Model Is Now Available in Vercel AI Gateway ★ 75
Vercel Changelog · 272 days ago · Release
Vercel announced in its Changelog that it is officially adding support for OpenAI's new safety guardrail model, GPT-OSS-Safeguard-20B, within the Vercel AI…
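Meta Launches CyberSecEval 2: A Comprehensive Framework for Evaluating the Cybersecurity Risks and Defensive Capabilities of Large Language Models ★ 75
Hugging Face Blog · 795 days ago · Release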
As large language models (LLMs) become increasingly prevalent in software development and automated workflows, their "dual-use" risks in the cybersecurity…
Introducing Chatbot Guardrails Arena: A New Arena for Evaluating the Safety Guardrails of Large Language Models ★ 75
Hugging Face Blog · 859 days ago · Release
As large language models (LLMs) have been widely adopted across industries, ensuring AI systems remain safe and compliant while preventing harmful outputs has…