Latest in AI

Is the US Government's Anthropic Ban Accidentally Helping the Brand? ★ 72
TechCrunch AI · 39 days ago · Regulation
The US government ordered Anthropic to withdraw Fable 5 and Mythos 5, citing national security concerns after Amazon researchers reportedly found a method to bypass Fable 5's safety guardrails. Cybersecurity researchers fired back with an open letter calling the move dangerous, while Anthropic noted the same jailbreaks exist across other AI models. The controversy raises the question of whether the government's intervention is inadvertently amplifying Anthropic's public profile.
The US Banned Anthropic's Fable 5 Release, but the Numbers Don't Seem to Care ★ 75
TechCrunch AI · 39 days ago · Regulation
The US government compelled Anthropic to withdraw its two newest models, Fable 5 and Mythos 5, citing national security concerns after Amazon researchers allegedly found a way to bypass Fable 5's safety guardrails. Cybersecurity researchers responded with an open letter calling the ban itself dangerous, while Anthropic argued the same jailbreak techniques exist in competing models. Despite the forced pullback, usage numbers appear to show continued demand for the models.
From Jailbreaking to Vibe Hacking: AI Security Shifts to "Psychocybersecurity"
INSIDE 硬塞 AI · 64 days ago · Ethics
AI security is shifting from technical jailbreaks to "Vibe Hacking," where attackers use social engineering and psychological tactics to manipulate an LLM's simulated persona. By exploiting the model's behavioral tendencies rather than code vulnerabilities, this trend establishes "psychocybersecurity" as a critical new frontier for AI alignment and safety.
Hackers are learning to exploit chatbot ‘personalities’ for security exploits ★ 72
The Verge AI · 65 days ago · Ethics
As AI chatbots adopt increasingly sophisticated personas, hackers are shifting from basic prompt injections to social engineering attacks targeting these "personalities." Researchers warn that manipulating a chatbot's defined role (e.g., customer service or empathetic companion) makes it easier to bypass safety guardrails. This evolution poses a significant threat to agentic AI workflows that rely on consistent role-playing and external data integration.
Introducing the Chatbot Guardrails Arena: A New Arena for Evaluating the Safety Guardrails of Large Language Models ★ 75
Hugging Face Blog · 859 days ago · Release
As large language models (LLMs) have been widely adopted across industries, ensuring AI systems remain safe and compliant while preventing harmful outputs has…
Hugging Face Launches the Red-Teaming Resistance Leaderboard: Evaluating How Well LLMs Withstand Malicious Jailbreaks and Adversarial Attacks ★ 75
Hugging Face Blog · 886 days ago · Release
Background: The Shortcomings of Static Safety Evaluations. As large language models (LLMs) are widely adopted across industries, AI safety has become an…