Latest in AI

Showing:agent-securityResearchersClear ×

Topic

Release New Tool Tutorial Business Paper Benchmark Opinion Regulation

For

General Developers Designers Product Founders Marketing Researchers Students

OpenLumara Creator Challenges Reddit to Hack Its Public Agent Instance
r/LocalLLaMA top day48 days agoIncident
The creator of OpenLumara posted a public challenge asking r/LocalLLaMA users to try breaking into a Discord-hosted instance of the local-model agent. They claimed common prompt-engineering attacks would not work because modules and sandboxes were heavily locked down. The post later listed several successful findings, including missing path traversal protection, an authorization-check bypass, and another undisclosed exploit pending a fix.
The Meta hack shows there’s more to AI security than Mythos★ 74
MIT Tech Review AI53 days agoIncident
Attackers reportedly used Meta’s AI customer support agent to hijack Instagram accounts by asking it to link accounts to attacker-controlled emails. MIT Technology Review frames the incident as a reminder that AI security is not only about powerful future systems like Mythos. The immediate risk is giving AI agents sensitive operational powers without strong authentication, permissions, review, and testing.
How we contain Claude across products★ 74
Hacker News (AI keywords)54 days agoCommentary
Anthropic describes containment as the core security strategy for increasingly capable Claude agents. The post compares ephemeral containers for claude.ai, OS-level sandboxing and approvals for Claude Code, and VM isolation for Claude Cowork. It also details missed risks, including pre-trust project config execution, user-delivered prompt injection, exfiltration through approved domains, and reduced enterprise visibility inside VMs.
How we contain Claude across products
Simon Willison's Weblog58 days agoCommentary
Anthropic explains how process sandboxes, VMs, filesystem boundaries, and egress controls limit what Claude agents can access. Claude.ai uses gVisor; local Claude Code uses Seatbelt on macOS and Bubblewrap on Linux; Cowork runs in a full VM. Simon Willison highlights the documentation quality, notes a previously missed file-exfiltration path, and plans to revisit Anthropic's open-source srt tool.
Microsoft Copilot Cowork Exfiltrates Files★ 76
Simon Willison's Weblog62 days agoIncident
Simon Willison summarizes a PromptArmor report about Microsoft Copilot Cowork and agentic data exfiltration risks. The issue involved agents sending messages to a user’s own inbox without approval, where rendered external images could trigger requests to attacker-controlled sites. Because OneDrive can create pre-authenticated download links, a successful prompt injection could leak links that allow attackers to download files.