Hacker News (AI keywords) · Jun 4, 2026, 12:56 AM · jc4p

I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

An informal benchmark tested whether LLM agents could exploit a vulnerable mobile app.

The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT 5.5 solved the challenge in 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer runs. Many other models failed because of refusals, tunnel vision on the API, false positives, or an inability to use the exposed Firebase path correctly.

Summary generated by AI; the original post is authoritative.