Hacker News (AI keywords) · Jun 4, 2026, 12:56 AM · jc4p
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
An informal benchmark tested whether LLM agents could exploit a vulnerable mobile app.
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT 5.5 solved the challenge in 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer runs. Many other models failed because of refusals, API-focused tunnel vision, false positives, or an inability to use the exposed Firebase path correctly.
Summary generated by AI; the original post is authoritative.