I built a vulnerable app and spent $1,500 seeing if LLMs could hack it

An informal benchmark tested whether LLM agents could exploit a vulnerable mobile app.

The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT 5.5 solved the challenge in 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer of their runs. Many other models failed because of refusals, API-focused tunnel vision, false positives, or an inability to use the exposed Firebase path correctly.

The article describes an experimental, informal test of LLM security capabilities. The author built a fake book-review app called BookNook, with a React Native Expo frontend and a Python / FastAPI backend; the challenge goal was to find a flag hidden in a particular user's private reviews. The real vulnerability was not in the API, which was designed to be relatively secure. Instead, the app's google-services.json exposed Firebase information, and the data layer's Firebase / Firestore permission configuration allowed an attacker to bypass the backend, register a user directly, and read the data. The author notes that such problems are common in both Firebase and Supabase apps and can be classified as Broken Access Control or Missing Object-Level Authorization.
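To make the class of flaw concrete, here is a minimal Python sketch of the difference between a permissive read rule and an owner-scoped one. It does not reproduce the author's actual Firestore rules, which the summary does not show; the names Request, Review, permissive_rule, and owner_scoped_rule, and the sample data, are all hypothetical and exist only to illustrate Missing Object-Level Authorization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a data-layer request; auth_uid is None when unauthenticated."""
    auth_uid: Optional[str]


@dataclass(frozen=True)
class Review:
    """Hypothetical stand-in for a stored review document."""
    owner_uid: str
    private: bool
    text: str


def permissive_rule(req: Request, review: Review) -> bool:
    # Assumed flawed policy: any signed-in user may read any document.
    # Authentication is checked, but ownership of the object is not.
    return req.auth_uid is not None


def owner_scoped_rule(req: Request, review: Review) -> bool:
    # Assumed fixed policy: private documents are readable only by their owner.
    if req.auth_uid is None:
        return False
    return (not review.private) or req.auth_uid == review.owner_uid


# Hypothetical data: a victim's private review and a self-registered attacker.
secret = Review(owner_uid="victim", private=True, text="flag goes here")
attacker = Request(auth_uid="attacker")
owner = Request(auth_uid="victim")

print("permissive rule lets attacker read:", permissive_rule(attacker, secret))
print("owner-scoped rule lets attacker read:", owner_scoped_rule(attacker, secret))
print("owner-scoped rule lets owner read:", owner_scoped_rule(owner, secret))
```

The point of the sketch is that self-registration is enough to satisfy the first rule, so a secure backend API offers no protection if the data layer can be reached directly.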

Summaries are AI-generated; the original article is authoritative.