What One Prompt Built
The question was simple: how far can one prompt go?

The answer is Rogue Extract, a Vampire Survivors-style roguelite built in Godot 4. One prompt produced the entire thing: a playable game, full design documentation, a multi-version art pipeline, and a strategy document that was more honest about the project’s gaps than most human-written postmortems.

What the prompt produced
The first git commit contained 344 GDScript files, 181 scene files, and 38 resources. It included five working weapons (Ice Spear, Tornado, Javelin, Acid Flask, Toxic Cloud), seventeen enemy types with coded behavior, a meta-progression system with a gold economy and permanent unlocks, and export builds for both Windows and the web.
It also produced a 179-line game design document, a 668-line strategy assessment, and a prioritized work queue. Then there was the art pipeline: four Python scripts totaling over 4,000 lines. The most advanced version alone runs 2,292 lines and handles reference-driven generation, variant scoring, and automatic deployment into Godot scene files.
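The pipeline's variant scoring isn't described in detail. A minimal sketch, assuming the score is palette adherence (the share of opaque pixels that land exactly on a locked palette color), might look like this. `palette_score`, `pick_best`, and the RGBA-tuple pixel format are hypothetical, and the palette holds only the ten hex codes the pipeline's style prefix names, not the full 24.

```python
# Hypothetical variant scorer. Palette adherence is an assumed criterion;
# the original pipeline's scoring method is not described.

def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple."""
    code = code.lstrip("#")
    r, g, b = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)

# Only the ten hex codes named in the style prefix; the real palette has 24.
PALETTE = {hex_to_rgb(c) for c in (
    "#0D0B1A", "#1A1333", "#2D8B4E", "#3EBF68", "#D4A030",
    "#F0C850", "#8B2D2D", "#BF3E3E", "#D4C8B0", "#E8DCC8",
)}

def palette_score(pixels: list[tuple[int, int, int, int]]) -> float:
    """Fraction of opaque RGBA pixels whose color is exactly a palette entry."""
    opaque = [p[:3] for p in pixels if p[3] > 0]
    if not opaque:
        return 0.0
    return sum(rgb in PALETTE for rgb in opaque) / len(opaque)

def pick_best(variants: dict[str, list[tuple[int, int, int, int]]]) -> str:
    """Return the name of the highest-scoring variant."""
    return max(variants, key=lambda name: palette_score(variants[name]))

variants = {
    "slime_a": [(45, 139, 78, 255), (13, 11, 26, 255), (0, 0, 0, 0)],
    "slime_b": [(50, 200, 90, 255), (13, 11, 26, 255)],
}
print(pick_best(variants))  # slime_a
```

Exact matching is strict. Generated images tend to drift a few values per channel, so a practical scorer would likely accept near misses within some tolerance.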
The sprite coherence problem
The interesting research wasn’t the game code. It was the art pipeline. A “toxic slime” and a “plague bat” generated separately share nothing: palette, proportions, outline weight, all different. The fix was a strict style prefix with locked hex codes:
```python
STYLE_PREFIX = (
    "16-bit retro indie pixel art, "
    "BOLD shapes, THICK 2-pixel black outlines, HIGH CONTRAST. "
    "Limited 24-color palette: purple-blacks (#0D0B1A, #1A1333), "
    "toxic greens (#2D8B4E, #3EBF68), amber golds (#D4A030, #F0C850), "
    "corrupted reds (#8B2D2D, #BF3E3E), bone whites (#D4C8B0, #E8DCC8). "
)
```

Walk cycles were harder. A walk cycle needs four frames of the same character, with identical proportions and only subtle pose changes. Ask Gemini for all four at once and you get four different characters. The pipeline solved this with two-stage generation: create a reference sprite first, then feed it back as a multimodal input alongside frame-by-frame instructions.
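The two-stage approach can be sketched as follows. This is a minimal sketch, not the pipeline's code: `generate` is a hypothetical stand-in for a multimodal image-model call (not a real Gemini client API), and `FRAME_POSES`, `generate_walk_cycle`, and the prompt wording are all assumed for illustration.

```python
from typing import Callable, Optional

# Hypothetical signature: generate(prompt, reference_image) -> image bytes.
# A stand-in for a multimodal image-model call, not a real Gemini client API.
Generator = Callable[[str, Optional[bytes]], bytes]

# Placeholder pose descriptions for a four-frame walk cycle (assumed).
FRAME_POSES = [
    "contact pose, left foot forward",
    "passing pose, weight on left leg",
    "contact pose, right foot forward",
    "passing pose, weight on right leg",
]

def generate_walk_cycle(generate: Generator, style_prefix: str, subject: str) -> list[bytes]:
    # Stage 1: a single reference sprite fixes palette, proportions, and outlines.
    reference = generate(f"{style_prefix}{subject}, side view, standing pose.", None)
    frames = []
    # Stage 2: one call per frame, each passing the same reference image back in.
    for i, pose in enumerate(FRAME_POSES, start=1):
        prompt = (
            f"{style_prefix}The same character as the reference image: {subject}. "
            f"Walk cycle frame {i} of {len(FRAME_POSES)}: {pose}. "
            "Keep proportions, palette, and outline weight identical."
        )
        frames.append(generate(prompt, reference))
    return frames

# Usage with a fake generator that records each call.
calls = []

def fake_generate(prompt: str, reference: Optional[bytes]) -> bytes:
    calls.append((prompt, reference))
    return f"img{len(calls)}".encode()

frames = generate_walk_cycle(fake_generate, "16-bit retro indie pixel art. ", "hooded survivor")
print(frames)  # [b'img2', b'img3', b'img4', b'img5']
```

Every frame is anchored to the same reference rather than to the previous frame, which presumably keeps small drifts from compounding across the cycle.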

The honest assessment
The strategy doc is the most interesting artifact. It didn’t just plan. It graded itself:
| Area | Designed | Built | Gap |
|---|---|---|---|
| Run length | 15 minutes | 5 minutes | 67% short |
| Characters | 6 | 1 | No variety |
| Enemy art | 17 coded | 6 with art | 11 on placeholders |
| Sound assets | 47 needed | ~12 | 74% missing |
| Behavior AI | Beehave (installed) | Unused | All basic vector math |
Those numbers came from the same prompt that built the game. The system that generated 344 scripts also generated a document explaining exactly where those scripts fall short. “Every run feels identical,” it wrote. “Same arena, same enemy sequence, same weapon options, one character. Zero run variety.”
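The Gap column is plain arithmetic on the other two. A quick check in Python, treating the approximate ~12 sound assets as exactly 12, reproduces the reported figures:

```python
def shortfall_pct(designed: float, built: float) -> float:
    """Percentage of the designed amount that was not built."""
    return 100 * (designed - built) / designed

print(round(shortfall_pct(15, 5)))   # run length (minutes): 67
print(round(shortfall_pct(47, 12)))  # sound assets, taking ~12 as 12: 74
print(17 - 6)                        # enemies still on placeholder art: 11
```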
What broke

Green fringe bleeds through the chroma-key background removal. Eleven of seventeen enemies still run on placeholder art. The player character vanishes into its own floor tiles (dark purple on dark purple). A crash at 519% CPU usage was fixed with 135 lines of GDScript that throttle spawning when the frame rate drops below 30 FPS and cull the farthest enemies once the count exceeds 200. The white flash shader remains broken.
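The GDScript fix itself isn't shown, but its two rules translate into a short Python sketch. The 30 FPS floor and 200-enemy cap come from the text. Everything else is assumed: the `Enemy` class and function names are hypothetical, and "throttle" is modeled here as stretching the spawn interval in proportion to the frame-rate deficit.

```python
import math
from dataclasses import dataclass

FPS_FLOOR = 30     # from the text: spawning is throttled below 30 FPS
MAX_ENEMIES = 200  # from the text: culling starts above 200 enemies

@dataclass
class Enemy:
    x: float
    y: float

def spawn_interval(base_interval: float, fps: float) -> float:
    """Assumed throttle: stretch the gap between spawns in proportion to the FPS deficit."""
    if fps >= FPS_FLOOR:
        return base_interval
    return base_interval * FPS_FLOOR / max(fps, 1.0)

def cull_farthest(enemies: list[Enemy], px: float, py: float,
                  max_count: int = MAX_ENEMIES) -> tuple[list[Enemy], list[Enemy]]:
    """Keep the max_count enemies nearest the player at (px, py); return (kept, culled)."""
    if len(enemies) <= max_count:
        return list(enemies), []
    by_distance = sorted(enemies, key=lambda e: math.hypot(e.x - px, e.y - py))
    return by_distance[:max_count], by_distance[max_count:]

enemies = [Enemy(float(i), 0.0) for i in range(250)]
kept, culled = cull_farthest(enemies, 0.0, 0.0)
print(len(kept), len(culled))    # 200 50
print(spawn_interval(0.5, 15))   # 1.0
```

Culling the farthest enemies first means the ones removed are presumably off-screen or nearly so, which keeps the cut invisible to the player.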
What happened next
The commit went in at 11:42 PM. By 11:44, six art improvement iterations had run. By 1:31 AM, ten gameplay iterations had completed: bug fixes, balance tuning, weapon adjustments. The overnight automation loop then took over and has run nightly since. Each cycle pulls a task from the work queue, tests the change, commits it if stable, and rolls it back if not.
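The nightly cycle reduces to a short loop. This sketch assumes each step is a plain callable (`apply_change`, `tests_pass`, `commit`, `rollback`, all hypothetical) and that a failed task is simply dropped rather than re-queued. The real loop presumably wraps git and a Godot test run, which the text doesn't detail.

```python
from collections import deque
from typing import Callable

def nightly_cycle(
    queue: deque,
    apply_change: Callable[[str], None],
    tests_pass: Callable[[], bool],
    commit: Callable[[str], None],
    rollback: Callable[[], None],
    max_tasks: int = 10,
) -> list[str]:
    """Work through the queue; keep changes that pass tests, revert the rest."""
    committed = []
    for _ in range(min(max_tasks, len(queue))):
        task = queue.popleft()
        apply_change(task)
        if tests_pass():
            commit(task)
            committed.append(task)
        else:
            rollback()  # assumed: failed tasks are dropped, not re-queued
    return committed

# Usage with fake steps; the task names are illustrative only.
log = []
queue = deque(["fix white flash shader", "tune spawn rate", "add second character"])
results = iter([False, True, True])
committed = nightly_cycle(
    queue,
    apply_change=lambda t: log.append(f"apply {t}"),
    tests_pass=lambda: next(results),
    commit=lambda t: log.append(f"commit {t}"),
    rollback=lambda: log.append("rollback"),
)
print(committed)  # ['tune spawn rate', 'add second character']
```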
The prompt produced a playable game. The automation loop is trying to produce a good one.