If you’ve spent any meaningful time with AI coding agents — Claude Code, Codex, Amp, or any of the growing list — you’ve probably hit the same wall. You ask the agent to build something. It starts strong, then drifts. It refactors files you didn’t ask it to touch. It invents features that don’t exist in your plan. It loses track of where it is. By the time you notice, you’re three rabbit holes deep, your git history is a mess, and the agent has confidently implemented something you never wanted.

The root cause is simple: AI agents work best with constraints, and a vague prompt is the absence of constraints.

The solution isn’t a better prompt. It’s a better document.

The documentation gap in agent coding

Traditional software documentation — wikis, Notion pages, lengthy specs — is written for humans. Humans can skim, infer context, and hold a mental model of an entire project while reading a paragraph. AI agents can’t. They operate within a context window, they process text literally, and they have no persistent memory between sessions.

This creates a fundamental mismatch. You might have a beautiful 50-page product spec, but the agent has no reliable way to pull the answers it needs out of it: “What exactly should I do next? What files should I touch? How do I know when I’m done?”

What agents need isn’t more documentation — it’s machine-readable documentation with an explicit execution order. This is a direct application of the principle that the quality and structure of data determines whether AI systems produce real value. In data-driven systems, poor data quality undermines even capable models. In coding systems, poor requirement clarity undermines even capable agents. The fix is the same: invest in input quality before relying on model capability.

The PRD.json: documentation the agent can actually use

The idea of structuring a PRD as JSON initially caught my attention through the Ralph Loop, an open-source autonomous agent loop by snarktank that uses a prd.json file as its single source of truth. Ralph spawns fresh AI instances iteratively, each one reading the same JSON to pick up the next incomplete task. It’s a clever system.

I don’t use Ralph itself — my workflow is different, more hands-on, with an orchestrator/delegate architecture where I stay in the loop. But the core insight from Ralph stuck with me: if you give an agent a structured JSON file with prioritized, atomic tasks and clear acceptance criteria, the agent stops wandering.

So I took that concept and adapted it heavily for my own project, turning the PRD.json into something closer to an execution contract between me and the agent.

Here’s the idea distilled to its simplest form:

{
  "project": "My App",
  "stack": {
    "frontend": "Flutter",
    "backend": "Laravel"
  },
  "phases": [
    {
      "id": "PHASE-1",
      "title": "Project setup",
      "steps": [
        {
          "step": 1,
          "target": "US-001",
          "description": "Create the Flutter project scaffold"
        },
        {
          "step": 2,
          "target": "US-002",
          "description": "Configure dependencies in pubspec.yaml"
        }
      ],
      "userStories": [
        {
          "id": "US-001",
          "title": "Create Flutter project",
          "acceptanceCriteria": [
            "Run: flutter create --org com.myapp my_app",
            "Verify it compiles: flutter run -d chrome",
            "Initialize git: git init && git add -A && git commit -m 'init'"
          ],
          "priority": 1,
          "passes": false
        },
        {
          "id": "US-002",
          "title": "Configure dependencies",
          "acceptanceCriteria": [
            "Add get: ^4.7.3 to pubspec.yaml",
            "Add http: ^1.6.0 to pubspec.yaml",
            "Run flutter pub get — no errors"
          ],
          "priority": 2,
          "passes": false
        }
      ]
    }
  ]
}

That’s it. No prose, no ambiguity, no room for interpretation. Each task has:

  • An ID the agent can reference and track
  • Acceptance criteria that are verifiable actions, not vague descriptions
  • A priority that enforces execution order
  • A passes flag that records completion state
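
Because the file is plain JSON, that contract can also be checked mechanically before any agent touches it. Here is a minimal validation sketch in Python; the prd.json filename and field names mirror the example above, and the script itself is illustrative rather than part of any tool:

import json

# Fields every user story must carry, per the structure shown above.
REQUIRED_FIELDS = {"id", "title", "acceptanceCriteria", "priority", "passes"}

def validate_prd(path="prd.json"):
    """Report structural problems that would leave the agent guessing."""
    with open(path) as f:
        prd = json.load(f)

    problems = []
    seen_ids = set()
    for phase in prd.get("phases", []):
        for story in phase.get("userStories", []):
            sid = story.get("id", "<no id>")
            missing = REQUIRED_FIELDS - story.keys()
            if missing:
                problems.append(f"{sid}: missing {sorted(missing)}")
            if sid in seen_ids:
                problems.append(f"{sid}: duplicate id")
            seen_ids.add(sid)
            if not story.get("acceptanceCriteria"):
                problems.append(f"{sid}: empty acceptance criteria")
    return problems

if __name__ == "__main__":
    for problem in validate_prd():
        print("PRD issue:", problem)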

Why this works (and why natural language doesn’t)

When you tell an agent “set up the Flutter project with the right dependencies”, you’re relying on the agent to decide what “right” means, what order to do things in, and when to stop. That’s where hallucinations creep in — the agent fills gaps in your instructions with its own assumptions.

A PRD.json eliminates that gap. Without it, the agent is operating inside what amounts to a sandbox with no shared reality — it has no access to your intentions, your constraints, or your definition of done, so it fills those gaps with statistically plausible completions that may have nothing to do with what you actually need. With the PRD.json, the agent reads it and knows:

  1. What to do — the description and acceptance criteria are explicit
  2. In what order — priority numbers and step sequences are unambiguous
  3. When it’s done — each criterion is a binary check, not a judgment call
  4. What NOT to do — if it’s not in the JSON, it’s not in scope

This last point is easy to underestimate. Half the battle with AI agents is preventing them from doing too much. A structured PRD acts as a fence: the agent operates within it and has no reason to wander outside. This shift from vague prompts to explicit, structured intentions is exactly the transformation described in agentic programming as a move from procedural instruction to intentional constraint. You are no longer transmitting procedures. You are projecting possibilities and defining clear boundaries around them.

What this looks like in practice

To make this concrete, here’s a real example from a recent project — a mobile app with a Laravel backend and a Flutter frontend.

Without the PRD.json, I asked the agent to “build the authentication flow — login, registration, password reset, with Laravel Sanctum on the backend and GetX on the frontend.”

The agent started well. It scaffolded the Laravel auth controllers, set up Sanctum, created the Flutter login screen. Then it kept going. It added email verification I hadn’t planned for. It refactored the routing structure to accommodate a “future dashboard.” It created a middleware layer for role-based access control that wasn’t in scope. Three commits later, I had 14 modified files, a broken test suite, and an auth system designed for a project twice the size of mine.

Rolling back took longer than the original implementation would have.

With the PRD.json, the same work looked like this:

{
  "id": "US-011",
  "title": "Login endpoint",
  "acceptanceCriteria": [
    "POST /api/login accepts email + password",
    "Returns Sanctum token on success",
    "Returns 401 with message on failure",
    "Run: php artisan test --filter=LoginTest — all pass"
  ],
  "priority": 11,
  "passes": false
},
{
  "id": "US-012",
  "title": "Login screen",
  "acceptanceCriteria": [
    "Flutter screen with email and password fields",
    "Calls POST /api/login via AuthService",
    "Stores token using GetStorage",
    "Redirects to /home on success",
    "Shows error snackbar on 401"
  ],
  "priority": 12,
  "passes": false
}

The agent implemented US-011, ran the test, marked it as passed, committed. Then moved to US-012, built exactly what was specified, committed. No email verification. No role-based access. No speculative architecture. Two focused commits, both reviewable in under a minute.

The difference wasn’t the agent’s capability — it was the same model both times. The difference was that the second time, the agent had no gaps to fill with its own ideas. This mirrors the principle behind AI-assisted intake filtering for bug reports: the quality of structured input to the agent determines the quality of output. Just as a well-designed intake conversation removes ambiguity before formalisation, a PRD.json removes ambiguity before coding begins.

Acceptance criteria are the real secret

The most important field in the entire JSON is acceptanceCriteria. This is where most people cut corners, and it’s exactly where the agent needs the most precision.

Bad acceptance criteria:

"acceptanceCriteria": ["Project should be set up correctly"]

Good acceptance criteria:

"acceptanceCriteria": [
  "Run: flutter create --org com.myapp my_app",
  "Verify it compiles: flutter run -d chrome",
  "Initialize git: git init && git add -A && git commit -m 'init'"
]

The difference is that good criteria are commands the agent can execute and verify. They leave no room for interpretation. The agent doesn’t need to wonder what “correctly” means — it just runs the commands and checks the output. This is exactly the principle at work in post-commit verification of AI-written code: when acceptance criteria are executable tests baked into the requirements, you reduce the class of defects that can hide in deployed code.
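
Because several of those criteria are literal shell commands behind a “Run:” prefix, the verification itself can be scripted. A minimal sketch, assuming that prefix convention and a prd.json in the project root; the helper and its name are mine, not part of any tool:

import json
import subprocess

def check_runnable_criteria(story):
    """Execute every criterion written as a literal command ("Run: ..." prefix)
    and record whether it exited cleanly. Criteria without that prefix still
    need the agent (or a human) to verify them."""
    results = {}
    for criterion in story["acceptanceCriteria"]:
        if not criterion.startswith("Run: "):
            continue
        # Keep only the command itself; drop any trailing note after an em dash.
        command = criterion.removeprefix("Run: ").split("\u2014")[0].strip()
        outcome = subprocess.run(command, shell=True)
        results[criterion] = outcome.returncode == 0
    return results

if __name__ == "__main__":
    with open("prd.json") as f:
        prd = json.load(f)
    first_story = prd["phases"][0]["userStories"][0]
    print(check_runnable_criteria(first_story))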

Where to go from here

The PRD.json pattern scales naturally. Once the structure is in place, you can extend it — adding fields to control which model handles each story, specifying test strategies per task, or integrating it into a CI pipeline that reads the JSON directly.
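
For instance, a story could carry extra fields alongside the core ones. The model and testCommand fields below are hypothetical names I use for illustration, not part of any standard:

{
  "id": "US-020",
  "title": "Password reset endpoint",
  "model": "your-preferred-model",
  "testCommand": "php artisan test --filter=PasswordResetTest",
  "acceptanceCriteria": ["..."],
  "priority": 20,
  "passes": false
}

A CI job can read the same file and, for example, refuse to merge while any story in the current phase still has passes set to false.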

But the core value is in the basics: IDs, priorities, binary acceptance criteria, and a completion flag. Get those right first. Everything else is optimization.

The compound effect: less drift, less waste, less frustration

After months of working this way, the difference is stark. Sessions that used to derail after 20 minutes now run cleanly through entire phases. The agent commits small, focused changes. The git history is readable. And when something does go wrong, it’s easy to trace back to a specific user story and fix it in isolation. This structured feedback loop mirrors the test-driven optimisation approach used in building self-improving pipelines: define what correct means, test against it, iterate. In that case the loss function was a test suite. Here it is acceptance criteria. The principle is identical — the agent improves by operating inside measurable constraints.

The upfront cost is real — writing a good PRD.json takes time. But that time is an investment you recover tenfold by not fighting the agent, not reverting hallucinated code, and not re-explaining context that the agent forgot between sessions.

Getting started

You don’t need a framework. You don’t need Ralph or any specific tool. You need:

  1. A JSON file in your project root
  2. Phases that group related work
  3. User stories with IDs, priorities, and binary acceptance criteria
  4. A passes flag to track what’s done

Point your agent at it. Tell it to read the PRD, pick the highest-priority incomplete story, implement it, verify the acceptance criteria, mark it as passed, and commit. That’s the whole workflow.
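
If you want a few lines of scaffolding around that loop, a sketch like this handles the bookkeeping; the prd.json path and the function names are assumptions of mine, and the agent still does the actual implementing:

import json

PRD_PATH = "prd.json"

def load_prd():
    with open(PRD_PATH) as f:
        return json.load(f)

def next_story(prd):
    """Return the highest-priority story that hasn't passed yet (lowest number wins)."""
    pending = [
        story
        for phase in prd["phases"]
        for story in phase["userStories"]
        if not story["passes"]
    ]
    return min(pending, key=lambda s: s["priority"]) if pending else None

def mark_passed(prd, story_id):
    """Flip the passes flag once the acceptance criteria have been verified."""
    for phase in prd["phases"]:
        for story in phase["userStories"]:
            if story["id"] == story_id:
                story["passes"] = True
    with open(PRD_PATH, "w") as f:
        json.dump(prd, f, indent=2)

if __name__ == "__main__":
    story = next_story(load_prd())
    if story:
        print(f"Next up: {story['id']}: {story['title']}")
        for criterion in story["acceptanceCriteria"]:
            print("  -", criterion)
    else:
        print("All user stories pass.")

Run it between stories to see what comes next, and call mark_passed only after every acceptance criterion has actually been verified.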

The structure doesn’t have to be complex. Start with five user stories. See how the agent behaves when it has rails to run on. You’ll never go back to unstructured prompting.

The PRD.json pattern was inspired by the Ralph Loop by snarktank, which uses a similar JSON structure for fully autonomous agent execution. I don’t use Ralph’s autonomous loop, but the insight that agents need machine-readable, structured requirements — not just good prompts — has fundamentally changed how I work with AI coding tools.