Spec-Driven Development in the Age of AI
AI coding tools are only as good as the spec you give them. Here's the practice of writing structured specifications that make AI-generated code predictable, reviewable, and correct.
The bottleneck in AI-assisted development is not the code generation. The generation is fast. The bottleneck is what happens before the generation — the specification. Give a coding agent a vague task and you get a plausible-looking result that does not quite match what you wanted. Give the same agent a tight spec and you get code that is close to shippable on the first pass. The quality of the input determines the quality of the output, and in 2026 the skill of writing a good spec has become the most underrated engineering skill.
This is not a new idea. Spec-driven development is as old as software engineering. What is new is that the spec now has a direct consumer — the AI — that follows it literally. Humans read specs and use judgment to fill in gaps. AI reads specs and generates exactly what the spec says, gaps and all. That changes the incentive structure: a gap in the spec used to cost a conversation with a colleague; now it costs a broken PR and a review cycle.
What a useful spec looks like
A spec that works for AI-assisted development is not a product requirements document. It is tighter, more technical, and more explicit about boundaries. The format I use and teach teams:
One-sentence intent. What is this change supposed to accomplish? Not what it does — what it accomplishes. "Allow users to filter the dashboard by date range" is an intent. "Add a date picker component" is an implementation detail, and giving it as the intent invites the AI to make its own choices about everything else.
Inputs and outputs. What data goes in and what comes out? Types, shapes, edge cases. If the function takes a date range, what happens when the start date is after the end date? What is the return type? What does the error case look like?
Constraints. What must the implementation not do? "Do not add new dependencies." "Do not modify the database schema." "Do not change the existing API contract." Constraints are more useful than requirements for AI because they prevent scope creep, which is the most common failure mode.
Existing patterns to follow. "Follow the same pattern as the existing /api/reports route." "Use the same component structure as UserTable.tsx." AI is excellent at pattern matching, and giving it a concrete example in the codebase eliminates a huge class of style and convention errors.
Success criteria. How do you know the change is correct? "The existing tests pass. A new test covers the date range filter. The API response matches this shape." Clear success criteria make the review binary — it either meets them or it does not.
Scope boundary. "This change touches these files and only these files. If you need to modify something outside this list, stop and ask." This prevents the AI from refactoring half the codebase while fulfilling a two-file task.
That format fits in a screen or two of text and takes 10–20 minutes to write. It is not a formal document. It is a structured description of the work.
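For concreteness, here is what a filled-in spec might look like for the date-range example used above. The file names, types, and section headings are illustrative, not a fixed standard:

```markdown
## Intent
Allow users to filter the dashboard by date range.

## Inputs and outputs
- Input: `{ start: ISODateString, end: ISODateString }` from the filter controls.
- Output: dashboard data restricted to the range; response shape unchanged.
- Edge cases: if `start` is after `end`, show an inline validation error and
  do not fire the request. An empty range means "no filter".

## Constraints
- Do not add new dependencies.
- Do not change the existing API contract.

## Patterns to follow
- Component structure: same as `UserTable.tsx`.

## Success criteria
- The existing tests pass.
- A new test covers the date-range filter, including the `start > end` case.

## Scope boundary
Touch only `DashboardFilters.tsx` and its test file. If anything outside
this list needs to change, stop and ask.
```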
Why specs matter more now
Three reasons the spec matters more in AI-assisted development than it did in purely human development.
First, the AI has no shared context. A human teammate has been in the standup meetings. They know that the billing module is fragile and should not be touched without permission. They know that the team is migrating off Redux and new code should use Zustand. The AI knows none of this unless the spec (or the project instructions file) tells it. Every piece of shared context that is not in the spec is a potential source of error.
Second, the AI does not push back on ambiguity. A human teammate, when given a vague task, asks clarifying questions. An AI, when given a vague task, makes assumptions and produces code. The assumptions are sometimes right and sometimes wrong, and the wrong ones are often plausible enough to pass a casual review. The spec is the forcing function that catches ambiguity before generation.
Third, the review cycle is the expensive part. Generation is cheap. Review is not. A poor spec produces code that requires multiple rounds of review and revision, each of which eats human time. A good spec produces code that is close to correct on the first pass. The 15 minutes you spend on the spec saves an hour of review.
The spec as code
The most effective pattern I have seen is treating specs like code — writing them in a structured format, storing them in the repo, and reviewing them before handing them to the AI.
Concretely, a team I work with stores specs as markdown files in a /specs directory. Each spec has a standard template. Before an engineer runs an agent on a task, they write the spec, open a PR with just the spec, and get a quick review from a teammate. The review catches ambiguity, missing constraints, and scope creep before the AI ever sees it. After the spec is approved, the engineer hands it to the agent.
The spec PR is usually a 5-minute review. It is the cheapest point in the process to catch problems, because no code has been generated yet.
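A minimal version of such a template is just the six parts of the format above as headings. The section names here are one reasonable choice, not a standard:

```markdown
# Spec: <one-line task name>

## Intent
<one sentence: what this change accomplishes, not how>

## Inputs and outputs
<types, shapes, edge cases, error cases>

## Constraints
<what the implementation must NOT do>

## Patterns to follow
<concrete files in this repo that demonstrate the pattern>

## Success criteria
<binary checks: which tests pass, which shapes match>

## Scope boundary
<the exact files this change may touch; stop and ask otherwise>
```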
Some teams take this further and include the spec in the PR alongside the generated code. The reviewer reads the spec first, then reviews the code against the spec. This flips the review from "does this code look correct" — which is hard — to "does this code match the spec" — which is much easier.
The CLAUDE.md connection
If you are using Claude Code, the project's CLAUDE.md file is a global spec. It tells Claude the project's conventions, constraints, patterns, and boundaries. The task-level spec then only needs to cover what is specific to the task. Teams that have a strong CLAUDE.md need shorter task specs. Teams that have no CLAUDE.md need longer task specs because every task has to re-explain the basics.
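As an illustration, a CLAUDE.md carries exactly the kind of shared context described earlier in this article. The specifics below are hypothetical, echoing the examples used above:

```markdown
# Project conventions

- State management: we are migrating off Redux. All new code uses Zustand.
- The billing module (`src/billing/`) is fragile. Do not modify it without
  explicit instruction.
- New API routes follow the pattern in the existing `/api/reports` route.
- Do not add a dependency without asking first.
```

With this in place, a task-level spec can say "follow project conventions" instead of restating them every time.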
I wrote about CLAUDE.md in detail in CLAUDE.md is the new README. The short version: writing and maintaining the project instructions file is probably the single highest-leverage investment a team can make in AI-assisted development.
Common spec failures
Too much implementation detail. The spec says "use a useEffect hook to fetch data on mount and store it in a useState variable." This over-constrains the AI and prevents it from using a potentially better approach. Specify the behavior and the constraints, not the implementation.
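As a sketch of the fix, here is the same instruction rewritten to constrain behavior rather than implementation (`UserTable.tsx` as the pattern reference, per the format above):

```markdown
<!-- Over-specified: tells the AI how -->
Use a useEffect hook to fetch data on mount and store it in a useState variable.

<!-- Better: says what, plus the constraints that matter -->
On mount, load the dashboard data for the current range. Do not add new
dependencies. Follow the data-loading pattern in UserTable.tsx.
```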
No error cases. The spec describes the happy path. The AI generates code that handles the happy path and does something unpredictable on errors. Always specify what should happen when things go wrong.
No scope boundary. The spec says "add date filtering to the dashboard." The AI refactors the entire dashboard component, adds three new utilities, and changes the API contract. A scope boundary ("touch only DashboardFilters.tsx and the corresponding test file") prevents this.
References to knowledge the AI does not have. "Follow our standard pattern" without saying which file demonstrates the standard pattern. "Use the existing approach" without a pointer. The AI will guess, and the guess will be based on its training data, not your codebase.
No success criteria. "Make the feature work" is not a success criterion. "The existing test suite passes, the new integration test at tests/dashboard-filter.test.ts passes, and the API response matches the type defined at types/api.ts:DashboardResponse" is a success criterion.
The team workflow
The workflow I install at most engagements:
Step 1: Triage. Look at the backlog and identify which tasks are agent-suitable. Use the criteria from coding agents in production: well-defined refactors, test writing, boilerplate, dependency updates, small, tightly scoped bug fixes.
Step 2: Spec. For each agent-suitable task, write the spec. Use the template. 10–20 minutes per task.
Step 3: Spec review. Quick review from a teammate. 5 minutes. Catches ambiguity and missing constraints.
Step 4: Agent execution. Hand the spec to the coding agent. The agent works on it. Set a cost cap.
Step 5: Code review. Review the generated code against the spec. Use the AI-specific review heuristics. If the code matches the spec, the review is fast. If it does not, the spec or the agent output needs adjustment.
This workflow front-loads the thinking and back-loads the review. The expensive human judgment goes into the spec, where it prevents problems. The review then confirms the AI followed the spec rather than validating the AI's independent judgment.
Counterpoint: not every task needs a formal spec
A warning. If the task is "rename this function across the codebase," you do not need a spec template. You need a one-line instruction. The spec practice is for tasks with meaningful ambiguity — anything where the AI might reasonably interpret the task in more than one way. For purely mechanical tasks, just run the agent.
The judgment call is: could a capable engineer do this task differently than I intend? If yes, write a spec. If no, just describe the task.
Your next step
This week, pick one task you would normally hand to a coding agent with a one-line description. Instead, write a structured spec using the template above. Hand the spec to the agent and compare the result to what you typically get from a one-liner. The difference is usually large enough to justify the practice for every non-trivial task going forward.
Where I come in
Installing spec-driven practices for AI-assisted development is a natural part of the team workflow setup I do at most AI dev stack engagements. Usually a few days of pairing on real specs, establishing the template, and building the habit. Book a call if your team is using coding agents and the output quality is inconsistent.
Related reading: Coding Agents in Production · CLAUDE.md Is the New README · AI-Assisted TDD: Tests as Specification
Want to install spec-driven practices on your team? Book a call.