A Fractional CTO's Claude Code Playbook: How I Ship Client Work 3x Faster
The exact Claude Code workflow I use across every fractional CTO engagement — from planning to PR — and the guardrails that keep it safe for production.
I run multiple fractional CTO engagements at the same time. A year ago that was a juggling act. Today it is not, because I put Claude Code in the middle of every codebase I touch.
This is not a love letter to an AI tool. I have broken production with one. I have watched junior engineers regress because the agent was doing the thinking for them. I have rewritten entire PRs that looked correct and were not. What follows is the workflow I actually use after those scars — the parts worth copying, the parts worth avoiding, and the guardrails that make it safe to let an agent near a real customer's code.
The claim in one sentence
With the right setup, a single senior engineer using Claude Code can deliver the throughput of a small team — if the senior engineer is doing more senior work, not less.
That "if" is the whole article.
Where the 3x actually comes from
When I audit my own time on engagements, the speedup is not coming from raw typing. Code was never the bottleneck. The speedup comes from four specific places:
- Context assembly. Opening the repo, finding the relevant files, loading the right mental model. Claude Code compresses hours of onboarding per week into minutes.
- Plumbing work. Migrations, fixtures, test scaffolds, type definitions, boring glue code. This is 30–40% of most sprints and the agent eats it.
- Parallelism. I can have an agent working on a refactor in one branch while I pair with a founder on architecture in another. I could never do that before.
- Review velocity. Good prompts plus good tests mean PRs land cleaner the first time. The review cycle shortens from days to hours.
Nothing in that list is "AI writes my whole app." Anyone selling you that is selling you a future rewrite.
The operating model: Plan → Spec → Execute → Verify
Every non-trivial task I give Claude Code runs through four phases. Skipping any of them is where teams get burned.
1. Plan (human-led)
Before I open the agent I write, in plain English, what I am trying to accomplish and what "done" looks like. Two paragraphs, not twenty. If I cannot write that, the task is not ready.
This step is non-negotiable. The failure mode of AI-assisted dev is not bad code — it is the wrong code, shipped quickly, with confidence. A 10-minute plan prevents a 3-hour cleanup.
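To make this concrete, here is an invented example of what a plan at this stage might look like — the feature, endpoint, and scope are all hypothetical, and the point is the shape, not the content:

```markdown
## Plan: rate-limit the public search endpoint

We are getting scraped. Add per-IP rate limiting to /api/search only,
using the Redis instance we already run. Done means: requests above the
limit get a 429 with a Retry-After header, there is a metric we can
graph, and our two enterprise customers' NAT ranges are allowlisted.

Out of scope: per-user limits, the internal admin search, any UI.
```

Two paragraphs, a testable definition of done, and an explicit out-of-scope list. If the plan cannot fit this shape, the task is not ready for an agent.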
2. Spec (agent-assisted)
I ask the agent to turn the plan into a concrete spec: files to touch, functions to add, tests to write, edge cases, open questions. I read it carefully. I push back where it is wrong. Only then do I let it write code.
This is the single highest-leverage habit in the whole playbook. The spec doubles as documentation, as a review artifact, and as a prompt for the execution phase.
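Continuing the invented rate-limiting example from above, the spec the agent hands back might look like this — again, every file name and question here is hypothetical:

```markdown
## Spec: per-IP rate limiting on /api/search

Files: middleware/rateLimit.ts (new), api/search.ts, config/limits.ts
Tests: under limit passes; over limit returns 429 + Retry-After;
       allowlisted IP is never limited; Redis unreachable is handled
Open questions:
  1. Fail open or fail closed when Redis is unreachable?
  2. Key the limit on IP, or on API key when one is present?
```

The open questions are the valuable part. A spec with no open questions usually means the agent guessed instead of asking.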
3. Execute (agent-led, supervised)
Now the agent implements. I keep diffs small — usually one logical change per run. I almost never let it touch more than ten files in a single pass; when a task wants more than that, I split it.
A few hard rules I enforce on myself:
- Tests get written before or alongside the implementation, not as an afterthought.
- Migrations are always their own commit and always reviewed by a human line by line.
- Anything touching auth, billing, or data deletion gets a second pair of human eyes no matter how trivial the diff looks.
4. Verify (human-led again)
I read every diff. I run the tests. I run the app. I write one sentence in the PR description explaining what actually changed and why. That sentence is for me, my client, and the next engineer.
The verification phase is where most "AI-accelerated" teams quietly degrade. The agent says "I've updated the tests and they pass," and the human nods and merges. Do not nod and merge.
What I put in CLAUDE.md on day one
The single biggest lever for agent quality is the project instructions file at the root of the repo. A good CLAUDE.md turns a generic coding agent into a team member who has read the onboarding doc.
My template has eight sections: project overview, commands, architecture, code conventions, test strategy, domain glossary, "things that look wrong but are intentional," and out-of-bounds directories. I wrote a separate article on exactly what goes in each section; link at the bottom.
The short version: if your CLAUDE.md could describe any Next.js project, it is not doing its job. Specificity is the whole point.
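As a sketch of what that specificity looks like — the project name, commands, and directories below are invented for illustration, not a template to copy verbatim:

```markdown
# CLAUDE.md (excerpt — invoicing-api)

## Commands
- `pnpm test` runs unit tests; `pnpm test:e2e` requires Docker running.

## Things that look wrong but are intentional
- `legacy_invoices` has no foreign keys: it mirrors the old billing
  system and is read-only. Do not "fix" it.

## Out of bounds
- `infra/terraform/` — never edit; changes go through the platform repo.
```

No generic project could produce those three bullets. That is the test.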
The guardrails that keep this safe for client work
Speed without guardrails is just a faster way to cause incidents. Here is what I require on every engagement before I let an agent near the main branch.
A real test suite. Not 90% coverage theater. A suite that actually fails when behavior breaks. If the suite does not exist, building it is job one.
Trunk-based development with small PRs. Long-lived branches plus AI-generated code is a merge-conflict apocalypse. Aim for PRs under 400 lines of real diff.
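The 400-line budget is easy to enforce mechanically. A minimal sketch, assuming a standard git setup — `diff_size`, `check_branch`, and `MAX_DIFF_LINES` are illustrative names, not part of any existing tool:

```python
import subprocess

MAX_DIFF_LINES = 400  # the PR budget from the guardrail above

def diff_size(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output.

    Binary files appear as "-\t-\tpath" and are skipped here.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total

def check_branch(base: str = "main") -> bool:
    """Return True if the current branch fits the PR budget."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return diff_size(out) <= MAX_DIFF_LINES

# Example numstat output: 120 added + 30 deleted across two text files,
# plus one binary file that does not count.
sample = "100\t20\tsrc/api.ts\n20\t10\tsrc/db.ts\n-\t-\tassets/logo.png"
print(diff_size(sample))  # → 150
```

Wired into CI, a check like this turns "keep PRs small" from a norm into a gate.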
Feature flags for anything user-visible. Progressive rollout is not optional when your delivery velocity goes up 3x.
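The flag mechanism does not need to be elaborate to be useful. A minimal sketch of deterministic percentage rollout — the in-code `FLAGS` table and `is_enabled` are invented for illustration; a real engagement would back this with a flag service or database, but the bucketing math is the same:

```python
import hashlib

# Hypothetical flag table: feature name -> rollout percentage.
FLAGS = {"audit-log-v2": 25}

def is_enabled(flag: str, tenant_id: str) -> bool:
    """Deterministic rollout: the same tenant always gets the same
    answer, so a 25% rollout does not flicker between requests."""
    pct = FLAGS.get(flag, 0)
    if pct <= 0:
        return False
    if pct >= 100:
        return True
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode()).digest()
    bucket = digest[0] % 100  # stable bucket in 0..99
    return bucket < pct

print(is_enabled("audit-log-v2", "tenant-42"))
```

Hashing the flag name together with the tenant id means different features roll out to different slices of tenants, which limits the blast radius when one of them misbehaves.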
Observability before acceleration. If you cannot see what broke in production within two minutes of a deploy, you are not ready to deploy faster. Logs, metrics, error tracking, basic tracing.
A human on call who understands the diff. The person merging is the person paged. No exceptions. This one rule eliminates most of the "the AI did it" incident category.
Where I refuse to use it
Not every task is a good fit. After a year of this I will not use an agent for:
- First-time architectural decisions on a new codebase. The agent will happily commit to a direction that is wrong in ways it cannot see. Architecture is my job.
- Security-critical primitives. Session handling, password flows, crypto, permission checks. I write these myself and ask the agent to review.
- Anything where the spec is "figure out what the user wants." That is a conversation, not a task.
- Debugging in production under time pressure. When the site is down, I want a human brain with working memory, not a context-window reset.
A concrete example from last month
One of my clients, a 12-person Series A SaaS, needed a multi-tenant audit log: immutable, queryable, exportable, compliant with their enterprise customers' requirements. Historical estimate: three weeks of engineering for one senior.
What actually happened:
- Day 1 morning. I wrote a plan doc with the client. Two pages. Agreed on scope, storage model (append-only table, hash-chained), and out-of-scope items (real-time streaming, a UI viewer — later).
- Day 1 afternoon. I ran the plan through Claude Code to produce a spec: schema migration, insert API, query API, retention job, export endpoint, tests. Spec came back with three questions I had not considered. Good questions. We resolved them.
- Days 2–4. Agent-led execution in five small PRs. I reviewed each. Caught two real bugs (one off-by-one in the hash chain verification, one missing tenant filter in the query API — exactly the kind of thing I would have caught in a human PR review, because I was doing a human PR review).
- Day 5. Load testing, observability, feature flag, staged rollout.
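The storage model above can be sketched in a few lines. This is an illustration of the hash-chaining idea, not the client's implementation — the field names, genesis value, and helper functions are all invented:

```python
import hashlib, json

GENESIS = "0" * 64  # invented genesis value for the chain

def entry_hash(prev_hash: str, entry: dict) -> str:
    """Hash of this entry chained to its predecessor."""
    payload = json.dumps(entry, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()

def append(log: list, entry: dict) -> None:
    """Append-only write: each row stores its chained hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "hash": entry_hash(prev, entry)})

def verify(log: list) -> bool:
    """Recompute the whole chain; any edited or deleted row breaks it.
    Note the loop covers every row — the off-by-one caught in review
    is exactly the bug of skipping the first or last entry."""
    prev = GENESIS
    for row in log:
        if row["hash"] != entry_hash(prev, row["entry"]):
            return False
        prev = row["hash"]
    return True

log = []
append(log, {"tenant": "t1", "action": "user.delete", "actor": "admin"})
append(log, {"tenant": "t1", "action": "export.run", "actor": "admin"})
print(verify(log))           # → True
log[0]["entry"]["actor"] = "intruder"
print(verify(log))           # → False
```

The point of the chain is that immutability becomes checkable: a retention job or export endpoint can run `verify` before trusting the data, instead of assuming nobody touched the table.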
Delivered in one week instead of three. The client's reaction was not "wow, AI." It was "wow, this is how I wish every project went." That is the right reaction. The AI was a multiplier on a process that would have worked without it — just slower.
The uncomfortable counterpoint
The senior-junior gap in AI-assisted dev is real and it is growing. Senior engineers get a multiplier. Junior engineers often get stuck, because the agent will happily write code they cannot fully evaluate, and the feedback loop that used to build their intuition is shortcut.
If you are running an engineering team, this is your problem to solve, not theirs. Pair juniors with seniors on agent-led work. Make them write the plan and the spec themselves before the agent touches anything. Protect their learning time from the productivity pressure the tool creates. I will probably write a full article on this — it is one of the most under-discussed shifts in our field right now.
Your first week with Claude Code on a real codebase
If you want to run this playbook on your own stack, here is the minimum-viable version for week one:
- Install Claude Code and run it in a throwaway branch first.
- Write a real CLAUDE.md for the repo. Spend at least an hour on it. It will pay for itself within a day.
- Pick one small, well-defined task. Not a greenfield feature — something with a clear "done" you can verify.
- Run Plan → Spec → Execute → Verify explicitly, out loud, even if it feels like overkill.
- Merge the PR yourself. Note what the agent got right and what you had to fix.
- Do it again. The compound improvement is steep in the first two weeks.
By week two you will have a feel for where the tool shines and where it is not pulling its weight. By week four you will be changing how you estimate work.
Where I come in
I do this at every engagement. When I join a client as a fractional CTO, one of the first things I set up — alongside the engineering operating rhythm, the delivery methodology, and the observability baseline — is the AI-assisted dev workflow above. I write the CLAUDE.md, I train the team on the Plan → Spec → Execute → Verify loop, and I stay embedded long enough to make sure the speedup is real and the guardrails hold.
If you are a founder or a CEO wondering whether your team is getting the throughput they should from modern tooling, that is exactly the conversation I have every week. Book a 30-minute call and we will look at your specific codebase and team together. No slides, no pitch deck, just a real conversation about what is and is not working.
Related reading: CLAUDE.md Is the New README · Leading a Dev Team: The Weekly Operating Rhythm I Use