Anatomy of a Rebuild: When the Rewrite Is the Right Call

One of the most famous essays in software engineering is Joel Spolsky's "Things You Should Never Do, Part I," which argues that you should never rewrite a product from scratch. It was right when he wrote it, and it is mostly right today. The "big bang rewrite" (stop all feature development, rebuild the entire product on a new stack, switch over on launch day) can kill a company. Teams underestimate the scope, customers lose patience, and the rewrite routinely takes far longer than planned.

But the conclusion that you should never rewrite is too strong. I have managed four rewrites in the last three years across client engagements, and all four were the right call. The difference between a rewrite that kills a company and a rewrite that saves one is not whether you do it — it is how you do it.

When the rewrite is justified

I see five patterns where rewriting is genuinely the best option.

The technology is end-of-life. The product is built on a framework or runtime that is no longer maintained, no longer receives security patches, and has a shrinking talent pool. Migrating incrementally is often impractical because the target stack is fundamentally different. Examples: Flash-based applications (Adobe ended Flash Player support at the end of 2020), legacy PHP frameworks with no upgrade path, and Ruby on Rails 3.x applications so far behind that stepping through every major version would cost more than a rebuild.

The architecture cannot support the business. The product was built for a different business model and the current model requires capabilities the architecture cannot provide. A product built as a single-tenant desktop app that now needs to be multi-tenant SaaS. A product built for batch processing that now needs real-time streaming.

The codebase is a liability. The tech debt has accumulated to the point where every change is a risk, the test suite (if it exists) is unreliable, and new engineers take months to become productive. The incremental approach — fixing one module at a time — would take longer than a rebuild because the modules are too entangled to separate.

The team has no knowledge of the existing codebase. The original developers are gone, the documentation is absent, and the remaining team has resorted to cargo-cult maintenance — making changes without understanding why. A rebuild gives the current team ownership of code they understand.

The security model is fundamentally broken. Auth, data isolation, or encryption were implemented so poorly that patching is not feasible. The product needs to be rebuilt with security as a foundational concern, not a retrofit.

If none of these are true, you should not rewrite. If one or more are true, you should evaluate the rewrite option seriously.

The phased approach

The approach that works is not a big bang rewrite. It is a phased replacement that keeps the existing product running while the new one grows alongside it.

Phase 1: Define the target architecture. Before writing any code, document the target architecture. What stack, what patterns, what the data model looks like, how the migration path works. This phase takes 1–2 weeks and prevents the most expensive mistakes.

Phase 2: Build the foundation. Set up the new project with CI/CD, the core data model, authentication, and the basic infrastructure. No features yet — just the scaffolding that every feature will build on. This phase takes 2–4 weeks.

Phase 3: Strangler fig migration. Migrate one feature at a time from the old system to the new one. Each feature is built in the new system, validated against the old system's behavior, and then traffic is routed from old to new. The old and new systems coexist, connected by a routing layer or API gateway that sends requests to the right system based on which features have been migrated.

The strangler fig pattern, named by Martin Fowler after the strangler fig, which grows around a host tree and eventually replaces it, is the key to a safe rewrite. At any point, you can stop the migration and the product still works: features that have not been migrated keep running on the old system, and migrated features run on the new one. There is never a "launch day" where everything switches at once.
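The routing layer at the heart of this phase can be small. Below is a minimal Python sketch, assuming requests can be routed by URL path prefix and that each prefix maps to one feature; the base URLs, the prefix map, and the StranglerRouter class are all hypothetical. A real deployment would usually express the same rule in an API gateway or reverse proxy configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical base URLs for the two systems; real values come from config.
LEGACY_BASE = "http://legacy.internal"
NEW_BASE = "http://new.internal"


def _matches(path: str, prefix: str) -> bool:
    # Match whole path segments so "/billing" does not capture "/billing-reports".
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


@dataclass
class StranglerRouter:
    """Sends each request to the legacy or new system based on its feature."""

    # Path prefix -> feature name (a hypothetical mapping for illustration).
    feature_by_prefix: dict[str, str]
    # Features whose traffic has been cut over to the new system.
    migrated: set[str] = field(default_factory=set)

    def feature_for(self, path: str) -> str | None:
        # The longest matching prefix wins, so a sub-feature such as
        # "/billing/invoices" can move before the rest of "/billing".
        matches = [p for p in self.feature_by_prefix if _matches(path, p)]
        if not matches:
            return None
        return self.feature_by_prefix[max(matches, key=len)]

    def backend_for(self, path: str) -> str:
        feature = self.feature_for(path)
        # Unknown or not-yet-migrated features stay on the legacy system.
        if feature is not None and feature in self.migrated:
            return NEW_BASE
        return LEGACY_BASE

    def cut_over(self, feature: str) -> None:
        self.migrated.add(feature)

    def roll_back(self, feature: str) -> None:
        # Stopping or reversing the migration is a one-line change.
        self.migrated.discard(feature)


router = StranglerRouter({"/billing": "billing", "/billing/invoices": "invoices"})
router.cut_over("invoices")
print(router.backend_for("/billing/invoices/42"))  # -> http://new.internal
print(router.backend_for("/billing/plans"))        # -> http://legacy.internal
```

Keeping the migrated set as plain data is what makes "stop at any point" cheap: rolling a feature back is a config change, not a deploy of the old code.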

Phase 4: Data migration. As features move to the new system, their data migrates too. This is often the hardest phase because data has relationships, histories, and edge cases that the old system handled implicitly and the new system needs to handle explicitly. Run the data migration in parallel — write to both systems, compare, and cut over when the new system is verified.
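Here is a minimal Python sketch of the write-to-both-and-compare step, assuming the legacy system stays the source of truth until cutover and that records are keyed dictionaries. InMemoryStore, DualWriter, and the translate hook are hypothetical; in practice the stores are real databases and translate encodes the schema mapping between the old and new data models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Protocol

Record = dict[str, Any]


class Store(Protocol):
    def write(self, key: str, record: Record) -> None: ...
    def read(self, key: str) -> Record | None: ...


class InMemoryStore:
    """Hypothetical stand-in for a real database table."""

    def __init__(self) -> None:
        self._rows: dict[str, Record] = {}

    def write(self, key: str, record: Record) -> None:
        self._rows[key] = dict(record)

    def read(self, key: str) -> Record | None:
        row = self._rows.get(key)
        return dict(row) if row is not None else None


def identity(record: Record) -> Record:
    return record


@dataclass
class DualWriter:
    legacy: Store  # source of truth until cutover
    new: Store  # shadow copy being verified
    # Maps a legacy-shaped record into the new schema (identity if unchanged).
    translate: Callable[[Record], Record] = identity
    failed_shadow_writes: list[str] = field(default_factory=list)

    def write(self, key: str, record: Record) -> None:
        # The legacy write must succeed: it is what users depend on today.
        self.legacy.write(key, record)
        try:
            self.new.write(key, self.translate(record))
        except Exception:
            # A failed shadow write is recorded for follow-up, never shown to users.
            self.failed_shadow_writes.append(key)

    def verify(self, keys: list[str]) -> list[str]:
        """Return keys whose new record differs from the translated legacy record."""
        mismatched = []
        for key in keys:
            legacy_row = self.legacy.read(key)
            expected = self.translate(legacy_row) if legacy_row is not None else None
            if expected != self.new.read(key):
                mismatched.append(key)
        return mismatched
```

In use, verify runs on a schedule over a sample of keys (or all of them), and the cutover happens only after it has returned an empty list for long enough to trust.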

Phase 5: Decommission the old system. Once all features and data have been migrated, shut down the old system. Do not rush this step — keep the old system in read-only mode for a period so you can verify that nothing was missed.
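If the old system happens to be a Python WSGI application, read-only mode can be a thin middleware that lets reads through and refuses writes. This is a sketch under that assumption, and ReadOnlyMiddleware is a hypothetical name; on other stacks the same rule usually lives in the reverse proxy or in revoked write permissions on the database.

```python
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


class ReadOnlyMiddleware:
    """Wraps a WSGI app so it serves reads but refuses all writes."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in SAFE_METHODS:
            body = b"This system is read-only; it has been replaced.\n"
            start_response(
                "405 Method Not Allowed",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    # A 405 response should list the methods that are allowed.
                    ("Allow", ", ".join(sorted(SAFE_METHODS))),
                ],
            )
            return [body]
        # Reads pass through untouched so the old data stays inspectable.
        return self.app(environ, start_response)
```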

The timeline reality

Founders ask me "how long will the rewrite take?" and they want a number. The honest answer: 3–6 months for a seed-stage product, 6–12 months for a Series A product, and 12–18 months for anything larger. These timelines assume the phased approach, not a big bang. They also assume that the team is not trying to rebuild the entire product at once — they are prioritizing the features by usage and migrating the most important ones first.
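Prioritizing by usage can start from the access logs. A minimal Python sketch, assuming you can already map each logged request to a feature name (the sample input is hypothetical): rank features by traffic and show how much of the total the first N migrations would cover.

```python
from collections import Counter


def migration_plan(feature_hits: list[str]) -> list[tuple[str, int, float]]:
    """Rank features by request count, with the cumulative share of traffic covered."""
    counts = Counter(feature_hits)
    total = sum(counts.values())
    # Most-used first; ties broken alphabetically so the plan is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    plan, covered = [], 0
    for feature, hits in ranked:
        covered += hits
        plan.append((feature, hits, covered / total))
    return plan


# Hypothetical log sample: one feature name per request.
hits = ["search"] * 6 + ["billing"] * 3 + ["export"]
for feature, count, share in migration_plan(hits):
    print(f"{feature:<8} {count:>3} requests  {share:.0%} cumulative")
```

Usage is only one input to the order; dependencies between features and the risk of each migration shape it too.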

The critical insight: during the phased rewrite, the team ships new features on the new system. The rewrite is not a pause in product development; it is a different way of doing product development. Once the foundation from Phase 2 is in place, new features are built on the new stack. Only the existing features need to be migrated.

The team question

A rewrite requires at least one senior engineer who understands the old system well enough to verify that the new system replicates its behavior. If the original developers are gone and nobody understands the old system, the first step is a codebase audit — a few weeks of reading code, documenting behavior, and building a test suite that captures what the old system actually does (not what the docs say it does).
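One common way to build that test suite is characterization (or "golden master") testing: feed the old code a set of inputs, record whatever it returns, and check the new code against those recordings. A minimal Python sketch, assuming the behavior under test can be called as a function with JSON-serializable inputs and outputs; the function names and the shipping-fee example are hypothetical.

```python
import json
from pathlib import Path


def record_golden(legacy_fn, inputs, path: Path) -> None:
    """Capture what the old code actually returns for each input."""
    cases = [{"input": x, "output": legacy_fn(x)} for x in inputs]
    path.write_text(json.dumps(cases, indent=2))


def check_against_golden(new_fn, path: Path) -> list[dict]:
    """Return every case where the new code disagrees with the recorded behavior."""
    failures = []
    for case in json.loads(path.read_text()):
        # Round-trip through JSON so tuples vs. lists do not cause false alarms.
        actual = json.loads(json.dumps(new_fn(case["input"])))
        if actual != case["output"]:
            failures.append(
                {"input": case["input"], "expected": case["output"], "actual": actual}
            )
    return failures


# Hypothetical example: the legacy code grants free shipping at exactly 50,
# a quirk nobody documented; the rewrite "cleans it up" to strictly above 50.
def legacy_shipping_fee(order_total):
    return 0 if order_total >= 50 else 5


def new_shipping_fee(order_total):
    return 0 if order_total > 50 else 5
```

The recordings capture what the old system does, quirks included; whether a quirk is a bug to fix or a behavior customers rely on is then a deliberate decision rather than an accident of the rewrite.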

The rewrite team should be small — two to four engineers. Larger teams introduce coordination overhead that slows the work. The rest of the engineering team continues feature development and maintenance on the old system during the early phases, then shifts to the new system as features are migrated.

The business case

The rewrite needs a business case that justifies the investment. The case is not "the code is ugly." The case is one or more of: we cannot hire engineers for the current stack, our deployment reliability is unacceptable, our security posture requires a rebuild, the current architecture blocks a revenue-generating capability, or our maintenance cost exceeds our feature development capacity.

Present the case with numbers. What is the current maintenance cost? What is the expected maintenance cost after the rewrite? What capabilities does the new architecture unlock? What is the cost of not rewriting?
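The core arithmetic is a payback period: how many months of gains (lower maintenance plus any value the new architecture unlocks) it takes to recover the cost of the rewrite. A deliberately simple Python sketch with hypothetical figures; it ignores the months during which you pay for both systems and does not discount future savings, so treat its answer as optimistic.

```python
from __future__ import annotations


def payback_months(
    rewrite_cost: float,
    monthly_maintenance_now: float,
    monthly_maintenance_after: float,
    monthly_value_unlocked: float = 0.0,
) -> float | None:
    """Months of post-rewrite gains needed to recover the rewrite's cost.

    Returns None when the numbers never pay back the investment.
    """
    monthly_gain = (
        monthly_maintenance_now - monthly_maintenance_after
    ) + monthly_value_unlocked
    if monthly_gain <= 0:
        return None
    return rewrite_cost / monthly_gain


# Hypothetical figures, for illustration only.
months = payback_months(
    rewrite_cost=300_000,
    monthly_maintenance_now=40_000,
    monthly_maintenance_after=15_000,
    monthly_value_unlocked=10_000,
)
print(f"Payback after about {months:.1f} months")
```

A None result is itself an answer: on those numbers, the rewrite does not pay for itself, and the case has to rest on something else, such as security or hiring.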

Counterpoint: most of the time, do not rewrite

I want to be clear: most teams considering a rewrite should not do one. The incremental approach — modularize, refactor, upgrade dependencies, improve test coverage — is usually sufficient and carries far less risk. The rewrite is justified only when the incremental approach is genuinely not feasible due to one of the five patterns listed above.

The biggest danger of this article is that it gives engineers ammunition to argue for rewrites they do not need. Every codebase has problems. Most of those problems can be fixed without starting over.

Your next step

If you are considering a rewrite, answer three questions this week. First: which of the five justification patterns applies? If none, stop here. Second: can you articulate the business case with numbers? If not, the rewrite is not justified yet. Third: can you execute the phased approach, or does your situation require a big bang? If big bang is the only option, the risk is high enough to warrant serious caution.
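The three questions form a simple gate, and writing them down as one can keep a team honest about which answer it is actually giving. A toy Python encoding, assuming each answer is a yes/no judgment the team has already made; the enum names are shorthand for the five patterns above, not an established taxonomy.

```python
from enum import Enum, auto


class Pattern(Enum):
    """Shorthand for the five justification patterns."""

    END_OF_LIFE_TECHNOLOGY = auto()
    ARCHITECTURE_BLOCKS_BUSINESS = auto()
    CODEBASE_IS_LIABILITY = auto()
    NO_KNOWLEDGE_OF_CODEBASE = auto()
    BROKEN_SECURITY_MODEL = auto()


def rewrite_verdict(
    patterns: set[Pattern], case_has_numbers: bool, phased_possible: bool
) -> str:
    # Question 1: without a justification pattern, the answer is no.
    if not patterns:
        return "stop: no justification pattern applies"
    # Question 2: without a quantified business case, it is not justified yet.
    if not case_has_numbers:
        return "not yet: quantify the business case first"
    # Question 3: a big bang rewrite carries much higher risk.
    if not phased_possible:
        return "caution: big bang only; risk is high"
    return "evaluate seriously: candidate for a phased rewrite"


print(rewrite_verdict({Pattern.END_OF_LIFE_TECHNOLOGY}, True, True))
```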

Where I come in

I have managed rewrites from the planning phase through decommissioning the old system. The engagement typically starts with a codebase audit and a go/no-go recommendation, followed by architecture design and phased execution. Book a call if your team is debating a rewrite and you want an outside assessment of whether it is the right move.