DORA Metrics for Startups: The Four Numbers That Matter and the Ones That Don't
A founder-friendly guide to the four DORA metrics, what good looks like at a startup, and how to start measuring them without buying an enterprise observability platform.
Engineering metrics are one of the most abused concepts in startup management. Every year I watch a founder try to measure their team with something they picked up in a Medium article — lines of code, commits per day, Jira velocity — and every year it ends the same way. The numbers move. The team games the numbers. The actual product does not ship any faster.
There is a better option, and it has been sitting in plain sight for most of a decade. It is called DORA — short for the DevOps Research and Assessment program that produced the research behind it. It is four numbers. It is backed by one of the largest empirical studies of software delivery ever run. And crucially, it works just as well for a 6-person startup as it does for a 6,000-person enterprise.
This article is the founder's version: what the four metrics actually are, what good looks like at a startup stage, how to start measuring them without buying expensive tooling, and the mistakes that turn DORA into yet another dashboard nobody looks at.
The four metrics in plain English
Two of the four measure throughput — how fast you can ship changes. The other two measure stability — how reliably those changes work.
1. Deployment Frequency
What it measures: How often you deploy code to production.
Why it matters: Deployment frequency is a direct measure of how small and safe your team considers each change. Teams that deploy many times a day do so because each change is small, reversible, and independently verifiable. Teams that deploy once every two weeks are usually batching a large amount of risk into each release, and feeling the pain.
What good looks like at a startup: Multiple deploys per day for a healthy small team. At a minimum, daily. If your team is deploying less than once a week, something is wrong, and the something is almost never "our code is too complicated to deploy safely."
2. Lead Time for Changes
What it measures: From the moment a code change is committed to the moment it is running in production, how long does it take?
Why it matters: Lead time is the single best proxy for how much process friction lives between your engineers and your customers. Long lead times mean long review queues, long CI pipelines, long QA cycles, long release approvals, or all of the above. Short lead times mean the path from "idea" to "in front of users" is clear.
What good looks like at a startup: Under a day for the median change. Elite teams measure this in hours. If your median lead time is measured in weeks, the problem is not "we need more engineers." The problem is the pipeline.
3. Change Failure Rate
What it measures: Of the changes you deploy to production, what percentage cause a failure — an incident, a rollback, a hotfix?
Why it matters: This is the first stability metric, and it is the counterweight to deployment frequency. It keeps you honest. Anyone can ship fast if they are willing to break things constantly. A low change failure rate means you are shipping fast and your changes are safe.
What good looks like at a startup: Under 15% is healthy. Under 10% is very good. Above 30% and you need to look hard at your test coverage, your code review practice, or your feature-flag discipline.
4. Mean Time to Recovery (MTTR)
What it measures: When something does break in production, how long does it take to get back to a healthy state?
Why it matters: MTTR is the second stability metric, and it is the one that separates teams that are terrified of production from teams that treat production as a known environment they can reason about. Low MTTR means you have observability, you have rollback capability, you have on-call discipline, and your team knows how to debug under pressure.
What good looks like at a startup: Under an hour for the median incident. If your MTTR is measured in days, it almost always means one of three things: you cannot see what is happening in production, you cannot safely deploy a fix, or you do not know who is responsible for responding.
Why these four and not others?
The research behind DORA is genuinely good. It was conducted across thousands of engineering teams over many years and found, consistently, that these four metrics correlate with organizational performance more strongly than any other software delivery metrics tested. Teams that do well on all four ship more, break things less, recover faster, and — this is the part founders care about — show measurably better business outcomes.
The other thing the research found: these four metrics are balanced. You cannot game them individually without hurting the others. Ship faster by removing tests? Change failure rate goes up. Reduce change failure rate by adding approval gates? Lead time goes up. The only way to move all four in the right direction is to improve the underlying engineering practice. That is exactly what you want from a metric.
The metrics I specifically do not recommend
Three metrics I actively steer founders away from:
Lines of code. Correlates with nothing useful and rewards verbose writing. Ignore.
Commits per engineer per day. Easily gamed, punishes careful thinking, and says nothing about whether the commits matter. Ignore.
Jira velocity / story points delivered per sprint. The least harmful of the three, but it still gets corrupted in practice. Story points are a local planning tool, not a cross-team metric. The moment you start comparing velocity across teams or across quarters, engineers start inflating estimates, and the metric collapses. I have never seen velocity survive contact with management pressure.
If your current engineering dashboard is mostly these three, replace it.
How to actually measure DORA at a startup
This is where most guides lose people. They tell you to buy a commercial DORA dashboard or build a complicated data pipeline. That is not necessary at the startup stage.
Here is the minimum viable setup I install for clients, usually in an afternoon:
Deployment Frequency. If you deploy through a CI/CD system (GitHub Actions, Vercel, Fly, Railway, anything), you already have a log of deployments. Count them. You do not need a dashboard — a spreadsheet with a weekly count is plenty for a small team.
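If you export those deploy timestamps to a file, a few lines of Python turn them into the weekly count. The one-ISO-timestamp-per-line format here is an assumption — adapt it to whatever your CI system can actually emit:

```python
from collections import Counter
from datetime import datetime

def weekly_deploy_counts(timestamps):
    """Count deploys per ISO (year, week), given ISO-8601 timestamp strings."""
    counts = Counter()
    for raw in timestamps:
        year, week, _ = datetime.fromisoformat(raw.strip()).isocalendar()
        counts[(year, week)] += 1
    return dict(counts)

if __name__ == "__main__":
    # In practice you would read these lines from a deploy-log export.
    sample = ["2024-03-04T10:00:00", "2024-03-05T16:30:00", "2024-03-12T09:15:00"]
    print(weekly_deploy_counts(sample))  # two deploys in ISO week 10, one in week 11
```

Dump the result into the spreadsheet once a week and you are done.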
Lead Time for Changes. This one requires two timestamps: when a commit was merged to main, and when it was deployed to production. The first lives in your git history, the second in your deploy logs. Compute the weekly median. A 30-line script in whatever language your team uses gets you 90% there.
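The core of that script is a median over time deltas. This sketch assumes you have already joined the two timestamps per change (the pair format is my invention, not a standard export):

```python
from datetime import datetime
from statistics import median

def median_lead_time_hours(pairs):
    """Median hours from merge to deploy, given (merged_at, deployed_at) ISO pairs."""
    deltas = [
        (datetime.fromisoformat(deployed) - datetime.fromisoformat(merged)).total_seconds() / 3600
        for merged, deployed in pairs
    ]
    return median(deltas)

if __name__ == "__main__":
    changes = [
        ("2024-03-04T10:00:00", "2024-03-04T14:00:00"),  # 4 hours
        ("2024-03-05T09:00:00", "2024-03-05T21:00:00"),  # 12 hours
        ("2024-03-06T08:00:00", "2024-03-06T10:00:00"),  # 2 hours
    ]
    print(median_lead_time_hours(changes))  # 4.0
```

The median, not the mean, is what you want here: one stuck PR should show up as an outlier, not drag the whole number.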
Change Failure Rate. Track incidents in a simple list — one row per incident, with the commit or deploy that caused it. Divide by total deploys for the period. If you do not have an incident tracker, start one today; a shared doc will do for the first month.
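The division itself is trivial, but two edge cases are worth pinning down: what a no-deploy week returns, and deduplicating multiple incidents caused by the same deploy. A minimal sketch, with invented row shapes:

```python
def change_failure_rate(incidents, deploy_ids):
    """Fraction of deploys that caused at least one incident.

    incidents: dicts with a "caused_by" deploy ID (assumed shape).
    deploy_ids: every deploy ID for the period.
    """
    if not deploy_ids:
        return 0.0  # a no-deploy week has no failures to attribute
    failing = {inc["caused_by"] for inc in incidents}  # dedupe per deploy
    return len(failing & set(deploy_ids)) / len(deploy_ids)

if __name__ == "__main__":
    incidents = [{"caused_by": "deploy-42"}, {"caused_by": "deploy-42"}]
    deploys = ["deploy-41", "deploy-42", "deploy-43", "deploy-44"]
    print(change_failure_rate(incidents, deploys))  # 0.25 — one bad deploy out of four
```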
Mean Time to Recovery. From the same incident list, capture "detected at" and "resolved at" timestamps. Take the weekly median. This is the hardest of the four to measure well because detection is often fuzzy. Good enough is much better than nothing.
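Computationally this mirrors the lead-time script, just over the incident list instead of the deploy list. The field names here are assumptions about how you keep that list:

```python
from datetime import datetime
from statistics import median

def median_recovery_minutes(incidents):
    """Median minutes from detection to resolution (field names assumed)."""
    durations = [
        (datetime.fromisoformat(inc["resolved_at"]) - datetime.fromisoformat(inc["detected_at"])).total_seconds() / 60
        for inc in incidents
    ]
    return median(durations)

if __name__ == "__main__":
    incidents = [
        {"detected_at": "2024-03-04T10:00:00", "resolved_at": "2024-03-04T10:30:00"},  # 30 min
        {"detected_at": "2024-03-06T14:00:00", "resolved_at": "2024-03-06T15:30:00"},  # 90 min
    ]
    print(median_recovery_minutes(incidents))  # 60.0
```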
Total cost: a few hours of setup, zero dollars of tooling. Once the rhythm is established, you can invest in better tooling (commercial DORA dashboards exist and some are quite good), but you should not start there. Start with pen and paper and upgrade when the volume demands it.
What to do with the numbers
Measuring is worthless if nothing changes. A few rules for making DORA actually move the needle:
Review weekly, not daily. DORA is a trailing indicator. Week-over-week is the right resolution. Looking at it every day creates noise and false signals.
Review the numbers as a team, not over the team. The point is to surface bottlenecks, not to rank engineers. Pull up the four numbers in the Wednesday engineering review and ask the team: what is the biggest thing making any of these worse? Let them propose the fix.
Never, ever tie DORA to individual performance reviews. The research behind DORA is explicit about this. Use it to measure the system, not the people inside it. The moment engineers believe the numbers affect their bonuses, the numbers become lies.
Pair DORA with one product outcome metric. DORA tells you how well the engineering system is working. It does not tell you whether you are building the right thing. Always look at DORA alongside at least one customer-facing number — active users, activation rate, revenue, whatever is the right one for your stage.
Move one number at a time. If all four metrics look bad, do not try to fix all four at once. Pick the one whose fix would unblock the others. Usually that is lead time, because long lead time is both a symptom and a cause of many other problems.
The common ways DORA goes wrong
Gaming by definition. "What counts as a deploy?" "Does a hotfix count as a change failure?" Definitions matter. Write them down, stick to them, and do not retroactively change them to make the numbers look better. The point is the trend line.
Optimizing one number. A team that drops lead time from five days to one day by skipping code review is not improving. A team that pushes deployment frequency up by deploying empty no-op changes is cheating themselves. Watch all four together.
Using DORA for bragging rights. The numbers are for internal use. Nobody outside your company cares whether your deployment frequency is four per day or forty. Do not put DORA on a marketing page. Use it as a mirror.
Quitting too early. It takes about two months for meaningful signal to stabilize after you start measuring. Do not conclude anything after two weeks. Give the system time to show you what it is actually doing.
A real example
One of my clients a few quarters ago was convinced their team was "slow." Founder was frustrated. Engineers felt under-appreciated. I asked what "slow" meant and got a lot of vibes and very few numbers.
We installed minimum-viable DORA tracking over an afternoon. After two weeks of data, the numbers told a specific story: deployment frequency was fine (about five per day), change failure rate was fine (around 8%), MTTR was fine (about 45 minutes). Lead time was the problem — a median of nine days from commit to prod, driven almost entirely by a code review queue that nobody owned.
The fix was one change to the Wednesday engineering review: we started the meeting by looking at any PR open for more than two days. It was not about rebuke, just visibility. Within three weeks the median lead time dropped to under a day. The team felt faster because they were faster, and the founder's anxiety dropped. Nothing else about the team changed. No reorg, no new hires, no new tools. One metric, one specific intervention, measurable result.
That is what DORA is for. Not a dashboard. A conversation starter that points at the thing that is actually wrong.
Counterpoint: metrics are not management
I want to be clear about what DORA cannot do. It cannot tell you whether an engineer is happy. It cannot tell you whether the product is good. It cannot tell you whether you are solving the right problem. It is a very good measure of how the engineering system is running, and a poor measure of everything else.
Founders who fall in love with metrics will try to manage the entire engineering org through DORA. That always ends badly. Metrics are instruments on the dashboard of a car you are driving. You still have to steer the car.
Your first week with DORA
If you want to start measuring DORA at your startup, the first week looks like this:
- Day 1: Write down your definitions. What counts as a deploy, what counts as a change failure, how you will measure lead time.
- Day 2: Set up the simplest possible tracking for each of the four metrics. Spreadsheet, small script, whatever is fastest.
- Day 3: Backfill the last four weeks of data from git and deploy logs if possible. You want a baseline.
- Day 4: Share the baseline with the team. Not as a score — as a starting point. Ask which number they think is the worst and why.
- Day 5: Commit to reviewing the four numbers in the weekly engineering review for the next month. Do not promise any specific improvement yet.
- Weeks 2–4: Watch, do not intervene. Let the team see the numbers and start proposing fixes.
By week four you will know whether the team is ready to improve one specific metric. That is when the real work starts.
Where I come in
Installing DORA is one of the most common things I do in the first month of a fractional CTO engagement. It is low-cost, high-signal, and it almost always reveals a specific bottleneck that founders have been guessing at for months. Once the numbers are on the wall, the conversation about "are we shipping fast enough" becomes a conversation about specific, fixable things.
If you are a founder who cannot quite tell whether your engineering team is performing, DORA is the cheapest diagnosis you can run. Book a 30-minute call and we can talk about what your numbers probably look like today and what I would do first. No dashboard required.
Related reading: Leading a Dev Team: The Weekly Operating Rhythm I Use · Which Software Methodology Fits Your Stage · Team Topologies for Startups in Plain English
Want to know what your engineering throughput actually looks like? Book a call.
Get in touch →