SaaS Architecture, Scaling & Cost

Your Cloud Bill Is a Strategy Document: A Fractional CTO's Cost Audit Checklist

Your cloud bill is telling you what your engineering team actually values. Here's how to read it, the specific waste patterns I find in almost every audit, and how to cut 20-40% without touching user-facing functionality.

Craig Hoffmeyer · 8 min read

Most founders treat their cloud bill as a bill. I treat it as a strategy document. The line items tell you exactly what your engineering team has decided matters — often without anyone explicitly making the decision. A careful read of a cloud bill will reveal architectural assumptions, forgotten experiments, vendor lock-ins, and dead code you did not know was still running.

It will also, almost always, reveal 20–40% of waste you can cut without touching anything your customers can see. I have audited dozens of client bills as a fractional CTO and the pattern is remarkably consistent.

This article is the audit checklist I run. It is deliberately unglamorous and deliberately specific. If you follow it, you will find real money.

The first principle: read the bill line by line

Most teams look at the cloud bill as a single number. They might break it down by service (compute, storage, egress). That is not enough. To find waste, you need to read the full itemized bill — the kind you can export as a CSV from AWS Cost Explorer, GCP Billing, or your provider's equivalent.

Schedule 90 minutes, get the CSV, and read every line. It sounds tedious. It is tedious. It is also one of the highest-hourly-rate activities you can do as a founder or fractional CTO. I have found individual $2,000-per-month line items nobody on the team remembered creating.
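If scrolling a raw CSV feels unwieldy, a few lines of Python will surface the biggest line items first so you can work top-down. This is a minimal sketch: the column names (`line_item`, `cost`) are placeholders, not any provider's actual export headers — AWS Cost Explorer and GCP Billing each use their own, so adjust to match yours.

```python
import csv
from collections import defaultdict
from io import StringIO

def top_line_items(csv_text, limit=10):
    """Aggregate cost per line-item description, biggest first."""
    totals = defaultdict(float)
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row["line_item"]] += float(row["cost"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

# Illustrative data only — use your provider's itemized export.
sample = """line_item,cost
EC2 m5.4xlarge us-east-1,1840.00
S3 standard storage,610.50
EC2 m5.4xlarge us-east-1,1795.25
NAT gateway data processing,402.10
"""

for name, total in top_line_items(sample):
    print(f"{total:>10.2f}  {name}")
```

Read the output top to bottom and mark every line you cannot immediately explain.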

The seven waste patterns I find almost every time

These are the specific things I look for, in the order I look for them. If you do this on your own bill, you will find at least three.

1. Idle compute

Instances running 24/7 that are not serving any traffic. Background workers that were spun up for a project that shipped months ago. Staging environments nobody uses anymore. Personal experiment boxes someone forgot about.

How to find them: sort your compute line items by cost, then cross-reference with actual CPU and memory utilization. Anything below 5% sustained utilization is a candidate for deletion or downsizing.

Typical finding: 10–20% of compute spend on instances that should not exist.
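The idle-compute check is easy to script once you have pulled utilization numbers out of your monitoring. A sketch, assuming a hand-assembled list of instances with 30-day average CPU — the field names here are illustrative, not any provider's API:

```python
def idle_candidates(instances, cpu_threshold=5.0):
    """Flag instances below the sustained-utilization threshold,
    most expensive first, so the biggest wins come up first."""
    flagged = [i for i in instances if i["avg_cpu_pct"] < cpu_threshold]
    return sorted(flagged, key=lambda i: i["monthly_cost"], reverse=True)

# Hypothetical fleet data, assembled from your monitoring exports.
fleet = [
    {"name": "api-prod-1",    "monthly_cost": 940.0, "avg_cpu_pct": 41.0},
    {"name": "staging-old",   "monthly_cost": 310.0, "avg_cpu_pct": 1.2},
    {"name": "worker-legacy", "monthly_cost": 520.0, "avg_cpu_pct": 0.4},
]

for inst in idle_candidates(fleet):
    print(f"{inst['name']}: {inst['avg_cpu_pct']}% CPU, ${inst['monthly_cost']}/mo")
```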

2. Over-provisioned instances

Running a large instance type for a workload that would fit comfortably on something half the size or smaller. This happens when teams pick an instance size once and never revisit the choice.

How to find them: look at CPU and memory graphs over a 30-day window. If peak CPU is below 30% and peak memory is below 50%, you are over-provisioned. Cut one size smaller and re-check.

Typical finding: 15–30% savings on compute just from right-sizing.
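The right-sizing rule above reduces to a two-line predicate. A sketch, with the thresholds as parameters so you can tighten or loosen them per workload:

```python
def rightsizing_verdict(peak_cpu_pct, peak_mem_pct,
                        cpu_limit=30.0, mem_limit=50.0):
    """Apply the 30%-CPU / 50%-memory rule to 30-day peak figures."""
    if peak_cpu_pct < cpu_limit and peak_mem_pct < mem_limit:
        return "downsize one size, then re-check"
    return "leave as-is"

print(rightsizing_verdict(22, 35))   # comfortably over-provisioned
print(rightsizing_verdict(70, 40))   # CPU-bound: leave alone
```

Note the rule uses peaks, not averages — an instance that spikes to 90% CPU once a day is correctly sized even if its average is low.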

3. Data transfer (egress) costs

Egress is the cloud providers' revenue secret. Data leaving the cloud costs real money, and the cost compounds in ways founders do not expect. Common culprits:

  • Serving user-generated content (images, video) directly from object storage instead of a CDN.
  • Cross-region traffic because a service in one region is calling a database in another.
  • Logs and metrics streaming to a vendor in a different cloud.
  • CI/CD pipelines that pull large artifacts from S3 on every run.

How to find them: look for the egress line items. They are usually buried but substantial. Any significant egress line should be matched to a specific intentional decision — if you cannot explain it, there is waste.

Typical finding: 5–15% savings from putting a CDN in front of static assets and colocating cross-region traffic.
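Matching egress line items can be scripted too. This sketch assumes AWS-style usage-type strings such as `USE1-DataTransfer-Out-Bytes`; other providers label transfer differently, so treat the keyword list as a starting point rather than a complete filter:

```python
def egress_total(rows):
    """Sum cost of rows whose usage type looks like data transfer.

    Keyword matching is a heuristic — verify against your own
    provider's naming before trusting the total."""
    keywords = ("datatransfer", "egress", "out-bytes")
    return sum(
        r["cost"] for r in rows
        if any(k in r["usage_type"].lower() for k in keywords)
    )

# Illustrative rows shaped loosely like an AWS itemized bill.
bill = [
    {"usage_type": "USE1-DataTransfer-Out-Bytes", "cost": 2100.0},
    {"usage_type": "BoxUsage:m5.large",           "cost": 800.0},
    {"usage_type": "USW2-USE1-AWS-Out-Bytes",     "cost": 350.0},
]
print(f"egress-ish spend: ${egress_total(bill):,.2f}")
```

Every dollar in that total should map to an intentional decision; anything you cannot explain is a candidate for a CDN or for colocating services with their data.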

4. Storage that nobody is using

Orphaned EBS volumes from deleted instances. Snapshots from years ago. S3 buckets full of test data. Old database backups kept indefinitely. Logs retained for 10 years when 30 days would be enough.

How to find them: the storage dashboard will show you buckets and volumes sorted by size. Start at the top. For each, ask: what is this, who uses it, and what is the retention policy?

Typical finding: 10–20% of storage costs on stuff that can be deleted immediately with no consequence.
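Unattached block volumes are the easiest orphans to script. The sketch below filters plain dicts shaped loosely like EC2 `describe_volumes` output, where `State == "available"` means no instance is attached — the same idea ports to any provider's volume listing:

```python
def orphaned_volumes(volumes):
    """Return unattached volumes, largest first — each one is a
    deletion candidate (snapshot it first if you are nervous)."""
    orphans = [v for v in volumes if v["State"] == "available"]
    return sorted(orphans, key=lambda v: v["Size"], reverse=True)

# Illustrative data; in practice, pull this from your provider's API.
volumes = [
    {"VolumeId": "vol-0aaa", "State": "in-use",    "Size": 100},
    {"VolumeId": "vol-0bbb", "State": "available", "Size": 500},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 80},
]
for v in orphaned_volumes(volumes):
    print(f"{v['VolumeId']}: {v['Size']} GiB, unattached")
```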

5. Observability overspend

Logs, metrics, APM, error tracking. These have a strange pricing model — usually volume-based — and they are famous for getting out of control. A noisy debug log that is shipped to a vendor at $0.50 per gigabyte can cost more in observability fees than the service it is monitoring.

How to find them: audit what is being logged and at what level. Kill debug-level logging in production for anything not actively being debugged. Sample high-volume logs. Set retention to 7 days for everything that is not actually useful past 7 days.

Typical finding: 30–60% savings on observability by turning off what nobody is reading.

6. Redundant or overlapping vendors

Teams accumulate vendors over time. Today you have a log vendor, a metrics vendor, an APM vendor, an error tracking vendor, a session replay vendor, an uptime monitoring vendor, and a synthetic monitoring vendor. Each one alone looks reasonable. Together they are a small fortune, and three of them overlap.

How to find them: list every SaaS vendor you pay for that touches observability, infrastructure, or engineering. Ask what unique capability each provides. Consolidate ruthlessly.

Typical finding: 20–40% savings by killing overlapping tools.

7. AI/LLM API spend

I wrote an entire separate article on this. The short version: LLM API spend grows super-linearly if nobody is watching it, and it is often the single fastest-growing line item on a cloud bill in 2026. Audit separately. Apply the seven-step optimization pass from the linked article.

Typical finding: 40–70% savings on LLM spend, usually without any quality degradation.

The audit checklist

If you want to run the audit yourself in a single afternoon, here is the sequence:

Hour 1: Export and read the bill. Get the itemized CSV. Read every line. Mark anything you do not immediately recognize with a question mark.

Hour 2: Right-size compute. Pull utilization graphs for every instance. Anything under-utilized gets a note to downsize or kill.

Hour 3: Hunt orphans. Storage, snapshots, load balancers, databases, environments, IP addresses. Anything that exists but is not serving a purpose.

Hour 4: Vendor audit. List every SaaS vendor that touches engineering. Challenge each one. Consolidate.

Hour 5: Observability review. Log levels, retention, volume. Kill the noise.

Hour 6: LLM/AI audit. If applicable, run through the LLM cost optimization sequence.

Final hour: Write a report. One page. What you found, what you are going to change, and what you expect the new monthly bill to be. Share with your team and the founder.

In a typical single-day audit, I find 20–40% of total cloud spend in waste, and most of the identified waste can be cut within the next two weeks with low risk.

The non-obvious strategic reads

Beyond the waste, your cloud bill will tell you things about your engineering strategy that nobody would say out loud.

What the team actually values. The services you spend the most on are the ones you have invested in. If 60% of your bill is observability and 5% is compute, you have an over-instrumented, under-scaled product. That might be right. It might not.

Where the lock-in is. Each vendor's share of the bill tells you how hard it would be to leave them. A vendor at 30% of your infrastructure spend is a vendor you cannot realistically replace in a quarter. Know which ones those are.

What the dead experiments were. Line items for services that are not on your current architecture diagram are archaeological evidence of abandoned projects. They tell you something about the team's discipline and how you manage scope.

What is about to become a problem. Fast-growing line items, even if small today, predict your future bill. A line item that doubled in the last month is worth investigating even if it is only $300 today — next quarter it will be $2,400.
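The arithmetic behind that warning is plain compound growth, and it is worth making explicit when you flag a line item in your report:

```python
def project(cost_now, monthly_factor, months):
    """Project a line item growing by a fixed factor each month."""
    return cost_now * monthly_factor ** months

# A $300 line item that doubled last month, if the trend holds:
print(f"${project(300, 2, 3):,.0f} per month a quarter from now")
```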

Making it a habit

One-time audits find waste. Waste grows back. The discipline I install at client engagements is a monthly 30-minute cost review inside the Wednesday engineering review. Someone pulls up the bill, walks through changes since last month, and flags anything unusual.

This single 30-minute habit prevents about 80% of the cost creep I see at companies that do not have it. It also trains the engineers to think about cost as part of architecture, which is more valuable than any one-time audit.

Counterpoint: do not over-rotate on cost

A caveat. Seed-stage startups can waste more time optimizing cloud costs than they can possibly save. If your total cloud bill is $800 per month, spending two weeks to save $200 per month is a terrible use of engineering time. The audit pays off when the bill is big enough to matter — my rule of thumb is that it is worth doing seriously once cloud spend crosses $5,000 per month, and is worth making a monthly habit once it crosses $15,000.

Below those thresholds, ship more features.

Your next step

This week, do one thing: export your cloud bill as a CSV and read every line. Do not optimize anything yet. Just read it. You will find things. Make a list. Schedule a focused afternoon to run the checklist above and act on the list.

Where I come in

A line-by-line cloud cost audit is one of my favorite first-month deliverables. It almost always finds real money, it builds instant credibility with the founder and the CFO, and it sets up a monthly habit that protects the savings. Book a call if your cloud bill has been climbing and you want an outside read on whether the growth is healthy or quietly wasteful.


Related reading: The Hidden Cost Curve of LLM Features · The Seed-Stage Stack · The ROI of a Fractional CTO

Want an honest audit of your cloud spend? Book a call.
