CodeNSM
The Problem · Part 23

The firefighting ratio: the most important number your team has never measured

2026-05-31 · 7 min read · by Think North

Imagine a city that runs a fire department but keeps no records. Not "sloppy records" — no records. Nobody knows how many calls came in last month, which neighborhoods they came from, or whether the same warehouse has caught fire six times. The fire chief, asked how things are going, says "busy!" and the city council nods and approves next year's budget, which is the same as last year's budget plus a little more, forever.

You would not tolerate this from a fire department. You are almost certainly tolerating it from your engineering team right now.

The ratio

Here is the question, and I want you to actually try to answer it for your own team before reading on: of all the engineering work you shipped last month, what share was chasing fragile code — fixes, hotfixes, workarounds, "small tweaks" to the thing that broke again — and what share was building new value?

Call it the firefighting ratio. It is, arguably, the purest measurement of Cunningham's interest that exists: technical-debt interest, when it comes due, is paid in exactly this currency — engineer-hours diverted from building to repairing. And here is the strange part: almost no team has ever measured it. Teams that A/B test button colors, that can tell you their conversion funnel to two decimal places, that argue about story-point velocity with religious intensity, have no number for what fraction of their most expensive resource is spent on repair.
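
Written down, the ratio is a single fraction. The symbols here are my own shorthand, not an established convention: H_repair is the engineer-hours spent on fixes, hotfixes and workarounds in a period, and H_total is all the engineer-hours shipped in that same period.

```latex
R = \frac{H_{\text{repair}}}{H_{\text{total}}}, \qquad 0 \le R \le 1
```

Hours are rarely recorded that cleanly, so in practice a count of commits or merged PRs usually stands in for them. That is a proxy, and it assumes repair and build changes are roughly comparable in size.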

The industry-wide estimates that do exist are grim. Stripe's Developer Coefficient report surveyed thousands of developers and put the share of the average developer's week spent on maintenance, debugging and bad code above 40% — a figure the report translated, at global scale, into hundreds of billions of dollars of lost capacity annually. Take the precise number with whatever salt you like; the direction is not controversial to anyone who has watched a sprint board. And in the CodeNSM fleet telemetry, when commit streams are joined to the runtime record, a pattern shows up over and over: repair work is not spread evenly across the codebase. It clusters — hard — on a small set of functions that are both fragile and load-bearing, the same intersection this series keeps returning to. Most of the fire department's calls come from the same few buildings. Nobody at the city council knows which ones.

Why nobody measures it

Three reasons, in ascending order of embarrassment.

  1. It's genuinely awkward to compute by hand. Classifying commits as "repair" versus "build" requires reading them, and nobody wants that job. (Adam Tornhill's Your Code as a Crime Scene showed a decade ago that version-control mining can do a great deal of this automatically — where changes cluster, which files churn together, where fixes concentrate. The techniques exist. Adoption, mostly, doesn't.)
  2. The available metrics point the other way. Velocity, commit counts and tickets-closed all count firefighting as work — which it is! It's just work that a healthier codebase wouldn't have generated. A team can post spectacular activity numbers while spending most of its calories on repair, and the dashboard will report a triumph. DORA's research program spent a decade establishing that delivery metrics distinguish elite teams from struggling ones; but even a beautiful deployment pipeline just means you can ship your hotfixes very efficiently.
  3. Nobody's incentivized to know. The engineers suspect the ratio is bad and don't want it weaponized. The managers suspect it's bad and don't want it escalated. The executives don't suspect anything, because the reporting channel — see reason 2 — is structurally incapable of carrying the signal. Everyone has quietly agreed not to look, the way a family agrees not to discuss the uncle.
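
On reason 1, the reading does not all have to happen by hand. Here is a minimal sketch of a first-pass classifier that labels commit subjects by keyword. The keyword list is a hypothetical starting point, the two-label split is a simplification, and this is a much cruder technique than the mining Tornhill describes; treat its output as a rough estimate to spot-check, not a measurement.

```python
import re

# Hypothetical keyword list; tune it to your team's commit conventions.
REPAIR_PATTERN = re.compile(
    r"\b(fix(es|ed)?|hotfix|bug|revert|workaround|patch|regression)\b",
    re.IGNORECASE,
)


def classify_subject(subject: str) -> str:
    """Label a commit subject as 'repair' or 'build' by keyword match."""
    return "repair" if REPAIR_PATTERN.search(subject) else "build"


def repair_share(subjects: list[str]) -> float:
    """Fraction of commit subjects classified as repair (0.0 if empty)."""
    if not subjects:
        return 0.0
    repairs = sum(1 for s in subjects if classify_subject(s) == "repair")
    return repairs / len(subjects)
```

The subjects can come from something like `git log --since="1 month ago" --pretty=format:%s`, one per line. Word boundaries keep "Prefix routes" from matching "fix", but a commit titled "Improve error handling" that is really a repair will slip through, which is why a manual sample is still worth the time.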

What the ratio tells you that nothing else does

The firefighting ratio is a health metric wearing a finance costume, and it answers questions that no activity metric can touch. A rising ratio with flat headcount means your debt interest is compounding faster than your capacity — the mathematical definition of "it gets worse from here." A ratio concentrated on one module tells you exactly where the next refactoring dollar earns the highest return (and per part 22, it is usually not the ugliest module — it's the busiest fragile one). A ratio concentrated on one person tells you your best firefighter has quietly become a single point of failure, spending the judgment you hired them for on rescue work instead of design.
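
Concentration is easy to check once repair commits carry a module and an author. A small sketch, assuming you already have the repair pile as (module, author) pairs; the module and author names below are invented for illustration.

```python
from collections import Counter
from typing import Iterable


def top_share(keys: Iterable[str]) -> tuple[str, float]:
    """Return the most frequent key and its share of all entries."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no repair commits to analyze")
    key, n = counts.most_common(1)[0]
    return key, n / total


# Hypothetical repair commits as (module, author) pairs.
repair_commits = [
    ("billing", "dana"),
    ("billing", "dana"),
    ("billing", "lee"),
    ("search", "dana"),
]

hot_module, module_share = top_share(m for m, _ in repair_commits)
hot_author, author_share = top_share(a for _, a in repair_commits)
print(f"{hot_module}: {module_share:.0%} of repair commits")
print(f"{hot_author}: {author_share:.0%} of repair commits")
```

The same function answers both questions: run it over modules to find where the refactoring money should go, and over authors to find the firefighter who has become a single point of failure.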

And the AI-era twist, because this series is about the AI era: generation-speed development pushes the ratio in both directions, on different schedules. New code gets cheaper, so build output swells the denominator and the ratio dips. Fragile code also gets shipped faster and reviewed thinner (parts 1 through 10 of this series, passim), so repair work swells the numerator a few months later. The delay is the killer. The quarter where AI assistance made you feel fastest and the quarter where the firefighting ratio spikes are not the same quarter, so nobody connects them without the number in hand.

Velocity measures how fast you're moving. The firefighting ratio measures what fraction of that movement is running back to fetch water. Teams track the first obsessively. Almost none track the second at all.

There's also a timing argument for measuring it, and it's the difference between a smoke detector and a fire report. Incident counts — the metric most orgs do track — are a lagging indicator: by the time something burns visibly enough to earn a postmortem, the underlying fragility has been billing you for quarters. The firefighting ratio moves earlier. A module starts consuming a growing share of repair commits long before it produces the incident that makes it famous, because engineers patch and re-patch quietly, at the level of "small fixes," safely under the postmortem threshold. Watch the ratio per module and you are effectively watching the smoke. Watch incidents and you're reading last year's fire report — beautifully formatted, thoroughly blameless, and useless for prevention.
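
Watching the smoke can be sketched as a per-module trend check. This assumes commits are already labeled and carry a "YYYY-MM" month string; the record shape, the function names and the two-step streak threshold are all illustrative choices, not a description of how any particular tool does it.

```python
from collections import Counter, defaultdict

# Hypothetical record shape: (month "YYYY-MM", module, kind),
# where kind is "repair" or "build".
Commit = tuple[str, str, str]


def module_repair_share(commits: list[Commit]) -> dict[str, dict[str, float]]:
    """Map month -> {module: share of that month's repair commits}."""
    per_month = defaultdict(Counter)
    for month, module, kind in commits:
        if kind == "repair":
            per_month[month][module] += 1
    return {
        month: {mod: n / sum(counts.values()) for mod, n in counts.items()}
        for month, counts in sorted(per_month.items())
    }


def rising_modules(shares: dict[str, dict[str, float]], streak: int = 2) -> list[str]:
    """Modules whose share rose in each of the last `streak` monthly steps."""
    months = sorted(shares)[-(streak + 1):]
    if len(months) < streak + 1:
        return []  # not enough history to call a trend
    modules = {mod for m in months for mod in shares[m]}
    flagged = []
    for mod in modules:
        series = [shares[m].get(mod, 0.0) for m in months]
        if all(a < b for a, b in zip(series, series[1:])):
            flagged.append(mod)
    return sorted(flagged)
```

A module that shows up in `rising_modules` has taken a larger slice of the repair pile for several months running. With small monthly counts the shares are noisy, so a longer streak or a minimum commit count is probably worth adding before anyone acts on the flag.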

Measure it crudely this week

You don't need tooling to get a first estimate; you need an hour and some honesty. Pull last month's merged PRs. Sort each one into build (new capability), repair (fixing or working around existing behavior), or toil (upgrades, migrations, chores). Count. The classification will be fuzzy at the edges and the number will still be the most informative thing your team learns this quarter. Then — this is the part that turns a statistic into a strategy — take the repair pile and ask which functions the fixes touched. If the same names keep appearing, congratulations: you've found the warehouse that keeps catching fire, and you found it with a spreadsheet. (Expect honest debate about edge cases — is a performance fix repair or build? Pick a rule, write it down, and apply it consistently; the trend only needs the rule to be stable, not perfect.)
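
If the spreadsheet feels like too much ceremony, the same exercise fits in a few lines. This sketch assumes the PRs have already been labeled by hand; the sample data and function names are made up, and counting toil in the denominator is one possible rule, not the only defensible one.

```python
from collections import Counter

# Hypothetical hand-labeled PRs: (category, functions touched).
prs = [
    ("build", ["export_csv"]),
    ("repair", ["parse_invoice", "round_total"]),
    ("repair", ["parse_invoice"]),
    ("toil", []),
    ("repair", ["parse_invoice", "send_receipt"]),
    ("build", ["search_index"]),
]

tally = Counter(category for category, _ in prs)

# Assumed rule: repair share of all merged PRs, toil included in the total.
firefighting_ratio = tally["repair"] / len(prs)

# Count how many repair PRs touched each function; repeats are the hot spots.
touched = Counter(
    fn for category, fns in prs if category == "repair" for fn in set(fns)
)
repeat_offenders = [(fn, n) for fn, n in touched.most_common() if n > 1]

print(dict(tally))
print(f"firefighting ratio: {firefighting_ratio:.0%}")
print(repeat_offenders)
```

Whichever denominator rule you pick, write it next to the number so next month's figure is comparable.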

The continuous version of that exercise is, naturally, an instrument rather than an afternoon (CodeNSM computes it by joining commit history to per-function runtime health, so the ratio and its addresses update themselves). But instrument or spreadsheet, the point stands: this is a knowable number, it is probably the largest unexamined line item in your engineering budget, and every month it goes unmeasured is a month the interest payments get relabeled as "work."


References

  1. Stripe (2018). The Developer Coefficient.
  2. Tornhill, A. (2024). Your Code as a Crime Scene, 2nd ed.
  3. Google Cloud — DORA research program.
  4. Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92.
