CodeNSM
The Problem · Part 22

Interest compounds in prod: why your ugliest code might be free

2026-05-29 · 7 min read · by Think North

Thought experiment. Two functions in your codebase, and I mean identical — same tangled logic, same 200 lines that should be 40, same error handling that catches Exception and whispers it into the void. Copy-paste twins. Your static analyzer gives them the same complexity score, your linter files the same seventeen complaints about each, and any engineer who opens either one makes the same face.

Function A lives in an admin export tool that one person runs each January to produce a spreadsheet for the accountants.

Function B sits on your checkout path and runs forty thousand times a day.

Question: how much technical debt do you have?

If your answer involves adding up linter findings, you just gave both functions the same weight, and you are doing debt accounting the way a bank would if it treated every loan as identical regardless of interest rate. Function A is a liability on paper that costs you almost nothing in practice — a mess, yes, but a mess in a locked room. Function B is billing you per execution: every swallowed error is a real customer's failed checkout, every unnecessary millisecond is multiplied forty thousand times, and every on-call page it generates is an engineer-afternoon you paid for twice (once to write the code, once to survive it).

Debt that sleeps vs. debt that hurts

Go back to the source. Cunningham's original 1992 formulation was more precise than the way it gets paraphrased: the danger, he wrote, occurs when the debt is not repaid, and every minute spent on not-quite-right code counts as interest on that debt. The cost he named is the interest, not the principal. The principal is just text sitting in a repository, and text sitting in a repository costs approximately nothing per day. The interest is charged at the moment of interaction: when the code runs, when the code fails, when a human has to touch it.

Which means debt has a property that almost no debt-measurement tool respects: it can sleep. Fragility on a dormant function accrues no interest. It's a bear trap in a forest nobody walks through. The same fragility on a load-bearing path is a bear trap in your front hallway, and the difference between those two situations is not visible in the code, because it isn't a property of the code. It's a property of the code's relationship with production traffic — a relationship your linter has never once been introduced to.

[Figure: cumulative interest paid versus time in production, comparing debt that sleeps (dormant path) with debt that hurts (hot path).]

The research record backs the intuition with unpleasant numbers. Tornhill and Borg's Code Red study — one of the few pieces of debt research done on real commercial codebases with outcome data — found that code in the lowest quality band carried substantially more defects and dramatically slower, less predictable change times than healthy code. But read their method carefully and you'll notice the quiet radicalism: they focused on hotspots, code that is both complex and frequently worked on. Low-quality code that nobody touches barely features in the damage statistics. The damage lives at the intersection. It always did.

Kruchten, Nord and Ozkaya made the same point from the management side more than a decade ago: the value of paying down a given piece of debt depends entirely on what the system is going to do next, meaning which parts will be exercised, extended, leaned on. Debt repayment is a portfolio decision. And you cannot make portfolio decisions from a report that lists every holding without a single price on it.

The linter-count fallacy

Here's the uncomfortable audit. Most teams that "track technical debt" track one of these: a static-analysis score, a count of code smells, a self-reported backlog of things engineers wish were nicer, or a vibe. Every one of these treats the codebase as a document — a thing to be read and graded like an essay. None of them contains the one variable that determines whether a given unit of ugliness costs you $0 or $40,000 a quarter: runtime load.

The result is a systematic misallocation that plays out in refactoring sprints everywhere. The team fixes the ugliest code (satisfying! visible in the diff!). But in the CodeNSM fleet telemetry, production call load is heavily Pareto-concentrated on a small minority of functions, so the ugliest code is quite often code that runs rarely or never. Meanwhile the moderately-ugly function carrying a double-digit percentage of all production traffic gets left alone, because it's only moderately ugly, and the essay-grading instruments rank it below the horrors in the locked room. You paid down the loan with the 0% interest rate and kept the one at 30%. Your accountant would like a word.
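If you want to check how concentrated your own call load is, a few lines of Python will do it. This is a minimal sketch that assumes you can already export per-function call counts from production; the helper name `functions_covering`, the function names and every number in `sample_counts` are invented for illustration and are not CodeNSM telemetry.

```python
def functions_covering(call_counts: dict[str, int], share: float = 0.8) -> list[str]:
    """Return the smallest set of busiest functions that carries `share` of all calls."""
    total = sum(call_counts.values())
    if total == 0:
        return []  # nothing ran, so nothing is hot

    hot: list[str] = []
    covered = 0
    # Walk functions from busiest to quietest until the target share is reached.
    for name, calls in sorted(call_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        hot.append(name)
        covered += calls
    return hot


# Hypothetical daily call counts, made up for this example.
sample_counts = {
    "checkout": 40_000,
    "login": 8_000,
    "search": 1_500,
    "admin_export": 1,
    "legacy_report": 0,
}

print(functions_covering(sample_counts, share=0.8))
print(functions_covering(sample_counts, share=0.95))
```

If the list that comes back is a small fraction of your function count, that short list is where a refactoring sprint has a chance of paying for itself.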

A linter counts your debts. Production sets their interest rates. Only one of those numbers should decide what you fix on Monday.

The three-question debt audit

You can get most of the way to honest debt accounting with three questions per candidate, asked in order:

  1. Does it run? If the function is dormant, its debt is deferred — possibly forever. (It's not free: dormant code still bills maintenance, migrations and cognitive load, which is why part 27 of this series will tell you to delete it. But it isn't compounding.)
  2. How hard does it run? Call volume is the multiplier on every flaw. A 0.5% error rate is a rounding error at ten calls a day and a standing incident at a hundred thousand.
  3. What happens when it fails? Position matters. The same failure rate means different things on a retry-wrapped background job and on the payment capture path, three frames from revenue.
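The three questions reduce to a small piece of arithmetic. The sketch below is one hedged way to write it down, not a standard formula: `Candidate`, `daily_interest` and the `cost_per_failure` weight are hypothetical names, and the sample values simply reuse the 0.5% error rate from question 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    calls_per_day: int       # question 2: how hard does it run?
    error_rate: float        # fraction of calls that fail, e.g. 0.005 for 0.5%
    cost_per_failure: float  # question 3: illustrative weight, not a measured value


def daily_interest(c: Candidate) -> float:
    """Expected failures per day, weighted by what each failure costs."""
    if c.calls_per_day == 0:  # question 1: does it run at all?
        return 0.0
    return c.calls_per_day * c.error_rate * c.cost_per_failure


# Same error rate, same weight per failure; only the call volume differs.
quiet = Candidate("admin_export", calls_per_day=10, error_rate=0.005, cost_per_failure=1.0)
busy = Candidate("checkout", calls_per_day=100_000, error_rate=0.005, cost_per_failure=1.0)

for candidate in (quiet, busy):
    print(f"{candidate.name}: {daily_interest(candidate):.2f} weighted failures per day")
```

With a weight of 1.0 the result is just expected failures per day: a twentieth of a failure at ten calls, five hundred at a hundred thousand. Raising `cost_per_failure` for the payment path and lowering it for a retry-wrapped job is how question 3 enters.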

Ugliness — the thing every existing tool measures — enters the calculation only after all three, as a modifier on how likely the flaw is to fire and how expensive it will be to touch. This is not a subtle reordering. It routinely inverts the entire priority list.

And yes: answering questions 1 through 3 requires knowing, per function, what production is actually doing — which is exactly the instrument most teams don't have and the reason the linter count became the default. (For the record, this weighting is how CodeNSM computes its debt tiers: fragility signals multiplied against observed load and position, so a quarantined horror ranks below a mildly stressed function on the checkout path. The arithmetic is deterministic and boring. The reordering it produces usually isn't.)
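To see the reordering happen, here is a toy version of that kind of weighting in Python. It is only a sketch of the shape described above (fragility multiplied against load and position) and should not be read as CodeNSM's actual arithmetic; the 0-to-1 scales, the names `FunctionStats`, `debt_score`, `rank_by_debt` and `rank_by_ugliness`, and all the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionStats:
    name: str
    fragility: float      # 0..1, from static signals (assumed scale)
    calls_per_month: int  # observed production load
    position: float       # 0..1, closeness to revenue (assumed scale)


def debt_score(f: FunctionStats) -> float:
    """Toy score: fragility weighted by how often and where the code runs."""
    return f.fragility * f.calls_per_month * f.position


def rank_by_debt(functions: list[FunctionStats]) -> list[FunctionStats]:
    return sorted(functions, key=debt_score, reverse=True)


def rank_by_ugliness(functions: list[FunctionStats]) -> list[FunctionStats]:
    """What a static-only report effectively does: sort by fragility alone."""
    return sorted(functions, key=lambda f: f.fragility, reverse=True)


# Hypothetical fleet: a quarantined horror and a mildly stressed checkout function.
fleet = [
    FunctionStats("admin_export", fragility=0.95, calls_per_month=1, position=0.1),
    FunctionStats("checkout_total", fragility=0.30, calls_per_month=1_200_000, position=1.0),
]

print([f.name for f in rank_by_ugliness(fleet)])
print([f.name for f in rank_by_debt(fleet)])
```

The two printed lists come out in opposite order, which is the whole argument in two lines of output. A plain product is the simplest choice here; a real scoring scheme would probably dampen raw call counts (for example with a logarithm) so that one very hot function does not flatten everything below it.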

Compounding is a schedule, not a metaphor

One last turn of the screw. Interest on hot-path debt doesn't just accrue — it compounds, for a concrete mechanical reason: fragile load-bearing code repels improvement. Engineers learn to fear it, so changes get bolted around it rather than into it, so the mess grows an accretion disk of workarounds, each of which is new principal at the same terrible interest rate. Every month you don't pay, the payment gets larger and the number of people willing to attempt it gets smaller. Meanwhile the dormant horror in the admin tool just… sits there. Identical code. One of them is a time bomb with a payment schedule; the other is modern art.

So before your next refactoring sprint, ask for one number next to every item on the debt backlog: how many times did this code run in production last month? If nobody can produce that column, that's not a gap in the spreadsheet. That's the finding.
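Producing that column can be as simple as a join. The following is a minimal Python sketch, assuming the backlog is a list of function names and that call counts for last month come from whatever runtime instrumentation you have; `add_run_column` and the sample data are hypothetical.

```python
def add_run_column(
    backlog: list[str], calls_last_month: dict[str, int]
) -> tuple[list[tuple[str, int]], list[str]]:
    """Attach last month's call count to each backlog item.

    Returns (measured, unmeasured): measured items sorted busiest first,
    and the items for which no production count exists at all.
    """
    measured = [(item, calls_last_month[item]) for item in backlog if item in calls_last_month]
    measured.sort(key=lambda row: row[1], reverse=True)
    unmeasured = [item for item in backlog if item not in calls_last_month]
    return measured, unmeasured


# Hypothetical backlog and telemetry export.
backlog = ["admin_export", "checkout_total", "invoice_pdf"]
calls_last_month = {"admin_export": 0, "checkout_total": 1_200_000}

measured, unmeasured = add_run_column(backlog, calls_last_month)
for name, calls in measured:
    print(f"{name}: {calls} calls last month")
print("no production data:", unmeasured)
```

Note that a count of zero and a missing count are kept apart on purpose. Zero means the code is dormant. Missing means nobody knows, and that list is the gap the paragraph above is talking about.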

References

  1. Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92.
  2. Tornhill, A. & Borg, M. (2022). Code Red: The Business Impact of Code Quality.
  3. Kruchten, P., Nord, R. & Ozkaya, I. (2012). Technical Debt: From Metaphor to Theory and Practice. IEEE Software.
  4. Fowler, M. (2009). Technical Debt Quadrant.
