Ten thousand lines a day
Let's do some arithmetic that nobody at your company has done on purpose.
A careful engineer can deeply read — actually understand, not eyeball — maybe a few hundred lines of unfamiliar code in a day. Call it 500 on a generous day with good coffee and no meetings, which is to say, call it 500 on a day that doesn't exist.
Now count the other side. A mid-size team running agentic tooling can comfortably add thousands of net new lines a day — ten thousand on a big day, and the big days are getting more frequent. GitClear's year-over-year analysis of hundreds of millions of changed lines documents the wave: code volume climbing steeply with assistant adoption, and climbing fastest in exactly the categories (added and copy-pasted lines, not moved-and-consolidated lines) that create new surface to understand rather than compressing old surface.
So: comprehension grows linearly with human reading hours, which are flat. Code grows with machine output, which is compounding. Two curves. One flat, one exponential.
You already know what happens to the space between two curves like that.
The gap has a name, and it's the right name
Call the space between the curves comprehension debt: the ever-growing portion of your codebase that exists, runs, and carries your revenue, but that no living person currently understands.
Why "debt"? Because it behaves exactly like Cunningham's original financial metaphor, with one upgrade: the borrowing is now involuntary. Every line that ships un-understood is a small loan against the future — and the interest gets charged, with total reliability, at the worst possible moments: the incident where nobody can say what the failing module assumes; the migration where nobody knows which behaviours are load-bearing; the security review where nobody can answer "what does this actually do with the token?"
The intellectual anchor here is fifty years old and has never been more current. Peter Naur's "Programming as Theory Building" (1985) argued that the program is not the source text — the program is the theory of it held in human minds, and the text is merely the theory's residue. A team that holds the theory can modify the system swiftly and safely; a team that holds only the text is reduced to guesswork and superstition, no matter how good the text is. Naur's nightmare scenario was a team dissolving and the theory dying with it. Our era found a more efficient route: we now generate the text without the theory ever existing. There is nothing to lose because nothing was built. The residue arrives fresh, pre-orphaned.
And before AI, comprehension was already the dominant cost of software. Xia and colleagues' large-scale field study of professional developers — published in IEEE Transactions on Software Engineering — measured developers spending on the order of 58% of their time on program comprehension alone. Reading, tracing, reconstructing intent. That was the price of understanding human-written code, produced at human speed, by authors you could interview at the coffee machine. Now remove the authors and multiply the volume.
Why this is the actual bottleneck (and not a philosophical one)
"So what if nobody understands it? It works." Sure — until you need to do literally anything to it. Here's the thing about comprehension: it's upstream of every other engineering activity you care about. To change code safely you must understand it. To debug it, review it, secure it, optimize it, delete it — understand it, understand it, understand it. Comprehension isn't one input among many; it's the substrate. Ousterhout's A Philosophy of Software Design puts the same point structurally: complexity is anything that makes a system hard to understand and modify, and it accumulates unless actively fought — his whole book is about spending design effort to keep systems comprehensible because comprehensibility is capacity.
Which means the gap between the curves isn't an abstract shame. It's a hard ceiling on your organization's ability to act. Every function inside the gap is a function you can't confidently change, can only cargo-cult around, must treat as a haunted room in your own house. As the gap grows, an increasing share of engineering work becomes negotiating with the unknown: defensive copies, wrapper layers, "let's not touch that module," regenerating whole components because modifying them requires understanding nobody has. (Regeneration, note, adds fresh un-understood code. The debt refinances itself. At a worse rate.)
You can even watch the ceiling descend in your own planning meetings, if you know the tell. It sounds like this: an estimate that should be two days comes back as two weeks, and when you ask why, the answer isn't about the work — it's about the archaeology before the work. "We'd need to figure out what the export pipeline actually does first." That sentence is the gap, invoiced. Multiply it by every ticket that grazes an un-understood region, and you have a tax on your entire roadmap that appears on no budget line anywhere.
THE MODEL WRITES FASTER THAN YOUR TEAM CAN READ. That's the whole crisis, one sentence, no adjectives needed.
The comforting lie and the useful truth
The comforting lie is that AI will solve the comprehension problem it created — just ask the model what the code does! And genuinely, models are useful readers; a good summary beats no summary. But a summary is text about the text. It is not the theory. Naur was specific about this: the theory is the capacity to answer the unasked questions — why this shape and not another, what breaks if you push here, which parts are load-bearing — and that capacity can't be handed over as prose; a model's explanation of un-understood code is a plausible reconstruction by a reader with no privileged access to intent, delivered (see Part 6) in a voice that sounds identical whether it's right or guessing.
The useful truth is that you can't close the gap, but you can do something almost as good: find out where it matters. Because here's what the two-curves picture hides — the gap is not uniformly dangerous. Most un-understood code is cold: rarely called, low blast radius, safely haunted. A small fraction is hot: carrying real traffic, sitting on the request path, failing in ways that cost money. Production traffic is savagely concentrated in a minority of functions (in CodeNSM's own fleet telemetry the concentration is strong enough that we pre-registered it as a formal research hypothesis), which means the comprehension debt that can actually hurt you is a small, findable subset — if you can see which functions are doing the work. That's the triage runtime telemetry makes possible: not "understand everything" (impossible, forever) but "know exactly which un-understood functions are load-bearing, and spend your scarce human reading hours there."
Flat curve, meet prioritization. It's the only move the geometry allows.
The question that finds your gap
One diagnostic, free of charge. Ask your team: "Name the ten functions where an unexplained failure would hurt us most — and for each, name the person who could explain it today." Not who wrote it. Who could explain it, now, at a whiteboard.
The first few answers will come fast. Somewhere around number five, the room gets quiet, and someone says the most 2026 sentence there is:
"I mean… the code exists. Somebody could figure it out."
Somebody. Could. That's the gap talking. It's polite now, while everything works.
It won't stay polite.
References
- Naur, P. (1985). Programming as Theory Building.
- Xia, X. et al. (2018). Measuring Program Comprehension: A Large-Scale Field Study with Professionals. IEEE TSE.
- GitClear — AI Copilot Code Quality: 2025 research on churn, duplication and moved code.
- Ousterhout, J. (2018). A Philosophy of Software Design.
- Cunningham, W. (1992). The WyCash Portfolio Management System. OOPSLA '92.