CodeNSM
The Problem · Part 25

The non-technical CEO's survival guide to an engineering org you can't audit

2026-06-02 · 7 min read · by Think North

Let's set the scene precisely, because if you're a non-technical CEO of a software company, this is your actual job description and nobody ever reads it out loud:

You own a restaurant. You are contractually forbidden from entering the kitchen. Your entire knowledge of the kitchen arrives via the kitchen staff, who are decent, hard-working people and who also happen to be the ones being evaluated by what they tell you. Your customers are the health inspector. You find out about problems when they do.

Most CEOs in this position develop one of two coping styles. Style A is faith: hire someone impressive and stop asking. Style B is theater: demand dashboards, get dashboards, mistake dashboards for knowledge. Both styles fail the same way — slowly, then suddenly, usually in a quarter when something load-bearing breaks and the post-incident conversation reveals that everyone in engineering had known about it for a year.

There is a Style C. It doesn't require you to learn to code (please don't learn to code for this; that's like learning to butcher because you own a restaurant). It requires you to know which questions to ask, which answers should worry you, and — the part almost every executive gets wrong — the difference between activity metrics and health metrics.

The weekly five

Ask these every week, in some form. The point is not the answers on any given week; it's watching whether the answers exist and how they move.

  1. "What's the most fragile thing we run, and how much traffic does it carry?" A healthy org answers with a name and a number ("the invoice generator; it's on every checkout"). A worrying answer is a shrug, a subject change, or — the classic — "it's complicated." Fragility they can't locate is fragility nobody is watching.
  2. "What share of this month's engineering went to repair versus new value?" This is the firefighting ratio from Part 23, and it is the closest thing to a debt-interest line item your P&L will ever get. Worrying answer: any answer with no number in it. Stripe's Developer Coefficient research put the industry's average share of developer time lost to maintenance and bad code above 40% — if your org's number is unknown, assume you're average and act accordingly.
  3. "What breaks if [most senior engineer] disappears for a month?" Peter Naur argued that the real program is the theory in your builders' heads. This question asks how much of your company is stored in skulls. A worrying answer is a specific name said with a nervous laugh. A terrifying answer is your own CTO's name, said by your CTO.
  4. "What did we ship in the last quarter that nothing in production actually calls?" Dormant code is payroll without output: you paid to build it, and you pay again every time it's migrated, patched or worked around. Worrying answer: the concept is unfamiliar. An org that can't measure dormancy can't measure utilization, and utilization is what you think you're buying when you approve headcount.
  5. "How do we know the AI-written code is holding up — as a cohort?" Not "do we review it" (everyone says yes; Part 4 of this series was about why that's mostly theater). The question is whether anyone tracks machine-drafted code's behavior in production versus everything else's. GitClear's longitudinal data has tracked rising churn and duplication as assistant adoption climbed; the 2024 DORA report found AI adoption associated with small drops in delivery throughput and stability even as satisfaction rose. Your org is running this experiment either way. Worrying answer: you're running it without a control group.
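Questions 2 and 5 both come down to simple arithmetic once the underlying records are tagged. Here is a minimal Python sketch, under stated assumptions: each work item carries hours and a kind label ("repair" or "new"), and each production function carries an origin label ("ai" or "human") plus call and error counts. The record shapes and field names are hypothetical, invented for illustration; they are not the schema of any real tracker or monitoring tool.

```python
from dataclasses import dataclass

# Hypothetical record shapes; field names are illustrative assumptions,
# not the schema of any real tracker or monitoring tool.


@dataclass
class WorkItem:
    hours: float
    kind: str  # "repair" (bugs, incidents, rework) or "new" (features)


@dataclass
class FunctionStats:
    name: str
    origin: str  # "ai" for machine-drafted code, "human" for everything else
    calls: int   # production calls in the period
    errors: int  # production errors in the period


def firefighting_ratio(items: list[WorkItem]) -> float:
    """Share of engineering hours spent on repair rather than new value."""
    total = sum(i.hours for i in items)
    if total == 0:
        return 0.0
    repair = sum(i.hours for i in items if i.kind == "repair")
    return repair / total


def error_rate_by_cohort(stats: list[FunctionStats]) -> dict[str, float]:
    """Errors per call for each origin cohort, so AI-drafted code gets a control group."""
    calls: dict[str, int] = {}
    errors: dict[str, int] = {}
    for s in stats:
        calls[s.origin] = calls.get(s.origin, 0) + s.calls
        errors[s.origin] = errors.get(s.origin, 0) + s.errors
    # Skip cohorts with no traffic to avoid dividing by zero.
    return {o: errors[o] / calls[o] for o in calls if calls[o] > 0}


if __name__ == "__main__":
    month = [WorkItem(120, "repair"), WorkItem(180, "new")]
    print(f"firefighting ratio: {firefighting_ratio(month):.0%}")

    prod = [
        FunctionStats("render_invoice", "human", 10_000, 20),
        FunctionStats("apply_discount", "ai", 5_000, 25),
    ]
    for origin, rate in error_rate_by_cohort(prod).items():
        print(f"{origin}: {rate:.2%} errors per call")
```

The hard part is not the division; it is whether anyone labels work as repair and code as machine-drafted in the first place. If those labels don't exist, neither number can.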

Activity is not health

Now the conceptual upgrade that makes the whole exercise work. Nearly everything engineering reports upward — commits, story points, velocity, tickets closed, even deployment frequency — is an activity metric. Activity metrics answer "are people working?" They cannot answer "is the asset okay?", because they measure the production of code, not the condition of it. The SPACE framework authors — Forsgren, Storey and colleagues, writing after a decade of DORA research — were blunt about this: no single activity measure captures productivity, and raw activity counts are the most seductive and least meaningful of all. A team heroically firefighting a rotten codebase posts spectacular activity numbers. So does a team confidently generating unreviewable debt at machine speed. The dashboard cannot tell those teams from a healthy one. That's not a flaw in your dashboard. It's a category error in what's being measured.

You would fire a CFO who reported "we did lots of accounting this month" instead of a balance sheet. Velocity is "we did lots of engineering this month." Demand the balance sheet.

Health metrics are properties of the asset: error rates on revenue paths and their trend. Latency drift against baseline. The dormant share. The firefighting ratio. The fragile-and-load-bearing register, ranked by traffic. None of these require interpreting anyone's effort or intent, which is exactly why they can be reported to a non-engineer without translation loss — a number that comes from production behavior has no opinion about who deserves a good quarter. (Making those health metrics exist per-function, continuously, without asking your team to hand-compile them is the instrument-shaped problem CodeNSM was built for; but the questions above work with or without it, and the worrying answers are worrying either way.)
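To make "properties of the asset" concrete, here is a rough Python sketch of two of the health metrics named above: the dormant share and the fragile-and-load-bearing register. It assumes a hypothetical per-function snapshot with a production call count and a team-supplied fragility flag. The field names, and the choice to count dormancy per function rather than per line of code, are assumptions for illustration, not a description of how CodeNSM or any other tool computes them.

```python
from dataclasses import dataclass

# Hypothetical per-function snapshot; the fields and the fragility flag
# are assumptions for illustration, not output from any specific tool.


@dataclass
class FunctionHealth:
    name: str
    calls_last_quarter: int  # production calls observed in the quarter
    fragile: bool            # flagged by the team as fragile


def dormant_share(funcs: list[FunctionHealth]) -> float:
    """Fraction of functions that nothing in production called."""
    if not funcs:
        return 0.0
    dormant = sum(1 for f in funcs if f.calls_last_quarter == 0)
    return dormant / len(funcs)


def fragile_register(funcs: list[FunctionHealth]) -> list[FunctionHealth]:
    """Fragile functions that carry traffic, busiest first."""
    live_fragile = [f for f in funcs if f.fragile and f.calls_last_quarter > 0]
    return sorted(live_fragile, key=lambda f: f.calls_last_quarter, reverse=True)


if __name__ == "__main__":
    snapshot = [
        FunctionHealth("render_invoice", 90_000, True),
        FunctionHealth("legacy_export", 0, True),
        FunctionHealth("apply_discount", 40_000, False),
        FunctionHealth("sync_ledger", 120_000, True),
    ]
    print(f"dormant share: {dormant_share(snapshot):.0%}")
    for f in fragile_register(snapshot):
        print(f"{f.name}: {f.calls_last_quarter} calls")
```

Both functions read only production counts and a flag. Neither takes effort, intent, or authorship as an input, which is the property that lets the result travel to a non-engineer unchanged.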

What Style C sounds like

The goal isn't surveillance, and if it becomes surveillance you've failed — your best engineers have been trying to tell you about the fragile stuff for years through a channel that had no numbers in it. The goal is to change the weekly conversation from assertion to allocation. "The codebase is in decent shape" becomes "the health number is 71, down 3, and the drop is two functions on the billing path — here's the plan." The six-week estimate that used to feel like haggling becomes "four of those weeks are the fragile module we flagged in March; pay the interest or refinance." Your engineers get to point at the same instrument you're reading, which protects them exactly as much as it informs you.

A note about your CTO, before you forward this

Everything above could be misread as a manual for auditing your CTO into a corner, so let's be precise: your CTO is not the problem. They are flying the same instrument-free cockpit you are, one seat forward, and most of them have been quietly uncomfortable about it for years — they simply learned early that reporting unmeasurable dread upward is a career-limiting move, so they translated it into the reassuring dialect every executive eventually becomes fluent in. When you start asking the weekly five, frame them as shared instrumentation, not interrogation: "I want us both looking at the same gauges" lands very differently from "prove it." Watch the reaction, though, because it's diagnostic. A CTO who lights up — who has been WAITING for someone to ask for the fragility register — is a keeper. A CTO who treats the questions themselves as an insult is telling you that the current opacity is load-bearing for someone, and you should wonder for whom. Either way you learn something no org chart, and no exit interview, would ever have told you.

Run the weekly five in your next leadership meeting. Out loud. Watch the room.

References

  1. Forsgren, N., Storey, M-A. et al. (2021). The SPACE of Developer Productivity. ACM Queue.
  2. Stripe (2018). The Developer Coefficient.
  3. Naur, P. (1985). Programming as Theory Building.
  4. Google Cloud (2024). DORA Accelerate State of DevOps Report.
  5. GitClear. Coding on Copilot: AI's downward pressure on code quality.
