The Problem · Part 26

Hiring a tech lead vs. renting judgment

2026-06-04 · 7 min read · by Think North

Here's a question that sounds rude but is actually just accounting: when you pay a great tech lead their very great salary, what exactly are you buying?

Not code volume — the juniors and the models out-produce them ten to one, and everybody knows it. Not meetings, though heaven knows you get those. Watch a genuinely good tech lead for a week with a stopwatch and a notepad, and what you'll find is that the scarce, expensive thing they do all day is make roughly two hundred small decisions of a single type: where to spend attention.

An anatomy of the two hundred decisions

Their morning looks like nothing. Coffee, scroll, coffee. But run the tape slowly:

Forty-one PRs await review. They read three of them line-by-line, skim nine, and approve the rest on trust. Which three? The ones touching the payment path, the auth gate, and the scheduler — because they know, without consulting anything, which code is load-bearing and which is decorative.
An error rate ticked from 0.3% to 0.9% on some internal endpoint. Two hundred alerts fired last night; this one wasn't among them. They notice it anyway, because that particular function sits upstream of invoicing and 0.9% is how last year's incident started.
A junior proposes refactoring a gnarly module. The lead says no — not because the module is fine (it's horrible) but because it's horrible and dormant, and the junior's month is worth more elsewhere.
A diff lands in the vendor-integration layer and the lead reads it twice, because that layer is where the outside world's failures come in wearing your company's uniform.

None of these decisions is visible in any metric your company collects. Each is an act of triage: allocating a fixed budget of expert attention across a codebase that exceeds it by orders of magnitude. And the research on code review suggests triage is where much of review's value lives. Bacchelli and Bird's 2013 Microsoft study, "Expectations, Outcomes, and Challenges of Modern Code Review," found that understanding the code and the change is the central challenge of review, and that reviewers who already know the context are the ones best placed to find real defects rather than surface issues. The lead's map of what matters isn't an input to their judgment. It is the judgment.
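To make the triage concrete, here is a minimal sketch of the morning's forty-one PRs, assuming the lead's map has already been written down as a list of load-bearing path prefixes. Everything named here (`LOAD_BEARING`, `PullRequest`, `triage`, the budgets) is hypothetical and chosen to mirror the numbers above; the hard part, knowing what belongs in that list, is exactly what the code does not do.

```python
from __future__ import annotations
from dataclasses import dataclass

# Assumption: the lead's map, written down as path prefixes.
LOAD_BEARING = ("payments/", "auth/", "scheduler/")


@dataclass(frozen=True)
class PullRequest:
    number: int
    paths: tuple[str, ...]  # files touched by the diff


def risk(pr: PullRequest) -> int:
    """Count touched files that live under a load-bearing prefix."""
    return sum(p.startswith(LOAD_BEARING) for p in pr.paths)


def triage(prs, deep_budget=3, skim_budget=9):
    """Spend a fixed attention budget on the riskiest PRs first."""
    ranked = sorted(prs, key=lambda pr: (-risk(pr), pr.number))
    plan = {}
    for i, pr in enumerate(ranked):
        if i < deep_budget and risk(pr) > 0:
            plan[pr.number] = "read line-by-line"
        elif i < deep_budget + skim_budget:
            plan[pr.number] = "skim"
        else:
            plan[pr.number] = "approve on trust"
    return plan
```

The budget is the point of the sketch: the ranking only decides where a fixed amount of attention goes, not how much attention exists.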

Why the layer is scarce

Now ask where that map came from, and you hit Naur again — because in this series, you always hit Naur. "Programming as Theory Building": the map is a theory, built the only way theories of running systems get built — by being present while the system ran. The lead knows the invoice function is touchy because they were on call the night it wasn't. They know which vendor lies about timeouts because they personally caught it lying. The map is scar tissue, and scar tissue has brutal economics: it takes years to form, it cannot be transferred by documentation (people have tried; the wiki is where maps go to die), it walks out the door in one two-week notice, and — the AI-era twist this series keeps circling — it has stopped forming in the next generation, because the juniors are prompting instead of getting burned. You are not just employing a scarce resource. You are employing a scarce resource whose replacement pipeline has quietly shut down.

So companies rent. Fractional CTOs, advisory tech leads, the consultant who parachutes in for two days a quarter and renders judgment. And renting can work for decisions — architecture calls, hiring bars, vendor picks. But notice what the rented expert is missing: the two hundred daily decisions were powered by continuous observation, and the consultant gets a snapshot. Asking a fractional expert "what's fragile here?" is asking someone to diagnose a patient from a single photograph. A good one will read the code and the commit history and get somewhere real. What they cannot do — what no human parachuting in can do — is know how the system has been behaving: what runs hot, what drifts, what quietly stopped being called in March.

A tech lead's judgment is a decision layer sitting on top of an observation layer. The judgment is irreplaceably human. The observation layer is just… instrumentation nobody built.

The decomposition nobody does

That sentence is the whole post, so let's stress-test it. Go back through the four decisions in the anatomy and split each into its observation and its judgment:

Which three PRs to read deeply? Observation: which functions carry production load. Judgment: how carefully to read a diff that touches them. The observation is a traffic table. It's mechanical.
Which sub-alarm anomaly matters? Observation: error rate drift against each function's own baseline, weighted by what sits downstream. Mechanical again — tedious for a human, trivial for an instrument that never sleeps and watches every function instead of the famous ones.
Which horrible module to leave alone? Observation: dormancy. A counter.
Which layer deserves double reading? Observation: this function's job is talking to third parties, and that job class fails differently. That's a classification — the kind a rule engine can make deterministically, the same way every time.
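The four observations above are small enough to sketch in a few lines each. This is an illustration under stated assumptions, not anyone's actual implementation: `FunctionStats`, its fields, the 30-day window and the `THIRD_PARTY_MARKERS` rule table are all hypothetical, and a real instrument would need real telemetry behind them.

```python
from __future__ import annotations
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class FunctionStats:
    name: str
    calls_last_30d: int             # production call count
    error_rates: tuple[float, ...]  # past daily error rates, oldest first
    current_error_rate: float
    imports: frozenset[str]         # modules imported by the function's file


# Assumption: a tiny rule table mapping an import to a job class.
THIRD_PARTY_MARKERS = {"requests": "vendor-integration", "stripe": "payments"}


def load_rank(stats):
    """Traffic table: functions ordered by production load, busiest first."""
    return sorted(stats, key=lambda s: s.calls_last_30d, reverse=True)


def drift_score(s, min_sigma=1e-4):
    """Standard deviations above this function's own error-rate baseline."""
    baseline = mean(s.error_rates)
    sigma = max(pstdev(s.error_rates), min_sigma)  # avoid dividing by zero
    return (s.current_error_rate - baseline) / sigma


def is_dormant(s):
    """Dormancy is a counter: no production calls in the window."""
    return s.calls_last_30d == 0


def job_class(s):
    """Deterministic classification: same inputs, same answer, every time."""
    for marker in sorted(THIRD_PARTY_MARKERS):
        if marker in s.imports:
            return THIRD_PARTY_MARKERS[marker]
    return "internal"
```

Note that `drift_score` compares each function only to its own history, which is why a move from 0.3% to 0.9% can stand out on a quiet endpoint without ever crossing a global alert threshold.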

Every one of the two hundred decisions decomposes the same way: a mechanical observation a human is currently making expensively, incompletely and from memory, feeding a judgment only a human should make. The tragedy of the scarce tech lead isn't that their judgment can't scale — it's that we've been spending their judgment-hours on the observation half, making our most expensive layer do the work of a sensor. (This decomposition is, not coincidentally, CodeNSM's design brief: classify every function's job, watch every function's behavior against its own baseline, and hand the resulting worry-list to the human — so the lead reviews exceptions instead of hunting for them. The instrument doesn't replace the lead. It replaces the lead's insomnia.)
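What "reviewing exceptions instead of hunting for them" might look like, as a minimal sketch: take per-function observations and return only the ones worth a human's attention, most urgent first. This is not CodeNSM's implementation, which the post does not describe; `Observation`, `downstream_weight` and the threshold of 3.0 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    function: str
    job_class: str            # e.g. "payments", "vendor-integration"
    drift_sigma: float        # deviation from the function's own baseline
    dormant: bool
    downstream_weight: float  # assumption: 1.0 means nothing critical downstream


def worry_list(observations, threshold=3.0):
    """Return (function, reason) pairs for the human, most urgent first."""
    scored = []
    for o in observations:
        if o.dormant:
            continue  # horrible-but-dormant code is left alone
        urgency = o.drift_sigma * o.downstream_weight
        if urgency >= threshold:
            reason = f"error rate drifting in {o.job_class} code"
            scored.append((urgency, o.function, reason))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, reason) for _, name, reason in scored]
```

The function stops at the list on purpose. Deciding what to do about each entry is the judgment half, and that stays with the lead.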

What to do with this, by size

If you have a great tech lead: instrument the observation layer before they burn out or leave, because right now your company's situational awareness has a bus factor of one, and Part 25's CEO is reading the company's condition through that single skull. If you're hiring one: notice that you're really hiring two things, and the observational half of the role can be online before their first day. That shortens the years-of-scar-tissue ramp to something closer to months, because the new lead inherits a map instead of having to get burned into one. And if you can't afford one (the position most AI-accelerated small teams are actually in, shipping senior-scale volume with no senior), then understand clearly what you're missing. It isn't typing. It's the two hundred decisions. Renting a human for two days a quarter buys you a photograph. The layer underneath the decisions, at least, can now be a live feed.

And a coda on the apprenticeship pipeline, because instrumenting the observation layer turns out to matter for it too. The old way juniors became leads was osmotic: years of sitting near the map-holder, absorbing which functions deserve fear, one overheard war story at a time. That channel is dying with the rest (Part 3 of this series). But a junior who spends a year working from an explicit, living map — this is load-bearing, this is drifting, this quietly went dormant in March — is being taught the lead's attention-allocation habits by the instrument itself, one worry-list at a time. It isn't scar tissue. It's the closest thing to scar tissue that doesn't require the burns.

The judgment stays scarce, human, and worth every dollar. The seeing never had to be.
