Three Kinds of Teacher
Three systems, all routinely called “AI”:
- A spam filter that catches a phishing email it has never seen before.
- Discover Weekly, which groups you with strangers who share your taste — though no one ever labelled anyone's taste.
- AlphaZero, which reached superhuman chess in hours — having studied zero human games.
Here's the puzzle: none of the three was taught the same way. One had an answer key. One had no answers at all. One had only a score.
Before reading on, try to match them. Which one had the answer key? Which had nothing? Which had only a score?
AlphaZero saw zero human games and became superhuman anyway. What was its teacher?
Pick one — committing first is what makes the answer stick.
The tempting mental model is that “training an AI” is one activity — feed in data, intelligence comes out.
That model is understandable: from the outside, all three systems just consumed data and got smart. But it hides the variable that shapes almost everything about an AI project: what kind of feedback the machine gets. Change the teacher and you change what's learnable, what data you need, and what can go wrong.
Supervised learning — the answer key. Every example comes with the correct output (spam / not spam), and the model adjusts until its answers match. Most deployed business ML — fraud flags, medical image triage, demand forecasts — is this. Its bottleneck: someone must produce the labels.
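To make "the model adjusts until its answers match" concrete, here is a minimal sketch of a perceptron, one of the simplest supervised learners. The four labelled emails and the function names are invented for illustration; a real spam filter would use far more data and a stronger model.

```python
from collections import defaultdict

# Hypothetical toy data: every example ships with its label (the "answer key").
LABELLED = [
    ("win a free prize now", "spam"),
    ("claim your free reward", "spam"),
    ("meeting moved to friday", "ham"),
    ("lunch on friday with the team", "ham"),
]

def train(examples, epochs=10):
    """Perceptron: nudge word weights whenever a prediction disagrees with the label."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            target = 1 if label == "spam" else -1
            score = sum(weights[w] for w in text.split())
            predicted = 1 if score > 0 else -1
            if predicted != target:
                # Wrong answer: move each word's weight toward the correct label.
                for w in text.split():
                    weights[w] += target
    return weights

def predict(weights, text):
    """Classify unseen text; words never seen in training contribute nothing."""
    score = sum(weights.get(w, 0.0) for w in text.split())
    return "spam" if score > 0 else "ham"

weights = train(LABELLED)
print(predict(weights, "free prize inside"))    # an email the model never saw
print(predict(weights, "team meeting friday"))
```

Notice that the only thing driving learning is the comparison between the prediction and the label. Take the labels away and this loop has nothing to correct against, which is the bottleneck the paragraph above describes.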
Unsupervised learning — no answers at all. The model finds structure that was already in the data: clusters of listeners with similar taste, groups of transactions that look alike, the odd one out (that's anomaly detection). Nobody defines the categories; they emerge. Its bottleneck: the structure it finds may not be the structure you meant.
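As a rough illustration of categories that "emerge", the sketch below runs k-means clustering on six made-up listeners. The data, the choice of two clusters, and the naive initialisation are all assumptions for the example; this is not how any particular music service works.

```python
# Hypothetical listeners: (hours of jazz, hours of metal) per week. No labels anywhere.
LISTENERS = [(9, 1), (8, 2), (10, 0), (1, 9), (2, 8), (0, 10)]

def nearest(point, centres):
    """Index of the centre closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centre)) for centre in centres]
    return dists.index(min(dists))

def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centre, then move each centre."""
    centres = list(points[:k])  # naive init: the first k points (an assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster goes empty
                centres[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centres

labels, centres = kmeans(LISTENERS, k=2)
print(labels)
print(centres)
```

The algorithm is never told "jazz fan" or "metal fan"; it only sees distances. That is also why the bottleneck is real: with different features or a different k, it would happily return a grouping you did not intend.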
Reinforcement learning — the score. No examples, no labels; just actions, consequences, and a reward signal to maximise. It's how AlphaZero learned chess, how robots learn to walk, and (this surprises people) the finishing school for most of the chatbots you use: after pretraining, models like Claude and GPT are tuned with reinforcement learning from human feedback (RLHF), where the “score” comes from humans preferring one answer over another. Its bottleneck: you get exactly what you score, which deserves its own lesson.
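The "actions, consequences, reward" loop can be sketched with one of the simplest reinforcement learning setups, an epsilon-greedy multi-armed bandit. The payoffs, noise range, and parameter values below are invented for illustration; AlphaZero and RLHF use far more elaborate methods.

```python
import random

# Hypothetical environment: three actions with hidden average payoffs.
# The learner never sees these numbers, only the reward after each action.
HIDDEN_PAYOFFS = [0.2, 0.5, 0.8]

def pull(action, rng):
    """Return a noisy reward for the chosen action."""
    return HIDDEN_PAYOFFS[action] + rng.uniform(-0.1, 0.1)

def learn(steps=500, epsilon=0.1, seed=0):
    """Epsilon-greedy: mostly repeat the best-scoring action, sometimes explore."""
    rng = random.Random(seed)
    n = len(HIDDEN_PAYOFFS)
    estimates = [0.0] * n   # running average reward per action
    counts = [0] * n        # how often each action was tried
    for step in range(steps):
        if step < n:
            action = step                              # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n)                  # explore
        else:
            action = estimates.index(max(estimates))   # exploit
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental update of the running mean for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = learn()
best = estimates.index(max(estimates))
print(best, counts)
```

Nothing here ever tells the learner which action is correct. It only receives a number after acting, so whatever that number rewards is what it converges on.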
One 2026 addendum the textbooks of 2023 underplayed: the biggest models of all use a fourth trick — self-supervised learning — supervised learning where the labels are free. Take any sentence from the internet, hide the next word, and the hidden word is the label. That's how LLMs turn the whole internet into an answer key with zero human labelling. It's the reason they could scale to trillions of training words while CAPTCHA-style labelling never could.
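The "hide the next word" idea can be sketched in a few lines. The tiny corpus and the bigram counter below are stand-ins chosen for illustration; real LLMs work on tokens and train neural networks, not count tables, but the way the labels are produced is the same in spirit.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for "the internet": raw text, no human labels.
TEXT = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def make_pairs(text):
    """Hide each next word: (previous word, hidden word) pairs. The text labels itself."""
    words = text.split()
    return list(zip(words, words[1:]))

def train(pairs):
    """Count which word follows which: a very crude next-word model."""
    counts = defaultdict(Counter)
    for prev, nxt in pairs:
        counts[prev][nxt] += 1
    return counts

def predict_next(model, word):
    """Most frequent follower seen in training, or None for an unseen word."""
    followers = model.get(word)
    return followers.most_common(1)[0][0] if followers else None

pairs = make_pairs(TEXT)
model = train(pairs)
print(pairs[:3])
print(predict_next(model, "sat"))
```

Every training pair came straight out of the raw text, with no person in the loop. That is what lets the supply of labelled examples grow as fast as the supply of text.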
So the puzzle resolves: the spam filter had the answer key (supervised), Discover Weekly had no answers and found the structure itself (unsupervised), and AlphaZero had only a score (reinforcement). They're not three products of one method; they're three different teachers, and each teacher leaves a different signature on what the student can and cannot do.
Your rule: before any ML project — or any vendor pitch — ask “do we have answers, structure, or a score?” Labelled history → supervised. Piles of unlabelled data and a hunch → unsupervised. A measurable outcome you can score repeatedly → reinforcement. If you have none of the three, you don't have an AI project yet; you have a data-collection project.
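The rule is simple enough to write down as a checklist. The function below is a hypothetical helper, with invented name and arguments, that restates the three questions rather than adding anything to them.

```python
def candidate_approaches(has_labels, has_unlabelled_data, has_repeatable_score):
    """Map the lesson's three questions to candidate learning approaches."""
    candidates = []
    if has_labels:
        candidates.append("supervised")
    if has_unlabelled_data:
        candidates.append("unsupervised")
    if has_repeatable_score:
        candidates.append("reinforcement")
    # None of the three: there is nothing to learn from yet.
    return candidates or ["data collection first"]

print(candidate_approaches(True, False, False))
print(candidate_approaches(False, False, False))
```

A project can answer yes to more than one question, which is why the helper returns a list rather than a single verdict.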