Types of machine learning
The three families
Machine learning is not one thing: it’s a family of three paradigms, each defined by the kind of feedback the model receives during training.
flowchart TB ML["Machine Learning"] ML --> S["Supervised<br/>(labelled examples)"] ML --> U["Unsupervised<br/>(no labels — find structure)"] ML --> R["Reinforcement<br/>(trial and error + reward)"]
1. Supervised learning
You give the model inputs paired with the correct answers, and it learns to map one to the other.
| Input X | Output y |
|---|---|
| Email text | Spam / not spam |
| House features | Selling price |
| Image | “Cat”, “dog”, “horse” |
Two main tasks:
- Classification — output is a category (spam, dog, fraud).
- Regression — output is a continuous number (price, temperature).
Rule of thumb: if you can label your data with a clear right answer, start here. Most business ML in the real world is supervised (roughly 80%, as an order-of-magnitude estimate rather than a measured figure).
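To make the two tasks concrete, here is a minimal sketch using only the Python standard library. The toy data, the feature choices, and the function names (`classify`, `predict_price`) are invented for illustration. A 1-nearest-neighbour rule stands in for a classifier and a least-squares line for a regressor; a real project would more likely reach for a library such as scikit-learn.

```python
from statistics import mean

# --- Classification: the output is a category ---
# Hypothetical labelled examples: (exclamation marks, links) -> label
emails = [((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
          ((5, 3), "spam"), ((7, 4), "spam"), ((6, 2), "spam")]

def classify(features):
    """1-nearest-neighbour: return the label of the closest labelled example."""
    def squared_distance(example):
        x, _ = example
        return sum((a - b) ** 2 for a, b in zip(x, features))
    _, label = min(emails, key=squared_distance)
    return label

# --- Regression: the output is a continuous number ---
# Hypothetical labelled examples: floor area -> selling price (in thousands)
areas = [50, 70, 90, 110]
prices = [170, 230, 290, 350]

# Ordinary least squares for a single feature.
mean_area, mean_price = mean(areas), mean(prices)
slope = (sum((x - mean_area) * (y - mean_price) for x, y in zip(areas, prices))
         / sum((x - mean_area) ** 2 for x in areas))
intercept = mean_price - slope * mean_area

def predict_price(area):
    """Predict a price from the fitted line."""
    return slope * area + intercept

print(classify((6, 3)))
print(predict_price(100))
```

In both halves the pattern is the same: known (input, answer) pairs go in, and a function that maps new inputs to answers comes out. Only the type of the answer differs.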
2. Unsupervised learning
No labels, no “right answer”. The model has to find structure in the data by itself.
Common tasks:
- Clustering — group similar customers, products, documents.
- Dimensionality reduction — squash 1,000 features into 2D for visualisation (PCA, t-SNE, UMAP).
- Anomaly detection — flag what doesn’t look like the rest (fraud, intrusion).
Example: feed 1 million customer purchase histories to a clustering algorithm. Without telling it anything, it discovers “the bargain hunters”, “the weekend shoppers”, “the gift-buyers”.
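The clustering idea can be sketched with a plain k-means loop, again using only the standard library. The nine customers, their two features, and the choice of three starting centroids are assumptions made for this example. Production code would typically use a library implementation with smarter initialisation.

```python
from math import dist
from statistics import mean

# Hypothetical unlabelled data: (orders per month, average basket value)
customers = [(1, 10), (2, 12), (1, 11),      # few, small orders
             (9, 15), (10, 14), (11, 16),    # many, small orders
             (2, 90), (1, 95), (3, 88)]      # few, large orders

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its assigned points.
        centroids = [
            tuple(mean(axis) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids

# Simplification: start from three of the data points.
clusters, centroids = k_means(
    customers, centroids=[customers[0], customers[3], customers[6]]
)
for cluster in clusters:
    print(cluster)
```

The algorithm only returns groups of points. Names such as “the bargain hunters” are an interpretation that a person adds afterwards by inspecting each group.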
3. Reinforcement learning
The model, called an agent, interacts with an environment, takes actions, and receives rewards (or penalties). Over many trials, it learns a strategy that maximises the long-term reward.
flowchart LR A["Agent"] -->|action| E["Environment"] E -->|"state + reward"| A
Famous applications:
- AlphaGo (DeepMind, 2016): defeated Lee Sedol, one of the world’s top Go players, 4-1.
- OpenAI Five: plays Dota 2 at world-class level.
- RLHF: a technique used to align assistants such as ChatGPT and Claude with human preferences.
RL is the most “human-like” of the three: try things, fail, get feedback, get better.
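The agent and environment loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption chosen for illustration: the environment is a hypothetical five-cell corridor where reaching the last cell pays +10 and every other step costs 1, and the learning rate, discount, and exploration rate are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_STATES = 5        # positions 0..4; reaching position 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 10 if done else -1
    return next_state, reward, done

# Q-table: estimated long-term reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    state, done = 0, False
    while not done:
        # Explore with probability EPSILON, otherwise take the best-known action.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# Greedy policy: the preferred action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Nobody tells the agent that walking right is correct. It only sees rewards, and the preference for stepping right emerges from the value estimates.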
One business question, three different answers
The three paradigms are not interchangeable: each one reformulates the problem and produces a different kind of output. Here is the same starting point, “we have 100,000 e-commerce customers and want to act on this data”, viewed through each paradigm.
The same dataset, three paradigms side by side
| Aspect | Supervised | Unsupervised | Reinforcement |
|---|---|---|---|
| Question reformulated as | “Which customers will churn in the next 30 days?” | “Which natural customer groups exist?” | “Which product should we recommend next to maximise lifetime value?” |
| Input data X | Purchase history, last visit, cart size, etc. | Same — but with no label | Sequence of past interactions (state) |
| Target / signal | churned = 0 or 1 (known label) | None | A reward (e.g. +10 if click, −1 per impression) |
| Typical algorithm | Logistic regression, gradient boosting | K-means, DBSCAN, PCA | Q-learning, contextual bandits |
| Concrete output on row 4217 | P(churn) = 0.83 → flag in CRM | cluster = 2 ("dormant bargain hunters") | next_action = "send 15% coupon" |
| What you need to ship it | Years of historical labels | Just the features | A live system to play actions and observe reward |
| Time horizon | One prediction at a time | Static map of the population | Long-term strategy (months) |
| Risk if misapplied | Useless if labels are biased or stale | Clusters may not match a business reality | Slow convergence, exploration cost |
Same 100,000 rows, three completely different projects. Choosing the paradigm is first a business framing decision, and only then a technical one.
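As a rough illustration of the table, here is how one customer record might be shaped for each paradigm. The field names and values are hypothetical, and the `reward` function is one assumed reading of the table’s “+10 if click, −1 per impression” signal.

```python
# Hypothetical record for one customer; every field name here is illustrative.
row = {"customer_id": 4217, "days_since_last_visit": 41,
       "orders_last_90d": 1, "avg_cart_value": 18.5, "churned": 1}

# The input features are the same in all three framings.
features = {k: v for k, v in row.items() if k not in ("customer_id", "churned")}

# Supervised: features paired with the known label.
supervised_example = (features, row["churned"])

# Unsupervised: the same features, with the label dropped.
unsupervised_example = features

def reward(clicked, impressions=1):
    """Assumed reward: +10 for a click, -1 for each impression shown."""
    return 10 * int(clicked) - impressions

# Reinforcement: one logged interaction (state, action, reward).
rl_step = {"state": features, "action": "send 15% coupon",
           "reward": reward(clicked=True)}

print(supervised_example)
print(unsupervised_example)
print(rl_step)
```

The features never change. What changes is what accompanies them: a label, nothing, or an action with its observed reward.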
When to use which
flowchart TB Q["Do you have<br/>labelled data?"] Q -- "Yes" --> S["Supervised"] Q -- "No, but you have lots of data" --> U["Unsupervised<br/>(clustering, embeddings)"] Q -- "No, but an agent acts in an environment" --> R["Reinforcement"]
Don’t forget: semi-supervised and self-supervised
There’s a fourth category that powers modern LLMs: self-supervised learning. The labels are created automatically from the data itself. (Semi-supervised learning is a different idea: it combines a small labelled set with a much larger unlabelled one.)
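Example: take a sentence, hide one word, ask the model to guess it. No human labels were needed: the sentence labels itself.
GPT-style models are pre-trained on internet-scale text with a close variant of this idea: predict the next token from the ones before it. (Hiding a word in the middle of the sentence is the BERT-style version.) We’ll come back to it in the lesson on LLMs.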
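Here is a small sketch of how raw text can label itself. It shows two ways of deriving (input, label) pairs: hiding one word, and predicting the next token, which is the objective usually described for GPT-style pre-training. The function names and the `[MASK]` placeholder are illustrative choices, and whitespace splitting stands in for a real tokenizer.

```python
sentence = "the cat sat on the mat".split()

def masked_pairs(tokens, mask="[MASK]"):
    """Hide each word in turn: input is the masked sentence, label is the hidden word."""
    return [(tokens[:i] + [mask] + tokens[i + 1:], word)
            for i, word in enumerate(tokens)]

def next_token_pairs(tokens):
    """Input is the text so far, label is the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for model_input, label in masked_pairs(sentence)[:2]:
    print(model_input, "->", label)

for model_input, label in next_token_pairs(sentence)[:2]:
    print(model_input, "->", label)
```

A six-word sentence yields six masked examples and five next-token examples without any human annotation, which is why this approach scales to very large text collections.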
Key takeaways
- Supervised = labelled examples (classification, regression). The workhorse.
- Unsupervised = no labels, find structure (clustering, anomaly detection).
- Reinforcement = trial & error with rewards. Powers game AI and RLHF.
- Self-supervised = labels generated from data itself. Powers modern LLMs.
Next: The deep learning revolution — why 2012 changed everything.