Types of machine learning

The three families

Machine learning is not one thing — it’s a family with three siblings, each defined by the kind of feedback the model receives during training.

flowchart TB
  ML["Machine Learning"]
  ML --> S["Supervised<br/>(labelled examples)"]
  ML --> U["Unsupervised<br/>(no labels — find structure)"]
  ML --> R["Reinforcement<br/>(trial and error + reward)"]

Three paradigms, three different problem types.

1. Supervised learning

You give the model inputs paired with the correct answers, and it learns to map one to the other.

Input X	Output y
Email text	Spam / not spam
House features	Selling price
Image	“Cat”, “dog”, “horse”

Two main tasks:

Classification — output is a category (spam, dog, fraud).
Regression — output is a continuous number (price, temperature).

Rule of thumb: if you can label your data with a clear right answer, start here. Most business ML in production is supervised; a figure of around 80% is often quoted, though it is an informal estimate rather than a measured one.
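To make the two tasks concrete, here is a minimal sketch in plain Python. It is not a production approach: the regression half fits a straight line by ordinary least squares, and the classification half uses a 1-nearest-neighbour rule, both chosen only because they fit in a few lines. The house sizes, prices, and email features are made-up numbers, and the function names are hypothetical.

```python
# Toy supervised learning: both tasks learn a mapping from labelled (X, y) pairs.
# All data below is invented for illustration.

def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def nearest_neighbour(train, query):
    """Classification: return the label of the closest training example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], query))
    return label


# Regression: house size in square metres -> price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # continuous output

# Classification: (number of links, mentions of "free") -> category.
emails = [((0, 0), "not spam"), ((1, 0), "not spam"),
          ((7, 4), "spam"), ((9, 6), "spam")]
predicted_label = nearest_neighbour(emails, (8, 5))  # categorical output
```

In both cases the correct answers (`prices`, the spam labels) are part of the training data, which is what makes the setting supervised.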

2. Unsupervised learning

No labels, no “right answer”. The model has to find structure in the data by itself.

Common tasks:

Clustering — group similar customers, products, documents.
Dimensionality reduction — squash 1,000 features into 2D for visualisation (PCA, t-SNE, UMAP).
Anomaly detection — flag what doesn’t look like the rest (fraud, intrusion).

Example: feed 1 million customer purchase histories to a clustering algorithm. Without telling it anything, it discovers “the bargain hunters”, “the weekend shoppers”, “the gift-buyers”.
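The clustering idea can be sketched with a bare-bones k-means in plain Python. This is an illustration under simplifying assumptions: six invented customers described by two made-up features, two clusters, and starting centroids picked by hand. A real project would normally scale the features and use a library implementation.

```python
# Minimal k-means sketch on invented 2-D customer features:
# (average basket value, share of purchases made on weekends).

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        centroids = new_centroids
    return assignments, centroids


customers = [(10, 0.10), (12, 0.20), (11, 0.15),
             (80, 0.90), (85, 0.80), (90, 0.85)]
# Hand-picked starting centroids (an assumption made to keep the run deterministic).
labels, centres = kmeans(customers, [customers[0], customers[-1]])
```

No labels are passed in, and the algorithm only returns cluster numbers. Names such as “bargain hunters” are added afterwards by a person who inspects what each cluster has in common.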

3. Reinforcement learning

The model — called an agent — interacts with an environment, takes actions, and receives rewards (or penalties). Over many trials, it learns a strategy that maximises the long-term reward.

flowchart LR
  A["Agent"] -->|action| E["Environment"]
  E -->|"state + reward"| A

The reinforcement learning loop — the heart of game AI, robotics, and (recently) LLM fine-tuning.

Famous applications:

AlphaGo (DeepMind, 2016) — defeated Lee Sedol, one of the world’s strongest Go players, four games to one.
OpenAI Five — plays Dota 2 at world-class level.
RLHF — the technique used to align ChatGPT and Claude with human preferences.

RL is the most “human-like” of the three: try things, fail, get feedback, get better.
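The agent-environment loop can be sketched with tabular Q-learning, one of the simplest RL algorithms. Everything in this example is an assumption made for illustration: a five-cell corridor as the environment, a reward of +1 only at the last cell, and hand-picked values for the learning rate, discount, and exploration rate.

```python
import random

# Tabular Q-learning on an invented 5-cell corridor: the agent starts in
# cell 0 and receives a reward of +1 only when it reaches cell 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                     # step left, step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Best-known action for a state, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in range(len(ACTIONS)) if q[state][a] == best])


def train(episodes=200, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Agent: explore with probability EPSILON, otherwise exploit.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Read off the learned strategy for the non-terminal cells 0..3.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:GOAL]]
```

Nobody tells the agent that “right” is the correct move. It only ever sees states and rewards, and the preference for moving right emerges from the reward propagating backwards through the table over many episodes.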

One business question, three different answers

The three paradigms are not interchangeable: each one reformulates the problem and produces a different kind of output. Here is the same starting point — “we have 100,000 e-commerce customers and want to act on this data” — viewed through each paradigm.

The same dataset, three paradigms side by side

Aspect	Supervised	Unsupervised	Reinforcement
Question reformulated as	“Which customers will churn in the next 30 days?”	“Which natural customer groups exist?”	“Which product should we recommend next to maximise lifetime value?”
Input data X	Purchase history, last visit, cart size, etc.	Same — but with no label	Sequence of past interactions (state)
Target / signal	`churned = 0 or 1` (known label)	None	A reward (e.g. `+10` if click, `−1` per impression)
Typical algorithm	Logistic regression, gradient boosting	K-means, DBSCAN, PCA	Q-learning, contextual bandits
Concrete output on row n°4217	`P(churn) = 0.83` → flag in CRM	`cluster = 2 ("dormant bargain hunters")`	`next_action = "send 15% coupon"`
What you need to ship it	Enough historical labelled examples	Just the features	A live system to play actions and observe reward
Time horizon	One prediction at a time	Static map of the population	Long-term strategy (months)
Risk if misapplied	Useless if labels are biased or stale	Clusters may not match a business reality	Slow convergence, exploration cost

Same 100,000 rows, three completely different projects. Choosing the paradigm is first a business framing decision, and only then a technical one.

When to use which

flowchart TB
  Q["Do you have<br/>labelled data?"]
  Q -- "Yes" --> S["Supervised"]
  Q -- "No, but you have lots of data" --> U["Unsupervised<br/>(clustering, embeddings)"]
  Q -- "No — agent that acts in a world" --> R["Reinforcement"]

A 5-second decision tree to pick the right ML family.
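The same decision tree can be written as a tiny helper. The function name and its two boolean parameters are hypothetical. It is only a restatement of the flowchart's three branches, not a substitute for thinking through the business framing.

```python
def pick_paradigm(has_labels: bool, acts_in_environment: bool) -> str:
    """Hypothetical helper mirroring the three branches of the flowchart."""
    if has_labels:
        return "supervised"
    if acts_in_environment:
        return "reinforcement"
    # No labels and no agent: look for structure in the raw data.
    return "unsupervised"


choice = pick_paradigm(has_labels=False, acts_in_environment=True)
```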

Don’t forget: semi-supervised and self-supervised

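There’s a fourth category that powers modern LLMs: self-supervised learning. The labels are created automatically from the data itself. (Semi-supervised learning is a related idea: a small labelled set is combined with a much larger unlabelled one.)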

Example: take a sentence, hide one word, ask the model to guess it. No human labels were needed — the sentence labels itself.

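Hiding a word in the middle of a sentence is the BERT-style variant. GPT-style models are pre-trained on internet-scale text in a closely related way: they predict the next token from the text that precedes it. We’ll come back to it in the lesson on LLMs.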
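A short sketch shows how labels can be manufactured from raw text with no human annotation. Both function names and the `[MASK]` placeholder are illustrative choices, and words stand in for the tokens a real model would use. The first function builds hide-one-word pairs as in the example above. The second builds next-word pairs, the objective GPT-style models are generally described as using.

```python
def masked_word_pairs(sentence, mask="[MASK]"):
    """Hide each word in turn: (sentence with a gap, hidden word)."""
    words = sentence.split()
    return [
        (" ".join(words[:i] + [mask] + words[i + 1:]), word)
        for i, word in enumerate(words)
    ]


def next_word_pairs(sentence):
    """Predict each word from the words before it: (prefix, next word)."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


text = "the cat sat on the mat"
masked = masked_word_pairs(text)  # one training pair per hidden word
causal = next_word_pairs(text)    # one training pair per prefix
```

A single six-word sentence already yields eleven (input, label) pairs, which hints at why unlabelled text at internet scale is such a rich training signal.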

Key takeaways

Supervised = labelled examples (classification, regression). The workhorse.
Unsupervised = no labels, find structure (clustering, anomaly detection).
Reinforcement = trial & error with rewards. Powers game AI and RLHF.
Self-supervised = labels generated from data itself. Powers modern LLMs.

Next: The deep learning revolution — why 2012 changed everything.