Types of machine learning

Machine learning is not one thing — it’s a family with three siblings, each defined by the kind of feedback the model receives during training.

flowchart TB
  ML["Machine Learning"]
  ML --> S["Supervised<br/>(labelled examples)"]
  ML --> U["Unsupervised<br/>(no labels — find structure)"]
  ML --> R["Reinforcement<br/>(trial and error + reward)"]
Three paradigms, three different problem types.

In supervised learning, you give the model inputs paired with the correct answers, and it learns to map one to the other.

| Input X | Output y |
| --- | --- |
| Email text | Spam / not spam |
| House features | Selling price |
| Image | “Cat”, “dog”, “horse” |

Two main tasks:

  • Classification — output is a category (spam, dog, fraud).
  • Regression — output is a continuous number (price, temperature).

Rule of thumb: if you can label your data with a clear right answer, start here. As a rough estimate, around 80% of real-world business ML is supervised.
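To make the two tasks concrete, here is a minimal pure-Python sketch. The data is made up, and the helper names `fit_line` and `nearest_neighbour_label` are hypothetical (they come from no library). Regression is shown as an ordinary least-squares line, classification as a 1-nearest-neighbour lookup; real projects would normally use a library such as scikit-learn instead.

```python
# Toy supervised learning in pure Python. All data and names are illustrative.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def nearest_neighbour_label(examples, query):
    """1-nearest-neighbour classification: copy the label of the closest example."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min(examples, key=lambda ex: squared_distance(ex[0], query))
    return label


# Regression: house size in square metres -> price in thousands (made-up numbers).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept

# Classification: (number of links, number of exclamation marks) -> label.
emails = [((0, 0), "not spam"), ((1, 0), "not spam"), ((7, 5), "spam"), ((9, 8), "spam")]
predicted_label = nearest_neighbour_label(emails, (8, 6))
```

In both cases the labelled examples (`prices`, and the `"spam"` / `"not spam"` strings) are what make the problem supervised: the model is only as good as those answers.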

Unsupervised learning has no labels and no “right answer”: the model has to find structure in the data by itself.

Common tasks:

  • Clustering — group similar customers, products, documents.
  • Dimensionality reduction — squash 1,000 features into 2D for visualisation (PCA, t-SNE, UMAP).
  • Anomaly detection — flag what doesn’t look like the rest (fraud, intrusion).

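Example: feed 1 million customer purchase histories to a clustering algorithm. Without being told what to look for, it groups similar customers together. The algorithm only outputs numbered groups; an analyst who inspects them might then name them “the bargain hunters”, “the weekend shoppers”, “the gift-buyers”.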
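The clustering example can be sketched with a small k-means loop. This is an illustration under stated assumptions: six made-up customers described by (orders per month, average basket value), two clusters, and starting centroids picked by hand. The function name `k_means` is hypothetical.

```python
# Toy k-means clustering in pure Python. Data and names are illustrative.

def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        centroids = new_centroids
    return assignments, centroids


# (orders per month, average basket value): made-up customers, no labels.
customers = [(1, 20), (2, 25), (1, 22), (8, 90), (9, 95), (10, 88)]
assignments, centroids = k_means(customers, [customers[0], customers[3]])
```

The output is just a cluster index per customer plus the cluster centres. Deciding that cluster 1 means “frequent big spenders” is a human interpretation step, not something the algorithm provides.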

In reinforcement learning, the model (called an agent) interacts with an environment, takes actions, and receives rewards (or penalties). Over many trials, it learns a strategy that maximises the long-term reward.

flowchart LR
  A["Agent"] -->|action| E["Environment"]
  E -->|"state + reward"| A
The reinforcement learning loop — the heart of game AI, robotics, and (recently) LLM fine-tuning.
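The agent-environment loop can be sketched with tabular Q-learning on a tiny corridor. Everything here is an assumption made for illustration: the five-cell environment, the reward values (+10 at the goal, -1 per move), the hyperparameters, and the function names `step` and `train`.

```python
import random

# Tabular Q-learning on a 5-cell corridor. The environment, the reward values
# and the hyperparameters are all made up for illustration.
N_STATES = 5          # cells 0..4; the agent starts in cell 0, the goal is cell 4
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    if next_state == N_STATES - 1:
        return next_state, 10.0, True   # reached the goal
    return next_state, -1.0, False      # every other move has a small cost


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore sometimes, otherwise exploit current estimates.
            if rng.random() < EPSILON:
                action = rng.choice((LEFT, RIGHT))
            else:
                action = max((LEFT, RIGHT), key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate towards
            # reward + discounted value of the best next action.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

Nobody tells the agent that “right” is correct. It only sees states and rewards, and the preference for moving right emerges from the accumulated value estimates in `q_table`.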

Famous applications:

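  • AlphaGo (DeepMind, 2016): defeated Lee Sedol, one of the world's strongest Go players, 4 games to 1.
  • OpenAI Five (OpenAI, 2019): defeated the reigning Dota 2 world champions, team OG.
  • RLHF (reinforcement learning from human feedback): a technique used to align assistants such as ChatGPT and Claude with human preferences.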

RL is the most “human-like” of the three: try things, fail, get feedback, get better.

One business question, three different answers

The three paradigms are not interchangeable: each one reformulates the problem and produces a different kind of output. Here is the same starting point — “we have 100,000 e-commerce customers and want to act on this data” — viewed through each paradigm.

The same dataset, three paradigms side by side
| Aspect | Supervised | Unsupervised | Reinforcement |
| --- | --- | --- | --- |
| Question reformulated as | “Which customers will churn in the next 30 days?” | “Which natural customer groups exist?” | “Which product should we recommend next to maximise lifetime value?” |
| Input data X | Purchase history, last visit, cart size, etc. | Same, but with no label | Sequence of past interactions (state) |
| Target / signal | churned = 0 or 1 (known label) | None | A reward (e.g. +10 if click, −1 per impression) |
| Typical algorithm | Logistic regression, gradient boosting | K-means, DBSCAN, PCA | Q-learning, contextual bandits |
| Concrete output on row no. 4217 | P(churn) = 0.83 → flag in CRM | cluster = 2 ("dormant bargain hunters") | next_action = "send 15% coupon" |
| What you need to ship it | Years of historical labels | Just the features | A live system to play actions and observe reward |
| Time horizon | One prediction at a time | Static map of the population | Long-term strategy (months) |
| Risk if misapplied | Useless if labels are biased or stale | Clusters may not match a business reality | Slow convergence, exploration cost |

Same 100,000 rows, three completely different projects. Choosing the paradigm is first a business-framing decision and only then a technical one.
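As a rough sketch of what “three different projects” means for the data itself, the snippet below frames one made-up customer table in the three ways. The column names, action names, and reward values are hypothetical and chosen only to echo the comparison.

```python
# One made-up customer table, framed three ways. Column names are hypothetical.
customers = [
    {"id": 4216, "days_since_visit": 3, "cart_size": 4, "churned": 0},
    {"id": 4217, "days_since_visit": 41, "cart_size": 1, "churned": 1},
    {"id": 4218, "days_since_visit": 12, "cart_size": 2, "churned": 0},
]
FEATURES = ("days_since_visit", "cart_size")

# Supervised: features X paired with a known label y.
X_supervised = [[row[f] for f in FEATURES] for row in customers]
y_supervised = [row["churned"] for row in customers]

# Unsupervised: the same features, with the label column left out.
X_unsupervised = [[row[f] for f in FEATURES] for row in customers]

# Reinforcement: logged (state, action, reward) interactions rather than static rows.
# The actions and reward values are invented for this example.
interactions = [
    ((41, 1), "send 15% coupon", 10),   # the customer clicked
    ((41, 1), "show banner", -1),       # an impression with no click
]
```

The features are identical in the first two framings; what changes is whether a target column exists. The reinforcement framing needs a different kind of record altogether, which is why it requires a live system to generate it.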

flowchart TB
  Q["Do you have<br/>labelled data?"]
  Q -- "Yes" --> S["Supervised"]
  Q -- "No, but you have lots of data" --> U["Unsupervised<br/>(clustering, embeddings)"]
  Q -- "No — agent that acts in a world" --> R["Reinforcement"]
A 5-second decision tree to pick the right ML family.
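The same decision tree can be written as a few lines of code. This is only a sketch: the function name `pick_paradigm` and its two boolean parameters are hypothetical, and real projects rarely reduce to two yes/no questions.

```python
def pick_paradigm(has_labels: bool, acts_in_environment: bool) -> str:
    """Mirror the decision tree: ask about labels first, then about acting in a world."""
    if has_labels:
        return "supervised"
    if acts_in_environment:
        return "reinforcement"
    return "unsupervised"   # no labels, but plenty of data to find structure in


churn_project = pick_paradigm(has_labels=True, acts_in_environment=False)
segmentation_project = pick_paradigm(has_labels=False, acts_in_environment=False)
recommender_project = pick_paradigm(has_labels=False, acts_in_environment=True)
```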

Don’t forget: semi-supervised and self-supervised

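There’s a fourth category that powers modern LLMs: self-supervised learning, where the labels are created automatically from the data itself. (Semi-supervised learning is a different idea: it combines a small set of human-labelled examples with a much larger unlabelled one.)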

Example: take a sentence, hide one word, ask the model to guess it. No human labels were needed — the sentence labels itself.

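GPT-style models are pre-trained on internet text with a close variant of this idea: instead of guessing a hidden word in the middle, the model predicts the next word (token) from the ones before it. Hiding words inside the sentence is the BERT-style variant. We’ll come back to it in the lesson on LLMs.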
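Here is a minimal sketch of how a sentence can label itself, covering both the hide-a-word setup and the next-word setup. The function names are hypothetical, the `"[MASK]"` placeholder is an illustrative choice, and splitting on spaces stands in for the tokenisers that real models use.

```python
# Building training pairs from raw text with no human labelling.
# Function names and the "[MASK]" placeholder are illustrative choices.

def masked_word_pairs(sentence):
    """Hide each word in turn: (sentence with a gap, hidden word)."""
    words = sentence.split()
    pairs = []
    for i, word in enumerate(words):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        pairs.append((" ".join(masked), word))
    return pairs


def next_word_pairs(sentence):
    """Predict the next word from everything before it: (prefix, next word)."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]


sentence = "the cat sat on the mat"
masked = masked_word_pairs(sentence)
causal = next_word_pairs(sentence)
```

Each pair is an (input, target) example of exactly the kind supervised learning needs, except that the target was cut out of the text itself rather than written by a person.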

  • Supervised = labelled examples (classification, regression). The workhorse.
  • Unsupervised = no labels, find structure (clustering, anomaly detection).
  • Reinforcement = trial & error with rewards. Powers game AI and RLHF.
  • Self-supervised = labels generated from data itself. Powers modern LLMs.

Next: The deep learning revolution — why 2012 changed everything.