History of AI

Why start with history?

To understand why ChatGPT or Claude only became possible in the 2020s, you have to know what came before. AI was not born in 2022 — it has existed since the 1950s and has gone through three big waves.

Every wave added capability rather than replacing the previous one. Rule-based systems, classical ML, and modern LLMs all coexist in production today.

A 70-year timeline

flowchart TB
  Y1950["<b>1950</b><br/>Turing test · Symbolic AI"]:::c1
  Y1956["<b>1956</b><br/>Dartmouth workshop coins the word 'AI'"]:::c2
  Y1980["<b>1980</b><br/>Expert systems boom · Backpropagation"]:::c3
  Y1997["<b>1997</b><br/>Deep Blue beats Kasparov"]:::c4
  Y2006["<b>2006</b><br/>Hinton revives deep networks"]:::c5
  Y2012["<b>2012</b><br/>AlexNet wins ImageNet (CNN)"]:::c6
  Y2017["<b>2017</b><br/>'Attention is All You Need' — Transformer"]:::c7
  Y2020["<b>2020</b><br/>GPT-3"]:::c8
  Y2022["<b>2022</b><br/>ChatGPT goes mainstream"]:::c9
  Y2024["<b>2024</b><br/>Multimodal LLMs · AI agents"]:::c10
  Y1950 --> Y1956 --> Y1980 --> Y1997 --> Y2006 --> Y2012 --> Y2017 --> Y2020 --> Y2022 --> Y2024
  classDef c1 fill:#b8b5f5,stroke:#5752c4,color:#1e1b4b
  classDef c2 fill:#fff89c,stroke:#c9a300,color:#3f2e00
  classDef c3 fill:#ccf2cc,stroke:#3aa83a,color:#0e3a0e
  classDef c4 fill:#d9aef0,stroke:#8a3fbb,color:#3a134f
  classDef c5 fill:#ffb3d9,stroke:#c9468c,color:#560b35
  classDef c6 fill:#ff9999,stroke:#c43434,color:#4a0d0d
  classDef c7 fill:#ffd8a3,stroke:#c97a1a,color:#4a2a05
  classDef c8 fill:#ffcc88,stroke:#cc7a00,color:#4a2a00
  classDef c9 fill:#c8f0a0,stroke:#5fa830,color:#1e3a08
  classDef c10 fill:#a8f0d0,stroke:#2da378,color:#0c3a26

70 years of AI — read top to bottom.

flowchart LR
  A["<b>1950</b><br/>Turing test<br/>Symbolic AI"]:::c1 --> B["<b>1956</b><br/>Dartmouth<br/>coins 'AI'"]:::c2 --> C["<b>1980</b><br/>Expert systems<br/>Backpropagation"]:::c3 --> D["<b>1997</b><br/>Deep Blue<br/>beats Kasparov"]:::c4
  classDef c1 fill:#b8b5f5,stroke:#5752c4,color:#1e1b4b
  classDef c2 fill:#fff89c,stroke:#c9a300,color:#3f2e00
  classDef c3 fill:#ccf2cc,stroke:#3aa83a,color:#0e3a0e
  classDef c4 fill:#d9aef0,stroke:#8a3fbb,color:#3a134f

Foundations — symbolic AI and classical ML (1950 → 1997).

flowchart LR
  E["<b>2006</b><br/>Hinton revives<br/>deep networks"]:::c5 --> F["<b>2012</b><br/>AlexNet wins<br/>ImageNet (CNN)"]:::c6 --> G["<b>2017</b><br/>Transformer<br/>'Attention is<br/>All You Need'"]:::c7 --> H["<b>2020</b><br/>GPT-3"]:::c8 --> I["<b>2022</b><br/>ChatGPT<br/>mainstream"]:::c9 --> J["<b>2024</b><br/>Multimodal<br/>+ AI agents"]:::c10
  classDef c5 fill:#ffb3d9,stroke:#c9468c,color:#560b35
  classDef c6 fill:#ff9999,stroke:#c43434,color:#4a0d0d
  classDef c7 fill:#ffd8a3,stroke:#c97a1a,color:#4a2a05
  classDef c8 fill:#ffcc88,stroke:#cc7a00,color:#4a2a00
  classDef c9 fill:#c8f0a0,stroke:#5fa830,color:#1e3a08
  classDef c10 fill:#a8f0d0,stroke:#2da378,color:#0c3a26

The deep learning era (2006 → 2024).

The three big waves of AI

flowchart LR
  A["Wave 1<br/>Symbolic AI<br/>1950 - 1980"] --> B["Wave 2<br/>Classical ML<br/>1980 - 2010"]
  B --> C["Wave 3<br/>Deep Learning + LLMs<br/>2012 - today"]
  A -. coexists .-> C
  B -. coexists .-> C

Each wave adds capability — none of them disappeared.

Wave 1 — Symbolic AI (1950–1980): “if… then…”

The earliest AI programs were rule-based systems, with every rule written by hand.

Example: a 1970s medical expert system might contain around 500 hand-written rules like “if the patient has a fever AND a dry cough THEN suggest the flu”.

It works well… as long as the rule exists. The moment you step outside the planned domain, the system is blind. This is the limitation the next wave sets out to address.
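A minimal sketch of such a rule-based system, to make the "blind outside the rules" point concrete. The rule table and the `diagnose` function are hypothetical and written for illustration only; they are not taken from any real expert system.

```python
# Hypothetical rules for illustration: (required symptoms, conclusion).
RULES = [
    ({"fever", "dry cough"}, "flu"),
    ({"sneezing", "itchy eyes"}, "allergy"),
]


def diagnose(symptoms):
    """Return the conclusion of the first rule whose conditions are all present."""
    observed = set(symptoms)
    for conditions, conclusion in RULES:
        if conditions <= observed:  # every condition of the rule is satisfied
            return conclusion
    return None  # no rule matches: the system has nothing to say


print(diagnose(["fever", "dry cough"]))      # flu
print(diagnose(["fever", "loss of smell"]))  # None
```

The second call shows the limitation: nobody wrote a rule for "loss of smell", so the system returns nothing, however obvious the answer might be to a human.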

Wave 2 — Classical Machine Learning (1980–2010): “learn from examples”

Instead of writing rules, we show examples to an algorithm, and it figures out the rules on its own.

Example: we give the algorithm 10,000 emails labelled “spam” or “not spam”, and it learns to spot spam in emails it has never seen.

Key methods: linear regression, decision trees, support vector machines (SVM), k-nearest neighbours (k-NN), random forests. Still used today for many tabular-data problems.
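The spam example can be sketched with k-NN, one of the methods listed above. This is a toy version under stated assumptions: the six training emails are made up, similarity is a simple word-overlap (Jaccard) score, and the names `TRAINING`, `similarity` and `classify` are hypothetical. A real system would use thousands of labelled emails and a library implementation.

```python
from collections import Counter

# Tiny made-up training set; a real one would hold thousands of labelled emails.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting moved to monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
    ("please review the project report", "not spam"),
]


def similarity(a, b):
    """Jaccard similarity between the word sets of two messages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)


def classify(message, k=3):
    """Label a message by majority vote among its k most similar training emails."""
    ranked = sorted(TRAINING, key=lambda ex: similarity(message, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify("claim your free prize"))      # spam
print(classify("project meeting on monday"))  # not spam
```

No rule about the word "free" or "prize" appears anywhere in the code: the decision comes entirely from the labelled examples, which is the shift this wave introduced.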

Wave 3 — Deep Learning and LLMs (2012–today)

In 2012, Krizhevsky, Sutskever and Hinton win the ImageNet competition with AlexNet, a CNN (convolutional neural network). Its top-5 error rate is 15.3%, against 26.2% for the runner-up, and the field realises that deep networks + lots of data + GPUs define the new state of the art.

What followed:

2014–2017: image recognition and machine translation approach human-level performance on several standard benchmarks.
2017: Google researchers publish Attention Is All You Need, introducing the Transformer architecture.
2020: OpenAI releases GPT-3, a 175-billion-parameter language model that handles many tasks from a few examples in the prompt, with no task-specific training.
2022: ChatGPT ships; AI hits the mainstream.
2024–2026: multimodal LLMs (text + image + audio), code agents, video generation, agentic systems.

Key takeaways

AI is 70 years old, not 4.
Three big waves: rules, classical ML, deep learning / LLMs.
The modern turning point is 2012 (CNN on ImageNet) followed by 2017 (Transformer).
No wave replaced the previous one — rules, classical ML and LLMs coexist in real-world systems.

Next: From rules to data — what changed in the way we build software when ML arrived.