Skip to content

From LLMs to agents

An LLM on its own is brilliant and badly limited. Once you understand the four core limitations, every modern AI architecture you’ll see — RAG, assistants, agents, agentic AI — becomes obvious. Each one is a fix for a specific problem.

flowchart LR
  P["Raw LLM"] -- "doesn't know your data" --> S1["RAG"]
  P -- "can't act on the world" --> S2["Tool calling"]
  P -- "can't plan multi-step tasks" --> S3["Agents"]
  P -- "needs orchestration & long-term goals" --> S4["Agentic AI"]
Every layer above the raw LLM exists to plug a specific hole.

The four big limits:

  1. No private data — the model only knows what was on the internet up to its cutoff. It has never seen your company wiki.
  2. No actions — it can write you Python code, but it cannot run it, send an email, or query your database.
  3. No memory — every API call starts from zero; the model has no idea you spoke yesterday.
  4. No planning — it produces tokens left-to-right; it doesn’t naturally break a goal into sub-tasks.
flowchart TB
  L0["LLM<br/>raw model"]
  L1["+ Prompting &amp; context<br/>(stuff context into the prompt)"]
  L2["+ RAG<br/>(retrieve relevant docs first)"]
  L3["+ Tool calling<br/>(LLM can call APIs)"]
  L4["+ Agents<br/>(plan, act, observe, loop)"]
  L5["+ Agentic AI<br/>(multiple agents, long-running goals)"]
  L0 --> L1 --> L2 --> L3 --> L4 --> L5
Each rung adds capability — and complexity. Pick the lowest rung that solves your problem.

You can fix surprisingly many problems just by being specific in your prompt: roles, examples, format constraints. This is prompt engineering. It costs $0 and is reversible — always try this first.

Rung 2 — Retrieval-Augmented Generation (RAG)

Section titled “Rung 2 — Retrieval-Augmented Generation (RAG)”

When the LLM needs your data, you fetch the relevant chunks from a vector database (or any search engine) and stick them in the prompt.

flowchart LR
  U["User question"] --> R["Search<br/>(vector DB)"]
  R --> D["Top-k documents"]
  D --> P["Prompt = question + docs"]
  U --> P
  P --> L["LLM"]
  L --> A["Grounded answer"]
RAG in one diagram. We'll build one in Course 2.

The model is given a list of “tools” (functions it can call) and decides on its own when to use one. A “weather” tool, a “search the web” tool, a “run SQL” tool. This is the bridge between language and action.

An agent is an LLM in a loop: think → act → observe → repeat, until the goal is reached.

flowchart LR
  G["Goal"] --> T["Think<br/>(LLM plans the next step)"]
  T --> A["Act<br/>(call a tool)"]
  A --> O["Observe<br/>(read the result)"]
  O --> T
  T -->|"goal reached"| D["Done"]
The ReAct loop — the fundamental shape of every modern agent.

Examples you’ve probably used: Claude Code, Cursor, Devin, Manus, OpenAI’s o1 in agent mode.

The buzzword of 2025–2026. It is agents on top of agents: a planner agent that spawns specialist agents (a “researcher”, a “coder”, a “tester”), each operating with its own memory and tools, coordinating to achieve a long-running goal autonomously.

This is where libraries like LangGraph and CrewAI live.

LibraryRole
LangChainToolkit to compose LLM calls, tools, RAG. Good for prototypes; criticised for over-abstraction in production.
LangGraphState-machine framework for stateful, multi-step, multi-agent workflows. The serious cousin of LangChain.
LlamaIndexRAG-focused — indexing, retrieval, query engines.
Crew AI / AutoGenMulti-agent orchestration frameworks.

We’ll touch on these in Course 2 — for now just remember they exist and that they live at rungs 2–5 of the ladder.

Assistants vs agents — what’s the difference?

Section titled “Assistants vs agents — what’s the difference?”

People mix these terms; here is the cleanest distinction:

TermStepsInitiativeExample
AI assistantOne question → one answerReactive (waits for the user)ChatGPT in normal chat mode
AI agentMulti-step loop with toolsTakes the next step on its ownCursor in agent mode
Agentic AIMultiple agents, long horizonPursues goals autonomouslyDevin shipping a PR while you sleep
  • A raw LLM knows the internet, but nothing about you, can’t act, has no memory, can’t plan.
  • Each modern AI pattern is a fix for one of these holes.
  • The ladder: prompting → RAG → tool calling → agents → agentic AI.
  • LangChain / LangGraph / LlamaIndex are libraries that operate at rungs 2–5.
  • Don’t jump to the top of the ladder — solve your problem at the lowest rung that works.

This closes Part 1. Up next: Part 2 — ML lifecycle — how a real ML project actually ships, from business question to production model.