Skip to content

Annex — LangChain & LangGraph in depth

Duration: ~25 min Prerequisites: chapter 04 — Ollama vs LangChain and chapter 09 — Demo 3: a simple agent

This annex extends the short comparison in chapter 04 with the depth you need to actually read other people’s tutorials and not feel lost. It assumes you already know what an LLM is (chapter 01) and what it cannot do on its own (chapter 02).

The companion annex, the agentic AI landscape, covers what an agent is, the ReAct loop, and the broader framework ecosystem (LlamaIndex, AutoGen, CrewAI, Smolagents…).

By the end of demo 3 you have a working agent in about 30 useful lines, with every message visible and every tool call printable. The natural next question is: if we wrote everything by hand, what would change if we used LangChain or LangGraph instead?

This annex answers that question methodically. It defines each piece of vocabulary you will see in tutorials, it shows the same problem with and without a framework, and it gives you a decision rule for picking one over the other in a real project.


LangChain is an open-source toolbox (Python and JavaScript, MIT-licensed) that gives you standardised building blocks for LLM applications. The most accurate mental image is a Lego set for AI applications: each piece has a well-defined shape, and the pieces snap together with one operator.

BlockWhat it does
PromptTemplateA reusable message template with variables
ChatModelA uniform interface over OpenAI, Anthropic, Ollama, Mistral, Groq…
OutputParserTurns the model’s free text into a typed Python object (Pydantic, JSON)
Tool / @toolA Python function exposed to the model as an action it can call
AgentAn LLM + a list of tools + a built-in reasoning loop
RetrieverReturns the documents most relevant to a question (the R in RAG)
VectorStoreA database of embeddings (Pinecone, Chroma, FAISS, Weaviate…)
MemoryA system for keeping conversation history across turns
Runnable / LCELThe pipe operator `

LCEL stands for LangChain Expression Language. The whole idea fits in one character: |.

chain = prompt | llm | parser
result = chain.invoke({"question": "Hello?"})

Read this as: send the input into prompt, its output into llm, its output into parser.

Every LCEL chain inherits four methods for free:

  • .invoke(input) — synchronous call
  • .ainvoke(input) — asynchronous call
  • .stream(input) — token-by-token streaming
  • .batch(inputs) — batched calls

You write the chain once; you get sync, async, streaming and batching without a single extra line.

  • Provider portability. Swap init_chat_model("openai:gpt-4o") for init_chat_model("ollama:llama3.1:8b") and everything downstream keeps working.
  • Streaming and async for free, no SSE parsing by hand.
  • Hundreds of integrations out of the box (Tavily, Pinecone, Chroma, SQL, Slack, GitHub, Notion, S3…).
  • An agent in a few lines via create_agent(model=..., tools=[...]).

2.4. What LangChain explicitly does not do

Section titled “2.4. What LangChain explicitly does not do”
  • It does not replace an LLM. You still need a provider (OpenAI, Anthropic, Ollama…).
  • It does not guarantee the quality of the response — only the model is responsible for that.
  • It does not orchestrate complex stateful workflows with loops, branches and shared state. That is exactly what LangGraph is for.

3. What LangGraph is, and why it is separate from LangChain

Section titled “3. What LangGraph is, and why it is separate from LangChain”

LangGraph is a companion library built by the same team. Its central idea: represent your LLM application as a graph of states, not as a linear chain.

flowchart LR
subgraph LangChain["LangChain (chain mindset)"]
direction LR
A1[Prompt] --> B1[LLM]
B1 --> C1[Parser]
end
subgraph LangGraph["LangGraph (graph mindset)"]
direction TB
S([Start]) --> N1[Generate]
N1 -->|"needs tool"| N2[Tool]
N2 --> N1
N1 -->|"draft ready"| N3[Reflect]
N3 -->|"improve"| N1
N3 -->|"good enough"| E([End])
end

LangChain treats your application as a pipeline. LangGraph treats it as a state machine: nodes are functions, edges are transitions, and a shared state object travels through the graph.

  • Loops. Generate → critique → generate again until a quality threshold is met.
  • Conditional edges. “If the question is about pricing, go to sql_query; otherwise go to web_search.”
  • Shared state. A TypedDict flows through every node and accumulates messages, retrieved documents, scores, intermediate results.
  • Multi-agent orchestration. Several agents are individual nodes that hand off control to each other.
  • Advanced patterns described in academic papers: Reflection, Reflexion, Plan-and-Execute, Adaptive RAG, Self-RAG, Corrective RAG.
You need…Use
A linear pipeline (prompt → LLM → parser)LangChain (LCEL)
A single agent with a list of toolsLangChain create_agent
A reflection or revision loopLangGraph
A workflow with conditional branches and shared stateLangGraph
Several agents collaborating with handoffLangGraph

4. The vocabulary you will see in every tutorial

Section titled “4. The vocabulary you will see in every tutorial”

This vocabulary is shared across almost every framework — LangChain, LlamaIndex, Haystack, AutoGen, CrewAI. Mastering it is enough to read any tutorial in the field.

A prompt is the instruction sent to the LLM. A prompt template is a prompt with variables, for example:

“Summarise this text in {language} in {n_sentences} sentences: {text}.”

A chain is a sequence of steps such as prompt → llm → parser. In LangChain, it is written with the pipe |.

A tool is a function the LLM can request to call. Typical examples:

  • search_web(query) returns web results.
  • query_database(sql) returns rows from a database.
  • send_email(to, body) sends a message.

You declare your tools, the LLM picks which one to call, and the framework or your own code executes it. This is exactly what we did by hand in demo 3 with list_files, read_file, write_file, compile_java.

An agent is an LLM + tools + a loop. It decides which tools to call, in which order, until it can produce a final answer.

Memory is the conversation history. Without it, each turn is isolated. Common kinds:

  • Buffer memory keeps the last N messages.
  • Summary memory summarises older turns to save tokens.
  • Vector memory indexes past exchanges and retrieves only the relevant ones.

An embedding is a numerical representation of a piece of text as a vector (typically 384, 768 or 1536 dimensions). Two texts with similar meanings have nearby vectors.

A vector store is a database that stores embeddings and answers the question “give me the texts whose vectors are closest to this one”. Common implementations: Pinecone (managed), Chroma (local), FAISS (Facebook), Weaviate, Milvus, pgvector.

A retriever is the component that combines a vector store with retrieval logic. You hand it a question, it hands you back the top k most relevant documents.

4.9. RAG — Retrieval-Augmented Generation

Section titled “4.9. RAG — Retrieval-Augmented Generation”

RAG is the technique that combines a retriever with an LLM:

  1. The user asks a question.
  2. The retriever fetches the relevant documents from your data.
  3. Those documents are injected into the prompt as context.
  4. The LLM answers using the documents.

It is the standard way to avoid hallucinations on your private data, since the LLM was not trained on it.

Chunking is the act of splitting documents into smaller pieces (typically 200 to 1000 tokens) before embedding them. The size of the chunks directly affects retrieval quality.

Streaming means receiving the answer token by token as it is generated, rather than in one block at the end. This is what produces the “ChatGPT typing live” effect.

Tracing is recording every step of an agent’s execution: prompts sent, tools called, outputs received, durations, costs. Indispensable for debugging anything beyond a toy example.

LangSmith is the official tracing and evaluation platform for the LangChain ecosystem. It is a commercial hosted service (with a free tier), in contrast to LangChain and LangGraph themselves which are MIT-licensed open source.


5. Without a framework, five problems you will face

Section titled “5. Without a framework, five problems you will face”

If you decide to code everything yourself against the bare OpenAI, Anthropic or Ollama SDK (which is exactly what this course does for teaching purposes), you will hit five recurring walls. Some of them were already mentioned in chapter 02 — what an LLM cannot do, but here we look at them from the engineering angle, not the model angle.

You code everything against the OpenAI API. Three months later, the team asks: “can we switch to Anthropic? Or to Mistral local?”

Without a framework:

  • Every openai.ChatCompletion.create(...) must be rewritten.
  • The message format differs slightly between providers.
  • The tool-call format differs a lot between providers.
  • You rewrite dozens of call sites.

Cost. Several days of work for a change that should be trivial.

You want your bot to call a web search. Without a framework:

  • You hand-write the JSON schema for the tool (~30 lines).
  • You parse the model’s tool_calls yourself.
  • You call the tool.
  • You reformat the result as {"role": "tool", "content": ...}.
  • You feed it back into the prompt.
  • You repeat that recipe for every new tool.

Cost. A lot of repetitive plumbing and subtle bugs around argument parsing.

You want the answer to appear progressively in the UI. Without a framework:

  • You switch to stream=True.
  • You receive raw SSE chunks.
  • You parse and accumulate them, and handle network errors gracefully.

Cost. It works, but it is roughly 50 lines per endpoint instead of one.

Two weeks after deployment, a bug arrives: “the agent answers badly”. Without a framework:

  • You have no trace of the calls.
  • You do not know which prompt was actually sent.
  • You do not know how many iterations the agent did.
  • You do not know how much each call cost.

Cost. Blind debugging and considerable wasted time.

You build a RAG system for client A. Three months later, client B wants a similar one with a different vector store.

Without a framework:

  • Everything was coupled to Pinecone.
  • You rewrite half the code to switch to Chroma.

Cost. Each project starts from zero.


6. With a framework, the same five problems

Section titled “6. With a framework, the same five problems”
llm = init_chat_model("openai:gpt-4o")
# one line changes:
llm = init_chat_model("anthropic:claude-3-5-sonnet")

Everything downstream keeps working.

@tool
def search_web(query: str) -> str:
"""Search the web."""
return tavily.search(query)
agent = create_agent(model=llm, tools=[search_web])

LangChain generates the JSON schema, parses the tool calls, formats the results. You write one Python function and you are done. This is the same primitive Ollama exposes when we write client.chat(model=..., tools=[search_web]) in demo 3 — except LangChain also handles the loop, retries and observability.

for chunk in chain.stream({"question": "Hello"}):
print(chunk, end="")

One line. That is the whole thing.

You set two environment variables:

Terminal window
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...

And the LangSmith dashboard automatically shows, for every call: the exact prompt sent, the response received, the tool calls, the duration, the cost, and the full trace tree.

vectorstore = PineconeVectorStore(...)
# or:
vectorstore = Chroma(...)

The rest of your RAG chain (retriever, prompt, LLM, parser) does not change.

flowchart TD
A["Without framework"] --> A1["Code coupled to one provider"]
A1 --> A2["Heavy boilerplate"]
A2 --> A3["No observability"]
A3 --> A4["Hard to maintain"]
B["With LangChain / LangGraph"] --> B1["Provider-agnostic code"]
B1 --> B2["Reusable blocks"]
B2 --> B3["Built-in tracing"]
B3 --> B4["Fast iteration"]

AspectBare Ollama API (our choice in this course)LangChain (LCEL)LangChain AgentsLangGraph
Level of abstractionVery lowMediumHighHigh but very controllable
Tool callingYes, you wire it manuallyYes, via bind_toolsYes, nativeYes, native
Built-in reasoning loopHand-coded (our ~30 lines)Possible, not idiomaticYesYes, explicit
Conditional branchesHand-codedLimitedLimitedYes, conditional edges
Shared state across stepsHand-codedLimited (RunnablePassthrough)Implicit (message list)Yes, explicit TypedDict
Multi-agentHand-codedPossibleLimitedYes, idiomatic
Streaming for freeHand-codedYesYesYes
Built-in tracingNoneYes (LangSmith)YesYes
Learning curveLow (you just call an API)Low to mediumMediumMedium to high
Best forLearning, prototypes, ultra-simple scriptsLinear pipelinesSingle agent with tools, few linesComplex workflows, serious production

8. When to stay hand-coded, when to switch

Section titled “8. When to stay hand-coded, when to switch”

This section extends the same conversation we started at the end of chapter 09 — demo 3, with more depth.

  • You are teaching or learning an agent for the first time. Visibility on every byte beats line count.
  • You target one provider and you only need a handful of tools.
  • You want to control the prompt exactly, with no template you have not written yourself.
  • Your project lives in a single file and a single afternoon.
  • You need to support several LLM providers in the same product (OpenAI, Anthropic, Ollama, a fine-tune in private cloud).
  • You want streaming, retries, timeouts, caches without writing them.
  • You want automatic observability via LangSmith — every call traced, costs counted, prompts archived.
  • You build a RAG over hundreds or thousands of documents with a real vector store.
  • Your team is already familiar with LangChain.
  • Your agent needs explicit loops (generate → critique → regenerate).
  • The workflow has conditional branches based on the model’s output or external state.
  • You want multiple cooperating agents with clean handoff (planner → coder → reviewer).
  • You need to persist and resume an agent’s state across days (LangGraph has built-in checkpointing).

Keep it hand-coded until any one of these things happens twice in a week: – you change provider, – you add a fifth tool, – you wish you could see the exact prompt sent to the model, – a teammate asks “why did the agent do that?” and you cannot answer.

When the second occurrence hits, port to LangChain or LangGraph. Not before.


  • LangChain is a Lego set of building blocks for LLM applications, composable with the | operator (LCEL). It solves provider portability, tool wiring, streaming and observability.
  • LangGraph is the same team’s companion library for stateful workflows: loops, conditional branches, multi-agent collaboration, shared TypedDict state.
  • LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI and Smolagents are all open source and free to use. The hosted services around them (LangSmith, LangGraph Cloud) are the paid commercial part.
  • This course stays hand-coded for pedagogical reasons: every byte sent to the model and every byte returned is visible in 30 lines of Python.
  • In a real project, switch to a framework the moment you hit any of the five walls in section 5 — and not earlier.

Next stop: the agentic AI landscape annex — what “Agentic AI” really means, the ReAct loop, and the broader framework ecosystem.