Annex — LangChain & LangGraph in depth

Duration: ~25 min Prerequisites: chapter 04 — Ollama vs LangChain and chapter 09 — Demo 3: a simple agent

This annex extends the short comparison in chapter 04 with the depth you need to actually read other people’s tutorials and not feel lost. It assumes you already know what an LLM is (chapter 01) and what it cannot do on its own (chapter 02).

The companion annex, the agentic AI landscape, covers what an agent is, the ReAct loop, and the broader framework ecosystem (LlamaIndex, AutoGen, CrewAI, Smolagents…).

1. Where we come from

By the end of demo 3 you have a working agent in about 30 useful lines, with every message visible and every tool call printable. The natural next question is: if we wrote everything by hand, what would change if we used LangChain or LangGraph instead?

This annex answers that question methodically. It defines each piece of vocabulary you will see in tutorials, it shows the same problem with and without a framework, and it gives you a decision rule for picking one over the other in a real project.

2. What LangChain is, in one paragraph

LangChain is an open-source toolbox (Python and JavaScript, MIT-licensed) that gives you standardised building blocks for LLM applications. The most accurate mental image is a Lego set for AI applications: each piece has a well-defined shape, and the pieces snap together with one operator.

2.1. The blocks LangChain provides

Block	What it does
`PromptTemplate`	A reusable message template with variables
`ChatModel`	A uniform interface over OpenAI, Anthropic, Ollama, Mistral, Groq…
`OutputParser`	Turns the model’s free text into a typed Python object (Pydantic, JSON)
`Tool` / `@tool`	A Python function exposed to the model as an action it can call
`Agent`	An LLM + a list of tools + a built-in reasoning loop
`Retriever`	Returns the documents most relevant to a question (the R in RAG)
`VectorStore`	A database of embeddings (Pinecone, Chroma, FAISS, Weaviate…)
`Memory`	A system for keeping conversation history across turns
`Runnable` / LCEL	The pipe operator `

2.2. LCEL — the pipe operator

LCEL stands for LangChain Expression Language. The whole idea fits in one character: |.

chain = prompt | llm | parser
result = chain.invoke({"question": "Hello?"})

Read this as: send the input into prompt, its output into llm, its output into parser.

Every LCEL chain inherits four methods for free:

.invoke(input) — synchronous call
.ainvoke(input) — asynchronous call
.stream(input) — token-by-token streaming
.batch(inputs) — batched calls

You write the chain once; you get sync, async, streaming and batching without a single extra line.

2.3. What LangChain genuinely buys you

Provider portability. Swap init_chat_model("openai:gpt-4o") for init_chat_model("ollama:llama3.1:8b") and everything downstream keeps working.
Streaming and async for free, no SSE parsing by hand.
Hundreds of integrations out of the box (Tavily, Pinecone, Chroma, SQL, Slack, GitHub, Notion, S3…).
An agent in a few lines via create_agent(model=..., tools=[...]).

2.4. What LangChain explicitly does not do

It does not replace an LLM. You still need a provider (OpenAI, Anthropic, Ollama…).
It does not guarantee the quality of the response — only the model is responsible for that.
It does not orchestrate complex stateful workflows with loops, branches and shared state. That is exactly what LangGraph is for.

3. What LangGraph is, and why it is separate from LangChain

LangGraph is a companion library built by the same team. Its central idea: represent your LLM application as a graph of states, not as a linear chain.

3.1. Chains vs graphs

flowchart LR
  subgraph LangChain["LangChain (chain mindset)"]
      direction LR
      A1[Prompt] --> B1[LLM]
      B1 --> C1[Parser]
  end

  subgraph LangGraph["LangGraph (graph mindset)"]
      direction TB
      S([Start]) --> N1[Generate]
      N1 -->|"needs tool"| N2[Tool]
      N2 --> N1
      N1 -->|"draft ready"| N3[Reflect]
      N3 -->|"improve"| N1
      N3 -->|"good enough"| E([End])
  end

LangChain composes pipelines (left). LangGraph orchestrates graphs of states with loops, branches, and shared state (right).

LangChain treats your application as a pipeline. LangGraph treats it as a state machine: nodes are functions, edges are transitions, and a shared state object travels through the graph.

3.2. What LangGraph specifically enables

Loops. Generate → critique → generate again until a quality threshold is met.
Conditional edges. “If the question is about pricing, go to sql_query; otherwise go to web_search.”
Shared state. A TypedDict flows through every node and accumulates messages, retrieved documents, scores, intermediate results.
Multi-agent orchestration. Several agents are individual nodes that hand off control to each other.
Advanced patterns described in academic papers: Reflection, Reflexion, Plan-and-Execute, Adaptive RAG, Self-RAG, Corrective RAG.

3.3. When to use LangChain vs LangGraph

You need…	Use
A linear pipeline (prompt → LLM → parser)	LangChain (LCEL)
A single agent with a list of tools	LangChain `create_agent`
A reflection or revision loop	LangGraph
A workflow with conditional branches and shared state	LangGraph
Several agents collaborating with handoff	LangGraph

4. The vocabulary you will see in every tutorial

This vocabulary is shared across almost every framework — LangChain, LlamaIndex, Haystack, AutoGen, CrewAI. Mastering it is enough to read any tutorial in the field.

4.1. Prompt and prompt template

A prompt is the instruction sent to the LLM. A prompt template is a prompt with variables, for example:

“Summarise this text in {language} in {n_sentences} sentences: {text}.”

4.2. Chain

A chain is a sequence of steps such as prompt → llm → parser. In LangChain, it is written with the pipe |.

4.3. Tool

A tool is a function the LLM can request to call. Typical examples:

search_web(query) returns web results.
query_database(sql) returns rows from a database.
send_email(to, body) sends a message.

You declare your tools, the LLM picks which one to call, and the framework or your own code executes it. This is exactly what we did by hand in demo 3 with list_files, read_file, write_file, compile_java.

4.4. Agent

An agent is an LLM + tools + a loop. It decides which tools to call, in which order, until it can produce a final answer.

4.5. Memory

Memory is the conversation history. Without it, each turn is isolated. Common kinds:

Buffer memory keeps the last N messages.
Summary memory summarises older turns to save tokens.
Vector memory indexes past exchanges and retrieves only the relevant ones.

4.6. Embedding

An embedding is a numerical representation of a piece of text as a vector (typically 384, 768 or 1536 dimensions). Two texts with similar meanings have nearby vectors.

4.7. Vector store

A vector store is a database that stores embeddings and answers the question “give me the texts whose vectors are closest to this one”. Common implementations: Pinecone (managed), Chroma (local), FAISS (Facebook), Weaviate, Milvus, pgvector.

4.8. Retriever

A retriever is the component that combines a vector store with retrieval logic. You hand it a question, it hands you back the top k most relevant documents.

4.9. RAG — Retrieval-Augmented Generation

RAG is the technique that combines a retriever with an LLM:

The user asks a question.
The retriever fetches the relevant documents from your data.
Those documents are injected into the prompt as context.
The LLM answers using the documents.

It is the standard way to avoid hallucinations on your private data, since the LLM was not trained on it.

4.10. Chunking

Chunking is the act of splitting documents into smaller pieces (typically 200 to 1000 tokens) before embedding them. The size of the chunks directly affects retrieval quality.

4.11. Streaming

Streaming means receiving the answer token by token as it is generated, rather than in one block at the end. This is what produces the “ChatGPT typing live” effect.

4.12. Tracing and observability

Tracing is recording every step of an agent’s execution: prompts sent, tools called, outputs received, durations, costs. Indispensable for debugging anything beyond a toy example.

LangSmith is the official tracing and evaluation platform for the LangChain ecosystem. It is a commercial hosted service (with a free tier), in contrast to LangChain and LangGraph themselves which are MIT-licensed open source.

5. Without a framework, five problems you will face

If you decide to code everything yourself against the bare OpenAI, Anthropic or Ollama SDK (which is exactly what this course does for teaching purposes), you will hit five recurring walls. Some of them were already mentioned in chapter 02 — what an LLM cannot do, but here we look at them from the engineering angle, not the model angle.

5.1. Changing provider

You code everything against the OpenAI API. Three months later, the team asks: “can we switch to Anthropic? Or to Mistral local?”

Without a framework:

Every openai.ChatCompletion.create(...) must be rewritten.
The message format differs slightly between providers.
The tool-call format differs a lot between providers.
You rewrite dozens of call sites.

Cost. Several days of work for a change that should be trivial.

5.2. Adding a tool

You want your bot to call a web search. Without a framework:

You hand-write the JSON schema for the tool (~30 lines).
You parse the model’s tool_calls yourself.
You call the tool.
You reformat the result as {"role": "tool", "content": ...}.
You feed it back into the prompt.
You repeat that recipe for every new tool.

Cost. A lot of repetitive plumbing and subtle bugs around argument parsing.

5.3. Adding streaming

You want the answer to appear progressively in the UI. Without a framework:

You switch to stream=True.
You receive raw SSE chunks.
You parse and accumulate them, and handle network errors gracefully.

Cost. It works, but it is roughly 50 lines per endpoint instead of one.

5.4. Observing what happens in production

Two weeks after deployment, a bug arrives: “the agent answers badly”. Without a framework:

You have no trace of the calls.
You do not know which prompt was actually sent.
You do not know how many iterations the agent did.
You do not know how much each call cost.

Cost. Blind debugging and considerable wasted time.

5.5. Reusing across projects

You build a RAG system for client A. Three months later, client B wants a similar one with a different vector store.

Without a framework:

Everything was coupled to Pinecone.
You rewrite half the code to switch to Chroma.

Cost. Each project starts from zero.

6. With a framework, the same five problems

6.1. Changing provider

llm = init_chat_model("openai:gpt-4o")
# one line changes:
llm = init_chat_model("anthropic:claude-3-5-sonnet")

Everything downstream keeps working.

6.2. Adding a tool

@tool
def search_web(query: str) -> str:
    """Search the web."""
    return tavily.search(query)

agent = create_agent(model=llm, tools=[search_web])

LangChain generates the JSON schema, parses the tool calls, formats the results. You write one Python function and you are done. This is the same primitive Ollama exposes when we write client.chat(model=..., tools=[search_web]) in demo 3 — except LangChain also handles the loop, retries and observability.

6.3. Streaming

for chunk in chain.stream({"question": "Hello"}):
    print(chunk, end="")

One line. That is the whole thing.

6.4. Observability

You set two environment variables:

LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=...

And the LangSmith dashboard automatically shows, for every call: the exact prompt sent, the response received, the tool calls, the duration, the cost, and the full trace tree.

6.5. Reusing across projects

vectorstore = PineconeVectorStore(...)
# or:
vectorstore = Chroma(...)

The rest of your RAG chain (retriever, prompt, LLM, parser) does not change.

6.6. The synthesis

flowchart TD
  A["Without framework"] --> A1["Code coupled to one provider"]
  A1 --> A2["Heavy boilerplate"]
  A2 --> A3["No observability"]
  A3 --> A4["Hard to maintain"]

  B["With LangChain / LangGraph"] --> B1["Provider-agnostic code"]
  B1 --> B2["Reusable blocks"]
  B2 --> B3["Built-in tracing"]
  B3 --> B4["Fast iteration"]

The five problems disappear together once you trade hand-rolled glue code for a framework's shared abstractions.

7. Compact comparison table

Aspect	Bare Ollama API (our choice in this course)	LangChain (LCEL)	LangChain Agents	LangGraph
Level of abstraction	Very low	Medium	High	High but very controllable
Tool calling	Yes, you wire it manually	Yes, via `bind_tools`	Yes, native	Yes, native
Built-in reasoning loop	Hand-coded (our ~30 lines)	Possible, not idiomatic	Yes	Yes, explicit
Conditional branches	Hand-coded	Limited	Limited	Yes, conditional edges
Shared state across steps	Hand-coded	Limited (`RunnablePassthrough`)	Implicit (message list)	Yes, explicit TypedDict
Multi-agent	Hand-coded	Possible	Limited	Yes, idiomatic
Streaming for free	Hand-coded	Yes	Yes	Yes
Built-in tracing	None	Yes (LangSmith)	Yes	Yes
Learning curve	Low (you just call an API)	Low to medium	Medium	Medium to high
Best for	Learning, prototypes, ultra-simple scripts	Linear pipelines	Single agent with tools, few lines	Complex workflows, serious production

8. When to stay hand-coded, when to switch

This section extends the same conversation we started at the end of chapter 09 — demo 3, with more depth.

8.1. Stay hand-coded when…

You are teaching or learning an agent for the first time. Visibility on every byte beats line count.
You target one provider and you only need a handful of tools.
You want to control the prompt exactly, with no template you have not written yourself.
Your project lives in a single file and a single afternoon.

8.2. Switch to LangChain when…

You need to support several LLM providers in the same product (OpenAI, Anthropic, Ollama, a fine-tune in private cloud).
You want streaming, retries, timeouts, caches without writing them.
You want automatic observability via LangSmith — every call traced, costs counted, prompts archived.
You build a RAG over hundreds or thousands of documents with a real vector store.
Your team is already familiar with LangChain.

8.3. Switch to LangGraph when…

Your agent needs explicit loops (generate → critique → regenerate).
The workflow has conditional branches based on the model’s output or external state.
You want multiple cooperating agents with clean handoff (planner → coder → reviewer).
You need to persist and resume an agent’s state across days (LangGraph has built-in checkpointing).

8.4. A practical heuristic

Keep it hand-coded until any one of these things happens twice in a week: – you change provider, – you add a fifth tool, – you wish you could see the exact prompt sent to the model, – a teammate asks “why did the agent do that?” and you cannot answer.

When the second occurrence hits, port to LangChain or LangGraph. Not before.

9. Key takeaways

LangChain is a Lego set of building blocks for LLM applications, composable with the | operator (LCEL). It solves provider portability, tool wiring, streaming and observability.
LangGraph is the same team’s companion library for stateful workflows: loops, conditional branches, multi-agent collaboration, shared TypedDict state.
LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI and Smolagents are all open source and free to use. The hosted services around them (LangSmith, LangGraph Cloud) are the paid commercial part.
This course stays hand-coded for pedagogical reasons: every byte sent to the model and every byte returned is visible in 30 lines of Python.
In a real project, switch to a framework the moment you hit any of the five walls in section 5 — and not earlier.

Next stop: the agentic AI landscape annex — what “Agentic AI” really means, the ReAct loop, and the broader framework ecosystem.