Annex — LangChain & LangGraph in depth
Duration: ~25 min Prerequisites: chapter 04 — Ollama vs LangChain and chapter 09 — Demo 3: a simple agent
This annex extends the short comparison in chapter 04 with the depth you need to actually read other people’s tutorials and not feel lost. It assumes you already know what an LLM is (chapter 01) and what it cannot do on its own (chapter 02).
The companion annex, the agentic AI landscape, covers what an agent is, the ReAct loop, and the broader framework ecosystem (LlamaIndex, AutoGen, CrewAI, Smolagents…).
1. Where we come from
Section titled “1. Where we come from”By the end of demo 3 you have a working agent in about 30 useful lines, with every message visible and every tool call printable. The natural next question is: if we wrote everything by hand, what would change if we used LangChain or LangGraph instead?
This annex answers that question methodically. It defines each piece of vocabulary you will see in tutorials, it shows the same problem with and without a framework, and it gives you a decision rule for picking one over the other in a real project.
2. What LangChain is, in one paragraph
Section titled “2. What LangChain is, in one paragraph”LangChain is an open-source toolbox (Python and JavaScript, MIT-licensed) that gives you standardised building blocks for LLM applications. The most accurate mental image is a Lego set for AI applications: each piece has a well-defined shape, and the pieces snap together with one operator.
2.1. The blocks LangChain provides
Section titled “2.1. The blocks LangChain provides”| Block | What it does |
|---|---|
PromptTemplate | A reusable message template with variables |
ChatModel | A uniform interface over OpenAI, Anthropic, Ollama, Mistral, Groq… |
OutputParser | Turns the model’s free text into a typed Python object (Pydantic, JSON) |
Tool / @tool | A Python function exposed to the model as an action it can call |
Agent | An LLM + a list of tools + a built-in reasoning loop |
Retriever | Returns the documents most relevant to a question (the R in RAG) |
VectorStore | A database of embeddings (Pinecone, Chroma, FAISS, Weaviate…) |
Memory | A system for keeping conversation history across turns |
Runnable / LCEL | The pipe operator ` |
2.2. LCEL — the pipe operator
Section titled “2.2. LCEL — the pipe operator”LCEL stands for LangChain Expression Language. The whole idea fits in one character: |.
chain = prompt | llm | parserresult = chain.invoke({"question": "Hello?"})Read this as: send the input into prompt, its output into llm, its output into parser.
Every LCEL chain inherits four methods for free:
.invoke(input)— synchronous call.ainvoke(input)— asynchronous call.stream(input)— token-by-token streaming.batch(inputs)— batched calls
You write the chain once; you get sync, async, streaming and batching without a single extra line.
2.3. What LangChain genuinely buys you
Section titled “2.3. What LangChain genuinely buys you”- Provider portability. Swap
init_chat_model("openai:gpt-4o")forinit_chat_model("ollama:llama3.1:8b")and everything downstream keeps working. - Streaming and async for free, no SSE parsing by hand.
- Hundreds of integrations out of the box (Tavily, Pinecone, Chroma, SQL, Slack, GitHub, Notion, S3…).
- An agent in a few lines via
create_agent(model=..., tools=[...]).
2.4. What LangChain explicitly does not do
Section titled “2.4. What LangChain explicitly does not do”- It does not replace an LLM. You still need a provider (OpenAI, Anthropic, Ollama…).
- It does not guarantee the quality of the response — only the model is responsible for that.
- It does not orchestrate complex stateful workflows with loops, branches and shared state. That is exactly what LangGraph is for.
3. What LangGraph is, and why it is separate from LangChain
Section titled “3. What LangGraph is, and why it is separate from LangChain”LangGraph is a companion library built by the same team. Its central idea: represent your LLM application as a graph of states, not as a linear chain.
3.1. Chains vs graphs
Section titled “3.1. Chains vs graphs”flowchart LR subgraph LangChain["LangChain (chain mindset)"] direction LR A1[Prompt] --> B1[LLM] B1 --> C1[Parser] end
subgraph LangGraph["LangGraph (graph mindset)"] direction TB S([Start]) --> N1[Generate] N1 -->|"needs tool"| N2[Tool] N2 --> N1 N1 -->|"draft ready"| N3[Reflect] N3 -->|"improve"| N1 N3 -->|"good enough"| E([End]) endLangChain treats your application as a pipeline. LangGraph treats it as a state machine: nodes are functions, edges are transitions, and a shared state object travels through the graph.
3.2. What LangGraph specifically enables
Section titled “3.2. What LangGraph specifically enables”- Loops. Generate → critique → generate again until a quality threshold is met.
- Conditional edges. “If the question is about pricing, go to
sql_query; otherwise go toweb_search.” - Shared state. A
TypedDictflows through every node and accumulates messages, retrieved documents, scores, intermediate results. - Multi-agent orchestration. Several agents are individual nodes that hand off control to each other.
- Advanced patterns described in academic papers: Reflection, Reflexion, Plan-and-Execute, Adaptive RAG, Self-RAG, Corrective RAG.
3.3. When to use LangChain vs LangGraph
Section titled “3.3. When to use LangChain vs LangGraph”| You need… | Use |
|---|---|
| A linear pipeline (prompt → LLM → parser) | LangChain (LCEL) |
| A single agent with a list of tools | LangChain create_agent |
| A reflection or revision loop | LangGraph |
| A workflow with conditional branches and shared state | LangGraph |
| Several agents collaborating with handoff | LangGraph |
4. The vocabulary you will see in every tutorial
Section titled “4. The vocabulary you will see in every tutorial”This vocabulary is shared across almost every framework — LangChain, LlamaIndex, Haystack, AutoGen, CrewAI. Mastering it is enough to read any tutorial in the field.
4.1. Prompt and prompt template
Section titled “4.1. Prompt and prompt template”A prompt is the instruction sent to the LLM. A prompt template is a prompt with variables, for example:
“Summarise this text in
{language}in{n_sentences}sentences:{text}.”
4.2. Chain
Section titled “4.2. Chain”A chain is a sequence of steps such as prompt → llm → parser. In LangChain, it is written with the pipe |.
4.3. Tool
Section titled “4.3. Tool”A tool is a function the LLM can request to call. Typical examples:
search_web(query)returns web results.query_database(sql)returns rows from a database.send_email(to, body)sends a message.
You declare your tools, the LLM picks which one to call, and the framework or your own code executes it. This is exactly what we did by hand in demo 3 with list_files, read_file, write_file, compile_java.
4.4. Agent
Section titled “4.4. Agent”An agent is an LLM + tools + a loop. It decides which tools to call, in which order, until it can produce a final answer.
4.5. Memory
Section titled “4.5. Memory”Memory is the conversation history. Without it, each turn is isolated. Common kinds:
- Buffer memory keeps the last N messages.
- Summary memory summarises older turns to save tokens.
- Vector memory indexes past exchanges and retrieves only the relevant ones.
4.6. Embedding
Section titled “4.6. Embedding”An embedding is a numerical representation of a piece of text as a vector (typically 384, 768 or 1536 dimensions). Two texts with similar meanings have nearby vectors.
4.7. Vector store
Section titled “4.7. Vector store”A vector store is a database that stores embeddings and answers the question “give me the texts whose vectors are closest to this one”. Common implementations: Pinecone (managed), Chroma (local), FAISS (Facebook), Weaviate, Milvus, pgvector.
4.8. Retriever
Section titled “4.8. Retriever”A retriever is the component that combines a vector store with retrieval logic. You hand it a question, it hands you back the top k most relevant documents.
4.9. RAG — Retrieval-Augmented Generation
Section titled “4.9. RAG — Retrieval-Augmented Generation”RAG is the technique that combines a retriever with an LLM:
- The user asks a question.
- The retriever fetches the relevant documents from your data.
- Those documents are injected into the prompt as context.
- The LLM answers using the documents.
It is the standard way to avoid hallucinations on your private data, since the LLM was not trained on it.
4.10. Chunking
Section titled “4.10. Chunking”Chunking is the act of splitting documents into smaller pieces (typically 200 to 1000 tokens) before embedding them. The size of the chunks directly affects retrieval quality.
4.11. Streaming
Section titled “4.11. Streaming”Streaming means receiving the answer token by token as it is generated, rather than in one block at the end. This is what produces the “ChatGPT typing live” effect.
4.12. Tracing and observability
Section titled “4.12. Tracing and observability”Tracing is recording every step of an agent’s execution: prompts sent, tools called, outputs received, durations, costs. Indispensable for debugging anything beyond a toy example.
LangSmith is the official tracing and evaluation platform for the LangChain ecosystem. It is a commercial hosted service (with a free tier), in contrast to LangChain and LangGraph themselves which are MIT-licensed open source.
5. Without a framework, five problems you will face
Section titled “5. Without a framework, five problems you will face”If you decide to code everything yourself against the bare OpenAI, Anthropic or Ollama SDK (which is exactly what this course does for teaching purposes), you will hit five recurring walls. Some of them were already mentioned in chapter 02 — what an LLM cannot do, but here we look at them from the engineering angle, not the model angle.
5.1. Changing provider
Section titled “5.1. Changing provider”You code everything against the OpenAI API. Three months later, the team asks: “can we switch to Anthropic? Or to Mistral local?”
Without a framework:
- Every
openai.ChatCompletion.create(...)must be rewritten. - The message format differs slightly between providers.
- The tool-call format differs a lot between providers.
- You rewrite dozens of call sites.
Cost. Several days of work for a change that should be trivial.
5.2. Adding a tool
Section titled “5.2. Adding a tool”You want your bot to call a web search. Without a framework:
- You hand-write the JSON schema for the tool (~30 lines).
- You parse the model’s
tool_callsyourself. - You call the tool.
- You reformat the result as
{"role": "tool", "content": ...}. - You feed it back into the prompt.
- You repeat that recipe for every new tool.
Cost. A lot of repetitive plumbing and subtle bugs around argument parsing.
5.3. Adding streaming
Section titled “5.3. Adding streaming”You want the answer to appear progressively in the UI. Without a framework:
- You switch to
stream=True. - You receive raw SSE chunks.
- You parse and accumulate them, and handle network errors gracefully.
Cost. It works, but it is roughly 50 lines per endpoint instead of one.
5.4. Observing what happens in production
Section titled “5.4. Observing what happens in production”Two weeks after deployment, a bug arrives: “the agent answers badly”. Without a framework:
- You have no trace of the calls.
- You do not know which prompt was actually sent.
- You do not know how many iterations the agent did.
- You do not know how much each call cost.
Cost. Blind debugging and considerable wasted time.
5.5. Reusing across projects
Section titled “5.5. Reusing across projects”You build a RAG system for client A. Three months later, client B wants a similar one with a different vector store.
Without a framework:
- Everything was coupled to Pinecone.
- You rewrite half the code to switch to Chroma.
Cost. Each project starts from zero.
6. With a framework, the same five problems
Section titled “6. With a framework, the same five problems”6.1. Changing provider
Section titled “6.1. Changing provider”llm = init_chat_model("openai:gpt-4o")# one line changes:llm = init_chat_model("anthropic:claude-3-5-sonnet")Everything downstream keeps working.
6.2. Adding a tool
Section titled “6.2. Adding a tool”@tooldef search_web(query: str) -> str: """Search the web.""" return tavily.search(query)
agent = create_agent(model=llm, tools=[search_web])LangChain generates the JSON schema, parses the tool calls, formats the results. You write one Python function and you are done. This is the same primitive Ollama exposes when we write client.chat(model=..., tools=[search_web]) in demo 3 — except LangChain also handles the loop, retries and observability.
6.3. Streaming
Section titled “6.3. Streaming”for chunk in chain.stream({"question": "Hello"}): print(chunk, end="")One line. That is the whole thing.
6.4. Observability
Section titled “6.4. Observability”You set two environment variables:
LANGCHAIN_TRACING_V2=trueLANGCHAIN_API_KEY=...And the LangSmith dashboard automatically shows, for every call: the exact prompt sent, the response received, the tool calls, the duration, the cost, and the full trace tree.
6.5. Reusing across projects
Section titled “6.5. Reusing across projects”vectorstore = PineconeVectorStore(...)# or:vectorstore = Chroma(...)The rest of your RAG chain (retriever, prompt, LLM, parser) does not change.
6.6. The synthesis
Section titled “6.6. The synthesis”flowchart TD A["Without framework"] --> A1["Code coupled to one provider"] A1 --> A2["Heavy boilerplate"] A2 --> A3["No observability"] A3 --> A4["Hard to maintain"]
B["With LangChain / LangGraph"] --> B1["Provider-agnostic code"] B1 --> B2["Reusable blocks"] B2 --> B3["Built-in tracing"] B3 --> B4["Fast iteration"]7. Compact comparison table
Section titled “7. Compact comparison table”| Aspect | Bare Ollama API (our choice in this course) | LangChain (LCEL) | LangChain Agents | LangGraph |
|---|---|---|---|---|
| Level of abstraction | Very low | Medium | High | High but very controllable |
| Tool calling | Yes, you wire it manually | Yes, via bind_tools | Yes, native | Yes, native |
| Built-in reasoning loop | Hand-coded (our ~30 lines) | Possible, not idiomatic | Yes | Yes, explicit |
| Conditional branches | Hand-coded | Limited | Limited | Yes, conditional edges |
| Shared state across steps | Hand-coded | Limited (RunnablePassthrough) | Implicit (message list) | Yes, explicit TypedDict |
| Multi-agent | Hand-coded | Possible | Limited | Yes, idiomatic |
| Streaming for free | Hand-coded | Yes | Yes | Yes |
| Built-in tracing | None | Yes (LangSmith) | Yes | Yes |
| Learning curve | Low (you just call an API) | Low to medium | Medium | Medium to high |
| Best for | Learning, prototypes, ultra-simple scripts | Linear pipelines | Single agent with tools, few lines | Complex workflows, serious production |
8. When to stay hand-coded, when to switch
Section titled “8. When to stay hand-coded, when to switch”This section extends the same conversation we started at the end of chapter 09 — demo 3, with more depth.
8.1. Stay hand-coded when…
Section titled “8.1. Stay hand-coded when…”- You are teaching or learning an agent for the first time. Visibility on every byte beats line count.
- You target one provider and you only need a handful of tools.
- You want to control the prompt exactly, with no template you have not written yourself.
- Your project lives in a single file and a single afternoon.
8.2. Switch to LangChain when…
Section titled “8.2. Switch to LangChain when…”- You need to support several LLM providers in the same product (OpenAI, Anthropic, Ollama, a fine-tune in private cloud).
- You want streaming, retries, timeouts, caches without writing them.
- You want automatic observability via LangSmith — every call traced, costs counted, prompts archived.
- You build a RAG over hundreds or thousands of documents with a real vector store.
- Your team is already familiar with LangChain.
8.3. Switch to LangGraph when…
Section titled “8.3. Switch to LangGraph when…”- Your agent needs explicit loops (generate → critique → regenerate).
- The workflow has conditional branches based on the model’s output or external state.
- You want multiple cooperating agents with clean handoff (planner → coder → reviewer).
- You need to persist and resume an agent’s state across days (LangGraph has built-in checkpointing).
8.4. A practical heuristic
Section titled “8.4. A practical heuristic”Keep it hand-coded until any one of these things happens twice in a week: – you change provider, – you add a fifth tool, – you wish you could see the exact prompt sent to the model, – a teammate asks “why did the agent do that?” and you cannot answer.
When the second occurrence hits, port to LangChain or LangGraph. Not before.
9. Key takeaways
Section titled “9. Key takeaways”- LangChain is a Lego set of building blocks for LLM applications, composable with the
|operator (LCEL). It solves provider portability, tool wiring, streaming and observability. - LangGraph is the same team’s companion library for stateful workflows: loops, conditional branches, multi-agent collaboration, shared
TypedDictstate. - LangChain, LangGraph, LlamaIndex, AutoGen, CrewAI and Smolagents are all open source and free to use. The hosted services around them (LangSmith, LangGraph Cloud) are the paid commercial part.
- This course stays hand-coded for pedagogical reasons: every byte sent to the model and every byte returned is visible in 30 lines of Python.
- In a real project, switch to a framework the moment you hit any of the five walls in section 5 — and not earlier.
Next stop: the agentic AI landscape annex — what “Agentic AI” really means, the ReAct loop, and the broader framework ecosystem.