Source code
Repo: gneuroneai/ollama-demo-3-agent-java — one agent_java.py file, ~30 useful lines of agent loop.

```powershell
git clone https://github.com/gneuroneai/ollama-demo-3-agent-java.git
cd ollama-demo-3-agent-java
.\start.ps1
```

Duration: 15 min
Prerequisites: chapter 06 (environment installed)
This project is a self-contained agent loop of about thirty useful lines, written in a single Python file, agent_java.py. It exposes four Python functions to the language model as tools (list_files, read_file, write_file, compile_java) and lets the model invoke them freely to create and compile a small Java console program from a natural-language instruction. Each tool call is logged to the terminal as it happens, which makes the tool calling mechanism introduced in chapter 03 directly observable. The output of a successful run is an actual Java project on disk (source files, compiled classes, console output), produced step by step by the model under the supervision of the Python loop.
This repository is the practical core of the course: one Python file, four tools, one loop. Running it produces, step by step, a small Java console application generated from a natural-language instruction.
You run:

```powershell
cd ollama-demo-3-agent-java
.\.venv\Scripts\activate
python agent_java.py
```

The script:
Typical terminal output:

```text
--- Step 1 ---
[tool] write_file -> Product.java
File created or modified: Product.java (28 lines)

--- Step 2 ---
[tool] write_file -> ProductManager.java
File created or modified: ProductManager.java (35 lines)

--- Step 3 ---
[tool] write_file -> Main.java
File created or modified: Main.java (15 lines)

--- Step 4 ---
[tool] compile_java
Compilation successful.

--- Step 5 ---
Model: I created Product, ProductManager and Main, and compilation succeeded.
(no tool calls, agent stops)
```

Four steps to generate a compilable Java project. Everything lands in ollama-demo-3-agent-java/workspace/.
| Tool | Line in agent_java.py | What it does |
|---|---|---|
| list_files | ~153 | Lists files in workspace/. Useful for the model to know where it stands. |
| read_file(path) | ~165 | Reads a file already written. Useful when fixing an error. |
| write_file(path, content) | ~230 | Creates or overwrites a file in workspace/. Filters by extension. |
| compile_java() | ~259 | Runs javac -encoding UTF-8 *.java via subprocess. Returns success or errors. |
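To make the write_file row of the table concrete, here is a minimal sketch of a tool that creates a file, filters by extension, and refuses paths outside workspace/. It is not the code from agent_java.py: the names WORKSPACE and ALLOWED_EXTENSIONS, the refusal messages, and the path check are assumptions made for illustration.

```python
from pathlib import Path

# Assumed names for this sketch; agent_java.py may organise this differently.
WORKSPACE = Path("workspace")
ALLOWED_EXTENSIONS = {".java"}


def write_file(path: str, content: str) -> str:
    """Create or overwrite a file in workspace/.

    Args:
        path: File name relative to workspace/, for example "Main.java".
        content: Full text of the file.
    """
    root = WORKSPACE.resolve()
    target = (root / path).resolve()

    # Refuse anything that escapes workspace/ (for example "../secrets.txt").
    if not target.is_relative_to(root):
        return f"Refused: {path} is outside the workspace."

    # Filter by extension so the model can only produce Java sources.
    if target.suffix not in ALLOWED_EXTENSIONS:
        return f"Refused: extension {target.suffix!r} is not allowed."

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    line_count = len(content.splitlines())
    return f"File created or modified: {path} ({line_count} lines)"
```

The function always returns a string, including on refusal, because that string is what the model reads on its next turn.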
```mermaid
flowchart LR
    Model["llama3.1:8b"]
    subgraph tools [4 Python tools]
        T1["list_files"]
        T2["read_file"]
        T3["write_file"]
        T4["compile_java"]
    end
    World[("workspace/")]
    Javac[("javac")]
    Model -->|"tool_calls"| tools
    T1 --> World
    T2 --> World
    T3 --> World
    T4 --> Javac
    Javac --> World
    tools -->|"text result"| Model
```
That’s all. The model can do nothing else. No network, no deletion, no arbitrary execution.
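compile_java is the only tool that starts an external process, so it is worth seeing how that can be done without giving the model a shell. The sketch below is an assumption about how such a tool could look, not a copy of agent_java.py: the WORKSPACE name, the 60-second ceiling and the exact error messages are illustrative.

```python
import subprocess
from pathlib import Path

WORKSPACE = Path("workspace")  # assumed location, matching the text


def compile_java() -> str:
    """Compile every Java file in workspace/ and report the outcome as text."""
    sources = sorted(p.name for p in WORKSPACE.glob("*.java"))
    if not sources:
        return "No .java files to compile."

    try:
        # No shell is involved, so "*.java" is expanded here in Python
        # and the command is a fixed list the model cannot alter.
        completed = subprocess.run(
            ["javac", "-encoding", "UTF-8", *sources],
            cwd=WORKSPACE,
            capture_output=True,
            text=True,
            timeout=60,  # arbitrary ceiling chosen for this sketch
        )
    except FileNotFoundError:
        return "Compilation failed: javac was not found on PATH."
    except subprocess.TimeoutExpired:
        return "Compilation failed: javac timed out."

    if completed.returncode == 0:
        return "Compilation successful."
    return "Compilation failed:\n" + completed.stderr
```

Because the tool takes no arguments, the model can decide when to compile but not what command is run.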
A common confusion in class: the four tools are not built into Ollama, into the model, or into any external library. They are our own Python code, written by hand in agent_java.py between lines 153 and 282.
| What is ours (agent_java.py) | What comes from outside |
|---|---|
| The four tool functions themselves (~130 lines of Python) | The ollama Python SDK (one pip install ollama, ~80 KB) — talks to the local Ollama daemon over HTTP on 127.0.0.1:11434 |
| The tools = [list_files, read_file, write_file, compile_java] registration list | Python standard library: pathlib, json, os, re, subprocess, sys, time |
| The system prompt (lines 39 – 72) | The model weights pulled by ollama pull llama3.1:8b (~4.9 GB on disk) |
| The agent loop (~30 lines, from line 486) | The javac compiler from the JDK, invoked by compile_java |
| The fallback parser parse_pseudo_tool_calls | |
Two short declarations near the top of agent_java.py wire everything together:

```python
tools = [list_files, read_file, write_file, compile_java]

available_functions = {
    "list_files": list_files,
    "read_file": read_file,
    "write_file": write_file,
    "compile_java": compile_java,
}
```

When Python passes those four function references to client.chat(tools=...), the Ollama SDK automatically generates the JSON tool schema (name, parameters, types, description) by reading each function’s type hints and docstring. That JSON schema is what gets sent to the model. The Python code itself is never sent: the model only knows that a tool called write_file exists and accepts a path: str and a content: str.
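To see what "generating a schema from type hints and a docstring" can mean in practice, here is a hand-written approximation. It is not the SDK's implementation; describe_tool and JSON_TYPES are hypothetical names, and only a few simple parameter types are handled. The output shape follows the tools entry used in the curl example of this chapter.

```python
import inspect
import json
from typing import get_type_hints

# Subset of Python annotations mapped to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Build a JSON tool description from a function's hints and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not a parameter
    signature = inspect.signature(fn)

    properties = {
        name: {"type": JSON_TYPES.get(hint, "string")}
        for name, hint in hints.items()
    }
    # A parameter without a default value is treated as required.
    required = [
        name
        for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def write_file(path: str, content: str) -> str:
    """Create or overwrite a file in workspace/."""
    return "stub"  # the body is irrelevant: only the signature is inspected


print(json.dumps(describe_tool(write_file), indent=2))
```

Running it prints a JSON object that names write_file, carries its docstring as the description, and lists path and content as required string parameters. Nothing from the function body appears in it.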
The full chain on a single tool call:

1. The model replies with a tool_calls entry.
2. The loop reads tool_calls, looks up the matching Python function in available_functions, and calls it with the model-provided arguments.
3. The result is appended to the history with the role "tool", so the model can react on the next turn.

That is everything. No framework. No decorator. No central registry. If you want to add a fifth tool (say git_status()), you write a Python function, append it to the tools list and to available_functions, and the model will be able to use it on the next run.
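As an illustration of that last point, here is what registering the hypothetical git_status() tool could look like. The four existing tools are replaced by stand-ins so the sketch runs on its own, and the body of git_status is an assumption (the text only names the function).

```python
import subprocess


def git_status() -> str:
    """Return the short git status of the current directory."""
    try:
        completed = subprocess.run(
            ["git", "status", "--short"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as error:
        return f"git status failed: {error}"
    if completed.returncode != 0:
        return f"git status failed: {completed.stderr.strip()}"
    return completed.stdout or "Working tree clean."


# Stand-ins for the four existing tools, so this sketch is self-contained.
def list_files() -> str:
    return ""


def read_file(path: str) -> str:
    return ""


def write_file(path: str, content: str) -> str:
    return ""


def compile_java() -> str:
    return ""


tools = [list_files, read_file, write_file, compile_java]
available_functions = {fn.__name__: fn for fn in tools}

# Registration: the same function goes in both places.
tools.append(git_status)
available_functions["git_status"] = git_status
```

The list is what the model gets to know about; the dictionary is what the loop uses to dispatch. Forgetting the second one would yield an "Unknown tool" result at run time.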
The previous section showed that the tools are our code. The natural follow-up: if we write the tools, what does the model actually contribute, and what does Ollama add in the middle? This section answers it directly. It is the heart of why llama3.1:8b works cleanly on this demo and qwen2.5-coder:7b does not — even though both models declare the same Capabilities: tools on the Ollama library.
Three layers of code and data cooperate to make client.chat(model=..., tools=[...]) work.
| Layer | What it is | Who wrote it | Where it lives |
|---|---|---|---|
| 1. Our code | The 4 Python functions, the tools list, the loop, the fallback parser | Us, in agent_java.py | One file, ~700 lines total, ~30 lines for the loop itself |
| 2. The Ollama runtime | Chat template, JSON-schema generation from Python type hints, prompt formatting, and the extraction of tool_calls from the model’s raw output | The Ollama team | The ollama daemon listening on 127.0.0.1:11434 |
| 3. The model weights | Billions of numbers that were trained to recognise the system prompt + tool schema, and to emit a tool call in a specific format when appropriate | Meta (Llama), Alibaba (Qwen), Google (Gemma)… during fine-tuning | The .gguf file pulled by ollama pull |
Each layer is necessary. Each one is independent. Misalignment between any two of them is what makes a model “declare tools” but fail to use them.
When your Python code calls client.chat(model="llama3.1:8b", tools=[list_files, read_file, write_file, compile_java]), four things happen between the SDK and the Ollama runtime without you seeing them:

1. Schema generation. The SDK reads each Python function’s type hints and docstring and turns them into a JSON tool schema (name, parameter names, types, descriptions). That schema is what gets sent to the model, not the Python code itself.
2. Chat-template injection. Every model in Ollama ships with a chat template (a small Go template that formats the conversation into the exact byte sequence the model was trained to consume). The template for a tool-calling model has a {{ if .Tools }}...{{ end }} block that knows where in the prompt the tool schemas must appear, and in which format that specific model family expects them.
3. Model invocation. The formatted byte sequence is fed to the model, which generates a response token by token.
4. Tool-call extraction. As the response streams back, Ollama watches for the special tokens or string patterns that the model was trained to emit when it wants to call a tool. If those patterns are detected, the matching substring is lifted out of the raw text and placed in the structured field response.message.tool_calls. Anything that does not match stays in response.message.content.
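The extraction step can be pictured with a few lines of Python. This is a schematic stand-in, not Ollama's parser: it assumes a single marker pair, <tool_call> … </tool_call>, whereas each model family has its own tokens, and split_reply is a hypothetical name.

```python
import json
import re

# Schematic marker pair; real model families use their own special tokens.
TOOL_CALL_PATTERN = re.compile(
    r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL
)


def split_reply(raw: str) -> tuple[list[dict], str]:
    """Separate marked tool calls from ordinary text in a raw model reply."""
    tool_calls: list[dict] = []

    def lift(match: re.Match) -> str:
        try:
            tool_calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            return match.group(0)  # marked but malformed: leave it in the text
        return ""  # lifted into tool_calls, so removed from the text

    content = TOOL_CALL_PATTERN.sub(lift, raw).strip()
    return tool_calls, content


marked = '<tool_call>{"name": "compile_java", "arguments": {}}</tool_call>'
unmarked = '{"name": "compile_java", "arguments": {}}'
print(split_reply(marked))    # the call is lifted, content is empty
print(split_reply(unmarked))  # nothing is lifted, the JSON stays in content
```

The second call is the interesting one: the JSON is perfectly valid, but without the expected markers it is treated as plain text.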
That fourth step is the one that succeeds with llama3.1:8b and fails with qwen2.5-coder:7b. It deserves its own paragraph.
During fine-tuning, the model is shown thousands of conversations that look roughly like this (schematically: each family uses its own special tokens):

```text
User: "What's the weather in Paris?"
Assistant: <special_tool_call_start> {"name": "get_weather", "arguments": {"location": "Paris"}}<special_tool_call_end>
Tool: {"temperature": 22, "conditions": "sunny"}
Assistant: "It's sunny and 22°C in Paris."
```

Through this kind of training, the model learns three things at once:
The “special tokens” are model-family-specific:

- Llama 3.1: <|python_tag|> and <|eom_id|> around tool-call JSON.
- Qwen 2.5: <tool_call> … </tool_call>.

The match between the special tokens the model emits and the chat template Ollama uses to parse them back is what makes a model “good at tool calling”. If the model emits something close but not identical to what the template expects, the parser fails to lift it into tool_calls, and the JSON ends up in message.content instead. That is exactly the failure mode of qwen2.5-coder:7b on this demo.
When you visit a model’s page on the Ollama library (for example ollama.com/library/llama3.1 or ollama.com/library/qwen2.5-coder), you see a row of badges. One of them may say Capabilities: tools.
That badge means: “the model card declares that this model was fine-tuned for tool calling.” It is a necessary condition, not a sufficient one. The badge is filled in by the publisher of the model; it is not the result of an automated benchmark Ollama runs.
You can see the same declaration locally:
```powershell
ollama show llama3.1:8b
# ...
# Capabilities    completion tools

ollama show qwen2.5-coder:7b
# ...
# Capabilities    completion insert tools
```

Both models declare tools. Yet only llama3.1:8b populates the structured tool_calls field on our demo. The badge does not guarantee that the model’s emitted format matches what Ollama’s chat template for that family expects on every kind of prompt.
Why llama3.1:8b works cleanly and qwen2.5-coder:7b does not
This is the explicit answer to the question that opens this section.
| Aspect | llama3.1:8b | qwen2.5-coder:7b |
|---|---|---|
| Base family | Meta’s Llama 3.1, trained directly on tool-calling examples | Alibaba’s Qwen 2.5, then fine-tuned for code generation |
| Special-token format learned | Llama 3.1 tool-call tokens — matches Ollama’s Llama chat template byte-for-byte | Qwen 2.5 tool-call tokens — but drifted by the code-specialised fine-tune, so the format the model now emits is not the one Ollama’s Qwen chat template expects |
| What the model emits on our demo prompt | The Llama 3.1 special tokens with valid JSON inside | A JSON-shaped string inside the regular message.content, sometimes with embedded Java strings that break json.loads |
| What Ollama’s chat-template parser does with it | Lifts it into response.message.tool_calls | Leaves it in response.message.content — the parser does not recognise it |
| What our Python code sees | response.message.tool_calls = [<4 structured calls>], response.message.content = "" | response.message.tool_calls = [], response.message.content = "{...JSON...}" |
| What our code does about it | Iterates tool_calls directly | Calls parse_pseudo_tool_calls() on content to extract the calls anyway |
In one sentence: qwen2.5-coder:7b was retrained to be excellent at writing code, and that retraining slightly drifted the tool-call output away from the format Ollama’s Qwen chat template expects. Our fallback parser exists precisely to bridge that gap — see the journal of attempts in the demo README, attempts 5 to 7.
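To give an idea of what such a fallback can look like, here is a minimal sketch that scans message.content for JSON objects shaped like tool calls. It is not the repository's parse_pseudo_tool_calls: the function name extract_calls_from_content is hypothetical, and the sketch only handles well-formed JSON, so it would not cope with the embedded Java strings that break json.loads.

```python
import json


def extract_calls_from_content(content: str) -> list[tuple[str, dict]]:
    """Find JSON objects of the form {"name": ..., "arguments": {...}} in text."""
    decoder = json.JSONDecoder()
    calls: list[tuple[str, dict]] = []
    index = 0
    while True:
        start = content.find("{", index)
        if start == -1:
            break
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index just past its end.
            obj, end = decoder.raw_decode(content, start)
        except json.JSONDecodeError:
            index = start + 1  # not valid JSON here, try the next brace
            continue
        index = end
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            args = obj.get("arguments", {})
            if isinstance(args, dict):
                calls.append((obj["name"], args))
    return calls


reply = 'I will compile now: {"name": "compile_java", "arguments": {}}'
print(extract_calls_from_content(reply))
```

The result has the same (name, args) shape the agent loop iterates over, which is what lets one loop serve both the structured channel and the fallback.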
Three short commands cover everything in this section.
```powershell
# 1. See what capabilities a model declares
ollama show llama3.1:8b
ollama show qwen2.5-coder:7b

# 2. See the exact chat template used for that model
ollama show --modelfile llama3.1:8b | Select-String -Pattern 'TEMPLATE','TOOL'

# 3. Hit the API directly and see what the structured field actually contains
curl.exe -s http://127.0.0.1:11434/api/chat -d '{
  "model": "qwen2.5-coder:7b",
  "stream": false,
  "messages": [{"role": "user", "content": "What time is it in Paris?"}],
  "tools": [{"type":"function","function":{"name":"get_time","description":"Get current time","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}]
}' | ConvertFrom-Json | Select-Object -ExpandProperty message
```

On qwen2.5-coder:7b you will typically see the JSON of the tool call inside the content field and an empty tool_calls array. On llama3.1:8b, the opposite: content is mostly empty and tool_calls holds the structured call.
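The third command can also be written in Python with only the standard library, which makes it easy to run the same probe against several models. This is a sketch under assumptions: build_payload, where_is_the_call and probe are hypothetical helper names, and the test for "the call ended up in content" is a crude substring check, good enough for this one prompt only.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://127.0.0.1:11434/api/chat"

GET_TIME_TOOL = {
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get current time",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_payload(model: str) -> dict:
    """Same request body as the curl example, for the given model."""
    return {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": "What time is it in Paris?"}],
        "tools": [GET_TIME_TOOL],
    }


def where_is_the_call(message: dict) -> str:
    """Classify a reply: 'structured', 'content' or 'none'."""
    if message.get("tool_calls"):
        return "structured"
    if '"get_time"' in (message.get("content") or ""):
        return "content"  # crude check: the tool name appears as JSON text
    return "none"


def probe(model: str) -> str:
    """Send the request to the local Ollama daemon and classify the reply."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(build_payload(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as reply:
        message = json.load(reply)["message"]
    return where_is_the_call(message)
```

With the daemon running and both models pulled, calling probe("llama3.1:8b") and probe("qwen2.5-coder:7b") gives a one-word summary of where each model's tool call landed.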
The model brings the trained capability to emit tool calls in a specific format. Ollama brings the chat template, the schema generation, and the extraction of structured tool_calls from the model’s raw output. We bring the Python functions, the agent loop, and (when the model’s format drifts from what Ollama’s parser expects) a fallback parser to bridge the gap.
A natural follow-up question: why write the loop ourselves when frameworks exist for that? The answer has two parts — a clarification about what is open source and what is commercial, then a pedagogical trade-off.
| Tool / library | License | Cost of the library itself | Paid services around it? |
|---|---|---|---|
| Ollama (local runtime) | MIT — open source | Free | None |
| The course demo loop (agent_java.py) | MIT — open source | Free | None |
| LangChain | MIT — open source | Free | LangSmith (observability), LangGraph Platform (hosted deployment) — both optional |
| LangGraph | MIT — open source | Free | LangGraph Platform — optional |
| OpenWebUI | MIT — open source | Free | None |
| Continue (VS Code extension) | Apache 2.0 — open source | Free | None |
| OpenCode (CLI agent) | MIT — open source | Free | None |
| OpenAI function calling | Proprietary | Pay per token | The whole API is paid |
| Anthropic tool use (Claude) | Proprietary | Pay per token | The whole API is paid |
LangChain, LangGraph, OpenWebUI, Continue and OpenCode are all open source and free. What is commercial in that ecosystem are the hosted services around them (LangSmith for telemetry, LangGraph Platform for hosted agents, OpenAI’s and Anthropic’s cloud APIs) — not the libraries themselves. So the question is not “free vs paid”, it is “hand-coded vs framework”.
If we rewrote demo 3 in LangChain + LangGraph, the result would look roughly like this:
```python
from langchain_ollama import ChatOllama
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def write_file(path: str, content: str) -> str:
    """Create or overwrite a file in workspace/."""
    # ... same body as our write_file ...


@tool
def compile_java() -> str:
    """Compile every Java file in workspace/."""
    # ... same body as our compile_java ...


# list_files and read_file are wrapped with @tool in the same way (omitted here).

llm = ChatOllama(model="llama3.1:8b", num_ctx=20480)
agent = create_react_agent(
    llm,
    tools=[list_files, read_file, write_file, compile_java],
)
result = agent.invoke({"messages": [("user", DEFAULT_USER_PROMPT)]})
```

That is fewer lines in the calling code, but it pulls in two extra dependencies (langchain, langgraph), each with a tree of 30 to 50 transitive packages, and it hides the loop behind agent.invoke. You no longer see when the model is called, when tool_calls are parsed, when the results are fed back. You also lose the ability to plug in our fallback parser parse_pseudo_tool_calls cleanly: LangChain’s tool calling expects the structured channel and falls back to its own internal logic when the model misbehaves.
| Criterion | Hand-coded (our choice) | LangChain + LangGraph |
|---|---|---|
| Lines of code to read to understand the agent | ~30 | A framework’s worth of indirection |
| External dependencies | ollama only | langchain, langgraph, plus dozens of transitive |
| Visibility of the loop | Every step printed and explainable | Hidden behind agent.invoke |
| Handling quirky models (e.g. qwen2.5-coder:7b fallback parser) | Trivial — edit one helper | Requires diving into LangChain parser internals |
| Pedagogical value | High — students step through every line | Lower — students learn a framework, not the protocol |
| Production realism on large agents | Lower — you would add structure for big projects | Higher — battle-tested across many production agents |
| Cost to add a tool | Add a Python function to two lists | Add an @tool-decorated function |
The course’s goal is to understand what an agent is. A 30-line loop that prints what it does at every step is the right pedagogical artefact for that goal. Once a student has read those 30 lines and run them on their laptop, the leap to LangChain is small: they already know what the framework is doing under the hood. The reverse is not true — starting with LangChain often leaves students unable to explain what agent.invoke actually does.
In a real codebase, the right question is “is the marginal benefit of the framework greater than the cost of the dependency and the loss of control?”. Honest signals that you should switch:
For the workshop, none of those apply. A 30-line loop is the right tool. The course keeps it. Demo 4 (chapter 10) reuses exactly the same loop to drive three specialised agents in sequence, which proves the hand-coded approach scales further than people expect.
The loop lives in agent_java.py from line 486:
```python
for step in range(1, MAX_STEPS + 1):
    stats["turns"] = step
    ui.step_start(step, time.monotonic() - started)

    response = client.chat(
        model=MODEL_NAME,
        messages=messages,
        tools=tools,
        options={"num_ctx": 20480},
    )
    messages.append(response.message)

    if response.message.content:
        ui.model_message(response.message.content)

    calls = list(iter_tool_calls(response.message))
    if not calls:
        break

    for name, args in calls:
        stats["tool_calls"] += 1
        ui.tool_call(name, args)

        fn = available_functions.get(name)
        if fn is None:
            result = f"Unknown tool: {name}"
        else:
            try:
                result = fn(**args)
            except Exception as error:
                result = f"Error while running the tool: {error}"

        ui.tool_result(name, result)
        messages.append(
            {"role": "tool", "tool_name": name, "content": str(result)}
        )
```

Line by line:
- response = client.chat(...): we send the whole history to the model, plus the tool list. The SDK handles the JSON schema.
- messages.append(response.message): we append the model’s reply (including any tool_calls) to the history. That’s what gives the agent “memory”.
- calls = list(iter_tool_calls(...)): extract tool calls. iter_tool_calls is a helper that handles two cases: real tool_calls (Llama 3.1’s structured channel) and the pseudo tool_calls Qwen slips into message.content (fallback).
- if not calls: break: if the model didn’t call any tool, it’s done. We exit.
- fn(**args): we call the real Python function with the arguments the model gave us. This is where the bridge to the real world materialises.
- messages.append({"role": "tool", ...}): we hand the result back to the model so it can take it into account next turn.

MAX_STEPS = 10 is the safety net: in case of an infinite loop, we cut it off.
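The control flow described above can be exercised without Ollama by replacing the model with a scripted stand-in. Everything in this sketch is an assumption made for illustration: fake_chat, SCRIPT and the stub tools do not exist in agent_java.py, and the ui and stats bookkeeping is left out.

```python
from types import SimpleNamespace

MAX_STEPS = 10


def write_file(path: str, content: str) -> str:
    return f"File created or modified: {path}"  # stand-in, writes nothing


def compile_java() -> str:
    return "Compilation successful."  # stand-in, compiles nothing


available_functions = {"write_file": write_file, "compile_java": compile_java}

# Scripted replies standing in for the model: two tool turns, then plain text.
SCRIPT = [
    [("write_file", {"path": "Main.java", "content": "class Main {}"})],
    [("compile_java", {})],
    [],
]


def fake_chat(turn: int) -> SimpleNamespace:
    """Return the scripted reply for this turn (the last one repeats)."""
    calls = SCRIPT[min(turn, len(SCRIPT) - 1)]
    return SimpleNamespace(content="" if calls else "Done.", tool_calls=calls)


def run_agent() -> list[dict]:
    messages = [{"role": "user", "content": "Create and compile Main.java"}]
    for step in range(1, MAX_STEPS + 1):
        reply = fake_chat(step - 1)
        messages.append({"role": "assistant", "content": reply.content})
        if not reply.tool_calls:
            break  # no tool call: the agent stops
        for name, args in reply.tool_calls:
            fn = available_functions.get(name)
            try:
                result = fn(**args) if fn else f"Unknown tool: {name}"
            except Exception as error:
                result = f"Error while running the tool: {error}"
            messages.append(
                {"role": "tool", "tool_name": name, "content": str(result)}
            )
    return messages


for message in run_agent():
    print(message["role"], "->", message["content"])
```

The printed history alternates assistant and tool messages and ends on a plain assistant message, which is the stop condition of the real loop.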
The agent loop runs at most MAX_STEPS iterations before it is forced to stop. Each iteration is one round of model_reply → tool_calls → tool_results → next model_reply. The right value depends on the task complexity.
| MAX_STEPS | Typical behaviour on the “small Java console app” task | Risk profile |
|---|---|---|
| 2 | The model writes one file then is cut off before compiling and fixing errors. Demo fails. | Too tight — no recovery room. |
| 5 | Often enough for a one-class app. Two files + compile + fix = 4–5 turns. | Acceptable for very simple cases. |
| 10 (course default) | Comfortable for the canonical demo (1 main + 1 helper class + compile + fix once). | Sweet spot for an 8B model on small tasks. |
| 30 | Allows multi-class iteration, several compile/fix loops, exploration. | The model may start looping if the prompt is ambiguous (rereads the same file 5 times). |
| 100+ | Unbounded exploration. | A misaligned model can burn tokens running in circles for half an hour. Always pair with a time-out. |
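The last row of the table recommends pairing the step ceiling with a time-out. One way to do that is sketched below; run_with_budget, MAX_SECONDS and the one_turn callback are hypothetical names, and the 300-second budget is an arbitrary value chosen for the example.

```python
import time

MAX_STEPS = 10
MAX_SECONDS = 300.0  # arbitrary wall-clock budget for this sketch


def run_with_budget(one_turn, max_steps=MAX_STEPS, max_seconds=MAX_SECONDS,
                    clock=time.monotonic) -> str:
    """Call one_turn(step) until it returns False or a limit is reached.

    one_turn performs a single model round and returns True when the model
    asked for more tool calls. The result is "done", "max_steps" or "timeout".
    """
    deadline = clock() + max_seconds
    for step in range(1, max_steps + 1):
        if clock() >= deadline:
            return "timeout"  # wall-clock budget exhausted between turns
        if not one_turn(step):
            return "done"  # the model stopped calling tools
    return "max_steps"  # the step ceiling was hit


# A fake turn that keeps asking for tools for two rounds, then stops.
print(run_with_budget(lambda step: step < 3))
```

This check only runs between turns, so it cannot interrupt a single generation that is already in progress; covering that case would also require a timeout on the request to the model itself.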
```python
MAX_STEPS = 10

for step in range(1, MAX_STEPS + 1):
    response = client.chat(
        model=MODEL_NAME,
        messages=messages,
        tools=tools,
    )
    messages.append(response.message)  # keep the model's reply in the history
    calls = list(iter_tool_calls(response.message))
    if not calls:
        break  # no tool call: the model is done
    for call in calls:
        result = run_tool(call)
        messages.append({"role": "tool", "content": str(result)})
```

Two operational rules:
1. MAX_STEPS is a safety net, not a feature. A well-formed task should converge in 3–7 turns. If the agent regularly hits the ceiling, the system prompt or the toolset has a problem: raising the ceiling only postpones the failure.
2. Pair MAX_STEPS with a wall-clock timeout (e.g. 5 minutes total). A 10-step loop with a model stuck on a very long generation can still take 20 minutes without ever incrementing step.

The system prompt lives on lines 39-72 of agent_java.py. It does two things:
private, add the java.util.* imports, write a complete file every time (never ...), don’t create empty stubs.

Why so many rules? Because llama3.1:8b is an 8-billion-parameter model: good enough to follow clear instructions, not good enough to guess project conventions. Each added rule fixes a real bug observed during demo development.
That’s the topic of chapter 11.
Run the demo with screen sharing. Ask students to point at the terminal:
- The --- Step N --- line that starts, just before the first [tool].
- The [tool] write_file -> Foo.java line (what it asked for). The reality: the File created or modified: Foo.java (N lines) line (what your code did).
- The output of compile_java is appended to messages, the model sees the error on the next turn, and calls read_file and write_file to fix it.

Test the generated program:
```powershell
cd workspace
java Main
cd ..
```

You should see the product list and the total stock value.
Edit the DEFAULT_USER_PROMPT in agent_java.py (line ~75) to ask for something other than product management. Ideas:

- Account and Bank;
- FizzBuzz class (don’t dump everything into Main);
- Calculator with four static methods.

Empty the workspace first:
```powershell
Remove-Item workspace\*.java, workspace\*.class -Force
python agent_java.py
```

Watch: which tools get called? How many turns? If compilation fails, does the model fix it on its own?