
Tool calling: the bridge

Duration: 10 min
Prerequisites: chapter 02

Tool calling is a simple protocol: we tell the model “here is a list of functions you can call”, and instead of generating text, it emits a JSON object describing the call to make. Your Python code intercepts that JSON, runs the real function, and hands the result back. Then the loop repeats.


sequenceDiagram
  participant User as User
  participant Code as Python loop
  participant LLM as Ollama (LLM)
  participant Tool as Python tool

  User->>Code: user prompt
  Code->>LLM: messages + tool list
  LLM-->>Code: tool_calls = [{"name":"write_file"}]
  Code->>Tool: write_file(path="Main.java", content="...")
  Tool-->>Code: "File created or modified: Main.java"
  Code->>LLM: append result to messages
  LLM-->>Code: tool_calls = [{"name":"compile_java"}]
  Code->>Tool: javac via subprocess
  Tool-->>Code: "Compilation successful."
  Code->>LLM: append, re-ask
  LLM-->>Code: no tool_call, just text
  Code-->>User: final text

As long as the model returns tool_calls, we execute them and hand control back. When it stops emitting them (just a plain text message), we exit the loop.
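
The exchange in the diagram can be tried without Ollama at all. The sketch below is only an illustration: `fake_model`, `SCRIPTED_REPLIES` and the toy `add` tool are made up for this example and stand in for the real model and tools.

```python
from typing import Any, Callable

# Hypothetical stand-in for the model: two scripted replies.
SCRIPTED_REPLIES = [
    {"content": "", "tool_calls": [{"name": "add", "arguments": {"a": 2, "b": 3}}]},
    {"content": "2 + 3 = 5", "tool_calls": []},
]


def add(a: int, b: int) -> int:
    """Toy tool used only for this illustration."""
    return a + b


TOOLS: dict[str, Callable[..., Any]] = {"add": add}


def fake_model(messages: list[dict], turn: int) -> dict:
    """Return the scripted reply for this turn (a real model would read messages)."""
    return SCRIPTED_REPLIES[turn]


def run(prompt: str) -> tuple[str, list[dict]]:
    messages: list[dict] = [{"role": "user", "content": prompt}]
    for turn in range(len(SCRIPTED_REPLIES)):
        reply = fake_model(messages, turn)
        messages.append({"role": "assistant", **reply})
        if not reply["tool_calls"]:
            # Plain text, no tool call: exit the loop.
            return reply["content"], messages
        for call in reply["tool_calls"]:
            # Run the real function and hand the result back to the model.
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append(
                {"role": "tool", "tool_name": call["name"], "content": str(result)}
            )
    return "", messages


final_text, history = run("What is 2 + 3?")
print(final_text)
print([m["role"] for m in history])
```

The history ends up as user, assistant, tool, assistant: one tool round-trip followed by the final text.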


The key point: nothing is described manually. The ollama-python SDK automatically builds the JSON schema from the Python signature and the docstring. From ollama-demo-3-agent-java/agent_java.py:

def write_file(path: str, content: str = "") -> str:
    """Create or overwrite a file in the workspace folder.
    Args:
        path: Relative path of the file to create, for example Main.java.
        content: Full content to write to the file.
    """
    # ... implementation ...

The SDK reads:

  • the name: write_file;
  • the typed arguments: path: str, content: str = "" (so content is optional);
  • the description from the docstring;
  • the description of each arg from the Args: section.
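
To make those four bullets concrete, here is a hand-rolled sketch of the same idea using only the standard library. It is not the SDK's implementation: `sketch_schema` and `JSON_TYPES` are hypothetical names, the docstring parsing is deliberately naive, and the function body is a placeholder.

```python
import inspect
import re


def write_file(path: str, content: str = "") -> str:
    """Create or overwrite a file in the workspace folder.

    Args:
        path: Relative path of the file to create, for example Main.java.
        content: Full content to write to the file.
    """
    return ""  # placeholder body; the real implementation is not shown here


# Minimal mapping from Python annotations to JSON schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(fn) -> dict:
    """Build a tool description from a signature and a Google-style docstring."""
    doc = inspect.getdoc(fn) or ""
    summary, _, args_section = doc.partition("Args:")
    # Lines such as "path: Relative path ..." become {"path": "Relative path ..."}.
    arg_docs = dict(
        re.findall(r"^[ \t]*(\w+):[ \t]*(.+)$", args_section, flags=re.MULTILINE)
    )
    properties: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {
            "type": JSON_TYPES.get(param.annotation, "string"),
            "description": arg_docs.get(name, ""),
        }
        # A parameter without a default value is mandatory.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": summary.strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


schema = sketch_schema(write_file)
print(schema["function"]["parameters"]["required"])
```

Because `content` has a default value, only `path` lands in `required`.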

It then sends Ollama a JSON schema that the model uses to format its answer. That’s why in ollama-demo-3-agent-java/agent_java.py you just see:

tools = [list_files, read_file, write_file, compile_java]
response = client.chat(
    model=MODEL_NAME,
    messages=messages,
    tools=tools,
    options={"num_ctx": 20480},
)

Four ordinary Python functions, passed as-is. No framework, no implicit decorator.

Concrete example: the full chain Python signature → JSON schema → tool_calls → invocation

For the write_file function above, here is what happens on the wire when the user asks “Create Main.java that prints Hello world.”

1. What the SDK generates from the Python signature and the docstring

{
  "type": "function",
  "function": {
    "name": "write_file",
    "description": "Create or overwrite a file in the workspace folder.",
    "parameters": {
      "type": "object",
      "properties": {
        "path": {"type": "string", "description": "Relative path of the file to create, for example Main.java."},
        "content": {"type": "string", "description": "Full content to write to the file."}
      },
      "required": ["path"]
    }
  }
}

2. What the model returns in response.message.tool_calls

{
  "role": "assistant",
  "content": "",
  "tool_calls": [
    {
      "function": {
        "name": "write_file",
        "arguments": {
          "path": "Main.java",
          "content": "public class Main {\n public static void main(String[] args) {\n System.out.println(\"Hello world\");\n }\n}\n"
        }
      }
    }
  ]
}

3. What the Python loop does with it

name = "write_file"
args = {"path": "Main.java", "content": "public class Main { ... }"}
result = write_file(**args)
# -> the file Main.java is actually created on disk;
# -> result is the string "OK: 96 bytes written to Main.java".

4. What gets appended to messages for the next turn

{"role": "tool", "tool_name": "write_file", "content": "OK: 96 bytes written to Main.java"}

The model sees this confirmation on its next turn, knows the action succeeded, and can now decide to compile, read the file again, or stop. The whole “agent” effect comes from this four-step ping-pong, repeated until the model emits a plain text answer with no tool_calls.
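
Steps 2 to 4 can be replayed offline on a hard-coded assistant message. This is a sketch under assumptions: the `write_file` body and its return string are hypothetical (the real one lives in agent_java.py and is not shown here), and a throwaway temporary folder stands in for the workspace.

```python
import tempfile
from pathlib import Path

# Throwaway folder standing in for the workspace (assumption for this sketch).
WORKSPACE = Path(tempfile.mkdtemp())


def write_file(path: str, content: str = "") -> str:
    """Hypothetical implementation: write content under WORKSPACE."""
    target = WORKSPACE / path
    target.parent.mkdir(parents=True, exist_ok=True)
    written = target.write_text(content, encoding="utf-8")
    return f"OK: {written} characters written to {path}"


# Step 2: what the model returned (hard-coded here instead of calling Ollama).
assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "function": {
                "name": "write_file",
                "arguments": {
                    "path": "Main.java",
                    "content": "public class Main {}\n",
                },
            }
        }
    ],
}

# Step 3: look up the function by name and call it with the model's arguments.
available_functions = {"write_file": write_file}
call = assistant_message["tool_calls"][0]["function"]
result = available_functions[call["name"]](**call["arguments"])

# Step 4: the message appended to the history for the next turn.
tool_message = {"role": "tool", "tool_name": call["name"], "content": result}
print(tool_message)
```

The only thing the model ever sees of the real world is the `content` string of that last message, which is why tools should return short, explicit status text.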


Here is the skeleton of the loop in ollama-demo-3-agent-java/agent_java.py (around line 486):

for step in range(1, MAX_STEPS + 1):
    response = client.chat(
        model=MODEL_NAME,
        messages=messages,
        tools=tools,
        options={"num_ctx": 20480},
    )
    messages.append(response.message)
    calls = list(iter_tool_calls(response.message))
    if not calls:
        break  # The model is done, nothing left to do
    for name, args in calls:
        fn = available_functions.get(name)
        if fn is None:
            result = f"Unknown tool: {name}"
        else:
            try:
                result = fn(**args)
            except Exception as error:
                result = f"Error while running the tool: {error}"
        messages.append(
            {"role": "tool", "tool_name": name, "content": str(result)}
        )

Read it three times. The whole agent is there. Everything else — Streamlit UI, per-project isolation, collaborating agents — is decoration around this loop.
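
The skeleton relies on two names it does not define, `iter_tool_calls` and `available_functions`. Their real definitions are in agent_java.py and are not reproduced here, so the following is one plausible shape, assuming the message exposes `tool_calls` entries with `function.name` and `function.arguments` as ollama-python response objects do. The two stub tools are placeholders.

```python
from types import SimpleNamespace
from typing import Any, Iterator


def iter_tool_calls(message: Any) -> Iterator[tuple[str, dict]]:
    """Yield (name, arguments) pairs; tolerate a missing or None tool_calls."""
    for call in getattr(message, "tool_calls", None) or []:
        yield call.function.name, dict(call.function.arguments or {})


def list_files() -> str:
    """Placeholder tool for this sketch."""
    return "Main.java"


def read_file(path: str) -> str:
    """Placeholder tool for this sketch."""
    return f"(content of {path})"


# Map each tool name to its function so the loop can dispatch by name.
tools = [list_files, read_file]
available_functions = {fn.__name__: fn for fn in tools}

# A stand-in for response.message, built by hand instead of calling the model.
message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(name="read_file", arguments={"path": "Main.java"})
        )
    ]
)

calls = list(iter_tool_calls(message))
results = [available_functions[name](**args) for name, args in calls]
print(calls, results)
```

Returning an empty sequence when `tool_calls` is absent or `None` is what lets the `if not calls: break` test double as the stop condition.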

Each iteration produces:

  1. a question to the model with the full history;
  2. zero, one or several tool calls;
  3. the real execution of each tool;
  4. the appending of each result to messages so the model sees it on the next turn.

MAX_STEPS = 10 prevents infinite loops: if the model isn’t done in 10 turns, we cut it off.
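
The excerpt does not show what the script reports when the limit is hit. As a sketch only, here is one way to tell "the model finished" apart from "we cut it off"; `run_until_done` and `step_fn` are hypothetical names, where `step_fn` stands for one turn of the loop and returns True once the model answers without tool calls.

```python
from typing import Callable

MAX_STEPS = 10


def run_until_done(step_fn: Callable[[int], bool], max_steps: int = MAX_STEPS) -> str:
    """Run step_fn for steps 1..max_steps and report how the loop ended."""
    for step in range(1, max_steps + 1):
        if step_fn(step):
            return f"done after {step} step(s)"
    # The range was exhausted without a final answer: the cap kicked in.
    return f"stopped: no final answer after {max_steps} steps"


finished = run_until_done(lambda step: step == 3)
cut_off = run_until_done(lambda step: False)
print(finished)
print(cut_off)
```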


The model thinks. The tools act. The compiler verifies. The human validates.

If you can point each line at the loop code, you know what an agent is:

  • “the model thinks” → the call to client.chat(...);
  • “the tools act” → the for name, args in calls: result = fn(**args) loop;
  • “the compiler verifies” → the compile_java() tool that runs subprocess.run(["javac", ...]);
  • “the human validates” → that’s you watching the output and deciding whether to rerun.
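
For the “compiler verifies” line, a tool of this kind can be sketched as follows. This is not the code from agent_java.py: the parameters, the timeout and the failure messages are assumptions, and only the "Compilation successful." string comes from the diagram earlier in this chapter.

```python
import subprocess
import tempfile
from pathlib import Path


def compile_java(path: str = "Main.java", workdir: str = ".") -> str:
    """Run javac on one file and return a short status string for the model."""
    if not Path(workdir).is_dir():
        return f"Compilation failed: folder not found: {workdir}"
    try:
        proc = subprocess.run(
            ["javac", path],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=60,
        )
    except FileNotFoundError:
        return "Compilation failed: javac was not found on PATH."
    except subprocess.TimeoutExpired:
        return "Compilation failed: javac timed out."
    if proc.returncode == 0:
        return "Compilation successful."
    # Hand the compiler's diagnostics back so the model can fix the code.
    return "Compilation failed:\n" + (proc.stderr or proc.stdout)


# Demo: a deliberately broken source file in a throwaway folder.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "Broken.java").write_text("public class Broken {", encoding="utf-8")
outcome = compile_java("Broken.java", demo_dir)
print(outcome.splitlines()[0])
```

Returning errors as text rather than raising matters here: the compiler output becomes the next tool message, and that is what lets the model correct its own code on the following turn.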

Tool calling is not a feature of the raw LLM. It’s a learned behaviour that comes from fine-tuning. Meta fine-tuned Llama 3.1 on a large number of examples shaped roughly like this (schematic: the exact tags and tokens differ between model families):

System: You have access to these tools: [...]
User: Compile my code.
Assistant: <tool_call>{"name":"compile_java"}</tool_call>

If you take a model that did not receive this fine-tuning (e.g. a raw Llama 2), it will not reliably generate these structured tool_calls. That’s the rabbit hole we go down in chapter 05b.


  • Tool calling = a protocol where the LLM outputs JSON, your code runs the real function.
  • ollama-python builds the schema automatically from your Python signatures + docstrings. No boilerplate.
  • The agent loop is ~30 lines: call the model → run the tools → call the model again → … → stop when there are no more tool_calls.
  • Not every model can do this: you need a model fine-tuned for tool calling (chapter 05).
  • The model thinks, the tools act, the compiler verifies, the human validates.