Demo 4 — three collaborating agents

Duration: 20 min

Prerequisites: chapter 09 (you’ve understood the simple agent loop)

Source code

Repo: gneuroneai/ollama-demo-4-trio-agents-java — Streamlit UI + 3 agents + JUnit 5, ~620 lines in app.py.

git clone https://github.com/gneuroneai/ollama-demo-4-trio-agents-java.git
cd ollama-demo-4-trio-agents-java
.\start.ps1

What this demo is about

This project reuses the agent loop of demo 3 three times, with three different system prompts, to form a small multi-agent application served by a Streamlit interface. The first agent (Generate) writes a Java program from a natural-language instruction; the second (Verify) re-reads the produced files and reports anomalies; the third (Tests) generates JUnit 5 test classes and runs them against the program. Each project lives in its own isolated folder, which the three agents share; this allows side-by-side comparison of projects generated from different prompts and configurations. The system prompts of the three agents are exposed in editable expanders, so that the effect of a prompt change on the agent’s behaviour can be observed at runtime without modifying any Python code.

Key idea

This repository takes the same loop as demo 3, reuses it three times with different system prompts (Generate, Verify, Tests), wraps it in a Streamlit interface with one tab per agent, and isolates each project in its own folder so they can be compared.

Launch

cd ollama-demo-4-trio-agents-java
.\start.ps1

The browser opens at http://localhost:8502. You see:

a sidebar with 8 ready-to-use Java prompts (banking, tictactoe, library, calculator, gradebook, …) + a custom slot;
three tabs: Generate Java, Verify (review & fix), Unit tests (JUnit 5).

Why three agents instead of one big one

This is the central pedagogical theme of demo 4. We could have stuffed it all into one agent with a huge system prompt: “generate Java, then verify, then write JUnit tests”. Bad idea:

the system prompt becomes unreadable (3 or 4 KB);
the 8B model loses track in long prompts with multiple contexts;
you cannot rerun just the test phase if it failed;
you cannot cleanly see which phase did what.

The solution: one agent per responsibility, as you would do with classes in an OO project.

Agent	System prompt	Tools	Role
Generate	`SYSTEM_PROMPT`	`list_files`, `read_file`, `write_file`, `compile_java`	Creates the project from scratch. Resets the workspace.
Verify	`VERIFY_SYSTEM_PROMPT`	same 4 tools	Reads, compiles, runs, fixes. Does NOT reset.
Tests	`TEST_SYSTEM_PROMPT`	same 4 tools + `compile_with_tests`, `run_tests`	Writes `XTest.java` for each non-Main class, compiles with JUnit, runs tests.
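
One way to read the table: each agent is a small record (a system prompt plus a tool list) fed to the same loop. The sketch below is illustrative only. `AgentSpec`, `AGENTS` and `allowed` are hypothetical names that do not come from the repository, the prompt strings are placeholders, and the assumption that the Tests agent does not reset the workspace is ours (it needs the generated classes to exist).

```python
from __future__ import annotations

from dataclasses import dataclass

# Tool names come from the table; grouping them into constants is illustrative.
BASE_TOOLS = ("list_files", "read_file", "write_file", "compile_java")
TEST_TOOLS = BASE_TOOLS + ("compile_with_tests", "run_tests")


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical record: what distinguishes one agent from another."""
    name: str
    system_prompt: str            # placeholder text, not the real prompt
    tools: tuple[str, ...]
    resets_workspace: bool


AGENTS = {
    "generate": AgentSpec("Generate", "<SYSTEM_PROMPT text>", BASE_TOOLS, True),
    "verify": AgentSpec("Verify", "<VERIFY_SYSTEM_PROMPT text>", BASE_TOOLS, False),
    # Assumption: Tests keeps the workspace, since it tests existing classes.
    "tests": AgentSpec("Tests", "<TEST_SYSTEM_PROMPT text>", TEST_TOOLS, False),
}


def allowed(agent_key: str, tool_name: str) -> bool:
    """Return True if the given agent may call the given tool."""
    return tool_name in AGENTS[agent_key].tools
```

With this shape, adding a fourth agent would mean adding one entry to `AGENTS`, not writing a new loop.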

Each agent has its own build_*_bootstrap() that computes a user message from the current workspace (class list, source code, compile status). Nothing hidden: just file reading + a text format.

Architecture in a diagram

flowchart TB
  Sidebar["Sidebar: pick a prompt"]
  State[("st.session_state.project_id")]
  Sidebar --> State
  State --> Tab1["Tab 1: Generate"]
  State --> Tab2["Tab 2: Verify"]
  State --> Tab3["Tab 3: Tests"]
  Tab1 --> Loop1["run_agent SYSTEM_PROMPT"]
  Tab2 --> Loop2["run_verifier VERIFY_SYSTEM_PROMPT"]
  Tab3 --> Loop3["run_test_generator TEST_SYSTEM_PROMPT"]
  Loop1 --> Folder[("workspace/&lt;project_id&gt;/")]
  Loop2 --> Folder
  Loop3 --> Folder
  Folder --> Log[("workspace/&lt;project_id&gt;/run.log")]

Three agents share one per-project workspace folder, which acts as a communication bus.

The three agents share the same workspace/<project_id>/ folder. When the Generate agent writes Account.java, the Verify agent sees it, the Tests agent sees it. The filesystem is the communication bus.
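
The "filesystem as bus" idea can be sketched in a few lines. The functions below are stand-ins, not the repository's agents: `generate_step` plays the role of the Generate agent writing a file, and `visible_java_files` shows what a later agent would find in the same folder. A temporary directory is used so the sketch leaves nothing behind.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def generate_step(project_dir: Path) -> None:
    """Stand-in for the Generate agent: writes a Java file to the shared folder."""
    project_dir.mkdir(parents=True, exist_ok=True)
    (project_dir / "Account.java").write_text(
        "public class Account { public int balance; }\n", encoding="utf-8"
    )


def visible_java_files(project_dir: Path) -> list[str]:
    """Stand-in for what Verify or Tests would see: the .java files on disk."""
    return sorted(p.name for p in project_dir.glob("*.java"))


def demo() -> list[str]:
    """Write with one 'agent', then read with another, via the same folder."""
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp) / "workspace" / "banking"
        generate_step(project_dir)
        return visible_java_files(project_dir)
```

No message passing is involved: the second step learns about `Account.java` only because it lists the same directory.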

Tab 1 — Generate Java

The simplest tab. It’s demo 3 wrapped in a Streamlit UI.

At the top: the user prompt (editable in a text_area).
Below the prompt (since the pedagogical update): a System prompt expander that shows SYSTEM_PROMPT and lets you edit it.
A Run agent button.
During execution: an st.status that shows each step live (tool called, result).
At the end: 5 metrics (turns, tool calls, files written, LOC, wall time), compile status, output of java Main.

Note: if the workspace already contains code (you’ve run the demo once for this project), a yellow banner warns you: “running the agent will overwrite”.

Tab 2 — Verify (review & fix)

This agent creates nothing: it reads what the Generate agent produced, checks that it compiles and that java Main runs without crashing, scans for private fields and replaces them with public (project convention), and fixes only what’s broken.

The verifier bootstrap (computed by build_verifier_bootstrap()) injects into the user message:

the list of Java files present;
the result of an initial compile_java;
the return of an initial java Main (rc=0 or not);
the list of files containing private.

So the model sees, before its first tool_call, exactly where the project stands. This is what we call a rich bootstrap: we don’t let the model figure out the context alone, we put it on the table.
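
A rich bootstrap of this kind can be approximated as follows. This is a sketch, not the repository's `build_verifier_bootstrap()`: the function name `sketch_verifier_bootstrap`, the message wording and the decision to pass the compile and run results as parameters are all assumptions made so the example runs without a JDK.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def sketch_verifier_bootstrap(workspace: Path, compile_ok: bool, run_rc: int) -> str:
    """Build a plain-text user message describing the current project state.

    compile_ok and run_rc are passed in here; in the real app they would come
    from an initial compile and an initial `java Main` run.
    """
    java_files = sorted(workspace.glob("*.java"))
    # Naive substring check: it also matches 'private' in comments or strings.
    with_private = [
        p.name for p in java_files if "private" in p.read_text(encoding="utf-8")
    ]
    lines = [
        "Java files present: " + (", ".join(p.name for p in java_files) or "none"),
        "Initial compile: " + ("OK" if compile_ok else "FAILED"),
        f"Initial `java Main` return code: {run_rc}",
        "Files containing 'private': " + (", ".join(with_private) or "none"),
    ]
    return "\n".join(lines)


def demo_message() -> str:
    """Build the message for a tiny throwaway project."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        (ws / "Account.java").write_text(
            "public class Account { private int balance; }\n", encoding="utf-8"
        )
        (ws / "Main.java").write_text("public class Main { }\n", encoding="utf-8")
        return sketch_verifier_bootstrap(ws, compile_ok=True, run_rc=0)
```

The model receives these four facts before its first tool call, so it does not need to spend turns on `list_files` or `compile_java` just to orient itself.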

The UI now has an expander “User message that will be sent (auto-generated from detected classes — read-only)” with a Compute preview button that shows you exactly what the agent will see. That’s the pedagogical answer to “how does the agent know which classes to verify?” — it knows because we wrote it in the user message, in nearly-natural English produced by 30 lines of Python.

Tab 3 — Unit tests (JUnit 5)

The most demanding agent. It:

lists non-Main classes (Account.java, Bank.java, Transaction.java);
inlines the source code of each class in the user message, so the model sees the exact constructor signatures and the names of public fields;
asks the model to write AccountTest.java, BankTest.java, TransactionTest.java;
compiles everything with junit-platform-console-standalone-1.10.2.jar on the classpath;
runs the tests via the JUnit ConsoleLauncher;
parses the output to extract tests found / passed / failed.
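
The last step, extracting counts from the launcher output, might look like the sketch below. It assumes the usual text summary printed by the JUnit ConsoleLauncher (bracketed lines such as `[ 3 tests found ]`); the function name `parse_junit_summary` and the sample output are illustrative, and the repository's parser may work differently.

```python
from __future__ import annotations

import re

# Matches summary lines such as "[         3 tests successful      ]".
# Lines about containers, or about skipped/started/aborted tests, are ignored.
_SUMMARY_LINE = re.compile(r"\[\s*(\d+)\s+tests\s+(found|successful|failed)\s*\]")

# Map the launcher's wording to the three metrics named in the list above.
_KEY_FOR = {"found": "found", "successful": "passed", "failed": "failed"}


def parse_junit_summary(output: str) -> dict[str, int]:
    """Return tests found / passed / failed, defaulting to 0 when a line is absent."""
    counts = {"found": 0, "passed": 0, "failed": 0}
    for number, label in _SUMMARY_LINE.findall(output):
        counts[_KEY_FOR[label]] = int(number)
    return counts


# Illustrative sample, written by hand in the launcher's summary style.
SAMPLE = """\
[         4 containers found      ]
[         3 tests found           ]
[         0 tests skipped         ]
[         3 tests started         ]
[         0 tests aborted         ]
[         2 tests successful      ]
[         1 tests failed          ]
"""
```

Calling `parse_junit_summary(SAMPLE)` yields 3 found, 2 passed and 1 failed. Defaulting to zero means a run that crashed before printing a summary shows up as "0 tests found" instead of raising.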

The bootstrap (computed by build_test_bootstrap()) includes the full Java sources of the classes to test. That’s where you see why num_ctx=20480 matters: on a banking project, that user message can be 4-5 KB.
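
To make the size argument concrete, here is a sketch of a bootstrap that inlines sources. It is not the repository's `build_test_bootstrap()`: the name `sketch_test_bootstrap`, the separator format and the rule for skipping existing `*Test.java` files are assumptions.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def sketch_test_bootstrap(workspace: Path) -> str:
    """Inline every non-Main, non-test class so the model sees exact signatures."""
    parts = ["Write one JUnit 5 test class per class below."]
    for path in sorted(workspace.glob("*.java")):
        # Skip the entry point and any test classes left over from a previous run.
        if path.stem == "Main" or path.stem.endswith("Test"):
            continue
        source = path.read_text(encoding="utf-8")
        parts.append(f"--- {path.name} (write {path.stem}Test.java) ---\n{source}")
    return "\n\n".join(parts)


def demo_test_message() -> str:
    """Build the message for a tiny throwaway project."""
    with tempfile.TemporaryDirectory() as tmp:
        ws = Path(tmp)
        files = {
            "Account.java": "public class Account { public int balance; }\n",
            "Main.java": "public class Main { }\n",
            "AccountTest.java": "class AccountTest { }\n",
        }
        for name, source in files.items():
            (ws / name).write_text(source, encoding="utf-8")
        return sketch_test_bootstrap(ws)
```

Measuring `len(message.encode("utf-8"))` on a real project gives the message size in bytes, which is the quantity to compare against the context window when choosing `num_ctx`.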

The UI also has an editable system prompt + a user-message preview (Java sources included). You can inspect it before launching.

Per-project isolation

Before isolation, running tictactoe after banking overwrote banking. Now each prompt has its own subfolder:

workspace/
  banking/
    Account.java
    Bank.java
    Main.java
    Transaction.java
    AccountTest.java
    BankTest.java
    TransactionTest.java
    run.log              <-- persistent append-only per-run log
  tictactoe/
    Board.java
    RandomAI.java
    Main.java
    BoardTest.java
    RandomAITest.java
    run.log
  custom/
    ...

Switching prompt in the sidebar = switching active folder. Other projects stay untouched. Each project’s run.log can be read in the Previous run logs expander of each tab, so you can re-read what happened between two sessions.
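
An append-only per-project log needs very little code. The sketch below is an assumption about how such a log could be written: `append_run_log` and the timestamp format are hypothetical, and only the file name `run.log` and its location come from the tree above.

```python
from __future__ import annotations

import tempfile
from datetime import datetime
from pathlib import Path


def append_run_log(project_dir: Path, message: str) -> None:
    """Append one timestamped line to <project_dir>/run.log, never truncating it."""
    project_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    # Mode "a" creates the file if needed and always writes at the end.
    with (project_dir / "run.log").open("a", encoding="utf-8") as log:
        log.write(f"[{stamp}] {message}\n")


def demo_log_lines() -> list[str]:
    """Append two entries and read the log back."""
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = Path(tmp) / "workspace" / "banking"
        append_run_log(project_dir, "generate: 4 files written")
        append_run_log(project_dir, "verify: compile OK")
        return (project_dir / "run.log").read_text(encoding="utf-8").splitlines()
```

Because the file is opened in append mode, entries from earlier sessions survive later runs, which is what makes the log readable across sessions.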

The implementation fits in a set_project(project_id) function in agent.py that reassigns the global WORKSPACE. All tool logic reads WORKSPACE at call time, so the switch is transparent.

How it’s wired on the Streamlit side

The app lives in app.py (~620 lines). Structure:

import agent
from agent import (
    AgentUI,  # assumption: the base class used below is exported by agent.py
    SYSTEM_PROMPT, VERIFY_SYSTEM_PROMPT, TEST_SYSTEM_PROMPT,
    build_test_bootstrap, build_verifier_bootstrap,
    run_agent, run_verifier, run_test_generator,
    set_project,
)

class StreamlitUI(AgentUI):
    """Adapts the engine's AgentUI to Streamlit components (st.status, etc.)."""

# 3 tabs: tab_gen, tab_verify, tab_tests
with tab_gen:
    sys_p = system_prompt_editor(SYSTEM_PROMPT, "gen_sys_prompt", "System prompt")
    if run_clicked:
        stats = run_agent(entry, ui=ui, project_id=project_id, system_prompt=sys_p)

with tab_verify:
    sys_p = system_prompt_editor(VERIFY_SYSTEM_PROMPT, "verify_sys_prompt", "System prompt")
    # ... optional preview ...
    if run_verify_clicked:
        vstats = run_verifier(main_class_v, ui=ui, project_id=project_id, system_prompt=sys_p)

with tab_tests:
    sys_p = system_prompt_editor(TEST_SYSTEM_PROMPT, "tests_sys_prompt", "System prompt")
    # ... preview ...
    if run_tests_clicked:
        tstats = run_test_generator(main_class_t, ui=ui, project_id=project_id, system_prompt=sys_p)

The Streamlit UI is not the agent logic. It’s just the adapter between the loops in agent.py and the visual components (st.status, st.metric, st.code). If you swap Streamlit for Flask or Tkinter tomorrow, the agent loop doesn’t change.
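
The adapter idea can be illustrated without Streamlit. In the sketch below, `LoopUI`, `ListUI` and `run_fake_loop` are hypothetical names; the real `AgentUI` interface is not shown in this chapter, so a single `step` method is assumed here.

```python
from __future__ import annotations

from typing import Protocol


class LoopUI(Protocol):
    """Hypothetical minimal interface the loop talks to."""

    def step(self, label: str) -> None: ...


class ListUI:
    """A UI that just records steps; a Streamlit adapter would update st.status instead."""

    def __init__(self) -> None:
        self.steps: list[str] = []

    def step(self, label: str) -> None:
        self.steps.append(label)


def run_fake_loop(tool_calls: list[str], ui: LoopUI) -> int:
    """Stand-in for the agent loop: reports each tool call to whatever UI it is given."""
    for name in tool_calls:
        ui.step(f"tool called: {name}")
    return len(tool_calls)
```

`run_fake_loop` never imports a UI library; it only calls `ui.step(...)`. Swapping the front end means writing another class with the same method, which is also a convenient way to test the loop headlessly.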

What you should be able to explain to a teacher

“Why 3 agents instead of 1?” → separation of responsibilities, shorter prompts, ability to rerun one phase, per-tab visualisation.
“Where do the classes to test come from in tab 3?” → from WORKSPACE.glob("*.java"), computed in Python, injected in the user message via build_test_bootstrap().
“Why can the user edit the prompts?” → for pedagogy. See chapter 11, which shows the live demo you can do in front of an audience.

Key takeaways

3 agents = 3 responsibilities. Same loop, different system prompts, possibly different tools.
The filesystem (workspace/<project_id>/) serves as the communication bus between agents.
build_*_bootstrap() computes the user message to send in Python; that’s where we detect classes automatically.
The Streamlit UI is a visual adapter; the logic stays in the agent loop in agent.py, the same loop as in demo 3.
Per-project isolation lets you chain demos without breaking anything.