Source code
Repo: gneuroneai/ollama-demo-4-trio-agents-java — Streamlit UI + 3 agents + JUnit 5, ~620 lines in app.py.
```powershell
git clone https://github.com/gneuroneai/ollama-demo-4-trio-agents-java.git
cd ollama-demo-4-trio-agents-java
.\start.ps1
```

Duration: 20 min. Prerequisites: chapter 09 (you’ve understood the simple agent loop).
This project reuses the agent loop of demo 3 three times, with three different system prompts, to form a small multi-agent application served by a Streamlit interface with one tab per agent. The first agent (Generate) writes a Java program from a natural-language instruction; the second (Verify) re-reads the produced files, checks them, and fixes what is broken; the third (Tests) generates JUnit 5 test classes and runs them against the program. Each project (one per prompt) lives in its own isolated folder, so projects can be compared side by side. The system prompts of the three agents are exposed in editable expanders, so the effect of a prompt change on an agent’s behaviour can be observed at runtime without modifying any Python code.
```powershell
cd ollama-demo-4-trio-agents-java
.\start.ps1
```

The browser opens at http://localhost:8502. You see:
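- a sidebar to pick a prompt (banking, tictactoe, library, calculator, gradebook, …) plus a custom slot;
- three tabs, one per agent: Generate, Verify, Tests.

Splitting the work across three agents is the central pedagogical theme of demo 4. We could have stuffed it all into one agent with a huge system prompt: “generate Java, then verify, then write JUnit tests”. That would be a bad idea.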
The solution: one agent per responsibility, like you’d do with classes in an OO project.
| Agent | System prompt | Tools | Role |
|---|---|---|---|
| Generate | SYSTEM_PROMPT | list_files, read_file, write_file, compile_java | Creates the project from scratch. Resets the workspace. |
| Verify | VERIFY_SYSTEM_PROMPT | same 4 tools | Reads, compiles, runs, fixes. Does NOT reset. |
| Tests | TEST_SYSTEM_PROMPT | same 4 tools + compile_with_tests, run_tests | Writes XTest.java for each non-Main class X, compiles with JUnit, runs the tests. |
The Verify and Tests agents each have their own build_*_bootstrap() function (build_verifier_bootstrap(), build_test_bootstrap()) that computes a user message from the current workspace (class list, source code, compile status). Nothing hidden: just file reading and text formatting.
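The table can be read as three configurations of a single loop. The sketch below expresses it as data; AgentConfig and AGENTS are hypothetical names, not taken from the repo, and only Generate is documented as resetting the workspace, so the other two default to not resetting.

```python
from dataclasses import dataclass

# The four tools shared by every agent (names from the table).
BASE_TOOLS = ("list_files", "read_file", "write_file", "compile_java")


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical description of one agent: same loop, different settings."""
    name: str
    system_prompt_name: str          # which prompt constant the agent uses
    tools: tuple[str, ...]
    resets_workspace: bool = False   # only Generate is documented as resetting


AGENTS = {
    "generate": AgentConfig("Generate", "SYSTEM_PROMPT", BASE_TOOLS, resets_workspace=True),
    "verify": AgentConfig("Verify", "VERIFY_SYSTEM_PROMPT", BASE_TOOLS),
    "tests": AgentConfig(
        "Tests",
        "TEST_SYSTEM_PROMPT",
        BASE_TOOLS + ("compile_with_tests", "run_tests"),
    ),
}
```

Adding a fourth responsibility would then mean adding one entry and one system prompt, not rewriting the loop.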
```mermaid
flowchart TB
    Sidebar["Sidebar: pick a prompt"]
    State[("st.session_state.project_id")]
    Sidebar --> State
    State --> Tab1["Tab 1: Generate"]
    State --> Tab2["Tab 2: Verify"]
    State --> Tab3["Tab 3: Tests"]
    Tab1 --> Loop1["run_agent SYSTEM_PROMPT"]
    Tab2 --> Loop2["run_verifier VERIFY_SYSTEM_PROMPT"]
    Tab3 --> Loop3["run_test_generator TEST_SYSTEM_PROMPT"]
    Loop1 --> Folder[("workspace/<project_id>/")]
    Loop2 --> Folder
    Loop3 --> Folder
    Folder --> Log[("workspace/<project_id>/run.log")]
```
The three agents share the same workspace/<project_id>/ folder. When the Generate agent writes Account.java, both the Verify agent and the Tests agent see it. The filesystem is the communication bus.
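To make “the filesystem is the communication bus” concrete, here is a minimal sketch in which two stand-in functions (hypothetical, not the repo’s agents) never call each other and exchange work only through a project folder in a temporary directory.

```python
import tempfile
from pathlib import Path


def generate_step(workspace: Path) -> None:
    """Stand-in for the Generate agent: its only output is files in the folder."""
    workspace.mkdir(parents=True, exist_ok=True)
    (workspace / "Account.java").write_text("public class Account { public int balance; }\n")
    (workspace / "Main.java").write_text(
        "public class Main { public static void main(String[] args) { } }\n"
    )


def verify_step(workspace: Path) -> list[str]:
    """Stand-in for the Verify agent: it learns what exists by reading the folder."""
    return sorted(p.name for p in workspace.glob("*.java"))


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "workspace" / "banking"
    generate_step(project)        # writer: holds no reference to the reader
    seen = verify_step(project)   # reader: holds no reference to the writer

print(seen)  # ['Account.java', 'Main.java']
```

Because the agents are coupled only through files, any of them can be re-run, or its prompt changed, without touching the others.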
Tab 1 (Generate) is the simplest: it’s demo 3 wrapped in a Streamlit UI.
The tab shows:

- … text_area);
- an expander that displays SYSTEM_PROMPT and lets you edit it;
- an st.status block that shows each step live (tool called, result);
- … java Main.

Note: if the workspace already contains code (you’ve run the demo once for this project), a yellow banner warns you: “running the agent will overwrite”.
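A minimal sketch of the check behind that banner and of the reset Generate performs. It assumes (not confirmed by the repo) that “already contains code” means “has at least one .java file” and that the reset keeps run.log, since that log is described as persistent; workspace_has_code and reset_workspace are hypothetical names.

```python
import tempfile
from pathlib import Path


def workspace_has_code(workspace: Path) -> bool:
    """True if the project folder already holds Java sources (assumed criterion)."""
    return any(workspace.glob("*.java"))


def reset_workspace(workspace: Path, keep: tuple[str, ...] = ("run.log",)) -> None:
    """Delete top-level files before a fresh generation, keeping the persistent log."""
    workspace.mkdir(parents=True, exist_ok=True)
    for path in workspace.iterdir():
        if path.is_file() and path.name not in keep:
            path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "banking"
    ws.mkdir()
    (ws / "Account.java").write_text("public class Account {}\n")
    (ws / "run.log").write_text("previous run\n")
    before = workspace_has_code(ws)   # True: the UI would show the warning
    reset_workspace(ws)
    after = workspace_has_code(ws)    # False: sources removed
    remaining = sorted(p.name for p in ws.iterdir())
```

In a Streamlit tab, a True result could drive a call such as st.warning("running the agent will overwrite") before the Run button.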
The Verify agent (tab 2) creates nothing: it reads what the Generate agent produced, checks that it compiles and that java Main runs without crashing, scans for private fields and makes them public (a project convention), and fixes only what is broken.
The verifier bootstrap (computed by build_verifier_bootstrap()) injects into the user message:
- the result of compile_java;
- the result of java Main (rc=0 or not);
- the fields still declared private.

So before its first tool_call, the model already sees exactly where the project stands. This is what we call a rich bootstrap: instead of letting the model work out the context alone, we put it on the table.
The Verify tab also has an expander “User message that will be sent (auto-generated from detected classes — read-only)” with a Compute preview button that shows you exactly what the agent will see. That’s the pedagogical answer to “how does the agent know which classes to verify?”: it knows because we wrote it in the user message, in near-natural English produced by 30 lines of Python.
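A minimal sketch of a rich bootstrap in the spirit of build_verifier_bootstrap(). It assumes the compile and run return codes are already known (in the repo they come from compile_java and from running java Main); the function name build_verify_message, the message wording, and the private-field regex are illustrative, not the repo’s code.

```python
import re
import tempfile
from pathlib import Path

# Rough heuristic for field declarations such as "private int balance;"
# or "private final List<String> items = new ArrayList<>();" (methods don't match).
PRIVATE_FIELD = re.compile(
    r"^\s*private\s+(?:static\s+|final\s+)*[\w<>\[\], ]+?\s+(\w+)\s*[;=]", re.MULTILINE
)


def build_verify_message(workspace: Path, compile_rc: int, run_rc: int) -> str:
    """Describe the project state so the model does not have to discover it."""
    sources = sorted(workspace.glob("*.java"))
    lines = ["Project classes: " + ", ".join(p.stem for p in sources)]
    lines.append("compile_java: " + ("OK" if compile_rc == 0 else f"FAILED (rc={compile_rc})"))
    lines.append(f"java Main: rc={run_rc}")
    for path in sources:
        for field in PRIVATE_FIELD.findall(path.read_text(encoding="utf-8")):
            lines.append(f"private field to make public: {path.stem}.{field}")
    lines.append("Check the project and fix only what is broken.")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "Account.java").write_text(
        "public class Account {\n"
        "    private int balance;\n"
        "    public void deposit(int x) { balance += x; }\n"
        "}\n"
    )
    (ws / "Main.java").write_text(
        "public class Main {\n    public static void main(String[] args) { }\n}\n"
    )
    msg = build_verify_message(ws, compile_rc=0, run_rc=0)

print(msg)
```

The same string can back the read-only preview expander and the real run, which guarantees the preview shows what the agent actually receives.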
The Tests agent (tab 3) is the most demanding. It:
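- lists the non-Main classes (Account.java, Bank.java, Transaction.java);
- writes one test class per class: AccountTest.java, BankTest.java, TransactionTest.java;
- compiles them with junit-platform-console-standalone-1.10.2.jar on the classpath;
- runs the tests and reports tests found / passed / failed.

The bootstrap (computed by build_test_bootstrap()) includes the full Java sources of the classes to test. That’s where you see why num_ctx=20480 (Ollama’s context window, in tokens) matters: on a banking project, that user message alone can be 4-5 KB, and every subsequent tool call and tool result is appended to the same conversation.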
The UI also has an editable system prompt + a user-message preview (Java sources included). You can inspect it before launching.
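The Tests agent’s compile and run steps can be sketched as two command lines plus a parser for the counts. This assumes a JDK on the PATH, test classes in the default package, the jar in the working directory, and the JUnit Platform Console Launcher’s default summary format (lines such as “[ 3 tests successful ]”). junit_commands and parse_summary are hypothetical helpers; the repo’s compile_with_tests and run_tests tools may differ.

```python
import re
from pathlib import Path

JUNIT_JAR = "junit-platform-console-standalone-1.10.2.jar"


def junit_commands(workspace: Path, out_dir: str = "out") -> tuple[list[str], list[str]]:
    """Build the javac and test-run argument lists (to pass to subprocess.run)."""
    # subprocess does not expand "*.java", so the sources are listed explicitly.
    sources = sorted(str(p) for p in workspace.glob("*.java"))
    compile_cmd = ["javac", "-cp", JUNIT_JAR, "-d", out_dir, *sources]
    run_cmd = ["java", "-jar", JUNIT_JAR, "--class-path", out_dir, "--scan-class-path"]
    return compile_cmd, run_cmd


# Matches summary lines like "[         3 tests found           ]".
SUMMARY_LINE = re.compile(r"\[\s*(\d+) tests (found|successful|failed)\s*\]")


def parse_summary(output: str) -> dict[str, int]:
    """Extract the 'tests found / successful / failed' counts from launcher output."""
    return {kind: int(count) for count, kind in SUMMARY_LINE.findall(output)}


# Illustrative input in the launcher's summary format (not output from the repo).
sample = """
[         3 tests found           ]
[         0 tests skipped         ]
[         3 tests started         ]
[         0 tests aborted         ]
[         2 tests successful      ]
[         1 tests failed          ]
"""
print(parse_summary(sample))  # {'found': 3, 'successful': 2, 'failed': 1}
```

The launcher calls passing tests “successful”, which maps to the “passed” count shown in the UI.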
Before isolation, running tictactoe after banking overwrote banking. Now each prompt has its own subfolder:
```text
workspace/
  banking/
    Account.java
    Bank.java
    Main.java
    Transaction.java
    AccountTest.java
    BankTest.java
    TransactionTest.java
    run.log          <-- persistent append-only per-run log
  tictactoe/
    Board.java
    RandomAI.java
    Main.java
    BoardTest.java
    RandomAITest.java
    run.log
  custom/
    ...
```

Switching the prompt in the sidebar switches the active folder; the other projects stay untouched. Each project’s run.log is shown in the “Previous run logs” expander of each tab, so you can re-read what happened between two sessions.
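A minimal sketch of per-project folders with an append-only log. project_dir, append_run_log, and the log-line format are hypothetical; the key point is opening run.log in append mode ("a"), which never truncates earlier runs.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def project_dir(root: Path, project_id: str) -> Path:
    """One subfolder per prompt: workspace/<project_id>/."""
    path = root / project_id
    path.mkdir(parents=True, exist_ok=True)
    return path


def append_run_log(project: Path, agent: str, summary: str) -> None:
    """Append one timestamped entry; earlier entries are never rewritten."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(project / "run.log", "a", encoding="utf-8") as f:
        f.write(f"[{stamp}] {agent}: {summary}\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "workspace"
    banking = project_dir(root, "banking")
    tictactoe = project_dir(root, "tictactoe")
    append_run_log(banking, "generate", "files written")
    append_run_log(banking, "verify", "compile OK")
    append_run_log(tictactoe, "generate", "files written")
    banking_entries = (banking / "run.log").read_text(encoding="utf-8").splitlines()
    tictactoe_entries = (tictactoe / "run.log").read_text(encoding="utf-8").splitlines()
```

Runs in one project never touch another project’s folder or log.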
The implementation fits in a set_project(project_id) function in agent.py that reassigns the global WORKSPACE. All tool logic reads WORKSPACE at call time, so the switch is transparent.
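Reassigning a module-level global is enough because a function looks up a global name each time it runs, not when it is defined. A minimal single-module sketch of the pattern: only the names set_project and WORKSPACE come from the text; ROOT, the default project, and the list_files body are assumptions.

```python
from pathlib import Path

ROOT = Path("workspace")      # assumed root folder
WORKSPACE = ROOT / "custom"   # assumed default project


def set_project(project_id: str) -> Path:
    """Point every tool at workspace/<project_id>/ by rebinding the global."""
    global WORKSPACE
    WORKSPACE = ROOT / project_id
    return WORKSPACE


def list_files() -> list[str]:
    """A tool: reads WORKSPACE at call time, so it follows set_project()."""
    if not WORKSPACE.exists():
        return []
    return sorted(p.name for p in WORKSPACE.iterdir() if p.is_file())


set_project("banking")
after_banking = WORKSPACE
set_project("tictactoe")
after_tictactoe = WORKSPACE
```

One Python subtlety: another module that does from agent import WORKSPACE copies the binding at import time and will not see later reassignments; code outside agent.py that needs the current folder should read agent.WORKSPACE instead.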
The app lives in app.py (~620 lines). Structure:
```python
import agent
from agent import (
    SYSTEM_PROMPT,
    VERIFY_SYSTEM_PROMPT,
    TEST_SYSTEM_PROMPT,
    build_test_bootstrap,
    build_verifier_bootstrap,
    run_agent,
    run_verifier,
    run_test_generator,
    set_project,
)


class StreamlitUI(AgentUI):
    """Adapts the engine's AgentUI to Streamlit components (st.status, etc.)."""
    # ... (methods elided in this excerpt)


# 3 tabs: tab_gen, tab_verify, tab_tests
with tab_gen:
    sys_p = system_prompt_editor(SYSTEM_PROMPT, "gen_sys_prompt", "System prompt")
    if run_clicked:
        stats = run_agent(entry, ui=ui, project_id=project_id, system_prompt=sys_p)

with tab_verify:
    sys_p = system_prompt_editor(VERIFY_SYSTEM_PROMPT, "verify_sys_prompt", "System prompt")
    # ... optional preview ...
    if run_verify_clicked:
        vstats = run_verifier(main_class_v, ui=ui, project_id=project_id, system_prompt=sys_p)

with tab_tests:
    sys_p = system_prompt_editor(TEST_SYSTEM_PROMPT, "tests_sys_prompt", "System prompt")
    # ... preview ...
    if run_tests_clicked:
        tstats = run_test_generator(main_class_t, ui=ui, project_id=project_id, system_prompt=sys_p)
```

The Streamlit UI is not the agent logic. It’s just the adapter between the loops in agent.py and the visual components (st.status, st.metric, st.code). If you swap Streamlit for Flask or Tkinter tomorrow, the agent loop doesn’t change.
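The adapter idea can be sketched without Streamlit. Here AgentUI is a hypothetical stand-in for the engine’s interface (the method names on_step and on_done are assumptions, not the repo’s API), run_loop is a toy loop, and a console implementation takes the place of StreamlitUI. The loop only ever talks to the interface.

```python
from abc import ABC, abstractmethod


class AgentUI(ABC):
    """What the agent loop needs from any front end (hypothetical method names)."""

    @abstractmethod
    def on_step(self, tool: str, result: str) -> None: ...

    @abstractmethod
    def on_done(self, stats: dict) -> None: ...


class ConsoleUI(AgentUI):
    """A non-Streamlit adapter: same interface, output printed and recorded."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def on_step(self, tool: str, result: str) -> None:
        self.lines.append(f"{tool} -> {result}")
        print(self.lines[-1])

    def on_done(self, stats: dict) -> None:
        self.lines.append(f"done: {stats}")
        print(self.lines[-1])


def run_loop(steps: list[tuple[str, str]], ui: AgentUI) -> dict:
    """Toy agent loop: it reports progress only through the ui object."""
    for tool, result in steps:
        ui.on_step(tool, result)
    stats = {"tool_calls": len(steps)}
    ui.on_done(stats)
    return stats


ui = ConsoleUI()
stats = run_loop([("list_files", "[]"), ("write_file", "Main.java written")], ui)
```

A Streamlit version would implement the same two methods with components such as st.status and st.metric; run_loop would not change.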
- The list of classes comes from WORKSPACE.glob("*.java"), computed in Python and injected into the user message via build_test_bootstrap().
- The shared folder (workspace/<project_id>/) serves as the communication bus between agents.
- build_*_bootstrap() computes the user message to send in Python; that’s where we detect classes automatically.
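Automatic class detection can be sketched in a few lines: glob the project folder, skip Main and files that are already tests, and derive the expected test file names. detect_classes_to_test and expected_test_files are hypothetical helpers; the exclusion rule follows the table’s “each non-Main class”.

```python
import tempfile
from pathlib import Path


def detect_classes_to_test(workspace: Path) -> list[str]:
    """Class names from *.java, excluding Main and classes that are already tests."""
    return sorted(
        p.stem
        for p in workspace.glob("*.java")
        if p.stem != "Main" and not p.stem.endswith("Test")
    )


def expected_test_files(classes: list[str]) -> list[str]:
    """One XTest.java per class X."""
    return [f"{name}Test.java" for name in classes]


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    for name in ("Account", "Bank", "Main", "Transaction", "AccountTest"):
        (ws / f"{name}.java").write_text(f"public class {name} {{}}\n")
    classes = detect_classes_to_test(ws)

print(classes)                       # ['Account', 'Bank', 'Transaction']
print(expected_test_files(classes))  # ['AccountTest.java', 'BankTest.java', 'TransactionTest.java']
```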