Demo 1 — Streamlit chat

Duration: 10 min
Prerequisites: demo 0 (loop understood)

Source code

Repo: gneuroneai/ollama-demo-1-chat-streamlit — app.py ~200 lines, of which ~15 are actual chat.

git clone https://github.com/gneuroneai/ollama-demo-1-chat-streamlit.git
cd ollama-demo-1-chat-streamlit
.\start.ps1

What this demo is about

This project wraps the chat loop from demo 0 in a Streamlit web interface. The conversation history is stored in st.session_state and re-rendered on every interaction, while the underlying call to ollama.Client.chat() remains identical to the one studied in the previous chapter. The interface adds two pedagogical panels: a sidebar that lets the system prompt and the model be modified at runtime, and an expander that displays the raw list of messages currently sent to Ollama, so that the abstract data structure of demo 0 becomes visible inside the browser. The intended use is to make the same mechanism accessible through a graphical interface without introducing any new framework, abstraction or LLM concept.

Key idea

Take the loop from demo 0, wrap it in Streamlit, and you get a local ChatGPT in your browser. No new LLM concept — it’s the same loop with a comfortable UI and two pedagogical panels.

What the demo does

cd ollama-demo-1-chat-streamlit
.\start.ps1

The browser opens at http://localhost:8500. You see:

A ChatGPT-style conversation with your local model.
Streamed responses (token by token).
An editable system prompt in an expander: change the assistant’s role and the behaviour changes.
A “Show raw messages” panel that displays the exact JSON list that will be sent to Ollama on the next turn.

Architecture in a diagram

flowchart TB
  U["<b>You (browser)</b><br/>type in st.chat_input"]
  SS["<b>st.session_state</b><br/>messages = [system, user, assistant, ...]"]
  O["<b>ollama.Client.chat()</b><br/>HTTP to 127.0.0.1:11434"]
  UI["<b>st.chat_message</b><br/>rendered bubbles"]
  EX["<b>Expander Raw messages</b><br/>raw JSON"]

  U -->|"submit"| SS
  SS -->|"messages list"| O
  O -->|"streamed chunks"| UI
  UI -->|"append to messages"| SS
  SS -.->|"debug display"| EX
  classDef user fill:#fde68a,stroke:#c2410c,color:#451a03
  classDef state fill:#ddd6fe,stroke:#7c3aed,color:#1e1b4b
  classDef code fill:#dbeafe,stroke:#2563eb
  classDef out fill:#d1fae5,stroke:#047857
  U:::user
  SS:::state
  O:::code
  UI:::out
  EX:::out

Streamlit handles the UI and the cross-rerun persistence; the Ollama loop is identical to demo 0.

The core of the code in 3 minutes

app.py literally does this when sending a message:

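import streamlit as st
from ollama import Client

client = Client(host="http://127.0.0.1:11434")

# SYSTEM_PROMPT and chat_history are defined elsewhere in app.py (not shown here).
messages = [{"role": "system", "content": SYSTEM_PROMPT}] + chat_history

placeholder = st.empty()  # slot that is overwritten on every chunk
accumulated = ""
for chunk in client.chat(model="llama3.1:8b", messages=messages, stream=True):
    accumulated += chunk["message"]["content"]
    placeholder.markdown(accumulated + " ▌")  # partial text plus a cursor
placeholder.markdown(accumulated)  # final text, cursor removed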

Compared to demo 0:

Demo 0 (terminal)	Demo 1 (Streamlit)
`print(token, end="", flush=True)`	`placeholder.markdown(accumulated + " ▌")`
`while True: input()` loop	Streamlit handles reruns automatically
`messages` in a local variable	`messages` in `st.session_state` (survives reruns)
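The streaming row of this comparison can be tried without Streamlit or Ollama. The sketch below is an illustration, not code from app.py: `fake_stream` is a hypothetical stand-in that yields chunks shaped like the ones `client.chat(..., stream=True)` produces, and `render_frames` (also hypothetical) collects what the placeholder would display after each chunk.

```python
def fake_stream(tokens):
    """Hypothetical stand-in for client.chat(..., stream=True)."""
    for token in tokens:
        # Each chunk carries only the newly generated piece of text.
        yield {"message": {"role": "assistant", "content": token}}


def render_frames(stream, cursor=" ▌"):
    """Return every frame a placeholder would show, plus the final text."""
    frames = []
    accumulated = ""
    for chunk in stream:
        accumulated += chunk["message"]["content"]
        # The placeholder is overwritten with the full text so far.
        frames.append(accumulated + cursor)
    return frames, accumulated


frames, final_text = render_frames(fake_stream(["Hel", "lo", "!"]))
for frame in frames:
    print(frame)
print(final_text)
```

In the terminal each token is appended to what is already on screen; in the browser each frame replaces the previous one, which is why the text has to be accumulated before it is rendered.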

The rest of the ~200-line file is Streamlit packaging:

st.chat_message to render bubbles;
st.chat_input for the input bar;
st.session_state.messages to persist history across reruns;
expanders for the pedagogical view (system prompt + raw messages).
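The four building blocks listed above can be assembled roughly as follows. This is a minimal sketch under assumptions, not the repository's app.py: the `history` key, the `build_messages` helper, the widget labels and `DEFAULT_SYSTEM_PROMPT` are all hypothetical names chosen for the illustration.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful tutor."  # hypothetical default
MODEL = "llama3.1:8b"


def build_messages(system_prompt, history):
    """Return the list sent to Ollama: system message first, then history."""
    return [{"role": "system", "content": system_prompt}] + list(history)


def main():
    # Imported lazily so build_messages can be used without these packages.
    try:
        import streamlit as st
        from ollama import Client
    except ImportError:
        print("Install streamlit and ollama to run the UI.")
        return

    client = Client(host="http://127.0.0.1:11434")

    # Streamlit reruns the script on every interaction;
    # only session_state survives between reruns.
    if "history" not in st.session_state:
        st.session_state.history = []

    with st.expander("System prompt"):
        system_prompt = st.text_area("System prompt", value=DEFAULT_SYSTEM_PROMPT)

    # Replay the stored conversation as chat bubbles.
    for msg in st.session_state.history:
        with st.chat_message(msg["role"]):
            st.markdown(msg["content"])

    user_text = st.chat_input("Your message")
    if user_text:
        st.session_state.history.append({"role": "user", "content": user_text})
        with st.chat_message("user"):
            st.markdown(user_text)

        messages = build_messages(system_prompt, st.session_state.history)
        with st.chat_message("assistant"):
            placeholder = st.empty()
            accumulated = ""
            for chunk in client.chat(model=MODEL, messages=messages, stream=True):
                accumulated += chunk["message"]["content"]
                placeholder.markdown(accumulated + " ▌")
            placeholder.markdown(accumulated)
        st.session_state.history.append({"role": "assistant", "content": accumulated})

    # Debug view: the list that would be sent on the next turn.
    with st.expander("Show raw messages"):
        st.json(build_messages(system_prompt, st.session_state.history))


if __name__ == "__main__":
    main()
```

Saved as a file and launched with `streamlit run`, a script like this should give a working chat, provided Ollama is listening on 127.0.0.1:11434 and the model has been pulled.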

The “Raw messages” panel — the most important pedagogical tool

This is the expander not to miss. When you click on it, you literally see the JSON sent to Ollama on the next turn:

[
  {"role": "system",    "content": "You are a pedagogical assistant..."},
  {"role": "user",      "content": "How are you?"},
  {"role": "assistant", "content": "Good, and you?"},
  {"role": "user",      "content": "And you?"}
]

That is the whole protocol: no framework, no abstraction. If a student asks “but where is the AI’s memory?”, point at this expander: the memory is this JSON list, and the client resends it on every turn.
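The way this list grows from turn to turn can be reproduced in a few lines of plain Python. The sketch is illustrative only: `next_payload` is a hypothetical helper, the system prompt is a placeholder, and the assistant reply is hard-coded instead of coming from a model.

```python
import json


def next_payload(system_prompt, history):
    """Build the list that would be sent on the next turn."""
    return [{"role": "system", "content": system_prompt}] + history


history = []
system_prompt = "You are a helpful tutor."  # placeholder prompt

# Turn 1: the user speaks; the (hard-coded) reply is stored afterwards.
history.append({"role": "user", "content": "How are you?"})
turn_1 = next_payload(system_prompt, history)
history.append({"role": "assistant", "content": "Good, and you?"})

# Turn 2: the new question is sent together with everything before it.
history.append({"role": "user", "content": "And you?"})
turn_2 = next_payload(system_prompt, history)

print(len(turn_1), len(turn_2))
print(json.dumps(turn_2, indent=2))
```

The second payload contains the first exchange in full, which is the only reason the model can tell what “And you?” refers to.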

Guided walk-through

Ask the model: “How are you?”. It answers.
Ask: “And you?”. It answers — and it knows you’re replying to its previous turn. Why? Because the code sends the whole conversation every turn. Open the “Show raw messages” panel: you see the 4 messages in the JSON.
Edit the system prompt: “You always answer like a pirate.”
Clear the conversation, ask anything.
The model talks like a pirate. You touched neither the Python code nor the model weights. Just one line of prompt.

This makes the phrase concrete:

An agent = a system prompt + a model + (later) tools.
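Seen as data, the pirate experiment changes a single entry of the request. The sketch below builds two fresh-conversation requests and compares them; `make_request` is a hypothetical helper, and both prompts other than the pirate one are placeholders.

```python
MODEL = "llama3.1:8b"


def make_request(system_prompt, user_text, model=MODEL):
    """Return the keyword arguments for one fresh-conversation chat call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


tutor = make_request("You are a helpful tutor.", "Explain recursion.")
pirate = make_request("You always answer like a pirate.", "Explain recursion.")

# Same model, same user message: only the system message differs.
differences = [
    i
    for i, (a, b) in enumerate(zip(tutor["messages"], pirate["messages"]))
    if a != b
]
print(differences)
```

Either dictionary could be passed to the chat call as keyword arguments; the model name and the user message are identical in both.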

Ports and launch

This demo runs Streamlit on port 8500 by default, rather than Streamlit’s own default of 8501, so that it does not clash with the other demos when you run them in parallel:

Demo	Default port
`ollama-demo-1-chat-streamlit` (this demo)	8500
`ollama-demo-2-comparator`	8503
`ollama-demo-3-agent-java` (simple CLI agent)	8501
`ollama-demo-4-trio-agents-java` (3 agents)	8502
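If a demo refuses to start, a quick check of which of these ports already has a listener can help. The helper below is a generic sketch using only the standard library; `port_in_use` is a hypothetical name and is not part of start.ps1 or stop.ps1.

```python
import socket

# Default ports from the table above.
DEMO_PORTS = {
    "ollama-demo-1-chat-streamlit": 8500,
    "ollama-demo-2-comparator": 8503,
    "ollama-demo-3-agent-java": 8501,
    "ollama-demo-4-trio-agents-java": 8502,
}


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising.
        return sock.connect_ex((host, port)) == 0


for name, port in DEMO_PORTS.items():
    status = "in use" if port_in_use(port) else "free"
    print(f"{name}: {port} ({status})")
```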

To stop cleanly:

.\stop.ps1

Important differences vs demo 0

Aspect	Demo 0 (CLI)	Demo 1 (Streamlit)
UI	Terminal, monochrome	Browser, bubbles, expanders
State	Local variable	`st.session_state` (rerun persistence)
Change system prompt	`/system <text>`	Text field in an expander
See history	`/history`	“Raw messages” expander
Multi-user	No	Possible with Streamlit deployment
Lines of code	~50 useful	~200 (of which ~15 are chat, the rest UI)
New LLM concepts	—	None. Same Ollama calls.

What to understand before moving on

Streamlit reruns the whole script on every interaction. That’s why st.session_state is crucial: without it, history would vanish.
Streaming is implemented with a placeholder that is overwritten with the accumulated text on each chunk; it plays the same role as print(token, end="", flush=True) in demo 0.
The LLM logic is identical to demo 0. If you can explain demo 0, you can explain demo 1.
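The first point can be simulated without Streamlit. In this sketch a plain dict stands in for `st.session_state` and each call to the hypothetical `run_script` plays the role of one rerun; it models the behaviour and does not use the real Streamlit API.

```python
def run_script(session_state, user_input):
    """One simulated rerun: the whole 'script' executes from the top."""
    local_messages = []  # recreated on every rerun, like a plain variable
    session_state.setdefault("messages", [])  # created once, then reused

    local_messages.append(user_input)
    session_state["messages"].append(user_input)
    return len(local_messages), len(session_state["messages"])


session_state = {}  # hypothetical stand-in for st.session_state
first = run_script(session_state, "How are you?")
second = run_script(session_state, "And you?")
print(first, second)
```

The local list never holds more than the current message, while the list kept in the stand-in state grows with each rerun.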

Going further

You want…	Look at…
The most minimal version (terminal, ~50 lines)	Demo 0 — CLI chat
Compare three models on the same prompt	Demo 2 — 3-way comparator
Understand how to add tools to the model	Demo 3 — minimal CLI agent
A richer UI with 3 collaborating agents	Demo 4 — three agents

Key takeaways

Streamlit adds no new LLM concept — it’s just a UI on top of demo 0’s loop.
st.session_state is what makes history persist across reruns.
The “Raw messages” panel is your best pedagogical tool: it reveals the raw protocol.
Editing the system prompt on the fly = changing the assistant’s behaviour without touching the code.