Demo 1 — Streamlit chat

Duration: 10 min
Prerequisites: demo 0 (loop understood)

Source code

Repo: gneuroneai/ollama-demo-1-chat-streamlit
app.py: ~200 lines, of which ~15 are actual chat.

Terminal window
git clone https://github.com/gneuroneai/ollama-demo-1-chat-streamlit.git
cd ollama-demo-1-chat-streamlit
.\start.ps1

This project wraps the chat loop from demo 0 in a Streamlit web interface. The conversation history is stored in st.session_state and re-rendered on every interaction, while the underlying call to ollama.Client.chat() remains identical to the one studied in the previous chapter. The interface adds two pedagogical panels: a sidebar that lets the system prompt and the model be modified at runtime, and an expander that displays the raw list of messages currently sent to Ollama, so that the abstract data structure of demo 0 becomes visible inside the browser. The intended use is to make the same mechanism accessible through a graphical interface without introducing any new framework, abstraction or LLM concept.

Take the loop from demo 0, wrap it in Streamlit, and you get a local ChatGPT in your browser. No new LLM concept — it’s the same loop with a comfortable UI and two pedagogical panels.

Terminal window
cd ollama-demo-1-chat-streamlit
.\start.ps1

The browser opens at http://localhost:8500. You see:

  • A ChatGPT-style conversation with your local model.
  • Streamed responses (token by token).
  • An editable system prompt in an expander: change the assistant’s role and the behaviour changes.
  • A “Show raw messages” panel that displays the exact JSON list sent to Ollama next turn.
flowchart TB
  U["<b>You (browser)</b><br/>type in st.chat_input"]
  SS["<b>st.session_state</b><br/>messages = [system, user, assistant, ...]"]
  O["<b>ollama.Client.chat()</b><br/>HTTP to 127.0.0.1:11434"]
  UI["<b>st.chat_message</b><br/>rendered bubbles"]
  EX["<b>Expander Raw messages</b><br/>raw JSON"]

  U -->|"submit"| SS
  SS -->|"messages list"| O
  O -->|"streamed chunks"| UI
  UI -->|"append to messages"| SS
  SS -.->|"debug display"| EX
  classDef user fill:#fde68a,stroke:#c2410c,color:#451a03
  classDef state fill:#ddd6fe,stroke:#7c3aed,color:#1e1b4b
  classDef code fill:#dbeafe,stroke:#2563eb
  classDef out fill:#d1fae5,stroke:#047857
  U:::user
  SS:::state
  O:::code
  UI:::out
  EX:::out
Streamlit handles the UI and the cross-rerun persistence; the Ollama loop is identical to demo 0.

app.py literally does this when sending a message:

import streamlit as st
from ollama import Client

# SYSTEM_PROMPT and chat_history are defined elsewhere in app.py.
client = Client(host="http://127.0.0.1:11434")
messages = [{"role": "system", "content": SYSTEM_PROMPT}] + chat_history

placeholder = st.empty()  # a single slot, rewritten on every chunk
accumulated = ""
for chunk in client.chat(model="llama3.1:8b", messages=messages, stream=True):
    accumulated += chunk["message"]["content"]
    placeholder.markdown(accumulated + " ▌")  # text so far plus a cursor
placeholder.markdown(accumulated)  # final text, cursor removed

Compared to demo 0:

| Demo 0 (terminal) | Demo 1 (Streamlit) |
| --- | --- |
| print(token, end="", flush=True) | placeholder.markdown(accumulated + " ▌") |
| while True: input() loop | Streamlit handles reruns automatically |
| messages in a local variable | messages in st.session_state (survives reruns) |
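
The first row of the comparison can be sketched without Streamlit or Ollama. The chunks below are hand-written stand-ins shaped like the streamed chunks in the excerpt above (`chunk["message"]["content"]`), and `render_frames` is a hypothetical helper, not part of app.py. It yields what the placeholder would display after each chunk: the full text so far plus a cursor, then the final text without it.

```python
from typing import Iterable, Iterator

CURSOR = " ▌"  # cursor glyph shown while the answer is still streaming


def render_frames(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the successive strings a UI placeholder would display."""
    accumulated = ""
    for chunk in chunks:
        accumulated += chunk["message"]["content"]
        yield accumulated + CURSOR  # rewrite the whole text, cursor at the end
    yield accumulated  # final frame: same text, cursor removed


# Hand-written stand-ins for streamed chunks (not real model output).
fake_chunks = [
    {"message": {"content": "Good"}},
    {"message": {"content": ", and"}},
    {"message": {"content": " you?"}},
]

frames = list(render_frames(fake_chunks))
for frame in frames:
    print(frame)
```

A terminal appends each token to the line already printed, whereas the placeholder redraws the whole accumulated string every time. That is why the Streamlit version keeps `accumulated` around.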

The rest is Streamlit packaging (app.py is ~200 lines in total):

  • st.chat_message to render bubbles;
  • st.chat_input for the input bar;
  • st.session_state.messages to persist history across reruns;
  • expanders for the pedagogical view (system prompt + raw messages).
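
How these four pieces might fit together is sketched below. This is a minimal sketch, not the actual app.py: the function names `build_messages` and `run_app`, the widget labels and the default system prompt are assumptions made for illustration. Only the Ollama call mirrors the excerpt shown earlier.

```python
def build_messages(system_prompt: str, history: list[dict]) -> list[dict]:
    """Return the list sent to Ollama: system message first, then the history."""
    return [{"role": "system", "content": system_prompt}] + list(history)


def run_app() -> None:
    # Imported lazily so build_messages can be tested without these packages.
    import streamlit as st
    from ollama import Client

    client = Client(host="http://127.0.0.1:11434")

    # History survives reruns because it lives in st.session_state.
    if "messages" not in st.session_state:
        st.session_state.messages = []

    with st.expander("System prompt"):
        system_prompt = st.text_area(
            "System prompt", value="You are a pedagogical assistant."
        )

    # Re-render every stored turn as a bubble on each rerun.
    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.markdown(msg["content"])

    user_text = st.chat_input("Your message")
    if user_text:
        st.session_state.messages.append({"role": "user", "content": user_text})
        with st.chat_message("user"):
            st.markdown(user_text)

        payload = build_messages(system_prompt, st.session_state.messages)
        with st.chat_message("assistant"):
            placeholder = st.empty()
            accumulated = ""
            for chunk in client.chat(
                model="llama3.1:8b", messages=payload, stream=True
            ):
                accumulated += chunk["message"]["content"]
                placeholder.markdown(accumulated + " ▌")
            placeholder.markdown(accumulated)
        st.session_state.messages.append(
            {"role": "assistant", "content": accumulated}
        )

    # Pedagogical view: the exact list that would be sent on the next turn.
    with st.expander("Show raw messages"):
        st.json(build_messages(system_prompt, st.session_state.messages))
```

To try it, add a `run_app()` call as the last line, save the file under a name of your choice (for example sketch_app.py, a hypothetical name) and launch it with `streamlit run sketch_app.py` while Ollama is running locally.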

The “Raw messages” panel — the most important pedagogical tool

This is the expander not to miss. When you click on it, you literally see the JSON sent to Ollama on the next turn:

[
  {"role": "system", "content": "You are a pedagogical assistant..."},
  {"role": "user", "content": "How are you?"},
  {"role": "assistant", "content": "Good, and you?"},
  {"role": "user", "content": "And you?"}
]

That is the whole protocol: no framework, no abstraction. If a student asks “but where is the AI’s memory?”, point at this expander: the memory is this JSON list, and the code resends it on every turn.
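
The claim that memory is just the list can be checked with a stand-in for the model. In the sketch below, `chat_turn` and `stub_model` are hypothetical names. The stub is not an LLM; it only reports how many messages it was handed, which is enough to show that the context grows because the caller resends everything.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(
    messages: list[Message],
    user_text: str,
    model: Callable[[list[Message]], str],
) -> str:
    """Append the user turn, send the WHOLE list, then append the reply."""
    messages.append({"role": "user", "content": user_text})
    reply = model(messages)  # the model sees only what is in this list
    messages.append({"role": "assistant", "content": reply})
    return reply


def stub_model(messages: list[Message]) -> str:
    """Hypothetical stand-in for the LLM: reports how much context it got."""
    return f"I received {len(messages)} messages."


messages: list[Message] = [
    {"role": "system", "content": "You are a pedagogical assistant..."}
]
first = chat_turn(messages, "How are you?", stub_model)
second = chat_turn(messages, "And you?", stub_model)
print(first)   # → I received 2 messages.
print(second)  # → I received 4 messages.
```

On the second turn the stub receives four messages (system, user, assistant, user), the same four entries shown in the JSON above.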

  1. Ask the model: “How are you?”. It answers.
  2. Ask: “And you?”. It answers — and it knows you’re replying to its previous turn. Why? Because the code sends the whole conversation every turn. Open the “Show raw messages” panel: you see the 4 messages in the JSON.
  3. Edit the system prompt: “You always answer like a pirate.”
  4. Clear the conversation, ask anything.
  5. The model talks like a pirate. You touched neither the Python code nor the model weights. Just one line of prompt.

This makes the phrase concrete:

An agent = a system prompt + a model + (later) tools.
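
That phrase can be written down as a data structure. The `Agent` class below is a hypothetical illustration, not something defined in the demo: two agents share the same model and the same history, and differ only in the system prompt that ends up as the first message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """Hypothetical container: a system prompt plus a model name."""

    system_prompt: str
    model: str
    tools: tuple = ()  # placeholder for the tools introduced in later demos

    def payload(self, history: list[dict]) -> dict:
        """Build the keyword arguments one would pass to a chat call."""
        messages = [{"role": "system", "content": self.system_prompt}]
        return {"model": self.model, "messages": messages + list(history)}


history = [{"role": "user", "content": "What is a list?"}]
teacher = Agent("You are a pedagogical assistant.", "llama3.1:8b")
pirate = Agent("You always answer like a pirate.", "llama3.1:8b")

a, b = teacher.payload(history), pirate.payload(history)
print(a["messages"][0]["content"])
print(b["messages"][0]["content"])
```

Each dictionary could be unpacked into the same call as in app.py, for example `client.chat(**a, stream=True)`. Nothing in the calling code changes between the teacher and the pirate.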

This demo runs Streamlit on port 8500 by default, instead of Streamlit's usual 8501, so that it does not clash with the other demos if you run them in parallel:

| Demo | Default port |
| --- | --- |
| ollama-demo-1-chat-streamlit (this demo) | 8500 |
| ollama-demo-2-comparator | 8503 |
| ollama-demo-3-agent-java (simple CLI agent) | 8501 |
| ollama-demo-4-trio-agents-java (3 agents) | 8502 |

To stop cleanly:

Terminal window
.\stop.ps1

| Aspect | Demo 0 (CLI) | Demo 1 (Streamlit) |
| --- | --- | --- |
| UI | Terminal, monochrome | Browser, bubbles, expanders |
| State | Local variable | st.session_state (rerun persistence) |
| Change system prompt | /system <text> | Text field in an expander |
| See history | /history | “Raw messages” expander |
| Multi-user | No | Possible with Streamlit deployment |
| Lines of code | ~50 useful | ~200 (of which ~15 of chat, rest UI) |

New LLM concepts: None. Same Ollama calls.

  • Streamlit reruns the whole script on every interaction. That’s why st.session_state is crucial: without it, history would vanish.
  • Streaming is implemented with a placeholder that is rewritten on each chunk. It plays the same role as print(token, end="", flush=True) in demo 0.
  • The LLM logic is identical to demo 0. If you can explain demo 0, you can explain demo 1.
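
The first point can be simulated in plain Python. In this sketch, `script` is a hypothetical function standing for one full rerun of the Streamlit script, and an ordinary dictionary stands in for st.session_state. A local variable is rebuilt from scratch on every call, while the dictionary keeps growing.

```python
def script(session_state: dict, user_text: str) -> tuple[int, int]:
    """One 'rerun': everything here executes from the top on each interaction."""
    local_history = []  # recreated on every rerun, like a plain variable
    local_history.append(user_text)

    # Common Streamlit idiom: initialise the key only on the first run.
    if "messages" not in session_state:
        session_state["messages"] = []
    session_state["messages"].append(user_text)

    return len(local_history), len(session_state["messages"])


session_state: dict = {}  # stands in for st.session_state, outlives each rerun
results = [
    script(session_state, text) for text in ["How are you?", "And you?", "Bye"]
]
print(results)  # → [(1, 1), (1, 2), (1, 3)]
```

The local list never holds more than the current message, which is what would happen to the conversation history without st.session_state.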

| You want… | Look at… |
| --- | --- |
| The most minimal version (terminal, ~50 lines) | Demo 0 — CLI chat |
| Compare three models on the same prompt | Demo 2 — 3-way comparator |
| Understand how to add tools to the model | Demo 3 — minimal CLI agent |
| A richer UI with 3 collaborating agents | Demo 4 — three agents |

  • Streamlit adds no new LLM concept — it’s just a UI on top of demo 0’s loop.
  • st.session_state is what makes history persist across reruns.
  • The “Raw messages” panel is your best pedagogical tool: it reveals the raw protocol.
  • Editing the system prompt on the fly = changing the assistant’s behaviour without touching the code.