Lesson 3.4: Multi-turn conversations: how memory really works | GeekHub Learn

A chatbot that "remembers" is doing one of three things: re-sending the history (all of it or just the recent part), summarizing it, or retrieving from a memory store. Knowing which one your app uses helps you avoid the most common production bugs.

Imagine a meeting where every five minutes a new attendee joins. To keep up, you either replay the whole tape, hand them a summary, or look up what they need from a shared notes app. LLM apps have the same three options.

Four memory strategies:

Full history (replay): include every prior message. Highest fidelity and highest cost: every turn resends everything, and the history eventually overflows the model's context window.
Rolling window: keep the last N messages. Cheap, loses old facts.
Summarized memory: after every K turns, summarize older history into a single message. Cheap, decent recall.
Retrieval-based memory: store messages in a vector DB and retrieve the most relevant ones at each turn. Scales to very long histories, but is more complex to build.
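The cost gap between replay and a window is easy to underestimate, so here is a back-of-the-envelope sketch. It assumes, hypothetically, that every message costs 100 tokens and that each turn adds one user and one assistant message; real costs depend on message length and the model's pricing.

```python
from typing import Optional


def prompt_tokens(turn: int, tokens_per_message: int, window: Optional[int]) -> int:
    """Input tokens sent on one turn (1-indexed). window=None means full replay."""
    # Before turn `turn` there are 2 * (turn - 1) messages, plus the new user message.
    messages_sent = 2 * (turn - 1) + 1
    if window is not None:
        messages_sent = min(messages_sent, window)  # a rolling window caps the prompt
    return messages_sent * tokens_per_message


def total_tokens(turns: int, tokens_per_message: int, window: Optional[int]) -> int:
    """Cumulative input tokens over a whole conversation."""
    return sum(prompt_tokens(t, tokens_per_message, window) for t in range(1, turns + 1))


if __name__ == "__main__":
    for turns in (10, 100, 1000):
        full = total_tokens(turns, 100, window=None)
        rolling = total_tokens(turns, 100, window=10)
        print(f"{turns:>5} turns: full replay {full:>12,} tokens, window of 10 {rolling:>10,} tokens")
```

Under these assumptions full replay sends 100 × T² input tokens over T turns (10,000 for 10 turns, 100 million for 1,000), while a 10-message window grows only linearly (997,500 for 1,000 turns). In practice full replay hits the context window long before the bill gets that large.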

Most production chatbots use a hybrid: rolling window for recent turns + summary for older + retrieval for important facts.
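A minimal sketch of how such a hybrid might assemble one request, assuming messages are role/content dicts. To stay dependency-free it swaps the vector DB for crude word-overlap scoring; `retrieve`, `build_context`, and the order of the parts are illustrative choices, not a standard.

```python
import re
from typing import Dict, List

Message = Dict[str, str]


def _words(text: str) -> set:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(store: List[str], query: str, k: int = 2) -> List[str]:
    """Return up to k stored facts sharing the most words with the query.

    A real system would rank by vector similarity from a vector DB.
    """
    q = _words(query)
    scored = [(len(q & _words(fact)), fact) for fact in store]
    scored = [pair for pair in scored if pair[0] > 0]  # drop facts with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:k]]


def build_context(system_prompt: str, summary: str, fact_store: List[str],
                  recent: List[Message], user_input: str, window: int = 6) -> List[Message]:
    """Assemble one request: system prompt + summary + retrieved facts + recent turns."""
    context: List[Message] = [{"role": "system", "content": system_prompt}]
    if summary:  # summary of older history
        context.append({"role": "system", "content": f"Earlier summary: {summary}"})
    facts = retrieve(fact_store, user_input)  # retrieval for important facts
    if facts:
        context.append({"role": "system", "content": "Relevant facts: " + "; ".join(facts)})
    context.extend(recent[-window:])  # rolling window of recent messages
    context.append({"role": "user", "content": user_input})
    return context
```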

For a Streamlit demo, full history fits easily. For an app with thousands of turns, you need summarization. Here is the summarization pattern:

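MAX_MESSAGES = 20  # start compressing once history is longer than this
KEEP_RECENT = 10   # newest messages are always sent verbatim

if len(messages) > MAX_MESSAGES:
    older = messages[:-KEEP_RECENT]     # everything except the newest 10
    summary = summarize(older)          # your own helper: one extra LLM call
    messages = [{"role": "system", "content": f"Earlier summary: {summary}"}] + messages[-KEEP_RECENT:]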
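The snippet above relies on a `summarize()` helper it leaves undefined, and because `older` starts at index 0 it would also fold an initial system prompt into the summary. Below is a self-contained sketch that takes the summarizer as a parameter and keeps a leading system prompt verbatim. The name `compress_history` is hypothetical, and it assumes messages are role/content dicts with any system prompt at index 0.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def compress_history(messages: List[Message],
                     summarize: Callable[[List[Message]], str],
                     max_messages: int = 20,
                     keep_recent: int = 10) -> List[Message]:
    """Fold older messages into one summary message once history grows too long."""
    if len(messages) <= max_messages:
        return messages
    head: List[Message] = []
    body = messages
    if messages and messages[0]["role"] == "system":
        head, body = messages[:1], messages[1:]  # keep the system prompt out of the summary
    older, recent = body[:-keep_recent], body[-keep_recent:]
    if not older:
        return messages
    summary_msg = {"role": "system", "content": f"Earlier summary: {summarize(older)}"}
    return head + [summary_msg] + recent
```

On later calls the previous summary message falls into `older`, so it is folded into the new summary instead of piling up.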

Visualize it

Four small diagrams side by side, one per strategy, with arrows showing message flow and a token-cost label.

Try it now

Have a 10-turn conversation with ChatGPT where in turn 1 you say "my favorite number is 42". In turn 10, ask "what is my favorite number?" and note whether it recalls. Then start a new chat (with ChatGPT's memory settings turned off, so nothing carries over) and ask again. Note the loss.

Hands-on lab

Extend your Lesson 3.2 chat script to use a rolling window of 10 messages, and warn when older messages get dropped.
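One way to approach the lab, sketched under the assumption that your Lesson 3.2 script keeps the conversation in a list of role/content dicts (the names `apply_window` and `WINDOW` are hypothetical). It uses Python's standard `warnings` module; a `print` or a UI notice would work just as well.

```python
import warnings
from typing import Dict, List

Message = Dict[str, str]
WINDOW = 10  # number of most recent messages sent to the model


def apply_window(history: List[Message], window: int = WINDOW) -> List[Message]:
    """Return the last `window` messages, warning if any were cut off."""
    dropped = len(history) - window
    if dropped > 0:
        warnings.warn(f"Rolling window: dropped {dropped} older message(s); "
                      "the model will no longer see them.")
        return history[dropped:]  # slicing by `dropped` also behaves for window=0
    return history


if __name__ == "__main__":
    history = [{"role": "user", "content": f"message {i}"} for i in range(12)]
    sent = apply_window(history)  # emits a UserWarning about 2 dropped messages
    print(len(sent), sent[0]["content"])
```

If the script starts with a system message, keep it out of the windowed list and prepend it afterwards; otherwise it is the first message to be dropped.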

Try it now

Why is "summarized memory" not perfect? Where does it fail?

Common mistakes

Forgetting to persist memory across user sessions (memory dies on refresh)
Mixing memory of multiple users when scaling
Trusting the model to "remember" without sending the history
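The first two mistakes share a fix: store history per user, outside the running process. A minimal sketch, assuming one JSON file per user as a stand-in for a real database; `SessionStore` is a hypothetical name, and the sketch does no locking, so it is not safe for concurrent writes to the same user.

```python
import json
from pathlib import Path
from typing import Dict, List

Message = Dict[str, str]


class SessionStore:
    """Per-user chat history saved as one JSON file per user.

    Keying by user_id keeps users' memories separate, and writing to disk
    lets a conversation survive a page refresh or server restart.
    """

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, user_id: str) -> Path:
        # Assumes user_id is a trusted, filename-safe identifier.
        return self.root / f"{user_id}.json"

    def load(self, user_id: str) -> List[Message]:
        path = self._path(user_id)
        if not path.exists():
            return []  # new user: empty history
        return json.loads(path.read_text(encoding="utf-8"))

    def append(self, user_id: str, message: Message) -> None:
        history = self.load(user_id)
        history.append(message)
        self._path(user_id).write_text(json.dumps(history), encoding="utf-8")
```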

Debugging tip

If users complain "it forgot what I said", check your memory strategy. The cause is almost always one of: the assistant's reply never appended to the history, a rolling window that is too small, or no summarization for long conversations.
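The first cause is easy to reproduce: code appends the user message, calls the model, and never appends the reply, so on the next turn the model cannot see what it said. A minimal sketch of a turn that records both sides, plus a small check for the telltale pattern of two user messages in a row; `call_model` is a hypothetical stand-in for whichever client you use.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_input: str,
              call_model: Callable[[List[Message]], str]) -> str:
    """Run one turn and record BOTH sides of it in `history`."""
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)  # the model only sees what is in history
    history.append({"role": "assistant", "content": reply})  # the step that is easy to forget
    return reply


def find_missing_replies(history: List[Message]) -> List[int]:
    """Indexes of user messages followed directly by another user message,
    a sign that an assistant reply was never appended."""
    return [i for i in range(len(history) - 1)
            if history[i]["role"] == "user" and history[i + 1]["role"] == "user"]
```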

Challenge

Implement summarized memory: after every 10 messages, call the model to compress older turns into a single 200-token summary.

Where this shows up

Customer support chatbots
AI tutors that remember your goals across sessions
Coding copilots that recall your project conventions

From the field

A memory pattern gaining traction in 2026 is "structured memory": extract facts (preferences, goals, identity details) into a small JSON store and re-inject them as system text instead of replaying raw turns.
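A toy sketch of the idea, assuming a plain dict serves as the JSON store. Real systems usually ask the model itself to extract facts; the regular expressions here are a hypothetical stand-in so the example runs without an API key, and they only recognize a few fixed phrasings.

```python
import json
import re
from typing import Dict

# Toy extraction rules (hypothetical); an LLM extraction call would replace these.
FACT_PATTERNS = {
    "name": re.compile(r"\bmy name is (\w+)", re.IGNORECASE),
    "favorite_number": re.compile(r"\bmy favou?rite number is (\d+)", re.IGNORECASE),
    "goal": re.compile(r"\bmy goal is to ([^.!?]+)", re.IGNORECASE),
}


def extract_facts(text: str, memory: Dict[str, str]) -> Dict[str, str]:
    """Update and return the fact store with anything found in `text`."""
    for key, pattern in FACT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            memory[key] = match.group(1).strip()  # newer statements overwrite older ones
    return memory


def memory_as_system_message(memory: Dict[str, str]) -> Dict[str, str]:
    """Re-inject the stored facts as compact system text."""
    return {"role": "system",
            "content": "Known facts about the user (JSON): " + json.dumps(memory, sort_keys=True)}
```

Because the store holds facts rather than transcripts, it stays small however long the conversation runs, and it can be persisted per user like any other memory.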

Recap

LLMs are stateless. Memory is an application concern. Pick a strategy (replay, window, summary, retrieval) and engineer it deliberately.

Quick recall

Name three memory strategies.

Quiz time

1. For a customer support bot that runs for months per user, the best memory strategy is