GeekHub Learn

Multi-turn conversations: how memory really works

A chatbot that "remembers" is doing one of four things: re-sending the whole history, keeping only the most recent turns, summarizing older turns, or retrieving from a memory store. Knowing which one your app uses helps you avoid the most common production bugs.

Imagine a meeting where every five minutes a new attendee joins. To keep up, you either replay the whole tape, fill them in on the last few minutes, hand them a summary, or look up what they need from a shared notes app. LLMs have the same four options.

Four memory strategies:

  1. Full history (replay): include every prior message. Highest fidelity, highest cost; in long conversations (thousands of turns) the history outgrows the model's context window.
  2. Rolling window: keep the last N messages. Cheap, loses old facts.
  3. Summarized memory: after every K turns, summarize older history into a single message. Cheap, decent recall.
  4. Retrieval-based memory: store messages in a vector DB and retrieve the most relevant ones at each turn. Scales to very long histories, but is more complex to build.
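
To make "highest cost" concrete, here is a back-of-the-envelope sketch that counts the input tokens sent over a whole conversation. It assumes a flat 50 tokens per message (a made-up average, not a measurement), and `replay_cost` and `window_cost` are illustrative names.

```python
# Rough cost model: total input tokens sent across a conversation.
# TOKENS_PER_MESSAGE is an assumed average, not a measured value.
TOKENS_PER_MESSAGE = 50


def replay_cost(turns: int) -> int:
    """Full history: request t carries 2*(t-1) prior messages plus the new one."""
    return sum((2 * t - 1) * TOKENS_PER_MESSAGE for t in range(1, turns + 1))


def window_cost(turns: int, window: int) -> int:
    """Rolling window: each request carries at most `window` messages."""
    return sum(
        min(2 * t - 1, window) * TOKENS_PER_MESSAGE for t in range(1, turns + 1)
    )


print(replay_cost(1000))      # 50000000
print(window_cost(1000, 10))  # 498750
```

Under these assumptions, replay grows with the square of the number of turns, while a fixed window grows linearly. That is the reason long-running apps move away from full history.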

Most production chatbots use a hybrid: rolling window for recent turns + summary for older + retrieval for important facts.
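
The hybrid can be sketched as one function that assembles the context for each request. This is a minimal illustration, not a production design: `build_context`, `retrieve`, and the word-overlap scoring are hypothetical stand-ins, and a real system would more likely use embeddings and a vector DB for the retrieval step.

```python
import re


def words(text):
    """Lowercase word set, used for the toy relevance score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(facts, query, k=2):
    """Toy retrieval: return up to k stored facts that share words with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(fact)), fact) for fact in facts]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:k]]


def build_context(summary, facts, history, user_message, window=6):
    """Combine summary (older turns), retrieved facts, and a rolling window."""
    context = []
    if summary:
        context.append({"role": "system", "content": f"Earlier summary: {summary}"})
    relevant = retrieve(facts, user_message)
    if relevant:
        context.append(
            {"role": "system", "content": "Known facts: " + "; ".join(relevant)}
        )
    # Guard against window=0, where history[-0:] would return everything.
    context.extend(history[-window:] if window > 0 else [])
    context.append({"role": "user", "content": user_message})
    return context
```

Only the last `window` messages are sent verbatim; everything older reaches the model through the summary or the retrieved facts.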

For a Streamlit demo, full history fits easily. For an app with thousands of turns, you need summarization. Here is the summarization pattern:

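# Once the history passes 20 messages, fold everything but the last 10 into a summary.
if len(messages) > 20:
    older = messages[:-10]      # turns to compress
    summary = summarize(older)  # one LLM call; summarize() is a helper you write
    # Replace the older turns with one system message and keep the last 10 verbatim.
    messages = [{"role": "system", "content": f"Earlier summary: {summary}"}] + messages[-10:]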

Visualize it

Four small diagrams side by side, one per strategy, with arrows showing message flow and a token-cost label.

Try it now

Have a 10-turn conversation with ChatGPT. In turn 1, say "my favorite number is 42". At turn 10, ask "what is my favorite number?" and note whether it recalls the answer. Then start a new chat (no memory) and ask again. Note the loss.

Hands-on lab

Extend your Lesson 3.2 chat script to use a rolling window of 10 messages, and warn when older messages get dropped.

Try it now

Why is "summarized memory" not perfect? Where does it fail?

Common mistakes

  • Forgetting to persist memory across user sessions (memory dies on refresh)
  • Mixing memory of multiple users when scaling
  • Trusting the model to "remember" without sending the history
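
The first two mistakes can be avoided with a store that is keyed by user and written to disk. The sketch below is one possible shape, assuming a single JSON file is enough for a demo; `MemoryStore` and its methods are hypothetical names, and a real multi-user app would more likely use a database.

```python
import json
import tempfile
from pathlib import Path


class MemoryStore:
    """Per-user chat history persisted to a JSON file (demo-scale only)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load existing histories so memory survives a restart or refresh.
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    def history(self, user_id):
        """Return the message list for one user, never another user's."""
        return self.data.setdefault(user_id, [])

    def append(self, user_id, role, content):
        self.history(user_id).append({"role": role, "content": content})
        self.path.write_text(json.dumps(self.data))  # persist on every write


# Usage: two users write, then a fresh store simulates a page refresh.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    store = MemoryStore(path)
    store.append("alice", "user", "My favorite number is 42")
    store.append("bob", "user", "I prefer tabs")
    reloaded = MemoryStore(path)
    alice_history = reloaded.history("alice")
    bob_history = reloaded.history("bob")
```

After the reload, each user still has their own history and neither sees the other's messages.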

Debugging tip

If users complain "it forgot what I said", check your memory strategy. The cause is almost always one of three: the assistant's reply was never appended to the history, the rolling window is too small, or older turns are dropped without being summarized.
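
The missing assistant append is the easiest of these to check in code. Here is a small sketch, assuming the history is a list of role/content dicts; `chat_turn`, `find_missing_replies`, and `fake_model` are hypothetical helpers, with `fake_model` standing in for a real LLM call.

```python
def chat_turn(messages, user_text, call_model):
    """Run one turn and record BOTH sides of it in the history."""
    messages.append({"role": "user", "content": user_text})
    reply = call_model(messages)
    # The step that is easy to forget: store the assistant's reply too.
    messages.append({"role": "assistant", "content": reply})
    return reply


def find_missing_replies(messages):
    """Return indexes where a user message directly follows another user message."""
    return [
        i
        for i in range(1, len(messages))
        if messages[i]["role"] == "user" and messages[i - 1]["role"] == "user"
    ]


def fake_model(messages):
    """Stand-in for a real model call, so the sketch runs offline."""
    return f"(reply to message {len(messages)})"
```

Running `find_missing_replies` on a logged history that contains back-to-back user messages points at the turns where a reply was never saved.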

Challenge

Implement summarized memory: after every 10 messages, call the model to compress older turns into a single 200-token summary.

Where this shows up

  • Customer support chatbots
  • AI tutors that remember your goals across sessions
  • Coding copilots that recall your project conventions

From the field

In 2026 the hottest memory pattern is "structured memory": extract facts (preferences, goals, identities) into a small JSON store and re-inject them as system text, instead of replaying raw turns.
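
A minimal sketch of that idea follows. In practice the extraction step would be a model call; here a regular expression that matches "my <key> is <value>" stands in for it so the example runs offline. `extract_facts` and `memory_prompt` are hypothetical names.

```python
import json
import re

# Stand-in for an LLM extraction call: matches "my <key> is <value>".
FACT_PATTERN = re.compile(r"\bmy ([a-z ]+?) is ([^.,!?]+)", re.IGNORECASE)


def extract_facts(text):
    """Pull simple key/value facts out of one user message."""
    return {
        key.strip().lower(): value.strip()
        for key, value in FACT_PATTERN.findall(text)
    }


def memory_prompt(facts):
    """Re-inject the fact store as system text instead of replaying raw turns."""
    return {
        "role": "system",
        "content": "Known user facts (JSON): " + json.dumps(facts, sort_keys=True),
    }


facts = {}
facts.update(extract_facts("My favorite number is 42."))
# A later message overwrites the stale value instead of piling up history.
facts.update(extract_facts("My goal is to learn Rust. My favorite number is 7."))
```

Because facts are stored by key, an updated preference replaces the old one, which raw replay and summaries do not guarantee.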

Recap

LLMs are stateless. Memory is an application concern. Pick a strategy (replay, window, summary, retrieval) and engineer it deliberately.


Quick recall

Name three memory strategies.

Quiz time

  1. For a customer support bot that runs for months per user, the best memory strategy is
