Multi-turn conversations: how memory really works
A chatbot that "remembers" is doing one of three things: re-sending the whole history, summarizing it, or retrieving from a memory store. Knowing which one your app uses helps you avoid the most common production bugs.
Imagine a meeting where every five minutes a new attendee joins. To keep up, you either replay the whole tape, hand them a summary, or look up what they need from a shared notes app. LLMs have the same three options.
Four common memory strategies:
- Full history (replay): include every prior message. Highest fidelity, highest cost: every request gets longer, and a long enough conversation eventually exceeds the model's context window.
- Rolling window: keep the last N messages. Cheap, loses old facts.
- Summarized memory: after every K turns, summarize older history into a single message. Cheap, decent recall.
- Retrieval-based memory: store messages in a vector DB, retrieve the most relevant ones at each turn. Scales to very long histories because the prompt size stays roughly constant, but is more complex to build and tune.
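The rolling-window and retrieval strategies above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production design: the names `rolling_window` and `retrieve` are hypothetical, and a bag-of-words cosine similarity stands in for the embedding model and vector DB a real system would use.

```python
import math
import re
from collections import Counter

def rolling_window(messages, n):
    """Keep only the last n messages (rolling-window strategy)."""
    return messages[-n:] if n > 0 else []

def _bag_of_words(text):
    # Toy stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(store, query, k=2):
    """Return the k stored messages most similar to the query."""
    q = _bag_of_words(query)
    ranked = sorted(
        store,
        key=lambda m: _cosine(_bag_of_words(m["content"]), q),
        reverse=True,
    )
    return ranked[:k]

history = [
    {"role": "user", "content": "My order number is 4417."},
    {"role": "assistant", "content": "Thanks, I have noted it."},
    {"role": "user", "content": "What is the weather like today?"},
    {"role": "assistant", "content": "I cannot check the weather."},
]

recent = rolling_window(history, 2)  # only the last two messages survive
relevant = retrieve(history, "Can you check my order number?", k=1)
```

Here the rolling window has already dropped the order number, while retrieval still finds it because it ranks by relevance rather than recency.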
Most production chatbots use a hybrid: rolling window for recent turns + summary for older + retrieval for important facts.
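One way to picture the hybrid is as a function that assembles the prompt from the three memory sources. The sketch below assumes the summary and the retrieved facts have already been produced elsewhere; `build_context` and its parameters are hypothetical names chosen for illustration. It also assumes the chat API accepts several system messages; some APIs expect a single system prompt, in which case the pieces would be merged into one string.

```python
def build_context(system_prompt, summary, retrieved_facts, recent_messages):
    """Assemble one prompt from summary, retrieved facts, and recent turns."""
    context = [{"role": "system", "content": system_prompt}]
    if summary:
        # Older history, compressed into a single message.
        context.append({"role": "system", "content": f"Earlier summary: {summary}"})
    if retrieved_facts:
        # Important facts pulled back in by retrieval.
        facts = "\n".join(f"- {fact}" for fact in retrieved_facts)
        context.append({"role": "system", "content": f"Relevant facts:\n{facts}"})
    # Recent turns are passed through verbatim (the rolling window).
    context.extend(recent_messages)
    return context

context = build_context(
    system_prompt="You are a support assistant.",
    summary="User reported a billing issue last week.",
    retrieved_facts=["Order number is 4417."],
    recent_messages=[{"role": "user", "content": "Any update on my refund?"}],
)
```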
For a Streamlit demo, full history fits easily. For an app with thousands of turns, you need summarization. Here is the summarization pattern:
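    # Assumes summarize(msgs) -> str is defined elsewhere and makes one LLM call.
    if len(messages) > 20:
        older = messages[:-10]      # everything except the 10 most recent messages
        summary = summarize(older)  # one LLM call
        # Replace the older messages with a single summary message.
        messages = [{"role": "system", "content": f"Earlier summary: {summary}"}] + messages[-10:]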
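To try the pattern without an LLM, it can be wrapped in a function that takes the summarizer as an argument. This is a sketch under assumptions: `compact` and `fake_summarize` are hypothetical names, and `fake_summarize` is a placeholder for the real LLM call.

```python
def compact(messages, summarize, max_len=20, keep_recent=10):
    """Fold older messages into one summary message once history exceeds max_len.

    keep_recent must be at least 1.
    """
    if len(messages) <= max_len:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)  # in a real app, one LLM call
    return [{"role": "system", "content": f"Earlier summary: {summary}"}] + recent

def fake_summarize(msgs):
    # Placeholder for an LLM call: only reports how many messages were folded.
    return f"{len(msgs)} earlier messages omitted"

messages = [{"role": "user", "content": f"message {i}"} for i in range(25)]
messages = compact(messages, fake_summarize)
```

On later calls the previous summary message is itself part of the older slice, so it gets folded into the next summary. Each round of re-summarizing can lose detail, which is one reason to pair this pattern with retrieval for facts that must survive.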
Quick recall
Name three memory strategies.
Quiz time
1. For a customer support bot that runs for months per user, the best memory strategy is