Multi-Turn Memory: How to Give Your AI Bot a Sense of Conversation History
The most frustrating thing about many chatbots is not that they do not know the answer — it is that they do not remember what you already told them. A user who says "I'm having trouble with my MacBook" in turn 1 and "It keeps restarting" in turn 3 expects the bot to know that "it" refers to their MacBook. A stateless bot processes turn 3 with no knowledge of turn 1 and either asks redundant questions or gives irrelevant answers. Multi-turn memory is what makes a bot feel like it is paying attention.
Why Memory Is Architecturally Non-Trivial
For rule-based bots, conversation state is explicit: the bot tracks collected slots, current flow position, and the last few messages in a session object. This is memory in a narrow, structured sense.
For LLM-powered bots, memory requires engineering: the LLM itself is stateless — each API call is independent. "Memory" means ensuring that the relevant history is included in the context window of each LLM call. How much history to include, how to represent it, and how to handle conversations that exceed the context window are non-trivial engineering decisions.
Approach 1: Full History in Context Window
The simplest approach: append every user and assistant turn to a running list and pass it as the conversation history to the LLM.
from openai import OpenAI

client = OpenAI()

messages = [{"role": "system", "content": system_prompt}]
# Add all prior turns
for turn in history:
    messages.append({"role": turn["role"], "content": turn["content"]})
# Add current user message
messages.append({"role": "user", "content": current_message})
response = client.chat.completions.create(model="gpt-4", messages=messages)
Pros: simple, preserves full context, no information loss. Cons: context window limits (most models: 8k-128k tokens); longer conversations become expensive; irrelevant early context dilutes focus on recent exchanges.
For conversations under 20-30 turns, full history in context is usually adequate and should be the default.
Approach 2: Sliding Window
Maintain only the last N turns in context, discarding older history. N = 6-10 turns works well for most support and task-completion use cases.
The risk: references to entities introduced in older turns ("that order I mentioned") are lost. Mitigate by explicitly including extracted entities (order numbers, user details, product names) as persistent context separate from the turn history.
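A minimal sketch of this combination — a fixed-size window of recent turns plus a separate, persistent entity store that is injected into the system prompt. The class and method names here are illustrative, not from any particular library:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last N turns in context, plus persistent extracted entities."""

    def __init__(self, max_turns=8):
        # Each turn is a user message plus an assistant message.
        self.turns = deque(maxlen=max_turns * 2)
        self.entities = {}  # e.g. {"order_id": "A-1042", "device": "MacBook Pro"}

    def add_turn(self, user_msg, assistant_msg):
        self.turns.append({"role": "user", "content": user_msg})
        self.turns.append({"role": "assistant", "content": assistant_msg})

    def remember(self, key, value):
        """Record an entity that must survive even after its turn is discarded."""
        self.entities[key] = value

    def build_messages(self, system_prompt, current_message):
        system = system_prompt
        if self.entities:
            facts = "; ".join(f"{k}: {v}" for k, v in self.entities.items())
            system += f"\n\nKnown facts from earlier in the conversation: {facts}"
        return ([{"role": "system", "content": system}]
                + list(self.turns)
                + [{"role": "user", "content": current_message}])
```

Because the entities live outside the deque, "that order I mentioned" still resolves correctly even when the turn that introduced the order number has scrolled out of the window.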
Approach 3: Conversation Summarisation
For longer conversations, periodically summarise earlier turns and replace them with the summary:
[Conversation summary: User is troubleshooting a MacBook Pro (M3, 2023) that randomly restarts.
Battery diagnostics were checked and showed normal. We ruled out software issues and suspect
hardware — user was asked to schedule a Genius Bar appointment.]
[Last 4 turns: ...]
The summary compresses older context while preserving the key facts needed to continue the conversation. Summarisation can be triggered at a token threshold and run asynchronously after each turn.
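The trigger logic can be sketched as follows. The `summarise` callable stands in for the LLM call that compresses old turns into an updated summary, and the 4-characters-per-token estimate is a rough heuristic in place of a real tokeniser:

```python
def maybe_summarise(history, summary, summarise, token_budget=3000, keep_turns=4):
    """Fold older turns into a running summary once history exceeds the budget.

    `summarise(old_turns, previous_summary)` is a placeholder for an LLM call
    that returns an updated summary string. Token counts are estimated at
    roughly 4 characters per token.
    """
    used = sum(len(m["content"]) // 4 for m in history)
    if used <= token_budget:
        return history, summary  # under budget: keep full history as-is
    old, recent = history[:-keep_turns], history[-keep_turns:]
    return recent, summarise(old, summary)
```

Each subsequent LLM call then receives the summary (as a system-level note) followed by the retained recent turns, keeping prompt size roughly constant regardless of conversation length.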
Approach 4: External Memory Store
For very long-running contexts (a relationship that spans multiple sessions over days or weeks), external memory stores enable persistence beyond a single session:
- Entity memory: extract and store named entities (user name, account details, product purchases) in a structured store; inject relevant entities at the start of each session
- Episodic memory: store summaries of past conversations in a vector database; retrieve relevant episodes at the start of each new session using semantic similarity
- Preference memory: store explicit facts and preferences the user asks the bot to remember ("I prefer email contact, not phone")
Tools like LangChain Memory, Mem0, and Zep implement these patterns with minimal boilerplate.
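The episodic retrieval pattern can be illustrated without any external dependencies. A real system would use embedding vectors and a vector database; here a bag-of-words cosine similarity stands in so the retrieval flow is visible end to end:

```python
import math
from collections import Counter

class EpisodicMemory:
    """Store past-session summaries and retrieve the most relevant ones.

    Bag-of-words cosine similarity is a stand-in for real embeddings here;
    the storage/retrieval shape is the same either way.
    """

    def __init__(self):
        self.episodes = []  # list of (summary_text, term_counts)

    @staticmethod
    def _vectorise(text):
        return Counter(text.lower().split())

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        norm = (math.sqrt(sum(v * v for v in a.values()))
                * math.sqrt(sum(v * v for v in b.values())))
        return dot / norm if norm else 0.0

    def store(self, summary):
        self.episodes.append((summary, self._vectorise(summary)))

    def retrieve(self, query, top_k=2):
        qv = self._vectorise(query)
        ranked = sorted(self.episodes,
                        key=lambda ep: self._cosine(qv, ep[1]),
                        reverse=True)
        return [text for text, _ in ranked[:top_k]]
```

At the start of a new session, the bot's opening context is seeded with the top-k retrieved episodes, so a user returning days later does not have to re-explain their situation.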
What to Include in System Prompt vs. History
Not all context belongs in the conversation history. User profile information, account state, and persistent preferences belong in the system prompt where they establish the invariant context for the conversation. Turn history captures the dynamic, evolving content of the current session. Mixing the two makes the conversation history noisy and makes it harder for the model to distinguish persistent facts from session-specific information.
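A sketch of that separation, with hypothetical `profile` and `preferences` structures standing in for whatever account data the application holds:

```python
def build_request(profile, preferences, history, current_message):
    """Assemble an LLM request: persistent facts go in the system prompt,
    only the current session's turns go in the message history."""
    system = (
        "You are a support assistant.\n"
        f"User profile: {profile}\n"
        f"Preferences: {preferences}"
    )
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": current_message}])
```

The turn history stays clean — every message in it is something actually said in this session — while the model can still rely on the system prompt for facts that hold throughout the conversation.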
Conclusion
Multi-turn memory is what makes the difference between a bot that answers questions and a bot that holds a conversation. For short sessions, full history in context is adequate. For longer sessions, sliding windows with entity extraction, periodic summarisation, and (for persistent relationships) external memory stores are the right tools. The goal is to give the model the context it needs to respond coherently without overwhelming it with irrelevant history.
Keywords: chatbot memory, multi-turn conversation, LLM context window, conversation history, RAG chatbot, entity memory, conversational AI, LangChain memory