Exam Room · Advanced GenAI

Designing Short-Term and Long-Term Memory for a Bedrock Chat Assistant

June 08, 2026 · 16 min read

The situation

A product team is building an AI support assistant for a mid-sized SaaS company. The assistant handles first-line queries (billing, account access, feature questions, refund requests) and escalates to human agents when it can’t resolve them. Measured over six weeks of closed beta:

Average conversation length: 15 turns, ranging from five to more than thirty.
Return rate: 40% within 30 days. Median return gap: eleven days. Roughly half of returning users reference something from a previous thread: “did the refund you mentioned go through?”, “I’m still seeing the login error you helped me with last week”.
Tool use: three or four calls per conversation. Account lookups, subscription checks, ticket creation.
Platform: Bedrock. Nothing self-hosted.
Team: two backend engineers, one front-end, no dedicated ML-ops.
Compliance: GDPR. Conversation content is personal data; deletion-on-request has to be clean, retention has to be bounded.

What actually matters

“Memory” is two problems, not one. The first is keeping a single conversation coherent: turn fifteen has to know what happened at turn two. The second is recognising a returning user: someone who comes back eleven days later should land on a bot that already knows about their open refund, not one that asks them to retype it. Build both with one mechanism and you usually get one that does neither well, because the two pull in different directions. In-conversation memory has to be correct on every turn and fails loudly when it isn’t, which makes it backend plumbing. Cross-visit memory can be approximate, but it has two failure modes that are worse than approximate, which makes it product policy with engineering behind it.

Those two cross-visit failures are worth naming, because they set the privacy bar. Surfacing someone else’s conversation as if it were this user’s is a wrongful-disclosure incident: a stranger’s refund thread pulled up against this user’s login question. Failing to surface this user’s own open refund when they ask about it is milder, a trust dent rather than a breach, but still a product bug. Avoiding the first means per-user isolation has to be airtight, enforced where no prompt injection can talk it out of place. Avoiding the second means retrieval has to work on short, fragmented conversation text, which is exactly what document-retrieval tooling is bad at.

GDPR sets the next bar. When a user asks to be forgotten, every trace of their conversations has to go, cleanly and provably. A design where deletion cascades across four stores is one that eventually fails an audit. Aim instead for one delete call per store, each scoped to an identifier the application already holds. A single opaque per-user key deletes cleanly; per-turn vectors scattered through a shared index behind metadata filters can be made to work, but they’re far harder to stand behind when someone asks you to prove the data is gone.

Then there’s the team: two backend engineers, no ML-ops. Anything whose operational load scales with conversation volume becomes a liability by year two. A summarisation cron firing an LLM call on every session close brings its own eviction policy, retention TTL, and retry logic: all of it infrastructure to own and operate. A managed option that does the same job behind a config flag buys that attention back for the product. The thing you give up is flexibility, and this product never spends it. One seam is worth leaving open, though: a billing agent and a support agent may one day need to share what they each know about the same user, so memory keyed to user identity rather than to a single agent instance is the easier thing to grow into.

What we’ll filter on

Five things the design has to deliver.

In-session coherence. Turn fifteen must be aware of turn two. The agent needs to see the relevant history of this conversation when it generates the next response.
Cross-session recall. A user returning eleven days later should land on a bot that can reasonably answer “what was the last thing we talked about?” without asking them to retype context. Not a perfect replay; a usable summary.
Orchestration included. Fifteen turns with three tool calls per conversation means the assistant is planning, calling tools, observing results, and deciding what to do next. The memory solution has to live next to the orchestration, not compete with it.
Retrieval quality for conversational context. Pulling the correct fact from a past conversation is a different retrieval problem from pulling the correct paragraph from a product manual. Conversation data is short, interleaved, and context-dependent.
Operational overhead low enough for two backend engineers. No bespoke orchestration loop, no custom summarisation pipeline, no self-hosted vector database. GDPR delete has to be a button, not a project.

The memory landscape on Bedrock

Four plausible ways to build this.

Bedrock Agents’ built-in memory. A Bedrock Agent is the managed orchestration primitive: model + action groups (tool definitions) + knowledge bases + prompt templates, all wired together so the platform handles the plan-call-observe loop. Memory comes in two layers. Session state is automatic: every InvokeAgent call within a session sees the full conversation history, so you pass a sessionId and the agent assembles the history itself. Long-term session summaries are opt-in: enable memoryConfiguration with SESSION_SUMMARY and a retention window (1 to 365 days), then pass a memoryId per end user at invocation. After each session ends, the agent generates a concise summary and stores it keyed to that memoryId. Delete is a single DeleteAgentMemory call.
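A minimal sketch of the opt-in half, assuming the boto3 `bedrock-agent` client’s `create_agent` accepts the `memoryConfiguration` shape described above (`enabledMemoryTypes`, `storageDays`). The agent name, model ID, role ARN, and instruction are placeholders, and the helper function names are hypothetical.

```python
from typing import Any, Dict


def build_memory_configuration(storage_days: int) -> Dict[str, Any]:
    """memoryConfiguration block that turns on long-term session summaries."""
    # Retention window as described in the text: 1 to 365 days.
    if not 1 <= storage_days <= 365:
        raise ValueError("storage_days must be between 1 and 365")
    return {
        "enabledMemoryTypes": ["SESSION_SUMMARY"],  # opt-in summary memory
        "storageDays": storage_days,                # TTL for stored summaries
    }


def build_create_agent_request(name: str, model_id: str, role_arn: str,
                               instruction: str, storage_days: int) -> Dict[str, Any]:
    """Keyword arguments intended for bedrock_agent.create_agent(**request)."""
    return {
        "agentName": name,
        "foundationModel": model_id,       # placeholder model identifier
        "agentResourceRoleArn": role_arn,  # IAM role the agent assumes
        "instruction": instruction,
        "memoryConfiguration": build_memory_configuration(storage_days),
    }
```

Usage would look like `boto3.client("bedrock-agent").create_agent(**build_create_agent_request(...))`; the agent still has to be prepared and given an alias before InvokeAgent can target it.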

DynamoDB-backed session store (build-your-own). Roll the orchestration loop yourself. A Lambda receives the user turn, reads conversation-so-far from DynamoDB (partition key sessionId, sort key turn timestamp), builds the prompt, calls InvokeModel, writes the response back, returns it. Cross-session recall is a second table keyed by user ID holding rolled-up state. Summaries come from an LLM call you write and schedule.
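To make the build-your-own loop concrete, here is a sketch of one turn under stated assumptions: an in-memory class stands in for the DynamoDB table (partition key sessionId, sort key turn timestamp), and `call_model` is a plain callable standing in for InvokeModel. Both the class and the function are hypothetical names, not a library API.

```python
import time
from typing import Callable, Dict, List, Tuple

Turn = Tuple[float, str, str]  # (turn timestamp, role, text)


class InMemorySessionStore:
    """Stand-in for a table keyed by sessionId with a timestamp sort key."""

    def __init__(self) -> None:
        self._items: Dict[str, List[Turn]] = {}

    def query(self, session_id: str) -> List[Turn]:
        # A Query on the partition key returns items in sort-key order.
        return sorted(self._items.get(session_id, []))

    def put(self, session_id: str, turn: Turn) -> None:
        self._items.setdefault(session_id, []).append(turn)


def handle_turn(store: InMemorySessionStore, session_id: str, user_text: str,
                call_model: Callable[[str], str],
                now: Callable[[], float] = time.time) -> str:
    """One pass of the hand-rolled loop: read history, prompt, call, write back."""
    history = store.query(session_id)
    lines = [f"{role}: {text}" for _, role, text in history]
    lines += [f"user: {user_text}", "assistant:"]
    reply = call_model("\n".join(lines))  # InvokeModel would go here
    ts = now()
    store.put(session_id, (ts, "user", user_text))
    store.put(session_id, (ts + 1e-6, "assistant", reply))  # keeps sort order
    return reply
```

Everything the managed option hides is visible here: history assembly, prompt formatting, and the write-back. Tool calls, retries, and summarisation would all still need to be added on top.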

Bedrock Knowledge Bases for long-term recall. Dump transcripts or summaries into S3 and query at runtime for “what’s this user’s history?”. Chunking strategies assume a prose document; conversations are short and fragmentary, and their relevance depends on who spoke and when. A chunk from someone else’s refund thread retrieved as “relevant” to this user’s login question is a correctness problem with a compliance problem stapled to it.

Custom vector store with conversation embeddings. Embed each conversation (or turn, or summary) with Titan Embeddings V2, store the vectors in OpenSearch Serverless or pgvector with per-user metadata, and at session start query for the current user’s top-k most relevant past interactions. You get full control of chunking granularity, metadata filtering, and ranking. You also get a second stateful system to own alongside DynamoDB.
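The isolation rule a custom store has to honour can be sketched in a few lines: filter to the current user first, then rank. This is a toy in pure Python with two-dimensional hand-written vectors standing in for Titan embeddings; a real deployment would push the filter into the OpenSearch or pgvector query. `StoredVector` and `top_k_for_user` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StoredVector:
    user_id: str
    text: str
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_for_user(index: List[StoredVector], user_id: str,
                   query_embedding: Sequence[float], k: int = 3) -> List[str]:
    # Filter BEFORE ranking: another user's vector never becomes a
    # candidate, however similar it is to the query.
    candidates = [v for v in index if v.user_id == user_id]
    ranked = sorted(candidates,
                    key=lambda v: cosine(v.embedding, query_embedding),
                    reverse=True)
    return [v.text for v in ranked[:k]]
```

Ranking first and filtering afterwards would also return only this user’s rows, but it can silently return fewer than k of them when other users’ vectors crowd the top of the list.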

Side by side

Option	In-session coherence	Cross-session recall	Orchestration included	Retrieval for conversation	Low ops
Bedrock Agents memory	✓	✓	✓	✓	✓
DynamoDB session store (DIY)	✓	✓	✗	✓	✗
Knowledge Bases for past transcripts	✗	✗	✗	✗	✗
Custom vector store of conversation embeddings	✗	✓	✗	✓	✗

Matching the layers to the memory

One `InvokeAgent` call. Green dashed reads pull session history and prior-session summaries; red writes append the new turn and, on session end, write the summary. The developer passes `sessionId` and `memoryId`; the agent owns the plumbing.
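From the application side, that single call can be sketched as below, assuming a boto3 `bedrock-agent-runtime` client whose `invoke_agent` returns a `completion` event stream of `chunk` events carrying `bytes`. The wrapper name `ask_agent` is hypothetical. We also assume, without having verified it, that passing `endSession=True` ends the session just as idle expiry would, which is the point at which the summary is written.

```python
from typing import Any


def ask_agent(client: Any, agent_id: str, alias_id: str, session_id: str,
              memory_id: str, text: str, end_session: bool = False) -> str:
    """Send one user turn; the agent reads history and summaries itself."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,   # scopes in-conversation history
        memoryId=memory_id,     # scopes long-term summaries to the user
        inputText=text,
        endSession=end_session,
    )
    parts = []
    for event in response["completion"]:
        chunk = event.get("chunk")  # other event types (e.g. trace) skipped
        if chunk:
            parts.append(chunk["bytes"])
    # Join the raw bytes first so a multi-byte character split across
    # chunks still decodes correctly.
    return b"".join(parts).decode("utf-8")
```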

Bedrock Agents memory, in depth

A Bedrock Agent is more than a model invocation; it’s an orchestration surface. Define action groups (tool schemas plus implementing Lambdas), optionally attach knowledge bases, write an instruction prompt, call InvokeAgent with a user turn and a sessionId. The runtime handles the ReAct-style loop.

Session memory is automatic. Every call with the same sessionId sees every prior turn, including tool calls and tool results. Idle timeout defaults to 30 minutes, configurable up to 24 hours. Turn fifteen sees turns one through fourteen because the agent reads them itself.
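Because session memory follows the sessionId, deciding when to mint a new one is the application’s only job. A sketch of the chat-widget side, assuming the rotation rules used in the worked design later in this post (explicit “new conversation”, or 30 minutes idle). `WidgetSession` is a hypothetical name, and the clock and ID generator are injectable so the logic can be tested.

```python
import uuid
from typing import Callable, Optional

IDLE_LIMIT_SECONDS = 30 * 60  # assumed 30-minute idle rotation


class WidgetSession:
    """Decides which sessionId the chat widget sends with the next turn."""

    def __init__(self, clock: Callable[[], float],
                 new_id: Callable[[], str] = lambda: str(uuid.uuid4())) -> None:
        self._clock = clock
        self._new_id = new_id
        self._session_id: Optional[str] = None
        self._last_seen = 0.0

    def current(self, new_conversation: bool = False) -> str:
        now = self._clock()
        idle = now - self._last_seen > IDLE_LIMIT_SECONDS
        if new_conversation or self._session_id is None or idle:
            self._session_id = self._new_id()  # fresh session, fresh history
        self._last_seen = now
        return self._session_id
```

Keeping this limit no longer than the agent’s own idle timeout is a sensible default, so the widget and the service agree about whether a conversation is still live.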

Long-term summary memory is configuration. Set memoryConfiguration on the agent with enabledMemoryTypes: [SESSION_SUMMARY] and a storageDays retention window. At runtime, pass a memoryId alongside the sessionId, typically a hash of the authenticated user ID. When the session ends, the agent generates a summary using a managed (customisable) prompt and stores it keyed to that memoryId. Subsequent sessions with the same memoryId have the prior summaries injected into context.
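A sketch of deriving the memoryId, assuming the authenticated user ID arrives as a `sub` claim on an already-verified token (the claim name is hypothetical). The hash gives a stable, opaque per-user key. The more important property is where the value comes from: only the verified identity, never the request body or model output, so no prompt can steer a lookup onto another user’s summaries.

```python
import hashlib
from typing import Any, Dict, Mapping


def memory_id_for(user_id: str) -> str:
    """Stable, opaque per-user key: hex SHA-256 of the authenticated user ID."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def invoke_kwargs(verified_claims: Mapping[str, Any], session_id: str,
                  text: str, agent_id: str, alias_id: str) -> Dict[str, Any]:
    # memoryId is computed server-side from the verified identity only;
    # any memoryId a client or a prompt might supply is never consulted.
    return {
        "agentId": agent_id,
        "agentAliasId": alias_id,
        "sessionId": session_id,
        "memoryId": memory_id_for(verified_claims["sub"]),
        "inputText": text,
    }
```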

Retention and deletion. storageDays sets a TTL; once it lapses, the summary is gone. DeleteAgentMemory with a memoryId wipes everything for that user on demand. GDPR right-to-be-forgotten in one request.
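Deletion is one call, but an auditor may also ask you to show the data is gone. A sketch of delete-then-read-back, assuming the boto3 `bedrock-agent-runtime` client exposes `delete_agent_memory` and `get_agent_memory` (with `memoryType="SESSION_SUMMARY"`) and that the read returns a `memoryContents` list. Whether deletion is immediately visible, and whether an empty memory reads back as an empty list or as a not-found error, are assumptions to check before relying on this. The function name is hypothetical.

```python
from typing import Any


def delete_and_verify(client: Any, agent_id: str, alias_id: str,
                      memory_id: str) -> bool:
    """Delete a user's summaries, then read back to confirm nothing remains."""
    client.delete_agent_memory(agentId=agent_id, agentAliasId=alias_id,
                               memoryId=memory_id)
    # Read-back check; the response shape here is assumed, not verified.
    resp = client.get_agent_memory(agentId=agent_id, agentAliasId=alias_id,
                                   memoryId=memory_id,
                                   memoryType="SESSION_SUMMARY")
    return len(resp.get("memoryContents", [])) == 0
```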

Limits worth naming. memoryId lookup is exact-match, not semantic: there is no built-in vector search for “find users with similar past experiences”. Summaries are bounded in length, so very long histories lose detail over time. Session memory is scoped to an agent instance, so moving a user from a support agent to a billing agent needs application-level plumbing to pass state across.
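One shape that plumbing could take, as a sketch only: read the user’s summaries from the first agent and hand them to the second through `sessionState.promptSessionAttributes` on InvokeAgent. The `memoryContents`/`sessionSummary`/`summaryText` response shape is an assumption about `get_agent_memory`, and the `priorContext` key and the three-summary cap are hypothetical choices.

```python
from typing import Any, Dict, List, Mapping


def summaries_from(memory_response: Mapping[str, Any]) -> List[str]:
    """Pull summary text out of a get_agent_memory response (shape assumed)."""
    return [
        item["sessionSummary"]["summaryText"]
        for item in memory_response.get("memoryContents", [])
        if "sessionSummary" in item
    ]


def handoff_invoke_kwargs(summaries: List[str], agent_id: str, alias_id: str,
                          session_id: str, memory_id: str, text: str,
                          max_summaries: int = 3) -> Dict[str, Any]:
    """InvokeAgent kwargs for the receiving agent, carrying prior context."""
    # Keeps the last N summaries in whatever order the API returned them.
    context = "\n".join(summaries[-max_summaries:])
    return {
        "agentId": agent_id,
        "agentAliasId": alias_id,
        "sessionId": session_id,
        "memoryId": memory_id,
        "inputText": text,
        # promptSessionAttributes are visible to the orchestration prompt.
        "sessionState": {"promptSessionAttributes": {"priorContext": context}},
    }
```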

When build-your-own earns a place

Two situations flip the decision toward DynamoDB + a hand-rolled loop.

When you don’t want the orchestration. Bedrock Agents is opinionated about how tools get called: it runs the loop, chooses the action group, writes the reasoning. A team that needs tighter control over prompts, tool ordering, or failure modes sometimes builds its own loop instead. Session state then has to live somewhere, and DynamoDB is the natural home: partition key sessionId, sort key turn timestamp, TTL for auto-expiry.
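The table shape named above can be sketched as low-level DynamoDB structures: a key schema and attribute definitions for `create_table`, plus an item builder with an epoch-seconds expiry attribute for TTL. The attribute names `turnTs` and `expiresAt` and the 24-hour default are hypothetical. TTL also has to be switched on for the table (via `update_time_to_live`) before expired items are removed.

```python
import time
from typing import Dict, List, Optional

TABLE_KEY_SCHEMA: List[Dict[str, str]] = [
    {"AttributeName": "sessionId", "KeyType": "HASH"},  # partition key
    {"AttributeName": "turnTs", "KeyType": "RANGE"},    # sort key
]
ATTRIBUTE_DEFINITIONS: List[Dict[str, str]] = [
    {"AttributeName": "sessionId", "AttributeType": "S"},
    {"AttributeName": "turnTs", "AttributeType": "N"},
]


def turn_item(session_id: str, role: str, text: str,
              now: Optional[float] = None,
              ttl_seconds: int = 24 * 3600) -> Dict[str, Dict[str, str]]:
    """One turn as a put_item Item, with an epoch-seconds TTL attribute."""
    ts = time.time() if now is None else now
    return {
        "sessionId": {"S": session_id},
        "turnTs": {"N": f"{ts:.6f}"},
        "role": {"S": role},
        "text": {"S": text},
        # DynamoDB TTL expects a Number holding epoch seconds.
        "expiresAt": {"N": str(int(ts) + ttl_seconds)},
    }
```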

When state is richer than turns. Conversations aren’t the only per-session state: a shopping cart, a configured quote, or a workflow status is not naturally a sequence of turns. DynamoDB holds that state directly, and the tools read and write it.

Neither flip applies to the two-engineer support bot. Orchestration is standard ReAct-over-tools; state is conversational. Bedrock Agents covers both.

The hybrid worth knowing. Teams using Bedrock Agents memory often add a small DynamoDB or S3 store for structured cross-session facts (ticket numbers, subscription plan, last-known issue code) that the agent needs reliably, whether or not they appear in a generated summary. Summary memory is the prose recall; the DynamoDB table is the structured one. A tool the agent calls to fetch it is the clean seam.
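A sketch of that seam as a GetUserContext action-group Lambda, assuming the function-details event and response format (`actionGroup`, `function`, `sessionAttributes` in; `functionResponse.responseBody.TEXT.body` out). The user ID is read from `sessionAttributes`, which we assume the application sets via `sessionState` on InvokeAgent, rather than from model-supplied parameters, so the model cannot ask for someone else’s row. `load_user_row` is a hypothetical callable standing in for a DynamoDB `get_item`.

```python
import json
from typing import Any, Callable, Dict, Optional


def make_handler(load_user_row: Callable[[str], Optional[Dict[str, Any]]]):
    """Build a Lambda handler for a GetUserContext action group."""

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        # Identity comes from server-set session attributes, never from
        # parameters the model filled in.
        user_id = event.get("sessionAttributes", {}).get("userId")
        row = load_user_row(user_id) if user_id else None
        body = json.dumps(row if row is not None else {"found": False})
        return {
            "messageVersion": "1.0",
            "response": {
                "actionGroup": event["actionGroup"],
                "function": event["function"],
                "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
            },
        }

    return handler
```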

Why Knowledge Bases is the wrong shape for conversations

Four reasons.

Chunking doesn’t match. Knowledge Bases chunk documents with fixed-size (default ~300 tokens), hierarchical, or semantic strategies, all of which assume nearby text is topically coherent. A conversation transcript has rapid speaker alternation, interleaved tool outputs, and short turns, so a single 300-token chunk can easily span three sub-topics and two speakers.
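A toy illustration of the mismatch, not the Knowledge Bases chunker itself: it cuts a flattened transcript every N whitespace-separated words (a stand-in for tokens) and reports which speakers land in each chunk. The function names are hypothetical. Even at eight “tokens”, the middle chunk mixes three speakers and both the refund and login sub-topics.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (speaker, word)


def fixed_size_chunks(turns: List[Tuple[str, str]],
                      chunk_tokens: int) -> List[List[Token]]:
    """Flatten (speaker, text) turns and cut every chunk_tokens words."""
    tokens: List[Token] = [
        (speaker, word) for speaker, text in turns for word in text.split()
    ]
    return [tokens[i:i + chunk_tokens]
            for i in range(0, len(tokens), chunk_tokens)]


def speakers_per_chunk(chunks: List[List[Token]]) -> List[List[str]]:
    return [sorted({speaker for speaker, _ in chunk}) for chunk in chunks]


turns = [
    ("user", "my refund has not arrived"),
    ("assistant", "let me check your account"),
    ("tool", "refund R-9 status pending"),
    ("user", "also login fails"),
]
print(speakers_per_chunk(fixed_size_chunks(turns, 8)))
# [['assistant', 'user'], ['assistant', 'tool', 'user'], ['user']]
```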

Retrieval relevance is topic, not speaker. A vector search for “refund” across a knowledge base of all transcripts will cheerfully return high-similarity chunks from other users’ refund conversations. Compliance problem plus correctness problem. Metadata filtering by user ID helps but has to be attached at ingestion and is less flexible than a native vector store’s.
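What “attached at ingestion” means in practice, as a sketch: every transcript object in S3 needs a sidecar `<object>.metadata.json` carrying `metadataAttributes`, and every `retrieve` call needs a matching `equals` filter. We assume the `bedrock-agent-runtime` `retrieve` request shape shown. The `userId` attribute name and the helper names are hypothetical. Isolation then depends on two separate code paths never forgetting their half, which is the fragility the paragraph above describes.

```python
import json
from typing import Any, Dict, Tuple


def metadata_sidecar(object_key: str, user_key: str) -> Tuple[str, str]:
    """S3 key and JSON body of the metadata file written next to a transcript."""
    body = json.dumps({"metadataAttributes": {"userId": user_key}})
    return object_key + ".metadata.json", body


def retrieve_kwargs(kb_id: str, query: str, user_key: str,
                    k: int = 5) -> Dict[str, Any]:
    """Keyword arguments for a user-filtered retrieve call."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": k,
                # Omit this filter once and other users' chunks come back.
                "filter": {"equals": {"key": "userId", "value": user_key}},
            }
        },
    }
```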

Summaries vs transcripts. Storing raw transcripts means retrieving fragments. The correct thing to retrieve is summaries, and generating those is the job Bedrock Agents’ long-term memory already does.

GDPR is harder. Deleting a user’s data means locating every chunk that contains their content in a service-managed index, then re-ingesting. DeleteAgentMemory is one call.

Knowledge Bases are the right tool for “what does our support policy say about refunds?”: a reference corpus shared across users. They are the wrong tool for “what did this user say yesterday?”: per-user conversational state.

A worked design

Agent: a Bedrock Agent wrapping Claude Haiku 4.5, because the workload is latency- and cost-sensitive and the reasoning bar for first-line support is low enough. Action groups for account lookup, subscription status, and ticket create/query. One Knowledge Base attached for the product documentation corpus: the policy memory, not the user memory.
Session memory: on by default. sessionId is the chat-widget session, rotated on explicit “new conversation” or 30 minutes idle.
Long-term summary memory: memoryConfiguration with SESSION_SUMMARY, storageDays: 90. memoryId is sha256(userId), stable per authenticated user, doesn’t leak the raw ID. sessionId and memoryId both passed on every InvokeAgent call.
Structured cross-session state: a small DynamoDB table keyed by user ID, holding open ticket IDs, subscription tier, last-issue-code. A GetUserContext action group lets the agent fetch this at conversation start when relevant.
GDPR delete: a Lambda triggered by account closure calls DeleteAgentMemory with the user’s memoryId, deletes the DynamoDB row, records an audit trail.
Retention: summaries lapse after 90 days via storageDays.
Monitoring: CloudWatch on InvokeAgent latency and error rate; a weekly anonymised sample of summaries reviewed for quality.
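The account-closure path in this design can be sketched as one delete per store followed by an audit record. This assumes boto3-style clients (`delete_agent_memory` on `bedrock-agent-runtime`, low-level `delete_item` on DynamoDB), a table whose partition key is named `userId` (hypothetical), and an `audit` callable standing in for wherever the trail is written. The record stores the opaque memoryId rather than the raw user ID.

```python
import datetime as dt
from typing import Any, Callable, Dict


def forget_user(agent_rt: Any, ddb: Any, audit: Callable[[Dict[str, str]], None],
                agent_id: str, alias_id: str, table: str,
                user_id: str, memory_id: str) -> Dict[str, str]:
    """Account-closure handler: one delete per store, then an audit record."""
    # Store 1: long-term session summaries held by the agent.
    agent_rt.delete_agent_memory(agentId=agent_id, agentAliasId=alias_id,
                                 memoryId=memory_id)
    # Store 2: structured per-user state in DynamoDB.
    ddb.delete_item(TableName=table, Key={"userId": {"S": user_id}})
    # Audit trail keeps the opaque key, not the raw user ID.
    record = {
        "event": "gdpr_delete",
        "memoryId": memory_id,
        "table": table,
        "at": dt.datetime.now(dt.timezone.utc).isoformat(),
    }
    audit(record)
    return record
```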

No dedicated memory database, no custom summarisation cron, no per-user vector index. The memory plumbing comes with the agent.

What’s worth remembering

Short-term and long-term memory are different problems. Turn-level coherence within one conversation is session state; cross-visit recall is summary state. A single solution rarely does both well unless it was designed for both.
Bedrock Agents memory covers both layers as managed functionality. Session memory is automatic: pass a sessionId. Long-term summary memory is configuration: enable SESSION_SUMMARY, pass a memoryId, set storageDays.
memoryId scopes long-term memory to a user; sessionId scopes session memory to a conversation. Orthogonal identifiers, both passed on every InvokeAgent call when long-term memory is enabled.
DeleteAgentMemory is the GDPR delete button. One API call, scoped to a memoryId. Retention also lapses automatically via storageDays (1 to 365).
Knowledge Bases are for reference corpora, not conversational state. Chunking, retrieval relevance, and per-user isolation all work against using them for past-transcript recall.
DynamoDB fits as the structured-state companion to Bedrock Agents memory. Ticket IDs, subscription tier, status flags: things the agent fetches via a tool call, not things it summarises in prose. A hybrid is common and clean.
A custom vector store over conversation embeddings is flexibility that a small team pays for. It is justified when cross-user semantic similarity is a product feature and overkill when the product just needs “remember this user”.
Bedrock Agents includes orchestration. Action groups, knowledge bases, and the ReAct-style tool loop come with the agent. Build-your-own means rebuilding that loop: more code to own, and no better outcome for standard shapes.

The answer: use a Bedrock Agent with session memory for in-conversation coherence and long-term summary memory (SESSION_SUMMARY, memoryId per user, storageDays retention) for cross-session recall. Attach a Knowledge Base for product documentation, the reference corpus every user shares. Add a small DynamoDB table of structured per-user state (open tickets, subscription tier) behind a GetUserContext action group. Wire DeleteAgentMemory into the account-closure path for GDPR. The two engineers ship a memory system without operating a memory system.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.