The Exam Room

Exploring AWS, one service or situation at a time.

Exam Room · Architecture

Routing to the Closest Healthy Region

A multi-region application needs to route requests to the closest healthy region, failing over automatically when the preferred one drops out, with no client-side retries and no extra health-check plumbing to maintain. Route 53 can do all of that in a single record set. Finding the correct combination means touring all seven routing policies and the attributes that separate them.

Exam Room · Architecture

Choosing an S3 Storage Class for Cold Archives

Some data exists for compliance, not for use: tens of terabytes of records sitting untouched until an auditor wants them. S3 has eight storage classes; only one of them is built for that pattern, and getting it wrong can cost an order of magnitude more in a year when you weren't paying attention to the bill.

Exam Room · GenAI

Picking the AWS AI Service Tier for Each Feature

A product manager with no ML background has been told to add AI to a SaaS product, and has heard of Bedrock, SageMaker, Comprehend, Translate, Textract, and Rekognition. AWS has three different shapes of AI offering, and the shortest path depends entirely on whether a ready-made service already does the job.

Exam Room · GenAI

How to Take a Foundation Model from Pick to Production Endpoint

A product team wants a chatbot that summarises support tickets. They have the tickets, a cloud account, and no ML background. Somebody says 'use a foundation model'. Between that sentence and a working endpoint sit roughly seven distinct stages, each with its own AWS service and its own decisions. Picking the model is the easy part; the real work is figuring out which stages this team can skip, which they absolutely cannot, and what AWS gives them at each step.

Exam Room · GenAI

Choosing Between Prompting, RAG, and Fine-Tuning

A legal-ops team wants a model that answers questions about their 4,000 in-house contract templates. The first prototype, a plain Claude call with the question in the prompt, hallucinates clause numbers. Someone suggests fine-tuning; someone else suggests RAG. They solve different problems, so 'which is better' is the wrong frame; what matters is which problem the team actually has, and what each adaptation technique costs in time, data, and recurring spend.

Exam Room · GenAI

Grounding a Chatbot in Your Own PDFs

A facilities team has 600 PDFs (equipment manuals, safety procedures, maintenance schedules) sitting on a SharePoint drive. Engineers want a chatbot that answers 'how do I reset the chiller on floor 4?' in seconds instead of a ten-minute PDF hunt. Retrieval-augmented generation can do this; whether it does it well depends on what the corpus actually looks like, what kinds of questions the engineers really ask, and which configuration knobs decide whether the answers are any good once a managed service is on the table.

Exam Room · GenAI

Forecasting Without Writing Python

A category manager has 18 months of weekly sales data for 400 SKUs and a deadline to forecast next quarter. She doesn't code. The ML team is booked until Q3. The ask is a tool that lets her build a forecast herself (importable, reviewable, explainable) without waiting for engineering. Which AWS box she clicks matters less than what kind of problem this actually is, what features of the data can honestly feed into a model, and what the business user has to understand for the output to be defensible when finance asks 'why this number?'.

Exam Room · GenAI

How to Make a Bedrock Chatbot Audit-Ready with Guardrails and Watermarks

A fintech ships a customer-facing chatbot on Bedrock. Legal asks: can it give financial advice? Risk asks: can it leak customer account numbers? Compliance asks: if an auditor requests proof a response came from our model, can we demonstrate it? Three questions, three different controls, all of them Bedrock-native. The controls exist; the work is matching the right one to each question and figuring out what the shape of a 'responsible AI' configuration actually looks like when the auditor arrives.

Exam Room · GenAI

Choosing Between Chains, Retrieval, and Agents for a GenAI Assistant

A product manager wants a 'GenAI assistant' for internal operations, something that can answer questions, look up customer records, draft emails, and file Jira tickets. Three architectural patterns keep coming up: chains, retrieval, and agents. They sound similar, they all use foundation models, and teams routinely reach for the most elaborate one when a simpler pattern would do. There's no single 'best' here; what matters is which one fits each piece of the assistant's workload, and when elaboration costs more than it earns.

Exam Room · GenAI

Choosing Between SageMaker, Bedrock, and Purpose-Built AI APIs

A platform team has five AI-shaped requests landing in a single sprint: transcribe call centre audio, detect anomalies in sensor data, extract text from scanned forms, summarise customer emails, and detect faces in CCTV. Someone has already typed 'use SageMaker' into three design docs. Someone else insists Bedrock is the answer. A third voice mutters about purpose-built services. AWS has at least three answers to every AI problem, so there's no single platform that wins; what matters is how to tell which layer of the stack each request lands on, and what that choice costs in time, money, and flexibility.

Exam Room · Advanced GenAI

Picking a Bedrock Model for High-Volume RAG

A million LLM requests a day, peaking at thirty per second, split across US and EU customers, with a P99 first-token target under 1.5 seconds and real reasoning over retrieved context. Bedrock has seven model families and four ways to buy capacity. Most of the landscape falls away once you name what actually decides it, and the real trick is what you do *after* you've picked the model.

Exam Room · Advanced GenAI

How to Build a Citations-Required RAG Over 50K Internal Documents

Fifty thousand internal documents, five gigabytes of text, weekly churn, a three-second latency budget, per-user access control, and a citation in every single answer. The RAG landscape on Bedrock is bigger than one product, and the interesting part of the design is what falls away once you name the five things that actually decide it.

Exam Room · Advanced GenAI

Combining RAG and Fine-Tuning for a Legal Contract Assistant

A legal-tech team wants a contract review assistant that understands two hundred thousand past matters, speaks in the firm's voice with clause-by-section citations, and refuses anything off-domain. A hundred thousand dollars, three months. RAG, fine-tuning, and continued pre-training each solve a different half of that sentence; the interesting answer is which two to pick, not which one.

Exam Room · Advanced GenAI

Designing Short-Term and Long-Term Memory for a Bedrock Chat Assistant

A customer-support assistant where the average conversation runs fifteen turns before it resolves, and returning users pick up two weeks later expecting the bot to remember they've been waiting on a refund. Two memory problems in one product (what's live in the current conversation and what persists across visits) and four plausible ways to build it. Bedrock Agents' built-in memory handles one half cleanly; the other half is where teams reach for DynamoDB or a knowledge base and get it wrong.

Exam Room · Advanced GenAI

Configuring Bedrock Guardrails for PII, Topics, and Grounding

A consumer-facing chatbot on Bedrock has passed every red-team round on the obvious harms (no weapons, no hate, no CSAM) and is still shipping embarrassments: a card number pasted by one user echoing back in a reply, the bot cheerfully comparing the company's product with a named competitor, and a hallucinated policy line that nobody in the building wrote. Five different filter jobs wrap the same Bedrock invocation, and Guardrails is the one surface that does all five without five Lambdas.

Exam Room · Advanced GenAI

Spreading Bedrock Load with Cross-Region Inference Profiles

A Bedrock-backed SaaS serving US, EU and APAC customers is hitting regional quota in us-east-1 during peak while the same model sits idle in eu-west-1. The team wants to spread load without fracturing the product into three regional deployments. Three letters on the front of the model ID do the job, provided the model supports it and the geography fits the customer.

Exam Room · Advanced GenAI

Building RAG When the Source Documents Change Daily

A support assistant that has to answer from a product manual which the product team edits weekly, a pricing sheet that changes at month-end, and an operational runbook that mutates hourly. The base model doesn't know any of it, and fine-tuning won't keep up. Retrieval is the answer; the question is how much of the retrieval plumbing we want to own, and Bedrock Knowledge Bases, a LangChain stack, and a hand-rolled pipeline each put the lines in different places.

Exam Room · Advanced GenAI

How to Wire an LLM to Side-Effecting Actions with Bedrock Agents

An assistant that has to look up a customer's subscription, pause it, refund a charge, and email confirmation. Not just answer, act. The glue between a language model and the rest of our systems is a solved problem three different ways: Bedrock Agents, a LangChain agent loop, or a hand-written tool router. Each of them handles tool definition, invocation, and error recovery, but they put the guardrails in very different places.

Picking a Vector Store for Bedrock RAG

Twelve million embedding vectors, a 50ms retrieval budget, hybrid queries that mix keyword and semantic, and a bill that should not double the Bedrock spend on its own. OpenSearch Serverless, Aurora with pgvector, and Pinecone Serverless all serve the same shape of query, but their pricing curves, operational shapes, and query surfaces diverge the moment the corpus grows beyond demo scale.

Generative AI Developer · AIP-C01

Coming soon