Bedrock Agents vs Lambda Orchestration

April 05, 2028 · 18 min read

The situation

An online retailer is building a customer-facing support assistant. Users ask questions like “where’s my order?”, “can I return the blue one?”, “is the medium in stock?”. Resolving them requires:

Calling GetOrderStatus(order_id) against the orders microservice.
Calling CreateReturn(order_id, item_id, reason) against the returns microservice (write, requires user confirmation).
Calling CheckInventory(sku, size, location) against the catalog microservice.
Maintaining conversation state: “the blue one” from the last message needs to resolve to the correct SKU in this one.

Two options on the table:

Bedrock Agents: define the tools as OpenAPI specs, give the agent a set of instructions, let the Bedrock-hosted agent LLM plan the steps, call the tools (Lambda functions invoked by the Agent), and manage conversation state.
Lambda-based orchestration: the LLM is a Bedrock Converse API call inside a Lambda; the Lambda code explicitly interprets the LLM’s response, calls the correct downstream service, feeds the result back into the next LLM call, and maintains state in DynamoDB.

Both paths deliver a working assistant. The question is which one matches this team’s operational model, risk appetite, and ownership structure.

What actually matters

The fundamental difference between the two options is where the decision logic lives.

In Bedrock Agents, the LLM decides, at every step, given the current state and the tool catalogue, what to do next. The agent framework gives it a tool catalogue, a conversation history, a system-level instruction, and lets it emit either a user-facing response or a tool invocation. When a tool is invoked, the framework runs the corresponding Lambda, collects the result, feeds it back into the agent, and the cycle continues until the agent emits a terminal response. This is a form of the ReAct pattern (reason + act), baked into managed infrastructure.
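To make that cycle concrete, here is a minimal sketch of a reason-and-act loop in plain Python. It is not how Bedrock implements it internally; `run_agent_loop`, the `decide` callable (a stand-in for the LLM), and the decision dictionaries are all hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical shapes: a decision is either
#   {"type": "final", "text": ...} or {"type": "tool", "name": ..., "args": {...}}
Decision = Dict[str, Any]

def run_agent_loop(
    decide: Callable[[List[dict]], Decision],  # stand-in for the LLM call
    tools: Dict[str, Callable[..., Any]],      # tool catalogue: name -> function
    user_message: str,
    max_steps: int = 5,
) -> str:
    """Loop until the model emits a final answer or the step budget runs out."""
    history: List[dict] = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        decision = decide(history)
        if decision["type"] == "final":
            return decision["text"]
        # Tool call: run it and feed the observation back into the history.
        result = tools[decision["name"]](**decision["args"])
        history.append({"role": "tool", "name": decision["name"], "content": result})
    return "Sorry, I could not complete that request."

# A scripted "model" so the sketch runs without any LLM.
def scripted_decide(history: List[dict]) -> Decision:
    if history[-1]["role"] == "user":
        return {"type": "tool", "name": "get_order_status", "args": {"order_id": "12345"}}
    return {"type": "final", "text": f"Your order is {history[-1]['content']['status']}."}

reply = run_agent_loop(
    scripted_decide,
    {"get_order_status": lambda order_id: {"status": "shipped"}},
    "where's order 12345?",
)
print(reply)  # Your order is shipped.
```

The point of the sketch is the shape: nothing in the loop knows which tool comes next. That knowledge lives entirely in whatever `decide` returns.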

The upside: the team writes less orchestration code. Adding a new tool is writing a Lambda + an OpenAPI schema entry. The LLM figures out when to call it. Multi-step flows (“look up the order, then check inventory for the replacement”) emerge from the agent’s planning without hand-coded state machines.

The downside: when the agent does something strange, the team has to debug why the LLM decided that. Logs show tool calls and their parameters; the internal reasoning is harder to get at. Adding guardrails, such as “never call CreateReturn without explicit user confirmation”, is a matter of prompt engineering plus the framework’s built-in confirmation features, not hard-coded control flow. The bugs are subtle: the agent confidently calls the wrong tool with plausible parameters. Ownership of correctness is distributed across the prompt, the tool schema, the model version, and the agent framework’s updates.

In Lambda-based orchestration, the team decides what the LLM is used for. A typical shape:

User message arrives; Lambda pulls conversation history from DynamoDB.
Lambda prompts the LLM: “Given this conversation, what is the user’s intent?” or “Extract the following slots.”
Lambda interprets the LLM’s structured response and deterministically calls the correct service (GetOrderStatus, CreateReturn, etc).
Lambda prompts the LLM again: “Given this service response, draft a reply.”
Lambda persists state, returns the reply.

The upside: the orchestration is code. Every decision point has a traceable location in the Lambda. Guardrails are if statements, not prompt language. Correctness is inspectable; debugging is tailing a log. Adding a tool is writing code; the team owns the control flow explicitly.
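A guardrail expressed as a conditional can be very small. The sketch below assumes a hypothetical session field, `confirmed_intent`, that a confirmation turn would set. The field name and the `may_execute` helper are illustrative, not part of any AWS API.

```python
WRITE_INTENTS = {"create_return"}  # assumption: the scenario's only write action

def may_execute(intent: str, session: dict) -> bool:
    """Allow reads unconditionally; allow a write only after the user confirmed it."""
    if intent not in WRITE_INTENTS:
        return True
    # "confirmed_intent" is a hypothetical session field set by the confirmation turn.
    return session.get("confirmed_intent") == intent

print(may_execute("order_status", {}))                                      # True
print(may_execute("create_return", {}))                                     # False
print(may_execute("create_return", {"confirmed_intent": "create_return"}))  # True
```

However the LLM phrases its output, a write cannot happen unless this function returns True, and that is a property a unit test can pin down.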

The downside: more code. Every new intent, every new tool, every new multi-step flow requires a code change. The LLM’s reasoning ability is also less leveraged: the team is essentially using the LLM for narrow tasks (intent classification, slot filling, reply drafting) rather than as an autonomous planner.

Neither is universally better. The interesting discriminators:

How many tools? With two or three tools and a couple of common intents, hand-coded orchestration is simpler. With twenty tools and complex interactions, an agent’s planning starts to earn its keep.
How much does the team trust the LLM to pick? In high-stakes domains (financial transactions, medical advice), a hand-coded orchestration layer that constrains the LLM is safer. In low-stakes domains (product info, general Q&A), an agent’s autonomy is acceptable.
How important is debuggability? A state machine is a visualisable thing; an agent’s trace is harder to reason about.
How much operational ownership is the team willing to take? An agent is more managed; less code to own, but more “what is it doing” questions to answer.

What we’ll filter on

Five filters for the pick:

Who decides the flow: the LLM at every step, or code?
How is a new tool added: OpenAPI spec + Lambda, or a code change?
How traceable is a given user’s interaction: a structured log per step, or a nested reasoning trace?
How are guardrails expressed: prompt + framework features, or conditionals?
What level of LLM reasoning is leveraged: full autonomous planning, or task-specific prompting?

The orchestration landscape

1. Bedrock Agents. Managed agent runtime: configure a system-level instruction, a set of action groups (each action group maps to an OpenAPI spec + a Lambda), and optionally a Knowledge Base. Bedrock hosts the agent LLM; at runtime, the user’s message goes in, the agent plans, invokes tools (via the Lambdas), assembles the response, and returns it. Session state is managed. Supports multi-step, multi-tool flows out of the box; user-confirmation for high-stakes actions is a built-in feature.

2. Bedrock Converse API + hand-coded Lambda orchestration. Same LLM; stripped-down harness. The team’s Lambda code calls Bedrock directly, interprets the response, decides what to do, calls downstream services, loops as needed. Step Functions optionally orchestrates multi-step flows. DynamoDB for conversation state.

3. SageMaker endpoint + hand-coded orchestration. Same shape as option 2, with a self-hosted LLM behind a SageMaker endpoint instead of Bedrock. Appropriate if the team needs a model Bedrock doesn’t host, or needs deeper control over the inference stack.

4. AWS Step Functions + Bedrock. The workflow is a Step Functions state machine; the LLM is a task inside it. Good when the flow is more like a business process (with retries, parallelism, and explicit error handling) than a conversation.

5. LangChain / LlamaIndex + anything. Framework-level agent patterns running in Lambda / Fargate / EC2. More ecosystem but more dependencies; not AWS-native. Useful when a team already has agent work in these frameworks.

Side by side

Option	Who decides	Add a tool	Traceability	Guardrails	LLM use
Bedrock Agents	LLM plans each step	OpenAPI + Lambda	Agent trace (structured but nested)	Prompt + framework confirmations	Full autonomy
Bedrock Converse + Lambda	Code	Code change	Per-step Lambda logs	Conditionals	Narrow, task-specific
SageMaker endpoint + Lambda	Code	Code change	Per-step Lambda logs	Conditionals	Narrow, task-specific
Step Functions + Bedrock	State machine	New state	Step Functions history	State machine conditionals	As narrow as needed
LangChain / LlamaIndex	Framework + code	Add a tool object	Framework tracing (Langfuse, Langsmith)	Framework conventions	Mixed

Reading the table against the retailer scenario:

Three tools, clear intents (order status, create return, check inventory), high stakes on one action (CreateReturn is a write). Hand-coded orchestration is simpler and safer.
The team is small, owns the retailer’s backend, and is already comfortable with Lambda + DynamoDB. Bedrock Agents’ managed-agent story would be a new operational surface.
Return creation must require explicit user confirmation; a hard-coded “if action=CreateReturn, require confirm-intent flag” beats “prompt the agent nicely.”

Verdict: Lambda-based orchestration for this team, this scope. Bedrock Agents would earn its keep if the tool catalogue grew to fifteen tools with unpredictable multi-step combinations; not at three tools with linear flows.

Two orchestration shapes

Same tools, same LLM, different locus of intelligence. Bedrock Agents hands the planning to the model; Lambda orchestration keeps the planning in code and uses the model for narrower tasks.

The pick in depth

Lambda orchestration for the retailer. The orchestrator Lambda:

import boto3, json, os

bedrock = boto3.client("bedrock-runtime")
ddb = boto3.resource("dynamodb").Table(os.environ["SESSIONS_TABLE"])
orders = boto3.client("lambda")  # calling the Orders service's Lambda
returns = boto3.client("lambda")
catalog = boto3.client("lambda")

INTENT_SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {"type": "string",
                   "enum": ["order_status", "create_return", "confirm_return",
                            "check_inventory", "smalltalk", "clarify"]},
        "slots": {"type": "object"},
        "needs_confirmation": {"type": "boolean"},
    },
    "required": ["intent", "slots"],
}

def handler(event, context):
    session_id = event["session_id"]
    user_msg = event["message"]

    # New sessions need the key attribute, otherwise the put_item at the end fails.
    session = ddb.get_item(Key={"session_id": session_id}).get(
        "Item", {"session_id": session_id, "history": []})
    session["history"].append({"role": "user", "content": user_msg})

    # 1. Classify intent + extract slots
    resp = bedrock.converse(
        modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
        system=[{"text": "Classify the user's intent. Use confirm_return only when the "
                         "user agrees to a return summary from the previous turn. "
                         "Return only JSON matching the schema."}],
        messages=[{"role": "user", "content": [{"text":
            f"History: {json.dumps(session['history'][-6:])}\nSchema: {json.dumps(INTENT_SCHEMA)}"
        }]}],
    )
    # Assumes the model returns bare JSON; production code should validate and fall back.
    parsed = json.loads(resp["output"]["message"]["content"][0]["text"])

    # 2. Dispatch -- deterministic, guardrails explicit
    if parsed["intent"] == "order_status":
        result = orders.invoke(FunctionName="order-status",
                               Payload=json.dumps({"order_id": parsed["slots"]["order_id"]}))
        svc_data = json.loads(result["Payload"].read())

    elif parsed["intent"] == "create_return":
        # First turn: stash the request and ask for confirmation; do NOT call CreateReturn yet
        session["pending_return"] = parsed["slots"]
        svc_data = {"needs_confirmation": True, "summary": parsed["slots"]}

    elif parsed["intent"] == "confirm_return":
        # Second turn: only write if a return was actually stashed on an earlier turn
        pending = session.pop("pending_return", None)
        if pending is None:
            svc_data = {"error": "no pending return to confirm"}
        else:
            # default=str: DynamoDB returns numbers as Decimal, which json can't serialise
            result = returns.invoke(FunctionName="create-return",
                                    Payload=json.dumps(pending, default=str))
            svc_data = json.loads(result["Payload"].read())

    elif parsed["intent"] == "check_inventory":
        result = catalog.invoke(FunctionName="check-inventory",
                                Payload=json.dumps(parsed["slots"]))
        svc_data = json.loads(result["Payload"].read())

    else:
        svc_data = None

    # 3. Draft the reply
    reply_resp = bedrock.converse(
        modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
        system=[{"text": "Draft a concise, friendly reply based on the service result and history."}],
        messages=[{"role": "user", "content": [{"text":
            f"History: {json.dumps(session['history'][-6:])}\n"
            f"Service result: {json.dumps(svc_data)}"}]}],
    )
    reply = reply_resp["output"]["message"]["content"][0]["text"]

    session["history"].append({"role": "assistant", "content": reply})
    ddb.put_item(Item=session)
    return {"reply": reply}

Three narrow LLM tasks (intent classification, slot extraction, reply drafting), handled in two model calls, wrap deterministic orchestration. CreateReturn’s confirmation flow is an if, not prompt language. Every step is a natural place for a CloudWatch log line.
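One fragile spot in the orchestrator is the direct `json.loads` on the model’s text: models sometimes wrap JSON in prose or code fences, or return an intent outside the enum. A hedged sketch of a more defensive parser follows. `parse_intent` is a hypothetical helper, and falling back to the `clarify` intent is an assumption about what the team would want.

```python
import json
from typing import Iterable

def parse_intent(model_text: str, allowed_intents: Iterable[str]) -> dict:
    """Parse the classifier's output; fall back to 'clarify' on anything unexpected."""
    fallback = {"intent": "clarify", "slots": {}}
    # Keep only the outermost {...} in case the model added prose or code fences.
    start, end = model_text.find("{"), model_text.rfind("}")
    if start == -1 or end <= start:
        return fallback
    try:
        parsed = json.loads(model_text[start:end + 1])
    except json.JSONDecodeError:
        return fallback
    if not isinstance(parsed, dict) or parsed.get("intent") not in set(allowed_intents):
        return fallback
    if not isinstance(parsed.get("slots"), dict):
        parsed["slots"] = {}
    return parsed

allowed = ["order_status", "create_return", "check_inventory", "smalltalk", "clarify"]
fenced = '```json\n{"intent": "order_status", "slots": {"order_id": "12345"}}\n```'
print(parse_intent(fenced, allowed)["intent"])                            # order_status
print(parse_intent("I think they want a refund.", allowed)["intent"])     # clarify
```

In the handler, the allowed list could come straight from `INTENT_SCHEMA["properties"]["intent"]["enum"]`, so the schema stays the single source of truth.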

When Bedrock Agents would be the correct answer instead. A different team, a different retailer: twenty tools including “search_products”, “compare_products”, “recommend_accessories”, “schedule_installation”, “track_shipment”, “check_warranty_eligibility”, “initiate_return”, and so on, with unpredictable multi-tool flows (“find me a toaster under $50 that ships to my address, then add a warranty if available”). Hand-coding the flow graph for all plausible combinations is combinatorial. The agent’s planner handles it.

The agent shape:

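import boto3

# Control-plane client for building agents; runtime calls use "bedrock-agent-runtime".
bedrock_agent = boto3.client("bedrock-agent")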
agent = bedrock_agent.create_agent(
    agentName="retailer-bot",
    foundationModel="anthropic.claude-sonnet-4-5-20250929-v1:0",
    instruction=(
        "You are a retailer customer assistant. "
        "Use the action groups to answer questions. "
        "Never call CreateReturn without explicit user confirmation -- "
        "ask the user to confirm the summary before calling."
    ),
    idleSessionTTLInSeconds=900,
)
# Attach action groups...
bedrock_agent.create_agent_action_group(
    agentId=agent["agent"]["agentId"],
    agentVersion="DRAFT",  # required; action groups attach to the working draft
    actionGroupName="orders",
    actionGroupExecutor={"lambda": "arn:aws:lambda:...:function:order-status"},
    apiSchema={"s3": {"s3BucketName": "...", "s3ObjectKey": "orders-openapi.yaml"}},
)
# ...

User confirmation is a first-class feature: the action group schema can mark an action as requiring confirmation, and the agent framework will pause and ask the user before invoking.
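As a rough illustration of where that marker sits, the fragment below builds one OpenAPI operation as a Python dict. To the best of our knowledge the extension field is `x-requireConfirmation` with the value `ENABLED`, but treat that as an assumption to verify against the current Bedrock Agents documentation. The `/returns` path, the `CreateReturn` operationId, and the description are hypothetical.

```python
import json

# One operation from a hypothetical returns OpenAPI spec.
create_return_operation = {
    "post": {
        "operationId": "CreateReturn",
        "description": "Create a return for one item in an order.",
        # Assumption: the OpenAPI extension Bedrock Agents reads to pause for confirmation.
        "x-requireConfirmation": "ENABLED",
        "requestBody": {
            "required": True,
            "content": {"application/json": {"schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "item_id": {"type": "string"},
                    "reason": {"type": "string"},
                },
                "required": ["order_id", "item_id", "reason"],
            }}},
        },
        "responses": {"200": {"description": "Return created"}},
    }
}

openapi_fragment = {"paths": {"/returns": create_return_operation}}
print(json.dumps(openapi_fragment, indent=2))
```

The request body mirrors the CreateReturn(order_id, item_id, reason) signature from the scenario; everything else is scaffolding.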

A worked interaction

A user says “where’s order 12345?”

Lambda orchestration trace:

INFO orchestrator-123: fetched session 12345 (3 messages in history)
INFO orchestrator-123: classified intent=order_status slots={order_id: 12345}
INFO orchestrator-123: called order-status lambda → {status: "shipped", eta: "2027-09-27"}
INFO orchestrator-123: drafted reply "Your order has shipped and should arrive on 27 September."
Total latency: ~1.2s (two LLM calls + one service call + two DDB ops).
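Log lines like these are easier to query when each step is emitted as one JSON object, since CloudWatch Logs Insights can filter on JSON fields. A minimal sketch, where `log_step` and its field names are hypothetical:

```python
import json
import logging
import time

logger = logging.getLogger("orchestrator")

def log_step(request_id: str, step: str, **fields) -> str:
    """Emit one JSON log line per orchestration step; return it so it can be tested."""
    line = json.dumps(
        {"request_id": request_id, "step": step, "ts": round(time.time(), 3), **fields},
        default=str,  # tolerate Decimal, datetime and similar values
    )
    logger.info(line)
    return line

line = log_step("orchestrator-123", "classify",
                intent="order_status", slots={"order_id": "12345"})
print(line)
```

One call per numbered step in the handler (fetch, classify, dispatch, draft, persist) would reproduce the trace above with machine-readable fields.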

Bedrock Agents trace:

agent-trace: received user message
agent-trace: invoking OrdersAction(order_id=12345)
agent-trace: received response {status: "shipped", eta: "2027-09-27"}
agent-trace: generating final response
agent-trace: "Your order has shipped and will arrive on September 27."
Total latency: ~1.4s (agent plan + one action call + final response).
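On the agent side, a trace like this comes back on the same event stream as the reply when tracing is enabled. The sketch below assumes `client` is a `boto3.client("bedrock-agent-runtime")` and that the agent and alias IDs already exist. `ask_agent` is a hypothetical wrapper, and the client is passed in so the function can be exercised with a stub.

```python
import uuid
from typing import List, Optional, Tuple

def ask_agent(client, agent_id: str, alias_id: str, text: str,
              session_id: Optional[str] = None) -> Tuple[str, List[dict]]:
    """Send one user message to a Bedrock agent; return (reply text, trace events)."""
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id or str(uuid.uuid4()),  # reuse the ID to continue a session
        inputText=text,
        enableTrace=True,  # ask for trace events alongside the reply chunks
    )
    reply_parts: List[str] = []
    traces: List[dict] = []
    # The completion is an event stream: reply text arrives in chunks,
    # interleaved with trace events describing each step.
    for event in response["completion"]:
        if "chunk" in event:
            reply_parts.append(event["chunk"]["bytes"].decode("utf-8"))
        elif "trace" in event:
            traces.append(event["trace"])
    return "".join(reply_parts), traces
```

The trace events are nested dictionaries rather than flat log lines, which is the “structured but nested” traceability the comparison table refers to.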

For a simple case, near-identical. For a multi-step case (“I want to return the blue one from my last order and replace it with the red one in size medium”), the agent plans the sequence (look up order, find blue item, start return, check red inventory); the Lambda orchestrator would need hard-coded logic to chain. Three-tool-simple-intent case: Lambda wins on clarity. Twenty-tool-unpredictable-intent case: Agents win on scalability of logic.

What’s worth remembering

The choice is where the intelligence sits. Bedrock Agents put planning in the LLM; Lambda orchestration puts planning in code.
Bedrock Agents reduce orchestration code. Adding a tool is an OpenAPI schema + Lambda; the framework calls it when the agent decides.
Lambda orchestration reduces reasoning distance from logs. Every step is a log line; debugging is conventional.
Confirmation for destructive actions matters. Agents support “action requires user confirmation” as a first-class feature; Lambda orchestration does it with if needs_confirmation guards.
Few tools → Lambda orchestration is simpler. Two or three well-understood tools, linear flows, small team: hand-coded wins.
Many tools with unpredictable interactions → Agents earn their keep. Fifteen+ tools, multi-step combinations you can’t enumerate: the agent’s planner is doing real work.
Conversation state is different. Agents manage it for you; Lambda orchestration writes to DynamoDB with a TTL. The trade is convenience versus control.
Guardrails and traceability diverge. Agents: prompt engineering + framework features + agent trace. Lambda: conditionals + CloudWatch Logs + Step Functions history.
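On the conversation-state point, DynamoDB’s TTL feature works off a numeric attribute holding an epoch timestamp in seconds, with the attribute name configured on the table. A small sketch of stamping a session before it is written: the attribute name `expires_at`, the 15-minute window, and the `with_expiry` helper are all assumptions.

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 15 * 60  # assumption: mirrors the 900-second idle TTL used for the agent

def with_expiry(session: dict, now: Optional[float] = None) -> dict:
    """Return a copy of the session with a refreshed epoch-seconds expiry attribute."""
    current = time.time() if now is None else now
    # "expires_at" is a hypothetical attribute name; it must match the table's TTL setting.
    return {**session, "expires_at": int(current) + SESSION_TTL_SECONDS}

item = with_expiry({"session_id": "s1", "history": []}, now=1_000)
print(item["expires_at"])  # 1900
```

TTL deletion is not immediate, so a reader that must never see a stale session should also compare `expires_at` with the current time when it loads the item.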

“Agent or function” is a judgement about the team’s risk appetite, the tool-catalogue size, and who’s on call when the assistant says something strange. Both patterns work; picking the one that matches the team’s operational habits is what makes it maintainable.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.