Exam Room · Advanced GenAI

How to Wire Function Calling Through Bedrock

July 31, 2026 · 22 min read

The situation

An internal productivity assistant for the engineering team needs to do more than answer questions about docs. Common asks:

“What’s the status of build #4812?” → call the CI API, fetch the build status, summarise.
“Create a Jira ticket for the bug in the login page.” → call Jira, capture the returned ticket ID, confirm.
“Page the on-call for the payments team.” → call PagerDuty, trigger the incident.
“What did we deploy last week?” → call the deploy-history service, filter, summarise.
“Search our wiki for the runbook on database failover.” → call internal search, retrieve top 3, cite.

Five tools. The assistant should know which to call, pass the correct arguments, confirm before destructive ones (paging, ticket creation), and hand back structured results the model can weave into a coherent reply.

The team has Bedrock Claude Sonnet 5 available and familiar. They want to ship something this sprint, keep the tooling changeable (add a sixth tool next sprint, tweak a parameter), and avoid building orchestration that duplicates what the API already offers.

What actually matters

Function calling is a protocol between the model and the caller. The caller advertises a set of tools, each with a name, description, and typed arguments. The model, given a user message, decides whether to produce text, to call a tool, or both. On a tool call, the caller executes the function, feeds the result back to the model, and the model either calls more tools or produces a final response.

The first decision is which surface to call through. Some inference APIs offer tool use as a first-class, model-agnostic primitive: one code path regardless of which model is plugged in. Others require model-specific payload shapes. A layer above those, fully managed agent runtimes take the loop off the caller’s hands entirely, in exchange for a heavier abstraction. The correct level depends on how many tools, how much session state, and how much orchestration the team wants to own.

The second is how tools are described. JSON schema is the lingua franca: each tool has a name, a description, and typed parameters. The quality of the descriptions drives the quality of the model’s tool choice. A description of “Create a Jira ticket” is less helpful than “Create a Jira ticket in the specified project with summary, description, and assignee. Use when the user asks to report a bug, track a task, or file work. Returns the new ticket’s key and URL.”

The third is the execution loop. After the model emits a tool call, the caller parses the call, validates the arguments against the schema, executes the tool, captures the result (or error), packages it back to the model, and re-invokes. The model then either calls another tool or produces a final message. Loop until the model stops calling tools.

The fourth is whether the caller can constrain tool choice: force the model to use a specific tool, any tool, or none. This is useful for structured-output prompts: force a specific tool with a known schema and the response comes back as arguments to that tool rather than as free text.

The fifth is confirmation and side-effect safety. Tools with side effects (creating a ticket, paging a human) should surface to the user for confirmation before the tool actually runs. This is pattern-level, not protocol-level: the loop pauses on “destructive” tool calls and waits for user confirmation.

And observability, which sounds optional until the first bad afternoon. Every tool call (name, arguments, result, duration) is a debug signal. When an assistant does the wrong thing, the trace of tool calls explains why.

What we’ll filter on

Multi-model fit: does this interface work across Claude, Nova, Llama, etc.?
Schema ergonomics: how easy is it to declare tools and parse calls?
Orchestration surface: how much loop code do we write vs. how much does the platform run?
Side-effect safety: is there a confirmation gate built in?
Observability: what traces do we get for free?

The function-calling landscape

Bedrock Converse API with toolConfig. The unified interface. Pass toolConfig: { tools: [{ toolSpec: { name, description, inputSchema: { json: {...} } } }, ...] } in the request. The model’s response includes output.message.content as a list of content blocks; toolUse blocks have toolUseId, name, and input. The caller executes, returns a toolResult block with the matching toolUseId, and sends the conversation back. Works across Claude, Nova, Llama, and Mistral, or any other Bedrock model that supports tool use. Same code path regardless of model.
Claude-specific via InvokeModel with Anthropic’s Messages payload. Pre-Converse, Claude’s own tool-use schema was accessed through InvokeModel with a model-specific body. Still works; Converse wraps it. Direct usage makes sense only when we need Claude-specific features Converse hasn’t exposed yet.
Bedrock Agents with action groups. A higher-level service. Define an action group with an OpenAPI schema or function schema; Bedrock Agents handles the tool loop, session state, and tracing. Adds the agent runtime between the caller and the model. Suits long-lived conversations with many tools and complex session state; heavy for a simple five-tool assistant.
LangChain tool-use abstractions. Python-side framework wrapping Bedrock tool use. Cleaner developer ergonomics for some patterns (decorator-style tool definitions, structured parsing). Adds a dependency and an abstraction layer between our code and the API.
Custom orchestration around InvokeModel plain text. Parse a structured response (JSON with tool name and args) from the model’s output. Pre-Converse pattern, brittle, no longer recommended.
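
To make the Converse shapes concrete, here is a minimal sketch of one tool declaration, a hand-written stand-in for a response, and the two helpers that move data between them. No AWS call is made. The tool name get_build_status, its build_number parameter, the sample toolUseId, and both helper names are assumptions for illustration, not part of any SDK.

```python
# Shapes only: a toolConfig we would send, and a stand-in for a Converse response.
TOOL_CONFIG = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_build_status",  # hypothetical tool
                "description": (
                    "Fetch the status of a CI build by its number. "
                    "Use when the user asks about a specific build."
                ),
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"build_number": {"type": "integer"}},
                        "required": ["build_number"],
                    }
                },
            }
        }
    ]
}

# Hand-written example of the response shape; the IDs and text are made up.
SAMPLE_RESPONSE = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Let me check that build."},
                {
                    "toolUse": {
                        "toolUseId": "tooluse_example_1",
                        "name": "get_build_status",
                        "input": {"build_number": 4812},
                    }
                },
            ],
        }
    },
    "stopReason": "tool_use",
}


def extract_tool_uses(response):
    """Return the toolUse blocks from a Converse-style response dict."""
    content = response["output"]["message"]["content"]
    return [block["toolUse"] for block in content if "toolUse" in block]


def make_tool_result_message(tool_use_id, payload):
    """Wrap a tool's JSON result in the user-role message that goes back to the model."""
    return {
        "role": "user",
        "content": [
            {"toolResult": {"toolUseId": tool_use_id, "content": [{"json": payload}]}}
        ],
    }
```

The toolUseId is the only thing tying a result to its request, so it has to be copied through unchanged.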

Side by side

Option	Multi-model	Schema ergonomics	Orchestration	Side-effect safety	Observability
Converse + toolConfig	Native	JSON schema in-line	Caller writes loop	Caller’s job	CloudWatch + CloudTrail
InvokeModel (Messages)	Claude only	Anthropic schema	Caller	Caller	Same
Bedrock Agents	Via Agents	OpenAPI / function	Managed loop	Built-in confirmation	Traces built in
LangChain tools	Any SDK	Decorator-friendly	Framework	Framework’s	Its own + CloudWatch
Custom plain-text	Any	Brittle	Everything	Ours	Ours

For five tools, a moderate conversational assistant, and a team already using Bedrock Converse elsewhere, the Converse API with toolConfig is the natural fit. It’s multi-model, the schema is declarative, the loop is under 50 lines, and the observability story is the same as any other Bedrock call. Bedrock Agents earns its keep once the tool surface grows to 20+ tools with complex session requirements; for five, Agents is overkill.

The tool-use loop, in shape

The loop: call Converse, check for tool-use blocks, gate destructive tools on user confirmation, execute, feed the result back, repeat until the model returns a plain text response.

The pick in depth

Tool declarations. Each tool is a JSON spec passed in toolConfig.tools. The description is where the model learns when to use the tool; treat it as prompt engineering, not documentation. Compare:

Bad: "description": "Create a Jira ticket."

Good: "description": "Create a Jira ticket in the specified project with a summary, description, and assignee. Use when the user asks to file a bug, report an issue, or track a task. Requires the project key (e.g., ENG, SRE) and assignee username. Returns the created ticket's key and URL."

The tool spec also declares inputSchema with typed, constrained parameters: enums for fixed-value fields (urgency: ["high", "low"]), patterns where sensible (assignee: { type: "string", pattern: "^[a-z]+$" }), and required vs optional fields explicitly marked. The model respects the schema for the most part; schema violations are rare but do happen with weaker models, so the caller should validate before executing.
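
As a sketch of "validate before executing", the block below declares a spec for the paging tool and checks an input against it using only the standard library. The field names, the team pattern, and validate_input are assumptions for illustration. The checker covers only required, type, enum, and pattern; a full JSON Schema validator would be the sturdier choice in production.

```python
import re

# Hypothetical spec for the paging tool; field names and constraints are assumed.
PAGE_ON_CALL_SPEC = {
    "toolSpec": {
        "name": "page_on_call",
        "description": (
            "Page the on-call engineer for a team. Use when the user explicitly "
            "asks to page or escalate. Returns the incident ID and URL."
        ),
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "team": {"type": "string", "pattern": "^[a-z-]+$"},
                    "urgency": {"type": "string", "enum": ["high", "low"]},
                    "note": {"type": "string"},
                },
                "required": ["team", "urgency"],
            }
        },
    }
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_input(spec, tool_input):
    """Return a list of problems; an empty list means the input passed these checks."""
    schema = spec["toolSpec"]["inputSchema"]["json"]
    properties = schema.get("properties", {})
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in tool_input
    ]
    for name, value in tool_input.items():
        rules = properties.get(name)
        if rules is None:
            problems.append(f"unexpected field: {name}")
            continue
        expected = _JSON_TYPES.get(rules.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{name}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: must be one of {rules['enum']}")
        if "pattern" in rules and isinstance(value, str) and not re.search(rules["pattern"], value):
            problems.append(f"{name}: does not match {rules['pattern']}")
    return problems
```

Returning the problems as a list, rather than raising, makes it easy to hand them back to the model as an error result so it can correct its arguments.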

The loop. In Python:

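import boto3

# Tool use goes through the bedrock-runtime client.
bedrock = boto3.client("bedrock-runtime")

# Assumed safety cap so a confused model cannot loop forever; tune to taste.
MAX_TURNS = 10

# MODEL_ID, SYSTEM_PROMPT, TOOL_SPECS, DESTRUCTIVE_TOOLS, and dispatch()
# are defined elsewhere in the application.


def _extract_text(message):
    # Join the text blocks of an assistant message, skipping any toolUse blocks.
    return "".join(c["text"] for c in message["content"] if "text" in c)


def run_assistant(user_message, session):
    messages = session.get_history() + [{"role": "user", "content": [{"text": user_message}]}]
    for _ in range(MAX_TURNS):
        resp = bedrock.converse(
            modelId=MODEL_ID,
            messages=messages,
            toolConfig={"tools": TOOL_SPECS},
            system=[{"text": SYSTEM_PROMPT}],
        )
        out_msg = resp["output"]["message"]
        messages.append(out_msg)

        # No toolUse blocks means the model has produced its final answer.
        tool_uses = [c["toolUse"] for c in out_msg["content"] if "toolUse" in c]
        if not tool_uses:
            return _extract_text(out_msg)

        tool_results = []
        for call in tool_uses:
            # Destructive tools run only after the user approves them.
            if call["name"] in DESTRUCTIVE_TOOLS:
                if not session.confirm(call):
                    tool_results.append({"toolUseId": call["toolUseId"], "content": [{"text": "user declined"}], "status": "error"})
                    continue
            try:
                # dispatch() must return a dict: a "json" content block takes a JSON object.
                result = dispatch(call["name"], call["input"])
                tool_results.append({"toolUseId": call["toolUseId"], "content": [{"json": result}]})
            except Exception as e:
                # Errors go back to the model as data so it can retry or explain.
                tool_results.append({"toolUseId": call["toolUseId"], "content": [{"text": str(e)}], "status": "error"})

        # Every toolUse in the assistant turn gets a matching toolResult in one user turn.
        messages.append({"role": "user", "content": [{"toolResult": tr} for tr in tool_results]})

    raise RuntimeError("tool loop exceeded MAX_TURNS without a final answer")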

Under 50 lines of orchestration; no framework. The model can emit multiple tool calls in one response (parallel tool use); the loop handles that by executing all of them and returning all the results together.
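
The loop leans on a dispatch function that the post does not show. One way to sketch it, assuming a plain registry keyed by tool name: the decorator, the handler bodies, and their return values below are stubs for illustration, where a real handler would call the CI or wiki API.

```python
# Registry mapping tool names (as declared in the tool specs) to Python callables.
TOOL_HANDLERS = {}


def tool(name):
    """Decorator that registers a handler under the tool name the model will use."""
    def register(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return register


@tool("get_build_status")
def get_build_status(build_number):
    # Stub: a real handler would query the CI API here.
    return {"build_number": build_number, "status": "passed"}


@tool("search_wiki")
def search_wiki(query, limit=3):
    # Stub: a real handler would call internal search and return the top hits.
    return {"query": query, "limit": limit, "results": []}


def dispatch(name, tool_input):
    """Look up the handler for a tool call and invoke it with the model's arguments."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unknown tool: {name}")
    return handler(**tool_input)
```

Adding a sixth tool next sprint is then one spec entry plus one decorated function; the loop itself does not change.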

Destructive-tool confirmation. A runtime set (DESTRUCTIVE_TOOLS = {"create_jira_ticket", "page_on_call"}) intercepts tool calls before execution and asks the user to confirm via the session’s UI (a button in chat, a modal, a Slack approve/deny). Only on approval does the actual call run. On denial, a toolResult saying “user declined” goes back; the model typically acknowledges the refusal and offers alternatives. The confirmation gate is application-layer, not Bedrock-layer. Converse itself doesn’t know which tools are destructive; we do.
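
A minimal sketch of that gate as a standalone function, assuming the UI is represented by an ask callback that takes a prompt string and returns True or False. The names gate, describe_call, and ask are hypothetical; in the loop above the same role is played by session.confirm.

```python
DESTRUCTIVE_TOOLS = {"create_jira_ticket", "page_on_call"}


def describe_call(call):
    """Render a tool call as a human-readable confirmation prompt."""
    args = ", ".join(f"{key}={value!r}" for key, value in sorted(call["input"].items()))
    return f"Run {call['name']}({args})?"


def gate(call, ask):
    """Return None if the call may run, or a 'user declined' toolResult if it may not.

    `ask` stands in for the chat button, modal, or Slack approve/deny step.
    """
    if call["name"] not in DESTRUCTIVE_TOOLS:
        return None  # read-only tools run without a prompt
    if ask(describe_call(call)):
        return None  # user approved
    return {
        "toolUseId": call["toolUseId"],
        "content": [{"text": "user declined"}],
        "status": "error",
    }
```

Showing the exact arguments in the prompt matters: the user is approving what the model decided to send, not what they originally typed.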

Error handling. Tool errors (API timeout, 4xx from Jira, invalid arguments that passed schema but failed at execution) are returned as toolResult with status: "error" and a text explaining what went wrong. The model sees the error and either retries with different arguments, explains to the user what failed, or escalates. Surfacing errors as data lets the model recover; throwing exceptions up the stack stops the conversation.
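
The same idea as a small helper, sketched under the assumption that each tool handler is a plain callable taking keyword arguments. The name run_tool and the wording of the error text are illustrative choices, not part of the Converse API; the toolResult shape and the status values are.

```python
def run_tool(call, handler):
    """Run one tool call and always return a toolResult dict, never raise."""
    tool_use_id = call["toolUseId"]
    try:
        result = handler(**call["input"])
    except Exception as exc:
        # Name the tool and the failure so the model can decide whether to
        # retry with different arguments or explain the problem to the user.
        text = f"{call['name']} failed: {type(exc).__name__}: {exc}"
        return {"toolUseId": tool_use_id, "content": [{"text": text}], "status": "error"}
    return {"toolUseId": tool_use_id, "content": [{"json": result}], "status": "success"}
```

Keep the error text short and free of stack traces or credentials: it becomes part of the conversation the model reads.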

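Tool choice. toolConfig also accepts a toolChoice parameter: {"auto": {}} (the default, the model decides), {"any": {}} (the model must call some tool), or {"tool": {"name": "..."}} (the model must call the named tool). Support for any and tool varies by model. Forcing a specific tool is handy for structured outputs: define a tool whose input schema is the desired output shape and force it; the response then arrives as that tool’s input. Validate it against the schema before relying on it, as with any other tool call.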
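
A sketch of the structured-output pattern, building the request as a plain dict that could be passed as bedrock.converse(**request). The tool record_deploy_summary, its fields, and both helper names are assumptions for illustration; the model ID is whatever the caller supplies.

```python
# Hypothetical "tool" that exists only to define the output shape we want back.
SUMMARY_TOOL = {
    "toolSpec": {
        "name": "record_deploy_summary",
        "description": "Record a structured summary of recent deployments.",
        "inputSchema": {
            "json": {
                "type": "object",
                "properties": {
                    "service_count": {"type": "integer"},
                    "headline": {"type": "string"},
                },
                "required": ["service_count", "headline"],
            }
        },
    }
}


def build_forced_request(model_id, user_text):
    """Build Converse keyword arguments that force a call to the summary tool."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "toolConfig": {
            "tools": [SUMMARY_TOOL],
            "toolChoice": {"tool": {"name": "record_deploy_summary"}},
        },
    }


def read_structured_output(response, tool_name):
    """Pull the forced tool's input out of a Converse-style response dict."""
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == tool_name:
            return tool_use["input"]
    raise ValueError(f"response contained no call to {tool_name}")
```

Nothing ever executes the tool here; the call itself is the output.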

Observability. Every Converse call emits CloudWatch metrics (latency, token count, error rate). Each tool call is logged at the application layer with the tool name, arguments (redacted if sensitive), result, duration, and the session ID. The trace for one user message might be “user: …; tool_call: get_deploy_history; tool_result: …; tool_call: search_wiki; tool_result: …; assistant: …”. When something goes wrong, this is the debug surface.
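
The application-layer half of that can be sketched as a wrapper that emits one JSON line per tool call. The logger name, the set of sensitive keys, the record fields, and traced_call itself are assumptions for illustration; which arguments count as sensitive is a call for the team to make.

```python
import json
import logging
import time

logger = logging.getLogger("assistant.tools")

# Hypothetical list of argument names whose values should not reach the logs.
SENSITIVE_KEYS = {"note", "description"}


def redact(tool_input):
    """Replace sensitive argument values before they are logged."""
    return {k: ("[redacted]" if k in SENSITIVE_KEYS else v) for k, v in tool_input.items()}


def traced_call(session_id, call, handler, emit=logger.info):
    """Run a tool handler and emit one JSON trace record, on success or failure."""
    started = time.perf_counter()
    record = {
        "session_id": session_id,
        "tool": call["name"],
        "tool_use_id": call["toolUseId"],
        "input": redact(call["input"]),
    }
    try:
        result = handler(**call["input"])
        record["status"] = "ok"
        record["result"] = result
        return result
    except Exception as exc:
        record["status"] = "error"
        record["error"] = str(exc)
        raise
    finally:
        # Runs on both paths, so every call leaves exactly one record.
        record["duration_ms"] = round((time.perf_counter() - started) * 1000, 1)
        emit(json.dumps(record, default=str))
```

One line per call, keyed by session ID and toolUseId, is enough to reconstruct the trace for a single user message with a log query.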

A worked example: a paging flow

User: “The payments team’s on-call, can you page them? We’re seeing 500s on checkout.”

First Converse call. Response: toolUse(name=page_on_call, input={team: "payments", urgency: "high", note: "500s on checkout"}).
Caller sees destructive tool. Surfaces confirmation: “Page payments on-call (high) with note ‘500s on checkout’?”
User confirms. Tool executes; PagerDuty returns incident ID INC-4521.
Caller sends back toolResult(toolUseId=..., content=[{json: {incident_id: "INC-4521", url: "..."}}]).
Second Converse call. Response: text-only. “I’ve paged the payments on-call with a high-urgency incident (INC-4521). The on-call engineer should acknowledge within 5 minutes.”
Session history now includes the user message, the toolUse, the toolResult, and the final text. Ready for the next turn.
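
Written out as data, the history after this flow looks roughly like the sketch below. The toolUseId value is made up, the URL is left elided as in the steps above, and unmatched_tool_uses is a hypothetical sanity check rather than anything Bedrock provides.

```python
def paging_flow_history(tool_use_id="tooluse_example_1"):
    """Return the four messages the session holds once the paging flow completes."""
    return [
        {"role": "user", "content": [{"text": (
            "The payments team's on-call, can you page them? "
            "We're seeing 500s on checkout."
        )}]},
        {"role": "assistant", "content": [{"toolUse": {
            "toolUseId": tool_use_id,
            "name": "page_on_call",
            "input": {"team": "payments", "urgency": "high", "note": "500s on checkout"},
        }}]},
        {"role": "user", "content": [{"toolResult": {
            "toolUseId": tool_use_id,
            "content": [{"json": {"incident_id": "INC-4521", "url": "..."}}],
        }}]},
        {"role": "assistant", "content": [{"text": (
            "I've paged the payments on-call with a high-urgency incident (INC-4521)."
        )}]},
    ]


def unmatched_tool_uses(messages):
    """Return toolUseIds that were requested but never answered with a toolResult."""
    requested, answered = set(), set()
    for message in messages:
        for block in message["content"]:
            if "toolUse" in block:
                requested.add(block["toolUse"]["toolUseId"])
            if "toolResult" in block:
                answered.add(block["toolResult"]["toolUseId"])
    return requested - answered
```

Note that the tool result travels in a user-role message even though the user never typed it; roles alternate, and the toolUseId is what links steps two and three.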

Total time: ~3 seconds of model latency, ~1 second for PagerDuty, ~5 seconds waiting on user confirmation. Total tool calls made that the user couldn’t see in the trace: zero.

What’s worth remembering

Use Converse, not InvokeModel, for tool use. Unified across models, clean schema, future-proof.
Tool descriptions are prompts. Write them like you’re instructing a new team member; the model reads them to decide when to use each tool.
The loop is yours but it’s small. ~50 lines covers the common case; frameworks exist but often aren’t needed.
Destructive tools need a confirmation gate at the application layer. Converse doesn’t know which tools are dangerous; we do.
Return errors as data, not exceptions. The model can recover from a structured error; it can’t recover from a stack trace.
Parallel tool use happens. The model can emit multiple tool calls in one response; execute all, return all.
Force a tool for structured output. toolChoice: { tool: { name } } makes the model call that tool, so the reply arrives as its input; still validate it against the schema.
Agents is for 20+ tools and complex state, not 5 tools. The simpler option is almost always correct at small tool counts.

Five tools, a 50-line loop, one confirmation gate, CloudWatch metrics for free, and an assistant that can actually do the things engineering asks it to do rather than just explain how to do them. The machinery is less than you think; the description quality matters more than you think.

These posts are LLM-aided. Backbone, original writing, and structure by Craig. Research and editing by Craig + LLM. Proof-reading by Craig.