On AWS, pick the runtime first. The framework comes after.
Every AWS guide on multi agent orchestration pushes one of three runtimes as if it were the only option: Bedrock Agents, Step Functions, or a tutorial LangGraph stack. We have shipped production on the third. One named system, 400K+ emails, Bedrock as the primary model, OpenAI as the latency fallback, a Postgres audit row per transition. This guide walks through the three runtime choices honestly and shows the actual graph we run in Fargate.
LangGraph on Fargate, Bedrock Runtime as a model, OpenAI as the conditional-edge fallback.
Upstate Remedial: 400K+ emails, zero reported compliance incidents
The runtime question no AWS blog answers honestly
The AWS developer docs recommend Bedrock Agents. The AWS architecture blog recommends Step Functions. Vendor tutorials recommend their framework running on an EC2 instance. Each source recommends the runtime that pays them. None publishes a cross-runtime selection rubric, because doing so would require admitting that the wrong call on a production multi agent system locks your orchestration inside a boundary you will later want to leave.
We do not sell a runtime. We sell a senior engineer who picks the one that fits the problem shape and leaves the decision checked into your repo next to graph.py. That is the whole differentiator of this page: we can say which AWS pattern we would not ship on, and why.
“Emails sent in production through a LangGraph-on-Fargate orchestrator, with Bedrock Runtime as the primary model and OpenAI as the conditional-edge fallback, and one Postgres audit row per transition. Zero reported compliance incidents in the shared metric window.”
PIAS case study: Upstate Remedial Management, compliance-gated outbound email
The three AWS runtimes, side by side
Not three frameworks; three runtimes. A framework like LangGraph can run on any of them, but the runtime you pick determines who owns the audit log, who owns the fallback edge, and how much it costs to leave. Most SERP results conflate framework and runtime. They are not the same thing.
Pattern 1: Bedrock Agents (managed runtime)
AWS owns the graph. You declare a supervisor agent, collaborator agents, and action groups; the execution loop is Bedrock's. Fastest to a first demo. Least portable when you want to change model vendor, audit schema, or cloud. Pick it when you are AWS-first and the system is internal tooling, not a product surface.
Pattern 2: Step Functions (managed control plane)
AWS orchestrates tasks; each task is yours. Great when the problem shape is a workflow with retries, timeouts, and parallel branches you want AWS to handle. Pick it when the multi-agent system is really a multi-step pipeline and the agent boundaries are fuzzy.
Pattern 3: Your own orchestrator, Bedrock as a model
You write the graph (LangGraph, Pydantic AI, custom DAG) and run it in ECS, Fargate, or Lambda. Bedrock is one of the models you call through the Bedrock Runtime InvokeModel API (IAM action bedrock:InvokeModel). This is the shape of Upstate Remedial's 400K+ email system. Most portable, most auditable, highest engineering cost to stand up from scratch.
The selection matrix, as one question: who owns the orchestrator?
Left column: you own the orchestrator (pattern 3). Right column: AWS owns it (pattern 1, Bedrock Agents). The matrix lays out the practical cost of that ownership line at week 0, at week 52, and at the moment a regulator or a board member asks a hard question.
| Feature | Bedrock Agents owns it | You own the orchestrator |
|---|---|---|
| Who owns the orchestrator runtime | AWS. Bedrock Agents is a managed service and the graph lives inside its boundary. | You. The graph runs in your container, on your Fargate task, under your IAM role. |
| Where transitions are logged | CloudWatch in Bedrock Agents' schema. Great for Bedrock, opaque to a compliance auditor. | Your Postgres, one row per node transition, queryable with SQL your team already knows. |
| Model swap cost | Model list is Bedrock only. Adding OpenAI or Anthropic-direct means leaving the runtime. | Swap a function body in pick_primary(). Same graph, different model, zero rewire. |
| Multi-cloud exit cost | High. Bedrock Agents definitions, knowledge bases, and IAM wiring are AWS-shaped. | Low. The orchestrator is a container. It runs on ECS today, on GKE tomorrow. |
| Fallback edge | Bedrock Agents' supervisor pattern cannot fail over to a non-Bedrock model. | A one-line conditional edge in LangGraph. Bedrock to OpenAI on latency SLO breach, shipped. |
| What the audit looks like | Trace events in CloudWatch. Useful for debugging, weak for a regulator request. | record_turn hook writes the pre-state, post-state, and rubric score for every transition. |
| Who reads the graph at 2am | An on-call engineer learning the Bedrock Agents console and its event schema. | An on-call engineer reading graph.py and a SQL query. No Bedrock console required. |
Anchor: the actual graph running in Fargate
This is a faithful sketch of the orchestrator behind Upstate Remedial's compliance-gated outbound email system. It is not a toy; it is the shape of the graph that has shipped 400K+ emails. The two load-bearing lines are the add_conditional_edges call on compliance_check, which routes to a Bedrock drafter when the latency budget holds and to an OpenAI drafter when it does not, and the record_turn hook wired to every node, which writes one audit row to RDS Postgres per transition.
The portability comes from a second file, models.py. Bedrock is a library call behind an adapter function. Replacing the primary model with Azure OpenAI or Vertex is a one-file change; the graph does not notice.
How it wires up on AWS
No Bedrock Agents. No knowledge bases. No action-group Lambdas. The orchestrator is a container running on ECS Fargate. Bedrock Runtime is reached via bedrock:InvokeModel on the task role. The audit log goes to RDS Postgres. That is the whole box diagram.
Pattern 3 on AWS: LangGraph in Fargate, Bedrock as a model
What the IAM and Bedrock call surface actually looks like
The permission surface is narrow on purpose. If an auditor asks what the orchestrator can do, this is the list. No console-level Bedrock Agents permissions, no knowledge-base ARNs, no action-group Lambda permissions, because those objects do not exist in this pattern.
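A sketch of that narrow surface as a policy document, expressed as a Python dict so it can be rendered into the task role. All ARNs are placeholders, and this shows only the three core statements; RDS IAM auth and the optional Bedrock Runtime VPC endpoint are additions on the same pattern:

```python
# Hedged sketch of the Fargate task-role policy. Replace the placeholder
# ARNs with your region, account, model, secret, and key identifiers.
import json

TASK_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Bedrock Runtime invocation, scoped to specific model ARNs only
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": ["arn:aws:bedrock:REGION::foundation-model/MODEL_ID"],
        },
        {   # OpenAI fallback key and Postgres credentials from Secrets Manager
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": ["arn:aws:secretsmanager:REGION:ACCOUNT:secret:SECRET_NAME"],
        },
        {   # Decrypt on the CMK that wraps those secrets
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": ["arn:aws:kms:REGION:ACCOUNT:key/KEY_ID"],
        },
    ],
}

print(json.dumps(TASK_ROLE_POLICY, indent=2))
```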
The five-step runtime decision we run on scoping calls
This is the literal questionnaire our senior engineers walk clients through on the week 0 scoping call. It lives in the engagement one-pager that ships with the proposal. It settles the runtime choice before framework debates start.
1. Name the failure domain
If a wrong outbound is a regulatory incident, your audit log must be in your database on your schema. That alone rules out pattern 1 for most compliance-heavy systems.
2. Ask who owns the orchestrator runtime
A managed runtime is a product commitment. If you expect to be on AWS for the next five years with no vendor review, pattern 1 is cheapest. If a board-level cloud or model review is plausible, pattern 3 is cheaper in total cost.
3. Inspect the fallback edge
If your SLO requires failover to a non-Bedrock model (OpenAI direct, Anthropic direct, a fine-tuned open-weights endpoint), you need pattern 3. Bedrock Agents cannot route to a model that is not in Bedrock.
4. Decide who writes the audit row
Compliance reviewers do not read CloudWatch JSON. They read SQL. If you need a queryable per-transition audit trail in Postgres, the orchestrator has to be yours, with a hook that writes on every edge.
5. Measure exit cost before you start
Count every place a future engineer would have to rewrite if you left AWS: agent definitions, knowledge bases, action groups, IAM wiring. Pattern 3 minimizes that count to one file: graph.py.
Anchor fact
The fallback edge is the tell. Bedrock Agents cannot fail over to a model outside Bedrock.
Upstate Remedial’s SLO requires that a drafter latency spike route to a second model, and that second model is OpenAI direct, not in Bedrock. That single requirement eliminates Bedrock Agents as a runtime on day one, because its supervisor/collaborator pattern only routes to models in Bedrock’s catalog. The shipped system uses LangGraph’s add_conditional_edges on compliance_check for that edge, and the edge is a single lambda: lambda s: "drafter_primary" if s.latency_budget_ok else "drafter_fallback". That line is the whole reason the system is on pattern 3.
Receipts
All numbers below are from named production systems on /wins. No invented benchmarks, no sector averages, no AWS marketing numbers.
The 400K+ email figure is from the LangGraph-on-Fargate system at Upstate Remedial. The zero-compliance-incidents figure is from the client’s reported metric window, not an internal measure.
Want the runtime decision run on your AWS account, not ours?
Sixty-minute scoping call with the senior engineer who would own the build. You leave with a one-pager: the runtime we would pick, the IAM surface, the fallback policy, and a fixed weekly rate.
Book a call →
AWS multi agent orchestration, answered
What does AWS multi agent orchestration actually mean in practice?
It means one of three things: using Bedrock Agents as a managed multi-agent runtime, using Step Functions to orchestrate tasks that happen to be agents, or running your own orchestrator (LangGraph, Pydantic AI, custom DAG) inside ECS, Fargate, or Lambda with Bedrock Runtime as a model endpoint. The AWS docs tend to conflate these under the Bedrock Agents banner; production teams choose between them, and the choice has large portability consequences.
When should we pick Bedrock Agents over running our own orchestrator on AWS?
When you are committed to AWS as both the cloud and the model catalog for the foreseeable future, the system is internal tooling rather than a customer-facing product, you do not need to fail over to a non-Bedrock model, and the Bedrock Agents audit trail in CloudWatch satisfies your compliance team. If any of those four things is not true, pattern 3 (your own orchestrator) is cheaper across a two-year horizon even though it costs more in week 1.
Why run LangGraph on Fargate instead of using Bedrock Agents directly?
Three reasons, all load-bearing for the production system we shipped at Upstate Remedial. First, the fallback edge has to route to OpenAI on latency SLO breach, and Bedrock Agents' supervisor pattern cannot call a model outside Bedrock. Second, the audit row has to land in RDS Postgres on a schema the compliance team already queries, not in CloudWatch on Bedrock's schema. Third, the orchestrator must be portable; if AWS pricing or the model catalog changes materially, we want to redeploy the same container on a different cloud without rewriting agent definitions.
What is the exact anchor pattern on the 400K+ email system?
A LangGraph StateGraph with an add_conditional_edges routing from compliance_check to drafter_primary (Bedrock) when latency_budget_ok is true, and to drafter_fallback (OpenAI direct) otherwise. Every node has a record_turn hook that writes the pre-state, post-state, transition label, and rubric score to Postgres. The Bedrock call goes through the Bedrock Runtime InvokeModel API (IAM action bedrock:InvokeModel) on an IAM role scoped to the Fargate task. The graph definition itself is about thirty lines of Python.
Is Step Functions a multi agent orchestrator?
It is a workflow orchestrator that can host tool calls, Lambda functions, and Bedrock calls. You can use it to sequence agents, but it does not give you the typed state or the conditional-edge fallback that a graph library gives you. We use it when the problem is a workflow with well-defined stages and retries, and the agent boundaries are fuzzy. We do not use it when the orchestration is a conversation with a typed state and rubric-gated merges.
How do we audit multi agent systems on AWS for compliance?
The short answer is: do not rely on the model provider's trace. Write your own audit row per transition, in your own database, on a schema your compliance reviewer can query. At Upstate Remedial that is a Postgres table with columns for case_id, node_from, node_to, model_id, latency_ms, rubric_score, transcript_hash, and event_ts, written by a record_turn hook attached to every node in the graph. CloudWatch is fine for ops; it is not fine as the single source of truth for a regulator.
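A sketch of that table and the hook that writes it, using the column names from the paragraph above. sqlite3 stands in for RDS Postgres here so the sketch runs self-contained; in production the same SQL goes through a Postgres driver against RDS:

```python
# Hedged sketch of the per-transition audit write. Column names match the
# schema described in the text; types and the sqlite stand-in are assumptions.
import hashlib
import sqlite3
import time

AUDIT_DDL = """
CREATE TABLE IF NOT EXISTS agent_audit (
    case_id         TEXT NOT NULL,
    node_from       TEXT NOT NULL,
    node_to         TEXT NOT NULL,
    model_id        TEXT,
    latency_ms      INTEGER,
    rubric_score    REAL,
    transcript_hash TEXT,
    event_ts        REAL
)
"""

def record_turn(conn, case_id, node_from, node_to, model_id,
                latency_ms, rubric_score, transcript):
    # One row per edge traversal; the transcript is hashed, not stored raw.
    conn.execute(
        "INSERT INTO agent_audit VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (case_id, node_from, node_to, model_id, latency_ms, rubric_score,
         hashlib.sha256(transcript.encode()).hexdigest(), time.time()),
    )
```

A compliance reviewer then queries it with plain SQL, e.g. every fallback transition for a case: SELECT * FROM agent_audit WHERE case_id = ? AND node_to = 'drafter_fallback'.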
What are the IAM wiring pieces for pattern 3 on AWS?
A Fargate task role with bedrock:InvokeModel scoped to the specific model ARNs, Secrets Manager read for the OpenAI fallback key, RDS IAM auth or Secrets Manager read for the Postgres audit log, KMS decrypt on the CMK that wraps the secrets, and optionally a VPC endpoint for Bedrock Runtime so model traffic does not leave your VPC. That is the whole permission surface. No Bedrock Agents, no knowledge-base ARNs, no action-group Lambdas.
Can we start on Bedrock Agents and migrate to pattern 3 later?
Technically yes; economically it is the most expensive path. Agent definitions, knowledge bases, action groups, and the audit schema all have to be rewritten, and the system is usually in production by the time the migration lands. Our engagement model is to do a week 0 read of the current AWS setup, pick the pattern against a written rubric, and commit to a week-2 prototype and week-6 production handoff on that pattern. Rewrites get priced like greenfield.
Do you have model-vendor neutrality on AWS?
Yes, by construction. The model boundary is a function, pick_primary() and pick_fallback(), that returns a LangChain chat model. Today those are ChatBedrockConverse and ChatOpenAI. Tomorrow they can be ChatVertexAI and ChatAnthropic direct. The graph does not change, the audit log does not change, and the IAM story is localized to the adapter. That is the whole point of picking pattern 3.
What leave-behind artifacts ship with an AWS multi agent orchestration engagement?
Four files committed on main: graph.py (the orchestrator), models.py (the Bedrock and fallback adapters), an eval harness running in GitHub Actions against a case-specific rubric plus ragas, and a runbook keyed to your on-call rotation that names the failover condition, the audit-log query, and the IAM role the task assumes. A 90-minute handoff session with your team. Then we leave. The same engineer stays available for paid two-hour consults at a capped rate for 12 months.
Adjacent guides
More on shipping production AI
Multi agent orchestration, framework choice
LangGraph, Pydantic AI, custom DAG. The selection matrix from five shipped production systems.
The 6-week FDE engagement model
Week 0 scoping, week 2 prototype gate, week 6 handoff. The rubric your board can hold us to.
Shipped systems, cited on the record
Named clients, production metrics, per-system stacks. Includes the 400K+ email system on AWS.