Guide, keyword: multi agent orchestration patterns
Eight patterns in the tutorial. Three on the allowlist.
Every guide for multi agent orchestration patterns enumerates the named shapes as if each one is a neutral design option. We do not. Across five shipped production systems we let three patterns past the deploy gate and reject five by name. The allowlist lives in a YAML file in your repo. A Python script in CI fails any PR whose graph does not fingerprint-match one of the three. The patterns this page names are not style preferences. They are the shortlist that survives a worker restart at 2am.
patterns.md is ~60 lines of YAML. topology_walker.py is ~80 lines of pure Python. Both checked into your repo, both required for a merge to main.
Senior engineer embedded in your repo. Your graph.py. Your CI.
Why the SERP treats patterns as a menu, and why we do not
Search multi agent orchestration patterns and you will find a clean list of named shapes: supervisor, swarm, hierarchical, sequential pipeline, planner-executor, debate, reflector, auction. Every page describes them the same way: here is the diagram, here is the trade-off, pick what fits your use case. A framework vendor cannot publish a stricter view, because every pattern on that list maps to an API they already ship.
We can, because the product is not a framework. It is a senior engineer inside your repo for six weeks and the IP they leave behind. On an enterprise or Series A AI-native client, we have seen five named patterns fail silently in production often enough to put them on a reject list. The other three we ship with a specific numeric cap that stops the specific MAST failure each invites. The allowlist is the artifact. Published here, enforced in your CI.
“Orchestration patterns we allow past a production deploy gate across five shipped systems (Monetizy.ai, Upstate Remedial, OpenLaw, PriceFox, OpenArt). The allowlist is a YAML file called patterns.md; a topology walker in CI fingerprints graph.py and fails the build if it does not match one of the three approved shapes. Five named SERP patterns are explicitly rejected by the file, with the MAST failure mode each one invites cited next to it.”
PIAS pattern allowlist, applied across five production engagements
The four numbers on the allowlist
These are not style preferences. Each one is a field in a CI check: the count of approved patterns, the count of named rejections, the file that controls both, and the tolerable drift between the fingerprint and graph.py.
3 patterns approved. 5 patterns rejected. 1 file (patterns.md) is the source of truth. 0 fingerprint drift tolerated by CI.
The three patterns we ship, named
Each one has a single numeric cap, a single Pydantic field, and a single conditional edge that reads the cap before routing. The cap is the thing that keeps the pattern's MAST failure mode from triggering. Remove it and you have a tutorial graph, not a production graph.
pattern allowlist, in the order we added them to the engagement checklist
1. supervisor + capped turn budget
One supervisor node routes to N worker nodes via add_conditional_edges. Workers return data. Routing lives in a pure function over state. A typed turn_budget field counts re-routes and terminates when it hits zero.
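A minimal sketch of the routing half, assuming hypothetical node names (worker_a, worker_b, human_review) and a plain dataclass standing in for the Pydantic state model with turn_budget: int = Field(ge=0, le=3):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class CaseState:
    # Stand-in for the Pydantic model; in the shipped system this field
    # carries the ge=0, le=3 constraint.
    turn_budget: int = 3
    needs_rework: bool = False

def route(state: CaseState) -> str:
    # Pure function over state. Budget zero forces the deterministic
    # human-review node, never another LLM call.
    if state.turn_budget <= 0:
        return "human_review"
    return "worker_b" if state.needs_rework else "worker_a"

def spend_turn(state: CaseState) -> CaseState:
    # Every re-route decrements the budget; no node mutates it upward.
    return replace(state, turn_budget=state.turn_budget - 1)
```

The cap lives in the router, not the prompt: once the budget is spent, no routing decision can reach a worker again.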
2. sequential pipeline + idempotency key
Nodes run in a fixed linear order. No loops, no conditional fan-out. Every node is pure over (state, idempotency_key) and writes its result to the same row in an append-only audit table. Re-runs are safe.
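A sketch of the idempotency mechanics under stated assumptions: an in-memory SQLite table stands in for the audit store, INSERT OR IGNORE plays the role of a Postgres-style ON CONFLICT ... DO NOTHING, and the node body is a placeholder:

```python
import sqlite3

def make_audit_db() -> sqlite3.Connection:
    # Append-only audit table keyed by (case_id, node_name, turn_id).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE audit_rows (
               case_id TEXT, node_name TEXT, turn_id INTEGER, result TEXT,
               PRIMARY KEY (case_id, node_name, turn_id))"""
    )
    return conn

def run_node(conn, node_name, state, idempotency_key):
    # Pure over (state, idempotency_key): same inputs, same result row.
    case_id, turn_id = idempotency_key
    result = f"{node_name}:{state['amount'] * 2}"  # placeholder for real work
    # A replayed run hits the same primary key and becomes a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO audit_rows VALUES (?, ?, ?, ?)",
        (case_id, node_name, turn_id, result),
    )
    return result

conn = make_audit_db()
run_node(conn, "price", {"amount": 21}, ("case-1", 0))
run_node(conn, "price", {"amount": 21}, ("case-1", 0))  # replay after a restart
row_count = conn.execute("SELECT COUNT(*) FROM audit_rows").fetchone()[0]
```

The replayed call writes nothing: one row per (case_id, node_name, turn_id), no matter how many times the worker restarts.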
3. planner-executor + bounded plan depth
A planner node emits a typed Plan(list[Step]) once, the executor runs each Step, the planner is not called again on that case. The plan length is bounded. Re-planning is a new case, not a recursive call.
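A sketch of how the bound can live in the type rather than the prompt; dataclasses stand in for the typed Plan(list[Step]) model, and the cap of 8 mirrors the plan.steps <= 8 bound in the approved fingerprint:

```python
from dataclasses import dataclass
from typing import List

MAX_PLAN_STEPS = 8  # mirrors plan.steps <= 8 in the approved fingerprint

@dataclass(frozen=True)
class Step:
    name: str

@dataclass(frozen=True)
class Plan:
    steps: List[Step]

    def __post_init__(self):
        # The grammar, not the prompt, forbids unbounded plans.
        if len(self.steps) > MAX_PLAN_STEPS:
            raise ValueError("plan exceeds bounded depth")

def execute(plan: Plan) -> List[str]:
    # The executor iterates the plan in order. A step cannot return a new
    # plan, so re-planning is a new case, not a recursive call.
    return [f"ran:{step.name}" for step in plan.steps]
```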
The five patterns we reject, by MAST failure mode
Every rejected entry carries the MAST code that labels its failure mode. MAST is the Multi-Agent System Failure Taxonomy, used across the LangGraph eval corpus, the AutoGen paper set, and our own internal post-mortems. The taxonomy is the reason the rejects are named instead of shamed.
Hierarchical (nested supervisors)
A supervisor whose workers are themselves supervisors. MAST-1.3 inter-agent misalignment. Every nesting level multiplies the state surface the on-call has to reconstruct at 2am. We grep for any second-level StateGraph and fail the build.
Swarm / peer-to-peer
Any agent can hand off to any other. MAST-2.4 infinite handoff. Inevitable with more than 3 agents and no turn budget. We compute the graph's longest simple path; if it exceeds 4, the PR fails.
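The longest-simple-path test can be sketched in a few lines of pure Python; the six-agent handoff chain below is illustrative:

```python
def longest_simple_path(edges, start):
    # Depth-first walk over simple paths (no repeated nodes); cheap at
    # allowlist scale, where node counts are capped at 8.
    best = 0
    def dfs(node, seen, length):
        nonlocal best
        best = max(best, length)
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                dfs(nxt, seen | {nxt}, length + 1)
    dfs(start, {start}, 0)
    return best

def max_simple_path(edges):
    # Take the maximum over every possible starting node.
    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    return max(longest_simple_path(edges, n) for n in nodes)

# A handoff chain that grew to six agents: longest simple path 5, over the cap of 4.
swarm = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": []}
```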
Debate / critic loop
Two agents critique each other until one concedes. MAST-3.1 unproductive persistence. No published eval method covers debate transcripts across vendors. We do not ship it. If you need a critic, it is a single non-loop scorer node.
Reflector / self-rewrite
An agent reflects on its own output and rewrites it. MAST-3.2 unbounded self-critique. p95 token usage drifts 40% within 30 days. Replaced with a single-pass scorer plus a hard turn_budget retry, not a reflection loop.
Auction / market
Agents bid for tasks. MAST-1.4 opaque selection. The routing decision is a ranked list of LLM outputs, not graspable in a review screen. For enterprise audit we need every route printable by a pure function over state. Rejected.
The state file that backs every approved pattern
Every field in state.py maps to a required_state_fields entry in patterns.md. Drop turn_budget and the supervisor pattern no longer fingerprint-matches. Drop idempotency_key and the pipeline pattern no longer fingerprint-matches. The type system and the CI walker keep each other honest.
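A sketch of the cross-check, with a dataclass standing in for the Pydantic state model and required_state_fields entries mirroring the two named above:

```python
import dataclasses

# Mirrors required_state_fields entries in patterns.md (names from this page).
REQUIRED_STATE_FIELDS = {
    "supervisor-capped-budget": {"turn_budget"},
    "sequential-pipeline-idempotent": {"idempotency_key"},
}

@dataclasses.dataclass
class State:
    # Stand-in for state.py; the shipped file is a Pydantic model.
    case_id: str
    turn_budget: int
    idempotency_key: str

def state_matches(pattern: str) -> bool:
    # Drop turn_budget from State and the supervisor entry stops matching.
    declared = {f.name for f in dataclasses.fields(State)}
    return REQUIRED_STATE_FIELDS[pattern] <= declared
```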
patterns.md: the allowlist, in one file
Sixty lines of YAML. Three approved entries with a topology fingerprint and one MAST code each. Five rejected entries with a MAST code each. This file is the orchestration contract between us and your on-call.
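An illustrative fragment of the file's shape, assembled from the fields and MAST codes this page names; any value or label not quoted elsewhere on the page is an assumption:

```yaml
approved:
  - name: supervisor-capped-budget
    fingerprint:
      nodes_max: 8
      edges_max: 8
      cycle_policy: supervisor-loop-only   # assumed label
      required_state_fields: [turn_budget]
  - name: sequential-pipeline-idempotent
    fingerprint:
      cycle_policy: forbidden
      required_state_fields: [idempotency_key]
  - name: planner-executor-bounded-depth
    fingerprint:
      plan_depth_max: 4
      plan_steps_max: 8
    prevents: MAST-3.2
rejected:
  - { name: hierarchical, invites: MAST-1.3 }
  - { name: swarm,        invites: MAST-2.4 }
  - { name: debate,       invites: MAST-3.1 }
  - { name: reflector,    invites: MAST-3.2 }
  - { name: auction,      invites: MAST-1.4 }
```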
topology_walker.py: the CI check
Eighty lines of pure Python standard library. Parses graph.py with the ast module, computes the topology fingerprint, compares against every entry in patterns.md. No runtime execution of graph.py required. Wired as a required status check on main.
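A minimal sketch of the counting stage, assuming LangGraph-style add_node / add_edge / add_conditional_edges calls; the shipped walker additionally infers fan-out and cycle policy:

```python
import ast

GRAPH_CALLS = {"add_node", "add_edge", "add_conditional_edges"}

def fingerprint(source: str) -> dict:
    # Static pass over graph.py: count construction calls, never execute.
    counts = dict.fromkeys(GRAPH_CALLS, 0)
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in GRAPH_CALLS):
            counts[node.func.attr] += 1
    return {"nodes": counts["add_node"],
            "edges": counts["add_edge"] + counts["add_conditional_edges"]}

def matches(fp: dict, entry: dict) -> bool:
    # Compare against one approved entry from patterns.md.
    return fp["nodes"] <= entry["nodes_max"] and fp["edges"] <= entry["edges_max"]

SAMPLE = """
g = StateGraph(State)
g.add_node("supervisor", supervise)
g.add_node("worker_a", work_a)
g.add_node("worker_b", work_b)
g.add_conditional_edges("supervisor", route)
g.add_edge("worker_a", "supervisor")
g.add_edge("worker_b", "supervisor")
"""

fp = fingerprint(SAMPLE)
```

Because the check is a parse, not an import, a graph.py with side effects at module scope cannot sabotage the CI run.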
What the supervisor pattern looks like under load
A case with two workers and a turn budget of three. Every re-route decrements the budget. Budget zero forces the deterministic human review node, never a fourth LLM call. The numeric cap is what turns a supervisor topology into a production-safe supervisor pattern.
supervisor-capped-budget, three re-routes then budget exhausted
SERP-default view vs. allowlist view, side by side
The left column is how nearly every page treats the topic. The right column is how we treat it inside a client repo. Same pattern names. Very different operational surface area.
| Feature | SERP-default neutral-menu view | PIAS allowlist view |
|---|---|---|
| How many patterns you are presented with | 8 or more. Supervisor, swarm, hierarchical, sequential, debate, planner-executor, reflector, auction. All labeled as viable design choices. | 3 approved, 5 rejected. The allowlist lives in your repo, not in a blog post. |
| How you choose between them | Pick the one that fits your use case. Use your judgment. Read the trade-offs. | One question: what is the shape of state? Conversation with re-route, scored pipeline, or plan-then-execute. Each maps to one approved pattern. |
| What prevents the pattern from drifting | Style guide in a Notion page. Code review. Maybe a Confluence doc nobody reads. | patterns.md plus topology_walker.py in CI. Required status check on main. A PR that changes the topology fingerprint fails the build. |
| What stops the pattern's specific failure mode | Nothing specific. The tutorial does not mention MAST. Failures are attributed to "prompt engineering." | Each approved pattern carries one numeric cap enforced by a Pydantic Field constraint and one conditional edge that reads it before routing. |
| What happens when a PR adds a new pattern | It merges, because the style guide has no teeth. | topology_walker.py fails the PR. Adding a fourth approved pattern requires editing patterns.md, which requires a scoping discussion. |
| What you keep after the vendor leaves | The framework docs. They change with releases. | patterns.md (~60 lines YAML) and topology_walker.py (~80 lines Python) in your repo. Your on-call tightens or loosens constraints in a PR. |
The pre-merge pattern review checklist
Seven statements. Six are machine-checked by topology_walker.py and friends. The seventh is a one-sentence PR description line that names the pattern. The cheapest PR template in the repo and the one that prevents the most 2am incidents.
pattern review gate
- scripts/topology_walker.py prints OK and names one of: supervisor-capped-budget, sequential-pipeline-idempotent, planner-executor-bounded-depth.
- patterns.md in the PR is identical to main or is the PR's intended scope. A patterns.md edit requires its own review thread.
- For supervisor: state has turn_budget: int = Field(ge=0, le=3) and the supervisor router reads it before routing. No node can mutate turn_budget upward.
- For pipeline: every node call site includes idempotency_key, and audit_rows writes use ON CONFLICT (case_id, node_name, turn_id) DO NOTHING.
- For planner-executor: the planner node is referenced by exactly one add_edge call, and len(state.plan.steps) is checked in the executor before iteration.
- No new file under src/ imports StateGraph except graph.py. A subgraph is a second fingerprint and patterns.md forbids nested supervisors.
- The PR description names which approved pattern it touches. A PR titled 'add reflection step' cannot merge because reflector is on the reject list.
Anchor fact
Three fingerprints. Everything else fails the build.
Every engagement commits patterns.md and scripts/topology_walker.py on day 1. patterns.md names three approved orchestration patterns by topology fingerprint: supervisor-capped-budget (nodes_max 8, edges_max 8, turn_budget required), sequential-pipeline-idempotent (linear, cycle forbidden, idempotency_key required), planner-executor-bounded-depth (plan_depth <= 4, plan.steps <= 8, planner called once). It also names five rejected patterns with the MAST failure mode each invites: hierarchical (MAST-1.3), swarm (MAST-2.4), debate (MAST-3.1), reflector (MAST-3.2), auction (MAST-1.4). topology_walker.py is a required status check on main. A PR whose graph.py fingerprints into anything outside the three approved shapes cannot merge. patterns.md is the shortest file in the repo and the one doing the most work.
Fingerprint your existing multi agent graph with us
Sixty-minute scoping call with the senior engineer who would own the allowlist landing. You leave with a written topology fingerprint of your current graph.py, which approved pattern it matches (or does not), and an estimate for getting patterns.md plus topology_walker.py wired as a required CI check.
Book a call →
Multi agent orchestration patterns, answered
What are multi agent orchestration patterns, in plain terms?
Multi agent orchestration patterns are the named topologies people use to wire several LLM-driven nodes together: supervisor routing to workers, a sequential pipeline, a planner-executor split, a swarm where anyone hands off to anyone, a debate between two critics, a reflector rewriting its own output, and a few more. In production the right question is not which one to use, but which subset is audit-safe and cycle-bounded enough to survive a worker restart at 2am. Our allowlist contains three patterns. The rest we reject by name with the specific MAST failure they invite.
Why reject hierarchical orchestration? It is in almost every tutorial.
A supervisor whose workers are themselves supervisors multiplies the state surface on every nesting level. MAST-1.3 inter-agent misalignment sits at the top of that taxonomy for a reason: the outer supervisor and the inner supervisor develop subtly different routing assumptions, and the only place those assumptions are reconcilable is in the prompt. We reject hierarchical because the single-layer supervisor-with-capped-budget pattern has been sufficient for every use case we have shipped. If a client insists on a nested shape, the honest answer is two separate graphs with a queue between them, not a nested StateGraph.
Why reject swarm / peer-to-peer?
Any agent handing off to any other makes MAST-2.4 infinite handoff inevitable past three agents unless you layer a budget on top. If you are going to layer a budget on top, you have just rebuilt the supervisor pattern with extra edges. The supervisor-with-capped-budget pattern achieves everything swarm promises with one-tenth the routing surface, and its topology fingerprint is enforceable in CI. Swarm is appealing in demos and unscalable in audits.
Why reject the debate / critic loop?
The debate pattern pairs two agents that critique each other until convergence. MAST-3.1 unproductive persistence lands here: the loop can terminate after one turn or after twenty, and which it does is a property of prompt and seed, not of graph structure. Worse, evaluating debate transcripts requires a pairing-specific rubric that does not transfer across model vendors, which breaks our vendor-neutral rule. When clients ask for a critic, we ship a single-pass scorer node and use the supervisor's turn budget to bound any re-routing.
Is planner-executor safe? It is recursive by nature.
It is safe when and only when recursion is forbidden at the graph level. Our approved variant has a single call to the planner per case, a typed Plan with len(steps) <= 8, and a plan_depth bound of 4. The executor iterates the plan list in order. The planner node is not called from within a step. A step is not allowed to return a new plan. MAST-3.2 unbounded self-critique is prevented by the grammar, not by the prompt. If your repo has a plan-inside-plan path, it is not the pattern we approve, and topology_walker will reject the PR.
What is in patterns.md exactly?
patterns.md is a ~60 line YAML file with two top-level keys: approved and rejected. Each approved entry has a name, a fingerprint (nodes_max, edges_max, cycle_policy, fan_out_max, required_state_fields, optional constraints) and the MAST code it prevents. Each rejected entry has a name and the MAST code it invites. The file is source of truth for scripts/topology_walker.py. Changing patterns.md is a scoping event, not a hotfix.
How does topology_walker.py read graph.py?
It parses graph.py with the standard library ast module. It counts every add_node, add_edge, and add_conditional_edges call. It infers fan-out per node by counting how many distinct next-node string literals appear inside each conditional router function. It walks the induced graph and computes whether a cycle exists outside the supervisor-budget pattern. It compares the resulting fingerprint to every approved entry in patterns.md. If no match, it exits non-zero. It is ~80 lines of pure Python. No runtime execution of graph.py is needed.
What if my use case genuinely needs a fourth pattern?
Two answers. First: across five shipped systems at Monetizy.ai, Upstate Remedial, OpenLaw, PriceFox, and OpenArt, we have not needed one. Every use case has fit supervisor-capped-budget, sequential-pipeline-idempotent, or planner-executor-bounded-depth, sometimes composed as two graphs joined by a queue. Second: if your use case truly does not fit, the answer is to scope a week of engagement on defining the fourth pattern's fingerprint, its MAST failure mode, and its CI check. It is not to silently add it during a feature PR. patterns.md makes that discussion explicit.
How is this different from the ceiling numbers (8 edges, 3 turn budget) you publish elsewhere?
The ceiling numbers are one layer of defense. Topology fingerprints are a higher layer. You can satisfy the 8-edge ceiling with a swarm of 4 agents and still have a peer-to-peer topology we reject. topology_walker.py asks a question the edge counter cannot: is the shape of this graph one we are willing to run at 2am? The numbers live inside patterns, not on top of them.
Does this apply to LangGraph, Pydantic AI, and custom DAGs equally?
Yes, because the fingerprint is computed from the source of the graph file, not from a framework-specific runtime. In LangGraph it walks StateGraph calls. In Pydantic AI we parse Agent / Tool definitions the same way. In a custom DAG we parse the .add_step / .add_branch calls in your orchestrator file. patterns.md does not reference a framework. A client who switches from LangGraph to a custom DAG keeps their allowlist. This is part of why the leave-behind survives the engagement.
What happens to all the other orchestration patterns I have seen in papers?
The academic literature describes many more patterns: voting ensembles, tree-of-thought search, graph-of-thought, skill libraries with tool caching. We treat them all the same way: run a MAST analysis, pick the closest approved pattern, map the paper's idea onto it. Most of what people call 'a new pattern' is a new prompting strategy inside supervisor + worker or planner + executor. The fingerprint does not care about the prompt. It cares about the graph.
Adjacent guides
More on shipping production multi agent systems
The numerical ceiling we refuse to cross
Eight edges, a three-turn handoff budget, two models in the critical path. The four numbers that sit underneath every approved pattern fingerprint.
Five LangGraph idioms we delete in week 1
The grep script we run on day 1. Five patterns that pass review and fail in prod: Command(goto), interrupt, MessagesState, with_structured_output in supervisors, MemorySaver.
Framework selection matrix
Orthogonal to this page. Five shipped systems, three frameworks. When LangGraph, when Pydantic AI, when a custom DAG. Framework choice does not set the pattern, it hosts it.