Guide, keyword: multi agent orchestration patterns

Eight patterns in the tutorial. Three on the allowlist.

Every guide for multi agent orchestration patterns enumerates the named shapes as if each one is a neutral design option. We do not. Across five shipped production systems we let three patterns past the deploy gate and reject five by name. The allowlist lives in a YAML file in your repo. A Python script in CI fails any PR whose graph does not fingerprint-match one of the three. The patterns this page names are not style preferences. They are the shortlist that survives a worker restart at 2am.

Matthew Diakonov
15 min read
  • 4.9 rating, from the pattern allowlist applied to five shipped production systems
  • 3 approved orchestration patterns, fingerprinted in patterns.md
  • 5 SERP-default patterns rejected, by MAST failure mode
  • topology_walker.py runs as a required CI status check

patterns.md is ~60 lines of YAML. topology_walker.py is ~80 lines of pure Python. Both checked into your repo, both required for a merge to main.

Senior engineer embedded in your repo. Your graph.py. Your CI.

  • APPROVED: supervisor-capped-budget
  • APPROVED: sequential-pipeline-idempotent
  • APPROVED: planner-executor-bounded-depth
  • REJECTED: hierarchical (nested supervisors)
  • REJECTED: swarm / peer-to-peer
  • REJECTED: debate / critic loop
  • REJECTED: reflector / self-rewrite
  • REJECTED: auction / market
  • turn_budget: int = Field(ge=0, le=3)
  • plan_depth: int = Field(ge=1, le=4)
  • idempotency_key: str (required)
  • MAST-1.3: inter-agent misalignment
  • MAST-2.4: infinite handoff
  • MAST-3.2: unbounded self-critique
  • topology_walker --fingerprint-must-be supervisor|pipeline|planner

Why the SERP treats patterns as a menu, and why we do not

Search multi agent orchestration patterns and you will find a clean list of named shapes: supervisor, swarm, hierarchical, sequential pipeline, planner-executor, debate, reflector, auction. Every page describes them the same way: here is the diagram, here is the trade-off, pick what fits your use case. A framework vendor cannot publish a stricter view, because every pattern on that list maps to an API they already ship.

We can, because the product is not a framework. It is a senior engineer inside your repo for six weeks and the IP they leave behind. On an enterprise or Series A AI-native client, we have seen five named patterns fail silently in production often enough to put them on a reject list. The other three we ship with a specific numeric cap that stops the specific MAST failure each invites. The allowlist is the artifact. Published here, enforced in your CI.

3 orchestration patterns allowed past a production deploy gate across five shipped systems (Monetizy.ai, Upstate Remedial, OpenLaw, PriceFox, OpenArt). The allowlist is a YAML file called patterns.md; a topology walker in CI fingerprints graph.py and fails the build if it does not match one of the three approved shapes. Five named SERP patterns are explicitly rejected by the file, with the MAST failure mode each one invites cited next to it.

PIAS pattern allowlist, applied across five production engagements

The four numbers on the allowlist

These are not style preferences. Each one is a field in a CI check: the count of approved patterns, the count of named rejections, the file that controls both, and the tolerable drift from fingerprint to graph.py.

  • 3 orchestration patterns on our production allowlist
  • 5 named SERP patterns we reject, by MAST failure mode
  • 1 file (patterns.md) that controls the allowlist in your repo
  • 0 fingerprint drift allowed. The topology walker blocks the PR.

3 patterns approved. 5 patterns rejected. 1 file (patterns.md) is the source of truth. 0 fingerprint drift tolerated by CI.

The three patterns we ship, named

Each one has a single numeric cap, a single Pydantic field, and a single conditional edge that reads the cap before routing. The cap is the thing that keeps the pattern's MAST failure mode from triggering. Remove it and you have a tutorial graph, not a production graph.

pattern allowlist, in the order we added them to the engagement checklist


1. supervisor + capped turn budget

One supervisor node routes to N worker nodes via add_conditional_edges. Workers return data. Routing lives in a pure function over state. A typed turn_budget field counts re-routes and terminates when it hits zero.

Numeric cap: turn_budget: int = Field(ge=0, le=3). MAST failure it prevents: infinite handoff (MAST-2.4). The conditional edge is a pure function that reads state.turn_budget before it reads any score. Shipped in Upstate Remedial (400K+ emails), OpenLaw (case drafting), PriceFox (quote review).
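As a minimal sketch of that routing rule, assuming Pydantic v2, a budget check before any score check might look like this; the state class, score field, and 0.82 threshold are illustrative, not lifted from a client repo:

```python
# Hypothetical sketch of the supervisor's conditional edge: a pure function
# over state. The budget is read BEFORE any score, so budget zero always
# forces the deterministic human review node, never a fourth LLM call.
from pydantic import BaseModel, Field


class SupervisorState(BaseModel):
    turn_budget: int = Field(ge=0, le=3)  # hard cap on re-routes
    score: float = 0.0                    # last worker's score
    threshold: float = 0.82               # acceptance bar


def route(state: SupervisorState) -> str:
    if state.turn_budget == 0:
        return "human_review"  # forced, regardless of score
    if state.score >= state.threshold:
        return "END"
    # Illustrative worker selection; real routing logic is case-specific.
    return "worker_a" if state.score < 0.5 else "worker_b"
```

The point of making the router a pure function is that the 2am on-call can reproduce any routing decision from a single serialized state object.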

2. sequential pipeline + idempotency key

Nodes run in a fixed linear order. No loops, no conditional fan-out. Every node is pure over (state, idempotency_key) and writes its result to the same row in an append-only audit table. Re-runs are safe.

Numeric cap: every node call includes an idempotency_key str derived from (case_id, node_name, turn_id). MAST failure it prevents: duplicated side effect (MAST-4.1). Shipped in Monetizy.ai (~8K emails/day) and in the scoring half of Upstate Remedial. Not a conversation graph. A scored pipeline.
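A sketch of the idempotent write, using an in-memory SQLite table as a stand-in for the append-only audit table; the key derivation mirrors the (case_id, node_name, turn_id) tuple above, and all table and column names are illustrative:

```python
# Hypothetical sketch of an idempotent node write. A re-run with the same
# key is a no-op, so every node is safe to retry after a worker restart.
import hashlib
import sqlite3


def idempotency_key(case_id: str, node_name: str, turn_id: int) -> str:
    raw = f"{case_id}:{node_name}:{turn_id}"
    return hashlib.sha256(raw.encode()).hexdigest()


def write_audit(conn: sqlite3.Connection, key: str, result: str) -> None:
    # Append-only: ON CONFLICT ... DO NOTHING makes the insert idempotent.
    conn.execute(
        "INSERT INTO audit_rows (idempotency_key, result) VALUES (?, ?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (key, result),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_rows (idempotency_key TEXT PRIMARY KEY, result TEXT)"
)
key = idempotency_key("case-42", "score_email", 1)
write_audit(conn, key, "scored")
write_audit(conn, key, "scored")  # re-run: no duplicate row
```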

3. planner-executor + bounded plan depth

A planner node emits a typed Plan(list[Step]) once, the executor runs each Step, the planner is not called again on that case. The plan length is bounded. Re-planning is a new case, not a recursive call.

Numeric cap: plan_depth: int = Field(ge=1, le=4) and len(plan.steps) <= 8. MAST failure it prevents: unbounded self-critique and plan-inside-plan recursion (MAST-3.2). Shipped in OpenArt (custom DAG over scene generation) and partially in OpenLaw for research-then-draft cases. The planner is forbidden from returning a plan that contains another planner call.
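A sketch of what a grammar-level ban on plan-inside-plan recursion can look like, assuming Pydantic v2; the Step shape and the literal node name "planner" are illustrative:

```python
# Hypothetical sketch of the typed Plan. The caps live in the type, not the
# prompt: plan_depth <= 4, len(steps) <= 8, and no step may call the planner.
from pydantic import BaseModel, Field, field_validator


class Step(BaseModel):
    node: str
    args: dict = Field(default_factory=dict)


class Plan(BaseModel):
    plan_depth: int = Field(ge=1, le=4)
    steps: list[Step] = Field(max_length=8)  # len(plan.steps) <= 8

    @field_validator("steps")
    @classmethod
    def no_nested_planner(cls, steps: list[Step]) -> list[Step]:
        # Recursion is forbidden by the grammar, not by the prompt.
        if any(step.node == "planner" for step in steps):
            raise ValueError("a step may not call the planner")
        return steps
```

Because the constraint is a validator, a planner output that tries to recurse fails at parse time, before the executor ever sees it.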

The five patterns we reject, by MAST failure mode

Every rejected entry carries the MAST code that labels its failure mode. MAST is the Multi-Agent System Failure Taxonomy, used across the LangGraph eval corpus, the AutoGen paper set, and our own internal post-mortems. The taxonomy is the reason the rejects are named instead of shamed.

Hierarchical (nested supervisors)

A supervisor whose workers are themselves supervisors. MAST-1.3 inter-agent misalignment. Every nesting level multiplies the state surface the on-call has to reconstruct at 2am. We grep for any second-level StateGraph and fail the build.

Swarm / peer-to-peer

Any agent can hand off to any other. MAST-2.4 infinite handoff. Inevitable with more than 3 agents and no turn budget. We test the graph for the longest simple path; if it exceeds 4, the PR fails.
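The longest-simple-path test is small enough to sketch in full; this is a plain DFS over an adjacency map, with illustrative graphs standing in for a parsed graph.py:

```python
# Hypothetical sketch of the longest-simple-path check that rejects
# swarm-shaped graphs. Returns the number of nodes on the longest path
# with no repeated node; a result above 4 fails the PR.
def longest_simple_path(adj: dict[str, list[str]]) -> int:
    best = 0

    def dfs(node: str, seen: set[str]) -> None:
        nonlocal best
        best = max(best, len(seen))
        for nxt in adj.get(node, []):
            if nxt not in seen:
                dfs(nxt, seen | {nxt})

    for start in adj:
        dfs(start, {start})
    return best


# A 5-agent peer-to-peer mesh: every agent can hand off to every other.
swarm = {a: [b for b in "ABCDE" if b != a] for a in "ABCDE"}
# A fixed 4-node pipeline sits exactly at the cap.
pipeline = {"intake": ["score"], "score": ["draft"], "draft": ["send"]}
```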

Debate / critic loop

Two agents critique each other until one concedes. MAST-3.1 unproductive persistence. No published eval method covers debate transcripts across vendors. We do not ship it. If you need a critic, it is a single non-loop scorer node.

Reflector / self-rewrite

An agent reflects on its own output and rewrites. MAST-3.2 unbounded self-critique. p95 tokens drift 40% in 30 days. Replaced with a single-pass scorer plus a hard turn_budget retry, not a reflection loop.

Auction / market

Agents bid for tasks. MAST-1.4 opaque selection. The routing decision is a ranked list of LLM outputs, not graspable in a review screen. For enterprise audit we need every route printable by a pure function over state. Rejected.

The state file that backs every approved pattern

Every field in state.py maps to a required_state_fields entry in patterns.md. Drop turn_budget and the supervisor pattern no longer fingerprint-matches. Drop idempotency_key and the pipeline pattern no longer fingerprint-matches. The type system and the CI walker keep each other honest.

src/state.py
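A minimal sketch of such a state file, assuming Pydantic v2; the three capped fields come from the allowlist above, while case_id and score are illustrative additions:

```python
# Hypothetical sketch of src/state.py. Every constrained field here maps to
# a required_state_fields entry in patterns.md; drop one and the matching
# pattern no longer fingerprint-matches.
from pydantic import BaseModel, Field


class CaseState(BaseModel):
    case_id: str
    turn_budget: int = Field(ge=0, le=3)            # supervisor-capped-budget
    idempotency_key: str                            # sequential-pipeline-idempotent
    plan_depth: int = Field(default=1, ge=1, le=4)  # planner-executor-bounded-depth
    score: float = 0.0
```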

patterns.md: the allowlist, in one file

Sixty lines of YAML. Three approved entries with a topology fingerprint and one MAST code each. Five rejected entries with a MAST code each. This file is the orchestration contract between us and your on-call.

patterns.md
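A hedged excerpt of what that file can look like; the key names follow the fingerprint fields described in the FAQ below (nodes_max, edges_max, cycle_policy, fan_out_max, required_state_fields), and the exact values are illustrative:

```yaml
# Hypothetical excerpt of patterns.md, the source of truth for the CI walker.
approved:
  - name: supervisor-capped-budget
    fingerprint:
      nodes_max: 8
      edges_max: 8
      cycle_policy: supervisor-loop-only
      fan_out_max: 4
      required_state_fields: [turn_budget]
    prevents: MAST-2.4
  - name: sequential-pipeline-idempotent
    fingerprint:
      cycle_policy: forbidden
      required_state_fields: [idempotency_key]
    prevents: MAST-4.1
  - name: planner-executor-bounded-depth
    fingerprint:
      cycle_policy: forbidden
      required_state_fields: [plan_depth]
      constraints: {plan_depth_max: 4, plan_steps_max: 8}
    prevents: MAST-3.2

rejected:
  - {name: hierarchical, invites: MAST-1.3}
  - {name: swarm, invites: MAST-2.4}
  - {name: debate, invites: MAST-3.1}
  - {name: reflector, invites: MAST-3.2}
  - {name: auction, invites: MAST-1.4}
```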

topology_walker.py: the CI check

Eighty lines of pure Python standard library. Parses graph.py with the ast module, computes the topology fingerprint, compares against every entry in patterns.md. No runtime execution of graph.py required. Wired as a required status check on main.

scripts/topology_walker.py
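The core of the walker is a static pass over the graph file; a simplified sketch of the counting step, using only the stdlib ast module, with an inline graph source standing in for graph.py:

```python
# Hypothetical core of scripts/topology_walker.py: compute a topology
# fingerprint from graph.py source without executing it. Simplified here to
# raw call counts; the full check described above also walks for cycles.
import ast


def fingerprint(source: str) -> dict[str, int]:
    tree = ast.parse(source)
    counts = {"add_node": 0, "add_edge": 0, "add_conditional_edges": 0}
    for node in ast.walk(tree):
        # Count method calls like g.add_node(...), g.add_edge(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr in counts:
                counts[node.func.attr] += 1
    return counts


GRAPH_SRC = """
g.add_node("supervisor", supervise)
g.add_node("worker_a", work)
g.add_edge("intake", "supervisor")
g.add_conditional_edges("supervisor", route)
"""
```

In CI, the resulting counts would be compared against each approved entry's fingerprint and the script would exit non-zero on no match.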

What the supervisor pattern looks like under load

A case with two workers and a turn budget of three. Every re-route decrements the budget. Budget zero forces the deterministic human review node, never a fourth LLM call. The numeric cap is what turns a supervisor topology into a production-safe supervisor pattern.

supervisor-capped-budget, three re-routes then budget exhausted

  • intake → supervisor: case in, turn_budget=3
  • supervisor → worker_a: route(worker_a), budget unchanged
  • worker_a → supervisor: result, score 0.68 < 0.82
  • supervisor → worker_b: re-route(worker_b), budget=2
  • worker_b → supervisor: result, score 0.71 < 0.82
  • supervisor → worker_a: re-route(worker_a), budget=1
  • worker_a → supervisor: result, score 0.77 < 0.82
  • supervisor → human_review: budget=0, route_to_human (forced)
  • human_review → END: audit: budget_exhausted

SERP-default view vs. allowlist view, side by side

The left column is how nearly every page treats the topic. The right column is how we treat it inside a client repo. Same pattern names. Very different operational surface area.

How many patterns you are presented with
  SERP-default view: 8 or more. Supervisor, swarm, hierarchical, sequential, debate, planner-executor, reflector, auction. All labeled as viable design choices.
  PIAS allowlist view: 3 approved, 5 rejected. The allowlist lives in your repo, not in a blog post.

How you choose between them
  SERP-default view: Pick the one that fits your use case. Use your judgment. Read the trade-offs.
  PIAS allowlist view: One question: what is the shape of state? Conversation with re-route, scored pipeline, or plan-then-execute. Each maps to one approved pattern.

What prevents the pattern from drifting
  SERP-default view: Style guide in a Notion page. Code review. Maybe a Confluence doc nobody reads.
  PIAS allowlist view: patterns.md plus topology_walker.py in CI. Required status check on main. A PR that changes the topology fingerprint fails the build.

What stops the pattern's specific failure mode
  SERP-default view: Nothing specific. The tutorial does not mention MAST. Failures are attributed to "prompt engineering."
  PIAS allowlist view: Each approved pattern carries one numeric cap enforced by a Pydantic Field constraint and one conditional edge that reads it before routing.

What happens when a PR adds a new pattern
  SERP-default view: It merges, because the style guide has no teeth.
  PIAS allowlist view: topology_walker.py fails the PR. Adding a fourth approved pattern requires editing patterns.md, which requires a scoping discussion.

What you keep after the vendor leaves
  SERP-default view: The framework docs. They change with releases.
  PIAS allowlist view: patterns.md (~60 lines YAML) and topology_walker.py (~80 lines Python) in your repo. Your on-call tightens or loosens constraints in a PR.

The pre-merge pattern review checklist

Seven statements. Six are machine-checked by topology_walker.py and friends. The seventh is a one-sentence PR description line that names the pattern. The cheapest PR template in the repo and the one that prevents the most 2am incidents.

pattern review gate

  • scripts/topology_walker.py prints OK and names one of: supervisor-capped-budget, sequential-pipeline-idempotent, planner-executor-bounded-depth.
  • patterns.md in the PR is identical to main or is the PR's intended scope. A patterns.md edit requires its own review thread.
  • For supervisor: state has turn_budget: int = Field(ge=0, le=3) and the supervisor router reads it before routing. No node can mutate turn_budget upward.
  • For pipeline: every node call site includes idempotency_key, and audit_rows writes use ON CONFLICT (case_id, node_name, turn_id) DO NOTHING.
  • For planner-executor: the planner node is referenced by exactly one add_edge call, and len(state.plan.steps) is checked in the executor before iteration.
  • No new file under src/ imports StateGraph except graph.py. A subgraph is a second fingerprint and patterns.md forbids nested supervisors.
  • The PR description names which approved pattern it touches. A PR titled 'add reflection step' cannot merge because reflector is on the reject list.

Anchor fact

Three fingerprints. Everything else fails the build.

Every engagement commits patterns.md and scripts/topology_walker.py on day 1. patterns.md names three approved orchestration patterns by topology fingerprint: supervisor-capped-budget (nodes_max 8, edges_max 8, turn_budget required), sequential-pipeline-idempotent (linear, cycle forbidden, idempotency_key required), planner-executor-bounded-depth (plan_depth <= 4, plan.steps <= 8, planner called once). It also names five rejected patterns with the MAST failure mode each invites: hierarchical (MAST-1.3), swarm (MAST-2.4), debate (MAST-3.1), reflector (MAST-3.2), auction (MAST-1.4). topology_walker.py is a required status check on main. A PR whose graph.py fingerprints into anything outside the three approved shapes cannot merge. patterns.md is the shortest file in the repo and the one doing the most work.

Fingerprint your existing multi agent graph with us

Sixty-minute scoping call with the senior engineer who would own the allowlist landing. You leave with a written topology fingerprint of your current graph.py, which approved pattern it matches (or does not), and an estimate for getting patterns.md plus topology_walker.py wired as a required CI check.

Book a call

Multi agent orchestration patterns, answered

What are multi agent orchestration patterns, in plain terms?

Multi agent orchestration patterns are the named topologies people use to wire several LLM-driven nodes together: supervisor routing to workers, a sequential pipeline, a planner-executor split, a swarm where anyone hands off to anyone, a debate between two critics, a reflector rewriting its own output, and a few more. In production the right question is not which one to use, but which subset is audit-safe and cycle-bounded enough to survive a worker restart at 2am. Our allowlist contains three patterns. The rest we reject by name with the specific MAST failure they invite.

Why reject hierarchical orchestration? It is in almost every tutorial.

A supervisor whose workers are themselves supervisors multiplies the state surface on every nesting level. MAST-1.3 inter-agent misalignment sits at the top of that taxonomy for a reason: the outer supervisor and the inner supervisor develop subtly different routing assumptions, and the only place those assumptions are reconcilable is in the prompt. We reject hierarchical because the single-layer supervisor-with-capped-budget pattern has been sufficient for every use case we have shipped. If a client insists on a nested shape, the honest answer is two separate graphs with a queue between them, not a nested StateGraph.

Why reject swarm / peer-to-peer?

Any agent handing off to any other makes MAST-2.4 infinite handoff inevitable past three agents unless you layer a budget on top. If you are going to layer a budget on top, you have just rebuilt the supervisor pattern with extra edges. The supervisor-with-capped-budget pattern achieves everything swarm promises with one-tenth the routing surface, and its topology fingerprint is enforceable in CI. Swarm is appealing in demos and unscalable in audits.

Why reject the debate / critic loop?

The debate pattern pairs two agents that critique each other until convergence. MAST-3.1 unproductive persistence lands here: the loop can terminate after one turn or after twenty, and which it does is a property of prompt and seed, not of graph structure. Worse, evaluating debate transcripts requires a pairing-specific rubric that does not transfer across model vendors, which breaks our vendor-neutral rule. When clients ask for a critic, we ship a single-pass scorer node and use the supervisor's turn budget to bound any re-routing.

Is planner-executor safe? It is recursive by nature.

It is safe when and only when recursion is forbidden at the graph level. Our approved variant has a single call to the planner per case, a typed Plan with len(steps) <= 8, and a plan_depth bound of 4. The executor iterates the plan list in order. The planner node is not called from within a step. A step is not allowed to return a new plan. MAST-3.2 unbounded self-critique is prevented by the grammar, not by the prompt. If your repo has a plan-inside-plan path, it is not the pattern we approve, and topology_walker will reject the PR.

What is in patterns.md exactly?

patterns.md is a ~60 line YAML file with two top-level keys: approved and rejected. Each approved entry has a name, a fingerprint (nodes_max, edges_max, cycle_policy, fan_out_max, required_state_fields, optional constraints) and the MAST code it prevents. Each rejected entry has a name and the MAST code it invites. The file is source of truth for scripts/topology_walker.py. Changing patterns.md is a scoping event, not a hotfix.

How does topology_walker.py read graph.py?

It parses graph.py with the standard library ast module. It counts every add_node, add_edge, and add_conditional_edges call. It infers fan-out per node by counting how many distinct next-node string literals appear inside each conditional router function. It walks the induced graph and computes whether a cycle exists outside the supervisor-budget pattern. It compares the resulting fingerprint to every approved entry in patterns.md. If no match, it exits non-zero. It is ~80 lines of pure Python. No runtime execution of graph.py is needed.

What if my use case genuinely needs a fourth pattern?

Two answers. First: across five shipped systems at Monetizy.ai, Upstate Remedial, OpenLaw, PriceFox, and OpenArt, we have not needed one. Every use case has fit supervisor-capped-budget, sequential-pipeline-idempotent, or planner-executor-bounded-depth, sometimes composed as two graphs joined by a queue. Second: if your use case truly does not fit, the answer is to scope a week of engagement on defining the fourth pattern's fingerprint, its MAST failure mode, and its CI check. It is not to silently add it during a feature PR. patterns.md makes that discussion explicit.

How is this different from the ceiling numbers (8 edges, 3 turn budget) you publish elsewhere?

The ceiling numbers are one layer of defense. Topology fingerprints are a higher layer. You can satisfy the 8-edge ceiling with a swarm of 4 agents and still have a peer-to-peer topology we reject. topology_walker.py asks a question the edge counter cannot: is the shape of this graph one we are willing to run at 2am? The numbers live inside patterns, not on top of them.

Does this apply to LangGraph, Pydantic AI, and custom DAGs equally?

Yes, because the fingerprint is computed from the source of the graph file, not from a framework-specific runtime. In LangGraph it walks StateGraph calls. In Pydantic AI we parse Agent / Tool definitions the same way. In a custom DAG we parse the .add_step / .add_branch calls in your orchestrator file. patterns.md does not reference a framework. A client who switches from LangGraph to a custom DAG keeps their allowlist. This is part of why the leave-behind survives the engagement.

What happens to all the other orchestration patterns I have seen in papers?

The academic literature describes many more patterns, including voting ensembles, tree-of-thought search, graph-of-thought, skill libraries with tool-caching. We treat them the same way: run a MAST analysis, pick the closest approved pattern, map the paper's idea onto it. Most of what people call 'a new pattern' is a new prompting strategy inside supervisor + worker or planner + executor. The fingerprint does not care about the prompt. It cares about the graph.