Guide, topic: anthropic enterprise compliance deployment

Anthropic ships the compliance surface. Your repo still needs the seven files that wire it up.

BAA, ASL-3 deployment posture, Compliance API, SOC 2 Type II, HIPAA-ready services on the first-party API. None of that is a working compliant Claude agent in your repository. This is the six-week PIAS playbook for landing one, with the file names of the seven artifacts we put on main, the cron that powers the audit sink, and the gate that blocks merges before a region drift becomes an audit finding.

Matthew Diakonov
16 min read
4.9/5, from 5 named production agents on Anthropic, Bedrock, and Vertex
compliance/baa_receipt.md + .github/workflows/compliance-gate.yml as a required CI check
compliance/compliance_api_sink.py polling every 5 minutes, joined by correlation_id
phi_leak_max=0 enforced on eval/cases_compliance.yaml on every PR

Shipped on Monetizy.ai (Bedrock us-east-1), Upstate Remedial (Anthropic direct, HIPAA), OpenLaw (ZDR + Splunk), PriceFox (Vertex multi-tenant), OpenArt (mixed Anthropic + open-weight DAG).

compliance/baa_receipt.md, compliance/redactor.py, compliance/compliance_api_sink.py, eval/cases_compliance.yaml, .github/workflows/compliance-gate.yml, runbook/compliance.md, rubric.yaml#compliance

BAA scope receipt, ASL-3 deployment posture, Compliance API, audit log retention, PHI redaction pre-flight, SIEM webhook, rubric.yaml compliance section, eval/cases_compliance.yaml, compliance-gate.yml, rubric_min_score 0.82, phi_leak_max 0, model vendor neutral, Anthropic direct API, Bedrock invocation, Vertex region pinning, no PIAS hosted runtime

The dangerous halfway point

Most guides on this topic walk you through what Anthropic ships: the BAA, the Compliance API, the AI Safety Level 3 deployment standard, the SOC 2 Type II attestation, the Trust Center. All of that is real, all of that is necessary, and none of it is what an enterprise team actually has to build.

Anthropic ships the BAA, ASL-3 deployment posture, the Compliance API, and SOC 2 Type II. Your VP of Data reads the trust center page and tells the board the compliance question is solved. Six weeks later, the agent is in production, calling an out-of-scope Bedrock region for latency, the prompt still contains the member ID the redactor was supposed to strip, and the audit log lives only in Anthropic's dashboard with a retention window shorter than your auditor wants. Anthropic did everything they promised. The wiring on your side never got built.

The mistake is treating Anthropic's compliance surface as a finished product instead of a primitive. The fix is to treat the customer-side plumbing as code, write it once, version-control it, and gate every merge against it. That is what the seven files do.

Five things Anthropic explicitly does not do for you

Each card below describes something we have watched a real production incident hinge on, for at least one of the five shipped agents. None of these are theoretical. Each one is either a row in eval/cases_compliance.yaml or a job in compliance-gate.yml.

Inputs Anthropic explicitly does not redact for you

Anthropic's BAA covers transit and storage. It does not promise to scrub PHI out of the prompt you send. If the prompt contains a date of service plus a member ID plus a diagnosis code, those three fields are in the trace until you remove them. compliance/redactor.py is the file that does the removing.

Trace correlation Anthropic does not synthesize for you

The Compliance API gives you events on Anthropic's side. Your SIEM gives you events on yours. Joining the two requires a correlation ID you set on every request. Most teams discover this the day an auditor asks for end-to-end timing on a single conversation.

Region and provider drift the BAA is silent on

Your BAA covers a list of endpoints. Bedrock in us-east-1 is one endpoint. Bedrock in eu-west-1 is a different one. The day a developer flips the region for latency, the BAA may no longer apply. The CI gate is what catches that PR before it merges.

Audit log retention windows that outlast Anthropic's defaults

Anthropic retains org-level audit data for a fixed window. Your auditor wants seven years. The sink that pulls Compliance API events into your SIEM is also what keeps a copy under your retention policy, on your storage, on your encryption keys.

PHI in tool outputs the model never read

Tool calls return data the model summarizes back into the prompt of the next turn. A function that returns a patient record without redaction will quietly inject PHI into the conversation history Claude sees. The eval suite has rows that fail the build when this happens, on every PR.
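The fix is to run tool outputs through the same redactor that prompts go through, at the boundary where the output re-enters the conversation. A hedged sketch, with hypothetical function names (redact_stub, append_tool_result) and a one-pattern stand-in for the real redactor:

```python
import re

# One-pattern stand-in for compliance/redactor.py's full pattern set.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_stub(text):
    # re.subn returns (redacted_text, number_of_hits).
    return SSN.subn("[SSN]", text)

def append_tool_result(history, raw_result):
    # Redact BEFORE the tool output becomes part of the next turn's
    # prompt, so raw identifiers never enter the history Claude sees.
    clean, hits = redact_stub(raw_result)
    history.append({"role": "tool", "content": clean, "phi_hits_redacted": hits})
    return history

history = append_tool_result([], "Account for SSN 123-45-6789, balance $120")
```

The stored hit count is what an eval row can assert against: a tool that suddenly starts returning unredacted records shows up as a nonzero leak, not a silent poisoning of the history.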

Anchor fact: the seven files we put on main in six weeks

Every PIAS engagement on a regulated Claude deployment lands the same seven artifacts in the client repo. None of them are hosted by us. None of them require a platform license. The file names are identical across the five shipped agents, so an engineer who has worked on one can read the others cold.

Seven files. One CI gate. Zero PIAS-hosted services.

  1. compliance/baa_receipt.md. Inventory of (provider, region, model_id) tuples your signed BAA covers, plus the explicit not_covered list. The legal reviewer's name and signature date are in the file.
  2. compliance/redactor.py. Runs on every prompt before it hits Claude. Patterns for SSN, MRN, member IDs, ICD-10, DOB, email, phone. Returns a hit count the eval rows score against.
  3. compliance/compliance_api_sink.py. Polls the Anthropic Compliance API every 5 minutes, joins to your trace correlation ID, forwards to your SIEM webhook, escrows a copy to your audit bucket on your encryption keys.
  4. eval/cases_compliance.yaml. Rows that fail the build on a single PHI surface form surviving the redactor or a single tool-output injection.
  5. .github/workflows/compliance-gate.yml. Required CI check with three jobs: baa-scope, redactor-coverage, phi-leak-eval. Pages the engagement owner on failure.
  6. runbook/compliance.md. On-call playbook. Read out loud with three on-call engineers in week 5. If anything is unclear, the runbook is rewritten that day.
  7. rubric.yaml#compliance. Compliance section in the rubric so model qualification, regression tail mining, and the compliance gate share one source of truth for thresholds.

compliance/baa_receipt.md, the file the gate reads on every PR

The exact shape we ship. The covered list is the only set of (provider, region, model_id) tuples your code is allowed to call. Adding a row requires a new signed amendment from Anthropic and a PR that updates this file in the same commit as the code that uses the new tuple. The not_covered list is what stops a developer from quietly enabling an EU region for latency.

compliance/baa_receipt.md
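The machine-readable block might look like the following sketch. Field names and placeholders are illustrative; the real file carries the exact model IDs from the signed amendment and the reviewer's sign-off, neither of which can be invented:

```yaml
covered:
  - provider: bedrock
    region: us-east-1
    model_id: "<pinned-model-id>"   # exact ID from the signed amendment
not_covered:
  - provider: bedrock
    region: eu-west-1
    reason: "no signed amendment; do not enable for latency"
reviewed_by: "<legal reviewer name>"
signed_date: "<YYYY-MM-DD>"
amendment_ref: "<document reference>"
```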

compliance/redactor.py, the only PHI scrubber we ship

Stays under 200 lines on purpose. Coverage is guaranteed by the eval suite, not by the size of the file. New patterns land here only with a corresponding row in eval/cases_compliance.yaml, so a regex change without a test case fails the gate.

compliance/redactor.py
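A minimal sketch of the shape, with illustrative patterns; the shipped file's patterns are tuned to each client's actual PHI surface forms, and every pattern has a matching eval row:

```python
import re
from typing import NamedTuple

# Illustrative pattern set. The MRN and ICD-10 forms here are assumptions;
# real deployments encode the surface forms their domain actually emits.
PATTERNS = [
    ("SSN",   re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")),
    ("DOB",   re.compile(r"\b\d{2}/\d{2}/\d{4}\b")),
    ("MRN",   re.compile(r"\bMRN[-: ]?\d{6,10}\b")),
    ("ICD10", re.compile(r"\b[A-TV-Z]\d{2}\.\d{1,4}\b")),  # dotted codes only
]

class RedactResult(NamedTuple):
    text: str
    hits: int  # the count the eval rows score against

def redact(text: str) -> RedactResult:
    hits = 0
    for label, pattern in PATTERNS:
        text, n = pattern.subn(f"[{label}]", text)
        hits += n
    return RedactResult(text, hits)
```

Returning the hit count, not just the cleaned text, is what lets eval/cases_compliance.yaml assert both that known surface forms are caught and that a clean control input produces zero hits.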

compliance/compliance_api_sink.py, the 5-minute audit pipe

Polls the Anthropic Compliance API, normalizes each event, joins to your trace by the correlation_id you set on every Claude request, forwards to your SIEM webhook, and escrows a copy to your audit bucket so retention outlasts Anthropic's defaults. The cursor in compliance/_state.json is what makes the sink crash-resumable.

compliance/compliance_api_sink.py
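A sketch of the sink's core pieces. The Compliance API's endpoint and payload schema are not reproduced here; the event field names below are assumptions, and the correlation_id is the value your code set on the original Claude request, echoed back by Anthropic. The cursor file and the normalize-then-forward path are the parts that matter:

```python
import json
import pathlib
import urllib.request

STATE = pathlib.Path("compliance/_state.json")        # crash-resume cursor
SIEM_WEBHOOK = "https://siem.example.internal/hook"   # hypothetical endpoint

def load_cursor():
    # Resume from the last acknowledged event after a crash or redeploy.
    return json.loads(STATE.read_text())["cursor"] if STATE.exists() else None

def save_cursor(cursor):
    STATE.write_text(json.dumps({"cursor": cursor}))

def normalize(event):
    # Field names on the incoming event are assumptions about the
    # Compliance API payload shape.
    return {
        "source": "anthropic_compliance_api",
        "event_id": event["id"],
        "correlation_id": event.get("correlation_id"),
        "label": event.get("label"),
        "ts": event.get("created_at"),
    }

def forward(normalized):
    # POST one normalized event to the SIEM webhook; a second copy would
    # be escrowed to the audit bucket on your keys (omitted here).
    req = urllib.request.Request(
        SIEM_WEBHOOK,
        data=json.dumps(normalized).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)
```

Saving the cursor only after the SIEM POST succeeds is the design choice that makes the sink at-least-once rather than at-most-once: a crash mid-batch replays events instead of dropping them, and the SIEM dedupes on event_id.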

eval/cases_compliance.yaml, the rows that fail the build on phi_leak_max=0

Six representative rows from a real engagement. Notice the tool-return injection case: tool calls return data the model summarizes back into the next turn, so a function that returns a patient record without redaction quietly poisons the conversation history. The eval row is what catches it.

eval/cases_compliance.yaml
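Illustrative rows, with field names assumed rather than copied from a real engagement file:

```yaml
suite: compliance
thresholds:
  phi_leak_max: 0
cases:
  - id: ssn-inline
    input: "My SSN is 123-45-6789, can you update my plan?"
    expect_redacted: ["123-45-6789"]
  - id: mrn-prefixed
    input: "Chart MRN-1234567 shows the latest labs."
    expect_redacted: ["MRN-1234567"]
  - id: tool-return-injection
    tool_result: '{"patient": "Jane Doe", "dob": "01/02/1980"}'
    expect_redacted: ["01/02/1980"]
  - id: icd10-dx
    input: "Dx E11.9 confirmed on the last visit."
    expect_redacted: ["E11.9"]
  - id: member-id-email
    input: "Send it to jdoe@example.com, member MBR-00112233."
    expect_redacted: ["jdoe@example.com", "MBR-00112233"]
  - id: clean-control
    input: "What is the current APR on the standard plan?"
    expect_redacted: []
```

The clean-control row matters as much as the leak rows: a redactor that over-matches and mangles benign text is a different bug, and this row catches it.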

.github/workflows/compliance-gate.yml, the required check

Three jobs, all required for merge. baa-scope rejects any PR that introduces a (provider, region, model) tuple not in the receipt. redactor-coverage AST-walks the repo and rejects any PR that adds a Claude call site reachable without redact() in its call graph. phi-leak-eval enforces phi_leak_max=0 on the eval suite.

.github/workflows/compliance-gate.yml
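A sketch of the workflow, assuming helper scripts under scripts/ that do the actual checking (those names are illustrative, not the shipped paths):

```yaml
name: compliance-gate
on: pull_request

jobs:
  baa-scope:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Reject tuples missing from the BAA receipt
        run: python scripts/check_baa_scope.py compliance/baa_receipt.md
  redactor-coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Reject Claude call sites that bypass redact()
        run: python scripts/check_redactor_coverage.py
  phi-leak-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Enforce phi_leak_max=0
        run: python -m eval.run --suite eval/cases_compliance.yaml --phi-leak-max 0
```

Marking all three jobs as required checks in branch protection is what turns this from advice into a gate.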

rubric.yaml gains a compliance section in week 2

One source of truth for compliance thresholds. The model qualification PR, the regression tail-mining job, and the compliance gate all read the same numbers. When the threshold changes, it changes in one file.

rubric.yaml
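The section might look like this sketch; the threshold names beyond rubric_min_score and phi_leak_max are assumptions:

```yaml
compliance:
  rubric_min_score: 0.82
  phi_leak_max: 0
  baa_receipt: compliance/baa_receipt.md
  required_join_rate_24h: 1.0   # sink correlation_id join rate at cutover
```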

A literal cron tick, traced

Captured from a recent run on a Monetizy.ai-style deployment. No human kicked this off. The cron fires every five minutes. The gate report runs as a side effect on every tick so the next PR's feedback latency is zero.

compliance_api_sink + gate_redactor_coverage -- 2026-04-23 14:25 UTC

The wiring: Anthropic's surface in, your enforcement out

Left: everything Anthropic ships on the platform side. Center: the seven-file customer-side wiring that turns it into enforcement. Right: the artifacts your engineers, your auditors, and your on-call engineers actually use.

Anthropic compliance surface -> seven-file wiring -> your enforcement

Left (Anthropic's side): BAA + HIPAA-ready services, Compliance API event feed, ASL-3 deployment posture.
Center: the seven-file wiring.
Right (your side): required CI gate on every PR, joined SIEM stream + 7-year escrow, runbook + on-call playbook.

The six-week shape, week by week

Each week ends with a specific artifact on main. The week-2 prototype gate and the week-6 production gate are pass-or-fail rubrics with refund and exit clauses on the first two. PIAS is gone at week 6.

Week 0 -- scoping call writes the compliance one-pager

60-minute call with the senior engineer who would own the build. Output is a one-pager naming the regulated data classes, the BAA status, the SIEM you already run, and the retention window your auditor expects. No 40-page roadmap. The compliance one-pager is what week 1 starts from.

Week 1 -- engineer in the repo, baa_receipt.md goes on main

Named senior engineer in your GitHub, Slack, and standup. First PR within seven days, or billing pauses. That PR adds compliance/baa_receipt.md, the inventory of covered endpoints, regions, and model IDs. If the BAA is not yet signed, this same PR ships in placeholder mode and the workflow gate runs in warn-only.

Week 2 -- redactor.py, eval rows, compliance section in rubric.yaml

compliance/redactor.py lands with the ten or so PHI surface forms your domain actually emits. eval/cases_compliance.yaml gets its first six rows. rubric.yaml gains its compliance section. By the end of week 2, the prototype runs in your staging behind a feature flag, hitting your data. The phi_leak_max=0 gate is enforced from this PR forward.

Week 3 -- Compliance API sink wired to your SIEM

compliance/compliance_api_sink.py runs on a 5-minute cron, pulls the Anthropic Compliance API events, joins them to the correlation ID we set on every Claude call, and forwards normalized events to your SIEM webhook. A second copy lands in your audit bucket so retention outlasts Anthropic's defaults. The cursor in compliance/_state.json is what makes the sink resumable.

Week 4 -- compliance-gate.yml is required for merge

.github/workflows/compliance-gate.yml turns on as a required check. Three jobs: baa-scope, redactor-coverage, phi-leak-eval. A PR that flips a region, swaps a provider, or skips the redactor cannot merge. The escalation path opens a GitHub issue and posts to your compliance Slack channel automatically.

Week 5 -- runbook.md walked through with on-call

runbook/compliance.md is read out loud with three on-call engineers in the room. BAA scope, redactor coverage report, Compliance API cursor, SIEM webhook URL, rollback PR template. The runbook is read cold by an engineer who was not on the engagement. If anything is unclear, the runbook is rewritten that day, not after handoff.

Week 6 -- production cutover and handoff, all seven files on main

Feature flag flipped to 100%. The eval gate is green. The Compliance API sink is at 100% join rate on the last 24 hours. The seven files (baa_receipt.md, redactor.py, compliance_api_sink.py, cases_compliance.yaml, compliance-gate.yml, runbook/compliance.md, rubric.yaml's compliance section) are version-controlled in your repo. PIAS is gone. The senior engineer who built the system is available for paid two-hour consults at a capped rate for twelve months. No retainer.

How this differs from the common playbook

Left: the PIAS week-6 leave-behind. Right: the shape most teams end up with when they treat Anthropic's compliance surface as a finished product. Every row is a difference we have watched matter on a real outage or a real audit.

Feature: Where the BAA lives in your engineering process
Trust-the-vendor default: A signed PDF in someone's email. Engineering does not see it. The first time it matters is during an audit, and by then the agent has been calling an out-of-scope endpoint for three months.
PIAS leave-behind: compliance/baa_receipt.md on main, with the exact API endpoints, regions, and model IDs the BAA covers, plus the reviewer who signed it. Every PR that touches model_provider or region runs a check that reads this file.

Feature: How PHI or PII gets out of the prompt before it hits Claude
Trust-the-vendor default: Best-effort string replace inside one tool wrapper, with no test that fails when someone adds a new call site that bypasses it. Coverage decays silently as the agent grows.
PIAS leave-behind: compliance/redactor.py is invoked from every Claude call site. A regression test in eval/cases_compliance.yaml fails the build if a known PHI surface form survives the redactor.

Feature: What happens when Anthropic returns a response that touches a sensitive label
Trust-the-vendor default: Trace lives in a vendor observability dashboard. Your SIEM has no idea it happened. Two systems, two IDs, no join.
PIAS leave-behind: Trace is annotated with compliance: {scope, label}, dropped into compliance/compliance_api_sink.py's batch, and forwarded to your SIEM with the same correlation ID Anthropic's Compliance API uses. One ID across both systems.

Feature: How the Compliance API gets pulled into your audit pipeline
Trust-the-vendor default: Someone exports CSVs once a quarter when an auditor asks. The retention window may or may not cover the question being asked.
PIAS leave-behind: compliance/compliance_api_sink.py polls the Anthropic Compliance API on a 5-minute cron, dedupes against the last cursor in compliance/_state.json, and writes a normalized event into your SIEM webhook. No CSV exports, no manual review.

Feature: How a model swap (Claude version, Bedrock region, Vertex region) is gated
Trust-the-vendor default: Engineer changes a string in a config file, opens a PR, gets it merged in 30 minutes. Compliance team finds out at the next quarterly review.
PIAS leave-behind: .github/workflows/compliance-gate.yml checks that the new model_id, provider, and region are present in compliance/baa_receipt.md before letting the PR merge. If they are not, the workflow fails and the engagement owner is paged.

Feature: What the runbook says when an incident lands at 02:00
Trust-the-vendor default: A Confluence page that has not been updated since the original signing. The on-call engineer is paging the original author who left the company seven months ago.
PIAS leave-behind: runbook/compliance.md has the BAA scope, the redactor coverage report, the Compliance API cursor, the SIEM webhook URL, and the rollback PR template. Three on-call engineers can read it cold.

Feature: Vendor and platform dependency
Trust-the-vendor default: A dashboard you do not own, behind an SSO you cannot escrow, on a contract that renews on a date you do not control.
PIAS leave-behind: All seven files are version-controlled in your repo. The redactor and the sink read from your existing trace store. No PIAS-hosted runtime, no platform license, no agent framework.

The same seven files, five different stacks

Each card is a named production agent with the playbook live on main. Provider, region, regulated data class, and SIEM differ. The seven file shapes do not.

Monetizy.ai -- outbound email orchestrator on Bedrock

BAA covers Bedrock us-east-1 only. compliance-gate.yml has rejected three PRs that tried to flip to us-west-2 for latency. Each rejection became a 30-minute conversation with legal, not a quarterly audit finding.

Upstate Remedial -- 400K+ debt-notice emails, mixed PHI surface

redactor.py covers SSN, MRN, member IDs, and the four state-specific debt-notice numbering schemes. eval/cases_compliance.yaml has 41 rows. Two of them are tool-return injection tests that have caught real regressions on model upgrades.

OpenLaw -- AI-native legal editor, Anthropic direct API

ZDR header is asserted on every request by an eval row. The Compliance API sink writes to the firm's existing Splunk index using the same correlation ID the editor's frontend sets on each session.

PriceFox -- multi-tenant retrieval, Vertex region pinning

Each tenant's BAA scope is a separate row in compliance/baa_receipt.md. The gate rejects a PR that would route a tenant's traffic through a region not in their row. No tenant has ever seen another tenant's traces, by construction.

OpenArt -- per-scene DAG, Anthropic + open-weight inference

Open-weight nodes are explicitly marked out_of_scope in baa_receipt.md and the gate enforces that any node tagged contains_phi must route to a covered Anthropic endpoint. The redactor runs on every node boundary, not only at the entrypoint.

Seven files in your repo, one cron, one required gate. Anthropic shipping the BAA, the Compliance API, and ASL-3 is the floor. The wiring on your side is what determines whether the agent is actually compliant in production, or just compliant in a slide deck.

PIAS leave-behind on Monetizy.ai, Upstate Remedial, OpenLaw, PriceFox, OpenArt

Receipts

File counts and cadence facts, not invented benchmarks. Per-client production metrics are on /wins.

7 files the week-6 leave-behind adds to main for compliance wiring
6 weeks from scoping call to compliance-gate.yml as a required merge check
5-minute cron interval at which compliance_api_sink.py polls Anthropic
$0 platform license fee for any of the seven files (your repo, your CI, your SIEM)

The seven-file rule is the same discipline that makes new-model qualification a single-PR job. Everything else is on purpose too: zero USD in platform license fees, zero vendor-hosted runtimes.

Want the seven files on main in your repo in six weeks, not a 60-page assessment?

60-minute scoping call with the senior engineer who would own the build. You leave with a one-pager: the regulated data classes, the BAA status, the SIEM you already run, the retention window, and a named engineer to deliver it.

Anthropic enterprise compliance deployment, the questions procurement asks

What does Anthropic actually cover in an enterprise deployment, and what is my team responsible for?

Anthropic covers transit, storage, the model platform, ASL-3 deployment posture, the Compliance API as a feed of events on their side, SOC 2 Type II, and a BAA on the Claude Enterprise plan and the first-party API for HIPAA-ready services. Your team is responsible for everything that happens before the request reaches Anthropic and after the response leaves it. That means PHI/PII redaction in the prompt, correlation IDs that join Anthropic's events to your traces, audit log retention beyond Anthropic's defaults, gating model and region changes against your signed BAA scope, and a runbook your on-call engineer can read at 02:00. The seven files this guide names are the customer-side wiring for those responsibilities.

Where do these seven files live in the repo and what is each one for?

compliance/baa_receipt.md is the inventory of endpoints, regions, and model IDs your signed BAA covers, plus the legal reviewer who signed it. compliance/redactor.py runs on every prompt before it hits Claude. compliance/compliance_api_sink.py polls Anthropic's Compliance API on a 5-minute cron and forwards events to your SIEM webhook with a join key. eval/cases_compliance.yaml has rows that fail the build on a single PHI leak. .github/workflows/compliance-gate.yml is the required CI check with three jobs (BAA scope, redactor coverage, PHI eval). runbook/compliance.md is the on-call playbook. rubric.yaml gains a compliance section so model qualification and regression tail mining read the same thresholds. All seven are version-controlled in the client's repo, none are PIAS-hosted.

We already have a BAA with Anthropic. Why do we still need compliance/baa_receipt.md?

Because the signed PDF lives in legal's drive, and engineering writes the code that picks endpoints, regions, and model IDs. Without a checked-in receipt that the CI gate can read, a developer flipping a region for latency is a one-line PR that ships in 30 minutes, and the audit finding shows up six months later. baa_receipt.md gives the gate a machine-readable list of (provider, region, model) tuples that are in scope, and a not_covered list of tuples that explicitly are not. The PR cannot merge if any new tuple is missing from covered, and the file's diff is the audit trail of every scope change.
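A toy version of the baa-scope check, with the receipt's tuples inlined as Python data instead of parsed from compliance/baa_receipt.md (the tuple values are made up for illustration):

```python
def check_scope(covered, not_covered, requested):
    # Returns a list of violations; an empty list means the PR may merge.
    covered_set = {tuple(t) for t in covered}
    blocked_set = {tuple(t) for t in not_covered}
    violations = []
    for t in requested:
        t = tuple(t)
        if t in blocked_set:
            violations.append((t, "explicitly not_covered"))
        elif t not in covered_set:
            violations.append((t, "missing from covered"))
    return violations

receipt_covered = [("bedrock", "us-east-1", "model-a")]
receipt_blocked = [("bedrock", "eu-west-1", "model-a")]

ok = check_scope(receipt_covered, receipt_blocked,
                 [("bedrock", "us-east-1", "model-a")])
drift = check_scope(receipt_covered, receipt_blocked,
                    [("bedrock", "eu-west-1", "model-a")])
```

The two-list design is deliberate: a tuple absent from both lists fails too, so a brand-new provider or region cannot sneak in just because nobody thought to forbid it.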

Why a custom redactor and not just trust Anthropic's filters?

Anthropic's filters operate on the model side. They protect against the model emitting harmful content; they do not promise to strip PHI out of the prompt you sent. If your prompt contains an SSN, an MRN, a member ID, or a date of birth, those values are in the request payload, in the trace, and in any retained audit data until you remove them. The redactor lives in your code so you control which surface forms get scrubbed and which do not, the eval suite asserts that every Claude call site in the repo invokes it, and the CI gate fails if a new call site bypasses it.

How does compliance/compliance_api_sink.py join Anthropic's events to traces in our SIEM?

Every Claude call sets request_metadata.correlation_id at the SDK call site. Anthropic echoes that ID back on the matching Compliance API event. The sink polls the Compliance API on a 5-minute cron, normalizes each event into your SIEM's schema, sets the correlation_id field to the value Anthropic echoed, and POSTs to your SIEM webhook. Your SIEM then has one ID across both systems, so an end-to-end query (prompt to response to compliance event) is one filter, not a join across two dashboards. The sink also escrows a copy to your audit bucket so retention outlasts Anthropic's defaults.
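The join itself is trivial once both sides carry the same ID; a toy illustration with made-up records:

```python
def join_by_correlation(traces, compliance_events):
    # One dict lookup per trace: the payoff of setting the ID at the
    # call site instead of reconciling two ID spaces after the fact.
    by_id = {e["correlation_id"]: e for e in compliance_events}
    return [
        {**t, "compliance": by_id.get(t["correlation_id"])}
        for t in traces
    ]

traces = [{"correlation_id": "c-1", "latency_ms": 840}]
events = [{"correlation_id": "c-1", "label": "phi_reviewed"}]
joined = join_by_correlation(traces, events)
```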

What does the compliance-gate.yml workflow actually block?

Three classes of merges. (1) baa-scope rejects any PR that introduces a (provider, region, model) tuple not present in compliance/baa_receipt.md under covered, or any tuple present under not_covered. (2) redactor-coverage AST-walks the repo and rejects any PR that adds a Claude SDK call site reachable without compliance.redactor.redact in its call graph. (3) phi-leak-eval runs eval/cases_compliance.yaml against the proposed code with phi_leak_max=0, so a single PHI surface form surviving the redactor fails the build. On failure, an issue is opened against the engagement owner and your compliance Slack channel is pinged automatically.
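A deliberately simplified version of the redactor-coverage idea: flag any module that calls messages.create but never references redact. The real job walks the call graph, which is stricter; this per-module check is only a sketch of the mechanism:

```python
import ast

def bypasses_redactor(source: str) -> bool:
    # True if the module calls some .create attribute (a stand-in for a
    # Claude SDK call site) without any reference to redact.
    tree = ast.parse(source)
    calls_claude = any(
        isinstance(n, ast.Attribute) and n.attr == "create"
        for n in ast.walk(tree)
    )
    uses_redact = any(
        (isinstance(n, ast.Name) and n.id == "redact")
        or (isinstance(n, ast.Attribute) and n.attr == "redact")
        for n in ast.walk(tree)
    )
    return calls_claude and not uses_redact

bad = "resp = client.messages.create(model=m, messages=msgs)"
good = "msgs = redact(msgs); resp = client.messages.create(model=m, messages=msgs)"
```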

How is this engagement different from hiring a Big 4 firm to do an Anthropic compliance assessment?

Three differences that show up in the SOW. The engineer you meet in the scoping call is the same engineer writing the PRs, named in the SOW, not a partner-plus-grads rotation. The output is seven files on main in your repo, not a 60-page assessment with recommendations to staff a separate implementation. First PR within seven days or billing pauses, week 2 prototype gate, week 6 production gate, with refund and exit clauses on the first two. A typical Big 4 compliance engagement starts with an 8-week kickoff and ends with a slide deck. Ours starts with a PR.

What if we are on Bedrock or Vertex instead of the Anthropic direct API? Does the playbook change?

The shape of the seven files does not change. baa_receipt.md gains rows for the AWS or Google BAA scopes you have signed for those providers; the gate reads the same file. The Compliance API sink is replaced by the equivalent CloudTrail or Cloud Audit Logs reader for the model invocation events, with the same correlation_id join key. The redactor and the eval suite are unchanged. The model_provider field in your config tells the gate which scope file and which sink module to enforce, all in the same rubric.yaml. Three of the five named clients we have shipped this on are on Bedrock or Vertex, not direct Anthropic.

What happens to the compliance pipeline when Anthropic ships a new Claude version?

Two things, in this order. First, the model qualification PR lists the new model_id in compliance/baa_receipt.md under covered, and the gate confirms the BAA actually covers it (this is a manual step with legal, not an automated assumption). Second, the regression eval, including eval/cases_compliance.yaml, runs the new model against every PHI surface form your repo has ever shipped a row for. If any row regresses, the qualification PR cannot merge. The compliance bar grows as the rubric grows; a new model that passes the original eval but regresses on a year of accumulated PHI surface forms cannot make it to production. This is how compliance becomes a moat instead of a one-time approval.

Who owns the IP, the repo, and the seven files at the end of the engagement?

You do, from clause one of the MSA, in writing, before the engineer starts week 1. No platform license, no proprietary agent framework, no PIAS-hosted runtime, no data residency on PIAS infrastructure. The seven files are version-controlled in your GitHub from the first PR. The runbook is read by your on-call engineers, not ours. The Compliance API key is in your secret store, not ours. The audit bucket is on your cloud, on your encryption keys. At the end of week 6 we are gone and the system runs without us in the room. The engineer who built it is available for paid two-hour consults at a capped rate for 12 months, no retainer.