AI agent memory scope is a safety property, not a framework checkbox.
Most articles about agent memory scope describe a checkbox: pass user_id to Mem0, set a namespace on Letta, put a customer key in Zep session metadata. The shape that survives production is a different artifact: a single rules file in the repo that declares the namespace key, five composed scope axes, the read and write match for each, and a probe set that proves the boundary holds before every merge. This page documents the file shape we ship into client repos to get there.
Why "memory scope" deserves its own file
Scope lives in three places in most agent codebases: in the framework call (mem0.add(messages, user_id=...)), in scattered helper functions that build retrieval filters, and in a Notion doc that says "we are multi-tenant." None of those three are reviewable in a PR, none of them are testable in CI, and none of them survive a refactor. The first time a developer adds a new retrieval path and forgets to compose the tenant_id into the key, the boundary breaks and nobody sees it until a customer reports seeing data they should not have seen.
The fix is to put scope in the same shape as any other production policy. One YAML file in the repo, memory/scope_rules.yaml, that declares the namespace key, the axes, and the rules. One function in the repo, build_namespace_key(), that every read and every write goes through. One probe file, tests/scope_leak_probes.yaml, that fires on every PR. One gate, scripts/scope-leak-check.sh, that fails the build the first time a probe retrieves a row it should not have. The whole package is roughly 600 lines of code and config and earns its keep the first time it catches a bare-axis bug in a code review.
memory/scope_rules.yaml
The whole rules file. Five axes, one key composition, two shared namespaces, one read group, one write block list. It is intentionally small; the value comes from being declared in one place that gets reviewed in a PR like any other production policy.
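Below is a sketch of the shape, reconstructed from the rules this page describes. The axis names, leak_tolerance, ttl_seconds, shared_namespaces, read_groups, and write_block_fields all come from the sections that follow; the specific write_match values (role:kb_editor, ci_pipeline_only), owners, and the delimiter are illustrative placeholders, not a fixed schema.

```yaml
# memory/scope_rules.yaml -- illustrative sketch; your exact schema will differ
namespace_key: "{tenant_id}/{user_id}/{agent_id}/{persona_id}/{session_id}"

axes:
  tenant_id:
    read_match: exact
    write_match: exact
    leak_tolerance: 0.0        # by law, not by taste
  user_id:
    read_match: exact
    write_match: exact
  agent_id:
    read_match: exact          # exact by default; read_groups below relax reads only
    write_match: exact
  persona_id:
    read_match: exact
    write_match: exact
  session_id:
    read_match: exact
    write_match: exact
    ttl_seconds: 1800          # enforced by the store, not by hope

shared_namespaces:
  - name: tenant_kb            # tenant-wide knowledge base
    axis: user_id              # shared across users within one tenant
    write_match: role:kb_editor
    owner: platform-team
    audited: true
  - name: agent_canon          # the agent's canonical product facts
    axis: session_id           # shared across sessions of one agent
    write_match: ci_pipeline_only
    owner: agents-team
    audited: true

read_groups:
  - name: support_warm_handoff
    members: [support-agent-v1, support-handoff-bot]
    axis: agent_id             # reads allowed between members; tenant and user still exact

write_block_fields:
  - credit_card_number
  - ssn
  - auth_token
  - api_key
  - password
```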
The five axes, one card each
The order in the namespace key reflects how strong each boundary is. A cross-tenant leak is the only failure that ends up in front of outside counsel. A cross-session leak is a usability bug. The other three sit in between.
tenant_id, the only axis where leak_tolerance is 0.0 by law, not by taste
tenant_id is the customer organization. A row that crosses tenant_id is the failure mode that ends a multi-tenant SaaS contract and, depending on the data, becomes a regulatory event. read_match and write_match are both exact. The 12 tenant_id probes are the most defensive in the suite because every other axis failure can be argued about internally; this one shows up in the breach notification.
user_id, hard boundary by default, opt-in shared sub-namespaces only
user_id is the end user inside the tenant. Default match is exact on both read and write. Sharing across users (a household account, a team workspace) has to go through the shared_namespaces block, with an explicit name, an owner, and an audit trail. We never let a write infer a shared scope from heuristics; a row enters a shared namespace because the rules file says it can.
agent_id, soft boundary that allows warm handoffs but not free reads
agent_id distinguishes one agent persona from another (support vs sales vs onboarding). Default is exact match. The read_groups block lets a support handoff bot read the support agent's per-user notes during a warm transfer, without granting either bot a free read into the other's space. The handoff is a named relationship, not a side effect.
persona_id, the axis you only need when one agent_id runs in multiple modes
If your support agent serves both customers and internal CSMs, persona_id keeps the customer-facing memory away from the internal-tools memory inside the same agent_id. Most projects do not need it on day one. We add it the first time someone asks 'can the agent see what the CSM saw last week?' because the answer has to be no and the only way to make it stay no is a separate axis.
session_id, the working-memory boundary that expires
session_id is the only axis with a TTL. Working-memory tier writes go here and only here. Long-term tier writes ignore session_id at write time. The session_id probes catch the common bug where a working scratchpad accidentally got promoted to long-term and now bleeds into the next session. ttl_seconds: 1800 in the rules file is enforced by the store, not by hope.
What composing the key actually buys you
Every retrieval and every write goes through a single function that builds the namespace key from the five axes. There is no path in the agent code that calls retrieve(user_id=...) directly with a bare user_id. That is the only way the rules file is load-bearing; if the agent code can compose its own namespace strings, the rules file is theater.
The one function every read and every write goes through
A single function, build_namespace_key(), assembles the key from the five axes and refuses to return a key with a missing or delimiter-poisoned axis value. Every retrieval path and every write path imports this function. There is no other path.
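A minimal sketch of that function, assuming a / delimiter and the axis order declared in the rules file; ScopeError is an illustrative exception name:

```python
# memory/build_namespace_key.py -- minimal sketch; assumes "/" as the axis delimiter
AXES = ("tenant_id", "user_id", "agent_id", "persona_id", "session_id")
DELIMITER = "/"


class ScopeError(ValueError):
    """Raised instead of ever returning a partial or poisoned key."""


def build_namespace_key(**axis_values: str) -> str:
    key_parts = []
    for axis in AXES:
        value = axis_values.get(axis)
        # Check 1: every axis must be present and non-empty. A missing axis
        # silently widens the scope, which is exactly the bare-axis bug.
        if not value:
            raise ScopeError(f"missing axis: {axis}")
        # Check 2: an axis value containing the delimiter could splice itself
        # into a neighboring axis and forge a foreign namespace.
        if DELIMITER in value:
            raise ScopeError(f"delimiter-poisoned axis: {axis}={value!r}")
        key_parts.append(value)
    return DELIMITER.join(key_parts)
```

Anything that cannot produce a full five-axis key raises instead of returning a partial key, so a missing axis fails loudly at the call site rather than silently widening the scope.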
The two checks are deliberately tiny. The point is not the sophistication of the function; the point is that there is exactly one of them. A grep for retrieve(, .add(, .upsert(, and .search( across the agent repo should turn up zero call sites that do not flow through build_namespace_key().
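One crude way to run that sweep, as a sketch (paths, file extensions, and method names will vary by repo; this lists candidate call sites for human review rather than proving data flow):

```bash
# list memory call sites that do not mention build_namespace_key on the same line;
# every hit is a candidate bare-axis call that needs a look in review
grep -rnE '\.(retrieve|add|upsert|search)\(' src/ --include='*.py' \
  | grep -vF 'build_namespace_key'
```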
tests/scope_leak_probes.yaml
32 probes total, allocated by axis. Every probe writes one row in one scope and tries to read it from a different scope. The probe passes only if the read returns zero rows. Six probes are shown below, including a regression probe that captures a real bug we shipped (and fixed) once.
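A sketch of six of the probes, using the write_as / read_as / expect_rows shape described in the steps below; the ids match the ones named on this page, and the axis values are placeholders:

```yaml
# tests/scope_leak_probes.yaml -- six of the 32 probes; schema is illustrative
probes:
  # Basic cross-tenant read: nothing written under tenant_a may surface in tenant_b.
  - id: tenant_leak_001
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: tenant_b, user_id: u_9, agent_id: support-agent-v1, persona_id: customer, session_id: s_2}
    expect_rows: 0

  # Bare-axis trap: same user_id under two tenants. Catches any retrieve path
  # that keys on user_id alone instead of the composed namespace key.
  - id: tenant_leak_002
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: tenant_b, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_2}
    expect_rows: 0

  # Cross-user read inside one tenant.
  - id: user_leak_001
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: tenant_a, user_id: u_2, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    expect_rows: 0

  # Cross-agent read outside any declared read_group.
  - id: agent_leak_001
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: tenant_a, user_id: u_1, agent_id: sales-agent-v1, persona_id: customer, session_id: s_1}
    expect_rows: 0

  # Session scratchpad must not bleed into the next session.
  - id: session_leak_001
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_2}
    expect_rows: 0

  # Regression probe for a bug we shipped once: a retrieve path fell back to a
  # "global" namespace when an axis was missing. Stable id, never deleted.
  - id: regression_011_global_namespace_fallback
    write_as: {tenant_id: tenant_a, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_1}
    read_as:  {tenant_id: global, user_id: u_1, agent_id: support-agent-v1, persona_id: customer, session_id: s_2}
    expect_rows: 0
```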
What a probe does, step by step
Every probe is just a write under one set of axes followed by a read under a different set, with an assertion that the read returns nothing. The runner is roughly 200 lines of Python; a condensed sketch follows the numbered steps below.
1. Load the rules and the probe file
scripts/scope-leak-check.sh loads memory/scope_rules.yaml and tests/scope_leak_probes.yaml at start. If either file fails to parse, the build fails before the first probe runs. The agent's runtime loads the same rules file, so what the probe enforces is what the agent enforces.
2. For each probe, write under write_as
The runner builds the namespace_key from the probe's write_as block (tenant_id, user_id, agent_id, persona_id, session_id), writes the probe payload, and records the resulting row_id. Writes go to a staging memory store that is wiped after the run; we never run probes against production memory.
3. Then read under read_as, with all five axes substituted
The runner builds a different namespace_key from the probe's read_as block, calls retrieve(), and captures the rows returned. read_as differs from write_as on at least one axis by construction; if the read returns the row written in step 2, the boundary on that axis does not hold.
4. Compare returned rows to expect_rows
Every leak probe expects expect_rows: 0. If the returned count is greater than zero, the runner records a P0 into memory/scope_audit.jsonl with the probe id, the write namespace_key, the row id, and the read namespace_key, and exits non-zero.
5. Tear down the staging store row
Each probe deletes its written row before the next probe runs, so probes do not interfere. The audit log is the only thing that survives across runs and is committed back to the repo by CI for the post-mortem trail.
6. Add a stable-id regression probe whenever a leak is found
Every leak that ever shipped becomes a permanent probe with a stable id (regression_004_user_misspelling, regression_011_global_namespace_fallback, etc.). The probe set grows; it never shrinks. Six months in, the regression probes are the most valuable part of the suite because they catch the bugs the team has already paid to fix once.
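A condensed sketch of the runner described above (the real one is roughly 200 lines); StagingStore and its write/retrieve/delete methods stand in for whatever staging adapter your repo uses:

```python
# scripts/run_scope_probes.py -- condensed sketch of the probe runner
import json
import sys

import yaml

from memory.build_namespace_key import build_namespace_key


def run_probes(rules_path, probes_path, store,
               audit_path="memory/scope_audit.jsonl"):
    # Step 1: parse both files up front; a parse failure fails the build
    # before the first probe runs.
    with open(rules_path) as f:
        yaml.safe_load(f)
    with open(probes_path) as f:
        probes = yaml.safe_load(f)["probes"]
    leaks = 0
    with open(audit_path, "a") as audit:
        for probe in probes:
            # Step 2: write one row under the probe's write_as scope.
            write_key = build_namespace_key(**probe["write_as"])
            row_id = store.write(write_key, {"probe": probe["id"]})
            # Step 3: read it back under the (different) read_as scope.
            read_key = build_namespace_key(**probe["read_as"])
            rows = store.retrieve(read_key)
            # Step 4: any row beyond expect_rows is a P0 leak.
            if len(rows) > probe["expect_rows"]:
                leaks += 1
                audit.write(json.dumps({
                    "severity": "P0", "probe": probe["id"], "row_id": row_id,
                    "write_key": write_key, "read_key": read_key,
                }) + "\n")
            # Step 5: tear down so probes do not interfere with each other.
            store.delete(write_key, row_id)
    return leaks


if __name__ == "__main__":
    from staging_store import StagingStore  # hypothetical staging adapter
    leaks = run_probes("memory/scope_rules.yaml",
                       "tests/scope_leak_probes.yaml", StagingStore())
    print(f"scope-leak-check: {leaks} leak(s)")
    sys.exit(1 if leaks else 0)
```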
scripts/scope-leak-check.sh
The CI gate. Two commands. One Python module that runs the probes against the staging memory store, one that summarizes the audit log into a PR comment. Exit code is the only thing that matters for the merge gate; the audit log matters for the post-mortem.
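An illustrative version of the gate script; the Python module names are assumptions, while the two-command shape and the exit-code contract are the point:

```bash
#!/usr/bin/env bash
# scripts/scope-leak-check.sh -- illustrative sketch; module names are assumptions
set -uo pipefail

# Command 1: run the probes against the staging store. The exit code is the
# merge gate: non-zero blocks the PR.
python -m scripts.run_scope_probes
status=$?

# Command 2: summarize the audit log into a PR comment body. Informational;
# it runs even when command 1 failed so the post-mortem trail still gets posted.
python -m scripts.summarize_scope_audit memory/scope_audit.jsonl > pr_comment.md

exit "$status"
```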
What the green run looks like
On a healthy main branch: 32 probes, 0 leaks, exit 0. The PR comment is a one-liner that says "32/32 probes passed." The audit log gets a row anyway so the trend is visible over time.
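Illustratively (the exact log lines are whatever your runner prints; the shape is what matters):

```
$ scripts/scope-leak-check.sh
scope-leak-check: loaded memory/scope_rules.yaml (5 axes, 2 shared namespaces, 1 read group)
scope-leak-check: 32/32 probes passed, 0 leaks
scope-leak-check: run summary appended to memory/scope_audit.jsonl
$ echo $?
0
```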
What a real leak looks like
And on a PR that has the bare-axis bug: the build is blocked. The audit log records the offending namespace key, the row id, and the read context. The diagnosis line in this output is generated by the runner because tenant_leak_002 is built specifically to catch this shape, so the cause is named at fail time, not after a two-day investigation.
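Again illustratively, with the namespace keys and row id as placeholders:

```
$ scripts/scope-leak-check.sh
scope-leak-check: 31/32 probes passed, 1 leak
FAIL tenant_leak_002 [P0]
  wrote row mem_84c2 under  tenant_a/u_1/support-agent-v1/customer/s_1
  retrieved it reading as   tenant_b/u_1/support-agent-v1/customer/s_2
  diagnosis: same user_id leaked across tenants; a retrieve path is keying on
             a bare user_id instead of the composed namespace key
scope-leak-check: P0 recorded in memory/scope_audit.jsonl, exiting 1
$ echo $?
1
```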
Declared scope vs implicit framework scope, side by side
Left column is the file shape we ship into client repos. Right column is what most agents have on day one: scope lives implicitly in scattered SDK arguments. Both can pass a quick demo. Only one survives a refactor and a model swap.
| Feature | Implicit scope (framework arguments only) | Declared scope (rules file plus probes) |
|---|---|---|
| How memory scope is expressed | Scope lives implicitly in framework arguments. mem0.add(messages, user_id=...) here, letta.create(name=...) there. Nobody can answer the question 'what is the scope rule for this agent' in one place. | memory/scope_rules.yaml declares the namespace_key, the five axes, the read_match and write_match for each, the shared_namespaces, and the read_groups. One file. Reviewed in code review. |
| How a tenant boundary is proved to hold | The team trusts the framework to enforce user_id. There is no test that writes under tenant_a and reads under tenant_b. Tenant boundary holds in production until the day it does not. | tests/scope_leak_probes.yaml has 32 probes. scripts/scope-leak-check.sh runs them on every PR. A probe that retrieves a row across tenants fails the build with the offending namespace_key in the log. |
| How cross-scope reads are granted | Cross-scope reads happen because someone passed user_id=null to be 'flexible' or used a global namespace because it was easier. The decision is not in any one place to review. | Through the shared_namespaces block (tenant-wide knowledge base, agent canonical facts) or the read_groups block (warm handoff between two named agent_ids). Each entry has an owner and is audited. |
| What happens when a probe finds a leak | Leak is found by a customer who saw something they should not have. The team triages, ships a hotfix, and writes a postmortem. There is no probe to add to. | The build fails. The PR cannot merge. The audit log captures the offending namespace_key, the probe id, and the row id. The on-call engineer fixes the cause (almost always a bare user_id key in a retrieve call) and adds a regression probe with a stable id. |
| What write-time scope enforcement looks like | Writes happen in three different code paths because three different teams wrote them. Each path has its own scoping logic. The drift between them is the bug. | Every write goes through a single build_namespace_key() function that asserts all five axes are present and non-empty. write_block_fields is checked at write time, not read time, so blocked fields never enter the store. |
| How a model swap or store swap interacts with scope | Scope logic is tangled into Mem0 SDK calls. Switching to Letta means re-deriving scope semantics from a different SDK and hoping nothing slipped during the port. | Scope is independent of the store. Swapping pgvector for Mem0 or Zep is a one-file change in the store adapter. The rules file and the probe set re-run unchanged. The boundary survives the migration. |
The smallest version you can ship next week
Three files and one refactor. First, write memory/scope_rules.yaml with two axes (tenant_id and user_id). Skip agent_id, persona_id, and session_id for now if your project has one agent and no sessions; you can add them when you need them. Second, write memory/build_namespace_key.py and refactor every retrieval and write call site to go through it. This is the load-bearing change; without it, the rules file is a wishlist. Third, write tests/scope_leak_probes.yaml with eight probes (four cross-tenant, four cross-user) and a 50-line runner that fails the build on a non-zero leak count. Wire it into CI. That whole package is a week for one engineer who knows the codebase.
Everything else (shared_namespaces, read_groups, the persona axis, the regression probe set, the audit log) is a follow-up. The week-1 version catches the most expensive failure mode (cross-tenant leaks) and forces every read and write through one function. After that, the system tells you what to add next: every probe failure in production turns into a permanent regression probe with a stable id, and the rules file grows in response to real demand instead of speculation.
Where fde10x fits
fde10x is a forward-deployed ML engineering studio. We embed named senior engineers into the client's GitHub, Slack, and standup, and we ship the scope layer (rules file, probes, gate script, namespace-key refactor, store adapter, runbook) into the repo over two to six weeks. The leave-behind is the rules file, the probes, the audit log, and a runbook so the on-call engineer knows what to do when a probe fails at 02:00. No platform license, no vendor-attached runtime; the artifacts live in your repo and your team owns them.
Plenty of teams build this themselves. The embed is the right call when there is a deadline, the team is short on senior MLE capacity, and the scope risk is real (multi-tenant SaaS, regulated industry, an agent that already shipped and a customer just saw something they should not have). We do model-vendor neutral work (Anthropic, OpenAI, Bedrock, Vertex, Azure OpenAI, or open weight) and the scope layer is independent of your model choice anyway.
Want a senior engineer to ship the scope layer of your agent memory in two to six weeks?
A 60-minute scoping call with the engineer who would do the work. You leave with a draft of memory/scope_rules.yaml for your stack, a list of the first eight probes to write, a sketch of the build_namespace_key() refactor against your retrieval call sites, and a fixed weekly rate to land it inside your repo.
AI agent memory scope, answered
What is AI agent memory scope, in one paragraph?
Memory scope is the set of rules that decide which memory rows the agent is allowed to read and write on a given turn. In a multi-tenant production system, scope is what keeps tenant A's notes from showing up in tenant B's session, what keeps user A's preferences from being applied to user B, what keeps the support agent's notes from leaking into the sales agent's context, and what keeps a session-scoped scratchpad from outliving the session. It is a safety property of the system, expressed as a namespace key composed from several axes (tenant_id, user_id, agent_id, persona_id, session_id) and a rule file that says, for each axis, who can read and who can write.
Why is 'use Mem0 with user_id' not enough?
Two reasons. First, user_id alone is one axis; a real production system has at least five (tenant, user, agent, persona, session). Second, the framework parameter lives in the call site, not in a reviewable rule file, so the answer to 'what is the scope rule for this agent' lives in dozens of scattered SDK calls instead of one place. The framework is fine; the discipline is to compose the namespace key in one function and declare the rule in one file. Then the framework choice becomes reversible without re-deriving the scope semantics.
Which axis matters most?
tenant_id, by a wide margin. Cross-tenant leaks are the only memory failure mode that becomes a contract event or a regulatory incident. Cross-user leaks within a tenant are bad for trust and may be a privacy issue, but they rarely escalate to outside counsel. The leak_tolerance for tenant_id is 0.0 and twelve of the 32 probes target it, because the cost of a tenant breach is asymmetric to every other failure mode.
How is scope different from access control?
Access control says 'this user is allowed to call this endpoint.' Scope says 'when this user's call retrieves memory, only the rows that match the user's tenant_id, user_id, and agent context can be returned.' Access control is at the request boundary; scope is at the storage layer. A bug in access control sends a request to the wrong endpoint. A bug in scope returns the wrong rows from the right endpoint. The mitigations are different, and most teams ship the access-control side while assuming the framework handles scope.
What goes in shared_namespaces?
Things that are deliberately shared across an axis. The tenant-wide knowledge base (every user in tenant_a can read tenant_kb). The agent's canonical product facts (every session of support-agent-v1 can read agent_canon). Each entry has a name, a scope axis, a write_match rule, and an audit flag. The point is that shared scope is opt-in, named, and reviewable. There is no global default sharing.
What goes in read_groups?
Cross-axis read relationships. The classic example is a warm handoff between two agents, where the receiving agent has to read the handing-off agent's notes for the same user. Without read_groups, the receiving agent has no path to those notes; with read_groups, the relationship is named (support-agent-v1 and support-handoff-bot are in the same group) and the read is allowed only between rows whose agent_id is in the group. It is the controlled escape hatch for the soft-boundary axes (agent_id, persona_id), not a generic permission system.
How big should the probe set be?
Per axis, at least three probes: the basic cross-scope read, a probe that keeps one axis value identical while crossing another (this catches bare-axis-key bugs), and a regression probe for every previously fixed leak. A new project starts with around 16 probes. After six months of operation, ours typically have 50 to 80 because every leak that ever shipped gets a permanent regression probe. The point is that probes never shrink; cases that have been passing for a year are exactly the ones most likely to silently regress when an unrelated change ripples through the retrieval code.
Where does write_block_fields fit?
Write-time, not read-time. write_block_fields is the set of field names (credit_card_number, ssn, auth_token, api_key, password) that are never written into long-term memory regardless of the scope they would land in. The check happens inside build_namespace_key()'s sibling function, assert_writable(), so blocked fields never enter the store. Read-time filtering would be the wrong layer because once the row is written, it can be retrieved by code paths that do not check the same filter.
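A sketch of that check, assuming the blocked-field list is loaded from the rules file; BlockedFieldError is an illustrative name:

```python
# memory/assert_writable.py -- sketch; loads write_block_fields from the rules file
import yaml

with open("memory/scope_rules.yaml") as f:
    WRITE_BLOCK_FIELDS = set(yaml.safe_load(f)["write_block_fields"])


class BlockedFieldError(ValueError):
    pass


def assert_writable(payload: dict) -> dict:
    """Reject a long-term memory write that carries any blocked field.

    Runs at write time so blocked fields never enter the store; read-time
    filtering would miss retrieval paths that skip the filter.
    """
    blocked = WRITE_BLOCK_FIELDS & payload.keys()
    if blocked:
        raise BlockedFieldError(f"write blocked, fields: {sorted(blocked)}")
    return payload
```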
Does this work with Mem0, Letta, Zep, pgvector, Redis?
Yes. The rules file and the probe set are independent of the store. The store adapter is roughly 80 lines that translate a composed namespace_key into the underlying SDK's scope arguments (Mem0's user_id and metadata, Letta's namespace, Zep's session metadata, pgvector's filter clauses, Redis's key prefix). Swapping the store is a one-file change. The probe set re-runs unchanged because the probe never touches the SDK; it goes through build_namespace_key().
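As a sketch, here is the Redis flavor of that adapter, trimmed to the key-prefix translation (redis-py calls only; the Mem0, Letta, Zep, and pgvector flavors do the same translation into their own scope arguments):

```python
# memory/store_adapters/redis_adapter.py -- sketch of the key-prefix translation
import json
import uuid

import redis


class RedisMemoryStore:
    """Translates a composed namespace_key into a Redis key prefix.

    Every method takes the composed key, never bare axes, so the adapter
    cannot widen scope on its own.
    """

    def __init__(self, client: redis.Redis | None = None):
        self.r = client or redis.Redis()

    def write(self, namespace_key: str, payload: dict) -> str:
        row_id = uuid.uuid4().hex
        self.r.set(f"{namespace_key}:{row_id}", json.dumps(payload))
        return row_id

    def retrieve(self, namespace_key: str) -> list[dict]:
        # Exact-prefix match: only rows written under this composed key return.
        return [
            json.loads(self.r.get(k))
            for k in self.r.scan_iter(match=f"{namespace_key}:*")
        ]

    def delete(self, namespace_key: str, row_id: str) -> None:
        self.r.delete(f"{namespace_key}:{row_id}")
```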
What is the most common scope bug we have shipped?
The bare-axis bug. Some retrieval code path calls retrieve(user_id=ctx.user_id) without composing the full namespace key. As long as user_id is unique across tenants, nothing visible breaks. The day a tenant_b user signs up with a user_id that collides with an existing tenant_a user_id, the cross-tenant leak fires. The probe tenant_leak_002 is specifically constructed to catch this: same user_id under two different tenants. Every project we have embedded into has had at least one of these waiting in a code path nobody was looking at.
How long does it take to retrofit this onto an existing agent?
On a typical mid-stage project, week 1 is auditing the existing retrieval and write paths and building the rules file from what is actually happening. Week 2 is the first 16 probes plus the build_namespace_key() refactor that routes every retrieve and write through one function. Weeks 3 to 4 are migrating the existing memory rows into the composed namespace and writing the per-store adapter. Week 5 is the prod-trace ingest and the first batch of regression probes. Week 6 is the runbook and handoff. The leave-behind is the rules file, the probes, the gate script, and the runbook; the team owns it.
Where does fde10x fit?
We are one option for teams that need a senior engineer in the repo for two to six weeks to ship the scope layer (rules file, probes, gate script, namespace-key refactor, store adapter, runbook) and hand it off. Plenty of teams build this themselves; the embed is the right call when there is a deadline, the team is short on senior MLE capacity, and the scope problem is an active production risk. Our model: named senior engineers in your GitHub, Slack, and standup, week 2 prototype against the rubric, week 6 handoff with the eval harness and runbook in your repo. No platform license, no vendor-attached runtime; the rules file and the probes live in your repo, owned by your team.
Adjacent guides on the same agent stack
AI agent memory systems, ranked by the file you have to write yourself
memory/retrieval_budget.yaml: per-tier token allotments, per-tier latency ceilings, and the drop rule when a tier blows the budget.
Production AI agent memory: the memory poisoning failure mode and the seven-file leave-behind
memory/policy.yaml, the recall benchmark, and the weekly audit cron that keeps the recall layer honest.
Agentic RAG: the architecture diagram is the easy part
The rubric, the regression set, and the corpus-drift cron that turn an architecture diagram into a system that still works in quarter two.