The Access Layer Pattern

Merivant — 2026

Research

Implementation status. The core pattern described here — typed placeholders, bidirectional map, streaming rehydration — is implemented and in production. The three-layer detection model (Pattern → List → NER) is partially implemented: the Pattern layer (regex-based detection for SSNs, cards, phones, emails, credentials, PHI identifiers) is active. The List and NER layers described in the technical deep dive are on the roadmap. Unstructured PII that doesn't match a pattern (e.g., freeform street addresses, contextual references) is a known gap until the NER layer ships.
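The Pattern layer can be pictured as a table of typed regexes, each hit carrying a label the access layer can later turn into a stand-in. The sketch below is purely illustrative: the names, the specific patterns, and the function shape are assumptions, not Merivant's actual rule set.

```python
import re

# Illustrative Pattern-layer detectors (assumed names and patterns,
# not the production rule set). Each regex is tagged with the kind
# of sensitive value it finds.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def detect(text):
    """Return (label, start, end, value) for every pattern hit,
    sorted by position in the text."""
    hits = []
    for label, rx in PATTERNS.items():
        for m in rx.finditer(text):
            hits.append((label, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h[1])
```

This is also where the known gap shows up concretely: a freeform street address matches none of these patterns, which is exactly the case the planned NER layer is meant to cover.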

The problem

You want to use AI on your real business data — your projects, your clients, your internal systems — without sending secrets to the cloud or breaking the model's ability to reason. That is the tension every organization using AI faces right now.

The existing tools force a choice: protect the data and lose the AI, or use the AI and accept the exposure.

The AI governance problem is not detection. It is designing systems where the wrong data physically cannot reach the wrong place.

Traditional data loss prevention was built for a world where information moves in documents and emails [1][7]. It scans, blocks, and logs. That worked when the risk was someone attaching the wrong file. It does not work when the risk is a continuous stream of conversation between your team and an AI model, where removing the sensitive parts destroys the context the AI needs to be useful [2][3].

The industry response has been to add more scanning: more detection models, more cloud-based proxies, more dashboards [4]. All of them share the same fundamental flaw. They strip data out and never put it back. The AI sees [REDACTED] where it needed to see a relationship between things, and the response comes back confused or useless.


What an access layer does

An access layer is a component inside your system — not in a cloud, not as a third-party service — that replaces sensitive values with labeled stand-ins before anything leaves your infrastructure. When the AI responds using those stand-ins, the access layer swaps the original values back in before anyone sees the result.

The AI never sees your real data. Your team never sees the stand-ins. The reasoning works because the stand-ins carry enough structure for the AI to understand what kind of thing it's working with — a project name, a person, an account number — without knowing the actual value.

Here's what that looks like in practice:

What your team types

"Should we move Project Phoenix to Q3? Alice disagrees."

What the AI sees

"Should we move <<PROJECT_0>> to Q3? <<PERSON_0>> disagrees."

What the AI responds

"Moving <<PROJECT_0>> to Q3 would give <<PERSON_0>> more time to prepare."

What your team sees

"Moving Project Phoenix to Q3 would give Alice more time to prepare."

The AI reasoned correctly about two distinct things — a project and a person — without ever knowing what they were called. When two projects are mentioned, it sees <<PROJECT_0>> and <<PROJECT_1>> and understands they are different. With traditional redaction, both would just say [REDACTED] and the AI could not tell them apart.
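The round trip above can be sketched in a few lines. This is a minimal illustration of the typed-placeholder idea with a bidirectional map, not the production implementation; the class and method names are assumptions.

```python
import re

class AccessLayer:
    """Minimal sketch: typed, numbered stand-ins with a bidirectional
    map so the swap can be reversed on the way back."""

    def __init__(self):
        self.to_placeholder = {}  # real value  -> placeholder
        self.to_value = {}        # placeholder -> real value
        self.counters = {}        # per-type counters: PROJECT_0, PROJECT_1, ...

    def replace(self, value, kind):
        """Swap a sensitive value for a typed stand-in; the same value
        always gets the same stand-in, so the AI can track identity."""
        if value not in self.to_placeholder:
            n = self.counters.get(kind, 0)
            self.counters[kind] = n + 1
            ph = f"<<{kind}_{n}>>"
            self.to_placeholder[value] = ph
            self.to_value[ph] = value
        return self.to_placeholder[value]

    def rehydrate(self, text):
        """Swap every stand-in in the AI's response back to the real value."""
        return re.sub(r"<<\w+_\d+>>",
                      lambda m: self.to_value.get(m.group(), m.group()),
                      text)
```

Used on the example above, `replace` turns "Project Phoenix" into `<<PROJECT_0>>` and "Alice" into `<<PERSON_0>>`, and `rehydrate` restores both in the model's reply. The per-type counter is what keeps two projects distinguishable as `<<PROJECT_0>>` and `<<PROJECT_1>>`.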


How it compares

There are five common approaches to preventing sensitive data from leaking through AI systems. Each makes a different tradeoff.

1. Network-level scanning

A gateway inspects outbound traffic and blocks or replaces anything that looks sensitive [6]. Good for catching accidental pastes. Bad for AI, because it strips out the context the model needs. If someone asks about two different clients and both names are removed, the AI cannot reason about the difference.

2. AI-powered detection

A second AI model scans for sensitive information by understanding context, not just patterns [5]. Better at catching things like "my social is the one I gave to the bank last Tuesday." But it adds latency, still performs one-way removal, and creates a second AI system to maintain and monitor.

3. Fake data substitution

Replace real names with realistic fakes [8][9]. "Acme Corp" becomes "Nexus Technologies." Preserves grammar, but the mapping between real and fake names is itself sensitive. Generating consistent fakes across a long conversation is hard. And if the fake name exists in the AI's training data, you get contamination between the fake and whatever the AI already knows about it.

4. Cloud scanning services

Third-party platforms that sit between your application and the AI provider [5]. They scan and redact in real time. The problem: your sensitive data now passes through a third party's servers to prevent it from passing through a different third party's servers. For regulated industries, this is often a non-starter.

5. The access layer

Runs inside your own process. Replaces values with labeled stand-ins that preserve the AI's ability to reason. Reverses them on the way back. No data leaves your infrastructure unprotected, and the AI's output is as useful as if it had the real data.

What matters                  Scanning  AI detection  Fake data  Cloud service  Access layer
AI output is usable           No        No            Mostly     No             Yes
Reversible                    No        No            Complex    No             Yes
Data stays on your infra      Yes       Yes           Yes        No             Yes
Works across a conversation   No        No            Hard       No             Built-in
Logs don't contain secrets    No        No            No         No             Yes

Three rows are decisive. Only the access layer delivers all three: the AI's reasoning stays intact, the process is fully reversible, and the audit trail contains no sensitive data. Every other approach either breaks the AI or creates a secondary data liability in your logging infrastructure.


What this means for governance

The access layer maps to control concepts that auditors and compliance teams care about.

The auditor conversation sounds like this: "Can you prove that no vendor ever sees raw SSNs?" With the access layer, the answer is yes — and the proof is architectural, not procedural. The system replaces SSNs before any outbound call. The audit log confirms it happened. The vendor's logs confirm they received stand-ins. Three independent verification points, none of which contain the actual SSN.
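The kind of audit record this implies might look like the sketch below. The field names and schema are assumptions chosen only to show the point: the log can prove a substitution happened, keyed by a hash of the value, without ever storing the value itself.

```python
import hashlib

def audit_record(kind, index, value, destination):
    """Illustrative audit entry (assumed schema, not Merivant's):
    proves a substitution occurred without containing the raw value."""
    return {
        "event": "substitution",
        "kind": kind,
        "placeholder": f"<<{kind}_{index}>>",
        # Hash of the real value, so repeated substitutions of the same
        # value can be correlated without the log holding the value.
        "value_sha256": hashlib.sha256(value.encode()).hexdigest(),
        "destination": destination,
    }
```

A record produced for an SSN substitution names the placeholder and the outbound destination; a grep of the log store for the raw SSN comes back empty, which is the architectural proof the auditor is asking for.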


The principle

Redaction is a property of the data flow, not a checkbox on a settings page. The cloud never sees the full picture because the architecture makes it physically impossible, not because policy says so.

Policy can be misconfigured, overridden, or forgotten. A system that replaces values before they reach the network is harder to bypass. The protection is structural, not administrative. This is the difference between "we have a DLP policy" and "the system won't let it happen."

The access layer is one answer. Not the only one. But one that works today, in production, without a cloud proxy, without additional AI infrastructure, and without destroying the reasoning that makes AI useful in the first place.

Technical deep dive → Implementation details, code walkthrough, architecture diagrams, integration scenarios, and performance characteristics for engineering teams evaluating the access layer pattern.

Working with us

Merivant helps teams design and implement access layers for regulated workloads. We work at the architecture level — control design, registry strategy, audit integration, and the organizational patterns that make AI governance sustainable rather than theatrical.

If you are building AI systems that handle sensitive data and want to evaluate the access layer pattern for your environment: request a working session.


Sources

  1. Data Loss Prevention: A Complete Guide for the GenAI Era. Lakera.
  2. Smarter PII Handling in LLMs: Privacy Without Compromise. Firstsource.
  3. The Empirical Impact of Data Sanitization on Language Models. arXiv.
  4. PII Sanitization for LLMs and Agentic AI. Kong.
  5. AI Data Governance with AI-Native Data Loss Protection. Nightfall.
  6. Data Leakage Prevention for LLMs: Essential Guide. Nightfall.
  7. What is DLP? 2026 Guide. Concentric AI.
  8. Enforcing Data Privacy in LLM Applications. Radicalbit.
  9. PII Data Masking Techniques Explained. Granica.
