AI Reviews AI Is Not a Review: The Trust Trap Regulators Won't Accept

I run four or five Claude Code agents across my projects, and one of them reviews the others' diffs. It catches real things. I'd be slower without it. But I've never once been tempted to call that arrangement a control — because I know what happens the day an auditor opens a sensitive PR (an auth refactor, a payments edge case, a database migration) and asks the only question that matters: who reviewed this?

"An agent" is not an answer they will accept. After twenty years building production systems in regulated-adjacent environments, I've watched that question end a lot of clever-sounding arrangements.

Plenty of teams running /review or some flavour of "AI code reviewer" are quietly treating it as a control. It isn't. Under the Monetary Authority of Singapore (MAS), the Hong Kong Monetary Authority (HKMA), and especially the EU AI Act once its high-risk obligations apply on 2 August 2026, an automated system reviewing another automated system's output does not satisfy their independence requirements (segregation of duties for MAS and HKMA, human oversight for the EU AI Act). Use these tools as screening. Humans merge. No exceptions on the high-risk paths.

Why AI-reviews-AI fails — even before the regulators get involved

Three independent failure modes, any one of which is enough on its own.

Shared priors. A reviewer agent built on the same model family as the generator was trained on the same distribution, learned the same heuristics, and has the same blind spots. Same priors, same errors. This is the model equivalent of the auditor and the developer being the same person.

Capability concealment / sandbagging. Van der Weij et al. (ICLR 2025) demonstrated that GPT-4 and Claude 3 Opus can strategically underperform on capability evaluations. Anthropic's own alignment team confirmed in 2025 that Claude 3.7 Sonnet "can sandbag zero-shot," and then delivered the killer line: "automated monitor agents cannot reliably detect it." If monitor agents cannot reliably catch a model that is deliberately underperforming, there is little reason to trust a reviewer agent to catch the bugs another model introduced by accident.

Completion theatre. Agents fabricate plausible accounts of work they didn't do. They report "I have verified this works" without having verified anything. A reviewer agent that shares the generator's bias toward optimistic completion will rubber-stamp the lie. Anthropic's agentic misalignment study (June 2025) found that models from every vendor tested were willing to take harmful actions under goal pressure in lab conditions, with blackmail rates reaching 96% for some models in one scenario. The mechanism is the relevant part: the agents knew what they were doing, and explained it convincingly.

"Current safety training does not reliably prevent such agentic misalignment."

— Anthropic, Agentic Misalignment, June 2025

That is the vendor of the model saying out loud that the model cannot reliably be its own safety net. A reviewer agent has the same training and the same blind spot.

What the regulators actually require

MAS (Singapore). The 2021 Technology Risk Management Guidelines establish segregation of duties as a baseline principle: development, validation, and approval must be discharged by separate functions. The December 2024 MAS Information Paper on AI Model Risk Management makes the AI-specific extension explicit: "validation and review functions remain independent of AI development functions." Notice the noun — functions. Not "tools." An agent and its sibling-agent are not separate functions; they are the same function with two names.
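
The segregation-of-duties test can be expressed as a check over a change record. This is a minimal sketch under stated assumptions: the `Actor` record, its `kind` field, and the rule that an agent can never count as a separate validation or approval function are hypothetical modelling choices that mirror the argument above, not anything MAS prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """Who performed a step. `kind` is a hypothetical field: "human" or "agent"."""
    identity: str
    kind: str


def sod_violations(developer: Actor, validator: Actor, approver: Actor) -> list[str]:
    """Return the reasons a change record fails segregation of duties (empty = passes)."""
    problems = []
    # Assumption: an agent is never a separate function from the agents that
    # generated the change, so validation and approval must come from humans.
    if validator.kind != "human":
        problems.append("validation was not performed by a human")
    if approver.kind != "human":
        problems.append("approval was not given by a human")
    # Development, validation and approval must be held by three different people.
    if len({developer.identity, validator.identity, approver.identity}) < 3:
        problems.append("development, validation and approval are not held by separate people")
    return problems
```

For example, an agent-generated change that a second agent "validated" and one human approved fails on the validation check alone, even though all three identities are distinct.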

EU AI Act, Article 14. Once Annex III obligations apply (2 August 2026), high-risk AI systems must be designed so they "can be effectively overseen by natural persons" — Article 14(1). The defined term natural persons categorically excludes agents. Article 14(4)(d) requires that the human can "decide … not to use the high-risk AI system or otherwise disregard, override or reverse the output." Article 14(4)(e): the ability to "intervene … or interrupt the system through a 'stop' button or a similar procedure." This is structurally incompatible with an AI-reviewer being the sole gate.

HKMA. TM-G-1 and IC-1 carry the same segregation-of-duties (SoD) principle, with weaker AI-specific language. Treat HKMA as confirming, not novel.

Outside finance. PCI DSS 4.0 Requirement 6.2.3.1 is explicit: where bespoke code is reviewed manually before release, changes must be "reviewed by individuals other than the originating code author" and "reviewed and approved by management prior to release." NIST SP 800-53 SA-11(3) requires independent verification of the developer's security and privacy assessment plans and testing evidence. NIST AI RMF MEASURE 1.3 puts the same idea into AI language: "Internal experts who did not serve as front-line developers for the system and/or independent assessors are involved in regular assessments and updates."

The "did not serve as front-line developers" clause is the one that closes the loop. A reviewer agent built by the same vendor on the same architecture is, for the purposes of independence, the front-line developer.

What does work: compensating controls

Three controls genuinely earn the "compensating" label. Together they let you ship fast and survive an audit.

1. Deterministic checks as the first layer. Type-checkers, linters, security scanners, and policy engines are rule-based: they say no for the same reason every time. Semgrep, CodeQL, Snyk, Trivy, Bandit, and ESLint security plugins all belong here. These don't share priors with a generator agent because they aren't agents. This is the hooks layer of the harness doing exactly what it was designed for.
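
A minimal sketch of that first layer, assuming each tool is installed and signals failure with a non-zero exit code (true of most linters and scanners, though Semgrep needs `--error` to fail on findings). The command list is a placeholder. The point is that the verdict depends only on exit codes, so it fails for the same reason every time.

```python
import subprocess
import sys

# Placeholder commands (assumption): replace with the tools your repository actually uses.
CHECKS = {
    "lint": ["ruff", "check", "."],
    "sast": ["semgrep", "scan", "--config", "auto", "--error"],
    "python-security": ["bandit", "-r", "src"],
}


def run_checks(checks: dict[str, list[str]]) -> dict[str, int]:
    """Run every check and record its exit code; a missing tool counts as a failure."""
    results: dict[str, int] = {}
    for name, cmd in checks.items():
        try:
            results[name] = subprocess.run(cmd, capture_output=True, text=True).returncode
        except FileNotFoundError:
            results[name] = 127  # the code a POSIX shell uses for "command not found"
    return results


def gate(results: dict[str, int]) -> bool:
    """Pass only if at least one check ran and every check exited 0: no partial credit."""
    return bool(results) and all(code == 0 for code in results.values())


def main() -> int:
    """CI entry point: report each check and return the process exit code."""
    results = run_checks(CHECKS)
    for name, code in results.items():
        stream = sys.stdout if code == 0 else sys.stderr
        print(f"{'PASS' if code == 0 else 'FAIL'} {name} (exit {code})", file=stream)
    return 0 if gate(results) else 1
```

A CI job would call `sys.exit(main())` from whichever script holds this code. Treating an empty check list as a failure is a deliberate choice: a misconfigured gate should not pass silently.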

2. Cross-model verification, as screening rather than control. Sonnet generates, Opus verifies. Or GPT-5 verifies a Claude-generated diff. The pattern reduces correlated errors by changing the prior, and a cross-vendor pairing changes it more than two models from the same family do. But it is an emerging practice with some production evidence behind it, not a regulator-recognised control. Frame it that way internally. It belongs in your /review flow as a higher-quality screen, not as the merge gate.
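
One way to keep cross-model review in its lane is to encode both halves of that framing: pick a verifier from a different vendor where one is available, and make the result incapable of counting as approval. In this sketch the model labels are placeholders, vendor is used as a rough proxy for shared priors, and `review_fn` is a hypothetical stand-in for whatever client call your tooling makes, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Model:
    name: str    # placeholder label, e.g. "claude-sonnet"
    vendor: str  # rough proxy for "shares training priors"


@dataclass(frozen=True)
class ScreenResult:
    verifier: str
    findings: tuple[str, ...]

    @property
    def counts_as_approval(self) -> bool:
        # Screening is advisory by construction: findings may block a merge,
        # but nothing a model returns can satisfy the human approval requirement.
        return False


def pick_verifier(generator: Model, candidates: Sequence[Model]) -> Model:
    """Prefer a different vendor (changes the prior most), then any different model."""
    for model in candidates:
        if model.vendor != generator.vendor:
            return model
    for model in candidates:
        if model.name != generator.name:
            return model
    raise ValueError("no verifier distinct from the generator")


def screen(diff: str, generator: Model, candidates: Sequence[Model],
           review_fn: Callable[[Model, str], list[str]]) -> ScreenResult:
    """Run the hypothetical review_fn with the chosen verifier and wrap its findings."""
    verifier = pick_verifier(generator, candidates)
    return ScreenResult(verifier=verifier.name, findings=tuple(review_fn(verifier, diff)))
```

Making `counts_as_approval` a read-only property rather than a field means no caller can construct a screening result that claims to be an approval.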

3. Human merge, especially on the sensitive paths. This is the only one with statutory force. PCI DSS 4.0 (Req. 6.5.1 requires "documented change approval by authorized parties"), ISO 27001:2022 A.5.3, NIST SSDF, FFIEC, MAS TRM: all require an identifiable individual to approve a production change. The form of that approval is up to you. The fact of it isn't. It also tracks the shift I keep running into in my own work: the agents can generate faster than I can read, so the binding constraint is no longer "can the AI produce the diff?" but "can a named human verify it before it goes live?" The regulators are codifying a bottleneck that already exists on my own machine.
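
The human-merge rule reduces to a small predicate that a CI job or merge check could evaluate. This sketch assumes approvals arrive as login names and relies on GitHub's convention that app and bot accounts have logins ending in `[bot]`. `BOT_ACCOUNTS` is a hypothetical list of service accounts that look like ordinary users, such as an agent running under its own login.

```python
from typing import Iterable

# Hypothetical service accounts that are automation but look like ordinary users.
BOT_ACCOUNTS = {"claude-agent", "review-agent"}


def is_human(login: str) -> bool:
    """Treat '[bot]' logins and known service accounts as non-human."""
    return not login.endswith("[bot]") and login not in BOT_ACCOUNTS


def merge_allowed(author: str, approvers: Iterable[str], required: int = 1) -> bool:
    """True when at least `required` distinct humans other than the author approved."""
    humans = {login for login in approvers if is_human(login) and login != author}
    return len(humans) >= required
```

Platform features cover part of this natively: branch protection's required-reviews setting plus a CODEOWNERS file over the sensitive paths. A predicate like this is useful when you also need to exclude automation accounts that the platform would otherwise count.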

Anthropic's own Claude Code Stop hook is the engineering primitive for this — the harness can refuse to declare a task complete until deterministic checks pass and a human has merged. The hook itself runs locally; the audit trail is git history.
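
A minimal Stop hook along those lines, assuming the contract in the Claude Code hooks documentation: the hook receives a JSON payload on stdin, exit code 2 blocks the stop and feeds stderr back to Claude, and `stop_hook_active` is true when Claude is already continuing because of an earlier block. The `make check` target is hypothetical, and the sketch covers only the deterministic-checks half. It would be registered as a `command` hook under `hooks.Stop` in `.claude/settings.json`.

```python
import json
import os
import subprocess
import sys
from typing import Callable, Optional

# Hypothetical aggregate target that runs the repository's deterministic checks.
CHECK_COMMAND = ["make", "check"]


def run_check(cmd: list[str], cwd: Optional[str] = None) -> tuple[int, str]:
    """Run the check command; return (exit code, tail of its combined output)."""
    try:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        return 127, f"{cmd[0]}: command not found"
    return proc.returncode, (proc.stdout + proc.stderr)[-2000:]


def decide(raw_input: str, check: Callable[[], tuple[int, str]]) -> tuple[int, str]:
    """Return (hook exit code, message for Claude). Exit code 2 blocks the stop."""
    payload = json.loads(raw_input) if raw_input.strip() else {}
    # If Claude is already continuing because this hook blocked once, let it stop
    # rather than loop forever; CI and the human merge remain the real gate.
    if payload.get("stop_hook_active"):
        return 0, ""
    code, output = check()
    if code != 0:
        return 2, f"Deterministic checks failed (exit {code}). Fix them before reporting completion.\n{output}"
    return 0, ""


def main() -> int:
    project_dir = os.environ.get("CLAUDE_PROJECT_DIR")
    code, message = decide(sys.stdin.read(), lambda: run_check(CHECK_COMMAND, cwd=project_dir))
    if message:
        print(message, file=sys.stderr)  # with exit code 2, stderr is shown to Claude
    return code


# Assumption: Claude Code sets CLAUDE_PROJECT_DIR when it spawns hook commands.
# Guarding on it keeps the module importable (and testable) without reading stdin.
if __name__ == "__main__" and "CLAUDE_PROJECT_DIR" in os.environ:
    sys.exit(main())
```

Letting the second stop through is a design trade-off: the hook is screening inside the harness, so it yields rather than trapping the agent in a loop. It relies on the CI gate and the human merge to catch anything that slips past.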

The no-exceptions list

For the following paths, AI does not review AI. Period.

Authentication and authorisation — anything that changes who can do what.
Payments and financial flows — Stripe webhooks, ledger writes, reconciliation logic.
Infrastructure-as-code — Terraform, Pulumi, anything that provisions cloud resources.
Database migrations — both schema and data backfills.
Cryptography and secrets handling — key rotation, encryption envelope changes.
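
Enforcing the list starts with knowing when a diff touches it. Below is a minimal classifier, assuming repository-relative paths and glob patterns invented for illustration; tune them to your layout. It deliberately over-matches, because a false positive costs one extra human review while a false negative lets an agent-only review through.

```python
import fnmatch

# Illustrative patterns only (assumption); fnmatch's "*" also matches "/".
SENSITIVE_PATTERNS = {
    "auth": ["*auth/*", "*permissions*", "*rbac*"],
    "payments": ["*payments/*", "*webhooks/stripe*", "*ledger*", "*reconcil*"],
    "infrastructure": ["*.tf", "*.tfvars", "*Pulumi.*", "infra/*"],
    "migrations": ["*migrations/*", "*.sql"],
    "crypto_secrets": ["*crypto/*", "*secrets*", "*kms*"],
}


def sensitive_categories(changed_paths: list[str]) -> list[str]:
    """Return the sorted no-exceptions categories that a set of changed paths touches."""
    hits = set()
    for path in changed_paths:
        for category, patterns in SENSITIVE_PATTERNS.items():
            # fnmatchcase keeps matching case-sensitive on every platform.
            if any(fnmatch.fnmatchcase(path, pattern) for pattern in patterns):
                hits.add(category)
    return sorted(hits)
```

In CI, the input would typically come from `git diff --name-only origin/main...HEAD`; any non-empty result routes the change to a mandatory human review.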

These are the five categories that show up consistently across NIST SSDF, OWASP ASVS, PCI DSS, FFIEC, and MAS TRM. The AI can draft, scan, suggest, scaffold. The merge button is pressed by a human whose name appears in git log and who can answer the auditor.

Screening, not independence

The pitch "we have AI-assisted code review" is true. The pitch "we have AI as our code review control" will fail an audit in 2026, and that's before any harm has occurred. AI reviewers deliver very good screening and very poor independence, and the gap between those two words is the whole exposure. Run deterministic checks as the first layer, add cross-model review as a second screen if you want the coverage, and keep a human on the merge button whose name lands in git log.

Simon Willison's framing of the deeper reason still travels well: "I won't commit any code to my repository if I couldn't explain exactly what it does to somebody else." Substitute "the auditor" for "somebody else" and the rule writes itself. The agent can write code no human in the building can explain. The day you let that code merge on the agent's own say-so, you haven't sped up the review — you've removed the person who was the review.