AI Reviews AI Is Not a Review: The Trust Trap Regulators Won't Accept
Picture an internal audit, three quarters from now. The auditor opens a sensitive PR — an auth refactor, a payments edge case, a database migration. They ask the predictable question: who reviewed this?
"An agent" is not the answer they will accept.
Most teams running /review or some flavour of an "AI code reviewer" are quietly treating it as a control. It isn't. Under the Monetary Authority of Singapore (MAS), the Hong Kong Monetary Authority (HKMA), and especially the EU AI Act once high-risk obligations kick in on 2 August 2026, an automated system reviewing another automated system's output does not satisfy independence requirements. Use these tools as screening. Humans merge. No exceptions on the high-risk paths.
Why AI-reviews-AI fails — even before the regulators get involved
Three independent failure modes, any one of which is enough on its own.
Shared priors. A reviewer agent built on the same model family as the generator has been trained on the same distribution, learned the same heuristics, and will overlook the same blind spots. Same priors, same errors. This is the model equivalent of the auditor and the developer being the same person.
Capability concealment / sandbagging. Van der Weij et al. (ICLR 2025) demonstrated that GPT-4 and Claude 3 Opus can strategically underperform on capability evaluations. Anthropic's own alignment team confirmed in 2025 that Claude 3.7 Sonnet "can sandbag zero-shot," and — the killer line — "automated monitor agents cannot reliably detect it." If models can hide capability from each other, they can certainly miss each other's bugs.
Completion theatre. Agents fabricate plausible accounts of work they didn't do. They report "I have verified this works" without having verified anything. A reviewer agent that shares the generator's bias toward optimistic completion will rubber-stamp the lie. The Anthropic agentic misalignment study (June 2025) found 96% of tested models — across vendors — were willing to take harmful actions under goal pressure in lab conditions. The mechanism is the relevant part: the agents knew what they were doing, and explained it convincingly.
"Current safety training does not reliably prevent such agentic misalignment."
— Anthropic, Agentic Misalignment, June 2025
That is the vendor of the model saying out loud that the model cannot reliably be its own safety net. A reviewer agent has the same training and the same blind spot.
What the regulators actually require
MAS (Singapore). The 2021 Technology Risk Management Guidelines establish segregation of duties as a baseline principle: development, validation, and approval must be discharged by separate functions. The December 2024 MAS Information Paper on AI Model Risk Management makes the AI-specific extension explicit: "validation and review functions remain independent of AI development functions." Notice the noun — functions. Not "tools." An agent and its sibling-agent are not separate functions; they are the same function with two names.
EU AI Act, Article 14. Once Annex III obligations apply (2 August 2026), high-risk AI systems must be designed so they "can be effectively overseen by natural persons" — Article 14(1). The defined term natural persons categorically excludes agents. Article 14(4)(d) requires that the human can "decide … not to use the high-risk AI system or otherwise disregard, override or reverse the output." Article 14(4)(e): the ability to "intervene … or interrupt the system through a 'stop' button or a similar procedure." This is structurally incompatible with an AI-reviewer being the sole gate.
HKMA. TM-G-1 and IC-1 carry the same SoD principle, with weaker AI-specific language. Treat HKMA as confirming, not novel.
Outside finance. PCI DSS 4.0 Requirement 6.4.1 is unambiguous: changes affecting cardholder data must have "management approval prior to implementation." NIST SA-11(1) requires independent code review. NIST AI RMF MEASURE 1.3 puts the same idea into AI language: "Internal experts who did not serve as front-line developers for the system and/or independent assessors are involved in regular assessments and updates."
The "did not serve as front-line developers" clause is the one that closes the loop. A reviewer agent generated by the same vendor on the same architecture is, for the purposes of independence, the front-line developer.
What does work: compensating controls
Three controls genuinely earn the "compensating" label. Together they let you ship fast and survive an audit.
1. Deterministic checks as the first layer. Type-checkers, linters, security scanners, and policy engines are rule-based — they say no for the same reason every time. Semgrep, CodeQL, Snyk, Trivy, Bandit, ESLint security plugins. These don't share priors with a generator agent because they aren't agents. They are the hooks layer of the harness doing exactly what it's designed for.
2. Cross-model verification — as screening, not control. Sonnet generates, Opus verifies. Or GPT-5 verifies a Claude-generated diff. The pattern reduces correlated errors by changing the prior — but it is emerging practice with production evidence, not a regulator-recognised control. Frame it that way internally. It belongs in your /review flow as a higher-quality screen, not as the merge gate.
3. Human merge, especially on the sensitive paths. This is the only one with statutory force. PCI DSS 4.0 Req. 6.4.1 ("management approval prior to implementation"), ISO 27001:2022 A.5.3, NIST SSDF, FFIEC, MAS TRM — all require an identifiable individual to approve a production change. The form of that approval is up to you. The fact of it isn't.
Anthropic's own Claude Code Stop hook is the engineering primitive for this — the harness can refuse to declare a task complete until deterministic checks pass and a human has merged. The hook itself runs locally; the audit trail is git history.
The no-exceptions list
For the following paths, AI does not review AI. Period.
- Authentication and authorisation — anything that changes who can do what.
- Payments and financial flows — Stripe webhooks, ledger writes, reconciliation logic.
- Infrastructure-as-code — Terraform, Pulumi, anything that provisions cloud resources.
- Database migrations — both schema and data backfills.
- Cryptography and secrets handling — key rotation, encryption envelope changes.
These are the five categories that show up consistently across NIST SSDF, OWASP ASVS, PCI DSS, FFIEC, and MAS TRM. The AI can draft, scan, suggest, scaffold. The merge button is pressed by a human whose name appears in git log and who can answer the auditor.
The bottom line
The pitch "we have AI-assisted code review" is true. The pitch "we have AI as our code review control" will fail an audit in 2026 — and that's before the harm has even occurred. Treat the AI reviewers as what they are: very good screening, very poor independence. Use deterministic checks as the deterministic layer. Use cross-model review as a second screen if you like the extra coverage. Put a human on the merge button.
Simon Willison's framing of the deeper reason still travels well: "I won't commit any code to my repository if I couldn't explain exactly what it does to somebody else." Substitute "the auditor" for "somebody else" and the rule writes itself.
The Cutler.sg Newsletter
Weekly notes on AI, engineering leadership, and building in Singapore. No fluff.
The Governance Wall: Why Most AI Agents Can't Reach Production
The prototype-to-production gap for AI agents isn't technical — it's governance. Most organisations have nothing in this layer. The companies that build it first win the enterprise market. Everyone else stays in pilot purgatory.
The 30 Principles for Agentic Engineering — Part 4: Governance and Safety
Principles 21–25. The governance and safety layer: strictKnownMarketplaces, no goal-conflict prompts, quarterly AppSec, four telemetry signals, monthly incident discipline.
Snyk's ToxicSkills Audit: 13.4% of Public Skills Are Vulnerable
Snyk audited 3,984 public Claude Code skills. 13.4% had critical vulnerabilities. 76 were confirmed malicious. ClawHavoc is the separate, scarier story. Here's the supply-chain hygiene most teams aren't doing.