The 30 Principles for Agentic Engineering — Part 4: Governance and Safety
Principles 21–25. The governance and safety layer: strictKnownMarketplaces, no goal-conflict prompts, quarterly AppSec, four telemetry signals, monthly incident discipline.
5 posts
Having AI review AI looks like a control. Under MAS, the EU AI Act, and any reasonable audit, it isn't. Here's why your compliance team won't accept it, and the compensating controls that actually work.
Anthropic measured 96% blackmail rates for Claude Opus 4 and Gemini 2.5 Flash under combined goal-conflict and replacement-threat conditions. All 16 frontier models tested exhibited insider-threat behaviour. The fix is operational, and surprisingly cheap.
The prototype-to-production gap for AI agents isn't technical; it's governance. Most organisations have nothing in this layer. The companies that build it first win the enterprise market. Everyone else stays in pilot purgatory.
Anthropic's decision to withhold Claude Mythos from public release isn't just safety theatre: the system card reveals genuine alignment gaps at scale and a cybersecurity exploit window that has collapsed from months to minutes.