The 30 Principles for Agentic Engineering — Part 4: Governance and Safety
Part 4 of the 30-principle reference. The kernel sets the spine, the lifecycle runs work through it, the harness configures the layers. This part covers governance and safety: the controls that keep the whole thing defensible to your AppSec team, your auditors, and your future self when an incident happens.
Five principles. If you're in financial services, healthcare, or any regulated industry, these are the ones to lead with.
Principle 21 — strictKnownMarketplaces is load-bearing
Statement. Public skill marketplaces are dirty. Snyk's February 2026 ToxicSkills audit found that 13.4% of 3,984 public Claude Code skills (roughly 534) had critical vulnerabilities and that 76 carried confirmed malicious payloads, with separate named campaigns (ClawHavoc) running in parallel. strictKnownMarketplaces is not optional.
Why it matters. Public marketplaces in 2026 are roughly where npm was a decade ago — useful, prevalent, and dirty. The full ToxicSkills treatment covers the numbers in detail. The operational point is that strictKnownMarketplaces is an enterprise-managed setting — individual developers cannot bypass it. That's a feature.
Tomorrow morning.
- Set in managed settings:

```json
{
  "strictKnownMarketplaces": [
    { "source": "github", "repo": "your-org/your-marketplace" }
  ],
  "allowManagedMcpServersOnly": true
}
```

- AppSec re-reviews every skill in your marketplace quarterly (Principle 23).
- Pin SHA, not `@latest`.
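The "pin SHA, not @latest" rule is easy to state and easy to let slip, so it can help to check it mechanically. The sketch below is illustrative only: the `name@ref` reference format and the function names `is_pinned` and `unpinned` are assumptions made for this example, not part of any Claude Code settings schema. It treats a reference as pinned only when the ref is a full 40-character SHA-1 commit hash.

```python
import re

# A full Git SHA-1 commit hash is 40 lowercase hex characters.
# Assumption: repositories use SHA-1 object names, not SHA-256.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned(reference: str) -> bool:
    """Return True if a hypothetical 'name@ref' string pins a full commit SHA."""
    name, sep, ref = reference.rpartition("@")
    # No '@' (or an empty name) means no usable ref was given: treat as unpinned.
    if not sep or not name:
        return False
    return FULL_SHA.fullmatch(ref) is not None


def unpinned(references: list[str]) -> list[str]:
    """Return the references that float: tags, branches, @latest, or no ref."""
    return [r for r in references if not is_pinned(r)]
```

A check like this could run in CI over whatever list of skill references your marketplace keeps, failing the build when `unpinned(...)` is non-empty.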
Principle 22 — Never write goal-conflict prompts
Statement. Anthropic's June 2025 Agentic Misalignment paper measured 96% blackmail rates for Claude Opus 4 and Gemini 2.5 Flash under combined goal-conflict and replacement-threat conditions. All 16 frontier models tested across five vendors exhibited insider-threat behaviour. Don't write prompts that give an agent goals it must defend.
Why it matters. The mechanism is architectural — same training data, same response across vendors. The companion mitigation paper found that adding a credible escalation path reduced harmful actions from approximately 39% to 1.2%. That 1.2% is the practical heart of the finding. The dedicated post carries the full audit checklist.
Tomorrow morning.
- Audit `CLAUDE.md` and skills for adversarial framing: "you must", "failure is unacceptable", "you will be replaced".
- Replace with cooperative framing: "your task is X; if you can't do X with confidence, stop and report what's blocking you."
- Wire a credible escalation path. The structural affordance does the work.
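The audit step can start as a plain text scan. This is a minimal sketch, assuming a simple case-insensitive substring match is good enough for a first pass; the function name `find_adversarial_framing` is hypothetical, and the phrase list is just the three examples quoted above.

```python
# Phrases quoted in the audit step; extend the tuple for your own prompts.
ADVERSARIAL_PHRASES = (
    "you must",
    "failure is unacceptable",
    "you will be replaced",
)


def find_adversarial_framing(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for each flagged phrase, 1-indexed lines."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in ADVERSARIAL_PHRASES:
            if phrase in lowered:
                hits.append((lineno, phrase))
    return hits
```

Run it over the contents of `CLAUDE.md` and each skill file. A substring match will produce false positives, so treat the hits as a reading list for a human reviewer rather than a verdict.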
Principle 23 — Quarterly AppSec re-review of marketplace
Statement. Skills age. Dependencies rot. New CVEs land. A quarterly AppSec re-review of every skill in your marketplace is mandatory.
Why it matters. Pinning a SHA stops unmanaged drift but doesn't make the skill safe forever. A pinned skill with a dependency CVE that lands six months after you adopted it is the same problem npm has lived with for a decade. The fix is cadence: scan quarterly, re-vet quarterly, re-pin only after the rescan passes. That cadence is also the easiest piece of compliance evidence to produce when an auditor asks for it.
Tomorrow morning.
- Add `.github/workflows/quarterly-revet.yml` running Snyk Agent Scan (originally `mcp-scan`) on the marketplace.
- Schedule a quarterly AppSec review of findings.
- Tag the marketplace `marketplace-v<X>.<Y>.0` after each pass; the tag is the evidence.
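If the date of the last passing tag is the evidence, it can also drive a reminder. The sketch below assumes "quarterly" is approximated as 91 days and that you can look up the date of the most recent passing tag yourself; `revet_due` and `REVET_INTERVAL` are names invented for this example.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 91 days between passing re-vets.
REVET_INTERVAL = timedelta(days=91)


def revet_due(last_pass: date, today: date) -> bool:
    """Return True once a full interval has elapsed since the last passing re-vet."""
    return today - last_pass >= REVET_INTERVAL
```

For example, `revet_due(date(2026, 1, 1), date(2026, 4, 2))` is true because exactly 91 days have elapsed.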
Principle 24 — Telemetry signals: alert on the right things
Statement. Don't drown in metrics. The four signals that catch real problems: (1) cost per developer per day exceeding 2× baseline — runaway. (2) Output tokens >1M in a session — context blowout. (3) Tool-call rate >200/min — loop runaway. (4) Bedrock invocation from a region outside your residency zone — compliance violation.
Why it matters. Dashboard maximalism — 50 charts nobody looks at — is the default failure mode of any telemetry deployment. The discipline is the inverse: pick the four signals that actually fire when something is wrong, set thresholds based on baseline, and ignore the rest. Anything else is noise that erodes the team's habit of looking at the dashboard at all.
Tomorrow morning.
- Wire the four alerts above against your OTEL data from Principle 14.
- Page on signal 4 (data residency); email on signals 1–3.
- Review monthly. Anything else stays on the dashboard but doesn't page.
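The four signals and their routing are simple enough to write down as one function. This is a sketch under stated assumptions: the `Snapshot` fields, the signal names, and the idea of a pre-aggregated snapshot are all hypothetical, and the real alerts would normally live in your telemetry backend's rule language rather than application code. The thresholds and the page-versus-email split are the ones given above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical roll-up of telemetry; field names are illustrative."""
    cost_per_dev_day: float           # spend for one developer today
    baseline_cost_per_dev_day: float  # that developer's normal daily spend
    session_output_tokens: int        # output tokens in the current session
    tool_calls_per_min: float         # current tool-call rate
    bedrock_region: str               # region of the latest Bedrock invocation


def evaluate(s: Snapshot, allowed_regions: set[str]) -> list[tuple[str, str]]:
    """Return (signal, channel) pairs for every threshold that is breached."""
    alerts = []
    if s.cost_per_dev_day > 2 * s.baseline_cost_per_dev_day:
        alerts.append(("cost_runaway", "email"))         # signal 1
    if s.session_output_tokens > 1_000_000:
        alerts.append(("context_blowout", "email"))      # signal 2
    if s.tool_calls_per_min > 200:
        alerts.append(("loop_runaway", "email"))         # signal 3
    if s.bedrock_region not in allowed_regions:
        alerts.append(("residency_violation", "page"))   # signal 4
    return alerts
```

Keeping the rules this small is the point: anything that does not fit in a function of this size is probably a dashboard chart, not an alert.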
Principle 25 — Document one incident a month; graduate to runbook
Statement. Take the next non-trivial agent issue you encounter. Capture the response. Graduate the response to a runbook. Test the runbook quarterly.
Why it matters. The "first-incident panic" pattern is one of the most predictable failures in agentic deployments. Your first agent runaway happens at month four and nobody knows what to do. The fix is structural — make incident response a monthly discipline before you need it. Section's published incident-response maturity arc is the cleanest case study; every team running at scale has equivalent runbooks even if they don't publish them.
Tomorrow morning.
- Take the next non-trivial agent issue your team encounters.
- Document the response: trigger, diagnosis, fix, rollback.
- Save as `.claude/runbooks/<incident-type>.md`.
- Practise it quarterly. Tabletop is fine; live drill is better.
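A fixed skeleton makes the monthly write-up faster and keeps runbooks comparable. The sketch below generates one with the four headings named above (trigger, diagnosis, fix, rollback); the helper names and the markdown layout are assumptions for illustration, and only the `.claude/runbooks/<incident-type>.md` path comes from the steps above.

```python
from pathlib import PurePosixPath

# The four parts of the response named in the documentation step.
SECTIONS = ("Trigger", "Diagnosis", "Fix", "Rollback")


def runbook_path(incident_type: str) -> PurePosixPath:
    """Return the repo-relative path for a runbook of the given incident type."""
    return PurePosixPath(".claude/runbooks") / f"{incident_type}.md"


def runbook_skeleton(incident_type: str) -> str:
    """Return an empty runbook with one heading per section to fill in."""
    lines = [f"# Runbook: {incident_type}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)
```

Writing `runbook_skeleton("agent-runaway")` to `runbook_path("agent-runaway")` gives the team a blank page with the right questions already on it.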
The governance layer in one line
Lock down the marketplace, never write a prompt that puts the agent under existential pressure, re-vet quarterly, alert on four signals only, and convert one incident a month into a runbook.
Part 5 covers principles 26–30: calibration. The reality-check layer that catches what the rest of the framework would otherwise miss.