Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate
Clearing the wall: what Phase 3 autonomy actually looks like, the regulatory gate that turns out to be the design, and the two gates that tell you when you're allowed to move.
27 posts
Becoming AI-native is an ordered path you walk one pipeline at a time — tools, then teammates, then autonomy. Part 1: codifying the process, the assist layer, and the wall every pilot dies at.
Anthropic let Claude run a real shop for a month. It sold metal cubes at a loss, invented a Venmo account, and claimed to wear a blazer. The 'AI department that works while you sleep' is a genre — here's where it actually breaks.
Principles 26–30. The calibration layer that catches what the rest of the framework would miss: a PR-noise budget, independent verification, model-swap regression discipline, the 15-tool-call rule, and protecting junior development.
Principles 21–25. The governance and safety layer: strictKnownMarketplaces, no goal-conflict prompts, quarterly AppSec, four telemetry signals, monthly incident discipline.
Principles 6–14. How work moves through an agentic engineering team: the ticket as contract, AI distillation with human curation, three gates, verification before done, characterisation tests, the 1.2× capacity rule, the J-curve, and telemetry.
Principles 1–5. The five rules that everything else in the framework rests on: standardise the harness, make verification load-bearing, default to plan mode, pick the cheapest layer, reflect every task.
METR ran the experiment. AI made experienced developers 19% slower — and they reported feeling 20% faster. The week-6 dip is the bottom of a documented J-curve. Most pilots get cut here. The right ones don't.
AI is making junior output look senior-level while preventing junior skill from forming — and the Stack Overflow collapse just removed the ambient learning layer that used to catch the deficit. Three interventions that work.
ReAct gave us a three-step loop. Production hardened it into five. The two new steps — Plan and Verify — are where everything that goes wrong, goes wrong. And the field has now named the worst offender.
AI-reviews-AI looks like a control. Under MAS, the EU AI Act, and any reasonable audit, it isn't. Here's why your compliance team won't accept it — and the compensating controls that actually work.
GitHub said 55%. Then they ran the enterprise RCT and got 8.69%. Faros's two-year telemetry shows throughput up 66% and incidents up 243%. The honest net is 1.2–1.5×. Plan your team capacity accordingly.
Most teams plateau at Stage 2 because they confuse 'we built skills' with 'we have a working AI engineering culture.' Here's the 5-stage diagnostic — and the moves that get you from Individual to Distributed.
The prototype-to-production gap for AI agents isn't technical — it's governance. Most organisations have nothing in this layer. The companies that build it first win the enterprise market. Everyone else stays in pilot purgatory.
Anthropic just leased Elon Musk's supercomputer four months after he banned them. Here's the three-ingredient framework that explains why — and what it means if you build on Claude.
Prompt engineering was 2023's breakout job title and 2025's obituary. The discipline didn't die — it got a better name and a harder shape. Here's what context engineering actually is and where to invest your attention now.
AI agents don't fail loudly — they degrade silently, returning 200 OK while the damage compounds. Inside the $47K loops, NOHARM omissions, and the engineering discipline rebuilding observable failure.
AI is silently promoting every knowledge worker to middle management — without the title, the training, or the pay. This is what that shift actually looks like from a Singapore desk.
Dorsey's manifesto for replacing middle management with AI nails the 60% that's automatable — but the 40% it barely mentions is where organizations quietly break.
Most AI demos I'm shown answer the wrong question. They prove the model works; they never prove anyone needed it. Here's the builder-and-investor lens I use to tell the two apart before a cheque is written.
I've built the kind of agent framework AgentKit competes with. So when OpenAI shipped it two years "late," I knew exactly which problem they were actually solving — and which one they weren't.
After 12 months of systematic optimization, I've documented 50-70% productivity gains with AI coding assistants. The secret isn't just using AI tools—it's teaching them to think like you do through carefully crafted configurations.
Navigating the complex world of carbon credits can be as daunting as deciphering a wine list. Using wine analogies—from vintage and terroir to blending and expert sommelier guidance—this guide demystifies carbon markets and reveals how to make informed climate action investments.
A striking similarity between my Sky News personalization patent and Google's news customization feature raises interesting IP questions.
Instagram's Android launch and Facebook acquisition drove massive growth to nearly 60% market share, while Twitpic and Yfrog continue declining.
Despite the QR code backlash, brilliant implementations like Waitrose's produce scanning prove these digital bridges aren't dead—they're just waiting for the right use case.
I left BSkyB to co-found TUMRA, a data science startup, and have been busy developing products while updating my personal website.