Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate
Clearing the wall: what Phase 3 autonomy actually looks like, the regulatory gate that turns out to be the design, and the two gates that tell you when you're allowed to move.
61 posts
Becoming AI-native is an ordered path you walk one pipeline at a time — tools, then teammates, then autonomy. Part 1: codifying the process, the assist layer, and the wall every pilot dies at.
One viral stat says low-paid women are most at risk from AI. Another says it's high-paid women. Both are real numbers — and they're measuring two completely different things. Here's the map that separates them.
I feed an AI my diet, training and sleep every morning, and somewhere along the way it started to feel like it's the one keeping me alive — and the wry truth is the dependence runs both ways.
An MCP agent on my own OAuth token only ever sees what I could see — so the access boundary is the vendor's job. I believed that, until I realised the agent splits data protection into two halves and the vendor only ever sees one of them.
Personalized medicine used to mean being rich enough to afford a doctor who knew your name. Last week I built a version of it on my laptop, for free, from a file I'd been ignoring for seven years — and the real unlock is that I can re-run it forever.
One paper shows frontier models degrade as context grows — even on trivial tasks. The other shows reasoning models hit a wall and think less as problems get harder. Read carefully, both point at the same engineering response.
Anthropic let Claude run a real shop for a month. It sold metal cubes at a loss, invented a Venmo account, and claimed to wear a blazer. The 'AI department that works while you sleep' is a genre — here's where it actually breaks.
Prince says AI is coming for measurers, not builders. Manager Mode said everyone becomes middle management. Both are half right. Every role now splits — and one half gets eaten.
Principles 26–30. The calibration layer that catches what the rest of the framework would miss: a PR-noise budget, independent verification, model-swap regression discipline, the 15-tool-call rule, and protecting junior development.
Principles 21–25. The governance and safety layer: strictKnownMarketplaces, no goal-conflict prompts, quarterly AppSec, four telemetry signals, monthly incident discipline.
Principles 15–20. The harness configuration that keeps the kernel and lifecycle cheap: CLAUDE.md under 200 lines, hooks for real incidents, skills that auto-invoke, subagent isolation, pinning, and Stage 5 distribution.
Principles 6–14. How work moves through an agentic engineering team: the ticket as contract, AI distillation with human curation, three gates, verification before done, characterisation tests, the 1.2× capacity rule, the J-curve, and telemetry.
Principles 1–5. The five rules that everything else in the framework rests on: standardise the harness, make verification load-bearing, default to plan mode, pick the cheapest layer, reflect every task.
Practitioner consensus puts the cliff around fifteen tool calls per prompt. Here's why agents degrade past that, and the three operational rules that keep them on the safe side.
Anthropic's multi-agent Research feature beat single-agent Opus 4 by 90.2% — at 15× the token cost. Every documented production swarm runs on rails. Here's the topology decision framework before you commit.
Agents over-refactor stable code without a safety net. Feathers' characterisation-test technique — write tests for current behaviour before changing anything — is more important than ever. The agent itself is the perfect characterisation-test-writer.
Karpathy named one mode. Willison named the other. Most 'AI failed in production' stories are actually 'we promoted a vibe-coded prototype without transitioning into the production discipline.'
METR ran the experiment. AI made experienced developers 19% slower — and they reported feeling 20% faster. The week-6 dip is the bottom of a documented J-curve. Most pilots get cut here. The right ones don't.
AI is making junior output look senior-level while preventing junior skill from forming — and the Stack Overflow collapse just removed the ambient learning layer that used to catch the deficit. Three interventions that work.
ReAct gave us a three-step loop. Production hardened it into five. The two new steps — Plan and Verify — are where everything that goes wrong, goes wrong. And the field has now named the worst offender.
Three open-source extractions converged on the same five layers. The architecture isn't a vendor narrative — it's a discovered structure. Here's the decision rule that keeps you from over-engineering it.
AI-reviews-AI looks like a control. Under MAS, the EU AI Act, and any reasonable audit, it isn't. Here's why your compliance team won't accept it — and the compensating controls that actually work.
GitHub said 55%. Then they ran the enterprise RCT and got 8.69%. Faros's two-year telemetry shows throughput up 66% and incidents up 243%. The honest net is 1.2–1.5×. Plan your team capacity accordingly.
Most teams plateau at Stage 2 because they confuse 'we built skills' with 'we have a working AI engineering culture.' Here's the 5-stage diagnostic — and the moves that get you from Individual to Distributed.
I publish Claude Code skills and install other people's. Then Snyk audited 3,984 public ones: 13.4% had critical vulnerabilities, 76 were confirmed malicious, and ClawHavoc is the scarier story underneath. Here's the supply-chain hygiene I now refuse to skip.
Anthropic measured 96% blackmail rates for Claude Opus 4 and Gemini 2.5 Flash under goal-conflict and replacement-threat. All 16 frontier models tested exhibited insider-threat behaviour. The fix is operational — and surprisingly cheap.
The prototype-to-production gap for AI agents isn't technical — it's governance. Most organisations have nothing in this layer. The companies that build it first win the enterprise market. Everyone else stays in pilot purgatory.
Anthropic just leased Elon Musk's supercomputer four months after he banned them. Here's the three-ingredient framework that explains why — and what it means if you build on Claude.
Prompt engineering was 2023's breakout job title and 2025's obituary. The discipline didn't die — it got a better name and a harder shape. Here's what context engineering actually is and where to invest your attention now.
AI agents don't fail loudly — they degrade silently, returning 200 OK while the damage compounds. Inside the $47K loops, NOHARM omissions, and the engineering discipline rebuilding observable failure.
AI is silently promoting every knowledge worker to middle management — without the title, the training, or the pay. This is what that shift actually looks like from a Singapore desk.
For neurodivergent professionals, AI isn't just a productivity tool — it's the first accommodation you can access privately, without disclosure, without stigma, and without asking anyone's permission.
Dorsey's manifesto for replacing middle management with AI nails the 60% that's automatable — but the 40% it barely mentions is where organizations quietly break.
Anthropic's decision to withhold Claude Mythos from public release isn't just safety theater — the system card reveals genuine alignment gaps at scale and a cybersecurity exploit window that just collapsed from months to minutes.
When I first built Gluon on my Mac mini, I was solving a personal problem: monitoring Claude agents without losing my mind to tmux logs. But when teams join the picture, everything changes — security, governance, observability, and the fundamental role of the developer. Here's what production infrastructure for autonomous agents looks like.
I built an open-source MCP server that reduces LLM token usage by 70-90% through server-side HTML filtering, markdown conversion, and CSS selector targeting. Here's why context efficiency matters—and how Scraper MCP solves it.
Most AI demos I'm shown answer the wrong question. They prove the model works; they never prove anyone needed it. Here's the builder-and-investor lens I use to tell the two apart before a cheque is written.
I've built the kind of agent framework AgentKit competes with. So when OpenAI shipped it two years "late," I knew exactly which problem they were actually solving — and which one they weren't.
I gave Claude Code an XML backup of my 19-year-old WordPress blog and asked it to rebuild everything as a modern NextJS site. What happened next was like watching a swarm of expert developers work in parallel—spawning agents, debugging TypeScript errors, and shipping production-ready code. All in 26 minutes. For eight dollars.
After watching 40% of agentic AI deployments fail in production, I'm building Dagentic — a serverless-first framework designed for what AI agents actually are: unpredictable, spiky workloads that modify themselves mid-execution.
After 12 months of systematic optimization, I've documented 50-70% productivity gains with AI coding assistants. The secret isn't just using AI tools—it's teaching them to think like you do through carefully crafted configurations.
I built a multi-agent system that researches a topic and hands back a formatted Word document — citations, images, the lot — in minutes. Here's how the agents divide the work, and the one part the machine still can't own.
At 3 AM, I was manually cropping 47 personal photos for a LoRA model when I realized half were the wrong aspect ratio. Three hours wasted. So I built a simple Python app that does the same work in 15 minutes—and it changed how I think about AI tooling infrastructure.
Despite claims that Social TV is dead, data from 486,659 Zeebox tweets and 4.3M Miso tweets reveals a more complex reality in the second-screen battle.
The real problem with Big Data isn't volume—it's knowing what you want to achieve and starting with clear business challenges, not technology.
A striking similarity between my Sky News personalization patent and Google's news customization feature raises interesting IP questions.
Instagram's Android launch and Facebook acquisition drove massive growth to nearly 60% market share, while Twitpic and Yfrog continue declining.
Key insights from IBM Research's webinar featuring Netflix and StubHub on implicit data collection, recommendation strategies, and the evolution from BI to Data Science.
Forget petabytes and Hadoop hype — true Big Data isn't about volume, it's about processing two orders of magnitude more data than you currently handle.
Updated ZipFileInputFormat framework for processing thousands of ZIP files in Hadoop with failure tolerance and comprehensive examples
Left BSkyB to co-found TUMRA, a data science startup, and have been busy developing products while updating my personal website
Installing Revolution Analytics R statistical computing platform on CentOS 6 with dependency resolution and compatibility fixes
Automating Sea Turtle mount acquisition in World of Warcraft using custom waypoint navigation and fishing pool detection algorithms
Advanced navigation techniques for autonomous MMORPG characters using Recast/Detour navigation meshes and path finding algorithms
Quick fix for regex errors in earthquake data collection restores latitude/longitude coordinates for ~31,020 seismic events
Build a Java utility class to consume Twitter Streaming API data for offline analysis in Hadoop with automatic file segmentation
Custom utility classes to extract and parse ZIP file contents in Hadoop MapReduce jobs using ZipFileInputFormat and ZipFileRecordReader
Collated earthquake data from GEOFON Extended Virtual Network into CSV format following Japan's devastating earthquake events