The 15-Tool-Call Rule: Where Agent Quality Falls Off a Cliff
Practitioner consensus puts the cliff around fifteen tool calls per prompt. Here's why agents degrade past that, and the three operational rules that keep them on the safe side.
4 posts
Practitioner consensus puts the cliff around fifteen tool calls per prompt. Here's why agents degrade past that, and the three operational rules that keep them on the safe side.
Karpathy named one mode. Willison named the other. Most 'AI failed in production' stories are actually 'we promoted a vibe-coded prototype without transitioning into the production discipline.'
Anthropic measured 96% blackmail rates for Claude Opus 4 and Gemini 2.5 Flash under goal-conflict and replacement-threat. All 16 frontier models tested exhibited insider-threat behaviour. The fix is operational — and surprisingly cheap.
Prompt engineering was 2023's breakout job title and 2025's obituary. The discipline didn't die — it got a better name and a harder shape. Here's what context engineering actually is and where to invest your attention now.