30 Agentic Engineering Principles — Part 3 Harness

A well-configured harness is invisible. When it's working, the principles from Parts 1 and 2 stay cheap to run — the agent knows what it should know, hooks fire when they should fire, skills trigger on the right prompts. When it's misconfigured, you feel it everywhere: context windows full of stale instructions, hooks blocking things that should never have been a hook, skills the team built and nobody uses.

Part 3 of 5. Six principles. All operational. Spend a quiet afternoon on these once and you'll save weeks downstream.

Principle 15 — `CLAUDE.md` under 200 lines, always

The root CLAUDE.md must stay under 200 lines and stay stable. Anything deep, topic-specific, or path-scoped belongs in rules/<topic>.md or a skill.

A 2,000-line CLAUDE.md is the most common form of harness pollution. It consumes 30–40% of every context window with content the agent doesn't need for most tasks. Boris Cherny's published guidance is unambiguous on the size cap; Jose Parreño Garcia's "You probably don't understand Claude Code memory" walks through the mechanism. The root CLAUDE.md is an index, not an encyclopedia.

Count your lines. If it's over 200, start moving topics to rules/*.md. Treat the root file as a table of contents that points at depth — and audit it quarterly for stale content. Everything that shouldn't be there is paying a context-window tax on every single agent interaction.
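"Count your lines" is easy to automate. The sketch below is one way to do it, not an official tool: the function name claude_md_report and the idea of running it in CI are my assumptions, and the 200-line budget is simply the cap from this principle.

```python
from pathlib import Path

MAX_LINES = 200  # the cap this principle proposes


def claude_md_report(path: Path, max_lines: int = MAX_LINES) -> tuple:
    """Return (line_count, within_budget) for the file at `path`."""
    line_count = len(path.read_text(encoding="utf-8").splitlines())
    return line_count, line_count <= max_lines


if __name__ == "__main__":
    target = Path("CLAUDE.md")  # assumes you run this from the repo root
    if target.exists():
        count, ok = claude_md_report(target)
        status = "ok" if ok else "over budget, move topics to rules/*.md"
        print(f"{target}: {count} lines ({status})")
```

Run it from the repository root. If the count is over budget, the line numbers of the largest sections are usually the first candidates to move.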

Principle 16 — Hooks for invariants that have caused real incidents

Use hooks for rules that must fire without exception — secret scanning, dangerous-bash blocks, formatting. Reserve hooks for rules that have caused incidents. Don't over-hook.

Hooks are deterministic: a hook that exits with code 2 blocks the action cold. That property is valuable exactly where determinism matters: rules with a measurable failure history. Hooks for soft preferences are the hook-hammer anti-pattern; they bloat the harness and slow every turn for nothing.

The discipline: list every incident your team has had. A forgotten test run. A leaked secret. The wrong package manager committed. That list is your hook backlog, one hook per incident class: pre-commit-secret-scan, block-dangerous-bash, block-wrong-pm. Ship the hooks through managed settings so developers can't disable them, and set allowManagedHooksOnly: true there so that only managed hooks load. Everything else stays in CLAUDE.md as guidance, not as a hard gate.
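To make the exit-2 behaviour concrete, here is a minimal sketch of what a block-dangerous-bash hook could look like. It assumes the hook receives a JSON payload on stdin with tool_name and tool_input.command fields and is registered against the Bash tool. The deny-list patterns and the function names evaluate and main are hypothetical; build your own list from your incident history.

```python
import json
import re
import sys

# Hypothetical deny-list; replace with patterns from your own incident history.
DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",        # recursive delete of the filesystem root
    r"\bgit\s+push\s+.*--force\b",  # force-push (also matches --force-with-lease)
    r"\bcurl\b.*\|\s*(ba)?sh\b",    # piping a download straight into a shell
]


def evaluate(payload: dict) -> tuple:
    """Return (exit_code, message): 2 blocks the tool call, 0 allows it."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in DANGEROUS_PATTERNS:
        if re.search(pattern, command):
            return 2, f"Blocked by block-dangerous-bash: matched {pattern!r}"
    return 0, ""


def main() -> int:
    """Read the hook payload from stdin and report the verdict on stderr."""
    code, message = evaluate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

To use it as a real hook script, end the file with sys.exit(main()) so the exit code reaches the harness. Keeping evaluate separate from the stdin handling means the rule itself can be unit-tested without a running agent.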

Principle 17 — Skills auto-invoke based on description matching

A skill's description: field is its activation phrase. Vague descriptions mean the skill won't trigger. Broad descriptions mean the wrong skill triggers. The description is half the work of building the skill.

"Skill bit-rot" is the most common reason teams ship a marketplace of skills nobody ever uses. The skill is technically correct; the description doesn't match how anyone phrases the prompt. The right test: write the user phrase first, then check whether your description matches it — not the other way around.

Audit .claude/skills/*/SKILL.md frontmatter. For each skill: "What user phrase would cause this to fire?" Update the description to match that phrase. Test by typing it as a prompt and watching whether the skill triggers. Repeat until it does.
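The audit loop can be partly scripted. The sketch below is a rough aid, not a model of how activation actually works: the model decides whether a skill fires, so a keyword overlap check only catches the obvious misses. The 8-word threshold, the helper names, and the simplified frontmatter parser (single-line key: value pairs only) are all my assumptions.

```python
import re
from pathlib import Path

MIN_WORDS = 8  # hypothetical threshold for "probably too vague"


def words(text: str) -> set:
    """Lower-cased alphanumeric tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def read_frontmatter(text: str) -> dict:
    """Parse single-line `key: value` pairs between the leading `---` fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def audit_skill(text: str, user_phrase: str) -> list:
    """Return human-readable warnings for one SKILL.md body."""
    description = read_frontmatter(text).get("description", "")
    warnings = []
    if not description:
        warnings.append("no description field")
    elif len(description.split()) < MIN_WORDS:
        warnings.append("description is short; it may be too vague to trigger")
    # Crude lexical check only: real activation is decided by the model.
    phrase_words = {w for w in words(user_phrase) if len(w) > 3}
    if description and not phrase_words & words(description):
        warnings.append("description shares no keywords with the user phrase")
    return warnings


if __name__ == "__main__":
    for skill_file in sorted(Path(".claude/skills").glob("*/SKILL.md")):
        fields = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        print(f"{skill_file.parent.name}: {fields.get('description', '<missing>')}")
```

Run as a script, it prints every skill's description in one place, which is often enough to spot the vague ones. Calling audit_skill with the phrase you wrote first gives a quick sanity check before the real test of typing the prompt.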

I maintain a private skills marketplace across several projects — the single highest-leverage maintenance task is keeping descriptions precise. A skill nobody uses is pure overhead: it sits in the marketplace, consumes review cycles, and returns nothing.

Principle 18 — Subagents have isolated context; never let them recurse

Subagents (.claude/agents/) run in their own context window. They cannot spawn further subagents. Use this property: delegate expensive exploration to subagents and keep the main thread clean.

Subagent isolation is the harness feature that makes large-scale exploration affordable. Anthropic's multi-agent Research feature uses this extensively — the per-subagent token cap is the load-bearing constraint that makes the topology affordable. The "no recursion" rule is the design that prevents cost explosions: a subagent that spawned subagents that spawned subagents would turn a bounded task into an unbounded spend.
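The cost argument is simple arithmetic. As a rough illustration, suppose each agent may spawn a fixed number of children and each subagent is capped at a fixed token budget. The fan-out of 5 and the 50,000-token cap in the usage note are made-up numbers for the example, not figures from Anthropic.

```python
def worst_case_agents(fan_out: int, depth: int) -> int:
    """Count subagents when every agent may spawn `fan_out` children, `depth` levels deep."""
    return sum(fan_out ** level for level in range(1, depth + 1))


def worst_case_tokens(fan_out: int, depth: int, cap_per_agent: int) -> int:
    """Upper bound on subagent tokens given a per-subagent cap."""
    return worst_case_agents(fan_out, depth) * cap_per_agent


if __name__ == "__main__":
    # depth=1 is the "no recursion" rule; depth=3 is three levels of nesting.
    for depth in (1, 3):
        agents = worst_case_agents(5, depth)
        tokens = worst_case_tokens(5, depth, 50_000)
        print(f"depth={depth}: {agents} subagents, up to {tokens:,} tokens")
```

With those illustrative numbers, one level of delegation tops out at 5 subagents and 250,000 tokens, while three levels of nesting allow 155 subagents and 7,750,000 tokens. The cap bounds each agent; the recursion ban bounds how many agents there can be.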

Build one subagent for your noisiest task type — exploration, security review, audit. Use it on a task that would normally fill the main context with Read and Grep output. Watch the main context stay clean. Don't let a subagent invoke another subagent — Claude Code enforces this; honour it rather than working around it.

Principle 19 — Pin everything

Claude Code CLI version, model name, skill SHAs, MCP server versions. Pin all of it. Floating versions equal silent behavioural changes.

The "it worked last week" failure mode is the most expensive form of unmanaged drift. A model upgrade silently changed behaviour, you can't reproduce the regression, and you can't even tell when it happened. Goldman Sachs and other named deployments run pinned for a reason. Pinning costs a one-line config change; not pinning costs investigation time that materialises weeks after the change that caused it.

Set minimumVersion in managed settings for the Claude Code CLI. Set ANTHROPIC_DEFAULT_SONNET_MODEL to a specific version string, not the floating alias sonnet. Pin skills, MCP servers, and plugins to a SHA or version tag in the marketplace. Then change those pins deliberately, on your terms, with a regression test running.
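A pin is only useful if something notices when it goes missing. The sketch below is one possible pre-flight check for the model pin, under the assumption that a value counts as floating when it is a bare alias or ends in -latest. The alias list and function names are mine, and the model ID in the usage note is only an example of the dated format.

```python
from typing import Mapping, Optional

# Assumed set of floating aliases; extend it to match your own environment.
FLOATING_ALIASES = {"default", "sonnet", "opus", "haiku", "opusplan"}


def is_pinned(model: Optional[str]) -> bool:
    """True when `model` looks like a specific version rather than a floating alias."""
    if not model:
        return False
    name = model.strip().lower().split("[")[0]  # drop suffixes such as "[1m]"
    return name not in FLOATING_ALIASES and not name.endswith("-latest")


def check_model_pin(env: Mapping[str, str]) -> list:
    """Return a list of problems found in the given environment mapping."""
    value = env.get("ANTHROPIC_DEFAULT_SONNET_MODEL")
    if value is None:
        return ["ANTHROPIC_DEFAULT_SONNET_MODEL is not set"]
    if not is_pinned(value):
        return [f"ANTHROPIC_DEFAULT_SONNET_MODEL={value!r} is a floating alias"]
    return []


if __name__ == "__main__":
    import os

    for problem in check_model_pin(os.environ):
        print(problem)
```

For example, is_pinned("sonnet") is False while is_pinned("claude-sonnet-4-5-20250929") is True. Running the check in CI or a session-start script turns an unpinned model from a silent drift into a visible warning.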

Principle 20 — Stage 5 (Distribution) is the team multiplier

Stage 4 produces individual mastery. Stage 5 — Distribution via a plugin marketplace — produces fleet capability. The difference is whether one team's discipline benefits every team that comes after it.

Spotify Honk's 650+ PRs/month, Box's 85% daily adoption, and the Anthropic-Accenture 30,000-developer rollout are all Stage 5 deployments — the maturity-model post names the full trajectory. The Stage 4 plateau is a real named anti-pattern: a team that masters subagents but never packages. Each new repo, each new team, starts from zero. Distribution fixes that.

Identify your three most-used skills, agents, or hooks. Sanitise them — remove personal references, hard-coded paths. Push them to a private marketplace repo, tag v1.0.0, and add to extraKnownMarketplaces for one other team. The first handoff is the hardest; after that the motion becomes automatic.
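The sanitise step is where most first handoffs leak something. A minimal sketch of a pre-publish scan for hard-coded home-directory paths follows. The regular expression covers only the common macOS, Linux, and Windows home prefixes, and the function names are hypothetical; it will not catch personal names, tokens, or internal hostnames.

```python
import re
from pathlib import Path

# Matches a home-directory prefix plus the user name that follows it.
HOME_PATH = re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)[\w.-]+")


def find_personal_paths(text: str) -> list:
    """Return the distinct hard-coded home-directory prefixes found in `text`."""
    return sorted({match.group(0) for match in HOME_PATH.finditer(text)})


def scan_tree(root: Path) -> dict:
    """Map each file under `root` to the personal paths it contains."""
    findings = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits = find_personal_paths(text)
            if hits:
                findings[str(path)] = hits
    return findings


if __name__ == "__main__":
    for file_name, hits in scan_tree(Path(".claude")).items():
        print(f"{file_name}: {', '.join(hits)}")
```

Run it against the directory you are about to publish and fix every hit before tagging v1.0.0. An empty result is a necessary check, not a sufficient one; still read the files once by eye.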

---

CLAUDE.md is an index, hooks are for incidents, skills live by their description, subagents isolate noise, pin everything, and Stage 5 is where one team's discipline starts benefiting the next team.

Part 4 covers principles 21–25: the governance and safety layer.

---

Series Navigation — The 30 Principles for Agentic Engineering

The 30 Principles for Agentic Engineering — Part 3: The Harness

Principle 15 — `CLAUDE.md` under 200 lines, always

Principle 16 — Hooks for invariants that have caused real incidents

Principle 17 — Skills auto-invoke based on description matching

Principle 18 — Subagents have isolated context; never let them recurse

Principle 19 — Pin everything

Principle 20 — Stage 5 (Distribution) is the team multiplier

Related

Standardise the Harness, Customise the Work: The 5-Layer Agent Architecture

From Solo Tool to Team Infrastructure: Scaling Gluon for Production

The 15-Tool-Call Rule: Where Agent Quality Falls Off a Cliff
