The 30 Principles for Agentic Engineering — Part 1: The Kernel
This is a five-part reference series. Thirty principles, grouped into five categories — kernel, lifecycle, harness, governance, calibration. Each principle is a unit on its own: read in any order, skip what doesn't apply.
The principles aren't speculation. They're the cross-references between the five-layer harness, the five-stage maturity model, and roughly twelve months of published research, case studies, and field reports collated through May 2026.
Part 1 covers the kernel — the five principles every other principle leans on. If you do these five and nothing else, you'll outperform most teams that have done the other twenty-five but skipped these.
Principle 1 — Standardise the harness, customise the work
Statement. The five-layer harness (CLAUDE.md → Hooks → Skills → Subagents → Plugins) is the same across every team that ships at scale. Don't reinvent it. Adopt it.
Why it matters. Three independent open-source extractions (LangChain Deep Agents, AgentScope, DeerFlow 2.0) converge on the same shape. The case-study evidence shows nine of twelve production teams running a recognisable variant. Time spent designing a bespoke harness is time spent re-deriving an answer that already exists.
Tomorrow morning.
- Adopt the five folders in order: CLAUDE.md first, then hooks/, then skills/, then agents/, then plugins/.
- Customise what goes in the harness, not the harness itself.
- Treat any urge to invent a sixth layer as a smell.
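One way to keep the adoption order honest is a small check that reports the first layer a repository is still missing. This is a minimal sketch, not part of any tool: the function name next_layer_to_adopt is hypothetical, and it assumes the five entries sit directly under whatever root directory you pass in (your layout may place them elsewhere, for example under .claude/).

```python
from pathlib import Path
from typing import Optional

# Adoption order as listed in the checklist above.
# Assumption: each entry lives directly under the root passed in.
HARNESS_ORDER = ["CLAUDE.md", "hooks", "skills", "agents", "plugins"]


def next_layer_to_adopt(root: Path) -> Optional[str]:
    """Return the first layer, in adoption order, that does not exist yet.

    Returns None when all five layers are present.
    """
    for name in HARNESS_ORDER:
        if not (root / name).exists():
            return name
    return None
```

Calling next_layer_to_adopt(Path(".")) from the directory that holds the harness returns the next layer to set up. Because it stops at the first gap, a later layer that was created out of order does not hide an earlier one that was skipped.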
Principle 2 — Verification is the load-bearing primitive
Statement. Wire a Stop hook that runs verify.sh (typecheck + lint + tests + audit) after every agent edit. If anything fails, the agent must keep working. This single configuration returns more value than any other architectural choice.
Why it matters. Anthropic's 2026 Agentic Coding Trends Report names verification "the new bottleneck." Every credible architecture has an independent verification primitive — DeerFlow's validation loop, AgentScope's Service Toolkit, the published /ultrareview reporter→verifier patterns. Skip this and you are paying full agent cost for probabilistic compliance.
Tomorrow morning.
- Create tools/verify.sh running bun run lint && bun run typecheck && bun test.
- Add the Stop hook in .claude/settings.json with exit_on_error: true.
- Test by deliberately breaking a test and watching the verifier catch it.
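The verifier itself is just "run each check in order, stop at the first failure, pass the failure on as the exit code". The sketch below shows that logic in Python rather than shell, purely for illustration. The names run_checks and BUN_CHECKS are hypothetical, and it assumes bun is on the PATH and that the project defines lint and typecheck scripts.

```python
import subprocess
import sys
from typing import List, Sequence


def run_checks(checks: Sequence[List[str]]) -> int:
    """Run each command in order and stop at the first failure.

    Returns 0 if every command succeeds, otherwise the exit code of the
    first command that failed (later commands are not run).
    """
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"verify: FAILED at {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0


# The three commands from the checklist above.
# Assumption: bun is installed and the project defines these scripts.
BUN_CHECKS = [
    ["bun", "run", "lint"],
    ["bun", "run", "typecheck"],
    ["bun", "test"],
]
```

A script would end with sys.exit(run_checks(BUN_CHECKS)) so that whatever invokes it sees a non-zero exit code on failure. Stopping at the first failure matches the behaviour of the && chain in the shell version.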
Principle 3 — Plan mode default for non-trivial work
Statement. For tasks of three or more steps, or any architectural decision, the agent produces a written plan and gets human approval before executing. No exceptions.
Why it matters. Magentic-UI's published evaluation showed +71% effectiveness from human-in-the-loop gating at the planning stage. The lifecycle-pattern data is consistent: agents that produce a written plan and get approval before executing succeed at materially higher rates than agents that go straight to code. The cost is two minutes of human attention. The return is a fraction of the post-merge debugging you would otherwise do.
Tomorrow morning.
- Add to CLAUDE.md: "For any task of 3+ steps or architectural decisions, enter plan mode first. Write the plan to tasks/todo.md. Get human approval before executing."
- Reinforce by demoing this discipline in standups.
- Treat any "the agent shipped something I didn't want" incident as a planning failure, not an intelligence failure.
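The gating rule is simple enough to write down as a predicate, which helps when you want to argue about edge cases. This is a sketch under stated assumptions: the Task type and both function names are hypothetical, and how you count "steps" is left to you.

```python
from dataclasses import dataclass


@dataclass
class Task:
    steps: int                   # estimated number of distinct steps
    architectural: bool = False  # True if the task involves an architectural decision


def requires_plan(task: Task) -> bool:
    """Plan mode applies to tasks of three or more steps, or any architectural decision."""
    return task.steps >= 3 or task.architectural


def may_execute(task: Task, plan_approved: bool) -> bool:
    """Trivial tasks may proceed directly; gated tasks need an approved plan."""
    return plan_approved or not requires_plan(task)
```

For example, may_execute(Task(steps=4), plan_approved=False) is False, while a one-step, non-architectural task passes without a plan.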
Principle 4 — Pick the cheapest, most-deterministic layer that solves the problem
Statement. The five-layer harness gives you choices. Prefer hooks over skills, skills over subagents, subagents over plugins. Use higher layers only when lower ones cannot reach.
Why it matters. The default failure mode at every team that adopts the harness is over-engineering — building subagents when a hook would suffice, building plugins when a rules/<topic>.md note would do. Higher layers cost more in tokens, complexity, and debugging surface. The discipline is to reach for the lowest layer that solves the problem at the required determinism level.
Tomorrow morning.
- Audit .claude/agents/. For each subagent, ask: "Could a skill do this?"
- Audit .claude/skills/. For each skill, ask: "Could a hook do this?"
- Audit CLAUDE.md. For each block, ask: "Could rules/<topic>.md do this and stay path-scoped?"
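The audit questions above all apply the same rule: given the layers that could solve a problem, take the one earliest in the preference order. A minimal sketch follows. The function name cheapest_layer is hypothetical, and deciding which layers are actually capable of solving a given problem remains a human judgment that the code does not make for you.

```python
from typing import Iterable, Optional

# Preference order from the principle: hooks over skills,
# skills over subagents, subagents over plugins.
LAYER_PREFERENCE = ["hooks", "skills", "subagents", "plugins"]


def cheapest_layer(capable: Iterable[str]) -> Optional[str]:
    """Return the most-preferred layer among those able to solve the problem.

    `capable` is the set of layers you judge sufficient for the task.
    Returns None if none of the known layers is in that set.
    """
    capable_set = set(capable)
    for layer in LAYER_PREFERENCE:
        if layer in capable_set:
            return layer
    return None
```

So cheapest_layer(["plugins", "skills"]) returns "skills": the plugin would work, but a lower layer already reaches the problem.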
Principle 5 — Reflect after every task
Statement. At the end of each ticket, the agent updates tasks/lessons.md with: (a) any non-obvious pattern it learned, (b) any near-miss, (c) anything that should graduate to rules/*.md next quarter.
Why it matters. The case-study evidence is consistent: teams without a reflection discipline plateau at roughly 20% productivity gain; teams with one reach 50% and beyond. The mechanism is structural — each session would otherwise forget what the last session learned. lessons.md is the cheapest piece of memory a team can install above the per-session context window. It is also the seed of every future rules/*.md file.
Tomorrow morning.
- Create tasks/lessons.md in your project.
- Add to CLAUDE.md: "After completing a ticket, append to tasks/lessons.md with: (a) what non-obvious pattern you learned, (b) any near-miss, (c) anything that should graduate to rules/*.md."
- Block a 30-minute recurring slot quarterly to read lessons.md and promote patterns.
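If you want the entries in lessons.md to stay uniform enough to skim in that quarterly review, a small append helper is one option. This is a sketch only: the function name append_lesson and the entry layout (a heading per ticket followed by three bullets) are assumptions, since the principle specifies the three fields but not a file format.

```python
from datetime import date
from pathlib import Path


def append_lesson(path: Path, ticket: str, pattern: str,
                  near_miss: str, graduate: str) -> None:
    """Append one reflection entry to the lessons file, creating it if needed.

    The three fields mirror items (a), (b), and (c) of the principle.
    The heading-plus-bullets layout is an assumed format, not a standard.
    """
    entry = (
        f"\n## {ticket} ({date.today().isoformat()})\n"
        f"- Pattern: {pattern}\n"
        f"- Near-miss: {near_miss}\n"
        f"- Graduate to rules/*.md: {graduate}\n"
    )
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier sessions' entries intact.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```

Opening the file in append mode is the point of the design: each session adds to what earlier sessions recorded rather than overwriting it.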
The kernel in one line
Standardise the harness, make verification load-bearing, plan before acting, choose the cheapest layer, reflect every time. Five rules. If you only ever apply five principles from this series, apply these.
Part 2 covers principles 6–14: the lifecycle that runs on top of the kernel.
Series Navigation — The 30 Principles for Agentic Engineering
- Part 1: The Kernel (you are here)
- Part 2: The Lifecycle
- Part 3: The Harness
- Part 4: Governance and Safety
- Part 5: Calibration and Reality