The 30 Principles for Agentic Engineering — Part 1: The Kernel
Most teams that adopt agentic tools plateau around 20% productivity gain. A few reach 50% and keep climbing. The gap isn't model quality, budget, or headcount — it's five foundational decisions that either compound or collapse everything built on top of them.
This is a five-part reference series covering thirty principles across kernel, lifecycle, harness, governance, and calibration. Part 1 is the kernel — the five that every other principle leans on. Get these wrong and the remaining twenty-five don't recover the loss. Get them right and you'll outperform most teams that have done the other twenty-five but skipped these five.
Part 1 of 5.
Principle 1 — Standardise the harness, customise the work
The five-layer harness — CLAUDE.md → Hooks → Skills → Subagents → Plugins — is the same across every team that ships at scale. Don't reinvent it. Adopt it, then put your work inside it.
Three independent open-source extractions (LangChain Deep Agents, AgentScope, DeerFlow 2.0) converge on the same shape. The case-study evidence shows nine of twelve production teams running a recognisable variant. Time spent designing a bespoke harness is time spent re-deriving an answer that already exists.
The failure mode I see most often: a team spends three weeks building a custom orchestration layer, discovers it has the same problems every custom orchestration layer has, and then begrudgingly arrives at something that looks like the five-layer harness anyway — but now they own the maintenance. Adopt the five folders in order: CLAUDE.md first, then hooks/, then skills/, then agents/, then plugins/. Customise what goes in the harness. Treat any urge to invent a sixth layer as a smell worth investigating.
I run that same five-folder skeleton across every project I touch — Gluon, scraper-mcp, dagentic — and the standardisation is the whole point: I can drop into any of them and have an agent productive in minutes, because the harness never changes between them. The variety lives in the work, not the scaffolding.
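As a rough illustration, the skeleton can be laid down with a few lines of shell. This is a minimal sketch, not a canonical layout: `scaffold_harness` is a hypothetical helper, and placing hooks/, skills/, agents/ and plugins/ under `.claude/` is an assumption drawn from the `.claude/agents/` and `.claude/skills/` paths used later in this series.

```shell
# scaffold_harness ROOT: create the five-layer skeleton in adoption order.
# Assumption: the four folders live under .claude/ (not confirmed by the text).
scaffold_harness() {
  root="${1:-.}"
  mkdir -p "$root/.claude"
  # Layer 1: CLAUDE.md, created only if missing so existing notes survive.
  [ -e "$root/CLAUDE.md" ] || : > "$root/CLAUDE.md"
  # Layers 2-5, in the order the text recommends adopting them.
  for layer in hooks skills agents plugins; do
    mkdir -p "$root/.claude/$layer"
    echo "ready: .claude/$layer"
  done
}
```

Running `scaffold_harness .` in a fresh repository gives every project the same shape; re-running it is harmless because nothing existing is overwritten.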
Principle 2 — Verification is the load-bearing primitive
Wire a Stop hook that runs verify.sh (typecheck, lint, tests, audit) every time the agent tries to finish a turn. If anything fails, the agent keeps working. This single configuration returns more value than any other architectural choice.
Anthropic's 2026 Agentic Coding Trends Report names verification "the new bottleneck." Every credible architecture has an independent verification primitive — DeerFlow's validation loop, AgentScope's Service Toolkit, the published /ultrareview reporter→verifier patterns.
Without this, you're paying full agent cost for probabilistic compliance. The agent ran. It probably succeeded. You won't know until the incident.
A minimal verify.sh is three commands chained in a shell script, with the audit step added once you have picked a tool for it:
bun run lint && bun run typecheck && bun test
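If you want the failing step named in the output, the one-liner can be grown into a small loop. The sketch below is one way to do it, under stated assumptions: `run_checks` is a hypothetical helper, and returning exit code 2 assumes the hook runner treats 2 as "blocking", which is how I read Claude Code's hooks reference. Adjust the code if your runner differs.

```shell
# run_checks CMD...: run each command string in order, stop at the first failure.
run_checks() {
  for check in "$@"; do
    # Send check output to stderr so a hook runner can relay it to the agent.
    if ! sh -c "$check" >&2; then
      echo "verify: FAILED: $check" >&2
      return 2   # assumption: 2 is the blocking exit code for the hook runner
    fi
  done
  echo "verify: all checks passed" >&2
}
```

In a real verify.sh the last line would call it with the commands above, for example `run_checks "bun run lint" "bun run typecheck" "bun test"`, followed by `exit $?`.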
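Add the Stop hook in .claude/settings.json and have verify.sh exit non-zero on failure, so the agent cannot stop on a red build. Claude Code's hooks reference treats exit code 2 from a Stop hook as blocking and passes stderr back to the agent; confirm the exact contract for your version. Test by deliberately breaking a test and watching the verifier catch it. If you only implement one thing from this series, make it this.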
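For orientation, here is a sketch of what registering the hook might look like. The JSON shape follows Claude Code's hooks schema as I understand it (a `Stop` event holding a list of `command` hooks); treat the field names and the `./verify.sh` path as assumptions to check against the current documentation. `write_stop_hook` is a hypothetical helper that refuses to clobber an existing settings file.

```shell
# write_stop_hook ROOT: write a minimal settings file that runs verify.sh on Stop.
write_stop_hook() {
  file="${1:-.}/.claude/settings.json"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file; merge the Stop entry by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$file")"
  # Assumed schema: hooks -> Stop -> [ { hooks: [ { type, command } ] } ]
  cat > "$file" <<'JSON'
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "./verify.sh" }
        ]
      }
    ]
  }
}
JSON
}
```

A Stop hook that always blocks can keep the agent looping, so it is worth confirming that a green run of verify.sh lets the session end cleanly before relying on it.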
Principle 3 — Plan mode default for non-trivial work
For any task of three or more steps — or any architectural decision — the agent produces a written plan and gets human approval before executing. No exceptions.
Magentic-UI's published evaluation showed +71% effectiveness from human-in-the-loop gating at the planning stage. The mechanism is straightforward: an agent that plans and waits doesn't start executing a misunderstood requirement for twenty minutes before you notice the wrong file is changing. Two minutes of reading a plan saves hours of untangling.
Add this to CLAUDE.md:
For any task of 3+ steps or architectural decisions, enter plan mode first. Write the plan to
tasks/todo.md. Get human approval before executing.
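The instruction above relies on the agent honouring it. If you want a mechanical backstop as well, a small gate script can check that a plan exists and has been signed off. This is a sketch only: `plan_approved` and the `Approved-by:` line are conventions invented here for illustration, not part of any tool.

```shell
# plan_approved FILE: succeed only if the plan exists and carries an approval line.
# Assumption: a human signs off by adding a line that starts with "Approved-by: ".
plan_approved() {
  plan="${1:-tasks/todo.md}"
  if [ ! -s "$plan" ]; then
    echo "no plan at $plan: write one before executing" >&2
    return 1
  fi
  if ! grep -q '^Approved-by: ..*' "$plan"; then
    echo "plan at $plan has no Approved-by line: waiting for human approval" >&2
    return 1
  fi
}
```

Called as `plan_approved tasks/todo.md && <start work>`, it turns "get human approval" from a request into a precondition.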
One reframe that changed how I think about this: when an agent ships something you didn't want, the root cause is almost always a planning failure, not an intelligence failure. The agent was capable of the right answer — it just never got a chance to check whether it was building toward it.
Principle 4 — Pick the cheapest, most-deterministic layer that solves the problem
The five-layer harness gives you choices. Prefer hooks over skills, skills over subagents, subagents over plugins. Use higher layers only when lower ones cannot reach.
The default failure mode at every team that adopts the harness is over-engineering — building subagents when a hook would suffice, building plugins when a rules/<topic>.md note would do. Higher layers cost more in tokens, complexity, and debugging surface.
The discipline is an audit:
- For each subagent in .claude/agents/: "Could a skill do this?"
- For each skill in .claude/skills/: "Could a hook do this?"
- For each block in CLAUDE.md: "Could rules/<topic>.md do this and stay path-scoped?"
Most teams that run this audit find they can collapse 30–40% of their harness complexity without losing any capability. The remaining complexity is the complexity that's actually earning its place.
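The audit questions can be turned into a printable checklist so nothing gets skipped. The sketch below assumes the `.claude/agents/` and `.claude/skills/` layout named in the list; `audit_harness` is a hypothetical helper, and it only asks the questions. The judgment stays with the reviewer.

```shell
# audit_harness ROOT: print one audit question per harness entry.
audit_harness() {
  root="${1:-.}"
  for entry in "$root"/.claude/agents/*; do
    [ -e "$entry" ] || continue   # the glob matched nothing
    echo "[ ] agents/$(basename "$entry"): could a skill do this?"
  done
  for entry in "$root"/.claude/skills/*; do
    [ -e "$entry" ] || continue
    echo "[ ] skills/$(basename "$entry"): could a hook do this?"
  done
  if [ -f "$root/CLAUDE.md" ]; then
    echo "[ ] CLAUDE.md: could rules/<topic>.md hold any block here and stay path-scoped?"
  fi
}
```

Piping the output into a ticket or a pull-request description gives the audit a visible trail of what was asked and what was collapsed.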
Principle 5 — Reflect after every task
At the end of each ticket, the agent updates tasks/lessons.md with: (a) any non-obvious pattern it learned, (b) any near-miss, (c) anything that should graduate to rules/*.md next quarter.
The case-study data is stark. Teams without a reflection discipline plateau at roughly 20% productivity gain. Teams with one reach 50% and beyond. The mechanism is structural: each session forgets what the last session learned. lessons.md is the cheapest piece of memory a team can install above the per-session context window — and the seed of every future rules/*.md file.
Create tasks/lessons.md. Add this to CLAUDE.md:
After completing a ticket, append to tasks/lessons.md with: (a) what non-obvious pattern you learned, (b) any near-miss, (c) anything that should graduate to rules/*.md.
Then block 30 minutes quarterly to read lessons.md and promote the patterns worth keeping. That one calendar slot is where the 20% teams separate from the 50% ones.
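A consistent entry format makes that quarterly read much faster. The sketch below shows one possible shape: `append_lesson` and `promotion_candidates` are hypothetical helpers, and the dated heading plus three labelled bullets is a format assumed here, not one the series prescribes.

```shell
# append_lesson FILE TICKET PATTERN NEAR_MISS GRADUATE: append one dated entry.
append_lesson() {
  file="$1"; ticket="$2"
  mkdir -p "$(dirname "$file")"
  {
    echo "## $(date +%Y-%m-%d) $ticket"
    echo "- Pattern: $3"
    echo "- Near-miss: $4"
    echo "- Graduate to rules/: $5"
    echo
  } >> "$file"
}

# promotion_candidates FILE: list the entries flagged for rules/*.md, skipping "none".
promotion_candidates() {
  grep '^- Graduate to rules/: ' "$1" | grep -v ': none$' | sed 's/^- Graduate to rules\/: //'
}
```

During the review slot, `promotion_candidates tasks/lessons.md` gives a short list to decide on, rather than a whole quarter of notes to reread.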
Standardise the harness, make verification load-bearing, plan before acting, choose the cheapest layer, reflect every time. If you apply only five principles from the entire series, these are the ones — everything else runs on top of them.
Part 2 covers principles 6–14: the lifecycle that runs on top of the kernel.
Series Navigation — The 30 Principles for Agentic Engineering
- Part 1: The Kernel (you are here)
- Part 2: The Lifecycle
- Part 3: The Harness
- Part 4: Governance and Safety
- Part 5: Calibration and Reality