The 30 Principles for Agentic Engineering — Part 2: The Lifecycle
Part 2 of the 30-principle reference. Where the kernel principles set the spine, the lifecycle principles describe how work actually moves through a team that has adopted them — from idea, through agent execution, to a merged PR that does what the ticket asked for.
Nine principles in this part. Read in order if you're operationalising the lifecycle for the first time. Skip to the one you're currently failing at if you're not.
Principle 6 — The ticket is the contract
Statement. The GitHub or Jira issue is the contract between human and agent. Implementation, testing, and PR description all reference the issue. Acceptance criteria are mandatory. The agent never free-interprets.
Why it matters. Every case study with measurable success — Cognition Devin, Spotify Honk, Anthropic-Accenture — runs work through an explicit ticket contract. The teams that struggle send the agent prompts and hope. The cost of writing a proper ticket is offset within the same session by the time you don't spend reconciling what the agent shipped against what you actually wanted.
Tomorrow morning.
- Update the ticket template: persona, problem, journey, acceptance criteria — all required.
- Add to CLAUDE.md: "You implement against the ticket's acceptance criteria. Do not free-interpret. If AC is ambiguous, ask before acting."
- PR template references the ticket and includes the AC checklist.
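One way to enforce the "all required" rule is a small pre-flight check that runs before the agent picks up a ticket. The sketch below is illustrative only: the Ticket class, its field names, and ready_for_agent are hypothetical and not tied to the GitHub or Jira APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical ticket shape; fields mirror the template sections."""
    persona: str = ""
    problem: str = ""
    journey: str = ""
    acceptance_criteria: list[str] = field(default_factory=list)


def missing_fields(ticket: Ticket) -> list[str]:
    """Return the names of required sections that are empty."""
    missing = [
        name
        for name in ("persona", "problem", "journey")
        if not getattr(ticket, name).strip()
    ]
    # Acceptance criteria need at least one non-blank entry.
    if not any(ac.strip() for ac in ticket.acceptance_criteria):
        missing.append("acceptance_criteria")
    return missing


def ready_for_agent(ticket: Ticket) -> bool:
    """A ticket is a usable contract only when nothing is missing."""
    return not missing_fields(ticket)
```

A ticket with persona, problem, and journey filled in but no acceptance criteria would be rejected with ["acceptance_criteria"], which is the case the principle cares about most.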
Principle 7 — Intake-as-AI-distillation with human curation
Statement. Unstructured input (transcripts, slides, notes) → AI distillation → proposed tickets → human review and refinement. Skip neither step.
Why it matters. Teams that skip AI distillation never get past 1× productivity — intake is where most of the time leaks. Teams that skip human curation ship the wrong things, fast. The leverage is in the combination: typically 3–5× faster intake than hand-writing tickets, with a quality bar set by the human reviewer rather than the model.
Tomorrow morning.
- Build or install a /transcript-to-stories skill.
- Run it on your next meeting recording.
- Spend thirty minutes refining the proposed tickets. Time that against your usual intake.
Principle 8 — Humans gate at intake, irreversible actions, and merge
Statement. The lifecycle is a coroutine, not a checkpoint stream. Humans pause the loop at three places only: intake (turn proposals into approved tickets), irreversible actions (DB drops, prod deploys, payments), and merge.
Why it matters. Both extremes fail. "Approve every step" is too slow to be useful. "No gates anywhere" produces the Replit-style database-deletion stories. The three-gate model is the published convergent answer — Magentic-UI's +71% effectiveness from HITL came from gating at these specific points, not from gating more often.
Tomorrow morning.
- Add permissions.deny for irreversible actions: Bash(*production*), Bash(rm -rf:*), Bash(terraform destroy*), Edit(.env*).
- Confirm PR review requires a human approval, with no auto-merge.
- Confirm intake produces tickets a human approves before /work-on-ticket runs.
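The "coroutine, not a checkpoint stream" framing can be made concrete with a Python generator that pauses only at the three gates. This is a toy model under stated assumptions: the step kinds, the lifecycle and run functions, and the approve callback are all hypothetical names, and it does not reflect how any particular agent harness implements gating.

```python
from typing import Callable, Generator

# The three gates named in the principle; every other step kind runs unattended.
GATES = {"intake", "irreversible", "merge"}

Step = tuple[str, str]  # (step name, step kind)


def lifecycle(steps: list[Step]) -> Generator[str, bool, list[str]]:
    """Run steps in order, yielding only when a step needs a human decision."""
    completed = []
    for name, kind in steps:
        if kind in GATES:
            approved = yield name  # pause here until a human answers
            if not approved:
                break  # a rejected gate stops the loop
        completed.append(name)
    return completed


def run(steps: list[Step], approve: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Drive the coroutine; approve(name) stands in for the human reviewer.

    Returns (gates the human was asked about, steps that completed).
    """
    loop = lifecycle(steps)
    asked = []
    try:
        gate = next(loop)
        while True:
            asked.append(gate)
            gate = loop.send(approve(gate))
    except StopIteration as stop:
        return asked, stop.value
```

With five steps of which three are gated, the human is consulted three times rather than five, and rejecting the irreversible step leaves everything after it unexecuted.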
Principle 9 — Verify before done; never mark complete without proof
Statement. "Done" means the test suite passes, the linter passes, the typecheck passes, and the acceptance criteria are demonstrably met. If any one fails, the task is not done.
Why it matters. The kernel principle (verification is load-bearing) gives you the technical mechanism. This principle is the cultural one. The most expensive incidents in agentic deployments come from agents declaring success on tasks they didn't actually complete — "done-by-vibes" — and reviewers nodding because the diff looks plausible. Tests are truth. Self-reported success is hypothesis.
Tomorrow morning.
- The verify.sh hook from Principle 2 is the implementation.
- Add to CLAUDE.md: "Verification before done. Run verify.sh. If anything fails, keep working. Do not mark complete until everything passes."
- PR template includes a "verification output" section.
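The "if any one fails, the task is not done" rule reduces to an all-or-nothing check over exit codes. The sketch below shows that logic in Python rather than reproducing verify.sh, whose contents are not given here. The function names and the check names are hypothetical, and the commands passed in are whatever your project actually uses for tests, lint, and typecheck.

```python
import subprocess


def run_checks(checks: dict[str, list[str]]) -> dict[str, bool]:
    """Run each check command and record whether it exited with status 0."""
    results = {}
    for name, command in checks.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results[name] = completed.returncode == 0
    return results


def is_done(results: dict[str, bool]) -> bool:
    """Done only if at least one check ran and every check passed.

    An empty result set counts as not done: no proof, no completion.
    """
    return bool(results) and all(results.values())
```

Treating an empty result set as "not done" is a deliberate choice in this sketch: a verification step that ran nothing should not be able to report success.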
Principle 10 — AI-reviews-AI is not segregation of duties
Statement. Having Claude review Claude's PR is not independent review. For regulated industries — MAS, HKMA, EU AI Act — it fails segregation-of-duties controls.
Why it matters. Sibling agents share priors, share training, share blind spots. Auditors are not going to accept "an agent reviewed it" in 2026, and they should not be expected to. This principle has its own long-form treatment with the regulatory citations — but the operational rule is short.
Tomorrow morning.
- Use /review and /ultrareview as screening tools, not gates.
- Humans merge. No exceptions for auth, payments, infra, DB migrations.
- Document the position in your CLAUDE.md and AppSec policy.
Principle 11 — Characterisation tests before brownfield changes
Statement. For legacy code, generate characterisation tests that capture current behaviour before the agent makes any change. Don't modify code that isn't covered.
Why it matters. The brownfield over-refactor anti-pattern — agent rewrites stable code into something subtly broken — is one of the most consistently reported failure modes. Feathers' 20-year-old technique is the cheap fix, and the agent itself is the perfect characterisation-test writer. The full tutorial walks through the prompt and the mutation-testing layer on top.
Tomorrow morning.
- Pick one fragile module.
- Use the agent to generate characterisation tests for current behaviour.
- Add permissions.deny for that module's directory until the tests are in place.
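For readers who have not written one, a characterisation test asserts what the code does today, quirks included, rather than what it ought to do. The example below is a minimal sketch: legacy_shipping_fee is an invented stand-in for a fragile module, not code from any real system.

```python
import unittest


def legacy_shipping_fee(weight_kg, country):
    """Hypothetical stand-in for untested legacy code, quirks included."""
    if country == "SG":
        return 0 if weight_kg < 1 else 5
    fee = 12 + 3 * int(weight_kg)  # truncates fractional kilograms
    return fee if fee < 40 else 40  # silently caps the fee


class CharacterisationTests(unittest.TestCase):
    """Pin current behaviour so any later change to it is visible."""

    def test_domestic_threshold(self):
        self.assertEqual(legacy_shipping_fee(0.9, "SG"), 0)
        self.assertEqual(legacy_shipping_fee(1, "SG"), 5)

    def test_fractional_weight_is_truncated(self):
        # Possibly a bug, but it is today's behaviour, so it is pinned.
        self.assertEqual(legacy_shipping_fee(2.9, "US"), 18)

    def test_fee_is_capped(self):
        self.assertEqual(legacy_shipping_fee(50, "US"), 40)
```

The truncation test is the important one. If the agent later "fixes" the rounding while refactoring, that test fails and a human decides whether the change was intended.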
Principle 12 — Plan capacity at 1.2–1.5× net, not 2×
Statement. Gross throughput improves roughly 2×, but net of downstream incident cost, real delivery improvement is 1.2–1.5×. Vendor 5–10× claims survive only by ignoring what shipped.
Why it matters. METR's −19% productivity finding, GitClear/Faros at 54% more bugs and 242.7% more incidents per PR, and Veracode/Sherlock at 45–92% vulnerability rates in agent-generated code all converge on the same picture. The published case for 1.2× has its own dedicated post. The operational rule: plan capacity at the net figure, not the gross one.
Tomorrow morning.
- Plan team capacity at 1.2–1.5× net for the next quarter.
- Measure both throughput (PRs/week) and incident rate (post-merge defects per PR).
- If incident rate exceeds 2× pre-adoption baseline, pause the rollout and debug the harness before adding more agents.
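The arithmetic behind these three steps is simple enough to encode directly. The sketch below assumes throughput is measured in merged PRs per week and incident rate in post-merge defects per PR, as in the list above. The function names are hypothetical.

```python
def planned_capacity(baseline_prs_per_week: float, net_multiplier: float = 1.2) -> float:
    """Plan at the net multiplier (1.2 to 1.5), not the gross 2x."""
    if not 1.2 <= net_multiplier <= 1.5:
        raise ValueError("net multiplier should stay within 1.2 to 1.5")
    return baseline_prs_per_week * net_multiplier


def incident_rate(post_merge_defects: int, merged_prs: int) -> float:
    """Post-merge defects per merged PR."""
    if merged_prs <= 0:
        raise ValueError("merged_prs must be positive")
    return post_merge_defects / merged_prs


def should_pause_rollout(current_rate: float, baseline_rate: float) -> bool:
    """Pause when the incident rate exceeds 2x the pre-adoption baseline."""
    return current_rate > 2 * baseline_rate
```

As a worked example with made-up numbers: a team that merged 20 PRs a week before adoption would plan for roughly 24 to 30. If its baseline was 0.05 defects per PR and it now sees 6 defects across 40 PRs (0.15 per PR), that exceeds the 0.10 threshold and the rollout should pause.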
Principle 13 — Plan for the J-curve
Statement. Productivity drops in weeks 4–8 of adoption — METR documented a 19% slowdown — before turning up. Don't pivot during the dip.
Why it matters. The week-6 abandonment pattern is one of the most expensive mistakes in agentic rollouts: leadership concludes the tool isn't working four weeks before it would have turned positive. Section, Spotify, and Box all reported the same dip in their initial rollouts. The productivity J-curve post is the dedicated treatment.
Tomorrow morning.
- Brief leadership on the J-curve before adoption begins.
- Set the success-measurement window at month 3, not week 6.
- During the dip, look at adoption percentage and qualitative signals — not throughput.
Principle 14 — Operate phase: OTEL or you're flying blind
Statement. Pipe Claude Code's OpenTelemetry traces to a shared collector. Track cost per ticket, verifier catch rate, PR cycle time. Without telemetry, you cannot tell adoption from runaway.
Why it matters. The teams that publish defensible ROI numbers — Section, Box, Spotify — all wired OTEL early. The teams that argue about whether the tool is working have anecdotes instead of data. Telemetry is the cheapest piece of governance you can install and the most under-invested.
Tomorrow morning.
- Set CLAUDE_CODE_ENABLE_TELEMETRY=1 and the OTEL endpoint in managed settings.
- Stand up the AWS OpenTelemetry Collector or LangSmith / Langfuse.
- Build one dashboard: cost per developer per day, cache-hit rate, percentage of PRs agent-built. Refresh quarterly.
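Once telemetry is flowing, the dashboard numbers are plain aggregations. The sketch below assumes the exported data has already been flattened into one record per agent session. SessionRecord and its fields are hypothetical, and real OTEL exports will need mapping into this shape first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical flattened record; not the actual OTEL schema."""
    developer: str
    day: str       # ISO date string
    ticket: str
    cost_usd: float


def cost_per_developer_per_day(records: list[SessionRecord]) -> dict[tuple[str, str], float]:
    """Sum session cost for each (developer, day) pair."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for record in records:
        totals[(record.developer, record.day)] += record.cost_usd
    return dict(totals)


def cost_per_ticket(records: list[SessionRecord]) -> dict[str, float]:
    """Sum session cost for each ticket, across developers and days."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.ticket] += record.cost_usd
    return dict(totals)
```

Keying cost by ticket as well as by developer ties the telemetry back to Principle 6: the ticket is the unit of work, so it is also a natural unit of cost.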
The lifecycle in one line
Tickets are contracts, AI distils intake under human curation, humans gate at three points, verification is the proof of done, characterisation tests defend the past, plan for 1.2×, expect the J-curve, watch the telemetry.
Part 3 covers principles 15–20: the harness layer that makes the lifecycle cheap.
Series Navigation — The 30 Principles for Agentic Engineering
- Part 1: The Kernel
- Part 2: The Lifecycle (you are here)
- Part 3: The Harness
- Part 4: Governance and Safety
- Part 5: Calibration and Reality