Protect the Juniors: Cognitive Debt and the Stack Overflow Collapse
The Wall in Week 8
By week seven, the team was flying. A student project, AI-accelerated, shipping features faster than anything Professor Margaret-Anne Storey had supervised in a decade. Then, by week eight, they were paralysed. Not by bugs. Not by a server outage. By the silence in the room when she asked them to walk through a simple change.
In her essay on cognitive debt, Storey describes the moment with unsettling clarity: "By weeks 7 or 8, one team hit a wall. They could no longer make even simple changes without breaking something unexpected… no one on the team could explain why certain design decisions had been made or how different parts of the system were supposed to work together… They had accumulated cognitive debt faster than technical debt, and it paralyzed them."
The code wasn't messy. The architecture wasn't the problem. The problem was that the theory of the system — the shared mental model of why anything had been built the way it had — never formed in the first place. The code shipped fast for two months. The understanding never did.
This is not a student-team story. It is a population-scale story unfolding under us, and most of us think we know what it looks like. The honest data is sharper than that. Sharper in the direction that matters.
The Metric You Thought You Saw
You have probably read the line: Stack Overflow visits dropped from 200,000 to 3,607 a month. It has been quoted in keynote decks and podcast intros for the better part of a year. It is also wrong — wrong in the way precise people care about.
The 200,000 figure is questions per month at the 2014 peak, not visits. The forum still serves millions of monthly readers consulting the archive. What collapsed is the question volume. DevClass put December 2025 at around 3,862 questions, a 78% year-on-year decline and roughly 98% off the 2014 high. Gergely Orosz called it "as low as when Stack Overflow launched in 2009." Stack Overflow cut 28% of its staff in October 2023. The archive is alive; the contribution layer is dead.
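The percentages in this section can be checked from the two monthly question counts quoted here. A quick sketch, treating both counts as approximate; the prior-year volume is back-calculated from the 78% decline and is an inference, not a reported number.

```python
def pct_decline(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


PEAK_2014 = 200_000  # questions per month at the 2014 peak, as quoted (approximate)
DEC_2025 = 3_862     # questions in December 2025, DevClass figure as quoted (approximate)

off_peak = pct_decline(PEAK_2014, DEC_2025)

# Back-calculated, not reported: the December 2024 volume implied by
# a 78% year-on-year decline ending at DEC_2025.
implied_dec_2024 = DEC_2025 / (1 - 0.78)

print(f"Off the 2014 peak: {off_peak:.1f}%")
print(f"Implied December 2024 volume: {implied_dec_2024:,.0f}")
```

On these rounded inputs the drop from the peak comes out near 98%, and the implied volume a year earlier is about 17,500 questions a month.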
The other line worth correcting: the "40% senior speedup with rigorous review" figure circulating in practitioner threads is not a senior-specific number. It is a gross task-time speedup pulled out of mixed-seniority lab studies. The real signal from Cui et al.'s Microsoft and Accenture field experiments (1,974 developers) points the other way: juniors gain more. Seniors approach zero. METR measured experienced developers 19% slower with AI while they felt 20% faster — the same perception-reality gap I have written about elsewhere in this series.
The structural inversion the data actually supports is uncomfortable: AI makes junior output look senior-level. The skill formation underneath is what's at risk. Once the numbers line up, a sharper distinction emerges — and it has been in the cognitive-science literature for twenty-five years.
Atrophy vs Foreclosure
Two failure modes look identical on a dashboard. They are structurally different.
Atrophy is a skill you built and are losing through disuse. The senior engineer who hasn't written a regex from scratch in two years. The muscle is dormant, not gone. A weekend of practice brings it back. Reversible.
Foreclosure is a skill that never gets built at all. Much harder to reverse, because the substrate to build on never formed. You can't refresh a map you never drew.
Maguire et al. (PNAS, 2000) measured the hippocampi of London taxi drivers — the ones who had memorised "the Knowledge," 25,000 streets and 100,000 landmarks. The posterior hippocampus was measurably larger than in controls. Hippocampal volume correlated with years driving. The brain physically adapted to the skill being practised. It is one of the most-cited findings in cognitive neuroscience.
The cohort that learned to drive after Google Maps never built that substrate. They reach the destination just as reliably. The internal map was never drawn.
That is the spine of this argument. AI makes junior output look senior-level while preventing junior skill from forming. The output looks right. The map underneath was never drawn. And if atrophy is the senior risk, foreclosure is the junior risk — and the question becomes what previously caught the foreclosure, and what we just lost.
Diagram: atrophy versus foreclosure. Atrophy is the upper-right risk and is reversible. Foreclosure is the lower-right risk and is structurally harder to undo, because the substrate to build on never formed.
The Collapsed Learning Layer
Stack Overflow was never primarily a place to get an answer. The cognitive value was the wrestling. Formulating a precise question. Reading three wrong answers and noticing why they were wrong. Watching senior engineers correct each other in public over a typo in a regex. That whole social verification layer is now gone for juniors.
AI returns an answer, not a thread. There is no wrong-answer-then-correction arc to read. There is no public formulation of the problem. There is no community signal that says "this question is malformed, here's how to think about it." The strain has been engineered out.
Cal Newport, writing about exactly this: "In a learning environment, the feeling of strain is often a by-product of getting smarter. To minimize this strain is like using an electric scooter to make the marches easier in military boot camp; it will accomplish this goal in the short term, but it defeats the long-term conditioning purposes of the marches."
Strain is the learning. AI removes the strain. The march finishes. The legs don't develop.
This is what was unfolding in week eight of Storey's student project. No public formulation. No peer correction. No wrestling — just confident, plausible code that ran until it didn't. The forum was the substrate, and the substrate is gone. And then a single Anthropic paper, published in January, made the mechanism measurable.
AI-Enhanced Productivity Is Not a Shortcut to Competence
Shen and Tamkin (arXiv:2601.20245), Anthropic, January 2026. A randomised controlled trial: 52 junior software engineers learning a new Python async library, randomly assigned to AI-assisted or no-AI conditions. The headline finding is that AI use impaired conceptual understanding, code reading, and debugging — without delivering significant efficiency gains on average.
The split inside the AI group is the part that matters.
Passive delegators — the engineers who asked AI to generate code — scored 24–39% on the comprehension quiz. Active explainers — the engineers who asked AI to explain concepts rather than write code — scored 65–86%. Same tool. Same library. Forty-point gap.
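The "forty-point gap" can be read off the two quoted ranges. A small sketch, using only the range endpoints given here; per-participant scores are not available in this post, so the midpoints are a rough stand-in rather than the paper's own statistic.

```python
# Score ranges as quoted above, in percent on the comprehension quiz.
PASSIVE = (24, 39)  # asked the AI to generate code
ACTIVE = (65, 86)   # asked the AI to explain concepts


def midpoint(bounds: tuple[int, int]) -> float:
    """Centre of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


midpoint_gap = midpoint(ACTIVE) - midpoint(PASSIVE)  # centre of one range to centre of the other
smallest_gap = ACTIVE[0] - PASSIVE[1]                # bottom of the active range vs top of the passive range
largest_gap = ACTIVE[1] - PASSIVE[0]                 # top of the active range vs bottom of the passive range

print(f"midpoint gap: {midpoint_gap:.0f} points")
print(f"gap range: {smallest_gap} to {largest_gap} points")
```

The midpoint gap is 44 points, so "forty-point" is a fair rounding. The two ranges do not overlap: even the narrowest comparison leaves 26 points between them.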
The paper's verbatim conclusion is the line worth carrying around: "AI-enhanced productivity is not a shortcut to competence."
This is not "AI is bad for juniors." It is something narrower and more useful. The mode of use determines whether the tool builds skill or forecloses it. Same tool, two trajectories, depending on whether the engineer asks it to think with them or for them. Fifty-two engineers is a small sample and the longitudinal question — does the gap close or widen over five years? — is still open. But the structural coherence of the finding, sitting on top of Maguire and Storey and the collapsed learning layer, is what makes it land.
Which leaves the only question worth asking: what does mode-of-use look like in practice, on a real team, this quarter?
Three Things That Work
I have tried versions of these on the teams I run. None of them needs a budget line. The first one starts tomorrow.
1. Implement twice. Addy Osmani's recommendation, verbatim: "Implement projects twice, once with AI, once without, and compare." And: "occasionally disable your AI helper and write key algorithms from scratch." Most teams will not adopt this for production work. As an onboarding protocol or a side-project habit, it is the closest thing to deliberate practice we have for this era. The second pass is the calibration. It shows the junior what their hands knew versus what the AI knew. That comparison is the learning.
2. Default juniors (L1/L2) to Socratic mode. A system-prompt tweak, no more. The AI is told, up front, that it must not rewrite the code. It must point at the line, explain the conceptual flaw, and ask the engineer a question. This costs nothing to deploy and maps directly onto what the Shen and Tamkin high scorers, the active explainers, were doing organically. For juniors, this should be the default AI mode for the first two years on the team, not the exception. Make it the path of least resistance and the foreclosure stops.
3. Seniority-gated review where the author has to explain the code. The reviewer's first question for any L1/L2 pull request is always the same: walk me through this block. If the author cannot, the PR does not merge. This is not a new rule. It is the rule code review was supposed to be. AI just made it possible to skip without anyone noticing. Reinstating it is free; you just have to mean it. Or, as Osmani put the same point: "If the original author can't explain why the code works, how will the on-call engineer debug it at 2 AM?"
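Items 2 and 3 above can be sketched as two small pieces of team tooling. This is one possible shape under stated assumptions, not a prescribed implementation: the prompt wording, the level labels, and every name below (`SOCRATIC_PROMPT`, `system_prompt_for`, `PullRequest`, `may_merge`) are hypothetical, and no particular AI vendor or code-review platform API is assumed.

```python
from dataclasses import dataclass

# Hypothetical wording: the post describes the behaviour, not an exact prompt.
SOCRATIC_PROMPT = (
    "You are reviewing code with an early-career engineer. "
    "Do not rewrite or generate code. "
    "Point to the specific line, explain the conceptual flaw, "
    "and end with one question for the engineer to answer."
)

DEFAULT_PROMPT = "You are a coding assistant."  # placeholder for non-junior levels

JUNIOR_LEVELS = {"L1", "L2"}  # assumed level labels, taken from the post's "L1/L2"


def system_prompt_for(level: str) -> str:
    """Make Socratic mode the default for junior levels."""
    return SOCRATIC_PROMPT if level in JUNIOR_LEVELS else DEFAULT_PROMPT


@dataclass
class PullRequest:
    author_level: str
    author_explained: bool  # set by the reviewer after the walkthrough


def may_merge(pr: PullRequest) -> bool:
    """Hold junior PRs until the author has walked the reviewer through the code."""
    if pr.author_level in JUNIOR_LEVELS:
        return pr.author_explained
    return True


print(may_merge(PullRequest(author_level="L1", author_explained=False)))
```

The point of the design is that the default does the work: a junior has to opt out of Socratic mode rather than opt in, and the merge gate depends on a human walkthrough, not on anything the AI produced.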
Which brings me to the post I wrote six weeks ago. And the contradiction I owe the reader an answer to.
Two True Things
I wrote "AI as the Great Equaliser" in April. It argued that AI is the first accommodation neurodivergent professionals can access privately, without disclosure: a tool of agency, dignity, and inclusion. I stand by every word.
This post argues that AI is foreclosing skill formation in a generation of juniors. I stand by every word of this one too.
Both can be true because the tool isn't the variable. The practice around the tool is the variable. AI used to translate ability — to take the idea you already had and clear the path between your head and the workplace — is liberating. AI used to substitute for the cognitive work that builds the ability in the first place is corrosive. Same tool. Two completely different effects, depending on which side of the foundational-skill line the user is standing on.
The three interventions above are how you keep that line visible. For the juniors you hire. For the ones you mentor. For the engineer you used to be — the one who wrote the bad regex four times before getting it right, and who is now the senior because of those four wrong attempts.
I'm the senior. I'm the one teaching. So is anyone reading this who manages a team in 2026. The next post in this series will look at the five-layer harness — the deterministic scaffolding that gives all of this somewhere to live. That is the systems side. This is the human side of what the 5-Step Loop post named at the architecture level. The verification gap isn't only inside the agent. It is inside the people who used to be the verification.
The wall in week eight was never about the code. It was about what the code had cost the people writing it.
Protect that, and the rest follows.