From Prompt to Context Engineering: Why the Skill Got Harder

The $335k That Broke Twitter

In March 2023, a single job posting on the Anthropic careers page broke the internet. "Prompt Engineer and Librarian." Salary band: $175,000 to $335,000. By April, searches for "prompt engineer" on Indeed had hit their all-time peak. Over the next three months, LinkedIn searches for the same term spiked 7,000 percent.

Here is what nobody put in the headlines. Two years later, LinkedIn profiles listing "prompt engineer" in the title had cratered. A bootcamp graduate sent 200 applications over six months and landed three interviews and zero offers. At the peak, The Wall Street Journal had run its piece under "Talking to Chatbots Is Now a $200K Job." The year before the posting, Sam Altman had said: "I don't think we'll be doing prompt engineering in five years." He was wrong about the timeline. The job title was dead in two.

If you were paying attention, you noticed something strange. The title didn't die. It migrated.

A Discipline Named for Its Most Visible Technique

The job title referred to the most visible ten percent of the work — the words you typed. The other ninety percent was invisible: the examples, the structure, the retrieval, the scaffolding the model actually saw before it generated a single token.

Look at the technical papers and the direction was obvious from the start. Chain-of-Thought Prompting, Google Brain, January 2022, was already asking what gets shown to the model, not how to phrase the request. Eight exemplars of step-by-step reasoning took a 540-billion-parameter model to state of the art on GSM8K, beating fine-tuned GPT-3 with a verifier. ReAct, Princeton and Google, October 2022, the first widely cited agent pattern, interleaved reasoning traces with tool outputs. That's not a prompt. That's an information architecture. In May 2023, Tree of Thoughts took GPT-4 on the Game of 24 from four percent to seventy-four percent, not by finding magic words, but by structuring the search space the model navigated.

Anthropic's own 2023 prompting guides were already recommending XML tags, scratchpads, and placing instructions at the end of long prompts for better recall. That is context architecture, dressed as prompt advice.

Simon Willison saw this coming and defended the discipline in February 2023. "The best prompt engineers are meticulous," he wrote. "They iterate on their prompts and try to figure out exactly which components are necessary." By June 2025, he had conceded that the name had lost. Most people, he wrote, had come to read prompt engineering as "a laughably pretentious term for typing things into a chatbot."

The skill didn't die. It got harder. It had always been harder than the name suggested — the name was just finally starting to break.

The Frankenstein Prompt Ceiling

Here is how it breaks in practice.

Hamel Husain has been consulting on production LLM systems since GPT-3. In the Applied LLMs guide he co-authored in June 2024, he documented a real case — Rechat's AI assistant, Lucy. Rapid early progress from prompting. Then a plateau. Then a hard ceiling. Fix one failure mode, another pops up. Whack-a-mole. His quote is the gravestone of the old paradigm: "Our initially simple prompt is now a 2,000 token Frankenstein. And to add injury to insult, it has worse performance on the more common and straightforward inputs."

That is not a writing problem. It is an architectural problem, and the research now explains it cleanly.

Start with Lost-in-the-Middle, Liu et al., Stanford, published in TACL in 2024. In a 20-document question-answering task, accuracy can drop by more than 20 percent when the key information sits between positions 5 and 15, compared to positions 1 or 20. A U-shaped curve. It shows up even in models explicitly designed for long contexts. An instruction buried in the middle of a big agent context is not being carefully weighed. It is at risk of being quietly ignored, because models use the edges of their input more reliably than the middle.
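One common mitigation for the U-shaped curve is to reorder retrieved material so the strongest candidates sit at the edges of the context and the weakest sit in the middle. The sketch below is an illustration of that idea, not something taken from the paper. It assumes the documents arrive already ranked most relevant first, and the function name `reorder_for_edges` is hypothetical.

```python
def reorder_for_edges(ranked_docs: list[str]) -> list[str]:
    """Place top-ranked documents at the start and end of the context.

    Assumes `ranked_docs` is sorted most relevant first. Even ranks go to
    the front, odd ranks go to the back in reverse order, so the least
    relevant documents end up in the middle of the returned list.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, doc in enumerate(ranked_docs):
        (front if rank % 2 == 0 else back).append(doc)
    return front + back[::-1]


docs = ["best", "second", "third", "fourth", "worst"]
print(reorder_for_edges(docs))
# ['best', 'third', 'worst', 'fourth', 'second']
```

Whether this helps on a given model is an empirical question. Treat it as something to measure, not a guarantee.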

Then Chroma's 2025 Context Rot study. Eighteen frontier models — GPT-4.1, Claude Opus 4, Gemini 2.5 Pro, Qwen3 — tested at every context increment. Every single one degraded as the window filled up. Not some. All. A 200K-window model can show significant degradation well before 50K.

Then chain-of-thought. Beautiful for single-inference tasks. Brutal for agent loops, because errors compound. At one percent error per step, with steps failing independently, the first error arrives after an expected hundred steps, and roughly one run in three has already hit one by step forty. Real coding agents routinely run forty turns or more before they finish anything. Cognition measured Devin's behaviour and found coding agents spending over 60 percent of their first turn just retrieving context. The longer the run, the worse the failure mode: compounding error rates turn what looked like a linear cost curve into a cliff.
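The compounding arithmetic is easy to check. This is a minimal sketch that assumes every step fails independently with the same probability, which real agents will not strictly obey. Both function names are made up for illustration.

```python
def clean_run_probability(per_step_error: float, steps: int) -> float:
    """Probability that `steps` consecutive steps all succeed,
    assuming each step fails independently with `per_step_error`."""
    return (1.0 - per_step_error) ** steps


def expected_steps_to_first_error(per_step_error: float) -> float:
    """Mean of the geometric distribution: on average the first
    error lands on step 1 / p."""
    return 1.0 / per_step_error


for steps in (10, 40, 100):
    share = clean_run_probability(0.01, steps)
    print(f"{steps:>3} steps: {share:.1%} of runs finish without an error")
#  10 steps: 90.4% of runs finish without an error
#  40 steps: 66.9% of runs finish without an error
# 100 steps: 36.6% of runs finish without an error
```

The expected first error at step 100 sounds comfortable. The forty-step row is the one that matters for a coding agent: about a third of runs have already gone wrong somewhere.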

And finally the deepest break: prompts were designed for a clean slate. Agents inherit everything. Identical prompt, different context, catastrophically different behaviour. The prompt is no longer the variable. It was never really the variable.

Stanford shipped DSPy in October 2023 on this premise: "existing LM pipelines are typically implemented using hard-coded prompt templates, i.e. lengthy strings discovered via trial and error." Their solution was to automate the whole thing. The implicit argument is almost unkind: if prompting were tractable for complex pipelines, you would not need to automate it.

The Naming Moment — June 2025

The field needed a better name for what it was actually doing. Across two weeks of June 2025, it got one.

June 19. Tobi Lütke, CEO of Shopify, posts the sentence that starts the avalanche: "It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."

June 23. Harrison Chase, LangChain, publishes "The rise of context engineering", arguing the discipline is already the most important skill an AI engineer can develop. Same day, Lance Martin drops the four-practice taxonomy.

June 25. Andrej Karpathy amplifies on X: "context engineering is the delicate art and science of filling the context window with just the right information for the next step... Doing this well is highly non-trivial."

June 27. Simon Willison picks up Lütke's tweet in a long blog post and endorses the term.

June 30. Philipp Schmid, at Google DeepMind, publishes the definition practitioners now quote: "Context Engineering is the discipline of designing and building dynamic systems that provides the right information and tools, in the right format, at the right time." And the diagnostic line that reframes every post-mortem you will ever run on an agent: "Most agent failures are not model failures anymore, they are context failures."

Eleven days. Four definitions. One discipline.

By July, Drew Breunig measured the search trend: "in a month, it's over a quarter of the search volume for 'prompt engineering'" — and climbing. Anthropic formalised the term on September 29, 2025 with a full engineering post on "attention budget" and "context rot" as first-class concepts. By January 2026, Chase was on Sequoia's Training Data podcast almost laughing at how cleanly the term fit: "Context engineering is such a good term. I wish I came up with that term. It actually really describes everything we've done at LangChain without knowing that that term existed."

What Context Engineering Actually Is

A name alone doesn't make a discipline. Here is what the work is.

Lance Martin's taxonomy is the cleanest mental model. Four practices. Write context — scratchpads, memory tools, NOTES.md files, cross-session memory. Select context — RAG, memory retrieval, tool selection, few-shot example retrieval. Compress context — summarisation, trimming, pruning. Isolate context — sub-agents with focused windows, sandbox environments, state schema boundaries. Karpathy's analogy sits underneath all of it: the LLM is the CPU, the context window is the RAM, and context engineering is the operating system that decides what gets loaded before each instruction runs.

Anthropic's own engineering post turned this into three concrete techniques. Compaction — when the window gets near capacity, summarise, restart with the summary plus the five most recently accessed files. Structured note-taking — demonstrated, hilariously, by Claude playing Pokémon and maintaining precise tallies across thousands of game steps ("for the last 1,234 steps I've been training my Pokémon in Route 1"). Sub-agents — specialised workers burning tens of thousands of tokens each, returning 1,000–2,000-token distillations to a coordinating parent.
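Compaction is the easiest of the three to sketch. The version below is a rough illustration under stated assumptions, not Anthropic's implementation: `Message`, `rough_token_count`, `compact`, and the 80 percent threshold are all hypothetical, the summariser is passed in as a plain function, and files are re-attached as path placeholders where a real system would re-attach their contents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str
    text: str


def rough_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compact(
    history: list[Message],
    recent_files: list[str],
    summarise: Callable[[list[Message]], str],
    window_limit: int,
    threshold: float = 0.8,
    keep_files: int = 5,
) -> list[Message]:
    """Return `history` unchanged while there is room, otherwise a fresh
    history holding a summary plus the most recently accessed files.

    Assumes `recent_files` is ordered oldest access first.
    """
    used = sum(rough_token_count(m.text) for m in history)
    if used < threshold * window_limit:
        return history
    summary = Message("system", "Summary of earlier work: " + summarise(history))
    files = [
        Message("tool", f"Re-attached file: {path}")
        for path in recent_files[-keep_files:]
    ]
    return [summary] + files


# Usage with a stub summariser standing in for a model call.
history = [Message("assistant", "ran the tests and fixed one failure") for _ in range(20)]
files = [f"src/module_{i}.py" for i in range(7)]
fresh = compact(history, files, lambda msgs: f"{len(msgs)} steps completed", window_limit=100)
print(len(fresh))
# 6
```

The design choice worth noticing is that the trigger is a budget check, not a turn count. The summary quality still decides whether the restart helps or hurts, so the summariser is the part to evaluate.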

Philipp Schmid's example is the clearest "before and after" I've seen. Same LLM. Same task. "Can you do a quick meeting tomorrow?" The cheap-demo agent — prompt only — replies: "Thank you for your message. Tomorrow works for me. May I ask what time?" Serviceable. Robotic. Useless if you have a calendar. The "magical" agent — the one built with context engineering — has already pulled in the calendar, the prior email thread, the contact's history, and the available tools. It replies: "Hey Jim! Tomorrow's packed on my end, back-to-back all day. Thursday AM free if that works for you? Sent an invite, lmk if it works."

The model didn't change. The magic was everything that happened before the model was called. Dex Horthy, whose 12-factor agents GitHub repo has quietly accumulated 19,400 stars, puts it flat: "Everything is context engineering. LLMs are stateless functions that turn inputs into outputs. To get the best outputs, you need to give them the best inputs."
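The difference between the two agents can be shown without calling a model at all, because it lives entirely in what gets assembled beforehand. This is a minimal sketch: the `Context` fields, the `render` function, and the sample calendar and thread entries are invented for illustration, and the XML-style tags follow the general prompting advice mentioned earlier instead of any specific API.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    instructions: str
    user_message: str
    calendar: list[str] = field(default_factory=list)
    email_thread: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def render(ctx: Context) -> str:
    """Assemble the text the model will actually see, skipping empty sections."""
    sections = [
        ("instructions", [ctx.instructions]),
        ("calendar", ctx.calendar),
        ("email_thread", ctx.email_thread),
        ("tools", ctx.tools),
        ("user_message", [ctx.user_message]),
    ]
    parts = []
    for tag, lines in sections:
        if lines:
            parts.append(f"<{tag}>\n" + "\n".join(lines) + f"\n</{tag}>")
    return "\n\n".join(parts)


message = "Can you do a quick meeting tomorrow?"
instructions = "You are a scheduling assistant. Reply briefly."

# Prompt-only agent: same instructions, nothing else to go on.
bare = render(Context(instructions, message))

# Context-engineered agent: same instructions, same message, more inputs.
rich = render(
    Context(
        instructions,
        message,
        calendar=["Tomorrow: booked 09:00 to 18:00", "Thursday: free until 12:00"],
        email_thread=["Jim: great catching up last week"],
        tools=["send_invite(date, time, attendee)"],
    )
)

print(bare.count("<"), rich.count("<"))
# 4 10
```

The instructions string is identical in both calls. Everything that separates the robotic reply from the useful one sits in the extra sections.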

The shift is from phrasing to architecture. From string to system.

The Career S-Curve

Which brings us to what the market has actually done with all of this. It is a three-act play.

Act 1, April 2023. Anthropic's posting. Indeed searches for prompt engineering peak. Over 250,000 LinkedIn postings mentioning "prompt engineer" in the United States at the height of the frenzy. DeepLearning.AI launches ChatGPT Prompt Engineering for Developers in May and gets 300,000 signups in under a week. The top-end salary number: $335,000. Hype, hardening into a job title.

Act 2, June 30, 2023. While the hype cycle is still peaking, Shawn "swyx" Wang publishes "The Rise of the AI Engineer". His thesis: "Emergent capabilities are creating an emerging title: to wield them, we'll have to go beyond the Prompt Engineer and write software." Within 24 hours, 1,000-plus pre-registrants sign up for the first AI Engineer Summit. Karpathy endorses. The inaugural summit sells out at 500 seats that October. By June 2024, the AI Engineer World's Fair hosts 3,000-plus engineers. The title quietly migrates while the obituary writers aren't watching. swyx's line ages beautifully: "Prompt Engineering was both overhyped and here to stay."

Act 3, 2025-2026. Fortune, Fast Company and the Wall Street Journal run the obituary wave in May 2025. Meanwhile — meanwhile — LinkedIn names "AI Engineer" the #1 fastest-growing U.S. job title for the second consecutive year. Postings up 143 percent year-over-year. LinkedIn profiles with "prompt engineer" in the title fall 40 percent from mid-2024 to early 2025. PwC's 2025 Global AI Jobs Barometer finds AI-skilled workers command a 56 percent wage premium — up from 25 percent the year before. The premium doubled.

And the financial proof is almost embarrassing. Cursor — a product that is fundamentally a context engineering system, auto-injecting codebase structure and recent changes before each model call — went from $500 million ARR in June 2025 to $1 billion in November to $2 billion by February 2026. The context windows that sit beneath all this went from 4K tokens at ChatGPT's launch to 200K with Claude 2.1 to 1 million with Gemini 1.5 Pro in February 2024 to ten million with Llama 4 Scout. Anthropic's Model Context Protocol, launched November 2024, now has somewhere between 5,000 and 10,000 community servers in an ecosystem that didn't exist 15 months ago. Gartner expects 40 percent of enterprise applications to include task-specific AI agents by end of 2026 — up from under 5 percent in 2025.

Every one of those agents is context engineering, in production, whether the people building them call it that or not.

The skill didn't die. It got harder. And the market has been putting real money behind that observation for eighteen months.

Where to Invest Now

So: what to do Monday morning.

What to study. Five resources that will repay your time more than any prompt engineering course ever did. Lance Martin's four-practice taxonomy — Write, Select, Compress, Isolate — is the fastest way to get a mental model. Anthropic's "Effective context engineering for AI agents" is the formal engineering manual, attention budget and context rot included. Dex Horthy's 12-factor agents is the practitioner's operating manual for owning your context window rather than inheriting a framework's. Drew Breunig's failure taxonomy — Poisoning, Distraction, Confusion, Clash — gives you a shared vocabulary for debugging what just broke. And Hamel Husain on evals, because context engineering without measurement is cargo cult with extra steps.

What the posture shift is. Stop obsessing over phrasing. Obsess over architecture. Stop writing monster system prompts. Build systems that assemble clean context at runtime. Stop asking "what should I tell the model?" Start asking "what should the model see?" The answer is almost always less than you think — Anthropic's own guidance is to find the smallest possible set of high-signal tokens that still gets the job done.
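"What should the model see?" becomes a concrete question once you give it a budget. The sketch below assumes each candidate snippet already carries a relevance score and a token cost from some upstream step. The `Snippet` type, the `select_within_budget` function, and the 0.5 relevance floor are hypothetical, and greedy selection is one simple policy among several.

```python
from typing import NamedTuple


class Snippet(NamedTuple):
    text: str
    score: float  # relevance, higher is better (assumed to come from a retriever)
    tokens: int   # cost of including this snippet in the window


def select_within_budget(
    snippets: list[Snippet], budget: int, min_score: float = 0.5
) -> list[Snippet]:
    """Greedily keep the highest-scoring snippets that fit the token budget.

    Anything below `min_score` is dropped even if there is room for it,
    because spare capacity is not a reason to add low-signal text.
    """
    chosen: list[Snippet] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.score, reverse=True):
        if snippet.score < min_score:
            break
        if used + snippet.tokens <= budget:
            chosen.append(snippet)
            used += snippet.tokens
    return chosen


candidates = [
    Snippet("style guide excerpt", 0.2, 50),
    Snippet("failing test output", 0.9, 300),
    Snippet("function under edit", 0.8, 500),
    Snippet("unrelated module docs", 0.6, 400),
]
picked = select_within_budget(candidates, budget=800)
print([s.text for s in picked])
# ['failing test output', 'function under edit']
```

Note that the 50-token style guide is excluded even though it would fit. That is the "less than you think" posture expressed as a rule.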

The honest complication. Gary Marcus would point out, not unreasonably, that both prompt engineering and context engineering are elaborate workarounds for systems that don't reliably understand what they're doing — "patches of competence separated by regions of incompetence." He's not entirely wrong. The engineering discipline is legitimate and well-paid and increasingly necessary. The underlying reliability problem is also real and unsolved. Both things are true at the same time. You can build a career in the gap.

Ethan Mollick put it better than I can, in January 2026: "The skills that are so often dismissed as 'soft' turned out to be the hard ones." Knowing what good looks like. Explaining it clearly enough that even an AI can deliver it. Curating what your systems see. Measuring whether they actually did the thing. None of that fits on a resume line that says "prompt engineer." All of it is the work.

The skill didn't die. It got harder. Invest accordingly.

Michael Cutler

The Cutler.sg Newsletter
