OpenAI AgentKit: Strategic Timing or Playing Catch-Up?

The Setup

October 6, 2025. Sam Altman announces OpenAI's "something new" — AgentKit and Agent Builder. Half my timeline rolled its eyes. Agents? Seriously? We've been gluing those together with LangChain for two years.

I had a different reaction, because I've built the thing AgentKit competes with. Gluon, my open-source multi-agent orchestrator. Dagentic, a serverless agent framework. I run four or five Claude Code agents across separate projects most working days, which means I keep making the unglamorous decision nobody puts in a keynote: which orchestration layer do I reach for this time, and what does it cost me when I'm wrong.

So I didn't watch the keynote wondering whether OpenAI was late. I watched it wondering which problem they'd decided to solve — because the part everyone demos is the part I worry about least.

The stakes are real. The agentic AI market is projected to grow from $7.06 billion in 2025 to $93.20 billion by 2032, a 44.6% CAGR, with 33% of enterprise software incorporating agentic AI by 2028, up from less than 1% today. That size, that fast, with no settled standard, is exactly the kind of thing OpenAI has eaten before.

What AgentKit actually ships

Strip the keynote gloss off and AgentKit is a bet that the agent ecosystem's real pain isn't capability — it's assembly. That bet is correct, and I can tell you why from the receipts. When I built Gluon, almost none of the hard work was the agent reasoning; the model could already reason. The work was everything around it: wiring tools, threading state between steps, building the eval harness to tell whether a change made things better or just different, standing up the deployment path. A working agent today means LangChain for the logic, something like Zapier or n8n for integrations, a homegrown eval rig, and a deployment pipeline you maintain yourself. Four things that rot independently, and I've spent more weekends than I'd like keeping that stack from drifting apart.

AgentKit collapses that into one surface. Agent Builder looks more like Canva than an IDE — visual workflow creation a non-technical team can touch. ChatKit drops embeddable chat UIs into an existing app. And the part I'd have killed for: built-in trace grading and automated prompt optimization, so the eval harness I hand-rolled comes in the box.

The early numbers back the assembly thesis, not the intelligence one. Bain & Company reported a 25% efficiency gain through dataset curation and prompt optimization; Ramp noted Agent Builder "significantly reduced iteration cycles and deployment times." Iteration cycles, deployment times, curation. Nobody's reporting the agents got smarter — they're reporting the scaffolding got cheaper. Exactly what was expensive when I did it by hand.

It supports third-party models and pairs the visual builder with a code-first path, courting the business user dragging boxes and the engineer who drops to code when the boxes run out. That's not fighting LangChain for the existing pie — it's growing the pie past the people who can already wire this stuff themselves.

The fragmented kingdom

Look at what the last two years actually felt like for anyone shipping agents. LangChain dominates with roughly 450,000 developers worldwide and 43% of agentic-workflow implementations in production. CrewAI has 19% adoption among advanced builders. AutoGen leads multi-agent orchestration for research. Three serious frameworks, three mental models, zero agreement on what an "agent" even is.

I've lived inside that fragmentation. Choosing between these isn't picking a favourite — it's a bet with a switching cost, and you don't find out whether you bet right until backing out hurts. Every team rebuilds the same eval and deployment plumbing because nobody shares it. The tool sprawl that ate enterprise software a decade ago was quietly repeating itself in the agent layer.

That's a market waiting to be consolidated. But the consolidator needs three things at once: developer trust, enterprise relationships, and real platform-operating experience. Very short list of companies that have all three.

The "OpenAI-compatible" move, again

I take this part of OpenAI's history seriously because I've felt it as a developer, not read about it as an analyst.

OpenAI didn't invent the completions API; they made theirs the one everyone copies. "OpenAI-compatible" is now a phrase in the docs of platforms that compete directly with them — they standardised the interface so thoroughly that rivals advertise compliance with it. Not because the API was technically untouchable, but because the developer experience was good enough, often enough, that building against it became the path of least resistance. I've reached for that endpoint myself with a dozen alternatives on hand, purely because it was the one I didn't have to think about.

AgentKit is that move pointed at agents. Make the orchestration layer the obvious default, and the default quietly becomes the standard. The network pieces are already moving — OpenAI is positioning ChatGPT as a surface third-party apps run inside, and Booking.com, Expedia, Figma, Spotify, Khan Academy, Instacart, and Uber have already signed on. Whether it lands is a separate question. The intent is unmistakable, and I've been on the receiving end of it enough times not to dismiss it.

Why October 2025 isn't an accident

Three things had to be true at once for this to be good timing rather than late timing, and in 2025 they finally were.

Enterprises got past pilots. The pilot-to-production crossing runs 6-8 months. Companies that started agent pilots in early 2024 are now standing at exactly the platform-decision moment AgentKit is built to win.

The boring parts matured. Evaluation frameworks, governance tooling, security standards moved from experimental to dependable. Trace grading, performance datasets, automated optimization — the very pieces I had to build myself for Gluon — are now stable enough to ship as a product. You couldn't have shipped AgentKit credibly in 2023; the ground it stands on didn't exist yet.

The market is expanding, not just growing. $7 billion to $93 billion isn't a bigger version of the same thing. Industry observers call 2025 a "critical inflection point where agentic systems can finally move from pilot to production with confidence." Agents crossing from developer toy into business-critical infrastructure — a different, larger buyer.

The proof points are already in production. Klarna deployed AgentKit across a significant share of its support tickets; HubSpot uses it to power their Breeze assistant for sales automation. Different industries, real load, not a demo on a stage.

What it won't do for me

This is where my enthusiasm runs out, and it's the part the keynote skipped.

Everything AgentKit makes cheaper sits before the agent acts. Build the workflow faster, grade the traces, tune the prompts, deploy in a day instead of a fortnight. I'd take all of it. But it doesn't touch the bottleneck I keep slamming into on every agent project I've shipped: not "can the agent do the work?" — it can — but "can a human confirm it before the action goes out the door?"

Run four or five agents at once, as I do, and that verification cost is the whole game. A prettier builder lets me create more agents faster, which means more output to check — the bottleneck gets worse, not better. Trace grading helps after the fact; it doesn't help me decide, in the moment, whether to let this particular agent ship this particular thing unsupervised. No visual canvas solves that, because it isn't a tooling problem. It's a trust problem, and trust doesn't drag-and-drop.

So, the honest split. I'd reach for AgentKit the moment I want a non-engineer on my team to stand up something useful without me — a capability LangChain has never genuinely offered, and most of why OpenAI is expanding the market rather than raiding LangChain's. And for a standard support or sales-automation flow where the OpenAI-compatible gravity, the partner ecosystem, and the in-box eval harness mean I'm not maintaining four tools to ship one.

I wouldn't reach for it where I need the orchestration to bend in ways a vendor's abstraction won't allow — most of why Gluon exists. LangChain and CrewAI carry real switching costs, but they buy real control, and the day AgentKit's boxes run out is the day you discover whether you're a tenant or an owner. Nor where the human-verification gap is the actual risk, because that's not in the box and OpenAI isn't pretending it is.

Simon Willison flagged the same fault line — the gap between OpenAI's expansive "a system that can do work independently" and the narrower, tool-shaped definition most working developers carry. He's right that it's ambiguous. I'd put it more bluntly: OpenAI is selling the system, and the system's hardest part is still the one nobody's selling.

The verdict

So, late or masterful? Having built the thing it competes with: deliberately, narrowly masterful — and only on the half of the problem they chose to fight on.

The half they took is the one I'd have paid them for in my Gluon weekends. The assembly tax is brutal, and collapsing four tools into one defensible platform, at the exact moment enterprises cross from pilot to production, is a clean strategic shot. If OpenAI-compatible gravity does to orchestration what it did to completions, they consolidate a fragmented market right as it goes vertical. Not a company scrambling to catch up — a company that waited until the boring parts were finally boring enough to productise, which after twenty years of watching platforms win is the version of "late" I've learned to bet on.

The half they left on the table is the one that keeps me up: an agent's value is now capped not by what it can do but by how fast a human can trust what it did. AgentKit makes the front of that pipeline cheaper and leaves the back exactly where it was. The better it works, the more output it produces for someone to verify — and that someone is still me, still reading the trace.

OpenAI built the best on-ramp the agent market has had. The traffic jam is a mile further down the road, and they didn't touch it. Neither has anyone else. That's the post I'll be writing next.

OpenAI's AgentKit: Late to the Agent Party or Strategic Masterstroke?

The Setup

What AgentKit actually ships

The fragmented kingdom

The "OpenAI-compatible" move, again

Why October 2025 isn't an accident

What it won't do for me

The verdict

Related

Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate

Tools, Then Teammates, Then Autonomy — Part 1: Hitting the Wall

Your AI Team Did Nothing While You Slept

OpenAI's AgentKit: Late to the Agent Party or Strategic Masterstroke?

The Setup

What AgentKit actually ships

The fragmented kingdom

The "OpenAI-compatible" move, again

Why October 2025 isn't an accident

What it won't do for me

The verdict

Related

Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate

Tools, Then Teammates, Then Autonomy — Part 1: Hitting the Wall

Your AI Team Did Nothing While You Slept

Practical AI engineering, in your inbox

Related

Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate

Tools, Then Teammates, Then Autonomy — Part 1: Hitting the Wall

Your AI Team Did Nothing While You Slept

Practical AI engineering, in your inbox

Related

Tools, Then Teammates, Then Autonomy — Part 2: The Autonomy Gate

Tools, Then Teammates, Then Autonomy — Part 1: Hitting the Wall

Your AI Team Did Nothing While You Slept