Inside the Cockpit: How Gluon Turns AI Agents into a Managed Workflow
Part 2 of 4 in Gluon: Building an AI Agent Orchestrator series
The Question That Kept Interrupting
Four agents running. Auth bug. API refactor. Tests. A fourth — honestly, I'd scrolled through too many logs to remember what it was doing.
Every few seconds, the same question: is it still running?
Not "is the code good?" Not "did it pick the right approach?" Just: is it still running? The most basic question you can ask about a process, and I couldn't answer it without switching tabs and scanning walls of terminal output.
That single question — is it still running? — is the difference between running agents and orchestrating them. The first is an experiment. The second is a workflow.
Why Cockpits Exist
In aviation, pilots handle the flying. The cockpit handles everything else — fuel, course, radio traffic, weather. Multiple instruments feeding information to one place. Many sources, one unified picture.
That's what was missing. Not smarter agents. A better cockpit.
Here's what a working cockpit provides:
- Centralized information — one dashboard instead of scattered terminals. Where are the agents? What are they doing? How much have they cost? What needs attention?
- Isolated actions — each pilot flies independently. In Gluon, git worktrees ensure one agent can't delete another's work.
- Real-time awareness — you watch the work happen, tool calls visualized as they execute, costs ticking up live.
- Multiple control surfaces — CLI for terminal workflows, web dashboard for visual oversight, Telegram for phone checks, Discord for team coordination, PWA for mobile. Same orchestrator, many surfaces.
- Built-in safeguards — cost caps, iteration limits, circuit breakers that stop agents from looping forever.
The cockpit doesn't fly the plane. It makes the flying visible, coordinated, and safe.
From Intent to Execution in 30 Seconds
Let me walk you through what actually happens when I start work.
I open the web dashboard and click the plus icon. A task creation dialog appears.

First: the project selector. I pick auth-service, grouped by workspace — this matters when you're working across team codebases.
Then: describe the work. I type /fix-auth-bug and Gluon autocompletes. Intent clear. But I add more: @src/auth/signup.ts. That file gets included in the agent's context window — no guessing about which file to look at. I've pointed directly at what matters.
Now: profile selection. This is where model choice becomes a cost decision you make once, not a bill you discover later.
- Quick uses Haiku — fast, cheap. For simple fixes.
- Standard uses Sonnet — balanced. My default.
- Deep uses Opus — maximum reasoning for complex refactoring.
- Planning uses Opus with explicit plan-before-execute mode.
I pick Standard. Toggle on Git Worktree for isolation. Ralph Loop stays off — I want to watch this one before letting it run autonomously. I can attach an image if needed: screenshot of the bug, error logs, anything visual.
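The dialog is just a front end for the orchestrator's API. Here's a minimal sketch of what the same task creation might look like programmatically; the endpoint path, port, and field names are my assumptions, not Gluon's documented API:

```python
# Sketch of task creation against Gluon's FastAPI backend.
# Endpoint, port, and field names are illustrative assumptions.
import requests

task = {
    "project": "auth-service",
    "prompt": "/fix-auth-bug",
    "context_files": ["src/auth/signup.ts"],  # the @-mentioned file
    "profile": "standard",                    # quick | standard | deep | planning
    "use_worktree": True,                     # isolate the run in its own git worktree
    "ralph_loop": False,                      # watch this one before going autonomous
    "max_cost": 5.00,                         # hard stop at $5
}

resp = requests.post("http://localhost:8000/api/tasks", json=task, timeout=10)
resp.raise_for_status()
print(resp.json()["run_id"])  # the id the worktree and branch are named after
```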

One click. Task created. Within 30 seconds, the agent is spinning up.
And here's the part that changed how I work: I can watch it happen live.
Watching the Work — Not Waiting for It
The task opens in the Run Details modal. Tabs across the top: Output, Errors, Messages, Commits, Files, Attachments, Loop, History. I'm on Output.
The agent's thinking streams in real-time. No polling. No refresh button. WebSocket connection — I'm seeing what the agent does as it happens.
It reads the signup file. Identifies the bug — line 42, the email validation regex doesn't handle new TLDs like .cloud. Writes a test case that reproduces the error. Each step appears in the stream:
[TOOL CALL] file_read: src/auth/signup.ts (2.4KB)
[TOOL CALL] bash: npm test -- signup.test.ts (failed, as expected)
[TOOL CALL] file_write: src/auth/signup.ts (lines 40-45 modified)
[TOOL CALL] bash: npm test -- signup.test.ts (passed)
No guessing. No terminal tab-switching. I'm watching a developer work — reading the file, writing a failing test, fixing the code, watching the test pass. Except this developer runs at machine speed and costs $0.47.
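Under the hood, that live view is just a WebSocket subscription. A minimal client sketch, assuming a hypothetical ws://localhost:8000/ws/runs/{run_id} endpoint that emits JSON events:

```python
# Sketch of a log-stream client. The endpoint and event shape are assumptions.
import asyncio
import json

import websockets  # pip install websockets

async def follow_run(run_id: str) -> None:
    uri = f"ws://localhost:8000/ws/runs/{run_id}"
    async with websockets.connect(uri) as ws:
        async for raw in ws:
            event = json.loads(raw)
            # e.g. {"type": "tool_call", "detail": "file_read: src/auth/signup.ts",
            #       "cost_usd": 0.02}
            print(f"[{event['type'].upper()}] {event.get('detail', '')} "
                  f"(running cost ${event.get('cost_usd', 0):.2f})")

asyncio.run(follow_run("abc123"))
```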

And while it works, a cost counter ticks in the corner. $0.02. $0.04. $0.07. Financial transparency in real-time, not a surprise on the invoice. Is it still running? Yes — and I know exactly what it's doing and what it costs.
I can filter by tool type: file operations, bash calls, errors only. Pause auto-scroll to read carefully. Expand tool calls to see inputs and outputs. If the agent needs guidance, an "Input Required" badge appears and I jump in via the resume flow.
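The resume itself can be a single call. Reusing the same hypothetical API from the task-creation sketch, the feedback rides along as the payload:

```python
# Sketch of the resume flow. Endpoint and payload shape are assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/api/runs/abc123/resume",
    json={"feedback": "Add docstrings to the new parameters before finishing."},
    timeout=10,
)
resp.raise_for_status()
```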
Other systems: submit a job, get a result, sit in the dark. Gluon puts you in the cockpit. You're not waiting — you're witnessing.
The Architecture That Protects Everything
Here's the decision that makes all of this safe enough to trust.
Each task runs in /tmp/gluon-worktrees/wt-{run_id}/ — a temporary git worktree. Not a full clone: a lightweight linked checkout that shares your repository's history but has its own working directory, completely isolated. The agent works in that worktree. It makes changes. It commits. It pushes. But it's working on branch gluon-task/{run_id} — a branch unique to that task.
Main stays clean. Untouched.
If something goes catastrophically wrong — if the agent makes a terrible decision — its damage is confined to its own branch. Delete the worktree, move on. Compare this to agents with no isolation: the Replit AI agent, famously, deleted an entire production database. With Gluon, worst case: you lose the worktree.
Before each task starts, Gluon syncs: fetch from remote, fast-forward if behind, auto-commit any uncommitted changes in main. After the task completes, Gluon stages all changes, commits, and pushes. Rebase conflicts get handled automatically. The agent's work lands safely on its own branch, ready for review.
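The whole lifecycle is a handful of git operations. A sketch of the flow described above, with illustrative helper names; the paths and branch names follow the conventions in the text, but this is not Gluon's actual code:

```python
# Sketch of the per-task worktree lifecycle: sync, isolate, work, push, clean up.
import subprocess

def run(args: list[str], cwd: str | None = None) -> None:
    subprocess.run(args, cwd=cwd, check=True)

def start_task(repo: str, run_id: str) -> str:
    wt = f"/tmp/gluon-worktrees/wt-{run_id}"
    run(["git", "fetch", "origin"], cwd=repo)                    # sync before starting
    run(["git", "merge", "--ff-only", "origin/main"], cwd=repo)  # fast-forward main if behind
    # One worktree + one branch per task: the agent can only touch its own branch.
    run(["git", "worktree", "add", wt, "-b", f"gluon-task/{run_id}"], cwd=repo)
    return wt

def finish_task(repo: str, run_id: str) -> None:
    wt = f"/tmp/gluon-worktrees/wt-{run_id}"
    run(["git", "add", "-A"], cwd=wt)
    run(["git", "commit", "-m", f"gluon: task {run_id}"], cwd=wt)
    run(["git", "push", "-u", "origin", f"gluon-task/{run_id}"], cwd=wt)
    run(["git", "worktree", "remove", wt], cwd=repo)             # main was never touched
```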
The isolation extends to parallel work. Agent One works on auth in wt-abc123. Agent Two works on the API in wt-def456. They're completely invisible to each other. Zero interference. Zero risk of accidentally merging one agent's changes while another is still running.
This is how you scale from one agent to many. Not by hoping they don't collide — by making collision impossible.
The Kanban Board — One Screen, Many Pilots
Four columns: Queue (waiting), Running (executing), Review (waiting for human decision), Done (complete).
Each task is a card. At a glance (see the sketch after this list):
- Task title and project
- Profile icon — which model (Quick/Standard/Deep/Planning)
- Progress bar for multi-step formulas ("Step 2/4 Implement")
- Health dot — green (healthy), yellow (slow), red (stuck or looping)
- Input Required badge if the agent is waiting for approval
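Under the hood each card is just a small projection of run state. A sketch of the fields, with assumed names:

```python
# Sketch of a Kanban card's data model. Field names are assumptions.
from dataclasses import dataclass
from enum import Enum

class Health(Enum):
    GREEN = "healthy"
    YELLOW = "slow"
    RED = "stuck"

@dataclass
class TaskCard:
    title: str
    project: str
    profile: str                   # quick | standard | deep | planning
    column: str                    # queue | running | review | done
    step: tuple[int, int] | None   # e.g. (2, 4) renders as "Step 2/4"
    health: Health                 # drives the green/yellow/red dot
    input_required: bool           # agent is blocked on a human decision
```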
My board right now: one task in Queue — a feature request. Two in Running — the auth bug (green dot, healthy) and an API refactor (yellow, slower than expected but progressing). One in Review — yesterday's test-writing task. One in Done — ready for PR.
Drag tasks between columns to manually pause or promote. Click any card for full Run Details.
Is it still running? Look at the board. Green dot. Yes.
This is the shift from chasing work across terminals to making decisions from a single screen. Not running agents — orchestrating them.
Deep Diagnostics — Everything About a Task
Click a task card. The modal opens with tabs that go deep.
- Output — streaming logs.
- Errors — filtered error messages with stack traces.
- Messages — full conversation history, expandable tool calls, inline screenshot thumbnails.
- Commits — every commit during the run, with author, timestamp, message, and files changed. Click for full diff.
- Files — all changed files with line counts and inline diff viewer.
- Attachments — captured images, diagrams.
- Loop — Ralph iteration progress, circuit breaker state, safety metrics.
- History — related runs in chronological order.
I click Commits for the auth bug. Three commits:
- "Add test case for email validation with new TLDs"
- "Fix regex pattern to support .cloud, .app, .dev"
- "Add test for edge case: nested subdomains"
The first commit shows tests added, assertions clear, tests failing as expected. The second shows the regex change — old pattern handled only standard TLDs, new pattern handles arbitrary extensions. Tests pass.


Everything I need to make a decision — merge or refine — right there. No switching to a terminal. No cloning branches to inspect locally. The cockpit has the instrumentation.
The Numbers You Actually Need
Cost tracking sounds mundane until you realize the difference between model tiers is 25x.
- Haiku: ~$36K/year running continuously
- Sonnet: ~$180K/year
- Opus: ~$900K/year
That's not a rounding error. That's the difference between a viable workflow and an unsustainable one. So you need to know — per task, per project, per day — what you're spending.
The Costs dashboard shows real-time spend:
Today: $4.32 (five tasks). This week: $14.78. This month: $32.11.
Click into a project:
auth-service: $8.94
- Task 1 (Standard, Sonnet): $0.47
- Task 2 (Deep, Opus): $1.23
- Task 3 (Quick, Haiku): $0.03
Each task shows the model and exact cost. And I can set a cost cap on any task with --max-cost. Set it to $5, the agent stops at $5. No surprise bills. No runaway loops draining the budget.
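Because everything lands in one SQLite database, these rollups are plain SQL. A sketch assuming a hypothetical runs table with project, profile, cost_usd, and started_at columns:

```python
# Sketch of the per-project cost rollup. Table and column names are assumptions.
import sqlite3

conn = sqlite3.connect("gluon.db")
rows = conn.execute(
    """
    SELECT project, profile, SUM(cost_usd) AS spend
    FROM runs
    WHERE started_at >= date('now', 'start of month')
    GROUP BY project, profile
    ORDER BY spend DESC
    """
).fetchall()

for project, profile, spend in rows:
    print(f"{project} ({profile}): ${spend:.2f}")
```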
If you're running agents without cost visibility, you're flying without a fuel gauge. You'll find out you're empty when you crash.
Same Orchestrator, Every Surface
Here's the architectural move that made everything click.
I submit a task from the web dashboard. Thirty seconds later, a coworker checks status via Telegram. Same task, different interface. He types /logs and sees the real-time stream inline.
That night on the couch: I pull up Gluon on my phone. It's a PWA — installable like a native app. Pull to refresh. Same Kanban board, responsive. The task from this morning shows green.

At my desk: CLI. gluon logs {run-id} -f streams the same logs to my terminal.
One SQLite database. One FastAPI backend. One WebSocket layer. Whether you're in CLI, web, Telegram, Discord, or PWA — same system. And because the interface layer is decoupled, adding a new surface doesn't touch the core. Slack bot? Route through the orchestrator. Email? Same. Custom webhook? Same.
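The decoupling is the point. A sketch of the pattern: the core publishes each event once, and every surface is just a subscriber. All names here are illustrative, not Gluon's internals:

```python
# Sketch of the interface-layer fan-out: one orchestrator, many surfaces.
import asyncio
from typing import Awaitable, Callable

Subscriber = Callable[[dict], Awaitable[None]]

class EventBus:
    """Each interface (web, Telegram, Discord, CLI) is just a subscriber."""
    def __init__(self) -> None:
        self._subscribers: list[Subscriber] = []

    def subscribe(self, fn: Subscriber) -> None:
        self._subscribers.append(fn)

    async def publish(self, event: dict) -> None:
        # The core emits once; every registered surface renders it.
        await asyncio.gather(*(fn(event) for fn in self._subscribers))

async def web_dashboard(event: dict) -> None:
    print("web:", event)

async def telegram_bot(event: dict) -> None:
    print("telegram:", event)

async def main() -> None:
    bus = EventBus()
    bus.subscribe(web_dashboard)
    bus.subscribe(telegram_bot)  # adding a Slack bot is one more subscribe()
    await bus.publish({"run_id": "abc123", "status": "running"})

asyncio.run(main())
```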
Is it still running? Check from anywhere.
A Week in the Cockpit
Monday: Standup reveals three priorities. I create three tasks in parallel from the Kanban board. Bugfix (Standard). Refactoring (Deep). Feature skeleton (Planning). All three spin up by noon. Green dots on two. Yellow on the third — it's complex, taking longer. I check the board from my phone while getting coffee. Still yellow. Fine.
Tuesday: Refactoring task moves to Review. I inspect the diffs and spot something: needs docstrings on new parameters. I resume with that feedback. Agent picks up where it left off. Back to Running.
Wednesday: Bugfix completes. Review commits, approve, one-click PR creation. Refactoring done — diffs clean. Feature skeleton still running (Planning is thorough). Cost so far: $6.42.
Thursday: Feature skeleton moves to Review. Architecture looks solid. I resume: "This structure is solid. Now implement the user authentication flow." Standard profile. By end of day, it's complete and back in Review.
Friday: Three completed PRs ready for code review. Weekly cost: $14.79.
No anxiety about what's running. No surprises about what things cost. No terminal archaeology to figure out what happened two days ago. Just a board, a budget, and decisions made at the right moments.
The Real Difference
This isn't UI polish. It's a structural shift in how you work with AI agents.
Real-time logs mean you're not flying blind. Git worktree isolation means you can trust agents without risk. Cost tracking means you understand the economics before they surprise you. Multiple interfaces mean you're not chained to your desk. The Kanban board means decisions happen at the orchestrator level, not scattered across terminals.
Is it still running?
After a week in the cockpit, I stopped asking that question. Not because it stopped mattering — because the answer was always visible. Green dot, yellow dot, red dot. The cockpit answers before you ask.
That foundation enables what comes next in Part 3: autonomous loops. The Ralph Loop. When you stop monitoring and let the agents decide for themselves. Set a cost cap and iteration limit, let them run until the work is done.
That's when orchestration becomes transformational. But it only works if you trust the cockpit first.
Series Navigation
- Post 1: From tmux Chaos to AI Agent Orchestration
- Post 2: Inside the Cockpit (you are here)
- Post 3: Ralph Loop — Autonomous Execution
- Post 4: From Solo Tool to Team Infrastructure