Inside the Cockpit: How Gluon Turns AI Agents into a Managed Workflow
Part 2 of 4 in Gluon: Building an AI Agent Orchestrator series
I had four terminal tabs open. Auth bug. API refactoring. Tests. A fourth—honestly, I'd scrolled through too many logs to remember what it was doing.
That's not a system. That's chaos.
Every few seconds I'd switch tabs. Still running? Finished? Crashed? Each terminal showed walls of text—Claude thinking, tool calls firing, git operations happening somewhere in the noise. No unified view. No single place to see what each agent was doing, how much money I was spending, or whether they were stepping on each other's toes.
That's the problem I solved with Gluon. Not just agents running in the background, but a cockpit for orchestrating them.
The Cockpit Metaphor
In aviation, pilots handle the moment-to-moment flying. The cockpit crew coordinates them, monitors fuel, adjusts course, handles radio traffic. Multiple instruments feeding information to one place. Multiple people, one unified picture. That's what I wanted for AI agents: clear separation between the work (agents) and the oversight (orchestrator).
Here's what a working cockpit does:
Centralizes information. Instead of scattered terminal tabs, one unified dashboard. Where are the agents? What are they doing? How much have they cost? What problems need attention?
Isolates actions. Each pilot (agent) flies independently without interfering with others. In Gluon: git worktrees. Each task gets its own branch, its own sandbox. One agent can't delete another's work.
Enables real-time awareness. You don't wait for Slack responses. You watch the work happen. Tool calls visualized as they execute. Costs ticking up in real-time. No "is it still running?" anxiety.
Provides multiple control surfaces. A cockpit doesn't have one control interface. Captain, pilot, navigator—all access the same information through different instruments. CLI for terminal workflows. Web dashboard for visual oversight. Telegram for phone checks. Discord for team coordination. PWA for mobile. Same orchestrator. Many surfaces.
Enforces safeguards. Fuel limits. Maximum altitude. Engine warnings. Gluon has cost caps, iteration limits, circuit breakers that stop agents from looping forever.
Creating a Task—From Intent to Execution
Let me walk you through what happens when I actually start work.
I open the web dashboard and click the plus icon. A task creation dialog appears.

First: project selector. In this case, auth-service. The selector is grouped by workspace, which matters when you're working across team codebases.
Then: describe the work. I type /fix-auth-bug and Gluon autocompletes. Intent clear. But I add more: @src/auth/signup.ts. That file gets included in the agent's context window—no guessing about which file to look at. I've pointed directly at what matters.
Now: profile selection. This is where model choice matters.
Quick uses Haiku—fast iterations, low cost. For simple fixes.
Standard uses Sonnet—balanced speed and reasoning. My default.
Deep uses Opus—maximum reasoning for complex refactoring.
Planning also uses Opus, with explicit plan-before-execute mode.
I pick Standard. I toggle on Git Worktree to ensure isolation. Ralph Loop stays off—I want to monitor this manually before letting it run autonomously. I can attach an image if needed (a screenshot of the bug, error logs, anything visual).
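Conceptually, the four profiles reduce to a lookup table mapping a name to a model and an execution mode. The structure and model identifiers below are illustrative assumptions, not Gluon's actual configuration:

```python
# Hypothetical profile table; model names are placeholders, not real config.
PROFILES = {
    "quick":    {"model": "claude-haiku",  "plan_first": False},
    "standard": {"model": "claude-sonnet", "plan_first": False},
    "deep":     {"model": "claude-opus",   "plan_first": False},
    "planning": {"model": "claude-opus",   "plan_first": True},
}

def resolve_profile(name: str) -> dict:
    """Look up a profile, falling back to 'standard' for unknown names."""
    return PROFILES.get(name.lower(), PROFILES["standard"])

print(resolve_profile("Deep")["model"])  # claude-opus
```

Note that Deep and Planning share a model; the only difference is the explicit plan-before-execute flag.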

One click. Task created.
Within 30 seconds, the agent is spinning up. And I can watch it happen live.
Real-Time Streaming—Watching the Agent Work
The task opens in the Run Details modal. There are several tabs: Output, Errors, Messages, Commits, Files, Attachments, Loop, History. Right now, I'm on Output.
The agent's thinking streams in real-time. No polling. No refresh button. The WebSocket connection means I'm seeing what the agent is doing as it happens.
It reads the signup file. Identifies the bug—line 42, the email validation regex doesn't handle new TLDs like .cloud. Writes a test case that reproduces the error. I see each step appear:
[TOOL CALL] file_read: src/auth/signup.ts (2.4KB)
[TOOL CALL] bash: npm test -- signup.test.ts (failed, as expected)
[TOOL CALL] file_write: src/auth/signup.ts (lines 40-45 modified)
[TOOL CALL] bash: npm test -- signup.test.ts (passed)
No guessing. No "what's it doing now?" No switching terminals. I'm watching the work unfold in real-time, with full transparency.
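Rendering those log lines is a matter of turning each JSON event from the WebSocket stream into a display string. Here is a minimal sketch; the event shape (`type`, `tool`, `args`, `detail`) is a hypothetical example, not Gluon's wire format:

```python
import json

def render_event(raw: str) -> str:
    """Turn one JSON event from the stream into a log line like those above.
    The event schema here is an illustrative assumption."""
    event = json.loads(raw)
    if event["type"] == "tool_call":
        detail = f" ({event['detail']})" if event.get("detail") else ""
        return f"[TOOL CALL] {event['tool']}: {event['args']}{detail}"
    if event["type"] == "cost_update":
        return f"[COST] ${event['total_usd']:.2f}"
    # Fall back to a generic rendering for unknown event types.
    return f"[{event['type'].upper()}] {event.get('text', '')}"

line = render_event(json.dumps({
    "type": "tool_call",
    "tool": "file_read",
    "args": "src/auth/signup.ts",
    "detail": "2.4KB",
}))
print(line)  # [TOOL CALL] file_read: src/auth/signup.ts (2.4KB)
```

Because each event is self-describing, the same stream can drive the web view, the CLI, and the chat integrations.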

And while all that happens, a cost counter ticks up in the corner. $0.02. $0.04. $0.07. The agent spends $0.47 total—and I knew that before it finished. Financial transparency removes anxiety. I know exactly what this piece of work costs.
I can filter by tool type—file operations, bash calls, errors only. Pause the auto-scroll to read carefully. Expand tool calls to see input and output. If the agent gets stuck and needs guidance, an "Input Required" badge appears, and I can jump in via the resume flow.
This is where real-time visibility matters. Other systems: submit a job, get a result, sit in the dark. Gluon puts you in the cockpit. You're not waiting—you're witnessing.
Git Worktree Isolation—Agents Never Touch Main
Here's the architectural decision that protects everything.
Each task runs in /tmp/gluon-worktrees/wt-{run_id}/. That's a temporary git worktree—a lightweight clone of your repository that's completely isolated. The agent works in that worktree. It makes changes. It commits. It pushes. But it's working on branch gluon-task/{run_id}—a branch unique to that task.
The main branch stays clean. Untouched. If something goes catastrophically wrong, if the agent makes a terrible decision, its damage is confined to its own branch. You delete the worktree and move on.
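The mechanics rest on plain `git worktree` commands. A minimal sketch of how the create-and-discard lifecycle might be wrapped—paths and branch names follow the article, but the helper functions themselves are hypothetical, not Gluon's implementation:

```python
import subprocess
from pathlib import Path

def create_worktree(repo: str, run_id: str) -> Path:
    """Create an isolated worktree on a task-specific branch (sketch)."""
    wt_path = Path(f"/tmp/gluon-worktrees/wt-{run_id}")
    wt_path.parent.mkdir(parents=True, exist_ok=True)
    branch = f"gluon-task/{run_id}"
    # `git worktree add -b <branch> <path>` creates the branch and a
    # fresh checkout in one step; main is never checked out here.
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, str(wt_path)],
        check=True,
    )
    return wt_path

def remove_worktree(repo: str, wt_path: Path) -> None:
    # Worst case: throw the whole sandbox away. Main was never touched.
    subprocess.run(
        ["git", "-C", repo, "worktree", "remove", "--force", str(wt_path)],
        check=True,
    )
```

A worktree shares the repository's object store, so creation is nearly instant and costs almost no disk, which is what makes per-task sandboxes cheap.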
Compare this to what happened with other agents. The Replit AI agent had no isolation, and it deleted an entire production database. With Gluon, an agent's damage is confined to its own branch. Worst case, you lose the worktree.
Before each task starts, Gluon syncs the worktree. Fetch from remote. Fast-forward if you're behind. Auto-commit any uncommitted changes in your main branch. This keeps things clean.
After the task completes, Gluon stages all changes, commits them, and pushes to the remote. If there are rebase conflicts, it handles those automatically. The agent's work is safely on its own branch, ready for review or merging.
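The post-task flow—stage, commit, push—could be wrapped like this. A hypothetical sketch under stated assumptions (the conflict handling the article mentions is omitted; `finalize_run` is not Gluon's actual code):

```python
import subprocess

def finalize_run(wt_path: str, run_id: str, push: bool = True) -> None:
    """Stage, commit, and push the agent's work from its worktree (sketch)."""
    def git(*args: str) -> None:
        subprocess.run(["git", "-C", wt_path, *args], check=True)

    git("add", "-A")  # stage everything the agent changed
    # Only commit if something is actually staged.
    staged = subprocess.run(
        ["git", "-C", wt_path, "diff", "--cached", "--quiet"]
    ).returncode != 0
    if staged:
        git("commit", "-m", f"gluon: results for task {run_id}")
    if push:
        # Only the task branch goes to the remote; main is never pushed here.
        git("push", "-u", "origin", f"gluon-task/{run_id}")
```

Because the commit and push happen on `gluon-task/{run_id}`, the result lands on the remote as a reviewable branch rather than a change to main.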
The isolation model extends to multiple agents running in parallel. Agent One works on the auth service in wt-abc123. Agent Two works on the API in wt-def456. They're completely invisible to each other. Zero interference. Zero risk of accidentally merging Agent One's changes while Agent Two is still running.
This is architectural. It's how you scale from one agent to many.
The Kanban Board—One Cockpit, Many Pilots
This is where the magic happens. The Kanban board is the cockpit display.
Four columns: Queue (waiting to start), Running (actively executing), Review (waiting for human decision), Done (complete).
Each task is a card. I can see at a glance:
- Task title and which project it belongs to
- Profile icon showing which model it's using (Quick/Standard/Deep/Planning)
- Progress bar for formula workflows (if this is a multi-step task, it shows "Step 2/4 Implement")
- Health indicator dot for running tasks—green means healthy, yellow means slow, red means stuck or looping
- Input Required badge if the agent is waiting for user approval on something
Looking at my board right now:
One task in Queue waiting to start—a feature request.
Two tasks in Running—the auth bug I just created (green dot, healthy), and an API refactoring (yellow dot, slower than expected but still progressing).
One task in Review—yesterday's test-writing task, waiting for me to decide what to do next.
One task in Done—completed, ready for PR creation.
I can drag tasks between columns if I want to manually pause something or promote it. Click on any card and it opens the full Run Details modal.
This is the shift from "running agents" to "orchestrating agents." I'm not chasing the work across multiple terminals. I'm looking at the cockpit display and deciding what to do next.
Run Details Modal—Deep Diagnostics
When I click on a task, the modal opens with multiple tabs. Visibility goes deep.
Output — streaming logs from the run.
Errors — filtered error messages with stack traces. Quick diagnosis.
Messages — full conversation history, chronological. Expandable tool calls with input/output. Inline screenshot thumbnails.
Commits — every commit made during the run. Author, timestamp, message, files changed. Click for full diff.
Files — all changed files with line counts (additions/deletions). Inline diff viewer. This is where I see what actually changed before deciding to merge.
Attachments — captured images, screenshots, architecture diagrams.
Loop — (for autonomous tasks) Ralph iteration progress, circuit breaker state, cost, safety metrics.
History — related runs in chronological order. Useful for multi-step formulas and resumed tasks.
I click on the Commits tab for the auth bug task. Three commits:
- "Add test case for email validation with new TLDs"
- "Fix regex pattern to support .cloud, .app, .dev"
- "Add test for edge case: nested subdomains"
I click the first commit. The diff appears inline. Tests added. Assertions clear. Tests failing (as expected before the fix).
I scroll to the second commit. The regex change is visible. The old pattern supported only standard TLDs. The new pattern uses a more permissive regex that handles arbitrary extensions. Tests now pass.


All the information I need to make a decision—merge or refine—right there in the modal.
Cost Tracking—Financial Transparency
I click on the Costs dashboard. It's showing me real-time spend across all my projects.
Today: $4.32 (five tasks run)
This week: $14.78
This month: $32.11
I can click into each project:
auth-service: $8.94
- Task 1 (Standard, Sonnet): $0.47
- Task 2 (Deep, Opus): $1.23
- Task 3 (Quick, Haiku): $0.03
Each task shows the model that ran it and the exact cost. This matters because the models have dramatically different costs at scale.
- Haiku: ~$36K/year if running continuously
- Sonnet: ~$180K/year
- Opus: ~$900K/year
The difference between Haiku and Opus is 25x. So I have to be intentional about which profile I choose. Quick task? Use Haiku. Complex refactoring? Use Opus. Most work? Sonnet.
I can set a cost cap on any task with the --max-cost flag. If I create a task and set --max-cost 5, the agent will stop if it hits $5 in spending. No surprise bills. No runaway loops that drain your budget.
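The enforcement behind a cap like that can be as simple as a running total checked on every spend event. A minimal sketch of the idea—`CostGuard` is an illustrative name, not Gluon's implementation of `--max-cost`:

```python
class CostCapExceeded(Exception):
    pass

class CostGuard:
    """Stops a run once cumulative spend reaches the cap (illustrative)."""
    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        self.spent += usd
        if self.spent >= self.max_cost_usd:
            raise CostCapExceeded(
                f"spent ${self.spent:.2f}, cap is ${self.max_cost_usd:.2f}"
            )

guard = CostGuard(max_cost_usd=5.0)
for _ in range(3):
    guard.charge(0.47)  # three iterations stay well under the cap
print(f"${guard.spent:.2f}")  # $1.41
```

The check runs after every charge, so a runaway loop is cut off within one iteration of crossing the cap rather than at the next billing cycle.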
Multiple Interfaces—Same Orchestrator, Many Surfaces
Here's the architectural move that unlocked flexibility.
I submit a task from the web dashboard. Thirty seconds later, my coworker checks status via Telegram. Same task, different interface. He types /logs and sees the real-time stream inline, tool calls visualized.
That night on the couch: I pull up Gluon on my phone. It's a PWA—installable like a native app. Pull to refresh. Same Kanban board, responsive and snappy. The task from this morning shows green (completed).

At my desk: CLI. gluon logs {run-id} -f streams the same real-time logs to my terminal.
The magic is the shared orchestrator. One SQLite database. One FastAPI backend. One WebSocket layer. Whether you're in CLI, web, Telegram, Discord, or PWA, you're talking to the same system. That's why this scales.
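The fan-out pattern underneath is simple: one event source, many subscriber queues, one per surface. A stdlib-only sketch of the idea (the class and method names are mine, not Gluon's):

```python
import asyncio

class RunBroadcaster:
    """One event stream, many subscribers: each interface (CLI, web,
    Telegram) gets its own queue fed from the same source. Illustrative."""
    def __init__(self):
        self.subscribers: list[asyncio.Queue] = []

    def subscribe(self) -> asyncio.Queue:
        q = asyncio.Queue()
        self.subscribers.append(q)
        return q

    async def publish(self, event: dict) -> None:
        for q in self.subscribers:
            await q.put(event)

async def demo():
    bus = RunBroadcaster()
    web, cli = bus.subscribe(), bus.subscribe()
    await bus.publish({"type": "tool_call", "tool": "file_read"})
    # Both surfaces receive the identical event.
    return await web.get(), await cli.get()

web_event, cli_event = asyncio.run(demo())
print(web_event == cli_event)  # True
```

Each interface just drains its queue and renders in its own idiom; none of them ever talks to another interface, only to the orchestrator.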
And because the interface layer is decoupled, I can add new ones without touching the core. Slack bot? Route through the orchestrator. Email? Same. Custom webhook? Same.
Orchestration as Scaled Workflow
Before: four terminals, each a task. Juggling context, wondering what's stuck, scrolling logs hoping for signals.
Now: one cockpit. Multiple pilots working in parallel. Progress, costs, errors—all at a glance. Zoom into any task with the Run Details modal. The bottleneck isn't the agents. It's orchestration, coordination, observability. That's what Gluon addresses.
A Week in the Cockpit
Monday: Standup reveals three priorities. I create three tasks in parallel from the Kanban board. Bugfix (Standard). Refactoring (Deep). Feature skeleton (Planning). All three spin up by noon. Green dots on two. Yellow on one—it's complex, taking longer.
Tuesday: Refactoring task moves to Review. I inspect the diffs and spot something: needs docstrings on the new parameters. I resume with that feedback. Agent picks up where it left off and completes the documentation. Back to Running.
Wednesday: Bugfix completes. I review commits, approve, one-click PR creation. Refactoring task done—diffs clean. Feature skeleton still running (Planning is thorough). Cost so far: $6.42.
Thursday: Feature skeleton moves to Review. I read through the architecture, approve it. I resume: "This structure is solid. Now implement the user authentication flow." Standard profile. By day's end, it's complete and back in Review.
Friday: Three completed PRs ready for code review. Weekly cost: $14.79. Well within budget.
No anxiety. No surprises. Clear picture of the work, always visible, financially tracked, isolated per-task, decisions made at the right moments.
The Cockpit Difference
Terminal chaos to cockpit. That's the journey.
This isn't UI polish. It's architectural. It's how you scale from one agent (interesting experiment) to four agents (useful system) to a team of agents (production infrastructure).
Real-time logs mean you're not flying blind. Git worktree isolation means you can trust agents without risk. Cost tracking means you understand the economics from the start. Multiple interfaces mean you're not stuck at your desk. Kanban board means decisions happen at the orchestrator level, not scattered across terminals.
This foundation enables what comes next in Post 3: autonomous loops. The Ralph Loop. When you stop deciding and let the agents decide for you. Set a cost cap and an iteration limit and let them run until the work is classified as complete. That's when orchestration becomes transformational.
For now: you're the cockpit crew. Flying the orchestrator. Watching pilots (agents) work. Deciding what comes next based on real-time data.
That's orchestration.
Series Navigation
- Post 1: From tmux Chaos to AI Agent Orchestration
- Post 2: Inside the Cockpit (you are here)
- Post 3: Ralph Loop — Autonomous Execution
- Post 4: From Solo Tool to Team Infrastructure