Orchestrating Multiple AI Agents: The Gluon Cockpit Guide

Part 2 of 4 in the series Gluon: Building an AI Agent Orchestrator

---

The Question That Kept Interrupting

Four agents running. Auth bug. API refactor. Tests. A fourth — honestly, I'd scrolled through too many logs to remember what it was doing.

Every few seconds, the same question: is it still running?

Not "is the code good?" Not "did it pick the right approach?" Just: is it still running? The most basic question you can ask about a process, and I couldn't answer it without switching tabs and scanning walls of terminal output.

That single question — is it still running? — is the difference between running agents and orchestrating them. The first is an experiment. The second is a workflow.

Why Cockpits Exist

In aviation, pilots handle the flying. The cockpit brings everything else into one place: fuel, course, radio traffic, weather. Multiple instruments, multiple people, one unified picture.

That's what was missing. Not smarter agents. A better cockpit.

Here's what a working cockpit provides:

Centralized information: one dashboard instead of scattered terminals. Where are the agents? What are they doing? How much have they cost? What needs attention?
Isolated actions: each pilot flies independently. In Gluon, git worktrees ensure one agent can't delete another's work.
Real-time awareness: you watch the work happen, tool calls visualized as they execute, costs ticking up live.
Multiple control surfaces: CLI for terminal workflows, web dashboard for visual oversight, Telegram for phone checks, Discord for team coordination, PWA for mobile. Same orchestrator, many surfaces.
Built-in safeguards: cost caps, iteration limits, circuit breakers that stop agents from looping forever.

The cockpit doesn't fly the plane. It makes the flying visible, coordinated, and safe.

From Intent to Execution in 30 Seconds

Let me walk you through what actually happens when I start work.

I open the web dashboard and click the plus icon. A task creation dialog appears.

Task creation dialog with slash command autocomplete

First: project selector. I pick auth-service. Projects are grouped by workspace, which matters when you're working across team codebases.

Then: describe the work. I type /fix-auth-bug and Gluon autocompletes. Intent clear. But I add more: @src/auth/signup.ts. That file gets included in the agent's context window — no guessing about which file to look at. I've pointed directly at what matters.
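To make the slash command and @-mention idea concrete, here is a minimal sketch of how a task description could be split into a command, a list of pinned files, and free-form notes. The regexes, the TaskIntent class, and the parse_task_text function are my own illustrative assumptions, not Gluon's actual parser.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed syntax: an optional leading "/command" and any number of "@path" mentions.
COMMAND_RE = re.compile(r"^/([a-z][\w-]*)")
MENTION_RE = re.compile(r"(?<!\S)@([\w./-]+)")


@dataclass
class TaskIntent:
    command: Optional[str]                          # e.g. "fix-auth-bug"
    files: List[str] = field(default_factory=list)  # files to pin into context
    notes: str = ""                                 # remaining free-form guidance


def parse_task_text(text: str) -> TaskIntent:
    """Split a task description into command, pinned files, and notes."""
    text = text.strip()
    match = COMMAND_RE.match(text)
    command = match.group(1) if match else None
    files = MENTION_RE.findall(text)
    # Whatever remains after removing the command and mentions is guidance.
    rest = COMMAND_RE.sub("", text, count=1)
    rest = MENTION_RE.sub("", rest)
    return TaskIntent(command=command, files=files, notes=" ".join(rest.split()))


intent = parse_task_text("/fix-auth-bug @src/auth/signup.ts signup rejects .cloud emails")
print(intent)
```

The point of the split is that the pinned files become explicit input to the agent rather than something it has to discover.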

Now: profile selection. This is where model choice becomes a cost decision you make once, not a bill you discover later.

Quick uses Haiku — fast, cheap. For simple fixes.
Standard uses Sonnet — balanced. My default.
Deep uses Opus — maximum reasoning for complex refactoring.
Planning uses Opus with explicit plan-before-execute mode.

I pick Standard. Toggle on Git Worktree for isolation. Ralph Loop stays off: I want to watch this one before letting it run autonomously. I can attach an image if needed: screenshot of the bug, error logs, anything visual.
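The four profiles and the two toggles boil down to a small lookup table plus a few booleans. A rough sketch of that shape follows; the class names, field names, and defaults are assumptions for illustration, and only the profile-to-model mapping comes from the dialog described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    model_tier: str   # model family the profile runs on
    plan_first: bool  # whether the agent must plan before it executes


# Mapping taken from the profile list above; the keys are hypothetical identifiers.
PROFILES = {
    "quick": Profile("haiku", plan_first=False),
    "standard": Profile("sonnet", plan_first=False),
    "deep": Profile("opus", plan_first=False),
    "planning": Profile("opus", plan_first=True),
}


@dataclass(frozen=True)
class TaskOptions:
    profile: Profile
    use_worktree: bool
    ralph_loop: bool


def build_options(profile_name: str = "standard",
                  use_worktree: bool = True,
                  ralph_loop: bool = False) -> TaskOptions:
    """Resolve a profile name into concrete task options."""
    try:
        profile = PROFILES[profile_name.lower()]
    except KeyError:
        raise ValueError(f"unknown profile: {profile_name!r}") from None
    return TaskOptions(profile, use_worktree, ralph_loop)


options = build_options("Standard", use_worktree=True, ralph_loop=False)
print(options)
```

Choosing the profile once at creation time is what turns model choice into an up-front cost decision.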

Profile selection, git worktree toggle, and execution options


One click. Task created. Within 30 seconds, the agent is up and running.

And here's the part that changed how I work: I can watch it happen live.

Watching the Work — Not Waiting for It

The task opens in the Run Details modal. Tabs across the top: Output, Errors, Messages, Commits, Files, Attachments, Loop, History. I'm on Output.

The agent's thinking streams in real time. No polling. No refresh button. It's a WebSocket connection, so I'm seeing what the agent does as it happens.

It reads the signup file. Identifies the bug — line 42, the email validation regex doesn't handle new TLDs like .cloud. Writes a test case that reproduces the error. Each step appears in the stream:

[TOOL CALL] file_read: src/auth/signup.ts (2.4KB)
[TOOL CALL] bash: npm test -- signup.test.ts (failed, as expected)
[TOOL CALL] file_write: src/auth/signup.ts (lines 40-45 modified)
[TOOL CALL] bash: npm test -- signup.test.ts (passed)

No guessing. No terminal tab-switching. I'm watching a developer work — reading the file, writing a failing test, fixing the code, watching the test pass. Except this developer runs at machine speed and costs $0.47.

Live task execution with streaming output and tool calls

And while it works, a cost counter ticks in the corner. $0.02. $0.04. $0.07. Financial transparency in real time, not a surprise on the invoice. Is it still running? Yes, and I know exactly what it's doing and what it costs.

I can filter by tool type: file operations, bash calls, errors only. Pause auto-scroll to read carefully. Expand tool calls to see inputs and outputs. If the agent needs guidance, an "Input Required" badge appears and I jump in via the resume flow.
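Filtering by tool type is easy to picture if each stream line is parsed into a small record first. The sketch below assumes the line format shown in the stream excerpt earlier; the regex and the ToolCall, parse_line, and filter_calls names are hypothetical and say nothing about how Gluon stores events internally.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line shape: "[TOOL CALL] <tool>: <target> (<detail>)"
LINE_RE = re.compile(
    r"^\[TOOL CALL\] (?P<tool>\w+): (?P<target>.*?) \((?P<detail>[^()]*)\)$"
)


class ToolCall(NamedTuple):
    tool: str
    target: str
    detail: str


def parse_line(line: str) -> Optional[ToolCall]:
    """Return a ToolCall for a stream line, or None if it is not a tool call."""
    match = LINE_RE.match(line.strip())
    return ToolCall(**match.groupdict()) if match else None


def filter_calls(lines: Iterable[str], tool: str) -> Iterator[ToolCall]:
    """Yield only the tool calls of the requested type."""
    for line in lines:
        call = parse_line(line)
        if call is not None and call.tool == tool:
            yield call


STREAM = [
    "[TOOL CALL] file_read: src/auth/signup.ts (2.4KB)",
    "[TOOL CALL] bash: npm test -- signup.test.ts (failed, as expected)",
    "[TOOL CALL] file_write: src/auth/signup.ts (lines 40-45 modified)",
    "[TOOL CALL] bash: npm test -- signup.test.ts (passed)",
]

bash_calls = list(filter_calls(STREAM, "bash"))
for call in bash_calls:
    print(call.target, "->", call.detail)
```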

Other systems: submit a job, get a result, sit in the dark. Gluon puts you in the cockpit. You're not waiting — you're witnessing.

The Architecture That Protects Everything

Here's the decision that makes all of this safe enough to trust.

Each task runs in /tmp/gluon-worktrees/wt-{run_id}/, a temporary git worktree: a lightweight second checkout of your repository with its own working directory and its own branch, backed by the same object store as the original. The agent works in that worktree. It makes changes. It commits. It pushes. But it's working on branch gluon-task/{run_id}, a branch unique to that task.

Main stays clean. Untouched.
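For readers who haven't used worktrees, this is roughly what creating one per task looks like. The path layout and branch naming come from the description above. The function names and the choice of main as the base are assumptions, and this is a sketch of the standard git worktree commands rather than Gluon's own code.

```python
import subprocess
from pathlib import Path
from typing import List

WORKTREE_ROOT = Path("/tmp/gluon-worktrees")


def worktree_path(run_id: str) -> Path:
    return WORKTREE_ROOT / f"wt-{run_id}"


def branch_name(run_id: str) -> str:
    return f"gluon-task/{run_id}"


def worktree_add_command(run_id: str, base: str = "main") -> List[str]:
    """Build: git worktree add -b <new-branch> <path> <commit-ish>."""
    return ["git", "worktree", "add", "-b", branch_name(run_id),
            worktree_path(run_id).as_posix(), base]


def worktree_remove_command(run_id: str) -> List[str]:
    """Build the cleanup command; --force also discards uncommitted changes."""
    return ["git", "worktree", "remove", "--force", worktree_path(run_id).as_posix()]


def create_worktree(repo: Path, run_id: str, base: str = "main") -> Path:
    """Create the isolated checkout; raises CalledProcessError if git fails."""
    subprocess.run(worktree_add_command(run_id, base), cwd=repo, check=True)
    return worktree_path(run_id)


print(" ".join(worktree_add_command("abc123")))
```

Because the branch is created together with the worktree, nothing the agent commits there can land on main until someone merges it.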

If something goes catastrophically wrong, if the agent makes a terrible decision, its damage is confined to its own branch. Delete the worktree, move on. Compare this to agents with no isolation: in one widely reported incident, Replit's AI agent deleted a user's production database. With Gluon, worst case: you lose the worktree.

Before each task starts, Gluon syncs: fetch from remote, fast-forward if behind, auto-commit any uncommitted changes in main. After the task completes, Gluon stages all changes, commits, and pushes. Rebase conflicts get handled automatically. The agent's work lands safely on its own branch, ready for review.
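The sync steps map onto a short sequence of ordinary git commands. The sketch below only builds the command lists so the order is easy to see. The commit messages, function names, and the origin remote are assumptions, and the automatic conflict handling mentioned above is not shown.

```python
from typing import List

Command = List[str]


def is_dirty(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed at least one entry."""
    return bool(porcelain_output.strip())


def pre_task_plan(dirty: bool, main_branch: str = "main",
                  remote: str = "origin") -> List[Command]:
    """Commands to run in the main checkout before a task starts."""
    plan: List[Command] = [
        ["git", "fetch", remote],
        # --ff-only refuses anything but a fast-forward and is a no-op
        # when the local branch is already up to date.
        ["git", "merge", "--ff-only", f"{remote}/{main_branch}"],
    ]
    if dirty:
        # Only commit when there is something to commit; a bare
        # `git commit` on a clean tree exits with an error.
        plan += [
            ["git", "add", "--all"],
            ["git", "commit", "--message", "gluon: auto-commit before task"],
        ]
    return plan


def post_task_plan(run_id: str, remote: str = "origin") -> List[Command]:
    """Commands to run in the task worktree after the agent finishes."""
    branch = f"gluon-task/{run_id}"
    return [
        ["git", "add", "--all"],
        ["git", "commit", "--message", f"gluon: results of run {run_id}"],
        ["git", "push", "--set-upstream", remote, branch],
    ]


for command in pre_task_plan(dirty=True) + post_task_plan("abc123"):
    print(" ".join(command))
```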

The isolation extends to parallel work. Agent One works on auth in wt-abc123. Agent Two works on the API in wt-def456. They're completely invisible to each other. Zero interference. Zero risk of accidentally merging one agent's changes while another is still running.

This is how you scale from one agent to many. Not by hoping they don't collide — by making collision impossible.

The Kanban Board — One Screen, Many Pilots

Four columns: Queue (waiting), Running (executing), Review (waiting for human decision), Done (complete).

Each task is a card. At a glance:

Task title and project
Profile icon — which model (Quick/Standard/Deep/Planning)
Progress bar for multi-step formulas ("Step 2/4 Implement")
Health dot — green (healthy), yellow (slow), red (stuck or looping)
Input Required badge if the agent is waiting for approval
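The health dot is the piece I'd expect people to want to tune, so here is one way the three colors could be derived. The two inputs (seconds since the last event, count of identical consecutive tool calls) and all three thresholds are illustrative assumptions, not Gluon's actual rules.

```python
from enum import Enum


class Health(Enum):
    GREEN = "green"    # healthy
    YELLOW = "yellow"  # slow
    RED = "red"        # stuck or looping


def classify(seconds_since_last_event: float,
             repeated_calls: int,
             slow_after: float = 120.0,
             stuck_after: float = 600.0,
             loop_threshold: int = 3) -> Health:
    """Map simple liveness signals to a health dot.

    repeated_calls counts identical consecutive tool calls, a cheap
    proxy for an agent that is going in circles.
    """
    if repeated_calls >= loop_threshold or seconds_since_last_event >= stuck_after:
        return Health.RED
    if seconds_since_last_event >= slow_after:
        return Health.YELLOW
    return Health.GREEN


print(classify(5, 0).value)
print(classify(200, 0).value)
print(classify(5, 3).value)
```

Checking red first matters: a looping agent emits events constantly, so a pure silence timer would keep reporting it as green.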

My board right now: one task in Queue — a feature request. Two in Running — the auth bug (green dot, healthy) and an API refactor (yellow, slower than expected but progressing). One in Review — yesterday's test-writing task. One in Done — ready for PR.

Drag tasks between columns to manually pause or promote. Click any card for full Run Details.

Is it still running? Look at the board. Green dot. Yes.

This is the shift from chasing work across terminals to making decisions from a single screen. Not running agents — orchestrating them.

Deep Diagnostics — Everything About a Task

Click a task card. The modal opens with tabs that go deep.

Output: streaming logs.
Errors: filtered error messages with stack traces.
Messages: full conversation history, expandable tool calls, inline screenshot thumbnails.
Commits: every commit during the run, with author, timestamp, message, and files changed. Click for full diff.
Files: all changed files with line counts and inline diff viewer.
Attachments: captured images, diagrams.
Loop: Ralph iteration progress, circuit breaker state, safety metrics.
History: related runs in chronological order.

I click Commits for the auth bug. Three commits:

"Add test case for email validation with new TLDs"
"Fix regex pattern to support .cloud, .app, .dev"
"Add test for edge case: nested subdomains"

The first commit shows tests added, assertions clear, tests failing as expected. The second shows the regex change — old pattern handled only standard TLDs, new pattern handles arbitrary extensions. Tests pass.

Commits tab showing files changed with additions and deletions

Inline file diffs for reviewing agent changes

Everything I need to make a decision — merge or refine — right there. No switching to a terminal. No cloning branches to inspect locally. The cockpit has the instrumentation.

The Numbers You Actually Need

Cost tracking sounds mundane until you realize the cheapest and most expensive model tiers are 25x apart.

Haiku: ~$36K/year running continuously
Sonnet: ~$180K/year
Opus: ~$900K/year

That's not a rounding error. That's the difference between a viable workflow and an unsustainable one. So you need to know — per task, per project, per day — what you're spending.
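The arithmetic behind those figures is worth spelling out. Taking the three annual estimates above as given, and assuming "running continuously" means 24 hours a day for 365 days, the hourly rates and multiples work out like this:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous running

# Annual estimates quoted above, in USD.
ANNUAL_USD = {"haiku": 36_000, "sonnet": 180_000, "opus": 900_000}


def hourly_rate(annual_usd: float) -> float:
    """Convert an annual continuous-run cost into an hourly rate."""
    return annual_usd / HOURS_PER_YEAR


for tier, annual in ANNUAL_USD.items():
    multiple = annual / ANNUAL_USD["haiku"]
    print(f"{tier:<7} ${hourly_rate(annual):7.2f}/hour  {multiple:4.0f}x haiku")
```

Each step up is 5x the tier below it, which compounds to the 25x gap between the cheapest and most expensive tier.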

The Costs dashboard shows real-time spend:

Today: $4.32 (five tasks). This week: $14.78. This month: $32.11.

Click into a project:

auth-service: $8.94

Task 1 (Standard, Sonnet): $0.47
Task 2 (Deep, Opus): $1.23
Task 3 (Quick, Haiku): $0.03

Each task shows the model and exact cost. And I can set a cost cap on any task with --max-cost. Set it to $5, the agent stops at $5. No surprise bills. No runaway loops draining the budget.
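A cost cap is conceptually just a running total checked after every billable step. Here is a minimal sketch of that idea; the CostMeter and CostCapReached names are hypothetical, and the per-step costs are made-up inputs, not measurements.

```python
from decimal import Decimal
from typing import Iterable, Tuple


class CostCapReached(RuntimeError):
    """Raised when cumulative spend reaches the configured cap."""


class CostMeter:
    def __init__(self, max_cost: Decimal) -> None:
        self.max_cost = max_cost
        self.spent = Decimal("0")

    def charge(self, amount: Decimal) -> Decimal:
        """Record a step's cost and stop the run once the cap is reached."""
        if amount < 0:
            raise ValueError("cost cannot be negative")
        self.spent += amount
        if self.spent >= self.max_cost:
            raise CostCapReached(f"spent ${self.spent} of ${self.max_cost} cap")
        return self.spent


def run_with_cap(step_costs: Iterable[Decimal], max_cost: Decimal) -> Tuple[int, Decimal]:
    """Run steps until they finish or the cap trips; return (steps run, spend)."""
    meter = CostMeter(max_cost)
    steps_run = 0
    try:
        for cost in step_costs:
            steps_run += 1
            meter.charge(cost)  # cost is only known after the step ran
    except CostCapReached:
        pass
    return steps_run, meter.spent


steps_run, spent = run_with_cap([Decimal("1.50")] * 10, Decimal("5"))
print(steps_run, spent)
```

One design consequence: because a step's cost is only known after it runs, a check like this can overshoot the cap by up to one step. A stricter cap would need to estimate the next step's cost before starting it.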

If you're running agents without cost visibility, you're flying without a fuel gauge. You'll find out you're empty when you crash.

Same Orchestrator, Every Surface

Here's the architectural move that made everything click.

I submit a task from the web dashboard. Thirty seconds later, a coworker checks status via Telegram. Same task, different interface. He types /logs and sees the real-time stream inline.

That night on the couch: I pull up Gluon on my phone. It's a PWA — installable like a native app. Pull to refresh. Same Kanban board, responsive. The task from this morning shows green.

Gluon mobile PWA showing the Kanban board on a phone

At my desk: CLI. gluon logs {run-id} -f streams the same logs to my terminal.

One SQLite database. One FastAPI backend. One WebSocket layer. Whether you're in CLI, web, Telegram, Discord, or PWA — same system. And because the interface layer is decoupled, adding a new surface doesn't touch the core. Slack bot? Route through the orchestrator. Email? Same. Custom webhook? Same.
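The decoupling is easier to see in miniature. In the sketch below, one SQLite table holds run state and each surface is just a function that formats a row. The schema, the render functions, and the SURFACES registry are assumptions made for illustration and do not reflect Gluon's real schema or API.

```python
import sqlite3
from typing import Callable, Dict

COLUMNS = ("run_id", "title", "status", "cost_usd")


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a single runs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, title TEXT, status TEXT, cost_usd REAL)"
    )
    return conn


def get_run(conn: sqlite3.Connection, run_id: str) -> dict:
    """Core query shared by every surface."""
    row = conn.execute(
        "SELECT run_id, title, status, cost_usd FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    return dict(zip(COLUMNS, row))


def render_cli(run: dict) -> str:
    return f"{run['run_id']}  {run['status']:<8}  ${run['cost_usd']:.2f}  {run['title']}"


def render_chat(run: dict) -> str:
    return f"Task '{run['title']}' is {run['status']} (${run['cost_usd']:.2f} so far)"


# Adding a surface means adding one entry here; the core query is untouched.
SURFACES: Dict[str, Callable[[dict], str]] = {
    "cli": render_cli,
    "telegram": render_chat,
}


def status(conn: sqlite3.Connection, surface: str, run_id: str) -> str:
    return SURFACES[surface](get_run(conn, run_id))


conn = open_store()
conn.execute("INSERT INTO runs VALUES (?, ?, ?, ?)", ("abc123", "Fix auth bug", "running", 0.47))
print(status(conn, "cli", "abc123"))
print(status(conn, "telegram", "abc123"))
```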

Is it still running? Check from anywhere.

A Week in the Cockpit

Monday: Standup reveals three priorities. I create three tasks in parallel from the Kanban board. Bugfix (Standard). Refactoring (Deep). Feature skeleton (Planning). All three spin up by noon. Green dots on two. Yellow on the third — it's complex, taking longer. I check the board from my phone while getting coffee. Still yellow. Fine.

Tuesday: Refactoring task moves to Review. I inspect the diffs and spot something: needs docstrings on new parameters. I resume with that feedback. Agent picks up where it left off. Back to Running.

Wednesday: Bugfix completes. Review commits, approve, one-click PR creation. Refactoring done — diffs clean. Feature skeleton still running (Planning is thorough). Cost so far: $6.42.

Thursday: Feature skeleton moves to Review. Architecture looks solid. I resume: "This structure is solid. Now implement the user authentication flow." Standard profile. By end of day, it's complete and back in Review.

Friday: Three completed PRs ready for code review. Weekly cost: $14.79.

No anxiety about what's running. No surprises about what things cost. No terminal archaeology to figure out what happened two days ago. Just a board, a budget, and decisions made at the right moments.

The Real Difference

This isn't UI polish. It's a structural shift in how you work with AI agents.

Real-time logs mean you're not flying blind. Git worktree isolation means you can trust agents without risk. Cost tracking means you understand the economics before they surprise you. Multiple interfaces mean you're not chained to your desk. The Kanban board means decisions happen at the orchestrator level, not scattered across terminals.

Is it still running?

After a week in the cockpit, I stopped asking that question. Not because it stopped mattering — because the answer was always visible. Green dot, yellow dot, red dot. The cockpit answers before you ask.

That foundation enables what comes next in Part 3: autonomous loops. The Ralph Loop. When you stop monitoring and let the agents decide for themselves. Set a cost cap and iteration limit, let them run until the work is done.

That's when orchestration becomes transformational. But it only works if you trust the cockpit first.

---

Series Navigation

Post 1: From tmux Chaos to AI Agent Orchestration
Post 2: Inside the Cockpit (you are here)
Post 3: Ralph Loop — Autonomous Execution
Post 4: From Solo Tool to Team Infrastructure


Michael Cutler
