From Solo Tool to Team Infrastructure: Scaling Gluon for Production
Part 4 of 4 in Gluon: Building an AI Agent Orchestrator series
Monitoring a Ralph loop via tmux SSH is fine when it's your laptop. It's a nightmare when your team of five developers each spawn parallel agents and you have no idea which one is running out of context, which one just deleted production files, or whether the $50 cost cap got exceeded last night.
Three months ago, I showed you Gluon running on my Mac mini—a personal tool solving a personal problem. Today, it's evolved into something else entirely: production infrastructure for autonomous agents at team scale. That evolution forced me to rethink almost everything. Not the Claude integrations. Not the core orchestration loop. But everything around it: security isolation, cost governance, operational visibility, and how humans actually stay in control when AI agents multiply.
This is where the story gets interesting—and where the real constraints emerge.
The Inflection Point
The transition from solo developer tool to team infrastructure is fundamental. The tension is real: Gluon was born from my frustration with tooling gaps. I wanted to watch Ralph loops, capture outputs, resume sessions, coordinate multiple agents across projects. All of that works beautifully on a Mac mini when the only user is me.
But at team scale, new constraints surface:
- Security isolation: Each agent runs with code-execution capabilities. What stops one from corrupting another's work? What prevents accidental (or intentional) access to ~/.aws or production credentials?
- Governance visibility: I know my own tolerance for agent autonomy. A team of five doesn't. How do you enforce consistent policies across agents?
- Cost attribution: Solo development? One bill. Teams? You need to know which project, which agent, which user burned through the budget.
- Observability: Humans can't read logs in real time. They need dashboard signals that proactively tell them something's wrong.
- Failure recovery: When an agent gets stuck or the network fails, can it resume gracefully? Or does someone lose three hours of work?
Building for one is fundamentally different from building for many. Here's what changed.
Security & Isolation: Each Agent Gets a Sandbox
Autonomous agents wielding code execution tools are powerful and dangerous. Without isolation, one misconfigured agent can corrupt another's work—or worse, touch production data. That can't happen.
Gluon's security model is defense-in-depth. Each agent runs in an isolated OS-level sandbox with three enforced boundaries:
Filesystem sandboxing via bubblewrap (Linux) or sandbox-exec (macOS) restricts agent access to a git worktree—the specific git branch created for that task. Agents can't escape the sandbox and touch ~/.aws, ~/.ssh, or your home directory. They stay at their assigned desk. Period.
PUID/PGID support ensures agents inherit your host user permissions, not root. This is critical for Docker deployments. If an agent needs to run npm install in your project's node_modules, it can—because it's running as you, not as an omnipotent root user. That's a second layer of defense against privilege escalation.
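In Docker terms, that user mapping boils down to a handful of `docker run` flags. A minimal sketch of how an orchestrator might build them—`--user` is standard Docker, and the PUID/PGID environment-variable convention mirrors common self-hosted images; the helper name is illustrative, not Gluon's actual code:

```python
def docker_user_args(puid: int, pgid: int) -> list[str]:
    """Build `docker run` flags so the agent container runs as the host
    user instead of root. `--user` sets the uid:gid the process runs as;
    the PUID/PGID env vars let an entrypoint fix file ownership first."""
    return [
        "--user", f"{puid}:{pgid}",
        "-e", f"PUID={puid}",
        "-e", f"PGID={pgid}",
    ]
```

Anything the agent writes inside the mounted worktree then lands on the host owned by you, not by root.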
Scoped volume mounts define exactly what the container can access:
- ~/.claude (read-write) — Claude CLI credentials
- ~/.gluon (read-write) — Database, logs, images
- ~/workspaces (read-write) — Project source code
- ~/.aws (read-only) — AWS credentials for Bedrock API calls
- ~/.config/gh (read-only) — GitHub CLI configuration
Everything else is off-limits. No access to system binaries beyond what's in the container. No access to your personal documents.
Resource limits cap CPU and memory per agent: 8 CPU cores and 12 GB RAM by default (configurable per workspace). Prevents runaway agents from melting your infrastructure.
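To make the sandbox boundary concrete, here's a minimal sketch of how a bubblewrap invocation confining an agent to a single worktree might be assembled. The flags shown are real `bwrap` options, but the exact set is illustrative—Gluon's real launcher mounts the scoped paths listed above and applies the CPU/memory caps as well:

```python
import subprocess
from pathlib import Path

def build_bwrap_args(worktree: Path, command: list[str]) -> list[str]:
    """Build a bubblewrap command line that confines an agent to one worktree."""
    return [
        "bwrap",
        "--unshare-all", "--share-net",     # fresh namespaces, but keep network access
        "--die-with-parent",                # kill the agent if the orchestrator dies
        "--ro-bind", "/usr", "/usr",        # system binaries, read-only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", str(worktree), str(worktree),  # the ONLY writable project path
        "--chdir", str(worktree),
        "--",
        *command,
    ]

def launch_sandboxed(worktree: Path, command: list[str]) -> subprocess.Popen:
    """Spawn the agent process inside the sandbox."""
    return subprocess.Popen(build_bwrap_args(worktree, command))
```

Everything outside the bind mounts simply doesn't exist from the agent's point of view—there is no `~/.ssh` to leak.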

The result: a security model that enterprise teams actually need. When your engineers are coordinating agents instead of writing code directly, you must guarantee agents can't interfere with each other or access what they shouldn't.
Agent Teams & Parallel Coordination
Once agents are isolated, the next problem emerges: how do you coordinate multiple agents on the same task?
Enter Claude Code's Agent Teams capability. This is native to the Claude Agent SDK—a lead agent spawns multiple subagents concurrently, each working on distinct subtasks in parallel, then synthesizes the results.
Say you need to implement a feature: API endpoint, database schema, frontend form, and tests. Instead of one agent sequentially implementing each piece (and potentially introducing inconsistencies), you spawn four subagents:
- Subagent 1: Design and implement the API endpoint
- Subagent 2: Create database schema and migrations
- Subagent 3: Build frontend form with validation
- Subagent 4: Write comprehensive tests
They work in parallel. Gluon's SubagentTracker monitors start/stop events in real-time via agent hooks. The lead agent (running Opus for reasoning) synthesizes the results, validates consistency, and surfaces conflicts.
The win: a feature that previously took one agent 4-6 hours now completes in 1-2 hours with higher quality. Parallelism at the agent level.
Best practice: structure prompts with 2-5 distinct subtasks, mention shared files subagents should reference, and end with explicit synthesis instructions.
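A lead-agent prompt following that structure might look like this—the feature, file paths, and wording are invented for illustration, not a Gluon template:

```python
# Illustrative lead-agent prompt: 2-5 distinct subtasks, shared files to
# reference, and explicit synthesis instructions at the end.
LEAD_PROMPT = """\
Implement the invoice-export feature. Spawn one subagent per subtask:

1. API endpoint: add GET /api/invoices/export (see src/api/routes.py).
2. Database: add an export_jobs table plus migration.
3. Frontend: build the export form with date-range validation.
4. Tests: cover the endpoint, the migration, and the form.

Shared context: every subagent should read docs/invoice-spec.md and keep
field names consistent with src/models/invoice.py.

When all subagents finish, synthesize: check the four pieces against each
other, resolve naming conflicts, and summarize any remaining risks.
"""
```

The numbered subtasks give the lead agent clean spawn boundaries; the shared-context paragraph is what keeps four parallel workers from drifting apart.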


Work Queue & Merge Queue: Task Orchestration
Coordinating multiple agents in parallel is one problem. Coordinating human workflows around those agents is another.
The work queue solves the "too many agents fighting for attention" chaos. Queue 10 bugs Monday morning. Gluon dispatches them across the week, respecting rate limits and cost caps. No babysitting. Items are batched, prioritized, and pushed to available slots in real-time. WebSocket updates keep the dashboard live.
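The dispatch logic reduces to a priority queue gated by slots and a cost cap. A minimal sketch of that core loop—class and field names are my own, and the real queue also handles rate limits, retries, and WebSocket notifications:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class QueuedTask:
    priority: int                            # lower number = dispatched sooner
    name: str = field(compare=False)
    est_cost: float = field(compare=False)   # rough dollar estimate for the run

class WorkQueue:
    """Fill free agent slots with queued tasks without blowing the cost cap."""

    def __init__(self, slots: int, cost_cap: float):
        self.slots, self.cost_cap = slots, cost_cap
        self.spent = 0.0
        self._heap: list[QueuedTask] = []
        self.running: list[QueuedTask] = []

    def enqueue(self, task: QueuedTask) -> None:
        heapq.heappush(self._heap, task)

    def dispatch(self) -> list[QueuedTask]:
        """Start the highest-priority affordable tasks in the free slots."""
        started = []
        while self._heap and len(self.running) < self.slots:
            task = self._heap[0]
            if self.spent + task.est_cost > self.cost_cap:
                break                        # cap would be exceeded: hold the queue
            heapq.heappop(self._heap)
            self.spent += task.est_cost
            self.running.append(task)
            started.append(task)
        return started
```

Queue ten bugs Monday morning and `dispatch()` simply runs on a timer: tasks flow into slots as they free up, and the moment the cap is in danger, the queue holds instead of surprising you.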
The merge queue tackles a specific pain point: coordinating PR merges. Agents generate pull requests. Multiple agents might touch overlapping files. Conflicts are inevitable. Traditional CI/CD blocks merges until someone manually rebases. Gluon's merge queue processes PRs sequentially with conflict detection. It shows exactly which files collided. One-click AI conflict resolution runs Claude on the rebase, resolving conflicts programmatically.
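The first-pass conflict check is cheap: walk the queue in order and flag any file a PR touches that an earlier PR already touched. A sketch of that idea (the real queue also runs an actual git rebase to catch semantic conflicts this set intersection can't see):

```python
def conflicting_files(pr_files: list[set[str]]) -> list[set[str]]:
    """For each PR in queue order, report which of its files were already
    touched by a PR ahead of it. Empty set = merges cleanly at this level."""
    seen: set[str] = set()
    report = []
    for files in pr_files:
        report.append(files & seen)   # overlap with everything queued before
        seen |= files
    return report
```

That per-PR overlap set is exactly what the dashboard shows as "which files collided"—and what gets handed to Claude when you click AI conflict resolution.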
The workflow: queue → dispatch → running → review → done. Humans set policy. Agents execute. No context switching. No manual rebase drudgery.
Observability: The Witness Health Monitor
With work flowing through queues, teams need visibility into what's happening in real time. But humans can't read logs. They need signals.
Enter the Witness Health Monitor: a background process that classifies running agents into five states:
- Healthy: Normal progress, files changing, iteration advancing
- Slow: Making progress but below expected throughput
- Looping: Repeating similar actions, no progress
- Stuck: No file changes for five consecutive iterations
- Zombie: Process alive but unresponsive
Each classification appears as a colored dot on task cards in the Kanban board: green, yellow, orange, red, gray. At a glance, you know which agents are humming along and which ones need attention.
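The classifier itself can be sketched as a small decision function over per-agent signals. The "five idle iterations" threshold comes straight from the definition above; the other thresholds and field names here are illustrative, not Gluon's exact tuning:

```python
from dataclasses import dataclass

@dataclass
class AgentSnapshot:
    responsive: bool          # did the process answer its last heartbeat?
    idle_iterations: int      # consecutive iterations with no file changes
    repeated_actions: int     # consecutive near-identical tool calls
    throughput_ratio: float   # observed progress / expected progress

def classify(s: AgentSnapshot) -> str:
    """Map raw agent signals to one of Witness's five health states.
    Order matters: the most severe condition wins."""
    if not s.responsive:
        return "zombie"
    if s.idle_iterations >= 5:    # matches the "stuck" definition above
        return "stuck"
    if s.repeated_actions >= 3:   # illustrative looping threshold
        return "looping"
    if s.throughput_ratio < 0.5:  # illustrative slow threshold
        return "slow"
    return "healthy"
```

One string out per agent, one colored dot in: that compression is the whole point.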
Why is this necessary? Because 95% of AI pilots fail in production. Not because AI sucks. Because humans get overwhelmed by thousands of daily approvals and log messages. Alert fatigue leads to "auto-approve" modes—which reintroduces the risk you were trying to avoid. Witness turns that chaos into five colored dots and actionable signals.
Natural Language Interfaces: From Terminal to Anywhere
Dashboards are great when you're at your desk. But teams live in Slack, Discord, and Telegram now. Gluon's chat bots bring the full orchestrator to natural language.
Telegram and Discord bots speak English (or whatever language you prefer). Behind them: Claude reasoning plus 40+ MCP tools covering project management, git operations, run management, work queue, merge queue, and system admin. Model selection via flags: --model opus for reasoning-heavy tasks, --model haiku for quick answers.
Real-world example: You're in a meeting. Someone says, "Stop that slow bug-fix agent and resume it with more aggressive search." Flip to Telegram, type "Cancel run bugfix-12, resume with more context," and Gluon handles it. No SSH. No terminal. No specialized knowledge.
The Witness colors appear in the chat. Cost tracking is available. Task creation, status checks, and conflict resolution all flow through the same interface.
Gluon is also a Progressive Web App (PWA). Install it on your phone. Full mobile dashboard. Tailscale tunnel for secure remote access. Monitor Ralph loops from a coffee shop. Cost caps, health indicators, and cancel buttons at your fingertips.

The accessibility shift matters. The biggest unlock isn't the technology. It's that anyone on the team can manage agents without being a terminal expert.
The Governance Gap: Humans Are the Constraint
Here's where the narrative shifts. All these features—isolation, coordination, visibility, chat interfaces—address a single root problem: humans can't scale at the rate of AI.
The projection is stark: forty-five billion non-human agent identities by the end of 2026, while only 10% of organizations have governance strategies. That's liability amplification waiting to happen.
The problem isn't AI capability. It's human capacity to oversee, verify, and make decisions about autonomous agents.
Gluon's answer is explicit governance.
Supervision policies define auto-resume behavior per task—from aggressive (minimal checks, fast turnaround) through conservative (the default) to fully manual. Post 3 covered the details; the point here is that teams need consistent policies across all their agents, not ad-hoc decisions per developer.
Circuit breakers (the 3-state pattern from Post 3) stop runaway loops before they drain budgets. At team scale, this is non-negotiable—you can't rely on someone watching every agent.
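For readers who skipped Post 3, the 3-state pattern fits in a few dozen lines. This is a generic sketch of the classic closed → open → half-open breaker, not Gluon's implementation; the failure and cooldown thresholds are illustrative:

```python
import time

class CircuitBreaker:
    """Classic 3-state breaker guarding an agent loop.
    closed: iterations run normally.
    open: repeated failures tripped it; iterations are refused.
    half-open: cooldown elapsed; one trial iteration is allowed."""

    def __init__(self, max_failures: int = 3, cooldown: float = 60.0):
        self.max_failures, self.cooldown = max_failures, cooldown
        self.failures = 0
        self.opened_at: float | None = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.cooldown:
            return "half-open"
        return "open"

    def allow(self) -> bool:
        return self.state != "open"

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None   # full recovery
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()     # trip: stop the loop
```

The key property at team scale: the trip happens mechanically, with nobody watching.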
100% audit logging records every decision, every cost, every tool call. Compliance and accountability are built-in.
Cost visibility tracks token spend, API calls, and cost-per-run. Cost caps are enforced. Agents with runaway expenses don't surprise you with a $5,000 bill.
And this is critical: explicit exit signals. Ralph's design includes dual-condition checks: both a COMPLETE status AND an explicit EXIT_SIGNAL flag. Two conditions, not one. Because if you only check for "completion," an agent can get stuck in a loop claiming victory.
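The check itself is tiny, which is the point—the COMPLETE status and EXIT_SIGNAL marker are Ralph's, but this exact function signature is a sketch:

```python
def should_exit(status: str, output: str) -> bool:
    """Dual-condition exit check: the loop ends only when the agent both
    reports a COMPLETE status AND emits the explicit EXIT_SIGNAL marker.
    Either condition alone is not enough to stop the loop."""
    return status == "COMPLETE" and "EXIT_SIGNAL" in output
```

An agent that merely *claims* completion keeps looping; an agent that emits the signal without the status does too.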
That leash we discussed in Post 3? At team scale, it becomes governance infrastructure. Gluon's supervision system is the rope—and every team member holds it the same way.
The Role Shift: From Code Writer to Orchestrator
The future of software engineering isn't writing code. It's orchestrating AI agents. This isn't theoretical—it's already enterprise reality, and the workflow shift I described in Post 1 is accelerating.
The skills shift accordingly:
- Advanced prompt engineering: Phrasing tasks so agents understand intent
- Systemic thinking: Designing agent workflows across multiple specialists
- PromptOps: Versioning prompts, monitoring agent behavior, tuning for quality
- Supervision design: Setting policies, guardrails, and exit conditions
Over half of companies expect to use AI orchestration by 2026. The market is signaling this change.
Gluon exists to make this pattern production-ready—not just for enterprises with 500-person engineering teams, but for 5-person startups and solo developers coordinating agents across projects.
Full Circle: From Tmux to Team Infrastructure
Now, back to where this journey began.
Post 1 opened with a visceral problem: monitoring Ralph loops via tmux SSH on a home Mac mini. Fragile. Opaque. No coordination. No safety nets. Visibility gaps meant losing context when a session died.
Today's infrastructure solves every one of those pain points:
Visibility: tmux logs → unified dashboard with streaming updates, health indicators, cost tracking, and real-time WebSocket feeds.
Coordination: serial tmux windows → parallel agent teams spawning subagents, work queues batching tasks, merge queues coordinating PR integration.
Governance: manual ad-hoc decisions → explicit supervision policies, circuit breaker safety nets, 100% audit trails, cost caps, dual-condition exit signals.
Accessibility: terminal expertise required → chat bots in Discord and Telegram, PWA on your phone, Tailscale for secure remote access.
The architecture has grown, but it's designed for the same principle: humans stay in control. Claude agents do the work. Oversight remains human.
Gluon runs on a Mac mini today. The architecture, though, is designed for cloud: Kubernetes, AWS ECS, multi-region failover. That's next.
The Horizon: Where This Is Going
This is the fourth and final post in the Gluon series. But it's really the beginning.
The future isn't about smarter AI. It's about better orchestration. Every engineering team running Claude agents today needs an orchestrator like Gluon. We built ours. Here's what we learned.
Gluon is open source under the MIT license: github.com/carrotly-ai/gluon-agent. Version 0.8.0, Python 3.12+, Docker-deployable, 80+ REST endpoints, 40+ chat tools, 50+ CLI commands. We didn't want to build this behind a SaaS paywall. Teams should own their orchestrator.
If you're running Claude agents today, fork Gluon. Modify it. Make it yours. The agent-orchestrator pattern is foundational infrastructure for the next era of software engineering.
The tmux chaos of three months ago feels like ancient history. Today's production architecture for autonomous agents is fundamentally different: secure, observable, governed, team-ready.
That's the evolution from personal tool to team infrastructure.
Series Navigation
- Post 1: From tmux Chaos to AI Agent Orchestration
- Post 2: Inside the Cockpit
- Post 3: Ralph Loop — Autonomous Execution
- Post 4: From Solo Tool to Team Infrastructure (you are here)