Orchestration: CEO → Domains → Specialists

Tier 2 · Building · 8 min read

Before this, read:

Anatomy of an agent — what each node in this hierarchy is
Spawning sub-agents — how the spawning mechanics work

Running one agent that does everything is the same mistake as writing one file that does everything. It works until it doesn’t. The orchestration model described here is how a system scales to ~31 specialist agents without the whole thing becoming an unmanageable tangle.

The three tiers

The JD system runs a three-tier hierarchy, codified in ~/clawd/ORGANIZATION-CHARTER.md:

Tier 1 — CEO / Orchestrator (Opus-class model). The single top-level brain. It holds the work queue, routes tasks to the right domain or specialist, and gates actions that are irreversible or above some cost threshold. It runs via agents/chief_of_staff and is the only agent that speaks for JD on high-stakes decisions.

Tier 2 — Domain agents (Sonnet-class). One per life domain: school, family, growth, ai-foundry, siemens, consulting, health, life-ops. Each is a persistent Claude Code session — it runs a heartbeat every hour, writes a structured report.md to ~/clawd/domains/<domain>/state/, and answers questions about its domain. Domain agents know their domain deeply. They don’t know about other domains.

Tier 3 — Specialists (~31 packages, Sonnet or Haiku by task). The agents in ~/agent-system/agents/. Each does one thing: the researcher does research, the professor agent answers course questions, the QA agent runs Playwright tests. They’re invoked by the CEO or domain agents and report back.

The principle behind this structure: route by scope, and use the cheapest model that can do the job. A routine health check doesn’t need Opus. A decision about whether to merge code with failing tests does.

Model routing by stake

The ORGANIZATION-CHARTER specifies model selection by decision stake, not by personal preference:

Opus — CEO, architecture decisions, high-stakes routing
Sonnet — orchestrators, domain agents, most specialist work
Haiku — workers, formatters, cheap-and-fast tasks with a clear output spec
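As a rough illustration of routing by stake, the mapping above could be expressed as a small lookup. This is a minimal sketch, not the charter's actual mechanism: the Stake enum, the MODEL_BY_STAKE table, and pick_model are hypothetical names, and the strings are tier labels rather than real model identifiers.

```python
from enum import Enum


class Stake(Enum):
    LOW = 1     # workers, formatters, cheap-and-fast tasks with a clear output spec
    MEDIUM = 2  # orchestrators, domain agents, most specialist work
    HIGH = 3    # CEO, architecture decisions, high-stakes routing


# Hypothetical table: tier labels only, mirroring the list above.
MODEL_BY_STAKE = {
    Stake.LOW: "haiku",
    Stake.MEDIUM: "sonnet",
    Stake.HIGH: "opus",
}


def pick_model(stake: Stake) -> str:
    """Return the model tier assigned to a given decision stake."""
    return MODEL_BY_STAKE[stake]


print(pick_model(Stake.LOW))
print(pick_model(Stake.HIGH))
```

Keeping the choice in one table means a task's model is decided by classifying its stake, not by whoever happens to be spawning the agent.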

“Cheapest model that does the job” is not just a cost-saving measure; it is a discipline that keeps expensive models free for work that actually needs them. When a Haiku-level task is sent to Opus, the Opus context fills with trivial work, leaving less room for the high-stakes reasoning it was reserved for.

ask_agent: the messaging layer

Any agent can talk to any other by name. The implementation lives at ~/agent-system/agents/shared/ask_agent.py:

from agents.shared.ask_agent import ask

# From any agent — ask the CEO for direction
result = ask("ceo", "Should this task go to the ai-foundry domain or the growth domain?", caller="my-agent")

# Ask the health expert a question (in-process, no bridge round-trip)
result = ask("health-expert", "What does the research say about BPC-157 + tesamorelin stacking?")

# Ask the merch expert about pricing
result = ask("merch-expert", "What's the current break-even for a Gildan 5000 at $24.99 retail?")

Or from the command line:

python3.12 -m agents.shared.ask_agent health "Summarize JD's health metrics from this week." --caller ceo

Internally, ask_agent routes differently depending on the target:

KB-backed agents (health-expert, merch-expert, professor) get called in-process via their own ask() function — no bridge round-trip, lower latency, lower cost
Domain agents (health, family, growth, etc.) route through the bridge, which picks up the conversation in the live domain session
The CEO (ceo, chief_of_staff) routes through the bridge to the persistent CEO session
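The three routing cases above can be sketched as a simple dispatch on the target name. This is an assumption-laden illustration, not the contents of ask_agent.py: route_for, ask_sketch, and the in_process and bridge callbacks are hypothetical, and the name sets are copied from the examples in this article.

```python
from typing import Callable

# Name sets taken from this article; the real router may hold them differently.
KB_BACKED = {"health-expert", "merch-expert", "professor"}
CEO_ALIASES = {"ceo", "chief_of_staff"}
DOMAINS = {"school", "family", "growth", "ai-foundry",
           "siemens", "consulting", "health", "life-ops"}


def route_for(target: str) -> str:
    """Classify a target name into the transport it would use."""
    if target in KB_BACKED:
        return "in-process"
    if target in CEO_ALIASES:
        return "bridge:ceo-session"
    if target in DOMAINS:
        return "bridge:domain-session"
    raise ValueError(f"unknown agent: {target}")


def ask_sketch(
    target: str,
    question: str,
    in_process: Callable[[str, str], str],
    bridge: Callable[[str, str, str], str],
    caller: str = "unknown",
) -> str:
    """Send the question over the right transport and return the reply text."""
    if route_for(target) == "in-process":
        return in_process(target, question)
    return bridge(target, question, caller)


# Stub transports so the sketch runs without any live sessions.
reply = ask_sketch(
    "professor", "When is the midterm?",
    in_process=lambda t, q: f"[{t} KB] {q}",
    bridge=lambda t, q, c: f"[bridge to {t} from {c}] {q}",
)
print(reply)
```

Passing the transports in as callbacks keeps the routing decision testable on its own, separate from whatever the bridge actually does.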

The result is a string containing the agent’s response. The caller can use it directly, write it to a file, or forward it to another agent.

The CEO work-queue loop

The CEO agent doesn’t just answer questions — it runs a bounded autonomous executor. Shipped 2026-06-08 as part of the “loop trio,” agents/chief_of_staff --work-queue operates on a queue of approved-but-unbuilt tasks:

Claims the next task from the queue
Checks whether the task is reversible or irreversible
For reversible work: executes autonomously
For irreversible, comms-as-JD, money, or deployment tasks: escalates to JD via Telegram before acting
On completion, marks the task done and claims the next one

The deny-first gate on irreversible actions is non-negotiable. The CEO never sends as JD, never moves money, and never deploys without explicit approval, regardless of how confident it is about the right answer.*

* Human approval is required for all money moves, sends-as-JD, and IP/trademark decisions by hard rule.
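The loop and its deny-first gate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chief_of_staff implementation: Task, run_queue, and the execute and request_approval callbacks are hypothetical, with request_approval standing in for the Telegram escalation. Anything not explicitly marked reversible is treated as gated.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Task:
    name: str
    kind: str              # e.g. "reversible", "money", "deployment"
    status: str = "queued"


def run_queue(
    queue: Deque[Task],
    execute: Callable[[Task], None],
    request_approval: Callable[[Task], bool],
    max_tasks: int = 10,
) -> List[Task]:
    """Process at most max_tasks; gated tasks run only after explicit approval."""
    handled: List[Task] = []
    while queue and len(handled) < max_tasks:
        task = queue.popleft()  # claim the next task
        # Deny-first: only tasks explicitly marked reversible skip the gate.
        if task.kind != "reversible" and not request_approval(task):
            task.status = "awaiting-approval"
        else:
            execute(task)
            task.status = "done"
        handled.append(task)
    return handled


queue = deque([Task("fix typo in docs", "reversible"), Task("pay invoice", "money")])
handled = run_queue(queue, execute=lambda t: None, request_approval=lambda t: False)
print([(t.name, t.status) for t in handled])
```

The max_tasks cap is what makes the executor bounded in this sketch: the loop stops after a fixed number of claims even if the queue keeps filling.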

How a real routing chain works

Here is a concrete example from the CHANGELOG (2026-06-08): JD sends a Telegram message asking the system to ship three evolution proposals. The chain:

The Telegram inbound router (scripts/telegram-inbound-router.sh) receives the message
The CEO session picks it up and reads the three approved proposals from agents/evolution/proposals.py
For each proposal, the CEO spawns a worktree-isolated build agent with the specific objective
Each build agent works independently, commits to its own branch, and reports back
The CEO merges passing branches, reports the summary to JD

JD sent one message. Each proposal got its own build agent, and the fixes shipped the same day. The CEO was the routing layer; the build agents were the workers.
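The fan-out and merge steps of that chain can be sketched with a thread pool. This is a hedged illustration only: BuildResult, fan_out, merge_passing, and fake_build are hypothetical names, and fake_build stands in for a worktree-isolated build agent without touching git.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BuildResult:
    proposal: str
    branch: str
    tests_passed: bool


def fan_out(proposals: List[str], build: Callable[[str], BuildResult]) -> List[BuildResult]:
    """Run one build per proposal in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=len(proposals) or 1) as pool:
        return list(pool.map(build, proposals))


def merge_passing(results: List[BuildResult]) -> List[str]:
    """Return only the branches whose tests passed."""
    return [r.branch for r in results if r.tests_passed]


def fake_build(proposal: str) -> BuildResult:
    # Stand-in for a build agent; "p2" fails on purpose to show the filter.
    return BuildResult(proposal, f"build/{proposal}", tests_passed=proposal != "p2")


results = fan_out(["p1", "p2", "p3"], fake_build)
print(merge_passing(results))
```

Each build gets its own branch name, so a failing build is simply left out of the merge list and never blocks the others.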

Where the hierarchy lives on disk

~/agent-system/
  agents/
    chief_of_staff/    ← Tier 1 CEO
    ai_os/             ← system-level agents (self-healer, watchers, etc.)
    researcher/        ← Tier 3 specialist
    professor/         ← Tier 3 specialist
    qa_agent/          ← Tier 3 specialist
    health_expert/     ← Tier 3 specialist
    merch_expert/      ← Tier 3 specialist
    ...
    shared/
      ask_agent.py     ← the messaging router

~/clawd/
  ORGANIZATION-CHARTER.md   ← the governing doc
  domains/
    health/state/
    family/state/
    growth/state/
    ...

Next: Crons as a heartbeat, on how cron jobs give the whole org a steady pulse even when no one is actively talking to it.