What I Learned Building a Multi-Agent Stack


I’ve built and deployed ~31 autonomous AI agent packages on a Mac Mini sitting in my apartment in Provo, Utah. (Count as of June 2026; the agents/ directory is the ground truth.)

Not demos. Not toy projects. Agents that wake up at 4 AM, check my workouts, prepare class notes, send Telegram messages, manage my calendar, scout job opportunities, and route tasks between each other — every single day.

Here’s what I actually learned.

1. Most Agents Fail Because of Prompt Discipline, Not Code

The number one killer of agents isn’t a missing feature or a buggy API call. It’s a bad system prompt.

When an agent doesn’t know:

Who it is (identity)
What it owns (domain)
What it must never do (hard limits)
How to communicate with others (output format)

…it hallucinates its way into chaos. I’ve had agents that started inventing calendar events, agents that rewrote each other’s memory files, and agents that got stuck in infinite-loop conversations with themselves.

The fix is always the same: SOUL.md. Every agent in my stack has one. It defines:

## Who You Are
## What You Own
## Hard Limits — Never Do These
## Communication Style
## Memory Protocol

Before your agent can do anything useful, it needs to know what it is.
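One way to keep that discipline honest is to check every SOUL.md for the five sections before an agent is allowed to start. The sketch below is a minimal illustration, not the stack's actual tooling: the function name `missing_sections` is hypothetical, and it assumes each section is a `## ` heading as in the template above.

```python
# Required SOUL.md sections, matched by prefix so a heading such as
# "Hard Limits" followed by extra wording still counts.
REQUIRED_SECTIONS = (
    "Who You Are",
    "What You Own",
    "Hard Limits",
    "Communication Style",
    "Memory Protocol",
)


def missing_sections(soul_text: str) -> list[str]:
    """Return the required section names that have no '## ' heading in the text."""
    headings = [
        line[3:].strip()
        for line in soul_text.splitlines()
        if line.startswith("## ")
    ]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(heading.startswith(name) for heading in headings)
    ]
```

An empty list means the file has every section; anything else is a reason to refuse to launch the agent.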

2. Memory Is the Feature You Don’t Build Until You Need It

Every agent tutorial shows you how to make an API call. None of them show you how to make the second call remember what happened in the first.

My memory architecture evolved through three stages:

Stage 1: No memory. Every session starts fresh. Works for task runners. Fails for anything relationship-based.

Stage 2: Append-only logs. Every agent writes to a markdown file. Reads it back at session start. Works until the file gets too big and starts confusing the model.

Stage 3: Structured + rolling. Now I have:

MEMORY.md — permanent facts, preferences, key decisions
memory/YYYY-MM-DD.md — daily session logs, auto-rolled after 7 days
state/current.md — active work, in-progress tasks, blocked items
state/tech-debt.md — named debt items with age and priority

The discipline: agents write to the right file, not just whatever’s convenient.
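The 7-day roll on the daily logs is easy to sketch. The example below only selects the files that are due; what "rolling" does next (archive, summarize into MEMORY.md, delete) is not specified here, so it is left out. The function name `logs_to_roll` and the choice to keep a log that is exactly 7 days old are assumptions.

```python
from datetime import date, timedelta
from pathlib import Path


def logs_to_roll(memory_dir: Path, today: date, keep_days: int = 7) -> list[Path]:
    """Return daily logs named YYYY-MM-DD.md that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for path in sorted(memory_dir.glob("*.md")):
        try:
            day = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a daily log (e.g. notes.md), leave it alone
        if day < cutoff:
            stale.append(path)
    return stale
```

Passing `today` in explicitly, instead of calling `date.today()` inside the function, keeps the selection testable.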

3. Orchestration Is Harder Than Building Individual Agents

My CEO agent spawns sub-agents. My CTO spawns coding agents. My COO spawns research agents.

The problem nobody tells you about: sub-agents don’t automatically know the context their parent had.

I had to build explicit context-passing. When CEO spawns CTO with a task, it passes:

Task brief
Success criteria
Relevant memory snippets
Which files to read on startup

Without this, you get agents that re-ask questions already answered, contradict decisions already made, or build things that conflict with existing architecture.

The pattern that works:

Task: [Specific goal]
Context: [What you need to know about the current situation]
Success criteria: [How you'll know when you're done]
Read first: [Files they must load before starting]
Report to: [Who they report back to]
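The five-field brief above maps naturally onto a small data structure that refuses to render when a required field is empty. This is a sketch under assumptions: the `TaskBrief` class and its field names are illustrative, not the format the CEO agent actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class TaskBrief:
    """Handoff passed from a parent agent to the sub-agent it spawns."""
    task: str
    context: str
    success_criteria: str
    report_to: str
    read_first: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the brief as plain text, or raise if a required field is blank."""
        required = ("task", "context", "success_criteria", "report_to")
        missing = [name for name in required if not getattr(self, name).strip()]
        if missing:
            raise ValueError(f"brief is missing: {', '.join(missing)}")
        files = ", ".join(self.read_first) or "(none)"
        return "\n".join([
            f"Task: {self.task}",
            f"Context: {self.context}",
            f"Success criteria: {self.success_criteria}",
            f"Read first: {files}",
            f"Report to: {self.report_to}",
        ])
```

Failing loudly on a blank field is the point: a sub-agent spawned without context is the failure this section describes.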

4. Cron Jobs Are Your Production Infrastructure

People think cron jobs are boring. In an agent system, they’re your heartbeat.

My stack has ~160 scheduled jobs across all domains as of mid-2026. These aren’t just “check email” jobs. They’re:

CEO morning briefing: synthesizes overnight events, queues priority tasks
CTO health check: scans GitHub CI, cron status, deploy logs, alerts if anything is broken
COO evening wrap: closes open loops, logs the day, prepares tomorrow
Health coach: checks workout completion, adjusts tomorrow’s plan
Spiritual journal: prompts reflection, saves to Obsidian
CMO curation: finds top AI content, prepares LinkedIn post queue

When cron breaks, agents stop being proactive. They become reactive. The magic dies.

Monitor your cron jobs obsessively.
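A simple form of that monitoring is a staleness check: each job records when it last succeeded, and a watchdog flags any job that has gone quiet for longer than its schedule allows. The sketch below assumes those timestamps are already collected somewhere; the function name, the job names in the usage example, and the per-job limits are all hypothetical.

```python
from datetime import datetime, timedelta


def stale_jobs(
    last_success: dict[str, datetime],
    max_age: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Return jobs whose last successful run is too old, or that never ran."""
    stale = []
    for name, limit in max_age.items():
        ran_at = last_success.get(name)
        if ran_at is None or now - ran_at > limit:
            stale.append(name)
    return sorted(stale)


# Usage: a daily job gets a little slack, an hourly check gets two hours.
limits = {
    "ceo-morning-briefing": timedelta(hours=25),
    "cto-health-check": timedelta(hours=2),
}
```

Treating a job with no recorded run as stale matters: a job that was never installed fails silently otherwise.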

5. The 3-Tier Cost Model

Not all agents are equal. I learned to think in three tiers:

Tier	Model	Use Case	Cost
Heavy	Claude Sonnet	Complex reasoning, orchestration, code review	~$15/M tokens
Light	Claude Haiku	Routine tasks, data extraction, simple decisions	~$0.25/M tokens
Free	Local scripts	File operations, CLI commands, data transforms	$0

My rule: start with Haiku. Only upgrade to Sonnet if the output quality genuinely requires it.

Most agents I’ve built can do 80% of their work with Haiku. The expensive model is for the 20% that needs real reasoning.
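The 80/20 split is worth putting numbers on. The sketch below takes the table's two figures at face value as flat per-million-token prices and treats "80% of the work" as 80% of the tokens; both are simplifying assumptions, and real pricing differs between input and output tokens.

```python
def blended_cost_per_million(
    light_share: float,
    light_price: float = 0.25,   # ~$ per million tokens, Light tier from the table
    heavy_price: float = 15.0,   # ~$ per million tokens, Heavy tier from the table
) -> float:
    """Average cost per million tokens when light_share of tokens use the light model."""
    if not 0.0 <= light_share <= 1.0:
        raise ValueError("light_share must be between 0 and 1")
    return light_share * light_price + (1.0 - light_share) * heavy_price
```

With `light_share=0.8` this comes to roughly $3.20 per million tokens (0.8 × 0.25 + 0.2 × 15), against about $15 if everything ran on the heavy model. Under these assumptions the heavy tier still accounts for most of the bill, which is why the upgrade decision deserves scrutiny.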

6. Agents Need Accountability

This sounds obvious. It’s not.

I have a task board (~/clawd/shared/tasks/board.json) that every agent reads at session start. CEO assigns tasks. Agents claim them, work them, mark them done.

Without accountability:

Agents duplicate work
Tasks fall through cracks
Nobody knows what’s actually in progress

With accountability:

Every task has an owner
Every session starts with “what’s assigned to me?”
CEO can see the full picture

The task board is 80% of what makes my agent system feel like an actual organization and not chaos.
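The claim-and-work loop can be sketched in a few lines. The actual schema of board.json is not shown in this post, so the structure below (a `tasks` list with `id`, `title`, `owner`, and `status` keys) is an assumption, as are the function names.

```python
import json

# Hypothetical board contents; the real board.json schema may differ.
SAMPLE_BOARD = """
{"tasks": [
  {"id": "T1", "title": "Review deploy logs", "owner": "cto", "status": "in_progress"},
  {"id": "T2", "title": "Draft LinkedIn queue", "owner": null, "status": "open"},
  {"id": "T3", "title": "Close out last sprint", "owner": "cto", "status": "done"}
]}
"""


def my_tasks(board: dict, agent: str) -> list[dict]:
    """Answer the session-start question: what is assigned to me and not done?"""
    return [
        task for task in board["tasks"]
        if task.get("owner") == agent and task.get("status") != "done"
    ]


def claim(board: dict, task_id: str, agent: str) -> dict:
    """Assign an unowned task to an agent; refuse if someone else owns it."""
    for task in board["tasks"]:
        if task["id"] == task_id:
            if task.get("owner") not in (None, agent):
                raise ValueError(f"{task_id} is already owned by {task['owner']}")
            task["owner"] = agent
            task["status"] = "in_progress"
            return task
    raise KeyError(task_id)


board = json.loads(SAMPLE_BOARD)
```

A board shared by several agent processes would also need file locking or atomic writes when saving, which this sketch leaves out.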

7. What I’d Do Differently

Start with fewer, better agents. I built a lot because I was experimenting. For production use, 5-10 well-defined agents with clear domains beat dozens of agents with overlapping responsibilities.

Define handoffs first. Before writing a single prompt, draw the org chart. Who talks to whom. What formats they use. What they never do.

Test with real data from day one. In my experience, agents that work with fake data fail with real data about 40% of the time. Flush out the edge cases early.

Build the monitoring first. I built many agents before I built proper health monitoring. That was backwards. Build cto-status.json, the deploy log, and the cron health dashboard before you build the agents that depend on them.

The Full Stack (Simplified)

CEO Orchestrator (orchestration, task routing, strategic decisions)
├── School domain agent (Canvas LMS, grades, assignments)
├── Family domain agent (calendar, chores, budget)
├── Health domain agent (workouts, nutrition, sleep, habits)
├── Growth domain agent (content, LinkedIn, career)
├── Siemens domain agent (work tasks, reporting)
├── Consulting domain agent (pipeline, proposals)
├── AI Foundry domain agent (research, builds)
└── Life Ops domain agent (admin, logistics)
    └── Specialist workers (research, QA, coding, browser)

Each domain agent has: config.yaml, SOUL.md, state/, and a heartbeat cron. Workers are ephemeral — spawned per task, no persistent workspace.
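That per-agent layout is simple enough to verify with a script. The sketch below checks only the three on-disk pieces named above; the heartbeat cron lives outside the directory and is not checked. The function name `layout_problems` is hypothetical.

```python
from pathlib import Path

REQUIRED_FILES = ("config.yaml", "SOUL.md")
REQUIRED_DIRS = ("state",)


def layout_problems(agent_dir: Path) -> list[str]:
    """List what a domain agent directory is missing; empty means it looks complete."""
    problems = []
    for name in REQUIRED_FILES:
        if not (agent_dir / name).is_file():
            problems.append(f"missing file: {name}")
    for name in REQUIRED_DIRS:
        if not (agent_dir / name).is_dir():
            problems.append(f"missing directory: {name}/")
    return problems
```

Running a check like this across every domain agent is a cheap way to catch a half-configured agent before its first scheduled run.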

The system is live. It runs 24/7. It breaks sometimes. I fix it. It gets better.

That’s the build-in-public part nobody talks about.

Want to build your own agent stack? Start with the Agent Tree Architecture guide or build your first agent in 30 minutes.

For consulting on agent architecture for your business, reach out at jddavenport.com.