What I Learned Building a Multi-Agent Stack
I’ve built and deployed dozens of autonomous AI agents on a Mac Mini sitting in my apartment in Provo, Utah.
Not demos. Not toy projects. Agents that wake up at 4 AM, check my workouts, prepare class notes, send Telegram messages, manage my calendar, scout job opportunities, and route tasks between each other — every single day.
Here’s what I actually learned.
1. Most Agents Fail Because of Prompt Discipline, Not Code
The number one killer of agents isn’t a missing feature or a buggy API call. It’s a bad system prompt.
When an agent doesn’t know:
- Who it is (identity)
- What it owns (domain)
- What it must never do (hard limits)
- How to communicate with others (output format)
…it hallucinates its way into chaos. I’ve had agents that started inventing calendar events, agents that rewrote each other’s memory files, and agents that got into infinite loop conversations with themselves.
The fix is always the same: SOUL.md. Every agent in my stack has one. It defines:
    ## Who You Are
    ## What You Own
    ## Hard Limits - Never Do These
    ## Communication Style
    ## Memory Protocol

Before your agent can do anything useful, it needs to know what it is.
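A SOUL.md with a missing section tends to go unnoticed until the agent misbehaves, so a small pre-flight check can help. The following is a minimal sketch, not the stack's actual tooling: the function name `missing_sections` is hypothetical, and it assumes each section is a level-two markdown heading that starts with the names listed above.

```python
import re

# Section names taken from the SOUL.md outline above.
REQUIRED_SECTIONS = [
    "Who You Are",
    "What You Own",
    "Hard Limits",
    "Communication Style",
    "Memory Protocol",
]


def missing_sections(soul_text: str) -> list[str]:
    """Return the required sections that have no matching '## ' heading."""
    # Collect the text of every level-two heading in the file.
    headings = re.findall(r"^##\s+(.+?)\s*$", soul_text, flags=re.MULTILINE)
    return [
        name
        for name in REQUIRED_SECTIONS
        # Prefix match so "Hard Limits - Never Do These" still counts.
        if not any(h.startswith(name) for h in headings)
    ]
```

Running this before an agent starts, and refusing to launch when the returned list is non-empty, is one way to enforce the discipline described in this section.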
2. Memory Is the Feature You Don’t Build Until You Need It
Every agent tutorial shows you how to make an API call. None of them show you how to make the second call remember what happened in the first.
My memory architecture evolved through three stages:
Stage 1: No memory. Every session starts fresh. Works for task runners. Fails for anything relationship-based.
Stage 2: Append-only logs. Every agent writes to a markdown file. Reads it back at session start. Works until the file gets too big and starts confusing the model.
Stage 3: Structured + rolling. Now I have:
- MEMORY.md: permanent facts, preferences, key decisions
- memory/YYYY-MM-DD.md: daily session logs, auto-rolled after 7 days
- state/current.md: active work, in-progress tasks, blocked items
- state/tech-debt.md: named debt items with age and priority
The discipline: agents write to the right file, not just whatever’s convenient.
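The 7-day roll of daily logs can be sketched in a few lines. This is an illustration under assumptions, not the original implementation: it interprets "rolled" as moving old files into a hypothetical `memory/archive/` folder, and `roll_daily_logs` is an invented name.

```python
from datetime import date, timedelta
from pathlib import Path
import shutil


def roll_daily_logs(memory_dir: Path, today: date, keep_days: int = 7) -> list[Path]:
    """Move YYYY-MM-DD.md logs older than keep_days into memory_dir/archive/."""
    archive = memory_dir / "archive"  # hypothetical destination folder
    cutoff = today - timedelta(days=keep_days)
    moved = []
    for path in sorted(memory_dir.glob("*.md")):
        try:
            log_date = date.fromisoformat(path.stem)
        except ValueError:
            continue  # not a daily log; leave it alone
        if log_date < cutoff:
            archive.mkdir(exist_ok=True)
            target = archive / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the roll easy to test and makes a cron-driven run reproducible.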
3. Orchestration Is Harder Than Building Individual Agents
My CEO agent spawns sub-agents. My CTO spawns coding agents. My COO spawns research agents.
The problem nobody tells you about: sub-agents don’t automatically know the context their parent had.
I had to build explicit context-passing. When CEO spawns CTO with a task, it passes:
- Task brief
- Success criteria
- Relevant memory snippets
- Which files to read on startup
Without this, you get agents that re-ask questions already answered, contradict decisions already made, or build things that conflict with existing architecture.
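One way to make that context-passing hard to skip is to put the four items into a small data structure that refuses to render without the essentials. The sketch below is an assumption-laden illustration: the `Handoff` class and its field names are hypothetical, chosen to mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Context a parent agent passes to a sub-agent (hypothetical structure)."""

    task_brief: str
    success_criteria: str
    memory_snippets: list[str] = field(default_factory=list)
    files_to_read: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the hand-off as plain text for the sub-agent's first message."""
        if not self.task_brief.strip() or not self.success_criteria.strip():
            raise ValueError("task_brief and success_criteria are required")
        lines = [
            f"Task brief: {self.task_brief}",
            f"Success criteria: {self.success_criteria}",
            "Relevant memory:",
            *[f"- {snippet}" for snippet in self.memory_snippets],
            "Read on startup:",
            *[f"- {path}" for path in self.files_to_read],
        ]
        return "\n".join(lines)
```

A parent could build `Handoff("Fix failing deploy", "CI green on main", files_to_read=["state/current.md"])` and send `render()` as the opening message, so a missing brief fails loudly instead of producing a confused sub-agent.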
The pattern that works:
    Task: [Specific goal]
    Context: [What you need to know about the current situation]
    Success criteria: [How you'll know when you're done]
    Read first: [Files they must load before starting]
    Report to: [Who they report back to]

4. Cron Jobs Are Your Production Infrastructure
People think cron jobs are boring. In an agent system, they’re your heartbeat.
My stack has 47 scheduled jobs running across 8 agents. These aren’t just “check email” jobs. They’re:
- CEO morning briefing: synthesizes overnight events, queues priority tasks
- CTO health check: scans GitHub CI, cron status, deploy logs, alerts if anything is broken
- COO evening wrap: closes open loops, logs the day, prepares tomorrow
- Health coach: checks workout completion, adjusts tomorrow’s plan
- Spiritual journal: prompts reflection, saves to Obsidian
- CMO curation: finds top AI content, prepares LinkedIn post queue
When cron breaks, agents stop being proactive. They become reactive. The magic dies.
Monitor your cron jobs obsessively.
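A simple form of that monitoring is a staleness check: each job records when it last succeeded, and a watchdog flags anything that has gone quiet for too long. The sketch below assumes that design. The function name, the job names in the usage note, and the idea of per-job maximum ages are all illustrative, not details from the original stack.

```python
from datetime import datetime, timedelta


def stale_jobs(
    last_success: dict[str, datetime],
    max_age: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Return jobs that never reported or whose last success is older than allowed."""
    stale = []
    for job, allowed in max_age.items():
        seen = last_success.get(job)
        # A job with no recorded success is treated as stale, not as healthy.
        if seen is None or now - seen > allowed:
            stale.append(job)
    return sorted(stale)
```

For a daily job such as a morning briefing, a `max_age` a little over 24 hours (say 26) leaves room for a slow run without hiding a missed one. Treating "never reported" as stale matters: a job that was never scheduled is the failure most likely to go unnoticed.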
5. The 3-Tier Cost Model
Not all agents are equal. I learned to think in three tiers:
| Tier | Model | Use Case | Cost |
|---|---|---|---|
| Heavy | Claude Sonnet | Complex reasoning, orchestration, code review | ~$3 input / ~$15 output per M tokens |
| Light | Claude Haiku | Routine tasks, data extraction, simple decisions | ~$0.25 input / ~$1.25 output per M tokens |
| Free | Local scripts | File operations, CLI commands, data transforms | $0 |
My rule: start with Haiku. Only upgrade to Sonnet if the output quality genuinely requires it.
Most agents I’ve built can do 80% of their work with Haiku. The expensive model is for the 20% that needs real reasoning.
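The "cheapest tier that works" rule can be written down as a tiny routing function. This is a sketch under assumptions: the two boolean flags are a deliberately crude, hypothetical way to classify a task, and prices are passed in as parameters because they change over time.

```python
from enum import Enum


class Tier(Enum):
    FREE = "local script"
    LIGHT = "Claude Haiku"
    HEAVY = "Claude Sonnet"


def pick_tier(is_deterministic: bool, needs_complex_reasoning: bool) -> Tier:
    """Pick the cheapest tier that can plausibly handle the task."""
    if is_deterministic:
        return Tier.FREE  # file operations, CLI commands, data transforms
    if needs_complex_reasoning:
        return Tier.HEAVY  # orchestration, code review
    return Tier.LIGHT  # default: start with the light model


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a token count at a given per-million-token price."""
    return tokens / 1_000_000 * usd_per_million
```

Input and output tokens are usually priced differently, so a realistic estimate calls `token_cost` once for each and adds the results. Checking the deterministic case first is the point of the design: anything a script can do should never reach a model at all.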
6. Agents Need Accountability
This sounds obvious. It’s not.
I have a task board (~/clawd/shared/tasks/board.json) that every agent reads at session start. CEO assigns tasks. Agents claim them, work them, mark them done.
Without accountability:
- Agents duplicate work
- Tasks fall through cracks
- Nobody knows what’s actually in progress
With accountability:
- Every task has an owner
- Every session starts with “what’s assigned to me?”
- CEO can see the full picture
The task board is 80% of what makes my agent system feel like an actual organization and not chaos.
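The claim-and-own cycle can be illustrated with two small functions over the parsed board. The post does not show the layout of board.json, so the schema here is an assumption: a top-level `tasks` list whose items carry `id`, `owner`, and `status` keys. The function names are hypothetical as well.

```python
def tasks_for(board: dict, agent: str) -> list[dict]:
    """Answer "what's assigned to me?" for one agent, skipping finished work."""
    return [
        task
        for task in board["tasks"]
        if task["owner"] == agent and task["status"] != "done"
    ]


def claim(board: dict, task_id: str, agent: str) -> dict:
    """Assign an unowned task to an agent; refuse if someone else owns it."""
    for task in board["tasks"]:
        if task["id"] == task_id:
            if task["owner"] not in (None, agent):
                raise ValueError(f"{task_id} is already owned by {task['owner']}")
            task["owner"] = agent
            task["status"] = "in_progress"
            return task
    raise KeyError(task_id)
```

Refusing a second claim is what prevents duplicated work. If several agents can write the same file at once, the read-modify-write around these functions would also need a file lock or an atomic rename, which this sketch leaves out.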
7. What I’d Do Differently
Start with fewer, better agents. I built a lot because I was experimenting. For production use, 5-10 well-defined agents with clear domains beat dozens of agents with overlapping responsibilities.
Define handoffs first. Before writing a single prompt, draw the org chart. Who talks to whom. What formats they use. What they never do.
Test with real data from day one. In my experience, agents that work with fake data fail with real data about 40% of the time. Flush out the edge cases early.
Build the monitoring first. I built many agents before I built proper health monitoring. That was backwards. Build cto-status.json, the deploy log, the cron health dashboard before you build the agents that depend on them.
The Full Stack (Simplified)
    CEO (orchestration, task routing, strategic decisions)
    ├── CTO (code, deploys, infrastructure, skills)
    │   ├── Coding Agent (implementation)
    │   ├── QA Agent (testing)
    │   └── Deploy Agent (Vercel, environment)
    ├── COO (daily ops, family, home, schedule)
    │   └── Research Agent (web lookups, data)
    ├── CMO (content, LinkedIn, Twitter, growth)
    ├── CIO (intelligence, learning, research)
    └── Health Coach (workouts, nutrition, sleep, habits)

Each agent has: SOUL.md, AGENTS.md, memory/, state/, and a heartbeat cron.
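Because every agent shares the same layout, a new agent's folder can be checked mechanically. The sketch below assumes one directory per agent containing the files and folders named above. The function name is hypothetical, and the heartbeat cron is not checked here because it lives in the scheduler rather than on disk.

```python
from pathlib import Path

# Names taken from the per-agent layout listed above.
REQUIRED_FILES = ["SOUL.md", "AGENTS.md"]
REQUIRED_DIRS = ["memory", "state"]


def missing_pieces(agent_dir: Path) -> list[str]:
    """Return the required files and folders that an agent directory lacks."""
    missing = [name for name in REQUIRED_FILES if not (agent_dir / name).is_file()]
    # Folders are reported with a trailing slash to tell them apart from files.
    missing += [name + "/" for name in REQUIRED_DIRS if not (agent_dir / name).is_dir()]
    return missing
```

Looping this over every agent directory and alerting on a non-empty result catches half-built agents before they are scheduled.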
The system is live. It runs 24/7. It breaks sometimes. I fix it. It gets better.
That’s the build-in-public part nobody talks about.
Want to build your own agent stack? Start with the Agent Tree Architecture guide or build your first agent in 30 minutes.
For consulting on agent architecture for your business, reach out at jddavenport.com.