
LLM Routing Strategy: The Right Model for Every Task

A single-agent system can get away with running everything on one model. A multi-agent system with dozens of tasks running daily cannot. The cost difference between smart routing and naive routing is enormous — and the quality difference is often zero.

This framework covers how to select the right model for every task type, implement dynamic routing in your agents, and keep daily API costs under control.

Let’s start with numbers. Current approximate pricing per million tokens (input/output):

| Model | Input | Output | Relative Cost (vs Haiku) |
| --- | --- | --- | --- |
| Claude Opus 4.6 | $15 | $75 | ~50x |
| Claude Sonnet 4.6 | $3 | $15 | ~10x |
| Claude Sonnet 4.5 | $3 | $15 | ~10x |
| Claude Haiku 4.5 | $0.80 | $4 | 1x (baseline) |
| Kimi for Coding | ~$0.30 | ~$1.50 | ~0.3x |
| Gemini 2.5 Pro | $1.25 | $10 | ~4x |
| GPT-4o | $5 | $15 | ~12x |

A monitoring cron running every 30 minutes on Opus costs roughly 50x more per day than the same cron on Haiku, the relative cost straight from the table above. For a task that’s literally pattern-matching, you’re burning money with no quality gain.

Start here for every new task:

```
Is this a CEO strategic decision, complex reasoning, or quality judgment?
  → YES: Claude Opus 4.6
  → NO: ↓
Is this orchestration, delegation, or C-suite execution?
  → YES: Claude Sonnet 4.6
  → NO: ↓
Is this content creation, code writing, or analysis?
  → YES: Claude Sonnet 4.5
  → NO: ↓
Is this a large codebase task (>50K tokens of code context)?
  → YES: Kimi for Coding (128K context window, fraction of Sonnet cost)
  → NO: ↓
Is this monitoring, scanning, pattern matching, or simple lookup?
  → YES: Claude Haiku 4.5 ← Use this by default for workers
  → NO: ↓
Does this require web search grounding or real-time information?
  → YES: Gemini 2.5 Pro
  → NO: ↓
Does this require image analysis?
  → YES: GPT-4o
  → NO: Claude Sonnet 4.5 (safe default)
```
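As code, the tree is a chain of early returns. A minimal sketch, assuming a hypothetical `Task` record with one flag per question (all field names here are illustrative); the large-context check is ordered before the general coding branch so oversized codebases reach Kimi, matching the spawn example later in this article:

```python
from dataclasses import dataclass

@dataclass
class Task:
    # One flag per question in the decision tree; all hypothetical names.
    strategic: bool = False            # CEO decision / complex reasoning / quality judgment
    orchestration: bool = False       # delegation / C-suite execution
    content_or_code: bool = False     # content creation, code writing, analysis
    code_context_tokens: int = 0      # size of the code context, in tokens
    monitoring: bool = False          # scanning, pattern matching, simple lookup
    needs_web_grounding: bool = False
    needs_image_analysis: bool = False

def route(task: Task) -> str:
    """Walk the decision tree top to bottom; the first match wins."""
    if task.strategic:
        return "anthropic/claude-opus-4-6"
    if task.orchestration:
        return "anthropic/claude-sonnet-4-6"
    if task.code_context_tokens > 50_000:
        return "kimi-coding/kimi-for-coding"
    if task.content_or_code:
        return "anthropic/claude-sonnet-4-5"
    if task.monitoring:
        return "anthropic/claude-haiku-4-5"
    if task.needs_web_grounding:
        return "google/gemini-2.5-pro"
    if task.needs_image_analysis:
        return "openai/gpt-4o"
    return "anthropic/claude-sonnet-4-5"  # safe default
```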

Each model has a domain where it’s the clear winner:

Claude Opus 4.6

Sweet spot: Strategic reasoning, quality judgment, complex multi-step orchestration.

Use when: CEO making delegation decisions, evaluating complex tradeoffs, synthesizing cross-domain information, deciding “is this good enough to show the human?”

Don’t use when: Executing well-defined tasks, monitoring, scanning, or anything a Haiku can do.

Claude Sonnet 4.6

Sweet spot: Orchestration execution, sophisticated reasoning, good at following complex instructions.

Use when: C-suite orchestrators planning work, drafting high-quality content, code review and analysis.

Don’t use when: Simple pattern matching, monitoring tasks, high-volume scanning.

Claude Haiku 4.5

Sweet spot: High-volume, well-defined tasks. Speed and cost efficiency.

Use when: Monitoring crons, content curation, classification tasks, simple data extraction, basic lookups.

Don’t use when: Complex reasoning, nuanced writing, strategic decisions.

Kimi for Coding

Sweet spot: Large codebase tasks requiring extensive context.

Use when: Refactoring large files, reviewing entire codebases, tasks that need 50K+ tokens of code context.

Don’t use when: Small code tasks where Haiku suffices, or reasoning-heavy tasks where Sonnet is needed.

Implement routing as a JSON config your agents read at spawn time:

```json
{
  "model_routing": {
    "strategic_decision": "anthropic/claude-opus-4-6",
    "orchestration": "anthropic/claude-sonnet-4-6",
    "code_generation": "anthropic/claude-sonnet-4-6",
    "code_review": "anthropic/claude-sonnet-4-5",
    "large_context_coding": "kimi-coding/kimi-for-coding",
    "content_creation": "anthropic/claude-sonnet-4-5",
    "research_synthesis": "anthropic/claude-sonnet-4-5",
    "monitoring": "anthropic/claude-haiku-4-5",
    "curation": "anthropic/claude-haiku-4-5",
    "classification": "anthropic/claude-haiku-4-5",
    "web_grounded_search": "google/gemini-2.5-pro",
    "image_analysis": "openai/gpt-4o",
    "fallback": "anthropic/claude-sonnet-4-5"
  }
}
```

When a CEO spawns a coding worker, it passes the code_generation model key. The worker looks up the routing config and uses claude-sonnet-4-6. If tomorrow Haiku gets better at coding, update the config — all agents immediately use the new model.
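Reading that config can be one helper shared by every agent. A minimal sketch, assuming the file above is saved as `model-routing.json` (the function name is illustrative; the fallback key comes from the config itself):

```python
import json

def resolve_model(task_type: str, config_path: str = "model-routing.json") -> str:
    """Return the model for a task type, using the config's
    'fallback' entry when the task type is unknown."""
    with open(config_path) as f:
        routing = json.load(f)["model_routing"]
    return routing.get(task_type, routing["fallback"])
```

Because every agent resolves models through this one lookup, repointing a task type at a cheaper model is a one-line config change.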

Here’s how the CEO spawns agents with dynamic model selection:

```sh
# CEO spawning a research task → routes to Haiku
sessions_spawn researcher \
  --task "Find the top 5 AI agent frameworks published in 2026" \
  --model $(jq -r '.model_routing.monitoring' ~/clawd/model-routing.json) \
  --timeout 5m

# CEO spawning a content draft → routes to Sonnet
sessions_spawn communicator \
  --task "Draft LinkedIn post about C-suite architecture" \
  --model $(jq -r '.model_routing.content_creation' ~/clawd/model-routing.json) \
  --timeout 3m

# CEO spawning a coding task → routes to Sonnet or Kimi based on context size
CODE_CONTEXT_TOKENS=$(estimate_token_count ~/projects/nerve-center-v2/src/)
if [ "$CODE_CONTEXT_TOKENS" -gt 50000 ]; then
  MODEL=$(jq -r '.model_routing.large_context_coding' ~/clawd/model-routing.json)
else
  MODEL=$(jq -r '.model_routing.code_generation' ~/clawd/model-routing.json)
fi
sessions_spawn coding-worker \
  --task "Add CIO panel to dashboard" \
  --model "$MODEL" \
  --timeout 15m
```

Each C-suite agent has a default model (used for its own reasoning), plus can spawn workers on different models:

| Agent | Own Model | Workers Spawned On |
| --- | --- | --- |
| CEO | Opus 4.6 | Haiku (monitoring), Sonnet (complex), Kimi (large code) |
| COO | Sonnet 4.6 | Haiku (health check, family tasks), Sonnet (tutor) |
| CTO | Sonnet 4.6 | Haiku (deploy, infra), Sonnet/Kimi (coding) |
| CMO | Sonnet 4.6 | Haiku (curator, monitor), Sonnet (content) |
| CIO | Sonnet 4.5 | Haiku (all scanners) |

The pattern: orchestrators use Sonnet, workers default to Haiku, complex tasks escalate to Sonnet.

All scheduled monitoring tasks start with Haiku:

```yaml
crons:
  - name: cron-health-check
    schedule: "*/30 * * * *"
    model: "anthropic/claude-haiku-4-5"   # 48x/day — must be Haiku
  - name: content-curation
    schedule: "0 6,12,18 * * *"
    model: "anthropic/claude-haiku-4-5"   # 3x/day — Haiku for pattern matching
  - name: morning-brief
    schedule: "0 7 * * *"
    model: "anthropic/claude-sonnet-4-6"  # Once/day — worth Sonnet for quality
```
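That rule is easy to enforce mechanically. A small lint sketch, assuming you can tabulate each cron as a `(name, runs_per_day, model)` tuple (the names and threshold are illustrative):

```python
HAIKU = "anthropic/claude-haiku-4-5"

def lint_crons(crons, max_runs_on_expensive=4):
    """Return names of crons that fire frequently but are not on Haiku.

    `crons` is a list of (name, runs_per_day, model) tuples.
    """
    return [
        name
        for name, runs_per_day, model in crons
        if runs_per_day > max_runs_on_expensive and model != HAIKU
    ]

flagged = lint_crons([
    ("cron-health-check", 48, HAIKU),
    ("content-curation", 3, HAIKU),
    ("morning-brief", 1, "anthropic/claude-sonnet-4-6"),
    ("noisy-debug-cron", 24, "anthropic/claude-opus-4-6"),  # high-frequency, not Haiku
])
print(flagged)  # only the high-frequency non-Haiku cron is flagged
```

Run it in CI so a cron config change that puts an expensive model on a tight schedule fails review instead of burning budget for a month.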

Start cheap, escalate only if needed:

```python
# Pseudo-code: try Haiku first, escalate if task is complex
def spawn_coding_worker(task, codebase_size_tokens):
    if codebase_size_tokens > 50_000:
        model = "kimi-coding/kimi-for-coding"
    elif task.is_complex():
        model = "anthropic/claude-sonnet-4-6"
    else:
        model = "anthropic/claude-haiku-4-5"
    return sessions_spawn("coding-worker", task=task, model=model)
```
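The pseudo-code above classifies up front. A complementary pattern is escalation on failure: run the cheapest model first and only pay for a stronger one when the output fails a quality check. A sketch, where `run_task` and `passed_qa` are hypothetical hooks into your agent runtime:

```python
# Cheapest first; each retry moves one rung up the ladder.
ESCALATION_LADDER = [
    "anthropic/claude-haiku-4-5",
    "anthropic/claude-sonnet-4-6",
    "anthropic/claude-opus-4-6",
]

def run_with_escalation(task, run_task, passed_qa):
    """Try each model in order of cost; return the first result that passes QA."""
    for model in ESCALATION_LADDER:
        result = run_task(task, model)
        if passed_qa(result):
            return model, result
    raise RuntimeError(f"task failed QA on every model: {task!r}")
```

The tradeoff: failed cheap attempts add latency and a small retry cost, so this fits tasks where Haiku succeeds most of the time.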

For intelligence tasks (arXiv, HN scanning), don’t re-run if results are fresh:

```sh
# CIO arXiv scanner — check if today's scan already ran
ARTIFACT="$HOME/clawd/agents/cio/artifacts/arxiv-$(date +%Y-%m-%d).json"
if [ -f "$ARTIFACT" ]; then
  echo "Already scanned today. Skipping."
  exit 0
fi

# Run scan with Haiku
sessions_spawn arxiv-scanner --model claude-haiku-4-5 ...
```

Instead of spawning 5 separate Haiku agents for 5 small tasks, batch them into one:

```
# Inefficient — 5 separate spawns
Spawn 1: Check cron health
Spawn 2: Check deploy log
Spawn 3: Check skill registry
Spawn 4: Check build pipeline
Spawn 5: Update fleet health

# Efficient — one Haiku agent does all 5
Spawn 1: "Check cron health, deploy log, skill registry, build pipeline, and update fleet health JSON"
```
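Mechanically, batching is just joining the small tasks into one prompt before a single spawn. A minimal sketch (the prompt wording is illustrative):

```python
checks = [
    "Check cron health",
    "Check deploy log",
    "Check skill registry",
    "Check build pipeline",
    "Update fleet health JSON",
]

# One Haiku spawn instead of five: fold the checks into a single task prompt.
batched_task = (
    "Run all of the following checks and report one combined summary:\n"
    + "\n".join(f"- {check}" for check in checks)
)
```

You pay the per-spawn overhead (system prompt, agent context) once instead of five times, which dominates the cost of tasks this small.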

For a full C-Suite architecture running 24/7:

| Component | Model | Frequency | Est. Daily Cost |
| --- | --- | --- | --- |
| CEO heartbeat (30m) | Opus | 48x/day | $1.50 |
| COO morning brief | Sonnet | 1x/day | $0.10 |
| COO heartbeat | Sonnet | 24x/day | $0.75 |
| CTO sprint check | Sonnet | 1x/day | $0.10 |
| CTO cron monitor | Haiku | 48x/day | $0.05 |
| CMO curation | Haiku | 3x/day | $0.03 |
| CMO content draft | Sonnet | 1x/day | $0.15 |
| CIO intel brief | Sonnet | 2x/day | $0.20 |
| CIO scanners | Haiku | 8x/day | $0.05 |
| Ad hoc coding tasks | Sonnet/Haiku | varies | $1-3/day |
| Total | | | ~$4-6/day |

The key: Haiku for everything repetitive, Sonnet for execution, Opus only for the CEO.
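The table's arithmetic is easy to sanity-check in a few lines (figures copied from the table; the scheduled rows are fixed, the ad hoc row contributes the $1-3 range):

```python
# Per-day estimates for the scheduled components, from the table above.
scheduled = {
    "CEO heartbeat": 1.50,
    "COO morning brief": 0.10,
    "COO heartbeat": 0.75,
    "CTO sprint check": 0.10,
    "CTO cron monitor": 0.05,
    "CMO curation": 0.03,
    "CMO content draft": 0.15,
    "CIO intel brief": 0.20,
    "CIO scanners": 0.05,
}

fixed = sum(scheduled.values())
low, high = fixed + 1.0, fixed + 3.0  # add the $1-3/day ad hoc range
print(f"scheduled ${fixed:.2f}/day, total ${low:.2f}-${high:.2f}/day")
```

The scheduled spend comes to about $2.93/day; ad hoc coding is what moves the total.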

Quick reference for model selection decisions:

| Capability | Haiku | Sonnet | Opus | Kimi |
| --- | --- | --- | --- | --- |
| Pattern matching | ✅ Best | | | |
| Simple extraction | ✅ Best | | | |
| Content creation | ⚠️ Weak | ✅ Best | | |
| Code generation | ⚠️ Simple only | ✅ Best | | ✅ Large context |
| Strategic reasoning | | ⚠️ | ✅ Best | |
| Complex orchestration | | | ✅ Best | |
| Quality judgment | | ⚠️ | ✅ Best | |
| Large context (>50K) | ⚠️ | ⚠️ | | ✅ Best |
| Speed | ✅ Best | | ⚠️ Slow | ⚠️ |
| Cost | ✅ Cheapest | ⚠️ | ❌ Expensive | ✅ Very cheap |

Model pricing and capabilities change rapidly. In 2026, today’s Sonnet will be tomorrow’s Haiku. Build your routing to be configuration-driven, not hardcoded:

  1. Never hardcode model names in agent code — always reference a config key
  2. Update routing config monthly — reassess as new models release
  3. Track model performance — did Haiku start handling tasks Sonnet used to need?
  4. Watch cost trends — new model releases often drop prices on existing tiers

The routing config should be a living document, reviewed with the same frequency as your tech radar.


Related: Organizational Charter — The governance rules including R5: Cost Awareness.


About the author: JD Davenport builds AI agent systems at OpenClaw. Follow on LinkedIn for updates on building AI agents for business.