LLM Routing Strategy: The Right Model for Every Task
A single-agent system can get away with running everything on one model. A multi-agent system with dozens of tasks running daily cannot. The cost difference between smart routing and naive routing is enormous — and the quality difference is often zero.
This framework covers how to select the right model for every task type, implement dynamic routing in your agents, and keep daily API costs under control.
Let’s start with numbers. Current approximate pricing per million tokens (input/output):
| Model | Input | Output | Relative Cost (vs Haiku) |
|---|---|---|---|
| Claude Opus 4.6 | $15 | $75 | ~19x |
| Claude Sonnet 4.6 | $3 | $15 | ~4x |
| Claude Sonnet 4.5 | $3 | $15 | ~4x |
| Claude Haiku 4.5 | $0.80 | $4 | 1x (baseline) |
| Kimi for Coding | ~$0.30 | ~$1.50 | ~0.4x |
| Gemini 2.5 Pro | $1.25 | $10 | ~2x |
| GPT-4o | $5 | $15 | ~5x |
A monitoring cron running every 30 minutes on Opus costs roughly 19x more per day than the same cron on Haiku ($15 vs $0.80 per million input tokens, $75 vs $4 per million output tokens). For a task that's literally pattern-matching, you're burning money with no quality gain.
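To make the cron math concrete, here is a small sketch of the daily-cost comparison. The per-run token counts are assumptions for illustration; the per-million-token prices are the ones from the table above.

```python
# Rough daily-cost comparison for a 30-minute monitoring cron.
# Token counts per run are assumed values, not measurements.
PRICES = {  # $ per million tokens: (input, output)
    "opus-4.6": (15.00, 75.00),
    "haiku-4.5": (0.80, 4.00),
}

RUNS_PER_DAY = 48       # one run every 30 minutes
INPUT_TOKENS = 4_000    # assumed prompt + context per run
OUTPUT_TOKENS = 500     # assumed response per run

def daily_cost(model: str) -> float:
    """Daily spend in dollars for the cron on a given model."""
    inp, out = PRICES[model]
    per_run = (INPUT_TOKENS * inp + OUTPUT_TOKENS * out) / 1_000_000
    return per_run * RUNS_PER_DAY

opus = daily_cost("opus-4.6")
haiku = daily_cost("haiku-4.5")
print(f"Opus:  ${opus:.2f}/day")    # ~$4.68/day
print(f"Haiku: ${haiku:.2f}/day")   # ~$0.25/day
print(f"Ratio: {opus / haiku:.0f}x")  # ~19x
```

Because input and output prices scale by the same factor between the two models, the ratio holds regardless of the assumed token mix.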
Claude Opus 4.6
Use when: CEO making delegation decisions, evaluating complex tradeoffs, synthesizing cross-domain information, deciding "is this good enough to show the human?"
Don’t use when: Executing well-defined tasks, monitoring, scanning, or anything a Haiku can do.
Claude Sonnet 4.6
Sweet spot: Orchestration execution, sophisticated reasoning, good at following complex instructions.
Use when: C-suite orchestrators planning work, drafting high-quality content, code review and analysis.
Don’t use when: Simple pattern matching, monitoring tasks, high-volume scanning.
Claude Haiku 4.5
Sweet spot: High-volume, well-defined tasks. Speed and cost efficiency.
Use when: Monitoring crons, content curation, classification tasks, simple data extraction, basic lookups.
Don’t use when: Complex reasoning, nuanced writing, strategic decisions.
Kimi for Coding
Sweet spot: Large codebase tasks requiring extensive context.
Use when: Refactoring large files, reviewing entire codebases, tasks that need 50K+ tokens of code context.
Don’t use when: Small code tasks where Haiku suffices, or reasoning-heavy tasks where Sonnet is needed.
When a CEO spawns a coding worker, it passes the code_generation model key. The worker looks up the routing config and uses claude-sonnet-4-6. If tomorrow Haiku gets better at coding, update the config — all agents immediately use the new model.
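This lookup can be sketched in a few lines. The config below is illustrative: the task keys, the `resolve_model` helper, and the fallback model are assumptions for this example, not a real API, though `code_generation` mapping to `claude-sonnet-4-6` follows the flow described above.

```python
# Sketch of configuration-driven model routing (illustrative, not a real API).
# Agent code references task keys; only the config names concrete models.
ROUTING_CONFIG = {
    "delegation":      "claude-opus-4-6",    # CEO-level judgment calls
    "orchestration":   "claude-sonnet-4-6",  # planning and execution
    "code_generation": "claude-sonnet-4-6",  # coding workers
    "monitoring":      "claude-haiku-4-5",   # high-volume crons
    "classification":  "claude-haiku-4-5",   # simple pattern matching
}

DEFAULT_MODEL = "claude-sonnet-4-6"  # safe fallback for unknown task types

def resolve_model(task_type: str) -> str:
    """Map a task type to a model ID via the routing config."""
    return ROUTING_CONFIG.get(task_type, DEFAULT_MODEL)

# A worker spawned for code generation resolves its model at runtime:
print(resolve_model("code_generation"))  # claude-sonnet-4-6
```

Swapping a model for every agent is then a one-line config change, with no agent code touched.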
Model pricing and capabilities change rapidly. In 2026, today’s Sonnet will be tomorrow’s Haiku. Build your routing to be configuration-driven, not hardcoded:
- Never hardcode model names in agent code — always reference a config key
- Update routing config monthly — reassess as new models release
- Track model performance — did Haiku start handling tasks Sonnet used to need?
- Watch cost trends — new model releases often drop prices on existing tiers
The routing config should be a living document, reviewed with the same frequency as your tech radar.