LLM Routing Strategy: The Right Model for Every Task
A single-agent system can get away with running everything on one model. A multi-agent system with dozens of tasks running daily cannot. The cost difference between smart routing and naive routing is enormous — and the quality difference is often zero.
This framework covers how to select the right model for every task type, implement dynamic routing in your agents, and keep daily API costs under control.
Let’s start with numbers. Current approximate pricing per million tokens (input/output):
| Model | Input | Output | Relative Cost (vs Haiku) |
|---|---|---|---|
| Claude Opus 4.6 | $15 | $75 | ~19x |
| Claude Sonnet 4.6 | $3 | $15 | ~4x |
| Claude Sonnet 4.5 | $3 | $15 | ~4x |
| Claude Haiku 4.5 | $0.80 | $4 | 1x (baseline) |
| Kimi for Coding | ~$0.30 | ~$1.50 | ~0.4x |
| Gemini 2.5 Pro | $1.25 | $10 | ~2x |
| GPT-4o | $5 | $15 | ~5x |
A monitoring cron running every 30 minutes on Opus costs roughly 19x more per day than the same cron on Haiku ($15 vs $0.80 per million input tokens, $75 vs $4 per million output tokens). For a task that's literally pattern-matching, you're burning money with no quality gain.
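To make the cron math concrete, here is a small sketch of the daily-cost comparison. The per-run token counts are assumptions for illustration; the per-million-token prices are the ones from the table above.

```python
# Rough daily-cost comparison for a 30-minute monitoring cron.
# Token counts per run are assumed values, not measurements.
PRICES = {  # $ per million tokens: (input, output)
    "opus-4.6": (15.00, 75.00),
    "haiku-4.5": (0.80, 4.00),
}

RUNS_PER_DAY = 48       # one run every 30 minutes
INPUT_TOKENS = 4_000    # assumed prompt + context per run
OUTPUT_TOKENS = 500     # assumed response per run

def daily_cost(model: str) -> float:
    """Daily spend in dollars for the cron on a given model."""
    inp, out = PRICES[model]
    per_run = (INPUT_TOKENS * inp + OUTPUT_TOKENS * out) / 1_000_000
    return per_run * RUNS_PER_DAY

opus = daily_cost("opus-4.6")
haiku = daily_cost("haiku-4.5")
print(f"Opus:  ${opus:.2f}/day")    # ~$4.68/day
print(f"Haiku: ${haiku:.2f}/day")   # ~$0.25/day
print(f"Ratio: {opus / haiku:.0f}x")  # ~19x
```

Because input and output prices scale by the same factor between the two models, the ratio holds regardless of the assumed token mix.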
Claude Opus 4.6
Use when: CEO making delegation decisions, evaluating complex tradeoffs, synthesizing cross-domain information, deciding "is this good enough to show the human?"
Don’t use when: Executing well-defined tasks, monitoring, scanning, or anything a Haiku can do.
Claude Sonnet 4.6
Sweet spot: Orchestration execution, sophisticated reasoning, good at following complex instructions.
Use when: C-suite orchestrators planning work, drafting high-quality content, code review and analysis.
Don’t use when: Simple pattern matching, monitoring tasks, high-volume scanning.
Claude Haiku 4.5
Sweet spot: High-volume, well-defined tasks. Speed and cost efficiency.
Use when: Monitoring crons, content curation, classification tasks, simple data extraction, basic lookups.
Don’t use when: Complex reasoning, nuanced writing, strategic decisions.
Kimi for Coding
Sweet spot: Large codebase tasks requiring extensive context.
Use when: Refactoring large files, reviewing entire codebases, tasks that need 50K+ tokens of code context.
Don’t use when: Small code tasks where Haiku suffices, or reasoning-heavy tasks where Sonnet is needed.
When a CEO spawns a coding worker, it passes the code_generation model key. The worker looks up the routing config and uses claude-sonnet-4-6. If tomorrow Haiku gets better at coding, update the config — all agents immediately use the new model.
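This lookup can be sketched in a few lines. The config below is illustrative: the task keys, the `resolve_model` helper, and the fallback model are assumptions for this example, not a real API, though `code_generation` mapping to `claude-sonnet-4-6` follows the flow described above.

```python
# Sketch of configuration-driven model routing (illustrative, not a real API).
# Agent code references task keys; only the config names concrete models.
ROUTING_CONFIG = {
    "delegation":      "claude-opus-4-6",    # CEO-level judgment calls
    "orchestration":   "claude-sonnet-4-6",  # planning and execution
    "code_generation": "claude-sonnet-4-6",  # coding workers
    "monitoring":      "claude-haiku-4-5",   # high-volume crons
    "classification":  "claude-haiku-4-5",   # simple pattern matching
}

DEFAULT_MODEL = "claude-sonnet-4-6"  # safe fallback for unknown task types

def resolve_model(task_type: str) -> str:
    """Map a task type to a model ID via the routing config."""
    return ROUTING_CONFIG.get(task_type, DEFAULT_MODEL)

# A worker spawned for code generation resolves its model at runtime:
print(resolve_model("code_generation"))  # claude-sonnet-4-6
```

Swapping a model for every agent is then a one-line config change, with no agent code touched.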
Model pricing and capabilities change rapidly. In 2026, today’s Sonnet will be tomorrow’s Haiku. Build your routing to be configuration-driven, not hardcoded:
- Never hardcode model names in agent code — always reference a config key
- Update routing config monthly — reassess as new models release
- Track model performance — did Haiku start handling tasks Sonnet used to need?
- Watch cost trends — new model releases often drop prices on existing tiers
The routing config should be a living document, reviewed with the same frequency as your tech radar.