Skip to content
Subscribe

Model Router Architecture

Status: Production

The model router classifies incoming tasks and routes them to the most cost-effective LLM. It enforces a $15/day budget target and auto-downgrades when spend exceeds 80%.

TierModelsUse CaseCost (per 1M tokens)
Worker (Haiku)Claude Haiku 3.5, Ollama qwen2.5:7bSimple lookups, formatting, file ops$0.25-1.00 / free
Specialist (Sonnet)Claude Sonnet, Kimi K2.5Coding, analysis, research, most tasks$3.00 / $0.38-0.60
Executive (Opus)Claude OpusCreative writing, complex reasoning, strategy$15.00
  • 4.1x cheaper than Sonnet, 5x slower
  • Quality competitive (8.8 vs 9.1 benchmark score)
  • Strong at coding (76.8% SWE-Bench), weak on hallucination
  • Best for: analytics, QA pipelines, research observations
  • Keep Sonnet for: creative writing, tutoring, nuanced tasks
1. Task text analyzed for keywords and complexity signals
2. Classified into tier (worker / specialist / executive)
3. Budget check against daily spend
4. If >80% budget consumed → auto-downgrade one tier
5. Route to cheapest available model in tier
  • State file: state/budget.json
  • Daily target: $15
  • Nightly summary: track-costs.sh runs at 11 PM, sends Telegram summary
  • Per-request costs: logged in routing.log with estimated_cost_usd field

A higher-level router directs traffic across 3 systems: Agent System tools, OpenClaw agents, and Claude Code sessions.

  • Weighted keyword scoring with multi-word bonus
  • Health-aware OpenClaw routing (30s cache, demotion when gateway is down)
  • Handles retired agent redirection (6 OC agents mapped to AS equivalents)
  • 50-case test suite: 100% accuracy, under 15ms latency