Model Router Architecture
Status: Production
Overview
Section titled “Overview”The model router classifies incoming tasks and routes them to the most cost-effective LLM. It enforces a $15/day budget target and auto-downgrades when spend exceeds 80%.
Model Tiers
Section titled “Model Tiers”| Tier | Models | Use Case | Cost (per 1M tokens) |
|---|---|---|---|
| Worker (Haiku) | Claude Haiku 3.5, Ollama qwen2.5:7b | Simple lookups, formatting, file ops | $0.25-1.00 / free |
| Specialist (Sonnet) | Claude Sonnet, Kimi K2.5 | Coding, analysis, research, most tasks | $3.00 / $0.38-0.60 |
| Executive (Opus) | Claude Opus | Creative writing, complex reasoning, strategy | $15.00 |
Kimi K2.5 Optimization
Section titled “Kimi K2.5 Optimization”- 4.1x cheaper than Sonnet, 5x slower
- Quality competitive (8.8 vs 9.1 benchmark score)
- Strong at coding (76.8% SWE-Bench), weak on hallucination
- Best for: analytics, QA pipelines, research observations
- Keep Sonnet for: creative writing, tutoring, nuanced tasks
Routing Logic
Section titled “Routing Logic”1. Task text analyzed for keywords and complexity signals2. Classified into tier (worker / specialist / executive)3. Budget check against daily spend4. If >80% budget consumed → auto-downgrade one tier5. Route to cheapest available model in tierBudget Tracking
Section titled “Budget Tracking”- State file:
state/budget.json - Daily target: $15
- Nightly summary:
track-costs.shruns at 11 PM, sends Telegram summary - Per-request costs: logged in
routing.logwithestimated_cost_usdfield
Unified Router
Section titled “Unified Router”A higher-level router directs traffic across 3 systems: Agent System tools, OpenClaw agents, and Claude Code sessions.
- Weighted keyword scoring with multi-word bonus
- Health-aware OpenClaw routing (30s cache, demotion when gateway is down)
- Handles retired agent redirection (6 OC agents mapped to AS equivalents)
- 50-case test suite: 100% accuracy, under 15ms latency