Model Router Architecture

Status: Production

Overview

The model router classifies incoming tasks and routes each one to the most cost-effective LLM that can handle it. It enforces a $15/day budget target and automatically downgrades routing by one tier once daily spend exceeds 80% of that target ($12).

Model Tiers

Tier	Models	Use Case	Cost (USD per 1M tokens)
Worker (Haiku)	Claude Haiku 3.5, Ollama qwen2.5:7b	Simple lookups, formatting, file ops	Haiku: $0.25-1.00; Ollama: free
Specialist (Sonnet)	Claude Sonnet, Kimi K2.5	Coding, analysis, research, most tasks	Sonnet: $3.00; Kimi K2.5: $0.38-0.60
Executive (Opus)	Claude Opus	Creative writing, complex reasoning, strategy	Opus: $15.00

Kimi K2.5 Optimization

Approximately 4–5x cheaper than Sonnet at the time of integration (2026-04-06; verify current pricing)
Roughly 5x slower than Sonnet on many tasks
Quality competitive for structured/coding tasks; check provider’s published benchmarks for current numbers
Strong for: analytics, QA pipelines, research observations
Keep Sonnet for: creative writing, tutoring, nuanced tasks where hallucination risk is higher
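
The guidance above can be read as a selection rule inside the specialist tier. The following is a minimal sketch, not the router's actual code: the category labels, the model identifiers, the `latency_sensitive` flag, and the choice to default unknown categories to the cheaper model are all assumptions made for illustration.

```python
# Hypothetical category labels derived from the notes above.
SONNET_CATEGORIES = {"creative_writing", "tutoring", "nuanced"}
KIMI_CATEGORIES = {"analytics", "qa_pipeline", "research_observation"}


def pick_specialist(category: str, latency_sensitive: bool = False) -> str:
    """Choose between the two specialist-tier models for one task."""
    # Sonnet is kept for work where hallucination risk is higher.
    if category in SONNET_CATEGORIES:
        return "claude-sonnet"
    # Assumption: because Kimi K2.5 is slower, latency-sensitive work stays on Sonnet.
    if latency_sensitive:
        return "claude-sonnet"
    # Assumption: everything else, including the categories Kimi is strong at,
    # goes to the cheaper model.
    return "kimi-k2.5"
```

For example, `pick_specialist("analytics")` selects Kimi K2.5, while `pick_specialist("tutoring")` stays on Sonnet.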

Routing Logic

1. Task text analyzed for keywords and complexity signals
2. Classified into tier (worker / specialist / executive)
3. Budget check against daily spend
4. If more than 80% of the daily budget is consumed → auto-downgrade one tier
5. Route to cheapest available model in tier
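
The five steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not the production router: the keyword hints, the model identifiers, and the single price per model (the upper bound of each range in the tier table, with Ollama at zero) are placeholders, and the sketch assumes a task already in the worker tier stays there when the budget threshold is crossed.

```python
from typing import Optional, Set, Tuple

TIERS = ["worker", "specialist", "executive"]  # ordered cheapest to most capable
DAILY_BUDGET_USD = 15.0
DOWNGRADE_THRESHOLD = 0.80  # downgrade once spend exceeds 80% of the budget

# Hypothetical keyword signals; the real classifier's signals are not listed here.
EXECUTIVE_HINTS = ("strategy", "creative writing", "complex reasoning")
WORKER_HINTS = ("lookup", "format", "rename file")

# Illustrative USD prices per 1M tokens, one value per model.
MODEL_PRICES = {
    "worker": {"ollama-qwen2.5:7b": 0.0, "claude-haiku-3.5": 1.00},
    "specialist": {"kimi-k2.5": 0.60, "claude-sonnet": 3.00},
    "executive": {"claude-opus": 15.00},
}


def classify(task: str) -> str:
    """Steps 1-2: scan the task text for signals and pick a tier."""
    text = task.lower()
    if any(hint in text for hint in EXECUTIVE_HINTS):
        return "executive"
    if any(hint in text for hint in WORKER_HINTS):
        return "worker"
    return "specialist"  # the table assigns "most tasks" to this tier


def apply_budget(tier: str, spent_usd: float) -> str:
    """Steps 3-4: drop one tier when spend is past the threshold."""
    over = spent_usd > DAILY_BUDGET_USD * DOWNGRADE_THRESHOLD
    if over and tier != "worker":  # assumption: worker cannot go lower
        return TIERS[TIERS.index(tier) - 1]
    return tier


def route(task: str, spent_usd: float,
          available: Optional[Set[str]] = None) -> Tuple[str, str]:
    """Step 5: pick the cheapest available model in the final tier."""
    tier = apply_budget(classify(task), spent_usd)
    prices = {m: p for m, p in MODEL_PRICES[tier].items()
              if available is None or m in available}
    if not prices:
        raise RuntimeError(f"no available model in tier {tier!r}")
    return tier, min(prices, key=prices.get)
```

With $5 spent, `route("Draft a go-to-market strategy", 5.0)` stays in the executive tier; with $13 spent (above the $12 threshold) the same task is downgraded to the specialist tier and lands on the cheaper of its two models.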

Budget Tracking

State file: state/budget.json
Daily target: $15
Nightly summary: track-costs.sh runs at 11 PM, sends Telegram summary
Per-request costs: logged in routing.log with estimated_cost_usd field
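
A per-request cost entry and the running daily total might be maintained along these lines. This is a sketch only: the `estimated_cost_usd` field name comes from the notes above, but the JSON layout of the state file (`date`, `spent_usd`), the one-JSON-object-per-line log format, and the function names are assumptions.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def estimate_cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of one request given a USD price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million


def record_request(state_path: Path, log_path: Path, model: str, tokens: int,
                   price_per_million: float, today: Optional[str] = None) -> float:
    """Append a log entry and return the updated daily spend."""
    today = today or date.today().isoformat()
    cost = estimate_cost_usd(tokens, price_per_million)

    # Hypothetical state schema; the total resets when the date changes.
    state = {"date": today, "spent_usd": 0.0}
    if state_path.exists():
        loaded = json.loads(state_path.read_text())
        if loaded.get("date") == today:
            state = loaded
    state["spent_usd"] = round(state["spent_usd"] + cost, 6)

    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(state))

    # One JSON object per line, carrying the estimated_cost_usd field.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"model": model, "tokens": tokens, "estimated_cost_usd": round(cost, 6)}
    with log_path.open("a") as log:
        log.write(json.dumps(entry) + "\n")
    return state["spent_usd"]
```

The returned total is what a budget check would compare against 80% of the $15 daily target.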

Unified Router

A higher-level router directs traffic across three systems: Agent System tools, OpenClaw agents, and Claude Code sessions.

Weighted keyword scoring with multi-word bonus
Health-aware OpenClaw routing (30s cache, demotion when gateway is down)
Retired-agent redirection (6 retired OpenClaw agents mapped to their Agent System equivalents)
50-case test suite: passes cleanly; routing decisions are sub-second in practice
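
The scoring and health-aware behaviour listed above can be illustrated with a short sketch. Everything specific in it is assumed for illustration: the keyword tables, the weights, the 1.5x multi-word bonus, the 0.5x demotion penalty, and the `probe` callable standing in for a gateway health check. Only the 30-second cache lifetime comes from the notes above.

```python
import time
from typing import Callable, Dict

# Hypothetical keyword weights per target system.
KEYWORDS: Dict[str, Dict[str, float]] = {
    "agent_system": {"cost": 1.0, "budget report": 2.0},
    "openclaw": {"browser": 2.0, "scrape": 1.0},
    "claude_code": {"refactor": 2.0, "pull request": 2.0},
}
MULTI_WORD_BONUS = 1.5   # assumed multiplier for multi-word phrases
DEMOTION_PENALTY = 0.5   # assumed multiplier while the gateway is down


def score(task: str, keywords: Dict[str, float]) -> float:
    """Sum the weights of matched phrases, boosting multi-word matches."""
    text = task.lower()
    total = 0.0
    for phrase, weight in keywords.items():
        if phrase in text:  # plain substring match, kept simple on purpose
            total += weight * (MULTI_WORD_BONUS if " " in phrase else 1.0)
    return total


class HealthCache:
    """Caches a health probe result for `ttl` seconds (30s by default)."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe, self._ttl, self._clock = probe, ttl, clock
        self._checked_at = None
        self._healthy = False

    def is_healthy(self) -> bool:
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            self._healthy = bool(self._probe())
            self._checked_at = now
        return self._healthy


def route_system(task: str, openclaw_health: HealthCache) -> str:
    """Return the highest-scoring system, demoting OpenClaw when it is down."""
    scores = {name: score(task, kw) for name, kw in KEYWORDS.items()}
    if not openclaw_health.is_healthy():
        scores["openclaw"] *= DEMOTION_PENALTY
    return max(scores, key=scores.get)
```

Modelling demotion as a score penalty rather than outright exclusion is one possible design choice: a task that matches only OpenClaw keywords still routes there, while a task with a competitive alternative moves to the healthier system.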