The Deep-Research Agent
Before this, read:
- Tools 101 — tool calls are the mechanism the research loop runs on
- Anatomy of an agent — the package structure this agent follows
The research agent is one of the oldest pieces of this system. It started as the heart of OpenClaw — the “iterative knowledge-gap loop with 5 subagents” described in the original spec — and survived the pivot to Claude Code intact. Today it lives at agents/researcher/ and runs as the backbone for everything that needs more than a single search to answer: the merch store’s initial market research, the competitive landscape analysis for the cockpit, the growth strategy, the health expert’s KB, and more.
The loop structure
The core loop (agents/researcher/core/loop.py) runs a configurable number of iterations. Each iteration:
- Knowledge-gap agent: given what we know so far, what do we still not know? Returns a list of targeted sub-questions.
- Tool-selector agent: for each gap, which source is most likely to answer it? Chooses from: brave_search, github_search, reddit_search, arxiv_search, hn_search, firecrawl_extract, semantic_scholar_search, stackoverflow_search, wikipedia_search, twitter_search.
- Parallel tool execution: runs selected tools against the gap questions, collects results.
- Observations agent: what did the sources actually say? Builds structured findings.
- Devil’s advocate: what’s weak, contradicted, or missing in the findings so far?
- Critique-gap agent: updates the knowledge gap list based on what the devil’s advocate found.
After all iterations, a writer agent synthesizes the findings into a cited report and saves it to Obsidian at ~/obsidian-vault/11-Agents/research/.
The loop also tracks: confidence level, a citation graph, multi-hop reasoning across sources, and cross-source triangulation. Cost is logged per-run via a CostTracker that records which models and tool calls were used.
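The six steps above can be sketched as a single function. This is a minimal illustration, not the contents of agents/researcher/core/loop.py: the `ResearchState` class, the `run_loop` signature, and the callables passed in are all hypothetical stand-ins for the real agents, and the writer pass, confidence tracking, and cost logging are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    """Hypothetical container for what the loop knows so far."""
    query: str
    gaps: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)


def run_loop(
    query: str,
    iterations: int,
    find_gaps: Callable[[ResearchState], list[str]],
    select_tool: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    observe: Callable[[list[str]], list[str]],
    critique: Callable[[list[str]], list[str]],
    update_gaps: Callable[[list[str], list[str]], list[str]],
) -> ResearchState:
    state = ResearchState(query=query)
    for _ in range(iterations):
        # 1. Knowledge-gap agent: what do we still not know?
        state.gaps = find_gaps(state)
        if not state.gaps:
            break  # nothing left to research
        # 2. Tool-selector agent: pick one source per gap.
        plan = [(gap, select_tool(gap)) for gap in state.gaps]
        # 3. Parallel tool execution (threads, since tool calls are I/O-bound).
        with ThreadPoolExecutor() as pool:
            raw = list(pool.map(lambda item: tools[item[1]](item[0]), plan))
        # 4. Observations agent: turn raw results into structured findings.
        state.findings.extend(observe(raw))
        # 5. Devil's advocate: look for weak or contradicted findings.
        objections = critique(state.findings)
        # 6. Critique-gap agent: revise the gap list for the next iteration.
        state.gaps = update_gaps(state.gaps, objections)
    return state
```

Passing the agents in as callables keeps the control flow testable with stubs; whether the real loop is structured this way is not stated in the source.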
Depth tiers
Three named tiers configure the loop:
- shallow: 2–3 iterations, limited tool budget, fast (~$0.50–$1)
- standard: 5–6 iterations, broader tool coverage
- deep: 8–12 iterations, full tool suite, cross-source triangulation, multi-hop reasoning (~$5–$10)
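One way to express the tiers as configuration is a small lookup table. The iteration ranges below come from the list above; the `DepthTier` class, the `TIERS` mapping, and `resolve_tier` are hypothetical names for illustration and may not match how the real loop stores its settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthTier:
    """Hypothetical tier record; only the iteration ranges come from the docs."""
    name: str
    min_iterations: int
    max_iterations: int


TIERS = {
    "shallow": DepthTier("shallow", 2, 3),
    "standard": DepthTier("standard", 5, 6),
    "deep": DepthTier("deep", 8, 12),
}


def resolve_tier(name: str) -> DepthTier:
    """Look up a tier by name, failing loudly on an unknown depth."""
    try:
        return TIERS[name]
    except KeyError:
        raise ValueError(
            f"unknown depth {name!r}; expected one of {sorted(TIERS)}"
        ) from None
```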
The flagship research run on 2026-06-08 (competitive landscape for the cockpit) ran at the deep tier: 370 sources, $7.78. The AI coding agent market research run the same day ran for 8 iterations, producing the ChatbotToNerveCenter deck.
Run from the CLI:
    python3.12 -m agents.researcher.main "what are X/Reddit saying about AI coding agents June 2026" --depth deep
The X and Reddit backends
The research agent’s social coverage expanded significantly on 2026-06-08 when both X and Reddit backends went live.
Reddit: the original reddit_tool.py used a dead API endpoint (PullPush, archived ~May 2025) and a broken sort. The rewrite switched to Reddit’s OAuth API with sort=relevance (the previous sort=new was ignoring the query parameter entirely). The 90-day recency window is configurable via config.social_lookback_days. Source: commit 5acb55d.
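A recency window like this can be applied as a plain filter over fetched posts. The sketch below is an assumption about shape, not the code in reddit_tool.py: `within_lookback` is a hypothetical helper, and it assumes each post is a dict with a `created_utc` epoch-seconds field, which is how Reddit's API reports creation time.

```python
from datetime import datetime, timedelta, timezone


def within_lookback(posts, lookback_days=90, now=None):
    """Keep posts created inside the trailing window, preserving input order.

    `lookback_days` plays the role of config.social_lookback_days;
    `now` is injectable so the filter can be tested deterministically.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=lookback_days)
    return [
        post
        for post in posts
        if datetime.fromtimestamp(post["created_utc"], tz=timezone.utc) >= cutoff
    ]
```

Because the filter preserves input order, whatever ordering the API returned is left intact.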
X/Twitter: tools/twitter.py uses the twitterapi.io backend (approximately $0.15/1k results). It returns real same-day posts with @handle, date, and likes. The official X API is write-only for this tier; twitterapi.io provides the read path. Source: commit after 5acb55d, same day.
Both backends return results within a trailing 90-day window, sorted newest-first. The tool-selector agent picks between X, Reddit, Brave, arXiv, and the others based on the gap question type — social signal questions go to X/Reddit; academic claims go to arXiv/Semantic Scholar; code questions go to GitHub/StackOverflow.
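The routing described above can be pictured as a lookup from question type to candidate tools. The real tool-selector is an agent, so this static table is only an illustration of the mapping; the `ROUTES` dict, the `route` function, and the type labels ("social", "academic", "code") are hypothetical.

```python
# Tool names are taken from the tool-selector's list; the grouping keys are invented.
ROUTES = {
    "social": ["twitter_search", "reddit_search"],
    "academic": ["arxiv_search", "semantic_scholar_search"],
    "code": ["github_search", "stackoverflow_search"],
}

# Assumed fallback: general web search for anything without a clear category.
DEFAULT_ROUTE = ["brave_search"]


def route(gap_type: str) -> list[str]:
    """Return candidate tools for a gap type, falling back to web search."""
    # Copy the list so callers cannot mutate the shared table.
    return list(ROUTES.get(gap_type, DEFAULT_ROUTE))
```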
Checkpointing and resumption
Long runs (deep tier, 10+ iterations) can exceed a session’s reliable execution window. The loop checkpoints state after each iteration via lib.checkpoint.CheckpointManager. If a run is interrupted (network failure, session rotation, process kill), it resumes from the last completed iteration on the next invocation with the same task_id, which is derived deterministically from a hash of the query and depth.
The task ID is:
    import hashlib

    def _research_task_id(query: str, depth: str) -> str:
        # Hash query and depth together so the same pair always maps to the same ID.
        h = hashlib.sha256(f"{query}::{depth}".encode()).hexdigest()[:16]
        return f"research-{h}"

Same query + same depth → same ID → resumes from the last checkpoint. A new query always starts fresh.
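To make the resume behaviour concrete, here is a minimal file-based sketch. It does not use lib.checkpoint.CheckpointManager, whose interface is not shown here; `run_with_checkpoints`, the JSON layout, and the `step` callable are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def research_task_id(query: str, depth: str) -> str:
    """Same derivation as the task ID function shown above."""
    h = hashlib.sha256(f"{query}::{depth}".encode()).hexdigest()[:16]
    return f"research-{h}"


def run_with_checkpoints(task_id, iterations, step, checkpoint_dir):
    """Run `step(i)` for each iteration, saving state after every one.

    On a later call with the same task_id, iterations that already
    completed are skipped and the run picks up where it stopped.
    """
    path = Path(checkpoint_dir) / f"{task_id}.json"
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"completed": 0, "findings": []}

    for i in range(state["completed"], iterations):
        state["findings"].append(step(i))
        state["completed"] = i + 1
        # Write to a temp file and rename, so a kill mid-write
        # cannot leave a truncated checkpoint behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)
    return state
```

If `step` raises partway through, the checkpoint on disk still reflects the last completed iteration, so the next call with the same ID redoes only the interrupted one.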
What it’s been used for
Real runs that produced real artifacts:
- Merch store initial research (2026-06-08): Printful vs Printify stack comparison, IP-safety legal landscape, POD economics. Produced BUILD-PLAN.md.
- Cockpit competitive landscape (2026-06-08, 370 sources, $7.78): NC5 competitive position vs the field. Report at ~/clawd/projects/cockpit-chat-v3/COMPETITIVE-LANDSCAPE-2026-06-08.md.
- AI coding agent market research (2026-06-08, 8 iterations): X/Reddit sentiment on Claude Code, Cursor, and GitHub Copilot. Became the source deck for the ChatbotToNerveCenter presentation.
- Health expert KB (2026-06-06): retatrutide, peptide stacking, GLP-1 muscle retention, extended fasting. ~180KB of cited KB files at ~/clawd/domains/health/state/expert-kb/.
- X growth strategy (2026-05): 5 parallel deep-research streams, produced STRATEGY.md.
- Claude Code on your phone (2026-05-27): 307 sources, $4.75, confidence 51/100 (structurally solid, specifics thin; the agent said so).
The last example is worth noting: the researcher reports its own confidence score. A run that came back at 51/100 is the agent flagging that the answer is structurally sound but thin on specifics — that’s honest output, not a failed run.
Cost and caching
Results are cached for 1 hour via ResearchCache backed by Supabase (scrape_cache table). A second run on the same query within the cache window skips the tool calls and re-synthesizes from the cached results, costing only the writer pass.
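The one-hour expiry can be illustrated with an in-memory stand-in. This is not ResearchCache and does not talk to Supabase; `TTLCache` and its methods are hypothetical, and the injectable clock exists only so the expiry can be tested without waiting.

```python
import time


class TTLCache:
    """In-memory sketch of a time-limited cache (stand-in for a DB-backed one)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, value)

    def put(self, key, value):
        self._store[key] = (self._clock(), value)

    def get(self, key):
        """Return the cached value, or None if missing or older than the TTL."""
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: drop it so the caller re-fetches
            return None
        return value
```

A caller would check `get(query)` first and only run the tool calls on a miss, which is what makes the second run within the window cost only the writer pass.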
The main per-run cost is LLM calls. A typical breakdown for a deep run: ~60% LLM (the writer + devil’s advocate passes are expensive), ~25% tool call costs (mostly twitterapi.io on queries with social signal), ~15% overhead. Budget: shallow ~$0.50–$1, standard ~$2–3, deep ~$5–10.
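As a rough worked example, the approximate shares can be applied to a run total to estimate where the money went. The `split_cost` helper and `DEEP_SHARES` table are hypothetical, and the percentages are the approximate figures quoted above, so the output is an estimate rather than what CostTracker actually logged for any run.

```python
# Approximate shares for a deep run, from the breakdown above.
DEEP_SHARES = {"llm": 0.60, "tools": 0.25, "overhead": 0.15}


def split_cost(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a run total into per-category estimates, rounded to cents."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    return {category: round(total * share, 2) for category, share in shares.items()}


# Illustrative only: estimate the split for a $7.78 deep run.
estimate = split_cost(7.78, DEEP_SHARES)
```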
For queries where Brave alone is sufficient (most factual questions with no social-signal component), the tool-selector routes away from paid backends and the cost stays near the low end.
Next: What happens when the research output needs to become a McKinsey-style deck — Deck Architect v4.