
Build of the Week: Multi-Agent Orchestration with LangGraph

This is the first entry in our Build of the Week series, where we break down real multi-agent systems — not toy demos, not “hello world” chatbots. Working systems that solve actual problems.

This week: building a multi-agent orchestration system with LangGraph. We’ll wire up a supervisor agent that routes tasks to specialist workers, manages shared state, and handles failures gracefully.

LangGraph is LangChain’s framework for building stateful, multi-actor applications with LLMs. Unlike a simple linear chain, where each step runs once and hands off to the next, LangGraph gives you:

  • Graph-based control flow — agents are nodes, edges define routing logic
  • Persistent state — shared state object that flows between nodes
  • Cycles and conditionals — agents can loop, retry, and branch
  • Built-in checkpointing — resume from any point in the graph

If you’ve built multi-agent systems with raw API calls (like we do in the Agent Tree stack), you know the pain of managing state, routing, and error handling yourself. LangGraph abstracts the hard parts while keeping you in control of the logic.

We’re building a system with three agents:

  1. Supervisor — receives user requests, decides which specialist to invoke, verifies results
  2. Researcher — searches the web, summarizes findings, returns structured data
  3. Coder — writes and tests code based on specifications
User Request
     │
     ▼
┌──────────┐
│Supervisor│◄──────────────────────┐
│ (GPT-4o) │                       │
└────┬─────┘                       │
     │                             │
     ├─── "needs research" ───►┌───┴──────┐
     │                         │Researcher│
     │                         │ (Sonnet) │
     │                         └──────────┘
     └─── "needs code" ───────►┌──────────┐
                               │  Coder   │
                               │ (Haiku)  │
                               └──────────┘

The supervisor doesn’t do the work. It decides who does the work, inspects the output, and either routes to the next step or sends feedback for another pass.

Every LangGraph application starts with a state schema. This is the shared context that flows between all nodes.

from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator


class AgentState(TypedDict):
    """Shared state across all agents in the graph."""
    messages: Annotated[Sequence[BaseMessage], operator.add]
    next_agent: str
    research_results: str
    code_output: str
    iteration_count: int
    task_complete: bool

Key design decisions here:

  • messages uses the operator.add reducer — new messages append rather than replace
  • next_agent is the routing key the supervisor sets
  • iteration_count prevents infinite loops (critical — we’ll set a max of 3)
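
To make the reducer behavior concrete, here is a minimal plain-Python sketch of how a partial update could be merged into the state. The merge_update helper and the REDUCERS table are hypothetical stand-ins for what LangGraph does internally, and plain strings stand in for message objects.

```python
import operator
from typing import Any, Callable

# Hypothetical stand-in for LangGraph's internal merge step:
# keys with a reducer are combined, all other keys are overwritten.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def merge_update(state: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new state with `update` applied key by key."""
    merged = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in merged:
            merged[key] = reducer(merged[key], value)  # list + list appends
        else:
            merged[key] = value  # plain replacement
    return merged


state = {"messages": ["user: build an API"], "next_agent": "", "iteration_count": 0}
update = {"messages": ["[Supervisor] Routing to: researcher"], "next_agent": "researcher"}
new_state = merge_update(state, update)

print(new_state["messages"])    # both messages, in order
print(new_state["next_agent"])  # overwritten by the update
```

Keys the update does not mention, such as iteration_count here, carry over unchanged.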

Each specialist is a function that takes the state, does its work, and returns a partial state update. LangGraph merges that update into the shared state.

from langchain_anthropic import ChatAnthropic  # Claude models need the Anthropic adapter
from langchain_openai import ChatOpenAI  # used by the supervisor
from langchain_core.messages import HumanMessage, AIMessage
from langchain_community.tools import TavilySearchResults

search_tool = TavilySearchResults(max_results=5)

research_llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",  # Good at synthesis
    temperature=0.1,
)


def latest_request(messages) -> str:
    """Return the newest human message; the last message is the supervisor's routing note."""
    for message in reversed(messages):
        if isinstance(message, HumanMessage):
            return message.content
    return messages[-1].content


def researcher_node(state: AgentState) -> dict:
    """Research agent: searches the web and synthesizes findings."""
    query = latest_request(state["messages"])
    # Execute search
    search_results = search_tool.invoke(query)
    # Synthesize with LLM
    synthesis_prompt = f"""You are a research specialist. Synthesize these search results
into a clear, structured summary. Include sources.
Query: {query}
Results: {search_results}
Return a structured summary with key findings and source URLs."""
    response = research_llm.invoke([HumanMessage(content=synthesis_prompt)])
    return {
        "messages": [AIMessage(content=f"[Researcher] {response.content}")],
        "research_results": response.content,
    }


coder_llm = ChatAnthropic(
    # Fast, cheap, good at defined tasks. Swap in whichever Haiku model ID your account exposes.
    model="claude-3-5-haiku-20241022",
    temperature=0.0,
)


def coder_node(state: AgentState) -> dict:
    """Coding agent: writes code based on specifications."""
    requirements = latest_request(state["messages"])
    research = state.get("research_results", "")
    code_prompt = f"""You are a coding specialist. Write clean, tested Python code
based on the following conversation and research context.
Research context: {research}
Requirements from conversation:
{requirements}
Return the code with inline comments and a brief explanation of your approach.
Include error handling and type hints."""
    response = coder_llm.invoke([HumanMessage(content=code_prompt)])
    return {
        "messages": [AIMessage(content=f"[Coder] {response.content}")],
        "code_output": response.content,
    }

Notice the model selection: Sonnet for research (needs synthesis and judgment), Haiku for coding (well-defined task, speed matters). This is the model routing strategy in action — use the cheapest model that can do the job well.
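
One way to keep that routing strategy explicit is a small lookup table from task type to model. The sketch below is illustrative only: MODEL_FOR_TASK, DEFAULT_MODEL, pick_model, and the task labels are hypothetical names, and the model labels are placeholders rather than exact API model IDs.

```python
# Hypothetical routing table: task type -> model label (placeholders, not API IDs).
MODEL_FOR_TASK = {
    "routing": "gpt-4o",   # supervisor: routing decisions
    "research": "sonnet",  # synthesis and judgment
    "coding": "haiku",     # well-defined task, speed matters
}

# Assumption: unknown task types fall back to the cheapest tier.
DEFAULT_MODEL = "haiku"


def pick_model(task_type: str) -> str:
    """Return the model label configured for a task type."""
    return MODEL_FOR_TASK.get(task_type, DEFAULT_MODEL)


print(pick_model("research"))
print(pick_model("summarize"))  # not in the table, so the default applies
```

Keeping the mapping in one place makes it easy to change a tier without touching the node functions.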

The supervisor is the brain. It looks at the current state and decides what happens next.

from langchain_core.messages import SystemMessage

supervisor_llm = ChatOpenAI(
    model="gpt-4o",  # Best reasoning for routing decisions
    temperature=0.0,
)

SUPERVISOR_PROMPT = """You are a task supervisor managing a team of specialists:
- Researcher: searches the web and synthesizes information
- Coder: writes and tests Python code
Given the conversation so far, decide the next step:
- If the task needs information gathering, route to "researcher"
- If the task needs code written and you have enough context, route to "coder"
- If the task is complete and verified, route to "FINISH"
Respond with ONLY one of: researcher, coder, FINISH"""


def supervisor_node(state: AgentState) -> dict:
    """Supervisor: routes tasks and verifies quality."""
    messages = state["messages"]
    iteration = state.get("iteration_count", 0)
    # Safety valve: max 3 iterations
    if iteration >= 3:
        return {
            "messages": [AIMessage(content="[Supervisor] Max iterations reached. Delivering current results.")],
            "next_agent": "FINISH",
            "iteration_count": iteration + 1,
            "task_complete": True,
        }
    response = supervisor_llm.invoke(
        [SystemMessage(content=SUPERVISOR_PROMPT)] + list(messages)
    )
    next_step = response.content.strip().lower()
    # Validate routing decision
    if next_step not in ("researcher", "coder", "finish"):
        next_step = "finish"  # Fail safe
    # The graph's routing function expects the uppercase sentinel "FINISH"
    next_agent = "FINISH" if next_step == "finish" else next_step
    return {
        "messages": [AIMessage(content=f"[Supervisor] Routing to: {next_agent}")],
        "next_agent": next_agent,
        "iteration_count": iteration + 1,
        "task_complete": next_step == "finish",
    }

The iteration cap at 3 is non-negotiable. In production, you also want a token budget limit — track cumulative tokens across all nodes and bail if you exceed a threshold.
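
A token budget can be sketched as a small counter that every node charges after its LLM call. This is a minimal sketch under stated assumptions: TokenBudget and TokenBudgetExceeded are hypothetical names, the limit is arbitrary, and the token counts are hard-coded here where a real system would read them from the provider's usage metadata.

```python
from dataclasses import dataclass


class TokenBudgetExceeded(RuntimeError):
    """Raised when cumulative usage passes the configured limit."""


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> int:
        """Record usage and return the remaining budget; raise once over the limit."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(f"used {self.used} of {self.limit} tokens")
        return self.limit - self.used


budget = TokenBudget(limit=10_000)
budget.charge(4_000)  # e.g. a supervisor call
budget.charge(5_000)  # e.g. a researcher call
try:
    budget.charge(2_000)  # pushes the total past the limit
    over_budget = False
except TokenBudgetExceeded:
    over_budget = True  # bail out and deliver what exists so far
```

In the graph, the supervisor could check the budget next to the iteration cap and route to FINISH when it is exhausted.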

This is where LangGraph shines. You define nodes, edges, and conditional routing in a declarative graph.

from langgraph.graph import StateGraph, END


def route_next(state: AgentState) -> str:
    """Conditional edge: routes based on the supervisor's decision."""
    next_agent = state.get("next_agent", "FINISH")
    if next_agent == "FINISH":
        return END
    return next_agent


# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)

# Set entry point
workflow.set_entry_point("supervisor")

# Add edges
workflow.add_conditional_edges(
    "supervisor",
    route_next,
    {
        "researcher": "researcher",
        "coder": "coder",
        END: END,
    },
)

# Specialists always report back to supervisor
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")

# Compile
app = workflow.compile()

The flow: every request enters at the supervisor. The supervisor routes to a specialist. The specialist does work and returns to the supervisor. The supervisor checks the result and either routes again or finishes.
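
Because the routing function is a pure function of the state, it can be exercised without running any LLM. The sketch below repeats the route_next logic from the graph definition and checks it against a few states. The local END constant is a stand-in so the snippet runs without LangGraph installed; in the real graph you would import END from langgraph.graph.

```python
END = "__end__"  # stand-in for langgraph.graph.END (assumption for this sketch)


def route_next(state: dict) -> str:
    """Same logic as the conditional edge in the graph definition."""
    next_agent = state.get("next_agent", "FINISH")
    if next_agent == "FINISH":
        return END
    return next_agent


def test_route_next() -> None:
    assert route_next({"next_agent": "researcher"}) == "researcher"
    assert route_next({"next_agent": "coder"}) == "coder"
    assert route_next({"next_agent": "FINISH"}) == END
    assert route_next({}) == END  # a missing key defaults to finishing


test_route_next()
```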

from langchain_core.messages import HumanMessage

initial_state = {
    "messages": [
        HumanMessage(content="Research the best Python web frameworks for building "
                             "REST APIs in 2026, then write a FastAPI hello-world with "
                             "health check endpoint and proper error handling.")
    ],
    "next_agent": "",
    "research_results": "",
    "code_output": "",
    "iteration_count": 0,
    "task_complete": False,
}
result = app.invoke(initial_state)

# Print the conversation
for msg in result["messages"]:
    print(f"\n{'='*60}")
    print(msg.content[:500])

A typical run looks like:

[Supervisor] Routing to: researcher
[Researcher] Key findings: FastAPI remains the top choice for Python REST APIs...
[Supervisor] Routing to: coder
[Coder] Here's a FastAPI application with health check and error handling...
[Supervisor] Routing to: FINISH

Three nodes and five LLM calls in total: three routing decisions by the supervisor and one call per specialist, plus one web search. Total cost for this run: roughly $0.03.

For production, you want to persist state so you can resume interrupted workflows.

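from langgraph.checkpoint.memory import MemorySaver

# Add checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Run with a thread ID for persistence
# (initial_state is the same input dict passed to app.invoke earlier)
config = {"configurable": {"thread_id": "task-001"}}
result = app.invoke(initial_state, config)

# Later: inspect the latest checkpoint for this thread
state = app.get_state(config)

# To resume an interrupted run from that checkpoint, invoke with None as the input:
# result = app.invoke(None, config)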

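For production deployments, swap MemorySaver for a PostgreSQL or Redis-backed checkpointer. MemorySaver keeps checkpoints in process memory, so they are lost on restart. The database-backed checkpointers ship as separate packages (for example, langgraph-checkpoint-postgres) and plug into compile() the same way.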

In the Agent Tree system, we use a similar pattern but with raw Claude API calls and file-based state instead of LangGraph’s abstractions:

Aspect            Agent Tree                    LangGraph
----------------  ----------------------------  ------------------------------
State management  File-based (YAML/JSON)        In-memory graph state
Routing           Custom Python logic           Conditional edges
Checkpointing     Git commits + file snapshots  Built-in checkpointers
Model mixing      OpenRouter + local Ollama     LangChain model adapters
Error handling    Try/catch + iteration limits  Same, but graph-level
Deployment        Mac Mini + cron               LangGraph Cloud or self-hosted

LangGraph is more structured and portable. Our approach is more flexible and cheaper (no LangChain overhead). The right choice depends on your constraints.

Before shipping this to real users:

1. Add observability. Log every routing decision, every specialist invocation, every token count. We use LangSmith for LangGraph projects and custom logging for our Agent Tree stack.

2. Set token budgets. Not just iteration limits — track cumulative tokens. A researcher that returns 50,000 tokens of search results will blow up your downstream context.

3. Handle partial failures. If the researcher fails but the coder can still work with what’s available, let it. Don’t fail the whole graph because one node had a bad day.

4. Test the routing logic. The supervisor’s routing decisions are the most critical part. Write unit tests that verify routing for common input patterns.

5. Version your prompts. When you change a specialist’s system prompt, track the version. You’ll want to A/B test and roll back.
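
For the partial-failure and observability points in the checklist, one possible approach is a wrapper that logs every node invocation and turns an exception into a fallback state update. This is a sketch, not the repository's implementation: tolerate_failure, the logger name, and flaky_researcher are hypothetical, and plain dicts stand in for AgentState.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("agents")

Node = Callable[[dict[str, Any]], dict[str, Any]]


def tolerate_failure(name: str, fallback: dict[str, Any]) -> Callable[[Node], Node]:
    """Wrap a node so an exception becomes a logged fallback update."""

    def decorator(node: Node) -> Node:
        def wrapped(state: dict[str, Any]) -> dict[str, Any]:
            try:
                update = node(state)
                logger.info("node=%s status=ok keys=%s", name, sorted(update))
                return update
            except Exception as exc:  # broad on purpose: contain any node error
                logger.warning("node=%s status=failed error=%s", name, exc)
                return dict(fallback)  # copy so callers cannot mutate the default

        return wrapped

    return decorator


@tolerate_failure("researcher", fallback={"research_results": ""})
def flaky_researcher(state: dict[str, Any]) -> dict[str, Any]:
    raise TimeoutError("search backend unavailable")  # simulated outage


result = flaky_researcher({"messages": []})
print(result)  # the fallback update, not an exception
```

With this in place the supervisor sees empty research results and can still decide to route to the coder or finish.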

The complete working example is available on GitHub:

github.com/JDDavenport/langgraph-multi-agent-example

Clone it, set your API keys, run python main.py. The README has setup instructions.

Next week’s Build of the Week: Tool-Calling Agents with Claude’s Computer Use — building an agent that can browse the web, fill out forms, and extract data from sites that don’t have APIs.


This is Issue #1 of the Agent Tree Army newsletter’s Build of the Week series. Subscribe to get these in your inbox every Friday.

About the author: JD Davenport runs 50+ AI agents on a Mac Mini and documents everything at docs.agenttree.army. Follow on LinkedIn for daily updates on building AI agent systems.