Build of the Week: Multi-Agent Orchestration with LangGraph
This is the first entry in our Build of the Week series, where we break down real multi-agent systems — not toy demos, not “hello world” chatbots. Working systems that solve actual problems.
This week: building a multi-agent orchestration system with LangGraph. We’ll wire up a supervisor agent that routes tasks to specialist workers, manages shared state, and handles failures gracefully.
Why LangGraph
LangGraph is LangChain’s framework for building stateful, multi-actor applications with LLMs. Unlike a simple linear chain, LangGraph gives you:
- Graph-based control flow — agents are nodes, edges define routing logic
- Persistent state — shared state object that flows between nodes
- Cycles and conditionals — agents can loop, retry, and branch
- Built-in checkpointing — resume from any point in the graph
If you’ve built multi-agent systems with raw API calls (like we do in the Agent Tree stack), you know the pain of managing state, routing, and error handling yourself. LangGraph abstracts the hard parts while keeping you in control of the logic.
The Architecture
We’re building a system with three agents:
- Supervisor — receives user requests, decides which specialist to invoke, verifies results
- Researcher — searches the web, summarizes findings, returns structured data
- Coder — writes and tests code based on specifications
```text
User Request
     │
     ▼
┌──────────┐
│Supervisor│◄──────────────────────┐
│ (GPT-4)  │                       │
└────┬─────┘                       │
     │                             │
     ├─── "needs research" ───►┌───┴──────┐
     │                         │Researcher│
     │                         │ (Sonnet) │
     │                         └──────────┘
     │
     └─── "needs code" ──────►┌──────────┐
                              │  Coder   │
                              │ (Haiku)  │
                              └──────────┘
```

The supervisor doesn’t do the work. It decides who does the work, inspects the output, and either routes to the next step or sends feedback for another pass.
Step 1: Define the State
Every LangGraph application starts with a state schema. This is the shared context that flows between all nodes.
```python
import operator
from typing import Annotated, Sequence, TypedDict

from langchain_core.messages import BaseMessage


class AgentState(TypedDict):
    """Shared state across all agents in the graph."""

    messages: Annotated[Sequence[BaseMessage], operator.add]
    next_agent: str
    research_results: str
    code_output: str
    iteration_count: int
    task_complete: bool
```

Key design decisions here:
- `messages` uses the `operator.add` reducer, so new messages append rather than replace
- `next_agent` is the routing key the supervisor sets
- `iteration_count` prevents infinite loops (critical: we’ll set a max of 3)
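To make the reducer idea concrete, here is a minimal plain-Python sketch of how a node's partial update could be merged into shared state. It illustrates the concept only and is not LangGraph's internal implementation; `DemoState` and `apply_update` are hypothetical names, and messages are plain strings to keep the example dependency-free.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class DemoState(TypedDict):
    # The Annotated metadata carries the reducer used to merge updates for this key.
    messages: Annotated[list, operator.add]
    next_agent: str
    iteration_count: int


def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the state (hypothetical helper)."""
    hints = get_type_hints(DemoState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints[key], "__metadata__", ())
        if metadata:
            # Key has a reducer: combine the old and new values.
            merged[key] = metadata[0](state[key], value)
        else:
            # No reducer: the new value replaces the old one.
            merged[key] = value
    return merged


state = {"messages": ["user: build an API"], "next_agent": "", "iteration_count": 0}
state = apply_update(
    state,
    {
        "messages": ["[Supervisor] Routing to: researcher"],
        "next_agent": "researcher",
        "iteration_count": 1,
    },
)
print(state["messages"])    # both messages are kept, in order
print(state["next_agent"])  # overwritten by the update
```

Keys without a reducer behave like ordinary assignment, which is why `next_agent` always reflects the supervisor's latest decision while `messages` keeps the full history.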
Step 2: Build the Specialist Agents
Each specialist is a function that takes the state, does its work, and returns the keys it wants to update.
The Researcher
```python
from langchain_anthropic import ChatAnthropic  # requires the langchain-anthropic package
from langchain_community.tools import TavilySearchResults
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI  # used for the supervisor in Step 3

search_tool = TavilySearchResults(max_results=5)

# Claude models need the Anthropic adapter; ChatOpenAI would send this ID to OpenAI.
research_llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",  # Good at synthesis
    temperature=0.1,
)


def researcher_node(state: AgentState) -> dict:
    """Research agent: searches the web and synthesizes findings."""
    messages = state["messages"]

    # messages[-1] is the supervisor's routing note, so use the latest human request.
    query = next(
        (m.content for m in reversed(messages) if isinstance(m, HumanMessage)),
        messages[-1].content,
    )

    # Execute search
    search_results = search_tool.invoke(query)

    # Synthesize with LLM
    synthesis_prompt = f"""You are a research specialist. Synthesize these search results
into a clear, structured summary. Include sources.

Query: {query}
Results: {search_results}

Return a structured summary with key findings and source URLs."""

    response = research_llm.invoke([HumanMessage(content=synthesis_prompt)])

    return {
        "messages": [AIMessage(content=f"[Researcher] {response.content}")],
        "research_results": response.content,
    }
```

The Coder
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AIMessage, HumanMessage

coder_llm = ChatAnthropic(
    # Check this ID against Anthropic's current model list before running.
    model="claude-haiku-4-20250414",  # Fast, cheap, good at defined tasks
    temperature=0.0,
)


def coder_node(state: AgentState) -> dict:
    """Coding agent: writes code based on specifications."""
    messages = state["messages"]
    research = state.get("research_results", "")

    # messages[-1] is the supervisor's routing note, so use the latest human request.
    requirements = next(
        (m.content for m in reversed(messages) if isinstance(m, HumanMessage)),
        messages[-1].content,
    )

    code_prompt = f"""You are a coding specialist. Write clean, tested Python code
based on the following conversation and research context.

Research context: {research}

Requirements from conversation: {requirements}

Return the code with inline comments and a brief explanation of your approach.
Include error handling and type hints."""

    response = coder_llm.invoke([HumanMessage(content=code_prompt)])

    return {
        "messages": [AIMessage(content=f"[Coder] {response.content}")],
        "code_output": response.content,
    }
```

Notice the model selection: Sonnet for research (needs synthesis and judgment), Haiku for coding (well-defined task, speed matters). This is the model routing strategy in action: use the cheapest model that can do the job well.
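The "cheapest model that can do the job" rule can be written down as a small lookup. The sketch below is one possible shape for it; the model labels, capability tiers, and cost ranks are all hypothetical placeholders and not real model IDs or measured prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str           # label only, not a real model ID
    capability: int     # hypothetical tier: higher handles harder tasks
    relative_cost: int  # hypothetical rank: lower is cheaper


# Placeholder catalog; tiers and cost ranks are illustrative, not measured.
CATALOG = [
    ModelOption("small-fast", capability=1, relative_cost=1),
    ModelOption("mid-tier", capability=2, relative_cost=3),
    ModelOption("large-reasoning", capability=3, relative_cost=10),
]

# Hypothetical minimum capability each task type needs.
TASK_REQUIREMENTS = {"coding": 1, "research": 2, "routing": 3}


def pick_model(task: str) -> ModelOption:
    """Return the cheapest model whose capability meets the task's requirement."""
    required = TASK_REQUIREMENTS[task]
    candidates = [m for m in CATALOG if m.capability >= required]
    return min(candidates, key=lambda m: m.relative_cost)


for task in ("coding", "research", "routing"):
    print(task, "->", pick_model(task).name)
```

Keeping the requirements in one table makes it easy to bump a task to a stronger model when quality drops, without touching the node code.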
Step 3: Build the Supervisor
The supervisor is the brain. It looks at the current state and decides what happens next.
```python
from langchain_core.messages import AIMessage, SystemMessage
from langchain_openai import ChatOpenAI

supervisor_llm = ChatOpenAI(
    model="gpt-4o",  # Best reasoning for routing decisions
    temperature=0.0,
)

SUPERVISOR_PROMPT = """You are a task supervisor managing a team of specialists:
- Researcher: searches the web and synthesizes information
- Coder: writes and tests Python code

Given the conversation so far, decide the next step:
- If the task needs information gathering, route to "researcher"
- If the task needs code written and you have enough context, route to "coder"
- If the task is complete and verified, route to "FINISH"

Respond with ONLY one of: researcher, coder, FINISH"""


def supervisor_node(state: AgentState) -> dict:
    """Supervisor: routes tasks and verifies quality."""
    messages = state["messages"]
    iteration = state.get("iteration_count", 0)

    # Safety valve: max 3 iterations (counted in supervisor passes)
    if iteration >= 3:
        return {
            "messages": [AIMessage(content="[Supervisor] Max iterations reached. Delivering current results.")],
            "next_agent": "FINISH",
            "iteration_count": iteration + 1,
            "task_complete": True,
        }

    response = supervisor_llm.invoke(
        [SystemMessage(content=SUPERVISOR_PROMPT)] + list(messages)
    )

    next_step = response.content.strip().lower()

    # Validate routing decision
    if next_step not in ("researcher", "coder", "finish"):
        next_step = "finish"  # Fail safe

    # Store FINISH in upper case so it matches the check in the routing function
    next_agent = "FINISH" if next_step == "finish" else next_step

    return {
        "messages": [AIMessage(content=f"[Supervisor] Routing to: {next_agent}")],
        "next_agent": next_agent,
        "iteration_count": iteration + 1,
        "task_complete": next_step == "finish",
    }
```

The iteration cap at 3 is non-negotiable. In production, you also want a token budget limit: track cumulative tokens across all nodes and bail if you exceed a threshold.
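A token budget can be as small as a counter that every node reports into. The following is a minimal sketch under stated assumptions: `TokenBudget` and `TokenBudgetExceeded` are hypothetical names, the 20,000-token limit is illustrative, and the token counts are hard-coded stand-ins for whatever usage figures your model client reports.

```python
class TokenBudgetExceeded(RuntimeError):
    """Raised when cumulative token usage passes the configured limit."""


class TokenBudget:
    """Hypothetical tracker for cumulative token usage across nodes."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0
        self.by_node = {}  # node name -> tokens consumed so far

    def record(self, node: str, tokens: int) -> None:
        """Add one call's token count and fail fast once the limit is passed."""
        self.used += tokens
        self.by_node[node] = self.by_node.get(node, 0) + tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"{self.used} tokens used, limit is {self.max_tokens}"
            )

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)


budget = TokenBudget(max_tokens=20_000)  # illustrative threshold
budget.record("supervisor", 400)
budget.record("researcher", 6_500)
print(budget.remaining)

try:
    budget.record("coder", 15_000)
except TokenBudgetExceeded as exc:
    print(f"Stopping early: {exc}")
```

In the graph, a node would call `record` after each LLM response, and the supervisor could treat the exception the same way it treats the iteration cap: finish and deliver what exists.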
Step 4: Wire Up the Graph
This is where LangGraph shines. You define nodes, edges, and conditional routing in a declarative graph.
```python
from langgraph.graph import StateGraph, END


def route_next(state: AgentState) -> str:
    """Conditional edge: routes based on the supervisor's decision."""
    next_agent = state.get("next_agent", "FINISH")
    if next_agent == "FINISH":
        return END
    return next_agent


# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)

# Set entry point
workflow.set_entry_point("supervisor")

# Add edges
workflow.add_conditional_edges(
    "supervisor",
    route_next,
    {
        "researcher": "researcher",
        "coder": "coder",
        END: END,
    },
)

# Specialists always report back to supervisor
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")

# Compile
app = workflow.compile()
```

The flow: every request enters at the supervisor. The supervisor routes to a specialist. The specialist does work and returns to the supervisor. The supervisor checks the result and either routes again or finishes.
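To see this control flow without any API keys, the same loop can be simulated in plain Python with stub nodes. This is a conceptual sketch and not how LangGraph is implemented; the stub functions and `run_graph` are hypothetical, and the stub supervisor routes from simple state checks instead of asking an LLM.

```python
END = "__end__"  # stand-in sentinel for this sketch


def supervisor(state: dict) -> dict:
    """Stub supervisor: routes from state instead of asking an LLM."""
    if state["iteration_count"] >= 3:
        decision = "FINISH"
    elif not state["research_results"]:
        decision = "researcher"
    elif not state["code_output"]:
        decision = "coder"
    else:
        decision = "FINISH"
    return {"next_agent": decision, "iteration_count": state["iteration_count"] + 1}


def researcher(state: dict) -> dict:
    return {"research_results": "stub findings"}


def coder(state: dict) -> dict:
    return {"code_output": "print('hello')"}


NODES = {"supervisor": supervisor, "researcher": researcher, "coder": coder}
# Fixed edges: specialists always report back to the supervisor.
EDGES = {"researcher": "supervisor", "coder": "supervisor"}


def route_next(state: dict) -> str:
    return END if state["next_agent"] == "FINISH" else state["next_agent"]


def run_graph(state: dict, entry: str = "supervisor"):
    """Run nodes until the router returns END; also return the visit order."""
    current, trace = entry, []
    while current != END:
        trace.append(current)
        state = {**state, **NODES[current](state)}  # merge the partial update
        current = route_next(state) if current == "supervisor" else EDGES[current]
    return state, trace


final_state, trace = run_graph(
    {"next_agent": "", "research_results": "", "code_output": "", "iteration_count": 0}
)
print(" -> ".join(trace))
```

The supervisor appears between every specialist call, which is the hub-and-spoke shape the conditional edges and fixed edges encode.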
Step 5: Run It
```python
from langchain_core.messages import HumanMessage

result = app.invoke({
    "messages": [
        HumanMessage(content="Research the best Python web frameworks for building "
                     "REST APIs in 2026, then write a FastAPI hello-world with "
                     "health check endpoint and proper error handling.")
    ],
    "next_agent": "",
    "research_results": "",
    "code_output": "",
    "iteration_count": 0,
    "task_complete": False,
})

# Print the conversation
for msg in result["messages"]:
    print(f"\n{'='*60}")
    print(msg.content[:500])
```

A typical run looks like:

```text
[Supervisor] Routing to: researcher
[Researcher] Key findings: FastAPI remains the top choice for Python REST APIs...
[Supervisor] Routing to: coder
[Coder] Here's a FastAPI application with health check and error handling...
[Supervisor] Routing to: FINISH
```

Three nodes, one LLM call per specialist, and one routing call each time the supervisor runs: five LLM calls in total for this run. Total cost: roughly $0.03.
Adding Checkpointing
For production, you want to persist state so you can resume interrupted workflows.
```python
from langgraph.checkpoint.memory import MemorySaver

# Add checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Run with a thread ID for persistence
# (initial_state is the same input dict passed to app.invoke in Step 5)
config = {"configurable": {"thread_id": "task-001"}}
result = app.invoke(initial_state, config)

# Later: read back the latest checkpoint for this thread
state = app.get_state(config)
```

For production deployments, swap MemorySaver for a PostgreSQL- or Redis-backed checkpointer. Both are available as separate checkpointer packages (for example, langgraph-checkpoint-postgres).
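If the idea of a checkpointer keyed by thread ID is new, a toy version helps. The sketch below stores one JSON file per thread; `FileCheckpointer` is a hypothetical class written for illustration and does not implement LangGraph's checkpointer interface.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileCheckpointer:
    """Hypothetical JSON-file checkpoint store, one file per thread ID."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, thread_id: str) -> Path:
        return self.directory / f"{thread_id}.json"

    def save(self, thread_id: str, state: dict) -> None:
        # Write to a temp file first so a crash never leaves a half-written checkpoint.
        tmp = self._path(thread_id).with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self._path(thread_id))

    def load(self, thread_id: str) -> Optional[dict]:
        path = self._path(thread_id)
        return json.loads(path.read_text()) if path.exists() else None


store = FileCheckpointer(Path(tempfile.mkdtemp()))
store.save("task-001", {"next_agent": "coder", "iteration_count": 2})

# A fresh process could pick up where the last one stopped.
resumed = store.load("task-001")
print(resumed)
print(store.load("unknown-thread"))
```

A real checkpointer also has to serialize message objects and keep a history of checkpoints per thread, which is the part worth leaving to the library.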
How This Compares to Our Stack
In the Agent Tree system, we use a similar pattern but with raw Claude API calls and file-based state instead of LangGraph’s abstractions:
| Aspect | Agent Tree | LangGraph |
|---|---|---|
| State management | File-based (YAML/JSON) | In-memory graph state |
| Routing | Custom Python logic | Conditional edges |
| Checkpointing | Git commits + file snapshots | Built-in checkpointers |
| Model mixing | OpenRouter + local Ollama | LangChain model adapters |
| Error handling | Try/catch + iteration limits | Same, but graph-level |
| Deployment | Mac Mini + cron | LangGraph Cloud or self-hosted |
LangGraph is more structured and portable. Our approach is more flexible and cheaper (no LangChain overhead). The right choice depends on your constraints.
Production Considerations
Before shipping this to real users:
1. Add observability. Log every routing decision, every specialist invocation, every token count. We use LangSmith for LangGraph projects and custom logging for our Agent Tree stack.
2. Set token budgets. Not just iteration limits — track cumulative tokens. A researcher that returns 50,000 tokens of search results will blow up your downstream context.
3. Handle partial failures. If the researcher fails but the coder can still work with what’s available, let it. Don’t fail the whole graph because one node had a bad day.
4. Test the routing logic. The supervisor’s routing decisions are the most critical part. Write unit tests that verify routing for common input patterns.
5. Version your prompts. When you change a specialist’s system prompt, track the version. You’ll want to A/B test and roll back.
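Items 3 and 4 are the easiest to sketch in isolation. The example below pulls the supervisor's validation into a pure function that can be unit tested, and wraps a node so that an exception becomes a fallback update instead of a crashed graph. `normalize_route`, `with_fallback`, and the `last_error` key are hypothetical names introduced for illustration; `last_error` is not part of the `AgentState` schema above.

```python
VALID_ROUTES = ("researcher", "coder", "finish")


def normalize_route(raw: str) -> str:
    """Mirror the supervisor's validation: unknown output falls back to finish."""
    step = raw.strip().lower()
    return step if step in VALID_ROUTES else "finish"


def with_fallback(node, fallback_update: dict):
    """Wrap a node so an exception yields a fallback update instead of crashing."""

    def wrapped(state: dict) -> dict:
        try:
            return node(state)
        except Exception as exc:  # broad on purpose for this sketch
            return {**fallback_update, "last_error": f"{type(exc).__name__}: {exc}"}

    return wrapped


def flaky_researcher(state: dict) -> dict:
    raise TimeoutError("search API timed out")  # simulated failure


safe_researcher = with_fallback(flaky_researcher, {"research_results": ""})

# Routing checks as plain assertions; the same lines work inside pytest tests.
assert normalize_route("  Researcher\n") == "researcher"
assert normalize_route("FINISH") == "finish"
assert normalize_route("I think the coder should go next") == "finish"

update = safe_researcher({"messages": []})
print(update)
```

With the wrapper in place, the supervisor still gets control back after a failed search and can decide whether the coder has enough to work with.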
Full Source Code
The complete working example is available on GitHub:
github.com/JDDavenport/langgraph-multi-agent-example
Clone it, set your API keys, run python main.py. The README has setup instructions.
What’s Next
Next week’s Build of the Week: Tool-Calling Agents with Claude’s Computer Use. We’ll build an agent that can browse the web, fill out forms, and extract data from sites that don’t have APIs.
This is Issue #1 of the Agent Tree Army newsletter’s Build of the Week series. Subscribe to get these in your inbox every Friday.
About the author: JD Davenport runs 50+ AI agents on a Mac Mini and documents everything at docs.agenttree.army. Follow on LinkedIn for daily updates on building AI agent systems.