Build of the Week: Multi-Agent Orchestration with LangGraph

This is the first entry in our Build of the Week series, where we break down real multi-agent systems — not toy demos, not “hello world” chatbots. Working systems that solve actual problems.

This week: building a multi-agent orchestration system with LangGraph. We’ll wire up a supervisor agent that routes tasks to specialist workers, manages shared state, and handles failures gracefully.

Why LangGraph

LangGraph is LangChain’s framework for building stateful, multi-actor applications with LLMs. Unlike a simple linear chain, where each step feeds the next exactly once, LangGraph gives you:

Graph-based control flow — agents are nodes, edges define routing logic
Persistent state — shared state object that flows between nodes
Cycles and conditionals — agents can loop, retry, and branch
Built-in checkpointing — resume from any point in the graph

If you’ve built multi-agent systems with raw API calls (like we do in the Agent Tree stack), you know the pain of managing state, routing, and error handling yourself. LangGraph abstracts the hard parts while keeping you in control of the logic.

The Architecture

We’re building a system with three agents:

Supervisor — receives user requests, decides which specialist to invoke, verifies results
Researcher — searches the web, summarizes findings, returns structured data
Coder — writes and tests code based on specifications

User Request
    │
    ▼
┌──────────┐
│Supervisor│◄──────────────────────┐
│ (GPT-4o) │                       │
└────┬─────┘                       │
     │                             │
     ├─── "needs research" ───►┌───┴──────┐
     │                         │Researcher│
     │                         │ (Sonnet) │
     │                         └──────────┘
     │
     └─── "needs code" ──────►┌──────────┐
                               │  Coder   │
                               │ (Haiku)  │
                               └──────────┘

The supervisor doesn’t do the work. It decides who does the work, inspects the output, and either routes to the next step or sends feedback for another pass.

Step 1: Define the State

Every LangGraph application starts with a state schema. This is the shared context that flows between all nodes.

from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator

class AgentState(TypedDict):
    """Shared state across all agents in the graph."""
    messages: Annotated[Sequence[BaseMessage], operator.add]
    next_agent: str
    research_results: str
    code_output: str
    iteration_count: int
    task_complete: bool

Key design decisions here:

messages uses the operator.add reducer — new messages append rather than replace
next_agent is the routing key the supervisor sets
iteration_count prevents infinite loops (critical — we’ll set a max of 3)
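
To make the reducer behavior concrete, here is a minimal plain-Python sketch of the merge rule described above. It is an illustration only, not LangGraph's internals: apply_update and REDUCERS are hypothetical names, and plain strings stand in for message objects.

```python
import operator
from typing import Any, Callable, Dict

# Hypothetical table: keys listed here are combined, all others are overwritten.
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {
    "messages": operator.add,  # list + list, so new messages are appended
}

def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with a node's partial update merged in."""
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS and key in merged:
            merged[key] = REDUCERS[key](merged[key], value)
        else:
            merged[key] = value
    return merged

state = {"messages": ["user: build an API"], "next_agent": "", "iteration_count": 0}
state = apply_update(
    state,
    {
        "messages": ["[Supervisor] Routing to: researcher"],
        "next_agent": "researcher",
        "iteration_count": 1,
    },
)
print(state["messages"])    # both messages are kept
print(state["next_agent"])  # replaced by the update
```

This is why each node can return only the keys it changed: the messages list grows, while routing fields simply take the latest value.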

Step 2: Build the Specialist Agents

Each specialist is a function that takes the state, does its work, and returns an updated state.

The Researcher

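from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI  # used by the supervisor in Step 3
from langchain_core.messages import HumanMessage, AIMessage
from langchain_community.tools import TavilySearchResults

search_tool = TavilySearchResults(max_results=5)

# Claude models need the Anthropic adapter (pip install langchain-anthropic).
# ChatOpenAI would send this model ID to the OpenAI API, which rejects it.
research_llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",  # Good at synthesis
    temperature=0.1,
)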

def researcher_node(state: AgentState) -> dict:
    """Research agent — searches web and synthesizes findings."""
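    messages = state["messages"]
    # messages[-1] is the supervisor's "[Supervisor] Routing to: ..." note,
    # so search on the most recent user request instead.
    last_message = next(
        m.content for m in reversed(messages) if isinstance(m, HumanMessage)
    )

    # Execute search
    search_results = search_tool.invoke(last_message)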

    # Synthesize with LLM
    synthesis_prompt = f"""You are a research specialist. Synthesize these search results
    into a clear, structured summary. Include sources.

    Query: {last_message}
    Results: {search_results}

    Return a structured summary with key findings and source URLs."""

    response = research_llm.invoke([HumanMessage(content=synthesis_prompt)])

    return {
        "messages": [AIMessage(content=f"[Researcher] {response.content}")],
        "research_results": response.content,
    }

The Coder

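from langchain_anthropic import ChatAnthropic

# Claude models go through the Anthropic adapter, not ChatOpenAI.
# The model ID below is an example; use whichever Haiku ID your account lists.
coder_llm = ChatAnthropic(
    model="claude-3-5-haiku-20241022",  # Fast, cheap, good at defined tasks
    temperature=0.0,
)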

def coder_node(state: AgentState) -> dict:
    """Coding agent — writes code based on specifications."""
    messages = state["messages"]
    research = state.get("research_results", "")
    # The last message is the supervisor's routing note, so take the
    # requirements from the most recent user request instead.
    request = next(
        m.content for m in reversed(messages) if isinstance(m, HumanMessage)
    )

    code_prompt = f"""You are a coding specialist. Write clean, tested Python code
    based on the following conversation and research context.

    Research context: {research}

    Requirements from conversation:
    {request}

    Return the code with inline comments and a brief explanation of your approach.
    Include error handling and type hints."""

    response = coder_llm.invoke([HumanMessage(content=code_prompt)])

    return {
        "messages": [AIMessage(content=f"[Coder] {response.content}")],
        "code_output": response.content,
    }

Notice the model selection: Sonnet for research (needs synthesis and judgment), Haiku for coding (well-defined task, speed matters). This is the model routing strategy in action — use the cheapest model that can do the job well.

Step 3: Build the Supervisor

The supervisor is the brain. It looks at the current state and decides what happens next.

from langchain_core.messages import SystemMessage

supervisor_llm = ChatOpenAI(
    model="gpt-4o",  # Best reasoning for routing decisions
    temperature=0.0,
)

SUPERVISOR_PROMPT = """You are a task supervisor managing a team of specialists:
- Researcher: searches the web and synthesizes information
- Coder: writes and tests Python code

Given the conversation so far, decide the next step:
- If the task needs information gathering, route to "researcher"
- If the task needs code written and you have enough context, route to "coder"
- If the task is complete and verified, route to "FINISH"

Respond with ONLY one of: researcher, coder, FINISH"""

def supervisor_node(state: AgentState) -> dict:
    """Supervisor — routes tasks and verifies quality."""
    messages = state["messages"]
    iteration = state.get("iteration_count", 0)

    # Safety valve: max 3 iterations
    if iteration >= 3:
        return {
            "messages": [AIMessage(content="[Supervisor] Max iterations reached. Delivering current results.")],
            "next_agent": "FINISH",
            "iteration_count": iteration + 1,
            "task_complete": True,
        }

    response = supervisor_llm.invoke(
        [SystemMessage(content=SUPERVISOR_PROMPT)] + list(messages)
    )

    next_step = response.content.strip().lower()

    # Validate routing decision
    if next_step not in ("researcher", "coder", "finish"):
        next_step = "finish"  # Fail safe

    return {
        "messages": [AIMessage(content=f"[Supervisor] Routing to: {next_step}")],
        "next_agent": next_step.upper() if next_step == "finish" else next_step,
        "iteration_count": iteration + 1,
        "task_complete": next_step == "finish",
    }

The iteration cap at 3 is non-negotiable. In production, you also want a token budget limit — track cumulative tokens across all nodes and bail if you exceed a threshold.
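
A token budget can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not part of the original build: TokenBudget and TokenBudgetExceeded are hypothetical names, and the 20,000-token limit is arbitrary.

```python
from dataclasses import dataclass

class TokenBudgetExceeded(RuntimeError):
    """Raised when cumulative usage passes the configured limit."""

@dataclass
class TokenBudget:
    limit: int      # maximum tokens allowed across the whole run
    used: int = 0   # running total charged so far

    def charge(self, tokens: int) -> None:
        """Record usage from one LLM call and bail once the limit is passed."""
        self.used += tokens
        if self.used > self.limit:
            raise TokenBudgetExceeded(f"used {self.used} of {self.limit} tokens")

    @property
    def remaining(self) -> int:
        return max(self.limit - self.used, 0)

budget = TokenBudget(limit=20_000)
budget.charge(6_500)   # e.g. a researcher call
budget.charge(9_000)   # e.g. a coder call
print(budget.remaining)
```

In a real graph you would call charge after each LLM invocation, using whatever token counts your provider reports on the response, and have the supervisor route to FINISH when the exception is raised. How the counts are exposed varies by model adapter, so check yours before relying on it.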

Step 4: Wire Up the Graph

This is where LangGraph shines. You define nodes, edges, and conditional routing in a declarative graph.

from langgraph.graph import StateGraph, END

def route_next(state: AgentState) -> str:
    """Conditional edge — routes based on supervisor's decision."""
    next_agent = state.get("next_agent", "FINISH")
    if next_agent == "FINISH":
        return END
    return next_agent

# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("supervisor", supervisor_node)
workflow.add_node("researcher", researcher_node)
workflow.add_node("coder", coder_node)

# Set entry point
workflow.set_entry_point("supervisor")

# Add edges
workflow.add_conditional_edges(
    "supervisor",
    route_next,
    {
        "researcher": "researcher",
        "coder": "coder",
        END: END,
    },
)

# Specialists always report back to supervisor
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("coder", "supervisor")

# Compile
app = workflow.compile()

The flow: every request enters at the supervisor. The supervisor routes to a specialist. The specialist does work and returns to the supervisor. The supervisor checks the result and either routes again or finishes.
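
To see that loop without spending tokens, the same control flow can be dry-run in plain Python with stub nodes. This is a sketch under stated assumptions: run, make_scripted_supervisor, and the trace key are hypothetical helpers for illustration, not LangGraph APIs.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]

def make_scripted_supervisor(decisions: List[str]) -> Node:
    """Stub supervisor that replays a fixed list of routing decisions."""
    def supervisor(state: State) -> State:
        step = state["iteration_count"]
        choice = decisions[step] if step < len(decisions) else "FINISH"
        return {**state, "next_agent": choice, "iteration_count": step + 1}
    return supervisor

def stub_specialist(state: State) -> State:
    """Stand-in for a specialist: does no work and returns the state as-is."""
    return dict(state)

def run(nodes: Dict[str, Node], state: State, max_steps: int = 10) -> State:
    """Enter at the supervisor, follow next_agent, always return to the supervisor."""
    current = "supervisor"
    for _ in range(max_steps):
        state = nodes[current](state)
        state["trace"].append(current)
        if current == "supervisor":
            if state["next_agent"] == "FINISH":
                return state
            current = state["next_agent"]
        else:
            current = "supervisor"  # specialists always report back
    raise RuntimeError("graph did not finish within max_steps")

nodes = {
    "supervisor": make_scripted_supervisor(["researcher", "coder", "FINISH"]),
    "researcher": stub_specialist,
    "coder": stub_specialist,
}
final = run(nodes, {"next_agent": "", "iteration_count": 0, "trace": []})
print(" -> ".join(final["trace"]))
```

The trace alternates between the supervisor and a specialist, which is the shape the conditional edges and the two fixed edges above produce.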

Step 5: Run It

from langchain_core.messages import HumanMessage

result = app.invoke({
    "messages": [
        HumanMessage(content="Research the best Python web frameworks for building "
                     "REST APIs in 2026, then write a FastAPI hello-world with "
                     "health check endpoint and proper error handling.")
    ],
    "next_agent": "",
    "research_results": "",
    "code_output": "",
    "iteration_count": 0,
    "task_complete": False,
})

# Print the conversation
for msg in result["messages"]:
    print(f"\n{'='*60}")
    print(msg.content[:500])

A typical run looks like:

[Supervisor] Routing to: researcher
[Researcher] Key findings: FastAPI remains the top choice for Python REST APIs...
[Supervisor] Routing to: coder
[Coder] Here's a FastAPI application with health check and error handling...
[Supervisor] Routing to: FINISH

Three nodes, one LLM call per specialist, and one routing call per supervisor pass: five LLM calls and one web search for this run. Total cost for this run: roughly $0.03.

Adding Checkpointing

For production, you want to persist state so you can resume interrupted workflows.

from langgraph.checkpoint.memory import MemorySaver

# Add checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

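# Run with a thread ID for persistence
# (initial_state is the same input dict passed to app.invoke in Step 5)
config = {"configurable": {"thread_id": "task-001"}}
result = app.invoke(initial_state, config)

# Later: inspect the latest checkpoint saved for this thread
state = app.get_state(config)
# To continue an interrupted run, invoke again with the same config
# and None as the input: app.invoke(None, config)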

For production deployments, swap MemorySaver for a PostgreSQL or Redis-backed checkpointer. These ship as separate packages rather than in the core library (for example, langgraph-checkpoint-postgres), so install the one you need.

How This Compares to Our Stack

In the Agent Tree system, we use a similar pattern but with raw Claude API calls and file-based state instead of LangGraph’s abstractions:

Aspect	Agent Tree	LangGraph
State management	File-based (YAML/JSON)	In-memory graph state
Routing	Custom Python logic	Conditional edges
Checkpointing	Git commits + file snapshots	Built-in checkpointers
Model mixing	OpenRouter + local Ollama	LangChain model adapters
Error handling	try/except + iteration limits	Same, but graph-level
Deployment	Mac Mini + cron	LangGraph Cloud or self-hosted

LangGraph is more structured and portable. Our approach is more flexible and cheaper (no LangChain overhead). The right choice depends on your constraints.

Production Considerations

Before shipping this to real users:

1. Add observability. Log every routing decision, every specialist invocation, every token count. We use LangSmith for LangGraph projects and custom logging for our Agent Tree stack.

2. Set token budgets. Not just iteration limits — track cumulative tokens. A researcher that returns 50,000 tokens of search results will blow up your downstream context.

3. Handle partial failures. If the researcher fails but the coder can still work with what’s available, let it. Don’t fail the whole graph because one node had a bad day.

4. Test the routing logic. The supervisor’s routing decisions are the most critical part. Write unit tests that verify routing for common input patterns.

5. Version your prompts. When you change a specialist’s system prompt, track the version. You’ll want to A/B test and roll back.
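
For item 4, one way to make routing testable is to pull the reply parsing into a pure function and assert on it directly. The sketch below is a variant, not the supervisor code from Step 3: parse_route is a hypothetical helper that is more tolerant than an exact string match, and it fails safe to FINISH on anything ambiguous.

```python
import re

VALID_ROUTES = ("researcher", "coder", "finish")

def parse_route(raw: str) -> str:
    """Normalize a supervisor reply to 'researcher', 'coder', or 'FINISH'."""
    words = re.findall(r"[a-z]+", raw.lower())
    matches = {w for w in words if w in VALID_ROUTES}
    # Zero matches or more than one distinct match is treated as unsafe.
    if len(matches) != 1:
        return "FINISH"
    route = matches.pop()
    return "FINISH" if route == "finish" else route

def test_parse_route() -> None:
    assert parse_route("researcher") == "researcher"
    assert parse_route("Coder.") == "coder"          # punctuation and case
    assert parse_route(" FINISH\n") == "FINISH"      # stray whitespace
    assert parse_route("Route to: researcher") == "researcher"
    assert parse_route("researcher or coder") == "FINISH"  # ambiguous
    assert parse_route("") == "FINISH"                     # empty reply

test_parse_route()
print("routing tests passed")
```

Tests like these run in milliseconds and need no API keys, so they can guard every prompt change. Whether to tolerate extra words in the reply or reject them outright is a design choice; the strict version in Step 3 is equally defensible.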

Full Source Code

The complete working example is available on GitHub:

github.com/JDDavenport/langgraph-multi-agent-example

Clone it, set your API keys, run python main.py. The README has setup instructions.

What’s Next

Next week’s Build of the Week: Tool-Calling Agents with Claude’s Computer Use — building an agent that can browse the web, fill out forms, and extract data from sites that don’t have APIs.

This is Issue #1 of the Agent Tree Army newsletter’s Build of the Week series. Subscribe to get these in your inbox every Friday.

About the author: JD Davenport runs 50+ AI agents on a Mac Mini and documents everything at docs.agenttree.army. Follow on LinkedIn for daily updates on building AI agent systems.