The Bridge: Tunnels, /key, Resume

Tier 3 · Real Builds 8 min read

Before this, read:

The structured-stream pivot — the bridge manages the processes that produce the structured stream
Nerve Center v5: the web cockpit — the cockpit is the bridge’s primary consumer

The agents run on a Mac Mini at home. The cockpit is deployed on Vercel. Those are two different machines, and Vercel’s serverless infrastructure cannot open a TCP connection to the Mac Mini directly. The bridge is what connects them.

What the bridge is

The bridge is a FastAPI server (~/agent-system/bridge/) running persistently on the Mac Mini. It manages:

Claude Code PTY sessions — spawning, monitoring, and killing Claude Code processes
Transcript tailing — reading session JSONL files and piping events to Supabase and live SSE clients
Input routing — receiving messages from the cockpit and forwarding them to the appropriate PTY
Auth — HMAC JWT, per-agent allowlist, cwd traversal guard
Session state — syncing session status (starting/live/exited/crashed) to Supabase for the cockpit’s sidebar and status badges
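To make the transcript-tailing item concrete, here is a minimal sketch of reading newly appended events from a JSONL file. The helper name `read_new_events` and the byte-offset approach are assumptions for illustration, not the bridge's actual tailer.

```python
import json
from pathlib import Path


def read_new_events(path: Path, offset: int) -> tuple[list[dict], int]:
    """Return JSON events appended after `offset`, plus the new offset.

    Hypothetical helper: only complete (newline-terminated) lines are
    consumed, so a half-written record is left for the next call.
    """
    events: list[dict] = []
    with path.open("rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line: the writer is mid-record
            offset += len(raw)
            line = raw.strip()
            if line:
                events.append(json.loads(line))
    return events, offset
```

A caller would store the returned offset per session and pass it back on the next poll, so each event is forwarded once.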

The bridge runs under a launchd plist (com.clawd.bridge) so it restarts on crash and starts at boot. scripts/bridge-reload.sh restarts it without killing any active PTY sessions (graceful reload); scripts/bridge-redeploy.sh handles the code deploy + bootstrap race.

The Cloudflare named tunnel

Early versions of the bridge used dynamic Cloudflare tunnel URLs that changed on every restart. The cockpit’s BRIDGE_URL environment variable had to be manually updated after every bridge restart or tunnel outage, which broke the cockpit silently until someone noticed.

The fix: a named tunnel. A named Cloudflare tunnel has a persistent subdomain (bridge.jddavenport.dev) regardless of which process is running the tunnel on which machine. Once configured, BRIDGE_URL is set-once in Vercel’s environment. The tunnel flaps; the URL stays the same. The cockpit doesn’t know or care.

The named tunnel is one of two things in the original cockpit-chat-v3 PRD that were scoped as “JD’s hands” items — things that required a physical browser interaction (Cloudflare and Namecheap configuration). Everything else ships from the agent system.

Session spawn and the PTY allowlist

The bridge enforces a per-agent cwd allowlist. When the cockpit requests a new session:

POST /api/sessions
{
  "agent": "ai-foundry",
  "prompt": "Good morning — what's on your agenda today?"
}

The bridge looks up the ai-foundry agent in its allowlist, resolves the allowed working directory, and spawns claude as a subprocess with that cwd and the initial prompt. Requests for an agent that is not in the allowlist are rejected with a 400.
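The allowlist lookup and cwd traversal guard could look roughly like the sketch below. The allowlist contents, the `SpawnRejected` exception, and the optional `subdir` parameter are all hypothetical; the request shown above carries only an agent name and a prompt.

```python
from pathlib import Path

# Hypothetical allowlist: agent name -> permitted working directory.
AGENT_CWD_ALLOWLIST = {
    "ai-foundry": Path("/Users/example/agents/ai-foundry"),
}


class SpawnRejected(Exception):
    """Raised when a spawn request fails the allowlist or cwd check."""


def resolve_cwd(agent: str, subdir: str = "") -> Path:
    """Return the working directory for `agent`, or raise SpawnRejected."""
    root = AGENT_CWD_ALLOWLIST.get(agent)
    if root is None:
        raise SpawnRejected(f"agent {agent!r} is not in the allowlist")
    root = root.resolve()
    candidate = (root / subdir).resolve()
    # Traversal guard: the resolved path must stay inside the agent's root.
    if candidate != root and root not in candidate.parents:
        raise SpawnRejected(f"cwd {candidate} escapes {root}")
    return candidate
```

In a FastAPI handler, catching `SpawnRejected` and raising `HTTPException(status_code=400)` would produce the rejection described above.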

The spawn rate limiter prevents the cockpit from spawning sessions faster than the Mac Mini can handle. This limit exists because of a real incident: in early June 2026, the cockpit was spawning ad-hoc sessions for every domain-page open, with no cap. The result was 24 live PTY sessions, a Mac Mini load average of 133, and a flapping Cloudflare tunnel that took the cockpit down entirely. The fix combined the rate limiter with the session reaper (one live session per domain, parked when idle).
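One common way to build such a limiter is a sliding window over recent spawn timestamps. The sketch below is an assumption about the shape, not the bridge's code; the class name and the limits passed to it are illustrative, since the source does not state the real values.

```python
import time
from collections import deque


class SpawnRateLimiter:
    """Allow at most `max_spawns` within any `window_s`-second window."""

    def __init__(self, max_spawns: int, window_s: float, clock=time.monotonic):
        self.max_spawns = max_spawns
        self.window_s = window_s
        self._clock = clock      # injectable so tests can control time
        self._stamps = deque()   # timestamps of accepted spawns, oldest first

    def try_acquire(self) -> bool:
        """Record a spawn and return True, or return False if over the limit."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_spawns:
            return False
        self._stamps.append(now)
        return True
```

A spawn handler would call `try_acquire()` before starting a PTY and return an error response when it yields `False`.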

The /key endpoint

For interactive sessions running in PTY mode, user input is sent to the agent via the /key endpoint:

POST /api/sessions/{sid}/key
{"bytes": 13}     # 13 = Enter

Raw byte values (ASCII or escape sequences) are forwarded directly to the PTY’s stdin. This is how the interactive /model menu works in the cockpit — the “Change model” button sends \x1b[A (up arrow) and \x0d (enter) as raw bytes.

It’s a low-level interface by design. The clean bubble-render surface uses the structured transcript; the /key endpoint is for the opt-in raw-terminal mode where you need to interact with the TUI directly.

The /input endpoint (higher-level) sends a full text string followed by a carriage return — used for the normal chat composer.
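The difference between the two endpoints comes down to which bytes reach the PTY. The sketch below shows the byte sequences involved; the helper names are hypothetical, and the key codes are the standard terminal values (Enter is 0x0d, decimal 13, as in the request above).

```python
# Byte sequences a terminal sends for a few keys (standard ANSI values).
KEY_BYTES = {
    "enter": b"\r",      # 0x0d, decimal 13
    "escape": b"\x1b",
    "up": b"\x1b[A",
    "down": b"\x1b[B",
}


def key_sequence(*keys: str) -> bytes:
    """Concatenate the raw bytes for a series of named keys (/key style)."""
    return b"".join(KEY_BYTES[k] for k in keys)


def composer_input(text: str) -> bytes:
    """Bytes for an /input-style call: the text plus a carriage return."""
    return text.encode("utf-8") + b"\r"
```

For example, `key_sequence("up", "enter")` yields the bytes the "Change model" button is described as sending, while `composer_input("hello")` yields the text followed by a single carriage return.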

Resume from dead

The session reaper parks idle sessions by marking them as exited in Supabase and stopping the PTY process, while keeping the transcript. When a user opens the domain pane:

The cockpit checks the Supabase chat_sessions table for a resumable flag
If the session is resumable (exited cleanly), the cockpit shows an amber “Resume” button
On click: POST /api/sessions/{sid}/resume → bridge calls claude --resume {cc_session_id} → the session continues from where it left off, with full context
The cockpit polls Supabase until the session transitions from starting to live

The cc_session_id is Claude Code’s own internal session identifier, stored in the agent_runs table when the session is first spawned. Resuming with this ID tells Claude Code to restore the full conversation context from its own JSONL transcript — the agent picks up mid-thought, not from a blank start.
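The bridge-side resume check can be sketched as a small pure function. The `SessionRow` type is a hypothetical merge of the fields mentioned above (the text places the resumable flag in chat_sessions and cc_session_id in agent_runs); only the `claude --resume` invocation comes from the flow described.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionRow:
    """Hypothetical subset of the stored session state."""
    sid: str
    status: str                   # starting / live / exited / crashed
    resumable: bool
    cc_session_id: Optional[str]


def resume_argv(row: SessionRow) -> list[str]:
    """Return the argv for resuming `row`, or raise ValueError if it cannot be resumed."""
    if row.status in ("starting", "live"):
        raise ValueError(f"session {row.sid} is already {row.status}")
    if not row.resumable or not row.cc_session_id:
        raise ValueError(f"session {row.sid} is not resumable")
    return ["claude", "--resume", row.cc_session_id]
```

Returning an argv list rather than a shell string keeps the session ID out of shell interpolation when the process is spawned.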

Keep-alive panes and the session reaper

The session reaper (session_reaper.py, 5-minute cron) enforces a simple invariant: at most one live session per domain, plus the CEO. If a domain has a second live session, the reaper kills the older one. Any session idle for longer than the configured threshold (true idle: no PTY writes, not just no user input) gets parked.

“Parked” means suspended, not deleted. The session record stays in Supabase with status parked. The cockpit shows parked sessions as resumable. The PTY process is gone; the transcript is not.
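The reaper's decision logic can be expressed as a pure planning step, separate from the code that kills or parks anything. This is a sketch under assumptions: the `LiveSession` fields and the `plan_reap` name are hypothetical, and the CEO session is treated as just another domain key.

```python
from dataclasses import dataclass


@dataclass
class LiveSession:
    sid: str
    domain: str            # "ceo" is treated as its own domain here (assumption)
    started_at: float      # epoch seconds
    last_pty_write: float  # epoch seconds of the most recent PTY output


def plan_reap(sessions: list[LiveSession], now: float, idle_s: float):
    """Return (kill, park): sids of older duplicates and of idle survivors."""
    newest: dict[str, LiveSession] = {}
    kill: list[str] = []
    for s in sorted(sessions, key=lambda s: s.started_at):
        prev = newest.get(s.domain)
        if prev is not None:
            kill.append(prev.sid)  # an older session for the same domain
        newest[s.domain] = s
    # Idleness is measured from PTY output, not from user input.
    park = [s.sid for s in newest.values() if now - s.last_pty_write > idle_s]
    return kill, park
```

Keeping the plan separate from its side effects makes the invariant easy to test without spawning real processes.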

The pane itself in the cockpit stays “mounted” (the xterm.js terminal component is kept in the DOM) even when the session is parked. This means reconnecting is near-instant — the component doesn’t need to reinitialize, it just resumes the SSE stream when the session re-activates.

Deduplication and the 24-PTY incident

Before the deduplication logic (reconcile_dedupe) was added, the bridge had no defense against multiple PTY processes running for the same session. A cockpit tab opened, the network blipped, the tab reconnected, and now there were two PTY processes for the same session, each generating separate events and each backed by its own CPU-bound Claude Code process.

reconcile_dedupe runs at bridge startup and periodically: it queries all live PTY processes, groups by cc_session_id, and kills duplicates (keeping the most recently active one). The 2026-06-01 CHANGELOG entry records it collapsing 13 sessions to 9, killing 4 duplicate PTYs on a single reconcile pass.
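The grouping step can be sketched as follows. The `PtyProc` type and `find_duplicates` name are hypothetical; the logic mirrors the description above: group by cc_session_id and keep the most recently active process in each group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PtyProc:
    pid: int
    cc_session_id: str
    last_active: float  # epoch seconds


def find_duplicates(procs: list[PtyProc]) -> list[int]:
    """Return pids to kill, keeping the most recently active PTY per session."""
    groups: dict[str, list[PtyProc]] = defaultdict(list)
    for p in procs:
        groups[p.cc_session_id].append(p)
    doomed: list[int] = []
    for group in groups.values():
        keeper = max(group, key=lambda p: p.last_active)
        doomed.extend(p.pid for p in group if p is not keeper)
    return doomed
```

The caller would then terminate each returned pid; a pass over 13 processes spanning 9 distinct session IDs would return 4 pids.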

The combination of the reaper and the reconciler keeps the PTY count bounded. At idle with 8 domain brains registered, the expectation is 0 active PTYs (all parked) and the ability to resume any of them within a second.

What the bridge is not

The bridge is an infrastructure component, not a product. It has no user interface. It’s not where you configure the agent system or manage cron jobs. Its job is to be the plumbing between the Vercel cockpit and the local Claude Code processes — durable, auth’d, rate-limited, and invisible when it’s working correctly.

When the bridge goes down (it does), the cockpit shows errors. The agents themselves — the cron jobs, the Telegram daemon, the domain heartbeats — keep running. They don’t need the bridge. The bridge is only needed for the cockpit’s live views. Everything else runs directly on the Mac Mini without it.

Next: Clean chat rendering — parsing Claude Code JSONL into readable chat bubbles, the clean⇄raw toggle, and dropping compaction leaks.