The Structured-Stream Pivot

Tier 3 · Real Builds · 9 min read

Before this, read:

The cockpit reckoning: PTY scraping was wrong — the diagnosis that made this pivot necessary

The research synthesis on May 29, 2026 said what the fix was. The cockpit-chat-v3 project (scoped, scaffolded, and built from May 29 onward) was that fix. This article is about what the pivot actually looked like — the event format, the ingest pipeline, the Supabase schema, the timeline renderer — and why each piece is designed the way it is.

What `stream-json` gives you

Running Claude Code in non-interactive mode:

claude -p --output-format stream-json "your task here"

This does not start the interactive TUI. Instead, it runs the agent headlessly and emits a newline-delimited JSON stream to stdout. Each line is a complete JSON object, typed by its type field.

The event types that matter:

{"type": "system", "subtype": "init", "apiKeySource": "none", ...}
{"type": "assistant", "message": {"content": [{"type": "text", "text": "..."}, ...]}, ...}
{"type": "tool_use", "name": "Read", "input": {...}, "id": "tool_abc123"}
{"type": "tool_result", "tool_use_id": "tool_abc123", "content": [...]}
{"type": "result", "subtype": "success", "total_cost_usd": 0.16, "stop_reason": "end_turn"}

Every message, every tool call and its result, every cost figure — structured, typed, machine-readable. No ANSI codes. No cursor movement. No visual rendering assumptions.
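As a rough illustration of what "machine-readable" buys you, here is a minimal Python sketch that reads such a stream line by line and tallies events by type. The sample lines are hand-written stand-ins modelled on the shapes above, not captured output, and the helper names (parse_stream, summarize) and the file_path input are hypothetical.

```python
import json
from collections import Counter

# Hand-written sample lines shaped like the events above (not real captured output).
SAMPLE = [
    '{"type": "system", "subtype": "init", "apiKeySource": "none"}',
    '{"type": "assistant", "message": {"content": [{"type": "text", "text": "Reading the file."}]}}',
    '{"type": "tool_use", "name": "Read", "input": {"file_path": "README.md"}, "id": "tool_abc123"}',
    '{"type": "tool_result", "tool_use_id": "tool_abc123", "content": []}',
    '{"type": "result", "subtype": "success", "total_cost_usd": 0.16, "stop_reason": "end_turn"}',
]


def parse_stream(lines):
    """Yield one dict per non-blank line of newline-delimited JSON."""
    for raw in lines:
        raw = raw.strip()
        if raw:
            yield json.loads(raw)


def summarize(events):
    """Count events by their type field and pick up the cost from the result event."""
    counts = Counter()
    cost = None
    for ev in events:
        counts[ev.get("type", "unknown")] += 1
        if ev.get("type") == "result":
            cost = ev.get("total_cost_usd")
    return dict(counts), cost


counts, cost = summarize(parse_stream(SAMPLE))
```

In a real pipeline the lines would come from the subprocess's stdout rather than a list, but the parsing step stays the same: one json.loads per line, then a dispatch on type.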

The apiKeySource: "none" in the init event is the Max-plan signal. When Claude Code authenticates via JD’s interactive OAuth session rather than an API key, apiKeySource reports no key ("none", as in the init event above) instead of naming a key source. The event ingest pipeline uses this to distinguish Max-plan runs from API-key runs for cost attribution.
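The attribution check can be sketched in a few lines. This is an assumption-laden illustration, not the pipeline's actual code: the function name billing_source and the labels "max_plan" and "api_key" are hypothetical, the "ANTHROPIC_API_KEY" value in the usage lines is only a placeholder, and treating a missing field, null, and "none" identically is a defensive choice made here.

```python
def billing_source(init_event):
    """Classify a run from its init event (hypothetical labels).

    Assumption: a missing, null, or "none" apiKeySource all mean that
    no API key was used, i.e. the run authenticated via the Max plan.
    """
    source = init_event.get("apiKeySource")
    if source is None or source == "none":
        return "max_plan"
    return "api_key"


max_run = billing_source({"type": "system", "subtype": "init", "apiKeySource": "none"})
key_run = billing_source({"type": "system", "subtype": "init", "apiKeySource": "ANTHROPIC_API_KEY"})
```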

The event store

The Supabase table agent_run_events stores every event from every run:

CREATE TABLE agent_run_events (
  run_id      uuid REFERENCES agent_runs(id),
  seq         integer,       -- positional sequence within run
  event_type  text,          -- 'system', 'assistant', 'tool_use', 'tool_result', 'result'
  payload     jsonb,         -- the raw event object
  created_at  timestamptz DEFAULT now()
);

The seq column, scoped by run_id, is the idempotency key. When the bridge re-tails a JSONL file (by byte offset, crash-tolerant), it can re-insert events in order without creating duplicates, because the positional sequence is stable across re-reads.
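To make the idempotency argument concrete, here is a small sketch using Python's built-in sqlite3 as a stand-in for the Supabase Postgres table. Two things are assumptions: the composite (run_id, seq) primary key, which the schema listing above does not show, and the helper names. SQLite's INSERT OR IGNORE plays the role that INSERT ... ON CONFLICT DO NOTHING would play in Postgres.

```python
import json
import sqlite3


def make_store():
    """In-memory stand-in for agent_run_events (SQLite, not Supabase)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE agent_run_events (
               run_id     TEXT NOT NULL,
               seq        INTEGER NOT NULL,
               event_type TEXT,
               payload    TEXT,
               PRIMARY KEY (run_id, seq)  -- assumed uniqueness constraint
           )"""
    )
    return db


def ingest(db, run_id, events):
    """Insert events by position; rows that already exist are skipped."""
    for seq, ev in enumerate(events):
        db.execute(
            "INSERT OR IGNORE INTO agent_run_events VALUES (?, ?, ?, ?)",
            (run_id, seq, ev.get("type"), json.dumps(ev)),
        )
    db.commit()


def count_events(db, run_id):
    row = db.execute(
        "SELECT COUNT(*) FROM agent_run_events WHERE run_id = ?", (run_id,)
    ).fetchone()
    return row[0]


db = make_store()
events = [{"type": "system"}, {"type": "assistant"}, {"type": "result"}]
ingest(db, "run-1", events)
ingest(db, "run-1", events)  # simulated re-read after a crash
total = count_events(db, "run-1")
```

Ingesting the same three events twice leaves three rows, because the second pass collides on (run_id, seq) and is ignored.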

The companion table agent_runs holds run-level metadata:

CREATE TABLE agent_runs (
  id            uuid PRIMARY KEY,
  cc_session_id text,          -- Claude Code session id for M5 resume
  status        text,          -- 'running', 'done', 'failed', 'timed_out'
  cost_usd      numeric,
  started_at    timestamptz,
  finished_at   timestamptz
);

The M2 milestone (2026-05-29, PR #93) shipped this schema and the event ingest pipeline. It was built against real captured stream-json output: actual JSONL files from live Claude Code runs on the system, not the documentation. The event ingest handled the edge cases (rate-limit events, compaction turns, meta-only content blocks) before any timeline rendering was built on top.

The transcript source

For domain brains and sessions that run in interactive mode (not claude -p), the structured data source is the Claude Code session transcript: ~/.claude/projects/<enc-cwd>/<session_id>.jsonl.

Claude Code writes every session turn to this JSONL file automatically. The bridge tails it by byte-offset — reading from the last known position on each poll, appending new events to the Supabase store. This is idempotent: a crash mid-read just re-reads from the last committed byte-offset on restart.
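A byte-offset tail can be sketched as below. This is an illustration of the technique under stated assumptions, not the bridge's code: the function name read_new_events is hypothetical, and the rule that only newline-terminated lines are consumed (so a half-written line is left for the next poll) is a choice made here.

```python
import json
import os
import tempfile


def read_new_events(path, offset):
    """Return (events, new_offset) for complete lines found after offset.

    A trailing line without a newline is treated as still being written:
    it is not parsed and its bytes are not counted, so the next poll
    re-reads it once it is complete.
    """
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    *complete, _partial = chunk.split(b"\n")
    events = [json.loads(line) for line in complete if line.strip()]
    consumed = sum(len(line) + 1 for line in complete)  # +1 for each newline
    return events, offset + consumed


# Demo: two complete lines plus one line caught mid-write.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.jsonl")
    with open(path, "wb") as fh:
        fh.write(b'{"n": 1}\n{"n": 2}\n{"n": ')
    first, offset = read_new_events(path, 0)
    with open(path, "ab") as fh:
        fh.write(b"3}\n")  # the writer finishes the third line
    second, offset = read_new_events(path, offset)
```

The offset only advances past lines that were fully parsed, so persisting it after each successful insert gives the restart behaviour described above: a crash re-reads from the last committed position.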

The amux research (M0 spike) found that this transcript file is the right data source for interactive sessions, while claude -p --output-format stream-json stdout is the right data source for headless/scripted runs. The bridge handles both.

The timeline renderer

M3 (PR #120, merged and prod-deployed) shipped the structured event timeline surface: a new /runs/[runId] page and the supporting API routes.

The rendering approach:

Assistant text → markdown bubble (user-facing prose rendered with a markdown parser)
Tool calls → collapsible cards, paired by tool_use_id with their corresponding result
Tool call details → file paths, bash commands, edit diffs — collapsed by default, expandable
Result event → a “Done” divider with cost and stop reason
Error → error-styled card
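The mapping above can be sketched as a single pass over the event list. This is a simplified illustration rather than the M3 renderer: the item shapes ("bubble", "tool_card", "done", "error") are hypothetical names, and treating a result event whose subtype is not "success" as the error case is an assumption.

```python
def build_timeline(events):
    """Turn raw events into render items, pairing tool calls with results."""
    items = []
    open_cards = {}  # tool_use id -> card still waiting for its result
    for ev in events:
        kind = ev.get("type")
        if kind == "assistant":
            for block in ev.get("message", {}).get("content", []):
                if block.get("type") == "text":
                    items.append({"kind": "bubble", "markdown": block["text"]})
        elif kind == "tool_use":
            card = {
                "kind": "tool_card",
                "name": ev.get("name"),
                "input": ev.get("input"),
                "result": None,
                "collapsed": True,  # details are collapsed by default
            }
            open_cards[ev.get("id")] = card
            items.append(card)
        elif kind == "tool_result":
            card = open_cards.pop(ev.get("tool_use_id"), None)
            if card is not None:
                card["result"] = ev.get("content")
        elif kind == "result":
            # Assumption: any non-success subtype renders as an error card.
            item_kind = "done" if ev.get("subtype") == "success" else "error"
            items.append({
                "kind": item_kind,
                "cost_usd": ev.get("total_cost_usd"),
                "stop_reason": ev.get("stop_reason"),
            })
    return items


timeline = build_timeline([
    {"type": "assistant", "message": {"content": [{"type": "text", "text": "Reading."}]}},
    {"type": "tool_use", "name": "Read", "input": {"file_path": "README.md"}, "id": "tool_abc123"},
    {"type": "tool_result", "tool_use_id": "tool_abc123", "content": [{"type": "text", "text": "ok"}]},
    {"type": "result", "subtype": "success", "total_cost_usd": 0.16, "stop_reason": "end_turn"},
])
```

Keeping the card in the item list and filling in its result later means a tool call still renders while it is in flight, with the result slotting in once the matching tool_result arrives.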

This is what the CHANGELOG called “no xterm” — the timeline page has no terminal emulator. It is a structured HTML rendering of the agent’s work, built entirely from the event stream.

The cursor-based API (GET /api/runs/[id]/events?since=<seq>) lets the frontend poll for new events without re-fetching the full history on every tick. The SSE endpoint (GET /api/runs/[id]/stream) streams id-cursored frames in real time. A client that loses connection re-subscribes from its last known seq — no replay-from-beginning, no lost events, no reconnect storm.
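The cursor logic on both endpoints reduces to "give me everything after seq N". A minimal sketch, assuming an exclusive cursor (seq > since) and a plain in-memory list in place of the Supabase query; the function names are hypothetical. The frame layout (an id: line, a data: line, a blank line) is the standard Server-Sent Events wire format, which is what lets a reconnecting client report the last id it saw.

```python
import json


def events_since(events, since):
    """Return stored events strictly after the cursor (assumed exclusive)."""
    return [ev for ev in events if ev["seq"] > since]


def sse_frame(ev):
    """Format one event as a Server-Sent Events frame with seq as its id."""
    return f"id: {ev['seq']}\ndata: {json.dumps(ev['payload'])}\n\n"


STORE = [
    {"seq": 0, "payload": {"type": "system"}},
    {"seq": 1, "payload": {"type": "assistant"}},
    {"seq": 2, "payload": {"type": "tool_use"}},
    {"seq": 3, "payload": {"type": "result"}},
]

# A client that last saw seq 1 reconnects and receives only seq 2 and 3.
resumed = [sse_frame(ev) for ev in events_since(STORE, 1)]
```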

Clean render vs raw toggle

For domain brains and interactive sessions, the cockpit’s “clean render” (shipped 2026-06-01, CHANGELOG: “last centerpiece item”) reads the session JSONL and renders it as chat bubbles rather than raw terminal output.

The pipeline:

Bridge transcript endpoint parses ~/.claude/projects/<enc-cwd>/<session_id>.jsonl
Parser extracts assistant text, tool calls, tool results, and user messages — 11 event types handled
Frontend renders these as chat bubbles: user messages, assistant responses, tool cards

The clean⇄raw toggle keeps the raw xterm.js terminal mounted in the background. Flipping to raw gives you the interactive PTY — the /model menu, the visual feedback of a real terminal, the interactive input. Flipping back to clean gives you the readable transcript.

The result (from the live verification, 2026-06-01): the AI Foundry domain brain rendered 53 clean turns — user messages, assistant responses, tool bubbles — with zero terminal garbage. The raw toggle reached the interactive xterm within the same pane. Both modes coexist without the raw terminal contaminating the clean view.

Why Supabase as liveness store

The alternative was in-memory state in the bridge server. The argument against it:

If the bridge process restarts (cron restart, code deploy, Mac Mini reboot), in-memory state is gone. The cockpit goes blank. Every domain brain’s transcript history disappears until the PTY re-emits it.

With Supabase as the liveness store, a bridge restart means the cockpit reads from Supabase and shows everything up to the restart immediately. New events start streaming as the bridge resumes tailing. The gap is bounded by how long the bridge was down, not by how much transcript history needs to be replayed.

This is the same principle as the files-as-state doctrine applied to the agent presentation layer: persistent state goes to a durable store, not to process memory.

The June 15 billing reckoning

The M2 billing-verify research (May 29, PR synthesis) uncovered a deadline that made the structured-stream pivot urgent beyond the reliability argument. As of June 15, 2026, claude -p / Agent SDK / third-party harnesses would stop counting against the interactive Max plan limits and instead draw from a capped monthly programmatic credit ($100 for Max5x, $200 for Max20x, no rollover). Exhausted credits mean the agent halts.

This affected every cron job in the system, not just the cockpit. The billing-verdict document (M2-billing-verdict.md) changed the milestone status from “deferred” to “required before June 15” for the CLI-to-API seam work. Having the structured event store already in place meant cost attribution from claude -p runs was trackable per-run — the cockpit’s cost HUD became a billing dashboard, not just a curiosity.
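With per-run costs in the store, the budget arithmetic is simple. The sketch below assumes the monthly credit figures quoted above and a list of per-run cost_usd values for the current month; the function name remaining_credit is hypothetical, and Decimal is used only to avoid float drift when summing dollar amounts.

```python
from decimal import Decimal

# Monthly programmatic credit per plan, as quoted above (no rollover).
MONTHLY_CREDIT_USD = {"Max5x": Decimal("100"), "Max20x": Decimal("200")}


def remaining_credit(plan, run_costs):
    """Subtract this month's per-run costs from the plan's monthly credit."""
    spent = sum((Decimal(str(cost)) for cost in run_costs), Decimal("0"))
    return MONTHLY_CREDIT_USD[plan] - spent


left = remaining_credit("Max5x", [0.16, 1.84])
```

Two runs costing $0.16 and $1.84 leave $98.00 of a Max5x month's credit; a cost HUD can show that number and warn before the credit runs out and agents halt.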

Next: The bridge: tunnels, /key, resume — how the Cloudflare named tunnel, raw-key endpoint, and resume-from-dead mechanism work.