The Cockpit Reckoning: PTY Scraping Was Wrong

Tier 3 · Real Builds · 10 min read

Before this, read:

Nerve Center v5: the web cockpit. You need to know what the cockpit is now to understand what it used to be.

This article is about a mistake. Not a small configuration error or a missed edge case, but a foundational architectural mistake that, for months, produced every class of reliability bug the cockpit had. The mistake was fixable, and it was fixed. But it’s worth understanding exactly what went wrong, because the same mistake is easy to make, and the reasoning behind it sounded sensible at the time.

What the cockpit originally did

Claude Code runs in a terminal. It’s an interactive TUI (text user interface): it renders colored output, spinner animations, progress indicators, and tool-call blocks using ANSI escape sequences. In a real terminal the result looks clean and intentional, and the ANSI sequences are what make it look that way.

The original cockpit approach: attach to the Claude Code process via a PTY (pseudo-terminal), capture the raw byte stream of everything Claude Code wrote to that PTY, and replay it in the browser using xterm.js — a terminal emulator for the web. The browser xterm rendered the exact same output you’d see in a real terminal.

This sounded right. You’re watching a terminal agent. A terminal emulator in the browser is exactly the right tool for watching a terminal agent. It’s even what several existing open-source “browser terminal” projects do.

It was wrong.

Why it was wrong

The problem is that Claude Code’s TUI output is designed for human eyes in a terminal, not for machine consumption in a browser. When you attach to a TUI via PTY and replay the byte stream in a browser xterm, you inherit every characteristic of both the interactive TUI format and the browser rendering environment:

Terminal width coupling. The Claude Code TUI renders output based on the terminal’s column width. When the browser xterm is a different width than the PTY’s configured width, output wraps differently, tool-call boxes overflow or truncate, and the rendered output is mangled. You can’t reliably resize a PTY mid-session without corruption.
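To make the width coupling concrete, here is a toy model. The helpers `draw_box` and `soft_wrap` are hypothetical, not Claude Code or xterm.js code: a TUI draws a tool-call box sized for a 60-column PTY, and a 40-column browser view soft-wraps every row that doesn’t fit. It ignores cursor-addressed redraws, which make real corruption worse, so treat it as an illustration of the mechanism only.

```python
def draw_box(text: str, cols: int) -> list[str]:
    """Draw a one-line box exactly `cols` characters wide, the way a TUI
    sized to its PTY might (hypothetical helper)."""
    inner = cols - 2
    return [
        "┌" + "─" * inner + "┐",
        "│" + text[:inner].ljust(inner) + "│",
        "└" + "─" * inner + "┘",
    ]


def soft_wrap(rows: list[str], cols: int) -> list[str]:
    """Mimic a terminal that is `cols` wide: any row longer than that
    spills onto the following screen row (hypothetical helper)."""
    out = []
    for row in rows:
        for start in range(0, max(len(row), 1), cols):
            out.append(row[start:start + cols])
    return out


box = draw_box("Read(src/app.ts)", cols=60)  # laid out for a 60-column PTY
screen = soft_wrap(box, cols=40)             # shown in a 40-column browser view
for row in screen:
    print(row)
```

Each 60-character row becomes two screen rows, so the box’s right edge lands on a row of its own. Nothing in the byte stream is wrong; the stream simply encodes a layout for a width the viewer doesn’t have.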

ANSI escape sequence brittleness. The output stream includes escape sequences for cursor movement, color, clearing lines, and overwriting previous output. Even an xterm.js renderer that works correctly in Chrome may handle edge cases differently from a native terminal, and those differences accumulate.
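A small sketch of why the escape sequences can’t simply be stripped out. The spinner stream below is invented (it is not captured Claude Code output), and `render_line` is a hypothetical helper that emulates only three behaviors: carriage return, erase-line (`ESC[2K`), and overwrite-in-place. Stripping the codes yields every spinner frame glued together; emulating them yields what a terminal actually shows.

```python
import re

# CSI sequences such as "\x1b[2K" (erase line) or "\x1b[32m" (green text).
CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")

# Invented spinner output: two frames, each overwritten, then a final status.
SPINNER = (
    "\x1b[2m⠋ Thinking\x1b[0m\r\x1b[2K"
    "\x1b[2m⠙ Thinking\x1b[0m\r\x1b[2K"
    "\x1b[32m✔ Done\x1b[0m\n"
)


def naive_strip(stream: str) -> str:
    """Drop escape sequences and carriage returns, keep everything else."""
    return CSI.sub("", stream).replace("\r", "")


def render_line(stream: str) -> str:
    """Emulate one terminal line: '\\r' returns the cursor to column 0,
    ESC[2K blanks the line without moving the cursor, and printable
    characters overwrite the cell under the cursor. Other CSI sequences
    (colors) are skipped. Trailing blanks are trimmed."""
    line: list[str] = []
    col = 0
    i = 0
    while i < len(stream):
        m = CSI.match(stream, i)
        if m:
            if m.group() == "\x1b[2K":
                line = [" "] * len(line)
            i = m.end()
            continue
        ch = stream[i]
        if ch == "\r":
            col = 0
        elif ch == "\n":
            break
        else:
            while len(line) < col:  # pad if the cursor is past the text
                line.append(" ")
            if col < len(line):
                line[col] = ch
            else:
                line.append(ch)
            col += 1
        i += 1
    return "".join(line).rstrip()


print(repr(naive_strip(SPINNER)))  # '⠋ Thinking⠙ Thinking✔ Done\n'
print(render_line(SPINNER))        # ✔ Done
```

xterm.js implements this kind of logic for a far larger set of sequences. The visible text only exists after emulation, so any case where two emulators disagree becomes a rendering bug in the browser.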

No structural events. The byte stream is just bytes. You can’t ask “when did this tool call start and end?” or “what was the cost of this turn?” because the PTY output is a visual representation, not a structured event log. To get any metadata, you have to parse the rendered text — which means parsing the ANSI codes to extract the underlying text, then parsing the text itself. This is the wrong abstraction stack entirely.
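By contrast, when the agent emits structured events, those questions become lookups. The newline-delimited event shape below (`tool_start`, `tool_end`, `turn_result`, `ts`, `cost_usd`) is a hypothetical simplification for illustration, not the actual stream-json schema; check the real output format before relying on any field name.

```python
import json

# Hypothetical, simplified events: one JSON object per line.
SAMPLE = """\
{"type": "tool_start", "id": "t1", "name": "Bash", "ts": 10.0}
{"type": "tool_end", "id": "t1", "ts": 12.5}
{"type": "turn_result", "cost_usd": 0.0123, "ts": 13.0}
"""


def summarize(lines):
    """Return ({tool_id: (tool_name, seconds)}, total_cost) from event lines."""
    starts = {}     # tool_id -> (name, start timestamp)
    durations = {}  # tool_id -> (name, elapsed seconds)
    cost = 0.0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        kind = event["type"]
        if kind == "tool_start":
            starts[event["id"]] = (event["name"], event["ts"])
        elif kind == "tool_end":
            name, started = starts.pop(event["id"])
            durations[event["id"]] = (name, event["ts"] - started)
        elif kind == "turn_result":
            cost += event["cost_usd"]
    return durations, cost


durations, cost = summarize(SAMPLE.splitlines())
print(durations, cost)
```

No ANSI parsing and no text scraping: the start, end, and cost of each step are fields, not something inferred from how the screen looked.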

Input races. When the user types input into the browser xterm, that input has to be forwarded to the Claude Code PTY as raw bytes. If the PTY is in the middle of rendering output when the input arrives, you get races. Input gets echoed twice (Claude Code echoes it in its line editor; the PTY echoes it to stdout), the rendered state becomes inconsistent, and the interactive session becomes unreliable.

Reconnect storms. Every browser reconnect (tab sleep, network blip, navigation) required re-reading the PTY byte stream from the beginning or losing all earlier output. Persistent state meant either keeping the PTY running (consuming memory and CPU) or discarding history. Both options were worse than a structured event log that you can cursor through.
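A minimal sketch of the alternative to replaying a byte stream: an append-only event log where every event carries a sequence number, and a reconnecting client sends the last number it saw. `EventLog` is hypothetical and in-memory; a real deployment would keep the log in a durable store.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log; each event gets a monotonically increasing seq."""
    events: list[dict] = field(default_factory=list)

    def append(self, event: dict) -> int:
        seq = len(self.events) + 1  # seq n is stored at index n - 1
        self.events.append({"seq": seq, **event})
        return seq

    def since(self, cursor: int) -> list[dict]:
        """Return every event with seq > cursor (cursor 0 means 'from the start')."""
        return self.events[max(cursor, 0):]


log = EventLog()
for text in ["starting", "Bash: ls", "done"]:
    log.append({"type": "text", "text": text})

missed = log.since(2)  # the client's last seen seq was 2
print(missed)
```

The client’s cursor is the whole reconnect protocol: no replay from the beginning, no long-lived PTY needed to preserve history, and a tab that slept for an hour just receives a longer list.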

What the bugs looked like from the outside

JD’s own notes from the pre-reckoning cockpit:

Chat panes showing blank screens on re-open (“reopened-empty-pane” — audit item #2)
Typing into the composer triggering apparent double-echo in the pane
Browser reconnect causing a “reboot feel” — everything went blank and rebuilt itself
Tiny-width rendering on mobile (PTY was configured at a fixed column width; mobile xterm was narrower)
The /key input API (sending raw bytes to a PTY via an HTTP endpoint) was the only way to send messages to agents — every user input was manually serialized to PTY escape sequences

These were not separate bugs. They were the same bug: the PTY-scraping architecture was the wrong abstraction for a web cockpit.

The three-agent research synthesis

On May 29, 2026, three parallel research agents were dispatched to answer a single question: what is the right way to build a web cockpit for a Claude Code agent system?

They investigated independently:

The landscape of existing harnesses (OpenClaw, amux, various open-source projects)
Terminal-in-browser options (xterm.js, Wetty, ttyd, gotty)
What other people building on Claude Code’s API had concluded
The specific failure modes of PTY-replay architectures

The synthesis (~/clawd/projects/cockpit-chat-v3/research/SYNTHESIS.md) was unanimous. Every agent reached the same conclusion, stated the same root cause, and proposed the same fix. From the CHANGELOG entry for 2026-05-29:

“Verdict: cockpit reinvents the wheel by scraping the interactive Claude Code TUI via PTY + replaying raw ANSI in browser xterm — root cause of ALL reliability bugs (corruption, reboot-feel, tiny-width, input-race). Fix everyone agrees on: pivot agent surface to structured claude -p --output-format stream-json event timeline (Supabase=liveness), demote raw terminal to tmux-anchored escape hatch (capture-pane snapshot + PTY resize, never byte-replay).”

The finding also noted that amux, an existing open-source project, scrapes the TUI in exactly the same way the v2 cockpit did. The research treated that as a cautionary example of the same wrong path, not as validation of the approach.

The honest retrospective

Looking back at why the PTY approach was used in the first place: it seemed like the minimum-viable path to showing “the agent doing stuff” in a browser. xterm.js is a mature library. The PTY output was already there. Forwarding bytes seemed simpler than building an event pipeline.

What it missed: the TUI output is an end product designed for human eyes, not an intermediate representation designed for machine processing. Every attempt to extract structure from it — parsing ANSI codes, scraping rendered text, correlating visual tokens to semantic events — was fighting the abstraction. The structured output mode (stream-json) exists specifically to provide what the cockpit actually needed.

The lesson has nothing to do with Claude Code specifically. It applies to any system where you’re tempted to scrape visual output instead of consuming a structured API: the visual output is designed for the wrong consumer. If a machine-readable format exists, use it. If one doesn’t exist, build it — the effort of building it is usually less than the accumulated bugs from parsing visual output.

What didn’t change

The parallel-spawn backend engine (PR #92 at the time of the research synthesis) was explicitly preserved. The research verdict was: “KEEP the queue/worktree/drainer engine — that’s correct.” The problem was the presentation layer, not the execution layer. The PTY that runs the actual Claude Code process is still there — it just isn’t the primary data source anymore. The structured stream-json output is.

The raw terminal didn’t disappear from the cockpit. It was demoted to an opt-in escape hatch — one tap away from any chat pane, kept alive so the interactive /model menu and raw PTY features remain accessible. But it’s not the primary interface. The primary interface is the structured event timeline.
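The verdict quoted earlier describes this escape hatch as tmux-anchored: a capture-pane snapshot plus a PTY resize, never byte-replay. A sketch of the two tmux invocations that implies, built as argv lists for `subprocess.run`. The helper names and the target `agent-3` are hypothetical, and using `resize-window` for the resize is one possible choice (it needs tmux 2.9 or later).

```python
def snapshot_cmd(target: str, keep_colors: bool = True) -> list[str]:
    """argv for a one-shot screen snapshot of a tmux pane.
    -p prints the pane contents to stdout; -e keeps text and background
    attribute escapes (colors); -t selects the target pane."""
    cmd = ["tmux", "capture-pane", "-p"]
    if keep_colors:
        cmd.append("-e")
    cmd += ["-t", target]
    return cmd


def resize_cmd(target: str, cols: int, rows: int) -> list[str]:
    """argv that resizes the tmux window (and so its pane's PTY) to the
    browser viewport: -x sets the width in columns, -y the height in rows."""
    return ["tmux", "resize-window", "-t", target, "-x", str(cols), "-y", str(rows)]


print(snapshot_cmd("agent-3"))
print(resize_cmd("agent-3", 120, 40))
```

Usage might look like `subprocess.run(snapshot_cmd("agent-3"), capture_output=True, text=True, check=True).stdout`. A snapshot is tmux’s current rendering of the screen, so the browser receives the state once instead of reconstructing it from every byte ever written.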

The full story of how that timeline works is next.

Next: The structured-stream pivot — how claude -p --output-format stream-json + Supabase replaced the PTY as the cockpit’s data source, and what that looks like in production.