Ship-to-Prod Workflow

Tier 2 · Building · 7 min read

Before this, read:

Git discipline for agents — branch conventions the shipping flow builds on
Root-cause-first — why the pre-ship quality bar matters

Pull request review is a bottleneck. It requires context-switching to a diff view, synthesizing what the code does from changed lines, and making a judgment call without running the thing. For a system with parallel build agents shipping multiple PRs a week, it doesn’t scale.

The ship-to-prod workflow shifts the review point: JD evaluates a change by clicking around the deployed product, not by reading the diff. The agent handles everything between “code committed” and “it’s live.”

The full flow

Branch from main. Feature work goes on a branch. The branch name is descriptive and time-bound enough to find in the reflog if needed.
Open the PR. The orchestrating session creates the PR via gh pr create. The PR description summarizes what changed and why.
Wait for CI green. Do not merge a red build. If CI fails, diagnose and fix — don’t ask JD.
Merge. The orchestrating session merges when CI is green.
Wait for prod deploy. For Vercel, the deploy triggers automatically on merge; watch the deploy status via gh pr view or the Vercel dashboard.
Smoke the prod URL. Verify that the specific thing that changed actually works in production. Not “the site loads” — the specific feature, the specific API endpoint, the specific behavior JD will encounter.
Tell JD it’s live with the prod URL and one sentence on what to try. JD evaluates by using it.

JD never sees the PR. JD never reviews the diff. JD clicks around the deployed product and decides whether it works.
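The flow above can be written down as an ordered plan. This is a minimal sketch, not the system's actual orchestration code: the Step type, the ship_plan function, the remote name origin, and the squash merge strategy are all assumptions made for illustration. Only the gh and git subcommands themselves are real.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    # None means the agent performs this step itself; there is no single CLI call.
    command: Optional[List[str]] = None


def ship_plan(branch: str, title: str, body: str) -> List[Step]:
    """Ordered steps from 'code committed' to 'it's live' (hypothetical helper)."""
    return [
        # Assumes the remote is named origin.
        Step("push branch", ["git", "push", "--set-upstream", "origin", branch]),
        Step("open PR", ["gh", "pr", "create", "--title", title, "--body", body]),
        # --watch blocks until the checks finish.
        Step("wait for CI green", ["gh", "pr", "checks", "--watch"]),
        # Squash is an assumption; gh also accepts --merge and --rebase.
        Step("merge", ["gh", "pr", "merge", "--squash"]),
        Step("wait for prod deploy"),
        Step("smoke the prod URL"),
        Step("tell JD it's live"),
    ]


if __name__ == "__main__":
    plan = ship_plan("fix-login-redirect", "Fix login redirect", "What changed and why.")
    for step in plan:
        shown = " ".join(step.command) if step.command else "(agent-owned check)"
        print(f"{step.name}: {shown}")
```

Note that no step in the plan waits on JD. The only message to JD is the last one.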

Why this rule exists

In May 2026, two PRs were shipped and JD was pinged with “merge them in order” and “tell me when you want me to restart the bridge.” JD replied: “I never review PRs. I never review PRs.”

The framing of “please review and merge” puts JD in a reviewer role for a system where most PRs are agent-generated, agent-tested, and agent-merged. The cost of that framing is JD’s attention spent on tasks the agent should own. Across many PRs, that attention cost adds up.

The rule: the orchestrating session is the reviewer. If code review is needed, spawn a code-reviewer agent (or use the /code-review skill). JD’s interface is the live product.

What the orchestrating session owns

Everything from branch creation to “it’s live”:

Merge conflicts: resolve them. Rebase or merge main into the branch, hand-edit semantic conflicts if needed. Don’t ask JD to resolve.
CI failures: diagnose from the GitHub Actions logs or Vercel build logs. Fix and re-push. Don’t ask JD to debug.
Deployment failures: check Vercel logs. Fix the build. Don’t ask JD.
Post-deploy restart: if a process consumes the merged code (bridge, cron, LaunchAgent), restart it. Verify the new behavior is actually live before telling JD it’s deployed.

The test: could JD walk away from the terminal after saying “ship this” and return to find it live in prod? If yes, the workflow is right. If there’s a step where the agent waits for JD input before continuing, that step is a gap.

CI gates

The CI suite is the pre-merge quality bar. As of June 2026, the required checks for the agent-system repo include:

pytest — all ~96 test files (not a subset)
import-smoke — every agent package imports successfully (catches zombie crons)
registry-check — the agent registry matches the filesystem
grep-gates — no hand-rolled Telegram notifications outside the shared notify module; no silent exception swallows
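As an illustration of what a gate like registry-check might do, here is a minimal sketch. The real registry format and directory layout are not described here, so this assumes a plain list of agent names and one directory per agent; registry_drift is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple


def registry_drift(registered: Iterable[str], agents_dir: Path) -> Tuple[Set[str], Set[str]]:
    """Return (registered but missing on disk, on disk but not registered)."""
    on_disk = {
        p.name
        for p in agents_dir.iterdir()
        # Skip hidden and private directories such as .git or __pycache__.
        if p.is_dir() and not p.name.startswith((".", "_"))
    }
    names = set(registered)
    return names - on_disk, on_disk - names
```

A CI job built on this would fail the build whenever either set is non-empty.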

Each gate must pass before merge. Adding a new gate that catches a real class of bug is as valuable as fixing an existing bug — it prevents the whole class from ever reaching main.
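To show how a gate can rule out a whole class of bug, here is a sketch of a check for silent exception swallows. It is not the repo's actual grep-gate: it uses Python's ast module rather than grep, and it assumes that "silent swallow" means an except handler whose body does nothing. The function names are hypothetical.

```python
import ast
from typing import List


def _is_noop(stmt: ast.stmt) -> bool:
    """True for `pass` or a bare `...` statement."""
    if isinstance(stmt, ast.Pass):
        return True
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and stmt.value.value is Ellipsis
    )


def silent_swallows(source: str, filename: str = "<string>") -> List[int]:
    """Return line numbers of except handlers whose body does nothing."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            hits.append(node.lineno)
    return sorted(hits)
```

Run over every source file in CI, a non-empty result fails the build, so a handler that catches and discards an error cannot reach main.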

Resolving the “no push-to-prod” tension

Ship-to-prod requires CI, merge, and deployment to be trustworthy enough that the operator is comfortable with the flow. Two things build that trust:

Gate discipline. Every CI check is non-negotiable. If CI says a test failed, the test is fixed before merge, not bypassed. --no-verify, the git flag that skips local commit and push hooks, is banned unless JD explicitly requests it.

Smoke testing before announcement. Don’t tell JD it’s live until you’ve verified the specific thing he cares about works. “Deployed” is not “working.” Check the prod URL, make the API call, click the button.
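A smoke check in this sense can be sketched as follows. This is an assumption-laden example rather than the system's real smoke tooling: smoke and http_get are hypothetical names, and "the specific thing" is modeled as a string that must appear in the response body.

```python
import urllib.request
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, str]]


def http_get(url: str) -> Tuple[int, str]:
    """Fetch a URL and return (status code, body text)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def smoke(url: str, must_contain: str, fetch: Fetch = http_get) -> bool:
    """True only if the changed behavior is observable, not merely that the page loads."""
    try:
        status, body = fetch(url)
    except OSError:
        # urllib's URLError and HTTPError, and socket timeouts, are OSError subclasses.
        return False
    return status == 200 and must_contain in body
```

The fetch parameter exists so the check can be exercised without a network. In use, must_contain would be something only the new change produces, such as a new field in an API response.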

When something is on fire in prod

The ship-to-prod workflow is not “ship first, fix later.” If a prod smoke reveals a regression, the flow is:

Roll back or hotfix the specific issue
CI green
Re-deploy
Re-smoke
Tell JD it’s live

Not: “deployed, there’s a known issue with X, working on it.” That sequence puts JD in the position of knowing about a broken prod with no timeline. Fix first, announce when it’s working.
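The regression flow reduces to one rule: nothing is announced until a smoke passes. A minimal sketch of that rule, where announce_when_working and its parameters are hypothetical and the attempt limit is an assumption, not part of the workflow:

```python
from typing import Callable, Optional


def announce_when_working(
    smoke: Callable[[], bool],
    fix_and_redeploy: Callable[[], None],
    message: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the message for JD once prod smokes clean, else None."""
    for _ in range(max_attempts):
        if smoke():
            return message
        # Regression found: roll back or hotfix, wait for CI green, re-deploy.
        # Nothing is sent to JD while this is in progress.
        fix_and_redeploy()
    # One last smoke after the final fix attempt.
    return message if smoke() else None
```

A None result means prod is still not verified, so there is nothing to announce yet.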

Next: How the system reads its own logs, proposes ranked fixes, and ships them via isolated build agents. The self-healing loop pattern.