The IP-Safety Gate: a Hard Wall in Code

Tier 3 · What we built · 7 min read

Before this, read:

AgentTree Merch: the AI operator — what the gate protects
Root-cause-first as a build process — the design philosophy behind fail-closed gates

Printful, Printify, Etsy, and Amazon all permanently ban repeat IP offenders. Not suspended: banned. The store and every product on it disappear. A single Harry Potter tee slipping through costs more than a lawsuit settlement; it costs the store itself.

That’s why the IP-safety gate was the first thing designed in agents/merch/ and is the only module with its own section in BUILD-PLAN.md: “the design pipeline must NEVER generate or list a concept that has not passed screen_concept().”

How it works

The gate lives at agents/merch/ip_safety.py. It exposes one public function:

screen_concept(concept, slogan="", *, require_llm=True, model="sonnet")
    -> IPVerdict(safe: bool, score: int, reasons: [str], layer: str)

Two layers run in sequence: cheap first, expensive second.

Blocklist pre-filter (deterministic, free, runs first)

A hardcoded denylist of brand names, celebrity names, sports teams, character/franchise names, bands, and IP-risk phrases (“logo”, “trademark”, “in the style of [named artist]”, “copyright”). Each entry is matched case-insensitively on whole-word boundaries, so an entry never fires on a fragment inside a longer word.

The denylist is extended by a YAML overlay at ~/clawd/domains/merch/state/ip-blocklist.yaml — no code change needed to add new entries.

The blocklist alone can produce a reject (it is a high-precision filter). It cannot produce a pass. Passing requires the LLM tier.
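
A minimal sketch of how whole-word, case-insensitive matching could work, assuming a regex-based approach. The sample entries and the helper names compile_denylist and blocklist_hits are hypothetical and are not taken from ip_safety.py.

```python
import re

# Hypothetical sample entries; the real denylist lives in ip_safety.py.
SAMPLE_DENYLIST = ["harry potter", "nike", "logo", "trademark"]


def compile_denylist(terms):
    """Build one case-insensitive pattern that matches any term on word boundaries."""
    if not terms:
        return re.compile(r"(?!)")  # an empty denylist matches nothing
    # Longest terms first so multi-word entries win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # The lookarounds require a non-word character (or the text edge) on both sides.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def blocklist_hits(text, pattern):
    """Return the distinct denylisted terms found in text, lowercased and sorted."""
    return sorted({match.group(0).lower() for match in pattern.finditer(text)})


pattern = compile_denylist(SAMPLE_DENYLIST)
print(blocklist_hits("Retro NIKE-style swoosh logo", pattern))  # ['logo', 'nike']
print(blocklist_hits("An analogous pun about cats", pattern))   # []
```

A non-empty result is enough to reject. An empty result proves nothing on its own, which is why a pass still has to come from the LLM tier.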

LLM IP-risk classifier (Sonnet tier, ~$0.001/call)

The prompt: “Does this concept reference any real brand, person, character, logo, slogan, band, or copyrighted work? Score 0–10. Anything ≥ 3 → reject. Refuse on doubt.”

The score threshold is intentionally low. A false reject costs one wasted concept. A false pass costs the store.

The fail-closed contract

The most important property of this gate is what it does when it can’t decide.

# From ip_safety.py:
# FAIL-CLOSED CONTRACT (non-negotiable): if the LLM is unavailable / errors /
# the output can't be parsed, the verdict is REJECT, never "safe".

If the LLM is unavailable, the answer is REJECT. If the response can’t be parsed, REJECT. If require_llm=True (the production default) and the LLM tier didn’t run, the blocklist pass alone is not enough — REJECT.
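
The contract can be sketched as a function with exactly one path that returns safe. This is an illustration under stated assumptions, not the code in ip_safety.py: the classify callable, the bare-integer reply format, and the choice to report a score of 10 on failure are all hypothetical. Only the threshold of 3 and the IPVerdict field names come from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable

REJECT_THRESHOLD = 3  # from the prompt: anything >= 3 is a reject


@dataclass(frozen=True)
class IPVerdict:
    safe: bool
    score: int
    reasons: list[str] = field(default_factory=list)
    layer: str = "llm"


def llm_verdict(classify: Callable[[str], str], concept: str) -> IPVerdict:
    """Turn a raw classifier reply into a verdict, rejecting on any doubt.

    `classify` is a hypothetical callable returning the model's reply as text;
    this sketch assumes the reply is a bare integer from 0 to 10.
    """
    try:
        raw = classify(concept)
    except Exception as exc:  # LLM unavailable or errored: REJECT
        return IPVerdict(False, 10, [f"classifier error: {exc}"])
    try:
        score = int(raw.strip())
    except (AttributeError, ValueError):  # reply cannot be parsed: REJECT
        return IPVerdict(False, 10, ["unparseable classifier output"])
    if not 0 <= score <= 10:  # out-of-range scores are treated as unparseable
        return IPVerdict(False, 10, [f"score out of range: {score}"])
    if score >= REJECT_THRESHOLD:
        return IPVerdict(False, score, [f"score {score} >= {REJECT_THRESHOLD}"])
    return IPVerdict(True, score, [])  # the only path that returns safe


def broken(_concept: str) -> str:
    raise TimeoutError("model unreachable")


print(llm_verdict(lambda c: "1", "cozy autumn leaves").safe)       # True
print(llm_verdict(lambda c: "maybe?", "cozy autumn leaves").safe)  # False
print(llm_verdict(broken, "cozy autumn leaves").safe)              # False
```

Every early return is a reject, so a newly discovered failure mode defaults to REJECT unless someone deliberately adds a second safe path.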

This inverts the common fail-open default, where a gate treats the absence of a signal as safety. This gate assumes the absence of a signal means uncertainty, and uncertainty means no.

What it caught

During the initial testing of the design drop pipeline, the gate caught a concept that referenced “Harry Potter”, not by name in the test prompt but through a design theme that implied the IP. The gate rejected it before any image generation cost was incurred.

The CHANGELOG entry for 2026-06-08 17:19 notes: “IP-safety gate (blocklist + fail-closed LLM classifier; caught ‘Harry Potter’ unnamed in testing).”

“Unnamed in testing” is the key detail. The concept didn’t use the trademarked name directly. The LLM classifier recognized the implied reference and scored it above the threshold. That’s the case a deterministic blocklist alone can’t catch, and it’s precisely why two layers are better than one.

Integration in the design pipeline

Every concept generated by design_agent.py hits the gate before any image spend:

# From design_agent.py — no Ideogram call happens before this
for concept in mined_concepts:
    verdict = ip_safety.screen_concept(concept["theme"], concept.get("slogan", ""))
    if not verdict.safe:
        log.warning("IP gate rejected %r — reasons: %s", concept["theme"], verdict.reasons)
        continue
    # Only safe concepts reach Ideogram
    image_url = generate_image(concept)

The cost ordering matters. The gate is cheap (a few cents per batch, mostly free on the blocklist layer). Image generation on Ideogram costs real money per call. Running the cheap gate first means IP-unsafe concepts never trigger paid image generation.

The operator’s decide() function enforces the same gate at the decision level, so no automated action can bypass the IP check. This is one of the three hard walls tested in CI.

Extending the blocklist

Adding new blocked terms requires no code change. Edit the YAML overlay:

extra_terms:
  - "nintendo"
  - "pokemon"
  - "starbucks"
extra_phrases:
  - "just do it"
  - "think different"

The gate reads the overlay on every call. New entries take effect immediately without a restart or deployment.
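
A rough sketch of how a per-call overlay read might look, assuming PyYAML is the parser. The function names load_overlay and merge_overlay are hypothetical, as are two behaviours the article does not specify: treating a missing file as "no extras" and folding extra_terms and extra_phrases into one lowercased set.

```python
from pathlib import Path


def load_overlay(path: Path) -> dict:
    """Read the YAML overlay fresh on each call; a missing file means no extras."""
    import yaml  # PyYAML (third-party), imported here so merge_overlay has no dependency

    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}  # an empty file parses to None


def merge_overlay(base_terms: set[str], overlay: dict) -> set[str]:
    """Union the built-in denylist with extra_terms and extra_phrases, lowercased."""
    extras = list(overlay.get("extra_terms") or []) + list(overlay.get("extra_phrases") or [])
    cleaned = {str(item).strip().lower() for item in extras}
    cleaned.discard("")  # ignore blank entries
    return base_terms | cleaned


overlay = {"extra_terms": ["Nintendo"], "extra_phrases": ["Just Do It"]}
print(sorted(merge_overlay({"nike"}, overlay)))  # ['just do it', 'nike', 'nintendo']
```

Because nothing is cached between calls, an edit to the file is visible to the very next screen_concept() call. A malformed file raises from yaml.safe_load in this sketch rather than being silently skipped, which is in keeping with the fail-closed approach.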

What this pattern teaches

The IP gate is a small module with an outsized lesson: the cost of a wrong answer is not symmetric. When the downside of a false positive (wasted concept) and the downside of a false negative (banned store, lawsuit) are this different, the right threshold is not “50/50.” It is “reject unless you’re confident.”
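
The asymmetry can be made concrete with a small expected-cost calculation. The function name and the cost figures below are hypothetical illustrations, not measured values from the project.

```python
def break_even_risk(cost_false_reject: float, cost_false_pass: float) -> float:
    """Probability of infringement above which rejecting has the lower expected cost.

    Passing costs p * cost_false_pass in expectation; rejecting costs
    (1 - p) * cost_false_reject. The two are equal at the returned p.
    """
    if cost_false_reject < 0 or cost_false_pass <= 0:
        raise ValueError("costs must be non-negative, and a false pass must cost something")
    return cost_false_reject / (cost_false_reject + cost_false_pass)


# Hypothetical figures for illustration only, not measured costs.
print(break_even_risk(1.0, 1.0))     # 0.5: symmetric costs give the "50/50" rule
print(break_even_risk(1.0, 9999.0))  # 0.0001: a costly false pass pushes the bar near zero
```

When a false pass costs thousands of times more than a false reject, even a small estimated risk is enough to justify rejecting, which is the reasoning behind a threshold as low as 3 out of 10.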

Every autonomous system that touches legal, financial, or safety-adjacent decisions should start with that asymmetry analysis. The gate isn’t conservative because of timidity. It’s conservative because the math is unambiguous.

Next: The gate held on the first real order. What else the first sale found — three fulfillment bugs caught live.