recover-from-false-positive¶
Recover a Claude Code session when Anthropic's API output classifier false-positives on legitimate engineering work — and stop it happening again.
A Claude Code plugin for the hard request failure "appears to violate our Usage
Policy / this request triggered cyber-related safeguards" — the one that kills a
turn mid-generation on dense-but-legitimate topics (mining software,
supply-chain/sigstore tooling, payment automation, infra-ops, security infra).
It's a JSON-aware .jsonl log surgeon: it removes the poisoned turns, re-stitches
the session so --resume works, and installs a guard so the next session doesn't
re-trip.
Install¶
/plugin marketplace add 88plug/claude-code-plugins
/plugin install recover-from-false-positive@88plug
Quickstart¶
The skill auto-triggers the moment the Usage-Policy / cyber-safeguards error text appears anywhere in the conversation. To run the surgeon directly:
# Fix ONLY the session that just fired (removes the trigger message + refusal,
# re-stitches the chain, backs up first). Resume cleanly afterward.
python3 ${CLAUDE_PLUGIN_ROOT}/skills/recover-from-false-positive/scripts/scrub_refusals.py --fix-active
Note
Detect-only by default. The script makes a backup tarball before writing and
refuses any change that would add a dangling parentUuid.
What it does¶
- Fix the active session: removes only the triggering message and the refusal turn from the session that just fired, so it resumes as if neither happened.
- Clean old sessions: dry-run, then `--apply` to remove accumulated refusal turns machine-wide. Chain-safe; old sessions stay resumable.
- Prevent recurrence: classifies the trigger and installs a `## Cyber-safeguard false positives — READ FIRST` guard in the right `CLAUDE.md`, naming the actual trigger files and vocabulary.
How the automatic pipeline works¶
The bundled hooks make recovery and prevention hands-free:
- `StopFailure` → detect: when a turn dies on the classifier, a pure-Python hook reads the live transcript, extracts the trigger vocabulary, pre-classifies it (project-only vs cross-project), and writes `~/.claude/.fp-state.json`. No model generation, so it can't itself re-trip.
- `UserPromptSubmit` → recover: on your next prompt, if that state file exists, the recovery steps (repair the log, classify with edgar-morin, update the right `CLAUDE.md`) are injected and the state file is consumed. It also always injects a short neutral generation guideline.
- `SubagentStart` → frame: every subagent gets the neutral generation guideline so spawned prompts don't trip; `Explore`/`Plan` additionally get the full `CLAUDE.md` re-injected (they skip it by default).
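The detect-to-recover handoff through the state file can be sketched as below. This is a minimal illustration, not the plugin's actual hook code: the function names `write_state` and `consume_state` and the JSON keys `trigger_terms` and `scope` are hypothetical, and only the file location `~/.claude/.fp-state.json` comes from this README.

```python
import json
import os
from pathlib import Path

# Location named in this README; everything else here is an illustrative assumption.
STATE_PATH = Path.home() / ".claude" / ".fp-state.json"


def write_state(trigger_terms, scope, path=STATE_PATH):
    """Detect side: record what tripped, without invoking any model."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"trigger_terms": sorted(trigger_terms), "scope": scope}
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)  # atomic rename, so a reader never sees a half-written file


def consume_state(path=STATE_PATH):
    """Recover side: return the pending state once, then delete it."""
    path = Path(path)
    try:
        state = json.loads(path.read_text())
    except FileNotFoundError:
        return None  # nothing pending, so no recovery steps to inject
    path.unlink(missing_ok=True)
    return state
```

Consuming the file on read is what keeps the recovery steps from being injected again on every later prompt.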
Why not sed -i '/violate/d'¶
Claude Code .jsonl logs are a linked list — every turn carries a uuid and a
parentUuid pointing at the previous turn. Deleting a line with sed orphans the
next turn (its parentUuid points at a uuid that no longer exists), which breaks
claude --resume. This script re-stitches the chain — rewiring each orphan's
parentUuid up to the removed turn's parent — before dropping lines.
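The re-stitching idea can be sketched in a few lines. This is a simplified illustration under the assumption that each log line is a JSON object with `uuid` and `parentUuid` fields, as described above; the function name `restitch` is hypothetical and this is not the code in `scrub_refusals.py`.

```python
import json


def restitch(lines, drop_uuids):
    """Drop the turns in drop_uuids and rewire each orphan to a surviving ancestor."""
    records = [json.loads(line) for line in lines if line.strip()]
    parent_of = {r["uuid"]: r.get("parentUuid") for r in records if "uuid" in r}

    def surviving_ancestor(uuid):
        # Walk up past every removed turn until a kept turn or the root (None).
        seen = set()
        while uuid in drop_uuids:
            if uuid in seen:  # defensive: a malformed log with a cycle
                return None
            seen.add(uuid)
            uuid = parent_of.get(uuid)
        return uuid

    kept = []
    for record in records:
        if record.get("uuid") in drop_uuids:
            continue  # this is a removed turn
        if record.get("parentUuid") in drop_uuids:
            # Orphan: point it at the removed turn's nearest surviving parent.
            record = {**record, "parentUuid": surviving_ancestor(record["parentUuid"])}
        kept.append(record)
    return [json.dumps(record) for record in kept]
```

Walking up the chain rather than taking a single hop matters when consecutive turns are removed together, for example a trigger message and the refusal that follows it.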
Clean all old sessions¶
# dry-run first; confirm delta=0 for every file, then apply
python3 ${CLAUDE_PLUGIN_ROOT}/skills/recover-from-false-positive/scripts/scrub_refusals.py
python3 ${CLAUDE_PLUGIN_ROOT}/skills/recover-from-false-positive/scripts/scrub_refusals.py --apply
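One plausible reading of `delta=0` is "the edit adds no dangling `parentUuid`". That interpretation is an assumption based on the safety note earlier in this README, not a documented definition. Under that assumption, the check could look like this sketch (the names `dangling_count` and `dangling_delta` are hypothetical):

```python
import json


def dangling_count(lines):
    """Count records whose parentUuid points at a uuid absent from the file."""
    records = [json.loads(line) for line in lines if line.strip()]
    known = {r["uuid"] for r in records if r.get("uuid")}
    return sum(
        1
        for r in records
        if r.get("parentUuid") and r["parentUuid"] not in known
    )


def dangling_delta(before_lines, after_lines):
    """Positive means the edit introduced new orphans; 0 means chain-safe."""
    return dangling_count(after_lines) - dangling_count(before_lines)
```

Comparing against the "before" count, rather than requiring zero outright, tolerates logs that already contained a dangling reference before the cleanup ran.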
What it bundles¶
- A skill (`recover-from-false-positive`) with `scripts/scrub_refusals.py`, the chain-re-stitching log surgeon, and `references/prevention.md`, per-repo `CLAUDE.md` guard templates (supply-chain/sigstore, mining/PoW, BTCPay/Lightning, chaos/infra-ops).
- Four hooks (`hooks/hooks.json`): `detect-false-positive.py` (StopFailure), `inject-recovery-context.sh` (UserPromptSubmit), `inject-subagent-framing.sh` and `inject-claudemd-into-subagents.sh` (SubagentStart).
Note
The hooks run automatically once the plugin is enabled (Claude Code lists them
in the install panel). They are read-only except for ~/.claude/.fp-state.json,
which the detect hook writes and the recovery hook consumes. The
edgar-morin/start_complex_reasoning classification step expects that
reasoning tool to be available; without it, fall back to the skill's manual
classification matrix.
Recovery & the auto-fallback limitation¶
Important
Claude Code has no in-place auto-fallback to another model on this error.
Its only refusal auto-fallback is hardwired Fable 5 → Opus; fallbackModel
covers availability errors (overload/unavailable) only — not policy refusals;
and no hook can switch the active model. So a trip ends the turn, and the
switch is a manual user action.
When a turn is blocked, the un-stick path is:
- This plugin's hooks detect the failure and inject the log-repair steps on your next prompt, so `--resume` stays safe.
- Run `/model claude-sonnet-4-6` and re-send; a same-generation, lower-risk model often clears it.
- Or start a new session; context saturation is the usual real cause.
For fully automatic recovery (no manual step), two out-of-band options work. Both are opt-in; this plugin does not ship them by default:
- Local proxy at `ANTHROPIC_BASE_URL` that retries a refused request on the next model in a fallback chain (e.g. Opus 4.6 → Sonnet 4.6), transparent to the CLI and before the transcript is poisoned. Best for interactive use.
- Headless loop for autonomous runs: wrap `claude -p … --output-format json`, detect `result.is_error` / the Usage-Policy string, and relaunch with `--model claude-sonnet-4-6` (loop-guarded to avoid runaway relaunches).
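The headless loop could be sketched roughly as follows. Treat it as a starting point under stated assumptions: the helper names (`run_claude`, `is_policy_refusal`, `run_with_fallback`) are hypothetical, and it assumes the JSON output is a single object whose `is_error` flag and `result` text carry the refusal message. Verify both against your CLI version before relying on it.

```python
import json
import subprocess

# Substring of the error text quoted at the top of this README.
POLICY_MARKER = "Usage Policy"


def run_claude(prompt, model=None):
    """Run one headless turn and return the parsed JSON result as a dict."""
    cmd = ["claude", "-p", prompt, "--output-format", "json"]
    if model:
        cmd += ["--model", model]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Unparseable output: surface it as an error instead of crashing.
        return {"is_error": True, "result": proc.stdout + proc.stderr}
    return data


def is_policy_refusal(result):
    # Assumption: a refusal sets is_error and includes the policy text in "result".
    return bool(result.get("is_error")) and POLICY_MARKER in str(result.get("result", ""))


def run_with_fallback(prompt, models=(None, "claude-sonnet-4-6"), runner=run_claude):
    """Try each model once, in order. The fixed tuple is the loop guard."""
    result = {}
    for model in models:
        result = runner(prompt, model)
        if not is_policy_refusal(result):
            return result, model
    return result, models[-1]  # every model refused; hand back the last result
```

Requiring both `is_error` and the policy string is a deliberate choice in this sketch, so that unrelated failures (network, auth) do not trigger a relaunch. Iterating over a fixed tuple bounds the number of relaunches without a separate counter.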
License¶
FSL-1.1-ALv2 — Functional Source License, converts to Apache-2.0.