Drift Detector¶

A Claude Code plugin that scores every assistant turn for behavioral drift from your active output contract and steers it back automatically.

Install¶

/plugin marketplace add 88plug/drift-detector
/plugin install drift-detector@drift-detector

Then wire the status-line badge:

bash "$(/plugin path drift-detector)/install.sh"

Quickstart¶

> from now on answer in caveman style: terse, no preamble, no hedging
> /drift:status

You'll see caveman | 12% | ok. When a reply relapses into hedging or verbosity, the badge flips to DRIFT 98% and your next prompt is quietly reminded to tighten up.

Note

F1=0.9973 on 1,283 real-corpus entries (fp=0, tp=375, fn=2, tn=906). 190-session synthetic eval: 100% accuracy, FP=0. Adversarial test suite included. The FP=0 constraint was held across all 21 tuning rounds without exception.

Why / who¶

For anyone who sets a strong output contract and watches the model erode it over a long session: persona work, strict formatting, compressed-output modes, red-team scripts that must stay in voice. Turns "it stopped listening" into a number you can see and act on.

Features¶

Feature	What it does
Per-turn scoring	Deterministic 0–100 drift score on every Stop
Live badge	Status-line segment, composes with your existing statusline
One-shot nudge	Next prompt gets a correction reminder only when drifted — never nags
Profiles	`caveman`, `strict-instructions`, `persona`, plus your own
Morin trajectory	Adaptive vs degenerative classification, velocity, chronic subclinical
Repeating-spike detection	Catches relapsing alternating-drift patterns, not just sustained runs
DCD pipeline	Deferred Correction Detection: scans N+1…N+10 for delayed user feedback
ExtraTree classifier	ML stage stacked on rules, GroupKFold/5, t=0.58
MCP tools	`drift_status`, `drift_recent`, `drift_explain` (read-only)
Commands	`/drift:status`, `report`, `profile`, `reset`, `debug`

Metrics¶

Tuned through 21 scientific-method rounds on a 1,283-entry real-corpus extracted from production sessions:

Round	F1	Notes
R0	0.21	Baseline
R18	0.633	22-feature LR classifier
R19	0.9543	ExtraTree 43-feature + DCD steps=8
R20	0.977	11 new classify_user_reply patterns + DCD steps=10
R21	0.9973	17 patterns + exact-match gate + URL gate — ceiling reached

Two irreducible FNs: one credential provision in ok context and one bare "Try now" indistinguishable without session context. Precision = 1.000 (fp=0).

See Eval & Tuning for the full 21-round campaign.

Commands¶

/drift:status — live score, verdict, drift rate, trend
/drift:report — per-turn history and dominant offenders
/drift:profile [name|list|show] — switch or inspect the active profile
/drift:reset [session|all] — clear live state
/drift:debug [on|off] — toggle structured hook logging

How it works¶

SessionStart initializes writable state. Stop scores the last assistant turn (src/lib/drift_score.py), persists to a WAL-mode SQLite index, writes the badge, and drops a marker if the turn drifted. UserPromptSubmit consumes that marker and injects a correction only when drift is degenerative.

The scoring engine is deterministic and dependency-free: stdlib only, no network calls. See Algorithm for the full pipeline and Eval for the 21-round scientific-method campaign.

Development¶

make selftest    # engine self-test
make test        # pytest suite
make validate    # full plugin CI gate (66 checks)
python3 scripts/adversarial_classify_test.py   # 37-case adversarial unit test

See Eval & Tuning for the 21-round tuning ledger.