Skip to content

Drift Detector

A Claude Code plugin that scores every assistant turn for behavioral drift from your active output contract and steers it back automatically.

plugin-validate License: MIT Claude Code plugin DeepWiki

Install

/plugin marketplace add 88plug/drift-detector
/plugin install drift-detector@drift-detector

Then wire the status-line badge:

bash "$(/plugin path drift-detector)/install.sh"

Quickstart

> from now on answer in caveman style: terse, no preamble, no hedging
> /drift:status

You'll see caveman | 12% | ok. When a reply relapses into hedging or verbosity, the badge flips to DRIFT 98% and your next prompt is quietly reminded to tighten up.

Note

F1=0.9973 on 1,283 real-corpus entries (fp=0, tp=375, fn=2, tn=906). 190-session synthetic eval: 100% accuracy, FP=0. Adversarial test suite included. The FP=0 constraint was held across all 21 tuning rounds without exception.

Why / who

For anyone who sets a strong output contract and watches the model erode it over a long session: persona work, strict formatting, compressed-output modes, red-team scripts that must stay in voice. Turns "it stopped listening" into a number you can see and act on.

Features

Feature What it does
Per-turn scoring Deterministic 0–100 drift score on every Stop
Live badge Status-line segment, composes with your existing statusline
One-shot nudge Next prompt gets a correction reminder only when drifted — never nags
Profiles caveman, strict-instructions, persona, plus your own
Morin trajectory Adaptive vs degenerative classification, velocity, chronic subclinical
Repeating-spike detection Catches relapsing alternating-drift patterns, not just sustained runs
DCD pipeline Deferred Correction Detection: scans N+1…N+10 for delayed user feedback
ExtraTree classifier ML stage stacked on rules, GroupKFold/5, t=0.58
MCP tools drift_status, drift_recent, drift_explain (read-only)
Commands /drift:status, report, profile, reset, debug

Metrics

Tuned through 21 scientific-method rounds on a 1,283-entry real-corpus extracted from production sessions:

Round F1 Notes
R0 0.21 Baseline
R18 0.633 22-feature LR classifier
R19 0.9543 ExtraTree 43-feature + DCD steps=8
R20 0.977 11 new classify_user_reply patterns + DCD steps=10
R21 0.9973 17 patterns + exact-match gate + URL gate — ceiling reached

Two irreducible FNs: one credential provision in ok context and one bare "Try now" indistinguishable without session context. Precision = 1.000 (fp=0).

See Eval & Tuning for the full 21-round campaign.

Commands

  • /drift:status — live score, verdict, drift rate, trend
  • /drift:report — per-turn history and dominant offenders
  • /drift:profile [name|list|show] — switch or inspect the active profile
  • /drift:reset [session|all] — clear live state
  • /drift:debug [on|off] — toggle structured hook logging

How it works

SessionStart initializes writable state. Stop scores the last assistant turn (src/lib/drift_score.py), persists to a WAL-mode SQLite index, writes the badge, and drops a marker if the turn drifted. UserPromptSubmit consumes that marker and injects a correction only when drift is degenerative.

The scoring engine is deterministic and dependency-free: stdlib only, no network calls. See Algorithm for the full pipeline and Eval for the 21-round scientific-method campaign.

Development

make selftest    # engine self-test
make test        # pytest suite
make validate    # full plugin CI gate (66 checks)
python3 scripts/adversarial_classify_test.py   # 37-case adversarial unit test

See Eval & Tuning for the 21-round tuning ledger.

License

MIT © 2026 88plug