Research ledger¶
Every default in this plugin was chosen falsification-first — measured against a baseline, replicated before believing, with the dead ends kept in a DO-NOT-RE-ATTACK log. The two full ledgers live at the repo root (they are large and best read on GitHub):
- EXPERIMENTS.md — the falsification ledger: 7 experiments + the refutation log.
- INVENTIONS.md — the invention slate, each mechanism grounded in a named thinker.
The headline results¶
| claim | result | how |
|---|---|---|
| Grounding beats no-grounding | ~2× landmine-catch | A/B, replicated across 12 domains |
| Compact beats heavy | every mass-adding variant lost | 10-round hill-climb + tournament |
| Trigger should be model-driven | recall 0.28 (keyword) → 0.92 (model) | 56-prompt infra + diverse eval |
| Gate intercepts the irreversible step | 100% / 0 bypass / 0 false-block | 62 labelled tool calls, 18 bypass styles |
The arc¶
An adversarial refute-gate killed the maximalist pitch ("the model already knows it" — it doesn't; training is stale). What survived: use recall to generate the list of things to verify, not the answer. The grounding content was then tuned falsification-first (compact + self-reflective wins; replicate before believing). The gate was hardened and measured. Finally the trigger itself was made model-driven — the plugin's own thesis applied to itself: trigger the model's training, never a static list a human guessed at.