Skip to content

Research ledger

Every default in this plugin was chosen falsification-first — measured against a baseline, replicated before believing, with the dead ends kept in a DO-NOT-RE-ATTACK log. The two full ledgers live at the repo root (they are large and best read on GitHub):

  • EXPERIMENTS.md — the falsification ledger: 7 experiments + the refutation log.
  • INVENTIONS.md — the invention slate, each mechanism grounded in a named thinker.

The headline results

claim result how
Grounding beats no-grounding ~2× landmine-catch A/B, replicated across 12 domains
Compact beats heavy every mass-adding variant lost 10-round hill-climb + tournament
Trigger should be model-driven recall 0.28 (keyword) → 0.92 (model) 56-prompt infra + diverse eval
Gate intercepts the irreversible step 100% / 0 bypass / 0 false-block 62 labelled tool calls, 18 bypass styles

The arc

An adversarial refute-gate killed the maximalist pitch ("the model already knows it" — it doesn't; training is stale). What survived: use recall to generate the list of things to verify, not the answer. The grounding content was then tuned falsification-first (compact + self-reflective wins; replicate before believing). The gate was hardened and measured. Finally the trigger itself was made model-driven — the plugin's own thesis applied to itself: trigger the model's training, never a static list a human guessed at.