Architecture¶
total-recall is a 4-layer pipeline. Each layer has one responsibility, depends only on the
layer below it, and can be tested in isolation. The shape is deliberately boring: a walker feeds
extractors, extractors feed an index, the index feeds delivery surfaces.
~/.claude/projects/<cwd-slug>/<session-uuid>.jsonl
            |
            v
+-----------------------------------------------+
| (a) JSONL walker                         lib/ |
| stream lines, parse, resolve DAG              |
+-----------------------------------------------+
            |
            v
+------------------------+   +------------------+
| (b) Extractors         |   | detector/        |
| extractors/            |-->| escalation.py    |
| pipeline-routed (11)   |   | (consumes rows,  |
| standalone (6)         |   |  no rows out)    |
| + secrets scrubber     |   +------------------+
+------------------------+
            |
            v
+-----------------------------------------------+
| (c) Index                         index/ vec/ |
| core FTS5 + metrics sub-store                 |
| + operator sub-store (9 tables)               |
| + sqlite-vec (optional)                       |
+-----------------------------------------------+
            |
            v  <--- hooks/lib/query.py
            |       (hook<->index shim)
+-----------------------------------------------+
| (d) Delivery               hooks/ mcp_server/ |
|                            skills/ commands/  |
| SessionStart v2 signpost, 26 MCP tools,       |
| 2 skills, 15 slash commands                   |
+-----------------------------------------------+
Layer (a) — JSONL walker (lib/)¶
Responsibility. Turn a directory of append-only JSONL session files into a stream of typed
records, resolving parentUuid into a DAG and folding sidechains (isSidechain: true) under
their parent turn. Track per-file byte offsets in state.json so re-indexing is incremental.
Dependencies. Standard library only. No DB, no extractors. Pure parsing.
Why it's its own layer. Streaming + DAG resolution + offset bookkeeping is enough work that mixing it into extractor code makes both harder to test. The walker has one job: yield clean records, in DAG order, without OOMing on a 14k-line session.
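The offset bookkeeping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in lib/: the names `iter_records` and `children_by_parent` are hypothetical, and the sketch only shows resumable streaming plus a parent-to-children index keyed on `uuid` and `parentUuid`, not full DAG ordering or sidechain folding.

```python
import json
import os
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def iter_records(path: str, offset: int = 0) -> Iterator[Tuple[int, dict]]:
    """Stream complete JSONL lines starting at byte `offset`.

    Yields (next_offset, record) so the caller can persist the offset
    after each record and resume from there on the next run.
    """
    fd = os.open(path, os.O_RDONLY)  # transcripts are never opened for writing
    with os.fdopen(fd, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial trailing line: the writer is mid-append
            offset += len(raw)
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip blank or corrupt lines instead of aborting
            if isinstance(record, dict):
                yield offset, record


def children_by_parent(records: Iterable[dict]) -> Dict[Optional[str], List[str]]:
    """Index record uuids under their parentUuid (None marks a root)."""
    children: Dict[Optional[str], List[str]] = {}
    for rec in records:
        uuid = rec.get("uuid")
        if uuid is not None:
            children.setdefault(rec.get("parentUuid"), []).append(uuid)
    return children
```

Because only one line is held in memory at a time and the index stores uuids rather than whole records, the memory cost stays proportional to the number of turns, not to the transcript size.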
Layer (b) — Extractors (extractors/)¶
Responsibility. Walk the record stream emitted by layer (a) and emit structured facts. The package ships 17 extractors + a universal scrubber, split into two execution lanes (plus an optional v0.9 LLM refinement lane).
Pipeline-routed extractors (run through extractors/pipeline.py::ALL_EXTRACTORS; each yields
Extraction rows that the ingest loop writes to extractions):
- corrections.py: user redirects (text + queue-operation interrupts).
- decisions.py: "going with X because Y" moments.
- self_corrections.py: model walks back its own previous claim within a session.
- progress.py: how far a given line of work actually got.
- domain_facts.py: durable cross-session signals (env, versions, conventions, identity).
- away_summaries.py: what changed while the operator was away (compaction-boundary deltas).
- model_corrections.py: pairs user pushback with the rejected approach (v0.3 highest-leverage signal).
- standing_decisions.py: durable "always X over Y" preferences.
- bans.py: provider/tool/pattern bans + failed attempts.
- goals.py: per-project goal stack with status state machine.
- truth_rhetoric.py: 7-category truth-assertion taxonomy.
Standalone extractors (run out-of-band from the main pipeline; each owns its own DB tables
and writes them directly via index/):
- operator_profile.py: identity/role aggregates → operator_profile.
- voice_profile.py: voice-fingerprint stats → voice_profile.
- ontology.py: project / machine / vocabulary graph → projects (incl. related_projects co-mention edges as of v0.8), machines, vocabulary.
- workflow.py (v0.8): how the operator works (fan-out vocab, autonomy score, interrupt rate, planning idiom, peak hours, session shape, subagent adoption) → workflow_profile.
- implicit_preferences.py (v0.8): behavior-derived preferences (Edit vs Write, shell-command dominance, absence patterns, format prefs, recurring vocab) → implicit_preferences.
- satisfaction.py (v0.8): bidirectional praise/frustration × prior-assistant-turn shape → satisfaction_profile (+ satisfaction_meta).
Optional refinement lane (v0.9, extractors/llm/). Off by default. When TOTAL_RECALL_LLM_PROVIDER=auto AND a local ollama daemon is reachable AND the configured model is pulled, cmd_rebuild invokes refinement passes AFTER heuristic consolidation: machines NER filter, vocabulary definitions, project narratives. Local-only via ollama (cloud APIs deliberately excluded — would break the no-reupload-transcripts privacy guarantee). Heuristic baseline always wins on disagreement.
Universal scrubber. secrets.py runs over every emitted row's content plus every string
field in context (recursive). It is invoked from the orchestrator, not the extractors, so a
new extractor cannot accidentally opt out.
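A recursive scrub of this kind could look roughly like the sketch below. The two regular expressions and the `[REDACTED]` marker are illustrative assumptions; the actual rule set in secrets.py is not shown in this document, and `scrub_row` is a hypothetical name for the orchestrator-side call.

```python
import re
from typing import Any

# Hypothetical patterns for illustration only; not the real secrets.py rules.
_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),  # API-key-shaped token
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS-access-key-shaped token
]
REDACTED = "[REDACTED]"


def scrub_text(text: str) -> str:
    """Replace every match of every pattern with the redaction marker."""
    for pattern in _PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


def scrub_value(value: Any) -> Any:
    """Recursively scrub strings inside dicts, lists and tuples."""
    if isinstance(value, str):
        return scrub_text(value)
    if isinstance(value, dict):
        return {key: scrub_value(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(scrub_value(item) for item in value)
    return value  # numbers, booleans, None pass through untouched


def scrub_row(row: dict) -> dict:
    """Return a scrubbed copy of an emitted row; the input is not mutated."""
    clean = dict(row)
    clean["content"] = scrub_text(row.get("content", ""))
    clean["context"] = scrub_value(row.get("context", {}))
    return clean
```

Calling `scrub_row` in the orchestrator, after any extractor has produced its row and before the row is written, is what makes the scrub impossible for an individual extractor to skip.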
Dependencies. Layer (a) only. Each extractor is independent — failing one does not affect the others. Pipeline-routed extractors are pure functions from records to rows; standalone extractors write to the index directly (the "pure" guarantee applies to the pipeline lane, not the package).
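The independence guarantee for the pipeline lane can be illustrated with a small loop that isolates each extractor call. This is a sketch under assumptions: `run_extractors`, the toy `decisions` heuristic and the deliberately failing `broken` extractor are hypothetical stand-ins, not the contents of extractors/pipeline.py.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Row = Dict[str, object]
Extractor = Callable[[Record], Iterable[Row]]


def decisions(record: Record) -> Iterator[Row]:
    """Toy heuristic: flag 'going with X' phrasing as a decision."""
    text = str(record.get("text", ""))
    if "going with" in text:
        yield {"kind": "decision", "content": text}


def broken(record: Record) -> Iterator[Row]:
    """An extractor with a bug, used to show failure isolation."""
    raise ValueError("heuristic bug")
    yield {}  # unreachable; keeps this a generator like the others


def run_extractors(
    records: Iterable[Record],
    extractors: List[Extractor],
    errors: List[str],
) -> Iterator[Row]:
    """Run every extractor on every record; one failure never stops the rest."""
    for record in records:
        for extractor in extractors:
            try:
                rows = list(extractor(record))
            except Exception as exc:  # isolate the failing extractor
                errors.append(f"{extractor.__name__}: {exc}")
                continue
            yield from rows
```

Each extractor is a plain function from a record to rows, so it can be unit-tested with a hand-written record and no database.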
Why it's its own layer. Extractors are where the heuristics live, and heuristics churn. Isolating them means new extractors are drop-in, and existing ones can be tuned without touching the walker or the index.
Layer (c) — Index (index/, vec/)¶
Responsibility. Persist extractor rows into a queryable store under
~/.claude/plugins/data/total-recall/.
- index/ owns index.db: SQLite with FTS5 virtual tables for keyword recall plus relational tables for sessions, cwds, and extracted rows.
- vec/ owns vec.db: an optional sqlite-vec store of embeddings (fastembed, in-process), enabled only when the [vec] extra is installed. Vector recall is a query-time augmentation of FTS5, not a replacement for it.
Dependencies. Layer (b) rows in, query API out. Knows nothing about hooks, MCP, or skills.
Why it's its own layer. Storage choices (FTS5 today, hybrid lexical+vector tomorrow, something else later) are the most likely thing to change. Keeping all DB knowledge here means the delivery layer is portable across index implementations.
Layer (d) — Delivery (hooks/, mcp_server/, skills/, commands/)¶
Responsibility. Put recalled facts in front of the model with the lowest token cost per session. Four surfaces, each optional:
- hooks/: session-start-signpost.sh (passive context inject, 5s budget), user-prompt-retrieve.sh (async mid-prompt augmentation, 8s budget), and stop-index.sh + post-compact-index.sh (async reindex). Configured in hooks/hooks.json. All bash hooks share hooks/lib/common.sh; database reads route through hooks/lib/query.py, which is the only shim between hook code and the index.
- mcp_server/: 26 MCP tools total.
  - Core v0.1 (6, in mcp_server/tools.py): recall, prior_sessions_for_cwd, find_failed_attempts, find_user_preferences, get_session_digest, search_messages.
  - v0.3 operator-aware (17, in mcp_server/extras/*_tools.py): recall_corrections_about, get_recent_corrections, list_standing_decisions, get_decision_for_topic, check_banned, list_failed_attempts, get_active_goal, list_goals, get_past_truth_assertions, get_project_graph, get_machine_inventory, define_term, get_operator_profile, get_voice_profile, get_operator_context, assess_escalation_risk, recall_targeted.
  - v0.8 behavioral (3, also in mcp_server/extras/*_tools.py): get_workflow_profile, get_satisfaction_profile, list_implicit_preferences.
- skills/: recall/ (orientation guidance for using the MCP surface on demand) and speak-like-operator/ (operator voice-matching skill, runtime-populated from get_voice_profile()).
- commands/: 15 slash commands for the human operator: /recall, /recall-status, /recall-inspect, /recall-rebuild, /recall-promote, /recall-metrics, /recall-cost, /recall-topics, /recall-health, /recall-check-banned, /recall-corrections, /recall-decisions, /recall-escalation, /recall-goal, /recall-operator-context.
Dependencies. Layer (c) query API only. Surfaces never reach into the walker, extractors, or raw JSONL — that keeps each surface trivially mockable in tests.
Why it's its own layer. Surfaces are the part the user actually feels, and they have very different cost/latency characteristics (hook = every session, MCP = on demand, skill = explicit). Separating them from the index lets us add/remove surfaces without touching the pipeline.
Cross-cutting concerns¶
- Read-only on transcripts. Only layer (a) opens session JSONL files, always O_RDONLY.
- Streaming everywhere. No layer ever materializes a full session in memory.
- Local-only. No layer makes outbound network calls. Embeddings run in-process.
- Convention-based discovery. The plugin manifest declares no hook/skill/command/mcp keys; Claude Code discovers them from the sibling directories under the plugin root.
Validation + observability roadmap¶
Validation harness (current)¶
The pipeline is exercised in two complementary ways:
- In-tree pytest: tests/ covers each layer with unit tests against a synthetic corpus, and tests/integration/ runs against the real ~/.claude/projects/ corpus (read-only). Integration tests skip cleanly on machines without a corpus, so the same suite runs in CI containers and on the author's laptop.
- Docker validation harness (Dockerfile.test): a Python 3.11-slim image with jq, bash, sqlite3, mcp, click, fastembed, and sqlite-vec pre-installed. Agents mount the source at /plugin via -v and run the full test matrix inside the container, so we catch environment-shape bugs that pytest alone misses (missing $CLAUDE_PLUGIN_DATA, missing jq, missing optional dependencies). A 10-agent validation pass against this harness produced the post-0.1.0 bug-fix round (16 issues across HIGH/MEDIUM/LOW severity); regression tests for those fixes are pinned in tests/integration/test_corpus.py and tests/integration/test_golden_path.py.
Observability roadmap¶
Decision: native analytics over our own SQLite index. Shipped as the v0.2 metrics layer
(see section below) — turns, compactions, ingest_runs tables populated during ingest,
queried via total-recall metrics. Zero new runtime dependencies, tightest fit for a
local-only tool whose value prop is "don't re-upload the user's transcripts."
- OpenTelemetry SDK: deferred to v0.3+ pending upstream MCP SDK PR #421 (still open as of mid-2026). Once merged, the OTel-shaped envelope already used by recall::log_json → events.jsonl becomes a drop-in upgrade path: same field names, real exporter behind a flag.
- Langfuse: rejected. Wrong abstraction: Langfuse models LLM-caller traces and total-recall is not an LLM caller. We index transcripts the user already produced; there is no prompt/response pair we own.
The ring-buffered hook log at ${CLAUDE_PLUGIN_DATA}/total-recall/logs/hooks.log remains in
place for raw debug; structured per-event records go to logs/events.jsonl via
recall::log_json and are aggregated by metrics health.
v0.2: metrics layer¶
The v0.2 milestone added a self-contained analytics surface over the SQLite index. Three new tables (turns, compactions, ingest_runs) are populated during ingest from message.usage{} blocks (assistant records) and system.subtype=compact_boundary payloads. The total-recall metrics CLI reads them.
Modules:
- index/metrics.py — pure aggregation functions (summary / cost / sessions / topics / health).
- total_recall/cmd_metrics.py — Click group with 5 sub-subcommands.
- total_recall/cost.py — model→$/Mtok catalog with cache-read multiplier; CLI override via --rate sonnet=3/15.
- total_recall/events.py — NDJSON event emitter with 10MB rotation, used by metrics health for hook fire-rate stats.
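As an illustration of how a rate catalog with a cache-read multiplier and a `--rate sonnet=3/15` override might fit together, here is a minimal sketch. The function names, the 0.1 multiplier, and the choice of usage keys (`input_tokens`, `cache_read_input_tokens`, `output_tokens`) are assumptions for the example; only the `sonnet=3/15` override syntax comes from the text above.

```python
from typing import Dict, Tuple

# Hypothetical catalog: model -> (input $/Mtok, output $/Mtok).
RATES: Dict[str, Tuple[float, float]] = {"sonnet": (3.0, 15.0)}

# Assumed discount for tokens served from the prompt cache.
CACHE_READ_MULTIPLIER = 0.1


def parse_rate(spec: str) -> Tuple[str, float, float]:
    """Parse an override such as 'sonnet=3/15' into (model, input, output)."""
    model, eq, prices = spec.partition("=")
    inp, slash, out = prices.partition("/")
    if not (model and eq and slash):
        raise ValueError(f"expected MODEL=IN/OUT, got {spec!r}")
    return model, float(inp), float(out)


def turn_cost(model: str, usage: Dict[str, int],
              rates: Dict[str, Tuple[float, float]] = RATES) -> float:
    """Dollar cost of one assistant turn from its usage block."""
    in_rate, out_rate = rates[model]
    fresh = usage.get("input_tokens", 0)
    cached = usage.get("cache_read_input_tokens", 0)
    produced = usage.get("output_tokens", 0)
    micro_dollars = (
        fresh * in_rate
        + cached * in_rate * CACHE_READ_MULTIPLIER
        + produced * out_rate
    )
    return micro_dollars / 1_000_000  # rates are per million tokens
```

An override is applied by merging the parsed tuple into a copy of the catalog, for example `rates = {**RATES, model: (inp, out)}`, before calling `turn_cost`.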
Schema migration¶
schema_meta.schema_version goes from '1' to '2'. db.py::apply_schema is idempotent — every CREATE is IF NOT EXISTS. Existing v1 DBs auto-upgrade on next open.
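The idempotent upgrade can be sketched with the standard sqlite3 module. The column layouts and the key/value shape of schema_meta below are assumptions made for the example; only the table names and the '1' to '2' version bump come from the text.

```python
import sqlite3

# Hypothetical column layouts; only the table names come from the document.
SCHEMA_V2 = """
CREATE TABLE IF NOT EXISTS schema_meta (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS turns (
    session_id    TEXT,
    input_tokens  INTEGER,
    output_tokens INTEGER
);
CREATE TABLE IF NOT EXISTS compactions (
    session_id TEXT,
    at         TEXT
);
CREATE TABLE IF NOT EXISTS ingest_runs (
    started_at TEXT,
    files      INTEGER
);
"""


def apply_schema(conn: sqlite3.Connection) -> None:
    """Create any missing tables and record the schema version.

    Safe to call on every open: each CREATE is IF NOT EXISTS and the
    version row is replaced rather than duplicated.
    """
    conn.executescript(SCHEMA_V2)
    conn.execute(
        "INSERT OR REPLACE INTO schema_meta (key, value) VALUES ('schema_version', '2')"
    )
    conn.commit()


def schema_version(conn: sqlite3.Connection) -> str:
    """Read back the recorded schema version."""
    row = conn.execute(
        "SELECT value FROM schema_meta WHERE key = 'schema_version'"
    ).fetchone()
    return row[0]
```

Running `apply_schema` twice on the same connection leaves the database in the same state as running it once, which is what lets a v1 database upgrade on its next open without a separate migration step.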
v0.3: operator-aware layer¶
Thesis¶
The operator is the source of truth. Past sessions already encode the operator's standing decisions, bans, goals, voice, and recurring corrections; instead of asking the model to re-derive that context from raw history every session, v0.3 distills it into typed sub-stores and ships it to the model as a structured 1.8 KB bundle at session start. The model gets operator state up-front, not three corrections in.
New extractors (10)¶
- model_corrections.py: pairs user pushback with the rejected approach (highest-leverage signal).
- standing_decisions.py: durable "always X over Y" preferences.
- bans.py: provider/tool/pattern bans + failed attempts.
- goals.py: per-project goal stack with status state machine.
- truth_rhetoric.py: 7-category truth-assertion taxonomy.
- operator_profile.py: identity / role aggregates (standalone, writes its own table).
- voice_profile.py: voice-fingerprint statistics (standalone).
- ontology.py: project / machine / vocabulary graph (standalone, writes three tables).
- self_corrections.py: model walks back its own claim within a session.
- away_summaries.py: what changed across compaction boundaries.
New tables (9)¶
operator_profile, voice_profile, standing_decisions, bans, failed_attempts,
goal_stack, projects, machines, vocabulary. Each table is created by its owning module
in index/ (e.g. index/operator.py, index/bans.py, index/ontology.py) using
CREATE TABLE IF NOT EXISTS, so existing DBs upgrade in place.
New MCP tools (17) and the registration pattern¶
The 17 v0.3 MCP tools live in mcp_server/extras/*_tools.py. No other code imports these tools
by name; instead, mcp_server/server.py imports each extras module as a side-effect import, and
the @mcp.tool() decorators on the contained functions register them against the shared mcp
instance at boot time. Adding a new tool surface therefore takes one decorator plus one import
line in server.py; there is no central registry to keep in sync. Each tool degrades gracefully
on its own: if its backing table is missing, it returns an error-marked result rather than
raising, so an incomplete index never breaks the rest of the surface.
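The decorator-registration pattern and the graceful-failure contract can be shown with a small stand-in. `ToolRegistry` below is a hypothetical substitute for the shared mcp instance, not the MCP SDK, and the `topic` / `choice` columns and the shape of the error-marked result are assumptions for the example.

```python
import sqlite3
from typing import Callable, Dict


class ToolRegistry:
    """Minimal stand-in for the shared `mcp` instance (not the real SDK)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def tool(self) -> Callable[[Callable], Callable]:
        """Decorator factory: registering happens when the module is imported."""
        def decorator(fn: Callable) -> Callable:
            self.tools[fn.__name__] = fn
            return fn
        return decorator


mcp = ToolRegistry()


@mcp.tool()
def list_standing_decisions(conn: sqlite3.Connection) -> dict:
    """Return standing decisions, or an error-marked result if the table is absent."""
    try:
        rows = conn.execute(
            "SELECT topic, choice FROM standing_decisions"  # hypothetical columns
        ).fetchall()
    except sqlite3.OperationalError as exc:  # e.g. "no such table"
        return {"ok": False, "error": str(exc), "items": []}
    return {"ok": True, "items": [{"topic": t, "choice": c} for t, c in rows]}
```

Because the decorator runs at import time, a server module only needs to import the file that contains the function; the registry fills itself in as a side effect.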
detector/escalation.py — sibling to extractors¶
detector/ is a peer of extractors/, not a member of it. The distinction is that extractors
produce rows from records; the detector consumes rows (and live inputs from the current
turn) to score operator-frustration risk. assess_escalation returns a numeric
ESCALATION_RISK score, one of five states (calm, mild_correction, escalated,
high_escalated, breaking_point), and one of four RecommendedAction values
(ship_as_is, trim_to_5_lines, run_command_paste_output, silence_then_act). Exposed to
the model via the assess_escalation_risk MCP tool. Scoring weights and state thresholds are
spec-frozen by research note O9; do not tune them without updating the spec and the tests in
tests/test_escalation.py.
SessionStart signpost: v1 → v2¶
v1 hand-rolled a "what's in memory" markdown block from a handful of queries. v2 makes a single
call to get_operator_context(cwd), which returns a JSON-shaped payload (identity, active goal,
top standing decisions, top bans, voice cheat sheet, recent corrections, machines) capped at
~1800 chars. The bash hook (hooks/session-start-signpost.sh) emits it via Claude Code's
additionalContext channel, so the model sees it as part of its initial context rather than
as user input.
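One way to keep a JSON payload under a character cap is to drop trailing list items until it fits, then wrap the result for the hook. The sketch below is an assumption-laden illustration: the trimming policy and the names `cap_context` and `hook_output` are hypothetical, and the `hookSpecificOutput` envelope reflects my understanding of Claude Code's SessionStart hook output, which should be checked against the current hooks reference.

```python
import json

MAX_CHARS = 1800  # cap taken from the description above


def cap_context(payload: dict, max_chars: int = MAX_CHARS) -> str:
    """Serialize `payload`, trimming the longest list until it fits the cap.

    Assumed policy: list items are ordered most-important first, so the
    tail of the longest list is the cheapest thing to drop.
    """
    trimmed = {k: list(v) if isinstance(v, list) else v for k, v in payload.items()}

    def render() -> str:
        return json.dumps(trimmed, ensure_ascii=False, separators=(",", ":"))

    text = render()
    while len(text) > max_chars:
        lists = [k for k, v in trimmed.items() if isinstance(v, list) and v]
        if not lists:
            break  # nothing left to trim; the caller must handle an oversize payload
        longest = max(lists, key=lambda k: len(trimmed[k]))
        trimmed[longest].pop()
        text = render()
    return text


def hook_output(context: str) -> str:
    """Wrap the context in the envelope the SessionStart hook prints to stdout."""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": context,
        }
    })
```

Trimming whole items rather than cutting the string keeps the payload valid JSON, so the model never sees a half-written entry.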
Process model¶
First-run bootstrap¶
A fresh install can't show recall context until the index has been built. The hook detects
fresh-install via recall::is_fresh_install, then recall::start_bootstrap kicks off
total-recall index in the background using setsid + nohup so the backfill survives the
hook's 5s timeout and the session ending. A .bootstrapping lockfile (PID + timestamp, stale
after 30 min) prevents duplicate bootstraps. A one-shot .bootstrap_banner_shown marker
ensures the user only ever sees the "total-recall: backfilling your sessions in the
background" banner once; the banner itself is produced by recall::bootstrap_banner and
emitted via the same additionalContext channel as normal signpost content.
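The lockfile rule (PID + timestamp, stale after 30 minutes) can be sketched in Python, although the shipped helpers are bash functions in hooks/lib/common.sh. The JSON lockfile layout and the name `try_acquire_lock` are assumptions for this example.

```python
import json
import os
import time
from typing import Optional

STALE_AFTER_S = 30 * 60  # a lock older than 30 minutes is treated as abandoned


def try_acquire_lock(lock_path: str, now: Optional[float] = None) -> bool:
    """Return True if this process now owns the bootstrap lock."""
    now = time.time() if now is None else now
    if os.path.exists(lock_path):
        try:
            with open(lock_path) as fh:
                started = float(json.load(fh)["started_at"])
        except (OSError, ValueError, KeyError, TypeError):
            started = 0.0  # an unreadable lock counts as stale
        if now - started < STALE_AFTER_S:
            return False  # a live bootstrap is already running
        os.remove(lock_path)
    try:
        # O_EXCL makes creation atomic: only one racing process succeeds.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        json.dump({"pid": os.getpid(), "started_at": now}, fh)
    return True
```

In Python the equivalent of the setsid + nohup detach would be `subprocess.Popen(..., start_new_session=True)`, which places the child in its own session so it is not tied to the hook process.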
--jobs N parallel ingest¶
total-recall index --jobs N runs parse + extract across a ProcessPoolExecutor
(index/ingest.py); only the main process owns the SQLite writer, fed by as_completed() from
the pool. This sidesteps SQLite's single-writer constraint while still pinning all CPU-heavy
work to workers. On the author's real corpus this dropped a full reindex from ~22s to ~9s.
The default is 1 for incremental runs, where pool startup overhead outweighs the gain on small deltas, and min(cpu_count, 8) for full rebuilds.
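The single-writer arrangement can be sketched as below: workers only parse and extract, and the parent process is the only one that touches the connection. This is an illustration, not index/ingest.py; `parse_and_extract`, the two-column extractions table, and the `executor_factory` parameter (added here so the pool type can be swapped in tests) are assumptions.

```python
import os
import sqlite3
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def default_jobs(full_rebuild: bool) -> int:
    """1 for incremental runs, min(cpu_count, 8) for full rebuilds."""
    return min(os.cpu_count() or 1, 8) if full_rebuild else 1


def parse_and_extract(path: str) -> List[Tuple[str, str]]:
    """CPU-heavy stage run in a worker; stands in for walker + extractors."""
    return [(path, f"row from {path}")]


def ingest(
    paths: Iterable[str],
    conn: sqlite3.Connection,
    jobs: int,
    executor_factory: Callable = ProcessPoolExecutor,
) -> int:
    """Fan work out to the pool; write results from the calling process only."""
    conn.execute("CREATE TABLE IF NOT EXISTS extractions (source TEXT, content TEXT)")
    written = 0
    with executor_factory(max_workers=jobs) as pool:
        futures = [pool.submit(parse_and_extract, p) for p in paths]
        for future in as_completed(futures):  # results arrive in completion order
            rows = future.result()
            conn.executemany("INSERT INTO extractions VALUES (?, ?)", rows)
            written += len(rows)
    conn.commit()
    return written
```

When this runs with a real process pool from a script, the call to `ingest` should sit under an `if __name__ == "__main__":` guard, since some start methods re-import the main module in each worker.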
Hook fire matrix¶
| Hook | Trigger | Mode | Budget | Notes |
|---|---|---|---|---|
| session-start-signpost.sh | SessionStart (matcher: "startup\|clear") | sync | 5s | Defers compact\|resume events to amnesia plugin. |
| user-prompt-retrieve.sh | UserPromptSubmit | async | 8s | Mid-prompt retrieval; never blocks the model. |
| stop-index.sh | Stop | async | 60s | Incremental reindex of the session that just ended. |
| post-compact-index.sh | PostCompact | async | 60s | Reindex after a compaction merges turns. |
Event pipeline¶
Every hook emits structured records through the recall::log_json <event> key=value … bash
helper (hooks/lib/common.sh). Records land in ${CLAUDE_PLUGIN_DATA}/total-recall/logs/events.jsonl
(10 MB rotation, see total_recall/events.py). total-recall metrics health aggregates them
into fire counts, p50/p95 latencies, and error rates per hook, the same surface that would be
exposed to OTel once PR #421 lands.
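The per-hook aggregation can be sketched as a single pass over the NDJSON lines. The field names `hook`, `duration_ms` and `status` are assumptions about the event shape, and nearest-rank is just one reasonable percentile definition; the actual aggregation in metrics health may differ.

```python
import json
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def aggregate(lines: Iterable[str]) -> Dict[str, dict]:
    """Fold events.jsonl lines into fire counts, p50/p95 and error rate per hook."""
    durations: Dict[str, List[float]] = defaultdict(list)
    errors: Dict[str, int] = defaultdict(int)
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated line at a rotation boundary
        if not isinstance(event, dict) or "hook" not in event:
            continue
        hook = event["hook"]
        durations[hook].append(float(event.get("duration_ms", 0)))
        if event.get("status") == "error":
            errors[hook] += 1
    report: Dict[str, dict] = {}
    for hook, values in durations.items():
        values.sort()
        report[hook] = {
            "fires": len(values),
            "p50_ms": percentile(values, 50),
            "p95_ms": percentile(values, 95),
            "error_rate": errors[hook] / len(values),
        }
    return report
```

Passing an open file handle as `lines` keeps the pass streaming, so a 10 MB log is never loaded whole.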