Recall — Memory Architecture Overview

Local-first agent memory service that manages the full lifecycle of AI agent memories — from raw daily logs through compacted summaries to distilled wisdom — with semantic search over all of it.

System Layers

┌─────────────────────────────────────────────┐
│                 CLI  (recall)               │
├─────────────────────────────────────────────┤
│              MemoryService API              │
├────────┬────────┬─────────┬───────┬─────────┤
│ Files  │ Search │ Compact │ Wiki  │  Dream  │
├────────┴────────┴─────────┴───────┴─────────┤
│              Abstraction Layer              │
│     Storage · Embeddings · Index · Model    │
└─────────────────────────────────────────────┘

Recall manages two parallel views of memory: a temporal stream (daily logs rolled up by compaction) and a topical wiki of cross-linked pages. Compaction maintains the first; the wiki and dreaming maintain the second.

Layer	Responsibility	Source
CLI	Command parsing, output formatting	`packages/core/src/cli.ts`
MemoryService	Top-level orchestrator, composes all subsystems	`packages/core/src/service.ts`
MemoryFiles	CRUD for daily, weekly, monthly, wiki, and wisdom files	`packages/core/src/files.ts`
SearchService	Two-phase hierarchical search with BM25 + vector fusion	`packages/core/src/search.ts`
Compactor	Daily→weekly→monthly→wisdom compression pipeline	`packages/core/src/compactor.ts`
WikiEngine	Topical knowledge pages — stub/append/synthesize/lint/merge/supersede	`packages/core/src/wiki-engine.ts`
DreamEngine	Asynchronous cross-temporal synthesis; writes wiki pages and supersessions	`packages/core/src/dream-engine.ts`
IdentityLoader	Agent identity frame (role/voice) used to steer synthesis	`packages/core/src/identity.ts`
Abstractions	Pluggable interfaces for storage, embeddings, indexing, and LLM	`packages/core/src/interfaces/`

Memory File Layout

<memory-root>/
├── WISDOM.md                    # Distilled principles + Knowledge Map (permanent)
├── DREAMS.md                    # Dream diary (append-only)
├── IDENTITY.md                  # Agent identity frame (role, voice)
├── memory/
│   ├── 2026-04-01.md            # Daily logs (raw, never deleted)
│   ├── 2026-04-02.md
│   ├── weekly/
│   │   ├── 2026-W13.md          # Weekly summaries (with pointers to dailies)
│   │   └── 2026-W14.md
│   ├── monthly/
│   │   ├── 2026-03.md           # Monthly summaries (with pointers to weeklies)
│   │   └── 2026-04.md
│   └── wiki/                    # Topical knowledge pages (one per subject)
│       ├── index.md
│       └── auth-middleware.md
└── .index/                      # Vectra vector index

All memory files use YAML frontmatter for metadata. Parent nodes (weekly/monthly) include a pointers array referencing their children and a salience map of per-child weights.

The earlier typed memory files (user_*.md, feedback_*.md, project_*.md, reference_*.md) have been folded into the wiki: their four types became wiki categories, and wiki migrate-typed-memories converts any legacy files in place.

Memory Lifecycle

Recall implements an eidetic (lossless) memory model — raw memories are never deleted. Compaction adds compressed layers on top without removing detail.

Daily logs  ──(7+ days)──►  Weekly summary  ──(4+ weeks)──►  Monthly summary
    │                            │                                │
    │                            │                                │
    ▼                            ▼                                ▼
  (permanent)              (permanent,                      (permanent,
                            pointers to dailies,             pointers to weeklies,
                            salience weights,                salience weights,
                            dual embeddings)                 dual embeddings)
                                                                  │
                                                                  ▼
                                                          Wisdom distillation

Compaction Pipeline

Daily → Weekly: Collects dailies for a completed ISO week, sends to LLM, produces structured summary. Extracts typed memory candidates (decisions, feedback, project context). Stores pointers to source dailies and computes salience weights per child.
Weekly → Monthly: Aggregates weeklies for a calendar month into a monthly summary. More aggressive compression (~30% of input). Stores pointers to source weeklies.
Wisdom Distillation: Merges insights from monthlies and typed memories into WISDOM.md. Deduplicates, removes stale entries, caps at configurable max entries (default 20).

Topical Knowledge → the Wiki

Durable knowledge that isn’t tied to a single day — rules, decisions, facts about people and systems — lives in the wiki rather than the temporal hierarchy. The agent writes short stub pages in real time; dreaming synthesizes them into richer pages as sources accumulate, and keeps them current via supersession. The four former typed memory types are now wiki categories:

Category	Purpose	Example
`entity`	People, teams, systems, organizations	“Northstar Gridworks — primary infra vendor”
`concept`	Rules / behavioral guidance (was `feedback`)	“Don’t mock the database in integration tests”
`project`	Ongoing work context (was `project`)	“Auth middleware: cookie→JWT migration, compliance-driven”
`reference`	Pointers to external systems (was `reference`)	“Pipeline bugs tracked in Linear project INGEST”
`theme`	Cross-cutting patterns synthesized by dreaming	“Migrations consistently slip when compliance gates them”

See The LLM Wiki for the full page anatomy, the stub→synthesis lifecycle, and supersession.

Search Architecture

Recall uses a two-phase hierarchical search that balances high recall (don’t miss relevant memories) with high precision (rank the best ones first).

Phase 1a — Candidate Retrieval (Parallel)

Three retrieval paths run concurrently:

Path	What it searches	Why
Parent vector search	`#agg` and `#summary` embeddings on weekly/monthly nodes	Coarse routing — finds relevant time periods
Raw direct search	Embeddings on individual daily/typed memories	Safety net — catches niche memories aggregation might wash out
BM25 keyword search	Parent summary text	Precise for proper nouns, IDs, error codes, rare terms

Parent oversampling (default K=10) compensates for aggregation washout in parent embeddings.

Phase 1b — Pointer Expansion

For each parent node retrieved, recursively load all children via pointers frontmatter — always expanding to leaf (raw memory) level. Also includes parent summaries as separate candidates.

Phase 2 — Reranking

Every candidate is scored with a hybrid formula:

score = w_embed · cosine(query, embedding)
      + w_bm25  · BM25(query, text)
      + w_parent · parent_score
      × temporal_affinity

Default weights: embed=0.5, bm25=0.3, parent=0.2 (configurable, must sum to 1.0).

Results are returned with a resultType (RAW or SUMMARY) and optional scoreBreakdown for debugging.

Temporal Affinity

When a query contains a time reference (“last March”, “Q3 2024”, “two weeks ago”), memories closer to that date receive a multiplicative boost:

temporal_affinity = exp(-|memory_date - reference_date| / σ)

Default σ = 30 days. Without a temporal reference, affinity is neutral (1.0 for all memories). There is no recency decay — old memories are never penalized just for age.

Temporal references are extracted via regex patterns (explicit dates, relative references, quarters, named periods). The most specific reference wins when multiple are detected. Callers can override via QueryOptions.temporalReference.

Supporting Search Features

Hybrid retrieval: Vectra’s combined semantic + BM25 retrieval is enabled, so dense and lexical matches are fused at the index level before reranking.
Catalog matching (catalog.ts): Frontmatter keyword overlap on wiki pages and any remaining typed memories — fast, no embeddings, catches exact-name hits.
Temporal embeddings: Chunk text is prefixed with [as of YYYY-MM-DD] before embedding, so a memory’s date is part of its vector — sharpening time-sensitive retrieval without relying on regex extraction alone.
Wiki score boost (wiki.scoreBoost, default 0.9): A multiplier applied to wiki-page retrieval scores. The default is a deliberate de-boost so a synthesized page can’t outrank the immutable daily that holds a specific fact — see The LLM Wiki.
Query expansion (query-expansion.ts): Generates 1–3 query variations (original, keyword-only, noun phrases) and merges results by URI. No LLM needed.

Embeddings & Indexing

Dual Embeddings Per Parent

Each parent node (weekly/monthly) stores two embeddings in the index:

Entry	URI Pattern	Source	Purpose
Summary	`weekly/2026-W15#summary`	Embedding of generated summary text	Gap coverage — captures editorially chosen importance
Aggregated	`weekly/2026-W15#agg`	Normalized weighted mean of child embeddings	Coarse routing — preserves semantic spread of children

Summary alone can miss niche details. Aggregated alone washes out when children are diverse. Together they cast a wider net during Phase 1 retrieval.

Salience-Weighted Aggregation

Aggregated embeddings use salience weights rather than uniform averaging:

e_agg = normalize( Σ salience_i × e_i )

Three salience signals (computed at compaction time, stored in parent frontmatter):

Signal	Weight	Method
Token count	0.4	Token counter — longer entries are more substantive
Entity density	0.3	NER + regex + vocabulary — more unique entities = semantically richer
Decision markers	0.3	Regex patterns (“decided to”, “switched to”, “chose X over Y”)

Alternative aggregation strategies (uniform, recency) are configurable but salience is the default.

Default Embeddings Model

LocalEmbeddings uses @huggingface/transformers with Xenova/all-MiniLM-L6-v2:

384-dimensional vectors
Fully offline, no API keys
Lazy-loaded on first use

Pluggable Abstractions

All core subsystems are behind interfaces, swappable at configuration time:

Interface	Default Implementation	Planned
FileStorage	`LocalFileStorage` (local disk)	SQLite
EmbeddingsModel	`LocalEmbeddings` (transformers.js, offline)	OpenAI, Anthropic
MemoryIndex	`VectraIndex` (Vectra local vector DB)	SQLite
MemoryModel	`CliAgentModel` (spawns CLI agent subprocess)	OpenAI, Anthropic, OSS

CliAgentModel

LLM operations (compaction, wisdom distillation) are delegated to a CLI coding agent via subprocess:

Agent name	CLI command
`"claude"`	`claude --print` (via stdin)
`"codex"`	`codex` (custom flags)
`"copilot"`	`copilot` (custom flags)

No API keys needed — the host agent’s CLI handles authentication.

CLI Surface

recall [--dir <path>] [--json] [--verbose] <command>

Command	Description
`search <query>`	Search memories (`--results`, `--max-chunks`, `--max-tokens`, `--recency-depth`, `--typed-memory-boost`, `--wiki-boost`, `--wiki-only`, `--no-wiki`, `--no-sync`)
`index`	Full rebuild of vector index
`sync`	Incremental index sync
`status`	Memory file counts and index health
`compact [level]`	Compact to level (`weekly` / `monthly` / `wisdom`) with `--dry-run` and `--agent`
`dream`	Run an asynchronous synthesis session; `dream status` shows the last run and pending signals
`wiki <subcommand>`	Manage topical knowledge pages — `list`, `show`, `stub`, `append`, `rebuild`, `merge`, `rename`, `lint`, `status` (see The LLM Wiki)
`log <entry>`	Append entry to today’s daily log
`list [type]`	List memory files
`read <file>`	Read a memory file
`watch`	Watch for changes and auto-sync, with optional `--compact` / `--dream`
`migrate`	Migrate existing memories to hierarchical architecture

Storage Budget

Three-year projection for a single agent:

Component	Estimate
Raw daily text (1,095 files)	~1.1 MB
Summary text (weekly + monthly)	~0.5 MB
Embeddings (384-dim float32, ~1,679 entries)	~2.5 MB
Index overhead (1.5× embeddings)	~1.3 MB
Typed memories + wisdom	~0.3 MB
Total	~5.7 MB

The eidetic design keeps total storage minimal because individual memory files are small (markdown text).

Dreaming System

Dreaming is an asynchronous knowledge synthesis engine that runs alongside compaction. While compaction is structural (summarize within temporal windows), dreaming is analytical — it discovers patterns across time boundaries, surfaces forgotten connections, detects contradictions, and promotes durable knowledge that compaction’s single-pass extraction missed.

How It Differs from Compaction

	Compaction	Dreaming
Trigger	Temporal boundaries (week/month end)	Scheduled (cron) or on-demand
Scope	Within one time window	Across all time windows
Input	Raw memories for a period	Search signals, entity scans, decision markers, wisdom drift
Output	Summary files in hierarchy	Wiki pages (stubs + synthesized rewrites), supersessions, contradiction flags
LLM usage	Summarization	Cross-reference analysis, gap analysis, contradiction detection, theme synthesis

Three Phases

Search signals + Entity scans + Decision markers + Staleness + Wisdom drift
    │
    ▼
Phase 1: Gather ──► Phase 2: Analyze ──► Phase 3: Write
 (signals)           (LLM synthesis)       (persist)
    │                     │                    │
    ▼                     ▼                    ▼
candidates.json     wiki ops (create/update)  memory/wiki/<slug>.md
                    supersession proposals     (+ supersedes frontmatter)
                    contradiction flags        DREAMS.md (diary)

Phase 1 — Gather: Collects signals from search logs (query patterns, hit frequency, null queries), entity frequency scans, decision-marker scans (for supersession), wiki staleness checks, and wisdom-vs-behavior drift analysis. Produces a scored candidate list.

Phase 2 — Analyze: Sends candidate clusters to the LLM via MemoryModel for cross-reference analysis, gap analysis, contradiction detection, and theme synthesis. Each analysis type has a dedicated prompt template; the output is a set of wiki operations (create a stub, synthesize a page, supersede a prior claim).

Phase 3 — Write: Applies the wiki operations — creating or rewriting memory/wiki/<slug>.md pages, recording superseded claims in their supersedes frontmatter, flagging contradictions, and appending a DREAMS.md diary entry. All output is indexed in Vectra and becomes searchable. (Legacy memory/dreams/insights/ files from earlier versions are converted to wiki pages by wiki migrate-insights.)

Signal Collection

The search service logs every query + result set to .dreams/search-log.jsonl when dreaming is enabled. From this, the engine computes:

Hit frequency — Which memories get recalled most often
Query diversity — Memories hit by many different queries are cross-cutting
Null queries — Topics the memory system can’t answer (knowledge gaps)
Temporal clusters — Which time periods are being actively investigated

File Layout

<memory-root>/
├── DREAMS.md                              # Dream diary (append-only)
├── memory/
│   └── wiki/                              # Where dreaming writes its synthesis
│       ├── auth-middleware.md             #   (pages + supersedes frontmatter)
│       └── auth-middleware-trajectory.md  #   (auto-built once a page has ≥2 supersessions)
└── .dreams/                               # Machine state (gitignored)
    ├── search-log.jsonl
    ├── candidates.json
    └── dream-state.json

CLI

recall dream                    # Run a full dreaming session
recall dream --dry-run          # Show what would be examined
recall dream status             # Last run, pending candidates, signal stats
recall watch --dream            # Enable scheduled dreaming in watch mode

For full design details, see specs/dreaming.md.

Key Dependencies

Package	Purpose
`vectra`	Vector index, storage abstractions, BM25
`@huggingface/transformers`	Local embeddings (MiniLM-L6-v2), NER for salience
`commander`	CLI framework
`gray-matter`	YAML frontmatter parsing
`gpt-tokenizer`	Token counting for budget management

Design Specs

For full details, see:

specs/memory-service.md — Service architecture, abstractions, compaction, CLI (v0.4)
specs/hierarchical-memory.md — Eidetic storage, pointers, two-phase search, salience (v0.4)
specs/dreaming.md — Asynchronous knowledge synthesis, signal collection, cross-temporal analysis (v0.2)
specs/wiki.md — Topical knowledge graph: stubs, synthesis, supersession, shared wikis (v0.4)
The LLM Wiki — Operator guide to the wiki layer and supersession
Prompts — Compaction prompt templates (daily→weekly, weekly→monthly, wisdom distillation)