docs/PHASE4: formulate — memory.jsonl + startup injection + :memory meta

Phase 4 formulate manifest. Three pillars per PHASE0 §11 row 4:
memory.jsonl persistent cross-session store, startup context injection
into the system prompt, and the :memory management surface + opt-in
:memory summarize for candidate extraction.

Resolutions baked in via §2:
- Storage: append-only JSONL at <history.dir>/memory.jsonl
- Format: {id, ts, kind, content, tags?, source?}
- Kinds: fact / pref / context (lightly typed v1)
- Forget: tombstone append, resolve at load (set-based)
- Cadence: manual :memory summarize only in v1; auto-trigger Q-listed
- Inject: dynamic [background] block on system prompt, capped at 2000 chars by default; LRU-by-ts selection if over-budget
- Order: DEFAULT → MCP block → [background] → NORRIS suffix (Norris last so it dominates when active)

New module surfaces:
  history.lua  M.open_memory / memory:add / memory:forget / M.load_memory
  context.lua  ctx.memory_items + [background] composer
  repl.lua     :remember, :memory add/list/forget/clear/inject/summarize
  config.lua   commented-out memory = {...} example

Open questions (Q31-Q36) tracked in §11:
  Q31 auto-summarize trigger (manual v1; auto-on-quit candidate)
  Q32 in-place edit vs forget+re-add
  Q33 Norris-mode interaction (proposal: both blocks stay)
  Q34 split prefs into a dedicated prompt section?
  Q35 redaction of sensitive content during summarize
  Q36 duplicate detection on :memory add

5-commit roadmap in §12 (history → context → repl → summarize → config).

No new module files. No substrate amendments to PHASE0 — entirely additive on top of Phase 1's history.lua pattern and Phase 3's dynamic-suffix pattern in context.lua.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:25:57 +00:00
parent 50666d092f
commit bea717534c
1 changed files with 348 additions and 0 deletions
@@ -0,0 +1,348 @@
+# aish — Phase 4 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 4 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analyze)
+**Date:** 2026-05-13
+
+PHASE0 is the locked substrate; PHASE1, PHASE2, PHASE3 are layered on top.
+This manifest specifies what Phase 4 adds — **cross-session memory** — and
+the user-facing surface for managing it.
+
+---
+
+## 1. Scope of Phase 4
+
+Three pillars per PHASE0.md §11 row 4:
+
+1. **`memory.jsonl` persistent store** — a single append-only file
+   (`<config.history.dir>/memory.jsonl`) carrying user-curated facts,
+   preferences, and project context that survive aish restarts. Same
+   storage convention as session logs but a separate file because the
+   read pattern (load at startup) and write pattern (curated only)
+   differ from session logs (append-every-turn).
+
+2. **Startup context injection** — at REPL boot, recent memory items
+   are loaded into the live `Context` so the model sees them on the
+   very first turn. Injection is bounded by a character budget
+   (`cfg.memory.inject_max_chars`) and visible to the user via
+   `:memory list`.
+
+3. **`:memory` management surface + opt-in candidate extraction** —
+   meta commands for `add`, `list`, `forget`, `clear`, and `inject`,
+   plus an on-demand summarizer (`:memory summarize`) that extracts
+   candidate facts from the session log for the user to triage into
+   memory. Running it automatically at session end is deferred (Q31).
+
+**Phase 4 is done when:**
+
+- `:remember <text>` (alias for `:memory add <text>`) writes a line to
+  `memory.jsonl` and the next REPL boot sees it in context.
+- `:memory list` shows current memory items with their IDs and ages.
+- `:memory forget <id>` removes one item; `:memory clear` removes all
+  (with confirm).
+- At startup, the most recent memory items that fit the injection
+  budget are appended to the system prompt as a single `[background]`
+  block (configurable cap).
+- `:memory summarize` runs the summarizer model
+  (`cfg.memory.summarizer_model`, else the active model) over the
+  current session log and proposes candidate memory items; the user
+  accepts/rejects per-candidate via prompt.
+- Existing configs without a `memory` section behave exactly like
+  Phase 3 (no startup injection, no auto-summarize).
+
+---
+
+## 2. Technology Decisions (delta from Phase 3)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
+| Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. |
+| Memory-item shape | `{id, ts, kind, content, tags?, source?}` | `id` is a monotonic int (counter source is either a sidecar `memory.id` file or a max-id scan of the JSONL at open; §12 plans the scan); `kind ∈ {"fact","pref","context"}` lightly typed for future routing; `content` is the body text; optional `tags` array; optional `source` carrying session-id provenance when auto-extracted. |
+| Forget semantics | **Append a tombstone**, don't rewrite the file (`{id, ts, kind:"forget", target:<other_id>}`) | Append-only preserves history. `M.load_memory` resolves tombstones during read — silently drops any item whose `id` appears as a forget-target. `:memory clear` writes one tombstone per active item; could also support a wildcard forget. |
+| Auto-summarize cadence | **Manual only in v1** (`:memory summarize`). Auto-trigger on `:quit` or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
+| Summarizer model | `cfg.memory.summarizer_model`; the example config sets the `fast` preset (cheap; quality good enough for extraction). Falls back to the active model when unset | Summarization favors recall over precision — the fast model's tendency to err on the side of inclusion is fine because the user filters per-candidate. |
+| Startup injection mechanism | A new dynamic block on the system prompt, appended by `context.to_messages()` when `ctx.memory_items` is non-empty | Same hybrid-prompt pattern as Phase 2's MCP block and Phase 3's NORRIS suffix. No new context structure beyond a list on the Context. |
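+| Injection budget | `cfg.memory.inject_max_chars` (default 2000 chars total — roughly 500 tokens) | Cap so memory doesn't eat the whole context. If items exceed the budget, selection is most-recent-first by `ts` (the "LRU-by-`ts`" rule): the oldest items are left out of the injection but stay in the file. |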
+| Pruning policy | Manual `:memory forget` + optional `cfg.memory.prune_older_than_days` (default unset — no auto-pruning) | Conservative defaults; user owns the lifecycle. |
+| Interaction with sessions | `memory.jsonl` is independent of `sessions/*.jsonl`. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
+| Concurrency | Single-writer assumed (one aish process per memory dir). Reader is the same process | Same assumption as session logs. Multi-process memory sharing is out of scope. |
+
+---
+
+## 3. Module Changes
+
+| File | State after Phase 3 | Phase 4 changes |
+|---|---|---|
+| `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> memory_handle`; `memory:add(kind, content, tags?, source?) -> id`; `memory:forget(id)`; `M.load_memory(path) -> items_table` (resolves tombstones). `memory_handle` is similar shape to `session_handle` — internal fd + monotonic counter. |
+| `context.lua` | system prompt + MCP block + NORRIS suffix toggle | Add a `memory_items` field on Context. `to_messages()` composes a dynamic "[background]" block on the system prompt when `memory_items` is non-empty. Per §5 and the Q33 proposal, the block stays when Norris mode is active; the NORRIS suffix follows it. Cap respected via the inject_max_chars budget. |
+| `repl.lua` | meta cmds + tool sub-loop + Norris driver | New meta: `:remember <text>` (shortcut for `:memory add fact <text>`); `:memory add <kind> <text>`; `:memory list`; `:memory forget <id>`; `:memory clear`; `:memory inject`; `:memory summarize`. At startup, after loading config + opening session, also open memory handle and inject the most recent items that fit the budget into `ctx.memory_items`. |
+| `broker.lua` | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
+| `config.lua` | example with mcp + safety blocks | Add commented-out `memory = { ... }` example. Default behavior is "no memory injection, no auto-summarize". |
+| `executor.lua` | unchanged | unchanged |
+| `safety.lua` | is_destructive + norris_step | unchanged |
+
+No new module files. All Phase 4 functionality grows existing files —
+mostly `history.lua` and `repl.lua`.
+
+---
+
+## 4. memory.jsonl Format
+
+```jsonl
+{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"User prefers terse responses; no end-of-turn summaries."}
+{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"Default to :model deep for code reasoning tasks."}
+{"id":3,"ts":"2026-05-13T19:02:00Z","kind":"context","content":"Current project: aish (LuaJIT REPL with MCP tools).","tags":["aish","luajit"]}
+{"id":4,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}
+```
+
+After `load_memory`, item `id=2` is dropped because of the tombstone.
+Active items: 1, 3.
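+
+A minimal sketch of the set-based resolution, assuming the JSONL lines
+have already been decoded into Lua tables (for example with the vendored
+dkjson). `resolve_tombstones` is a hypothetical helper name used for
+illustration, not the planned `M.load_memory` signature.
+
+```lua
+-- Hypothetical helper: takes decoded JSONL records (Lua tables) and
+-- returns only the active memory items.
+local function resolve_tombstones(records)
+    -- Pass 1: collect every id named by a tombstone.
+    local forgotten = {}
+    for _, rec in ipairs(records) do
+        if rec.kind == "forget" and rec.target ~= nil then
+            forgotten[rec.target] = true
+        end
+    end
+    -- Pass 2: keep non-tombstone records whose id was never targeted.
+    -- Two passes make the result independent of line order.
+    local active = {}
+    for _, rec in ipairs(records) do
+        if rec.kind ~= "forget" and not forgotten[rec.id] then
+            active[#active + 1] = rec
+        end
+    end
+    return active
+end
+
+-- The four example lines above, already decoded.
+local records = {
+    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact",
+      content = "User prefers terse responses; no end-of-turn summaries." },
+    { id = 2, ts = "2026-05-13T19:01:35Z", kind = "pref",
+      content = "Default to :model deep for code reasoning tasks." },
+    { id = 3, ts = "2026-05-13T19:02:00Z", kind = "context",
+      content = "Current project: aish (LuaJIT REPL with MCP tools).",
+      tags = { "aish", "luajit" } },
+    { id = 4, ts = "2026-05-13T20:00:00Z", kind = "forget", target = 2 },
+}
+
+local active = resolve_tombstones(records)
+for _, item in ipairs(active) do
+    print(item.id, item.kind)
+end
+```
+
+Because the forget set is built before filtering, a tombstone that
+appears earlier in the file than its target still takes effect.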
+
+### kind values
+
+- **`fact`** — factual statement about the user, their environment, or
+  project state.
+- **`pref`** — user preference for aish behavior (response style,
+  default model, etc.).
+- **`context`** — project / domain context that helps the model orient
+  on common tasks.
+- **`forget`** — tombstone; refers to another id via `target`.
+
+v1 is lightly typed — the model sees all kinds identically as a flat
+list in the [background] block. Future phases may route them
+differently (e.g. `pref` into a system-prompt section, `context` into
+a user-style preamble). Today they're prose.
+
+---
+
+## 5. Startup Injection
+
+When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):
+
+1. `history.load_memory(path)` reads all items, applies tombstone
+   resolution, returns active items sorted by `ts` descending (most
+   recent first).
+2. Take items until `cfg.memory.inject_max_chars` (default 2000) is
+   consumed. Older items are dropped from injection (still in the
+   file).
+3. Store on `ctx.memory_items` as an array of `{kind, content}` (id
+   and ts not needed at render-time).
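+
+Steps 1 to 3 could be sketched as follows. This is an illustration under
+stated assumptions, not the planned implementation: `select_for_injection`
+is a hypothetical name, the budget counts only the bytes of `content`,
+and selection stops at the first item that would overflow the budget
+rather than skipping it and trying smaller, older items.
+
+```lua
+-- Hypothetical helper: pick the most recent items that fit the budget.
+local function select_for_injection(items, max_chars)
+    -- Copy so the caller's array is not reordered.
+    local sorted = {}
+    for i, item in ipairs(items) do
+        sorted[i] = item
+    end
+    -- ISO-8601 UTC timestamps in the same fixed-width format compare
+    -- correctly as plain strings; newest first.
+    table.sort(sorted, function(a, b) return a.ts > b.ts end)
+
+    local picked, used = {}, 0
+    for _, item in ipairs(sorted) do
+        local cost = #item.content  -- bytes, not UTF-8 code points
+        if used + cost > max_chars then
+            break  -- assumption: stop at the first overflow
+        end
+        used = used + cost
+        -- Only kind and content are needed at render time.
+        picked[#picked + 1] = { kind = item.kind, content = item.content }
+    end
+    return picked, used
+end
+
+local items = {
+    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact",    content = "aaaa" },
+    { id = 2, ts = "2026-05-13T19:02:00Z", kind = "pref",    content = "bbbbbb" },
+    { id = 3, ts = "2026-05-13T19:03:00Z", kind = "context", content = "cc" },
+}
+
+-- With a budget of 8, the two newest items (2 + 6 bytes) fit; the oldest is dropped.
+local picked, used = select_for_injection(items, 8)
+```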
+
+`context.to_messages()` composition:
+
+```
+<DEFAULT_SYSTEM_PROMPT>
+<Phase 2 MCP block>
+
+[background] (memory loaded at startup; managed via :memory)
+- (fact) User prefers terse responses; no end-of-turn summaries.
+- (context) Current project: aish (LuaJIT REPL with MCP tools).
+```
+
+Order of suffixes on the system prompt:
+1. Default Phase 0 prompt
+2. Phase 2 MCP guidance block (always present)
+3. Phase 4 [background] block (when memory_items non-empty)
+4. Phase 3 NORRIS MODE block (when norris_active)
+
+Norris is last so its instructions take precedence when active.
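+
+The ordering above can be sketched as a small composer. The field names
+(`default_prompt`, `mcp_block`, `norris_block`, `norris_active`) are
+assumptions made for this example; the real `to_messages()` may store
+these pieces differently.
+
+```lua
+-- Hypothetical composer: stacks the system-prompt blocks in the order
+-- default -> MCP -> [background] -> NORRIS.
+local function compose_system_prompt(parts)
+    local blocks = { parts.default_prompt }
+    if parts.mcp_block then
+        blocks[#blocks + 1] = parts.mcp_block
+    end
+    if parts.memory_items and #parts.memory_items > 0 then
+        local lines = {
+            "[background] (memory loaded at startup; managed via :memory)",
+        }
+        for _, item in ipairs(parts.memory_items) do
+            lines[#lines + 1] = string.format("- (%s) %s", item.kind, item.content)
+        end
+        blocks[#blocks + 1] = table.concat(lines, "\n")
+    end
+    -- Norris goes last so its instructions take precedence when active.
+    if parts.norris_active and parts.norris_block then
+        blocks[#blocks + 1] = parts.norris_block
+    end
+    return table.concat(blocks, "\n\n")
+end
+
+local prompt = compose_system_prompt({
+    default_prompt = "<DEFAULT_SYSTEM_PROMPT>",
+    mcp_block = "<Phase 2 MCP block>",
+    memory_items = {
+        { kind = "fact", content = "User prefers terse responses; no end-of-turn summaries." },
+    },
+    norris_active = true,
+    norris_block = "<NORRIS MODE block>",
+})
+print(prompt)
+```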
+
+---
+
+## 6. `:memory summarize` (Manual Auto-Extraction)
+
+`:memory summarize` triggers the active model (or
+`cfg.memory.summarizer_model` if set) to read the current session's
+turns and propose candidate memory items.
+
+### Flow
+
+1. Build a prompt: "Read the following conversation transcript. Extract
+   facts, preferences, or context worth remembering across future
+   sessions. Output ONE candidate per line, prefixed with the kind:
+   `fact: …`, `pref: …`, or `context: …`. Maximum 10 candidates."
+2. Send `ctx:to_messages()` minus the [background] suffix (avoid
+   feedback) + the user prompt above.
+3. Parse the response line-by-line for `(fact|pref|context):
+   <content>` shapes.
+4. For each candidate, prompt the user:
+
+   ```
+   [memory] candidate (fact): User prefers terse responses; no end-of-turn summaries.
+   keep? [y/N/edit]
+   ```
+
+   - `y` → write to memory.jsonl.
+   - `N` (or empty) → drop.
+   - `edit` → readline-edit the content before write.
+
+5. Status when done: `[aish] memory: added N candidates`.
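+
+Step 3 of the flow could look like the sketch below. Lua patterns have
+no alternation, so instead of `(fact|pref|context)` the example captures
+any alphabetic word before the colon and checks it against a set of
+allowed kinds. `parse_candidates` and `VALID_KINDS` are hypothetical
+names; tolerating a leading `-` or `*` bullet is an assumption about
+model output.
+
+```lua
+local VALID_KINDS = { fact = true, pref = true, context = true }
+
+-- Hypothetical parser: extracts "<kind>: <content>" candidates from a
+-- model response, one per line, up to max_candidates.
+local function parse_candidates(response, max_candidates)
+    local out = {}
+    for line in response:gmatch("[^\r\n]+") do
+        -- Optional bullet, then an alphabetic kind, a colon, and the body.
+        local kind, content = line:match("^%s*[-*]?%s*(%a+):%s*(.+)$")
+        if kind and VALID_KINDS[kind] then
+            content = content:gsub("%s+$", "")  -- trim trailing whitespace
+            if content ~= "" then
+                out[#out + 1] = { kind = kind, content = content }
+                if #out >= max_candidates then
+                    break
+                end
+            end
+        end
+    end
+    return out
+end
+
+local response = table.concat({
+    "Here are some candidates:",
+    "- fact: User prefers terse responses.",
+    "* pref: Default to :model deep for code reasoning tasks.",
+    "context: Current project: aish",
+    "note: not a valid kind",
+}, "\n")
+
+local candidates = parse_candidates(response, 10)
+for _, c in ipairs(candidates) do
+    print(c.kind, c.content)
+end
+```
+
+Lines that do not start with a known kind are ignored rather than
+treated as errors, so chatty preambles from the model do not break the
+triage loop.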
+
+### Why manual not automatic in v1
+
+An auto-summarize that runs at every `:quit` would:
+- be expensive (tokens on every exit)
+- drift over time if the model picks up noise
+- compete with the user's intentional `:remember <text>` curation
+
+Manual gives the user the trigger. Q-list tracks auto-cadence options.
+
+---
+
+## 7. Meta Commands (Phase 4 additions)
+
+| Command | Action |
+|---|---|
+| `:remember <text>` | Shortcut for `:memory add fact <text>` |
+| `:memory add <kind> <text>` | Append a memory item (kind ∈ fact, pref, context) |
+| `:memory list` | Show all active memory items (id + ts + kind + content) |
+| `:memory forget <id>` | Append a tombstone for `<id>` |
+| `:memory clear` | Forget all active items (with `[y/N]` confirm) |
+| `:memory summarize` | Extract candidate items from current session via LLM |
+| `:memory inject` | Re-inject current memory.jsonl items into Context (after edits) |
+
+`:help` updated.
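+
+One way the meta dispatch might parse these commands is sketched below.
+`parse_memory_command` and the returned table shape (`action`, `kind`,
+`text`, `id`) are hypothetical; how `repl.lua` actually routes meta
+commands is not specified here.
+
+```lua
+local VALID_KINDS = { fact = true, pref = true, context = true }
+
+-- Hypothetical parser: turns an input line into an action table,
+-- or returns nil plus a usage/error message.
+local function parse_memory_command(line)
+    -- :remember <text> is a shortcut for :memory add fact <text>.
+    local text = line:match("^:remember%s+(.+)$")
+    if text then
+        return { action = "add", kind = "fact", text = text }
+    end
+
+    local sub, rest = line:match("^:memory%s+(%a+)%s*(.*)$")
+    if not sub then
+        return nil, "not a memory command"
+    end
+
+    if sub == "add" then
+        local kind, body = rest:match("^(%a+)%s+(.+)$")
+        if not (kind and VALID_KINDS[kind]) then
+            return nil, "usage: :memory add <fact|pref|context> <text>"
+        end
+        return { action = "add", kind = kind, text = body }
+    elseif sub == "forget" then
+        local id = tonumber(rest)
+        if not id then
+            return nil, "usage: :memory forget <id>"
+        end
+        return { action = "forget", id = id }
+    elseif sub == "list" or sub == "clear"
+        or sub == "summarize" or sub == "inject" then
+        return { action = sub }
+    end
+    return nil, "unknown :memory subcommand: " .. sub
+end
+
+local cmd = parse_memory_command(":remember User prefers terse responses")
+print(cmd.action, cmd.kind, cmd.text)
+```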
+
+---
+
+## 8. Configuration Schema (Phase 4 example block)
+
+```lua
+memory = {
+    -- Path defaults to <history.dir>/memory.jsonl. Override per fleet
+    -- if you want shared memory (read-only is safer than write-shared).
+    -- path = (history.dir or "~/.local/share/aish") .. "/memory.jsonl",
+
+    -- Cap on how much memory content is injected into the system prompt
+    -- at startup. Roughly 2000 chars ≈ 500 tokens. Older items are
+    -- dropped from injection if exceeded; they remain in the file.
+    inject_max_chars = 2000,
+
+    -- Which model to use for :memory summarize. Defaults to the active
+    -- model when nil. Use "fast" for speed; "deep" for better quality.
+    summarizer_model = "fast",
+
+    -- Auto-prune items older than N days at startup. nil = never auto-prune.
+    -- Manual :memory forget always works regardless.
+    -- prune_older_than_days = 90,
+}
+```
+
+---
+
+## 9. Migration from Phase 3
+
+User-visible:
+- `:remember`, `:memory add / list / forget / clear / inject / summarize`
+  are new meta commands.
+- A `[background]` block in the system prompt appears when memory items
+  exist.
+- Existing configs without `memory = {...}` continue to work — no
+  injection, no auto-summarize. Phase 3 behavior intact.
+
+Substrate (PHASE0.md §3) invariants: unchanged.
+
+The `[background]` system-prompt suffix is composed dynamically by
+`context.to_messages()` (same pattern as Phase 2 MCP block and Phase 3
+NORRIS suffix). No new substrate contract.
+
+---
+
+## 10. Out of Scope (Phase 4)
+
+Per PHASE0.md §11 these belong to later phases:
+- Multi-model routing / cloud fallback (Phase 5).
+- Tree-sitter syntax highlighting (Phase 6).
+
+Specifically out of Phase 4 scope despite proximity:
+- Multi-process memory sharing (single-writer assumed v1).
+- Retrieval-augmented injection (RAG over memory.jsonl) — v1 just LRU.
+- Auto-trigger of `:memory summarize` at `:quit` (Q-list).
+- Memory categories beyond fact/pref/context — minimal typing v1.
+- Cross-aish-instance memory sync (memory.jsonl in a synced dir
+  works coincidentally; not designed for it).
+- Encryption at rest — same posture as session logs (none in v1).
+
+---
+
+## 11. Open Questions
+
+| # | Question | Impact | Resolve by |
+|---|---|---|---|
+| Q31 | Auto-summarize trigger: manual only (current), automatic at `:quit`, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
+| Q32 | Editing memory items in place: `:memory edit <id>` to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
+| Q33 | Memory injection while in Norris mode: does the [background] block stay, get suppressed, or merge with the Norris goal? Proposal: keep both; Norris is the last block and dominates. | context.lua | Phase 4 (plan) |
+| Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no — flat list. | context.lua + UX | Phase 5 if it bites |
+| Q35 | Privacy / redaction: `:memory summarize` could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + history.lua | Phase 4 (verify) — review user-emergent risk |
+| Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can `:memory list` to spot). | history.lua | Phase 4 (verify) |
+
+---
+
+## 12. Implementation Plan (commit-by-commit)
+
+Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:
+
+1. **`history.lua` — memory store.** Add `M.open_memory`,
+   `memory:add(kind, content, tags?, source?)`, `memory:forget(id)`,
+   `M.load_memory(path)` with tombstone resolution. Persistent
+   monotonic counter via a sidecar `memory.id` file (or scan the JSONL
+   for max id at open time — pick at analyze). **Test in isolation**:
+   round-trip add/forget/load against a temp file.
+
+2. **`context.lua` — memory injection.** Add `ctx.memory_items` and
+   the `[background]` block composer in `to_messages()`. Cap by
+   `inject_max_chars`. **Test in isolation**: assert composition order
+   (MCP → background → Norris); cap honored.
+
+3. **`repl.lua` — `:remember` + `:memory list / add / forget / clear / inject`.**
+   At startup, after MCP setup, open the memory handle + LRU-load items.
+   Hook the meta dispatch. No summarize yet. **End-to-end**: run aish,
+   `:remember X`, `:quit`, restart, `:memory list` shows X, `:history`
+   shows X in [background].
+
+4. **`:memory summarize`** — manual extraction. Bundle a system-prompt
+   for the summarizer model; parse response; per-candidate confirm
+   prompt; append accepted items. **End-to-end**: short conversation,
+   summarize, accept one of two candidates, restart, verify accepted
+   one persists.
+
+5. **`config.lua` — example memory block.** Documentation-only;
+   commented-out example. Final commit.
+
+### Risk / non-obvious
+
+- **Counter persistence**: `memory:add` needs a monotonic id. Options:
+  (a) sidecar `memory.id` file with a single integer, (b) scan the
+  JSONL on open for max id, (c) use timestamp as id (no monotonic
+  guarantee across rapid adds). Plan: (b) — scan once at open; cache
+  in the handle. Overflow is not a practical concern: LuaJIT numbers
+  represent integers exactly up to 2^53.
+- **Tombstone resolution at load**: build a set of forget-target ids
+  from kind=="forget" entries; filter active items to exclude. Order
+  doesn't matter (tombstones can appear before their targets if the
+  file is hand-edited; the resolution is set-based).
+- **Empty file at open** vs **nonexistent file**: both should yield an
+  empty memory handle. Phase 1's `history.open` already handles file
+  creation; extend the pattern.
+- **System prompt growth**: the suffix-stacking pattern is up to 4
+  blocks now (default + MCP + background + Norris). Character cost
+  ~200 + ~80 + 2000 + ~250 = ~2530 chars (roughly 630 tokens at the
+  4-chars-per-token ratio used for the injection budget) before any
+  user/asst turns. Worth measuring at baseline phase.
+- **`:memory summarize` parse robustness**: small models may emit
+  "fact: ..." sometimes with markdown bullets, sometimes without.
+  Parser should tolerate `^[-*]?\s*(fact|pref|context):\s*(.+)`.
+- **`:memory clear` with confirm**: same UX as Phase 3 destructive
+  prompts. `[y/N]` default-no.
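+
+For the counter-persistence plan (b), a minimal sketch of the max-id
+scan, assuming the records have already been decoded into Lua tables.
+`next_id` is a hypothetical helper. Tombstones carry ids of their own
+(id 4 in the §4 example), so the scan covers every record, not just
+active items.
+
+```lua
+-- Hypothetical helper: derive the next id from all records in the file,
+-- tombstones included, so ids are never reused.
+local function next_id(records)
+    local max_id = 0
+    for _, rec in ipairs(records) do
+        if type(rec.id) == "number" and rec.id > max_id then
+            max_id = rec.id
+        end
+    end
+    return max_id + 1
+end
+
+local scanned = {
+    { id = 1, kind = "fact" },
+    { id = 2, kind = "pref" },
+    { id = 3, kind = "context" },
+    { id = 4, kind = "forget", target = 2 },
+}
+
+print(next_id(scanned))  -- 5
+print(next_id({}))       -- 1 (empty or nonexistent file)
+```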
+
+### Open at plan; resolve at review
+
+- Whether `:remember` should append to the LIVE `ctx.memory_items`
+  immediately (so the model sees it on the next turn without restart)
+  or only on next session boot. v1 takes the immediate option: append
+  both to file AND to live ctx for immediate visibility.
+- Whether the summarizer should be fed the FULL session log or just
+  recent turns (token budget). v1 says full minus the [background]
+  suffix; cap at session-log size <= 64KB or last N turns.
+
+---
+
+*End of Phase 4 Manifest — aish*