aish/docs/PHASE4.md

# aish — Phase 4 Manifest

**Project:** aish — AI-augmented conversational shell
**Document:** Phase 4 Requirements, Architecture & Design Decisions
**Status:** Plan (review fold-in 2026-05-13 — TOCTOU race + Norris suppression + summarizer self-amp resolved)
**Date:** 2026-05-13

**Review fold-in (2026-05-13):**

R-B1. **TOCTOU race on memory.jsonl** — two aish processes against the
    same `history.dir` would each compute the same `next_id` and
    produce duplicate ids; tombstones become ambiguous. Resolution:
    `M.open_memory` takes an `flock(LOCK_EX | LOCK_NB)` advisory lock
    on the file descriptor. Held until handle close. Failure to
    acquire → `nil, "memory.jsonl held by another aish process"`.
    Requires extending `ffi/libc.lua` with `flock(2)` — one cdef +
    three constants (LOCK_EX=2, LOCK_NB=4, LOCK_UN=8). The lock is the *enforcement*
    of the single-writer assumption stated in §2; documented in §2 row.

R-C1. **System-prompt growth under Norris** — over an 8-step Norris run,
    a 2KB [background] block adds ~16K redundant chars (~4K tokens). The Phase 0
    §8 sliding window evicts user/asst pairs but keeps the system
    prompt, so big system prompts displace conversation. Resolution
    (Q33 closed): suppress [background] when `ctx.norris_active == true`.
    Memory items rarely change Norris-step planning, and Norris has
    its goal anchor via the NORRIS suffix already. §5 + §11 reflect.

R-C2. **Summarizer self-amplification** — running `:memory summarize`
    twice in one session would feed the previous summarize call's
    *assistant turn* back into the input, leading to drift (re-propose
    accepted items, no signal about rejections). Resolution: operate
    on the session log file (`history.load(session_path)`) rather
    than `ctx:to_messages()`. The session log is the authoritative
    "what was discussed" stream. Skip lines tagged
    `{role:"assistant", meta:"summarize"}` (a new optional field on
    the JSONL turn). §6 reflects.

R-C3. **DEFAULT_SYSTEM_PROMPT bakes MCP statically** — cosmetic. §5
    diagram now reads "DEFAULT (Phase 0 + Phase 2 MCP) → [background]
    → NORRIS". No code change.

NITs folded inline:
  N1. `:memory forget <id>` for an already-tombstoned id → no-op + status.
  N2. §2 path note: memory.jsonl is sibling of sessions/, no collision.
  N3. §4 invariant: items have id ≥ 1; meta header has no id and is
      ignored; tombstones with non-matching targets are no-ops.
  N4. §7 `:memory inject` semantics: replaces `ctx.memory_items` from
      a fresh `load_memory()` + LRU-by-ts truncation (same as startup).

**Analyze findings (2026-05-13):**

A1. **history.lua surface is clean** — `M.open`/`Session:append`/
    `Session:close`/`M.load`/`M.list_sessions`. The memory functions
    can mirror this exactly: `M.open_memory`/`memory:add`/
    `memory:forget`/`memory:close`/`M.load_memory`. No structural
    refactor needed; pure additions.

A2. **Counter persistence — scan at open, cache in handle.** Phase 1's
    session log writes a `{"meta":{...}}` header on first creation but
    doesn't track entry-id (turns aren't numbered). For memory, the
    monotonic id is needed for forget-targeting. Cheapest correct
    approach: on `M.open_memory`, read all lines once, find the max
    `id` field present (skipping the meta header if any), cache as
    `handle.next_id`. Subsequent `add` calls increment in-memory and
    persist on the next append. O(n) at open is acceptable since n is
    bounded by user curation (~hundreds, not millions). No sidecar.
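
The scan described in A2 could look roughly like the sketch below. `scan_next_id` is a hypothetical helper name, and for brevity it extracts the `id` with a Lua pattern instead of decoding each line with dkjson, which assumes no nested object in the line carries its own `"id"` key. The meta header content in the sample is invented for illustration.

```lua
-- Sketch only: scan_next_id is a hypothetical helper, not part of history.lua.
-- `lines` is any iterator function that yields one JSONL line per call.
local function scan_next_id(lines)
    local max_id = 0
    for line in lines do
        -- Simplification: a pattern instead of a dkjson decode. The meta
        -- header has no "id" field, so it never matches and is skipped.
        local id = tonumber(line:match('"id"%s*:%s*(%d+)'))
        if id and id > max_id then
            max_id = id
        end
    end
    return max_id + 1
end

-- Usage with an in-memory sample; on a real file, pass io.lines(path).
local sample = {
    '{"meta":{"version":1}}',  -- invented header content
    '{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"a"}',
    '{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"b"}',
    '{"id":3,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}',
}
local pos = 0
local next_id = scan_next_id(function()
    pos = pos + 1
    return sample[pos]
end)
print(next_id)  -- 4: tombstones carry ids too, so they advance the counter
```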

A3. **System-prompt suffix order, post-analyze**: actual current
    composition is `DEFAULT_SYSTEM_PROMPT` (which has Phase 2 MCP
    guidance already baked-in as a static block) → optional `NORRIS`
    dynamic suffix. The Phase 2 MCP block is NOT computed dynamically
    — it's part of DEFAULT_SYSTEM_PROMPT. So Phase 4's `[background]`
    block lives between DEFAULT and NORRIS. Token cost measured:
    - DEFAULT: 697 chars (~174 tokens)
    - DEFAULT + NORRIS: 1458 chars (~364 tokens)
    - DEFAULT + 2KB background + NORRIS: ~3460 chars (~865 tokens)
    Within typical 4-8K context budgets.

These findings don't require manifest changes — the §3 module-changes
table and §5 injection mechanism already match. Recording the
measurements here so verify (Phase 7) has anchors.

PHASE0 is the locked substrate; PHASE1, PHASE2, PHASE3 are layered on top.
This manifest specifies what Phase 4 adds — **cross-session memory** — and
the user-facing surface for managing it.

---

## 1. Scope of Phase 4

Three pillars per PHASE0.md §11 row 4:

1. **`memory.jsonl` persistent store** — a single append-only file
   (`<config.history.dir>/memory.jsonl`) carrying user-curated facts,
   preferences, and project context that survive aish restarts. Same
   storage convention as session logs but a separate file because the
   read pattern (load at startup) and write pattern (curated only)
   differ from session logs (append-every-turn).

2. **Startup context injection** — at REPL boot, recent memory items
   are loaded into the live `Context` so the model sees them on the
   very first turn. Injection is bounded (token budget) and visible
   to the user via `:memory list`.

3. **`:memory` management surface + automatic candidate extraction** —
   meta commands for `add`, `list`, `forget`, `clear`, plus an opt-in
   summarizer that runs at session end (or on demand) extracting
   candidate facts from the session log for the user to triage into
   memory.

**Phase 4 is done when:**

- `:remember <text>` (alias for `:memory add <text>`) writes a line to
  `memory.jsonl` and the next REPL boot sees it in context.
- `:memory list` shows current memory items with their IDs and ages.
- `:memory forget <id>` removes one item; `:memory clear` removes all
  (with confirm).
- At startup, the top-N most recent memory items are injected into the
  system prompt as a single `[background]` block (configurable cap).
- `:memory summarize` runs the active model over the current session
  log and proposes candidate memory items; the user accepts/rejects
  per-candidate via prompt.
- Existing configs without a `memory` section behave exactly like
  Phase 3 (no startup injection, no auto-summarize).

---

## 2. Technology Decisions (delta from Phase 3)

| Decision | Choice | Rationale |
|---|---|---|
| Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
| Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. Path is a sibling of `sessions/` (not inside it), so `:save <name>` cannot collide. |
| Memory-item shape | `{id, ts, kind, content, tags?, source?}` | `id` is monotonic int (counter recovered by scanning for the max `id` at open and cached in the handle, per A2; no sidecar file); `kind ∈ {"fact","pref","context"}` lightly typed for future routing; `content` is the body text; optional `tags` array; optional `source` carrying session-id provenance when auto-extracted. |
| Forget semantics | **Append a tombstone**, don't rewrite the file (`{id, ts, kind:"forget", target:<other_id>}`) | Append-only preserves history. `M.load_memory` resolves tombstones during read — silently drops any item whose `id` appears as a forget-target. `:memory clear` writes one tombstone per active item; could also support a wildcard forget. |
| Auto-summarize cadence | **Manual only in v1** (`:memory summarize`). Auto-trigger on `:quit` or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
| Summarizer model | `cfg.memory.summarizer_model` when set (the example config uses the cheap `fast` preset, whose quality is good enough for extraction); otherwise the active model | Summarization favors recall over precision: the fast model's tendency to err on the side of inclusion is fine because the user filters per-candidate. |
| Startup injection mechanism | A new dynamic block on the system prompt, appended by `context.to_messages()` when `ctx.memory_items` is non-empty and Norris is not active (R-C1) | Same dynamic-suffix pattern as Phase 3's NORRIS suffix (the Phase 2 MCP guidance is static inside DEFAULT_SYSTEM_PROMPT, per A3). No new context structure beyond a list on the Context. |
| Injection budget | `cfg.memory.inject_max_chars` (default 2000 chars total — roughly 500 tokens) | Cap so memory doesn't eat the whole context. LRU-by-`ts` selection if items exceed budget. |
| Pruning policy | Manual `:memory forget` + optional `cfg.memory.prune_older_than_days` (default unset — no auto-pruning) | Conservative defaults; user owns the lifecycle. |
| Interaction with sessions | `memory.jsonl` is independent of `sessions/*.jsonl`. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
| Concurrency | Single-writer **enforced via `flock(LOCK_EX \| LOCK_NB)`** (R-B1) on the memory.jsonl file descriptor in `M.open_memory`. Held until close. Acquire failure → handle creation fails with a clear status message | Session logs got away with single-writer-by-uniqueness (timestamped filenames). memory.jsonl is one shared file, so the flock is the actual enforcement. The lock is advisory (Linux file-lock semantics) but every aish process honors it, which is sufficient for our trust model. |

---

## 3. Module Changes

| File | State after Phase 3 | Phase 4 changes |
|---|---|---|
| `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> handle\|nil, err`; `handle:add(kind, content, tags?, source?) -> id`; `handle:forget(id)`; `handle:close()`; `M.load_memory(path) -> items_table` (resolves tombstones). Handle internals: fd (LuaJIT FFI int), next_id (scanned from existing JSONL), held flock. |
| `ffi/libc.lua` | `chdir`, `errno`, `strerror`, plus Phase 1's waitpid/raw I/O/termios/poll, plus Phase 1's read/write/close/kill | Add `flock(2)` cdef (`int flock(int fd, int operation)`), constants `LOCK_EX = 2`, `LOCK_NB = 4`, `LOCK_UN = 8`. Wrapper `M.flock(fd, op) -> true\|false, errmsg`. Used by `history.M.open_memory` for the single-writer enforcement (R-B1). |
| `context.lua` | system prompt + MCP block + NORRIS suffix toggle | Add a `memory_items` field on Context. `to_messages()` composes a dynamic "[background]" block on the system prompt when `memory_items` is non-empty AND not already in Norris mode (don't double-pile). Cap respected via the inject_max_chars budget. |
| `repl.lua` | meta cmds + tool sub-loop + Norris driver | New meta: `:remember <text>` (shortcut for `:memory add fact <text>`); `:memory add <kind> <text>`; `:memory list`; `:memory forget <id>`; `:memory clear`; `:memory summarize`. At startup, after loading config + opening session, also open memory handle and inject the top-N items into `ctx.memory_items`. |
| `broker.lua` | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
| `config.lua` | example with mcp + safety blocks | Add commented-out `memory = { ... }` example. Default behavior is "no memory injection, no auto-summarize". |
| `executor.lua` | unchanged | unchanged |
| `safety.lua` | is_destructive + norris_step | unchanged (Norris-side suppression of background block is in context.lua, not safety.lua) |

No new module files. All Phase 4 functionality grows existing files —
mostly `history.lua` and `repl.lua`.
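
As a rough illustration of the `ffi/libc.lua` addition, the sketch below declares `flock(2)` and wraps it in the `true|false, errmsg` shape from the table above. It requires LuaJIT. The `fd_of` helper (which borrows a descriptor from an `io.open` handle via `fileno`) is an assumption made to keep the example short; the manifest's handle holds a raw FFI int fd instead. The constants are the Linux values quoted in this document.

```lua
-- Sketch only (LuaJIT): a minimal flock wrapper under the assumptions above.
local ffi = require("ffi")

ffi.cdef[[
int flock(int fd, int operation);
int fileno(void *stream);
char *strerror(int errnum);
]]

-- Linux values, as listed in the manifest.
local LOCK_EX, LOCK_NB, LOCK_UN = 2, 4, 8

-- Returns true on success, or false plus the strerror text on failure.
local function flock(fd, op)
    if ffi.C.flock(fd, op) == 0 then
        return true
    end
    return false, ffi.string(ffi.C.strerror(ffi.errno()))
end

-- Hypothetical convenience: LuaJIT converts io.* file objects to FILE *,
-- so fileno() can recover the descriptor behind a Lua file handle.
local function fd_of(file)
    return ffi.C.fileno(file)
end
```

Opening the same path twice and calling `flock(fd_of(f), bit.bor(LOCK_EX, LOCK_NB))` on each handle should succeed for the first and fail for the second, since each `open` creates its own open file description. That is the behavior the commit-1 isolation test relies on.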

---

## 4. memory.jsonl Format

```jsonl
{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"User prefers terse responses; no end-of-turn summaries."}
{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"Default to :model deep for code reasoning tasks."}
{"id":3,"ts":"2026-05-13T19:02:00Z","kind":"context","content":"Current project: aish (LuaJIT REPL with MCP tools).","tags":["aish","luajit"]}
{"id":4,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}
```

After `load_memory`, item `id=2` is dropped because of the tombstone.
Active items: 1, 3.
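
The resolution above can be sketched as a two-pass, set-based filter. `resolve_active` is a hypothetical stand-in for the tombstone pass inside `M.load_memory`; it takes entries that are already JSON-decoded, and it assumes every real item carries a `ts` string.

```lua
-- Sketch only: resolve_active is a hypothetical name for the tombstone pass.
local function resolve_active(entries)
    -- Pass 1: collect every forget target. Because this is a set, a
    -- tombstone may appear before or after the item it targets.
    local forgotten = {}
    for _, e in ipairs(entries) do
        if e.kind == "forget" and e.target then
            forgotten[e.target] = true
        end
    end
    -- Pass 2: keep entries that have an id, are not tombstones themselves,
    -- and are not in the forget set. The meta header (no id) drops out here.
    local active = {}
    for _, e in ipairs(entries) do
        if e.id and e.kind ~= "forget" and not forgotten[e.id] then
            active[#active + 1] = e
        end
    end
    -- ISO-8601 UTC timestamps order correctly as plain strings.
    table.sort(active, function(a, b) return a.ts > b.ts end)
    return active
end

-- Usage with the four lines from the example file, plus a meta header.
local active = resolve_active({
    { meta = {} },
    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact", content = "..." },
    { id = 2, ts = "2026-05-13T19:01:35Z", kind = "pref", content = "..." },
    { id = 3, ts = "2026-05-13T19:02:00Z", kind = "context", content = "..." },
    { id = 4, ts = "2026-05-13T20:00:00Z", kind = "forget", target = 2 },
})
for _, item in ipairs(active) do
    print(item.id, item.kind)  -- id 3 first, then id 1 (most recent first)
end
```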

### kind values

- **`fact`** — factual statement about the user, their environment, or
  project state.
- **`pref`** — user preference for aish behavior (response style,
  default model, etc.).
- **`context`** — project / domain context that helps the model orient
  on common tasks.
- **`forget`** — tombstone; refers to another id via `target`.

v1 is lightly typed — the model sees all kinds identically as a flat
list in the [background] block. Future phases may route them
differently (e.g. `pref` into a system-prompt section, `context` into
a user-style preamble). Today they're prose.

### Item-id invariants (N3)

- Items have `id ≥ 1`. The optional meta header line `{"meta":{...}}`
  has no `id` field and is ignored during load.
- Tombstones with non-matching `target` (id doesn't exist, or already
  tombstoned) are no-ops at load — silently dropped from the active
  set. The `:memory forget` meta handler also checks active-set
  membership before appending a tombstone, surfacing a status when
  the id isn't active.
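
A minimal sketch of the membership check in the `:memory forget` handler follows. `forget_if_active` is a hypothetical name, `active` is the list `M.load_memory` would return, and the status strings are placeholders rather than the final wording.

```lua
-- Sketch only: forget_if_active and its status strings are hypothetical.
-- `handle` is anything with a forget(id) method that appends the tombstone.
local function forget_if_active(handle, active, id)
    for _, item in ipairs(active) do
        if item.id == id then
            handle:forget(id)
            return true, "[aish] memory: forgot " .. tostring(id)
        end
    end
    -- Unknown or already-tombstoned id: nothing is written (N1).
    return false, "[aish] memory: " .. tostring(id) .. " is not an active item"
end
```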

---

## 5. Startup Injection

When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):

1. `history.load_memory(path)` reads all items, applies tombstone
   resolution, returns active items sorted by `ts` descending (most
   recent first).
2. Take items until `cfg.memory.inject_max_chars` (default 2000) is
   consumed. Older items are dropped from injection (still in the
   file).
3. Store on `ctx.memory_items` as an array of `{kind, content}` (id
   and ts not needed at render-time).
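
Steps 2 and 3 can be sketched as below. `select_for_injection` is a hypothetical helper, and two details are assumptions the manifest does not pin down: the budget counts only the bytes of `content`, and selection stops at the first item that would overflow it rather than skipping ahead to smaller items.

```lua
-- Sketch only: select_for_injection is a hypothetical helper.
-- `items` must already be sorted most-recent-first, as load_memory returns.
local function select_for_injection(items, max_chars)
    max_chars = max_chars or 2000
    local picked, used = {}, 0
    for _, item in ipairs(items) do
        -- Assumption: cost is the content length in bytes.
        local cost = #item.content
        if used + cost > max_chars then
            break  -- Assumption: stop at the first overflow.
        end
        used = used + cost
        -- id and ts are dropped; only kind and content are rendered.
        picked[#picked + 1] = { kind = item.kind, content = item.content }
    end
    return picked, used
end
```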

`context.to_messages()` composition:

```
<DEFAULT_SYSTEM_PROMPT> (Phase 0 + Phase 2 MCP block, statically embedded)

[background] (memory loaded at startup; managed via :memory)
- (fact) User prefers terse responses; no end-of-turn summaries.
- (context) Current project: aish (LuaJIT REPL with MCP tools).
```

Order of suffixes on the system prompt:
1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP guidance, currently
   baked-in to the static constant — R-C3 note: not a separate dynamic
   block in v1; future phases may split)
2. Phase 4 [background] block (when memory_items non-empty AND NOT in
   Norris mode; R-C1 suppression avoids ~16K redundant chars (~4K tokens)
   per 8-step Norris run)
3. Phase 3 NORRIS MODE block (when norris_active)

When Norris is active the order becomes: DEFAULT → NORRIS (no background).
Norris's planning loop already has the goal anchored in its suffix; the
memory items rarely change step-to-step planning.
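
The ordering rules above can be expressed as a small composer. This is a sketch under assumptions: `compose_system_prompt` and its argument names are hypothetical, and blocks are joined with a blank line, which the manifest does not specify.

```lua
-- Sketch only: compose_system_prompt and its parameters are hypothetical.
local function compose_system_prompt(default_prompt, memory_items, norris_active, norris_suffix)
    local parts = { default_prompt }
    if norris_active then
        -- R-C1: Norris mode suppresses [background] entirely.
        parts[#parts + 1] = norris_suffix
    elseif memory_items and #memory_items > 0 then
        local lines = { "[background] (memory loaded at startup; managed via :memory)" }
        for _, item in ipairs(memory_items) do
            lines[#lines + 1] = "- (" .. item.kind .. ") " .. item.content
        end
        parts[#parts + 1] = table.concat(lines, "\n")
    end
    -- Assumption: blocks are separated by one blank line.
    return table.concat(parts, "\n\n")
end
```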

---

## 6. `:memory summarize` (Manual Auto-Extraction)

`:memory summarize` triggers the active model (or
`cfg.memory.summarizer_model` if set) to read the current session's
turns and propose candidate memory items.

### Flow

1. **Source of truth is the session log file** (R-C2), not
   `ctx:to_messages()`. `history.load(session_path)` returns all
   turns; filter out turns tagged `meta = "summarize"` (set on the
   assistant turn that emitted a prior summarize response) so the
   summarizer can't feed on its own output across multiple calls.
2. Build a prompt: "Read the following conversation transcript. Extract
   facts, preferences, or context worth remembering across future
   sessions. Output ONE candidate per line, prefixed with the kind:
   `fact: …`, `pref: …`, or `context: …`. Maximum 10 candidates."
3. Send the filtered transcript as a single user turn + the
   instruction above. Use `cfg.memory.summarizer_model` if set (else
   the active model). The resulting assistant turn gets logged
   with `meta = "summarize"` so future :memory summarize calls
   exclude it.
4. Parse the response line-by-line for `(fact|pref|context):
   <content>` shapes. Tolerate markdown bullet prefixes (`-`, `*`).
5. For each candidate, prompt the user:

   ```
   [memory] candidate (fact): User prefers terse responses; no end-of-turn summaries.
   keep? [y/N/edit]
   ```

   - `y` → write to memory.jsonl.
   - `N` (or empty) → drop.
   - `edit` → readline-edit the content before write.

6. Status when done: `[aish] memory: added N candidates`.
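
The filtering in step 1 might look like the following sketch. `filter_for_summarize` is a hypothetical name, and the turn shape (`role`, `content`, optional `meta`) and the `role: content` rendering are assumptions about what `history.load` returns and how the transcript is flattened.

```lua
-- Sketch only: filter_for_summarize is hypothetical, as is the turn shape.
local function filter_for_summarize(turns)
    local kept = {}
    for _, turn in ipairs(turns) do
        -- Skip prior summarizer output (R-C2) and any entry without a role,
        -- such as a session meta header.
        if turn.role and turn.meta ~= "summarize" then
            kept[#kept + 1] = turn.role .. ": " .. tostring(turn.content or "")
        end
    end
    return table.concat(kept, "\n")
end

-- Usage: the tagged assistant turn never reaches the summarizer.
print(filter_for_summarize({
    { role = "user", content = "I like short answers." },
    { role = "assistant", content = "Noted." },
    { role = "assistant", content = "pref: short answers", meta = "summarize" },
}))
```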

### Why manual not automatic in v1

An auto-summarize that runs at every `:quit` would:
- be expensive (tokens on every exit)
- drift over time if the model picks up noise
- compete with the user's intentional `:remember <text>` curation

Manual gives the user the trigger. Q-list tracks auto-cadence options.

---

## 7. Meta Commands (Phase 4 additions)

| Command | Action |
|---|---|
| `:remember <text>` | Shortcut for `:memory add fact <text>` |
| `:memory add <kind> <text>` | Append a memory item (kind ∈ fact, pref, context) |
| `:memory list` | Show all active memory items (id + ts + kind + content) |
| `:memory forget <id>` | Append a tombstone for `<id>` |
| `:memory clear` | Forget all active items (with `[y/N]` confirm) |
| `:memory summarize` | Extract candidate items from current session via LLM |
| `:memory inject` | Replace `ctx.memory_items` from a fresh `load_memory()` + LRU-by-ts truncation. Same logic as startup injection. Useful after hand-editing `memory.jsonl` or after `:memory forget` to immediately reflect in the system prompt. |

`:help` updated.
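
One way the meta dispatch could recognize these commands is sketched below. `parse_memory_command`, the returned table shape, and the usage strings are all hypothetical; the real hook lives in `repl.lua`'s existing meta dispatch, which this document does not show.

```lua
-- Sketch only: names, return shape, and messages are hypothetical.
local KINDS = { fact = true, pref = true, context = true }
local BARE = { list = true, clear = true, summarize = true, inject = true }

local function parse_memory_command(line)
    -- :remember <text> is a shortcut for :memory add fact <text>.
    local text = line:match("^:remember%s+(.+)$")
    if text then
        return { action = "add", kind = "fact", text = text }
    end
    local action, rest = line:match("^:memory%s+(%a+)%s*(.*)$")
    if not action then
        return nil, "not a memory command"
    end
    if action == "add" then
        local kind, body = rest:match("^(%a+)%s+(.+)$")
        if not (kind and KINDS[kind]) then
            return nil, "usage: :memory add <fact|pref|context> <text>"
        end
        return { action = "add", kind = kind, text = body }
    elseif action == "forget" then
        local digits = rest:match("^(%d+)$")
        if not digits then
            return nil, "usage: :memory forget <id>"
        end
        return { action = "forget", id = tonumber(digits) }
    elseif BARE[action] then
        return { action = action }
    end
    return nil, "unknown :memory subcommand: " .. action
end
```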

---

## 8. Configuration Schema (Phase 4 example block)

```lua
memory = {
    -- Path defaults to <history.dir>/memory.jsonl. Override per fleet
    -- if you want shared memory (read-only is safer than write-shared).
    -- path = (history.dir or "~/.local/share/aish") .. "/memory.jsonl",

    -- Cap on how much memory content is injected into the system prompt
    -- at startup. Roughly 2000 chars ≈ 500 tokens. Older items are
    -- dropped from injection if exceeded; they remain in the file.
    inject_max_chars = 2000,

    -- Which model to use for :memory summarize. Defaults to the active
    -- model when nil. Use "fast" for speed; "deep" for better quality.
    summarizer_model = "fast",

    -- Auto-prune items older than N days at startup. nil = never auto-prune.
    -- Manual :memory forget always works regardless.
    -- prune_older_than_days = 90,
}
```

---

## 9. Migration from Phase 3

User-visible:
- `:remember`, `:memory add / list / forget / clear / summarize / inject`
  are new meta commands.
- A `[background]` block in the system prompt appears when memory items
  exist.
- Existing configs without `memory = {...}` continue to work — no
  injection, no auto-summarize. Phase 3 behavior intact.

Substrate (PHASE0.md §3) invariants: unchanged.

The `[background]` system-prompt suffix is composed dynamically by
`context.to_messages()` (same pattern as the Phase 3 NORRIS suffix; the
Phase 2 MCP guidance is static, per A3). No new substrate contract.

---

## 10. Out of Scope (Phase 4)

Per PHASE0.md §11 these belong to later phases:
- Multi-model routing / cloud fallback (Phase 5).
- Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 4 scope despite proximity:
- Multi-process memory sharing (single-writer enforced via flock in v1, per R-B1).
- Retrieval-augmented injection (RAG over memory.jsonl) — v1 just LRU.
- Auto-trigger of `:memory summarize` at `:quit` (Q-list).
- Memory categories beyond fact/pref/context — minimal typing v1.
- Cross-aish-instance memory sync (memory.jsonl in a synced dir
  works coincidentally; not designed for it).
- Encryption at rest — same posture as session logs (none in v1).

---

## 11. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q31 | Auto-summarize trigger: manual only (current), automatic at `:quit`, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
| Q32 | Editing memory items in place: `:memory edit <id>` to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
| Q33 | ~~Memory injection while in Norris mode~~ | context.lua | **Resolved at review (R-C1)**: SUPPRESSED. Memory items aren't injected when `ctx.norris_active == true`. Norris has its goal anchor in the NORRIS suffix; ~16K chars (~4K tokens) of redundant background per 8-step run is not worth the marginal context value. |
| Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no — flat list. | context.lua + UX | Phase 5 if it bites |
| Q35 | Privacy / redaction: `:memory summarize` could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + history.lua | Phase 4 (verify): review user-emergent risk |
| Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can `:memory list` to spot). | history.lua | Phase 4 (verify) |

---

## 12. Implementation Plan (commit-by-commit)

Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:

1. **`history.lua` — memory store + `ffi/libc.lua` flock (R-B1 bundled).**
   - `ffi/libc.lua`: cdef `flock(2)` + LOCK_EX/LOCK_NB/LOCK_UN constants
     + `M.flock(fd, op)` wrapper.
   - `history.lua`: `M.open_memory(path)` opens the file (creating parent
     dirs + meta-header line if empty), takes `flock(LOCK_EX | LOCK_NB)`
     on the fd, scans the existing JSONL for max id → handle.next_id.
     Returns `(handle, nil)` on success; `(nil, errmsg)` on lock-held.
   - `handle:add(kind, content, tags?, source?)`: assigns next id,
     appends JSON line, returns id.
   - `handle:forget(id)`: appends a tombstone for id.
   - `handle:close()`: releases flock + closes fd.
   - `M.load_memory(path)`: reads all lines, builds forget-target set
     from kind=="forget" entries, returns active items sorted by `ts`
     descending. Drops items whose id is in the forget-set OR whose id
     is nil (meta header).
   **Test in isolation**: round-trip add/forget/load, lock-held
   detection (open twice in same process, second should fail).

2. **`context.lua` — memory injection.** Add `ctx.memory_items` and
   the `[background]` block composer in `to_messages()`. Cap by
   `inject_max_chars`. **Test in isolation**: assert composition order
   (DEFAULT → background with Norris off; DEFAULT → NORRIS with Norris on); cap honored.

3. **`repl.lua` — `:remember` + `:memory list / add / forget / clear / inject`.**
   At startup, after MCP setup, open the memory handle + LRU-load items.
   Hook the meta dispatch. No summarize yet. **End-to-end**: run aish,
   `:remember X`, `:quit`, restart, `:memory list` shows X, `:history`
   shows X in [background].

4. **`:memory summarize`** — manual extraction. Bundle a system-prompt
   for the summarizer model; parse response; per-candidate confirm
   prompt; append accepted items. **End-to-end**: short conversation,
   summarize, accept one of two candidates, restart, verify accepted
   one persists.

5. **`config.lua` — example memory block.** Documentation-only;
   commented-out example. Final commit.

### Risk / non-obvious

- **Counter persistence**: `memory:add` needs a monotonic id. Options:
  (a) sidecar `memory.id` file with a single integer, (b) scan the
  JSONL on open for max id, (c) use timestamp as id (no monotonic
  guarantee across rapid adds). Plan: (b), scan once at open and cache
  in the handle. Lua numbers are doubles, so ids stay exact up to 2^53;
  overflow is not a practical concern.
- **Tombstone resolution at load**: build a set of forget-target ids
  from kind=="forget" entries; filter active items to exclude. Order
  doesn't matter (tombstones can appear before their targets if the
  file is hand-edited; the resolution is set-based).
- **Empty file at open** vs **nonexistent file**: both should yield an
  empty memory handle. Phase 1's `history.open` already handles file
  creation; extend the pattern.
- **System prompt growth**: the suffix stack now has up to three
  blocks (DEFAULT with MCP baked in, background, Norris), and R-C1
  means background and Norris never appear together. The plan-time
  estimate was ~200 + ~80 + 2000 + ~250 = ~2530 chars; A3 measured
  697 chars for DEFAULT and 1458 chars for DEFAULT + NORRIS.
- **`:memory summarize` parse robustness**: small models may emit
  "fact: ..." sometimes with markdown bullets, sometimes without.
  Parser should tolerate `^[-*]?\s*(fact|pref|context):\s*(.+)`.
- **`:memory clear` with confirm**: same UX as Phase 3 destructive
  prompts. `[y/N]` default-no.
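
The regex in the parse-robustness bullet uses alternation, which Lua patterns do not support, so a Lua version has to capture the word before the colon and check it against a set. The sketch below does that; `parse_candidates` is a hypothetical name, and it is slightly more lenient than the regex in that it also accepts leading whitespace before the bullet.

```lua
-- Sketch only: parse_candidates is a hypothetical helper.
local KINDS = { fact = true, pref = true, context = true }

local function parse_candidates(response, max)
    max = max or 10
    local out = {}
    for line in response:gmatch("[^\r\n]+") do
        -- Optional "-" or "*" bullet, then "<kind>: <content>".
        local kind, content = line:match("^%s*[-*]?%s*(%a+):%s*(.+)$")
        if kind and KINDS[kind] then
            content = content:gsub("%s+$", "")  -- trim trailing whitespace
            if content ~= "" then
                out[#out + 1] = { kind = kind, content = content }
                if #out >= max then
                    break
                end
            end
        end
    end
    return out
end

-- Usage: bulleted and bare lines both parse; unknown prefixes are ignored.
local found = parse_candidates("- fact: uses LuaJIT\n* pref: terse replies\nnote: ignored\ncontext: aish project")
for _, c in ipairs(found) do
    print(c.kind, c.content)
end
```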

### Open at plan; resolve at review

- Whether `:remember` should append to the LIVE `ctx.memory_items`
  immediately (so the model sees it on the next turn without restart)
  or only on next session boot. v1: append both to the file AND to the
  live ctx for immediate visibility.
- Whether the summarizer should be fed the FULL session log or just
  recent turns (token budget). v1 says full minus the [background]
  suffix; cap at session-log size <= 64KB or last N turns.

---

*End of Phase 4 Manifest — aish*