marfrit ffead3986c docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp
Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs.

R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes
  scanning the same file compute identical next_ids. Resolution:
  flock(LOCK_EX | LOCK_NB) on the fd in M.open_memory, held until
  close. Bundled into commit #1 (per reviewer: cannot defer because
  adding flock retroactively means reopening the handle). Requires
  ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN
  constants + M.flock wrapper.

R-C1 (CONCERN, closes Q33): [background] block suppressed when
  ctx.norris_active. Avoids ~16K of redundant tokens per 8-step
  Norris run. Norris already anchors via its goal in the NORRIS
  suffix; memory items rarely change step-to-step planning.

R-C2 (CONCERN): summarizer self-amplification — running :memory
  summarize twice in one session would feed the prior summarize
  call's assistant turn into the next input. Resolution: operate
  on the session log file (history.load(session_path)) instead
  of ctx:to_messages(), and tag prior summarize turns with
  meta="summarize" so they're filterable.

R-C3 (CONCERN, cosmetic): §5 diagram clarified that
  DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block
  statically — not a separate dynamic block in v1.

NITs N1-N4 folded inline:
  N1 forget no-op for unknown id surfaces a status
  N2 path note: memory.jsonl is sibling of sessions/, no collision
  N3 item-id invariants: id >= 1; meta header has no id; tombstones
     with non-matching targets are no-ops
  N4 :memory inject semantics explicit (replace ctx.memory_items
     from a fresh load + LRU-by-ts truncation)

§3 module-changes table grew a new ffi/libc.lua row.
§12 commit #1 description tightened — flock work bundled inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:50:43 +00:00

aish — Phase 4 Manifest

Project: aish (AI-augmented conversational shell)
Document: Phase 4 Requirements, Architecture & Design Decisions
Status: Plan (review fold-in 2026-05-13; TOCTOU race, Norris suppression, and summarizer self-amp resolved)
Date: 2026-05-13

Review fold-in (2026-05-13):

R-B1. TOCTOU race on memory.jsonl: two aish processes against the same history.dir would each compute the same next_id and produce duplicate ids, which makes tombstones ambiguous. Resolution: M.open_memory takes an flock(LOCK_EX | LOCK_NB) advisory lock on the file descriptor, held until handle close. Failure to acquire → nil, "memory.jsonl held by another aish process". Requires extending ffi/libc.lua with flock(2): one cdef, three constants (LOCK_EX=2, LOCK_NB=4, LOCK_UN=8), and the M.flock wrapper. The lock enforces the single-writer assumption stated in §2 and is documented in the §2 Concurrency row.

R-C1. System-prompt growth under Norris: over an 8-step Norris run, a 2KB [background] block adds ~16K chars (roughly 4K tokens at ~4 chars per token) of redundant prompt. The Phase 0 §8 sliding window evicts user/asst pairs but keeps the system prompt, so big system prompts displace conversation. Resolution (Q33 closed): suppress [background] when ctx.norris_active == true. Memory items rarely change Norris-step planning, and Norris already has its goal anchor via the NORRIS suffix. §5 + §11 reflect.

R-C2. Summarizer self-amplification — running :memory summarize twice in one session would feed the previous summarize call's assistant turn back into the input, leading to drift (re-propose accepted items, no signal about rejections). Resolution: operate on the session log file (history.load(session_path)) rather than ctx:to_messages(). The session log is the authoritative "what was discussed" stream. Skip lines tagged {role:"assistant", meta:"summarize"} (a new optional field on the JSONL turn). §6 reflects.
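
The R-C2 filter can be sketched in a few lines. This is an illustration only: it assumes history.load(session_path) has already returned an array of turn tables shaped like {role, content, meta}, and the helper name filter_summarize_turns is hypothetical.

```lua
-- Hypothetical helper: drop assistant turns produced by an earlier
-- :memory summarize call so they never reach the next summarizer input.
local function filter_summarize_turns(turns)
    local kept = {}
    for _, turn in ipairs(turns) do
        local is_summary = turn.role == "assistant" and turn.meta == "summarize"
        if not is_summary then
            kept[#kept + 1] = turn
        end
    end
    return kept
end

-- Illustrative transcript; the turn contents are made up for the example.
local turns = {
    { role = "user",      content = "I prefer terse answers." },
    { role = "assistant", content = "Noted." },
    { role = "assistant", content = "fact: User prefers terse responses.", meta = "summarize" },
    { role = "user",      content = "What is flock?" },
}

local filtered = filter_summarize_turns(turns)
print(#filtered)
```

Ordinary assistant turns carry no meta field, so only turns explicitly tagged at write time are skipped.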

R-C3. DEFAULT_SYSTEM_PROMPT bakes MCP statically — cosmetic. §5 diagram now reads "DEFAULT (Phase 0 + Phase 2 MCP) → [background] → NORRIS". No code change.

NITs folded inline:

  • N1. :memory forget <id> for an unknown or already-tombstoned id → no-op + status.
  • N2. §2 path note: memory.jsonl is sibling of sessions/, no collision.
  • N3. §4 invariant: items have id ≥ 1; meta header has no id and is ignored; tombstones with non-matching targets are no-ops.
  • N4. §7 :memory inject semantics: replaces ctx.memory_items from a fresh load_memory() + LRU-by-ts truncation (same as startup).

Analyze findings (2026-05-13):

A1. history.lua surface is clean: M.open / Session:append / Session:close / M.load / M.list_sessions. The memory functions can mirror this exactly: M.open_memory / memory:add / memory:forget / memory:close / M.load_memory. No structural refactor needed; pure additions.

A2. Counter persistence — scan at open, cache in handle. Phase 1's session log writes a {"meta":{...}} header on first creation but doesn't track entry-id (turns aren't numbered). For memory, the monotonic id is needed for forget-targeting. Cheapest correct approach: on M.open_memory, read all lines once, find the max id field present (skipping the meta header if any), cache as handle.next_id. Subsequent add calls increment in-memory and persist on the next append. O(n) at open is acceptable since n is bounded by user curation (~hundreds, not millions). No sidecar.
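
A minimal sketch of the scan-at-open step from A2, assuming the JSONL lines have already been decoded into Lua tables (the real code would decode each line with the vendored dkjson). The helper name scan_next_id and the header contents are hypothetical.

```lua
-- Hypothetical helper: find the highest id present and return the next one.
-- Tombstones carry ids too, so they take part in the scan.
local function scan_next_id(entries)
    local max_id = 0
    for _, entry in ipairs(entries) do
        -- The meta header has no id field, so it is skipped here.
        if type(entry.id) == "number" and entry.id > max_id then
            max_id = entry.id
        end
    end
    return max_id + 1
end

local entries = {
    { meta = { note = "illustrative header" } },
    { id = 1, kind = "fact",   content = "a" },
    { id = 2, kind = "pref",   content = "b" },
    { id = 3, kind = "forget", target = 2 },
}

local next_id = scan_next_id(entries)
local empty_next_id = scan_next_id({})
print(next_id, empty_next_id)
```

An empty or header-only file yields 1, which matches the id ≥ 1 invariant.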

A3. System-prompt suffix order, post-analyze: the actual current composition is DEFAULT_SYSTEM_PROMPT (which already has the Phase 2 MCP guidance baked in as a static block) → optional NORRIS dynamic suffix. The Phase 2 MCP block is NOT computed dynamically; it is part of DEFAULT_SYSTEM_PROMPT. So Phase 4's [background] block lives between DEFAULT and NORRIS. Token cost measured:

  • DEFAULT: 697 chars (~174 tokens)
  • DEFAULT + NORRIS: 1458 chars (~364 tokens)
  • DEFAULT + 2KB background + NORRIS: ~3460 chars (~865 tokens)

Within typical 4-8K context budgets.

These findings don't require manifest changes — the §3 module-changes table and §5 injection mechanism already match. Recording the measurements here so verify (Phase 7) has anchors.

PHASE0 is the locked substrate; PHASE1, PHASE2, PHASE3 are layered on top. This manifest specifies what Phase 4 adds — cross-session memory — and the user-facing surface for managing it.


1. Scope of Phase 4

Three pillars per PHASE0.md §11 row 4:

  1. memory.jsonl persistent store — a single append-only file (<config.history.dir>/memory.jsonl) carrying user-curated facts, preferences, and project context that survive aish restarts. Same storage convention as session logs but a separate file because the read pattern (load at startup) and write pattern (curated only) differ from session logs (append-every-turn).

  2. Startup context injection — at REPL boot, recent memory items are loaded into the live Context so the model sees them on the very first turn. Injection is bounded (token budget) and visible to the user via :memory list.

  3. :memory management surface + automatic candidate extraction — meta commands for add, list, forget, clear, plus an opt-in summarizer that runs at session end (or on demand) extracting candidate facts from the session log for the user to triage into memory.

Phase 4 is done when:

  • :remember <text> (alias for :memory add fact <text>) writes a line to memory.jsonl and the next REPL boot sees it in context.
  • :memory list shows current memory items with their IDs and ages.
  • :memory forget <id> removes one item; :memory clear removes all (with confirm).
  • At startup, the top-N most recent memory items are injected into the system prompt as a single [background] block (configurable cap).
  • :memory summarize runs the active model over the current session log and proposes candidate memory items; the user accepts/rejects per-candidate via prompt.
  • Existing configs without a memory section behave exactly like Phase 3 (no startup injection, no auto-summarize).

2. Technology Decisions (delta from Phase 3)

| Decision | Choice | Rationale |
| --- | --- | --- |
| Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
| Storage location | <config.history.dir>/memory.jsonl (sibling to sessions/) | Co-located with session logs; users can back up one directory. Defaults to ~/.local/share/aish/memory.jsonl. Path is a sibling of sessions/ (not inside it), so :save <name> cannot collide. |
| Memory-item shape | {id, ts, kind, content, tags?, source?} | id is a monotonic int (next id recovered by scanning the JSONL at open, per A2; no sidecar file); kind ∈ {"fact","pref","context"}, lightly typed for future routing; content is the body text; optional tags array; optional source carrying session-id provenance when auto-extracted. |
| Forget semantics | Append a tombstone, don't rewrite the file ({id, ts, kind:"forget", target:<other_id>}) | Append-only preserves history. M.load_memory resolves tombstones during read: it silently drops any item whose id appears as a forget-target. :memory clear writes one tombstone per active item; a wildcard forget could also be supported. |
| Auto-summarize cadence | Manual only in v1 (:memory summarize). Auto-trigger on :quit or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
| Summarizer model | cfg.memory.summarizer_model when set (the §8 example uses the fast preset: cheap, quality good enough for extraction); otherwise the active model | Summarization favors recall over precision. The fast model's tendency to err on the side of inclusion is fine because the user filters per-candidate. |
| Startup injection mechanism | A new dynamic block on the system prompt, appended by context.to_messages() when ctx.memory_items is non-empty | Same dynamic-suffix pattern as Phase 3's NORRIS suffix (the Phase 2 MCP block is baked into DEFAULT_SYSTEM_PROMPT, per A3). No new context structure beyond a list on the Context. |
| Injection budget | cfg.memory.inject_max_chars (default 2000 chars total, roughly 500 tokens) | Cap so memory doesn't eat the whole context. LRU-by-ts selection if items exceed budget. |
| Pruning policy | Manual :memory forget + optional cfg.memory.prune_older_than_days (default unset, no auto-pruning) | Conservative defaults; user owns the lifecycle. |
| Interaction with sessions | memory.jsonl is independent of sessions/*.jsonl. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
| Concurrency | Single-writer enforced via flock(LOCK_EX \| LOCK_NB) (R-B1) on the memory.jsonl file descriptor in M.open_memory. Held until close. Acquire failure → handle creation fails with a clear status message | Session logs got away with single-writer-by-uniqueness (timestamped filenames). memory.jsonl is one shared file, so the flock is the actual enforcement. The lock is advisory (Linux file-lock semantics) but every aish process honors it, which is sufficient for our trust model. |

3. Module Changes

| File | State after Phase 3 | Phase 4 changes |
| --- | --- | --- |
| history.lua | M.open(path, meta), session:append(turn), M.load(path), M.list_sessions(dir) | Add memory functions alongside session functions: M.open_memory(path) -> handle (or nil, err); handle:add(kind, content, tags?, source?) -> id; handle:forget(id); handle:close(); M.load_memory(path) -> items_table (resolves tombstones). Handle internals: fd (LuaJIT FFI int), next_id (scanned from existing JSONL), held flock. |
| ffi/libc.lua | chdir, errno, strerror, plus Phase 1's waitpid/raw I/O/termios/poll, plus Phase 1's read/write/close/kill | Add flock(2) cdef (int flock(int fd, int operation)), constants LOCK_EX = 2, LOCK_NB = 4, LOCK_UN = 8. Wrapper M.flock(fd, op) -> true (or false, errmsg). Used by history.M.open_memory for the single-writer enforcement (R-B1). |
| context.lua | system prompt + MCP block + NORRIS suffix toggle | Add a memory_items field on Context. to_messages() composes a dynamic "[background]" block on the system prompt when memory_items is non-empty AND Norris mode is not active (R-C1 suppression; don't double-pile). Cap respected via the inject_max_chars budget. |
| repl.lua | meta cmds + tool sub-loop + Norris driver | New meta: :remember <text> (shortcut for :memory add fact <text>); :memory add <kind> <text>; :memory list; :memory forget <id>; :memory clear; :memory inject; :memory summarize. At startup, after loading config + opening session, also open memory handle and inject the top-N items into ctx.memory_items. |
| broker.lua | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
| config.lua | example with mcp + safety blocks | Add commented-out memory = { ... } example. Default behavior is "no memory injection, no auto-summarize". |
| executor.lua | unchanged | unchanged |
| safety.lua | is_destructive + norris_step | unchanged (Norris-side suppression of background block is in context.lua, not safety.lua) |

No new module files. All Phase 4 functionality grows existing files — mostly history.lua and repl.lua.
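
A rough sketch of what the ffi/libc.lua addition could look like. The constant values are the Linux ones quoted in the table; the guard at the top, the standalone M table, and the demo call at the bottom are illustrative assumptions and not part of the planned module.

```lua
-- Requires LuaJIT: the ffi and bit modules are not part of stock Lua,
-- so this sketch bails out quietly when they are missing.
local has_ffi, ffi = pcall(require, "ffi")
if not has_ffi then
    print("LuaJIT FFI not available; flock sketch skipped")
    return
end
local bit = require("bit")

ffi.cdef[[
int flock(int fd, int operation);
char *strerror(int errnum);
]]

local M = {}

-- Linux values from <sys/file.h>.
M.LOCK_EX = 2   -- exclusive lock
M.LOCK_NB = 4   -- do not block if the lock is held elsewhere
M.LOCK_UN = 8   -- release the lock

-- Returns true on success, or false plus the strerror text on failure.
function M.flock(fd, op)
    if ffi.C.flock(fd, op) == 0 then
        return true
    end
    return false, ffi.string(ffi.C.strerror(ffi.errno()))
end

-- An invalid descriptor exercises the failure path without touching a file.
local ok, err = M.flock(-1, bit.bor(M.LOCK_EX, M.LOCK_NB))
print(ok, err)
```

In M.open_memory the same call would be made on the real descriptor, and a false result would surface as the "held by another aish process" status.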


4. memory.jsonl Format

{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"User prefers terse responses; no end-of-turn summaries."}
{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"Default to :model deep for code reasoning tasks."}
{"id":3,"ts":"2026-05-13T19:02:00Z","kind":"context","content":"Current project: aish (LuaJIT REPL with MCP tools).","tags":["aish","luajit"]}
{"id":4,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}

After load_memory, item id=2 is dropped because of the tombstone. Active items: 1, 3.
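
The resolution above can be sketched as follows, assuming the four lines have already been decoded into Lua tables (the real loader would decode each line with the vendored dkjson). The function name resolve_active is hypothetical.

```lua
-- Hypothetical helper: set-based tombstone resolution, newest item first.
local function resolve_active(entries)
    -- Pass 1: collect every forget target, wherever it sits in the file.
    local forgotten = {}
    for _, e in ipairs(entries) do
        if e.kind == "forget" and e.target ~= nil then
            forgotten[e.target] = true
        end
    end
    -- Pass 2: keep real items that were never targeted.
    local active = {}
    for _, e in ipairs(entries) do
        if e.id ~= nil and e.kind ~= "forget" and not forgotten[e.id] then
            active[#active + 1] = e
        end
    end
    -- ISO-8601 UTC timestamps in one fixed format sort correctly as strings.
    table.sort(active, function(a, b) return a.ts > b.ts end)
    return active
end

local entries = {
    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact",    content = "User prefers terse responses; no end-of-turn summaries." },
    { id = 2, ts = "2026-05-13T19:01:35Z", kind = "pref",    content = "Default to :model deep for code reasoning tasks." },
    { id = 3, ts = "2026-05-13T19:02:00Z", kind = "context", content = "Current project: aish (LuaJIT REPL with MCP tools)." },
    { id = 4, ts = "2026-05-13T20:00:00Z", kind = "forget",  target = 2 },
}

local active = resolve_active(entries)
for _, item in ipairs(active) do
    print(item.id, item.kind)
end
```

Because the forget set is built before filtering, a tombstone that appears ahead of its target in a hand-edited file still takes effect.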

kind values

  • fact — factual statement about the user, their environment, or project state.
  • pref — user preference for aish behavior (response style, default model, etc.).
  • context — project / domain context that helps the model orient on common tasks.
  • forget — tombstone; refers to another id via target.

v1 is lightly typed — the model sees all kinds identically as a flat list in the [background] block. Future phases may route them differently (e.g. pref into a system-prompt section, context into a user-style preamble). Today they're prose.

Item-id invariants (N3)

  • Items have id ≥ 1. The optional meta header line {"meta":{...}} has no id field and is ignored during load.
  • Tombstones with non-matching target (id doesn't exist, or already tombstoned) are no-ops at load — silently dropped from the active set. The :memory forget meta handler also checks active-set membership before appending a tombstone, surfacing a status when the id isn't active.
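
The active-set check in the :memory forget handler might look like the sketch below. The helper forget_item, the stand-in handle, and the status strings are all hypothetical; only the behavior (no tombstone and a status when the id is not active) comes from the invariants above.

```lua
-- Hypothetical handler core: only append a tombstone for an active id.
local function forget_item(handle, active_ids, id)
    if not active_ids[id] then
        return false, "[aish] memory: no active item with id " .. tostring(id)
    end
    handle:forget(id)
    active_ids[id] = nil
    return true, "[aish] memory: forgot item " .. tostring(id)
end

-- Stand-in for the real memory handle: records tombstones in a table.
local fake_handle = { tombstones = {} }
function fake_handle:forget(id)
    self.tombstones[#self.tombstones + 1] = { kind = "forget", target = id }
end

local active_ids = { [1] = true, [2] = true, [3] = true }
local ok1, msg1 = forget_item(fake_handle, active_ids, 2)
local ok2, msg2 = forget_item(fake_handle, active_ids, 2)    -- already forgotten
local ok3, msg3 = forget_item(fake_handle, active_ids, 99)   -- never existed
print(msg1)
print(msg2)
print(msg3)
```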

5. Startup Injection

When aish boots and cfg.memory is present (or memory.jsonl exists):

  1. history.load_memory(path) reads all items, applies tombstone resolution, returns active items sorted by ts descending (most recent first).
  2. Take items until cfg.memory.inject_max_chars (default 2000) is consumed. Older items are dropped from injection (still in the file).
  3. Store on ctx.memory_items as an array of {kind, content} (id and ts not needed at render-time).
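
Steps 2 and 3 can be sketched like this. Two details are assumptions, since the text does not pin them down: the budget counts only the bytes of each item's content (Lua's # operator counts bytes, not characters), and selection stops at the first item that would overflow instead of skipping it and trying older ones. The name select_for_injection is hypothetical.

```lua
-- Hypothetical helper: take newest-first items until the budget is spent.
local function select_for_injection(active, max_chars)
    local picked, used = {}, 0
    for _, item in ipairs(active) do   -- active is sorted newest first
        local cost = #item.content
        if used + cost > max_chars then
            break
        end
        used = used + cost
        -- Only kind and content are needed at render time.
        picked[#picked + 1] = { kind = item.kind, content = item.content }
    end
    return picked, used
end

local active = {
    { id = 3, kind = "context", content = string.rep("a", 10) },
    { id = 2, kind = "pref",    content = string.rep("b", 10) },
    { id = 1, kind = "fact",    content = string.rep("c", 10) },
}

local picked, used = select_for_injection(active, 25)
print(#picked, used)
```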

context.to_messages() composition:

<DEFAULT_SYSTEM_PROMPT> (Phase 0 + Phase 2 MCP block, statically embedded)

[background] (memory loaded at startup; managed via :memory)
- (fact) User prefers terse responses; no end-of-turn summaries.
- (context) Current project: aish (LuaJIT REPL with MCP tools).

Order of suffixes on the system prompt:

  1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP guidance, currently baked-in to the static constant — R-C3 note: not a separate dynamic block in v1; future phases may split)
  2. Phase 4 [background] block (when memory_items non-empty AND NOT in Norris mode; R-C1 suppression avoids ~16K chars, roughly 4K tokens, of redundant prompt per 8-step Norris run)
  3. Phase 3 NORRIS MODE block (when norris_active)

When Norris is active the order becomes: DEFAULT → NORRIS (no background). Norris's planning loop already has the goal anchored in its suffix; the memory items rarely change step-to-step planning.
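
A minimal sketch of that ordering rule, assuming the default prompt and the NORRIS suffix arrive as plain strings. The function name compose_system_prompt and its signature are hypothetical; the real composition lives inside context.to_messages().

```lua
-- Hypothetical composer: DEFAULT, then either NORRIS or [background].
local function compose_system_prompt(ctx, default_prompt, norris_suffix)
    local parts = { default_prompt }
    if ctx.norris_active then
        -- R-C1: background is suppressed whenever Norris is active.
        parts[#parts + 1] = norris_suffix
    elseif ctx.memory_items and #ctx.memory_items > 0 then
        local lines = { "[background] (memory loaded at startup; managed via :memory)" }
        for _, item in ipairs(ctx.memory_items) do
            lines[#lines + 1] = ("- (%s) %s"):format(item.kind, item.content)
        end
        parts[#parts + 1] = table.concat(lines, "\n")
    end
    return table.concat(parts, "\n\n")
end

local ctx = {
    norris_active = false,
    memory_items = {
        { kind = "fact", content = "User prefers terse responses; no end-of-turn summaries." },
    },
}

local plain = compose_system_prompt(ctx, "DEFAULT", "NORRIS MODE")
ctx.norris_active = true
local norris = compose_system_prompt(ctx, "DEFAULT", "NORRIS MODE")
print(plain)
print(norris)
```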


6. :memory summarize (Manual Auto-Extraction)

:memory summarize triggers the active model (or cfg.memory.summarizer_model if set) to read the current session's turns and propose candidate memory items.

Flow

  1. Source of truth is the session log file (R-C2), not ctx:to_messages(). history.load(session_path) returns all turns; filter out turns tagged meta = "summarize" (set on the assistant turn that emitted a prior summarize response) so the summarizer can't feed on its own output across multiple calls.

  2. Build a prompt: "Read the following conversation transcript. Extract facts, preferences, or context worth remembering across future sessions. Output ONE candidate per line, prefixed with the kind: fact: …, pref: …, or context: …. Maximum 10 candidates."

  3. Send the filtered transcript as a single user turn + the instruction above. Use cfg.memory.summarizer_model if set (else the active model). The resulting assistant turn gets logged with meta = "summarize" so future :memory summarize calls exclude it.

  4. Parse the response line-by-line for (fact|pref|context): <content> shapes. Tolerate markdown bullet prefixes (-, *).

  5. For each candidate, prompt the user:

    [memory] candidate (fact): User prefers terse responses; no end-of-turn summaries.
    keep? [y/N/edit]
    
    • y → write to memory.jsonl.
    • N (or empty) → drop.
    • edit → readline-edit the content before write.
  6. Status when done: [aish] memory: added N candidates.
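
Step 4 of the flow could be sketched as below. Lua patterns have no alternation, so this version captures any alphabetic prefix and checks it against a lookup table. Lowercasing the kind and the helper name parse_candidates are assumptions made for the example.

```lua
local KINDS = { fact = true, pref = true, context = true }

-- Hypothetical parser: one candidate per line, optional "-" or "*" bullet.
local function parse_candidates(response, limit)
    limit = limit or 10
    local out = {}
    for line in response:gmatch("[^\r\n]+") do
        local kind, content = line:match("^%s*[%-%*]?%s*(%a+):%s*(.-)%s*$")
        kind = kind and kind:lower()
        if kind and KINDS[kind] and content ~= "" then
            out[#out + 1] = { kind = kind, content = content }
            if #out >= limit then
                break
            end
        end
    end
    return out
end

-- Made-up model output mixing bullets, a preamble, and unusable lines.
local response = table.concat({
    "Here are the candidates:",
    "- fact: User prefers terse responses.",
    "* pref: Default to :model deep.",
    "context: Current project: aish",
    "note: not a known kind",
    "fact:",
}, "\n")

local candidates = parse_candidates(response)
for _, c in ipairs(candidates) do
    print(c.kind, c.content)
end
```

Lines with an unknown prefix or an empty body are dropped silently, which keeps the per-candidate prompt free of noise.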

Why manual not automatic in v1

An auto-summarize that runs at every :quit would:

  • be expensive (tokens on every exit)
  • drift over time if the model picks up noise
  • compete with the user's intentional :remember <text> curation

Manual gives the user the trigger. Q-list tracks auto-cadence options.


7. Meta Commands (Phase 4 additions)

| Command | Action |
| --- | --- |
| :remember <text> | Shortcut for :memory add fact <text> |
| :memory add <kind> <text> | Append a memory item (kind ∈ fact, pref, context) |
| :memory list | Show all active memory items (id + ts + kind + content) |
| :memory forget <id> | Append a tombstone for <id> |
| :memory clear | Forget all active items (with [y/N] confirm) |
| :memory summarize | Extract candidate items from current session via LLM |
| :memory inject | Replace ctx.memory_items from a fresh load_memory() + LRU-by-ts truncation. Same logic as startup injection. Useful after hand-editing memory.jsonl or after :memory forget to immediately reflect in the system prompt. |

:help updated.


8. Configuration Schema (Phase 4 example block)

memory = {
    -- Path defaults to <history.dir>/memory.jsonl. Override per fleet
    -- if you want shared memory (read-only is safer than write-shared).
    -- path = (history.dir or "~/.local/share/aish") .. "/memory.jsonl",

    -- Cap on how much memory content is injected into the system prompt
    -- at startup. Roughly 2000 chars ≈ 500 tokens. Older items are
    -- dropped from injection if exceeded; they remain in the file.
    inject_max_chars = 2000,

    -- Which model to use for :memory summarize. Defaults to the active
    -- model when nil. Use "fast" for speed; "deep" for better quality.
    summarizer_model = "fast",

    -- Auto-prune items older than N days at startup. nil = never auto-prune.
    -- Manual :memory forget always works regardless.
    -- prune_older_than_days = 90,
}
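
One way the defaults described in the comments could be applied when the config is read. This is a sketch under assumptions: the helper name memory_settings is hypothetical, the fallback directory is the default quoted in §2, and "~" expansion is left to the caller.

```lua
-- Hypothetical helper: resolve the effective memory settings from a config.
local function memory_settings(cfg)
    local m = cfg.memory
    if m == nil then
        return nil   -- no memory section: Phase 3 behavior, no injection
    end
    local dir = (cfg.history and cfg.history.dir) or "~/.local/share/aish"
    return {
        path = m.path or (dir .. "/memory.jsonl"),
        inject_max_chars = m.inject_max_chars or 2000,
        summarizer_model = m.summarizer_model,          -- nil means active model
        prune_older_than_days = m.prune_older_than_days -- nil means never prune
    }
end

local without_memory = memory_settings({})
local settings = memory_settings({
    history = { dir = "/tmp/aish" },   -- illustrative path
    memory = {},
})
print(settings.path, settings.inject_max_chars)
```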

9. Migration from Phase 3

User-visible:

  • :remember, :memory list / forget / clear / summarize are new meta commands.
  • A [background] block in the system prompt appears when memory items exist.
  • Existing configs without memory = {...} continue to work — no injection, no auto-summarize. Phase 3 behavior intact.

Substrate (PHASE0.md §3) invariants: unchanged.

The [background] system-prompt suffix is composed dynamically by context.to_messages() (same pattern as the Phase 3 NORRIS suffix; the Phase 2 MCP block is static inside DEFAULT_SYSTEM_PROMPT, per A3). No new substrate contract.


10. Out of Scope (Phase 4)

Per PHASE0.md §11 these belong to later phases:

  • Multi-model routing / cloud fallback (Phase 5).
  • Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 4 scope despite proximity:

  • Multi-process memory sharing (single-writer assumed v1).
  • Retrieval-augmented injection (RAG over memory.jsonl) — v1 just LRU.
  • Auto-trigger of :memory summarize at :quit (Q-list).
  • Memory categories beyond fact/pref/context — minimal typing v1.
  • Cross-aish-instance memory sync (memory.jsonl in a synced dir works coincidentally; not designed for it).
  • Encryption at rest — same posture as session logs (none in v1).

11. Open Questions

| # | Question | Impact | Resolve by |
| --- | --- | --- | --- |
| Q31 | Auto-summarize trigger: manual only (current), automatic at :quit, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
| Q32 | Editing memory items in place: :memory edit <id> to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
| Q33 | Memory injection while in Norris mode | context.lua | Resolved at review (R-C1): SUPPRESSED. Memory items aren't injected when ctx.norris_active == true. Norris has its goal anchor in the NORRIS suffix; ~16K chars (roughly 4K tokens) of redundant background per 8-step run is not worth the marginal context value. |
| Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no: flat list. | context.lua + UX | Phase 5 if it bites |
| Q35 | Privacy / redaction: :memory summarize could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + history.lua | Phase 4 (verify): review user-emergent risk |
| Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can :memory list to spot). | history.lua | Phase 4 (verify) |

12. Implementation Plan (commit-by-commit)

Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:

  1. history.lua — memory store + ffi/libc.lua flock (R-B1 bundled).

    • ffi/libc.lua: cdef flock(2) + LOCK_EX/LOCK_NB/LOCK_UN constants + M.flock(fd, op) wrapper.
    • history.lua: M.open_memory(path) opens the file (creating parent dirs + meta-header line if empty), takes flock(LOCK_EX | LOCK_NB) on the fd, scans the existing JSONL for max id → handle.next_id. Returns (handle, nil) on success; (nil, errmsg) on lock-held.
    • handle:add(kind, content, tags?, source?): assigns next id, appends JSON line, returns id.
    • handle:forget(id): appends a tombstone for id.
    • handle:close(): releases flock + closes fd.
    • M.load_memory(path): reads all lines, builds forget-target set from kind=="forget" entries, returns active items sorted by ts descending. Drops items whose id is in the forget-set OR whose id is nil (meta header).
    • Test in isolation: round-trip add/forget/load, lock-held detection (open twice in same process, second should fail).
  2. context.lua: memory injection. Add ctx.memory_items and the [background] block composer in to_messages(). Cap by inject_max_chars. Test in isolation: assert composition order (DEFAULT → background when Norris is off; DEFAULT → NORRIS with background suppressed when norris_active); cap honored.

  3. repl.lua: :remember + :memory list / add / forget / clear / inject. At startup, after MCP setup, open the memory handle + LRU-load items. Hook the meta dispatch. No summarize yet. End-to-end: run aish, :remember X, :quit, restart, :memory list shows X, :history shows X in [background].

  4. :memory summarize — manual extraction. Bundle a system-prompt for the summarizer model; parse response; per-candidate confirm prompt; append accepted items. End-to-end: short conversation, summarize, accept one of two candidates, restart, verify accepted one persists.

  5. config.lua — example memory block. Documentation-only; commented-out example. Final commit.

Risk / non-obvious

  • Counter persistence: memory:add needs a monotonic id. Options: (a) sidecar memory.id file with a single integer, (b) scan the JSONL on open for max id, (c) use timestamp as id (no monotonic guarantee across rapid adds). Plan: (b), scan once at open and cache in the handle. Overflow is not a practical concern: ids are Lua numbers (doubles), which stay exact up to 2^53.
  • Tombstone resolution at load: build a set of forget-target ids from kind=="forget" entries; filter active items to exclude. Order doesn't matter (tombstones can appear before their targets if the file is hand-edited; the resolution is set-based).
  • Empty file at open vs nonexistent file: both should yield an empty memory handle. Phase 1's history.open already handles file creation; extend the pattern.
  • System prompt growth: the suffix-stacking pattern is up to 4 blocks now (default + MCP + background + Norris). Token cost ~200 + ~80 + 2000 + ~250 = ~2530 chars baseline before any user/asst turns. Worth measuring at baseline phase.
  • :memory summarize parse robustness: small models may emit "fact: ..." sometimes with markdown bullets, sometimes without. Parser should tolerate ^[-*]?\s*(fact|pref|context):\s*(.+).
  • :memory clear with confirm: same UX as Phase 3 destructive prompts. [y/N] default-no.

Open at plan; resolve at review

  • Whether :remember should append to the LIVE ctx.memory_items immediately (so the model sees it on the next turn without restart) or only on next session boot. v1 appends immediately: to the file AND to the live ctx, for immediate visibility.
  • Whether the summarizer should be fed the FULL session log or just recent turns (token budget). v1 says full minus the [background] suffix; cap at session-log size <= 64KB or last N turns.

End of Phase 4 Manifest — aish