From ffead3986c0eec5886607e69369ef202f46bd3c3 Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Wed, 13 May 2026 04:50:43 +0000
Subject: [PATCH] =?UTF-8?q?docs/PHASE4:=20review=20fold-in=20=E2=80=94=20f?=
 =?UTF-8?q?lock=20for=20race,=20Norris=20suppression,=20summarizer=20self-?=
 =?UTF-8?q?amp?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs.

R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes
  scanning the same file compute identical next_ids. Resolution:
  flock(LOCK_EX | LOCK_NB) on the fd in M.open_memory, held until
  close. Bundled into commit #1 (per reviewer: cannot defer because
  adding flock retroactively means reopening the handle). Requires
  ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN
  constants + M.flock wrapper.

R-C1 (CONCERN, closes Q33): [background] block suppressed when
  ctx.norris_active. Avoids ~16K of redundant tokens per 8-step
  Norris run. Norris already anchors via its goal in the NORRIS
  suffix; memory items rarely change step-to-step planning.

R-C2 (CONCERN): summarizer self-amplification — running :memory
  summarize twice in one session would feed the prior summarize
  call's assistant turn into the next input. Resolution: operate
  on the session log file (history.load(session_path)) instead
  of ctx:to_messages(), and tag prior summarize turns with
  meta="summarize" so they're filterable.

R-C3 (CONCERN, cosmetic): §5 diagram clarified that
  DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block
  statically — not a separate dynamic block in v1.

NITs N1-N4 folded inline:
  N1 forget no-op for unknown id surfaces a status
  N2 path note: memory.jsonl is sibling of sessions/, no collision
  N3 item-id invariants: id >= 1; meta header has no id; tombstones
     with non-matching targets are no-ops
  N4 :memory inject semantics explicit (replace ctx.memory_items
     from a fresh load + LRU-by-ts truncation)

§3 module-changes table grew a new ffi/libc.lua row.
§12 commit #1 description tightened — flock work bundled inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 docs/PHASE4.md | 128 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 103 insertions(+), 25 deletions(-)

diff --git a/docs/PHASE4.md b/docs/PHASE4.md
index 8da394c..f845035 100644
--- a/docs/PHASE4.md
+++ b/docs/PHASE4.md
@@ -2,9 +2,53 @@
 
 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 4 Requirements, Architecture & Design Decisions
-**Status:** Analyze (formulate complete; current tree at `bea7175` probed)
+**Status:** Plan (review fold-in 2026-05-13 — TOCTOU race + Norris suppression + summarizer self-amp resolved)
 **Date:** 2026-05-13
 
+**Review fold-in (2026-05-13):**
+
+R-B1. **TOCTOU race on memory.jsonl** — two aish processes against the
+    same `history.dir` would each compute the same `next_id` and
+    produce duplicate ids; tombstones become ambiguous. Resolution:
+    `M.open_memory` takes an `flock(LOCK_EX | LOCK_NB)` advisory lock
+    on the file descriptor. Held until handle close. Failure to
+    acquire → `nil, "memory.jsonl held by another aish process"`.
+    Requires extending `ffi/libc.lua` with `flock(2)` — one cdef +
+    two constants (LOCK_EX=2, LOCK_NB=4). The lock is the *enforcement*
+    of the single-writer assumption stated in §2; documented in §2 row.
+
+R-C1. **System-prompt growth under Norris** — over an 8-step Norris run,
+    a 2KB [background] block adds ~16K redundant tokens. The Phase 0
+    §8 sliding window evicts user/asst pairs but keeps the system
+    prompt, so big system prompts displace conversation. Resolution
+    (Q33 closed): suppress [background] when `ctx.norris_active == true`.
+    Memory items rarely change Norris-step planning, and Norris has
+    its goal anchor via the NORRIS suffix already. §5 + §11 reflect.
+
+R-C2. **Summarizer self-amplification** — running `:memory summarize`
+    twice in one session would feed the previous summarize call's
+    *assistant turn* back into the input, leading to drift (re-propose
+    accepted items, no signal about rejections). Resolution: operate
+    on the session log file (`history.load(session_path)`) rather
+    than `ctx:to_messages()`. The session log is the authoritative
+    "what was discussed" stream. Skip lines tagged
+    `{role:"assistant", meta:"summarize"}` (a new optional field on
+    the JSONL turn). §6 reflects.
+
+R-C3. **DEFAULT_SYSTEM_PROMPT bakes MCP statically** — cosmetic. §5
+    diagram now reads "DEFAULT (Phase 0 + Phase 2 MCP) → [background]
+    → NORRIS". No code change.
+
+NITs folded inline:
+  N1. `:memory forget <id>` for an already-tombstoned id → no-op + status.
+  N2. §2 path note: memory.jsonl is sibling of sessions/, no collision.
+  N3. §4 invariant: items have id ≥ 1; meta header has no id and is
+      ignored; tombstones with non-matching targets are no-ops.
+  N4. §7 `:memory inject` semantics: replaces `ctx.memory_items` from
+      a fresh `load_memory()` + LRU-by-ts truncation (same as startup).
+
+**Analyze findings (2026-05-13):**
+
 **Analyze findings (2026-05-13):**
 
 A1. **history.lua surface is clean** — `M.open`/`Session:append`/
@@ -88,7 +132,7 @@ Three pillars per PHASE0.md §11 row 4:
 | Decision | Choice | Rationale |
 |---|---|---|
 | Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
-| Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. |
+| Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. Path is a sibling of `sessions/` (not inside it), so `:save <name>` cannot collide. |
 | Memory-item shape | `{id, ts, kind, content, tags?, source?}` | `id` is monotonic int (counter persisted in `memory.id`); `kind ∈ {"fact","pref","context"}` lightly typed for future routing; `content` is the body text; optional `tags` array; optional `source` carrying session-id provenance when auto-extracted. |
 | Forget semantics | **Append a tombstone**, don't rewrite the file (`{id, ts, kind:"forget", target:<other_id>}`) | Append-only preserves history. `M.load_memory` resolves tombstones during read — silently drops any item whose `id` appears as a forget-target. `:memory clear` writes one tombstone per active item; could also support a wildcard forget. |
 | Auto-summarize cadence | **Manual only in v1** (`:memory summarize`). Auto-trigger on `:quit` or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
@@ -97,7 +141,7 @@ Three pillars per PHASE0.md §11 row 4:
 | Injection budget | `cfg.memory.inject_max_chars` (default 2000 chars total — roughly 500 tokens) | Cap so memory doesn't eat the whole context. LRU-by-`ts` selection if items exceed budget. |
 | Pruning policy | Manual `:memory forget` + optional `cfg.memory.prune_older_than_days` (default unset — no auto-pruning) | Conservative defaults; user owns the lifecycle. |
 | Interaction with sessions | `memory.jsonl` is independent of `sessions/*.jsonl`. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
-| Concurrency | Single-writer assumed (one aish process per memory dir). Reader is the same process | Same assumption as session logs. Multi-process memory sharing is out of scope. |
+| Concurrency | Single-writer **enforced via `flock(LOCK_EX \| LOCK_NB)`** (R-B1) on the memory.jsonl file descriptor in `M.open_memory`. Held until close. Acquire failure → handle creation fails with a clear status message | Session logs got away with single-writer-by-uniqueness (timestamped filenames). memory.jsonl is one shared file, so the flock is the actual enforcement. The lock is advisory (Linux file-lock semantics) but every aish process honors it, which is sufficient for our trust model. |
 
 ---
 
@@ -105,13 +149,14 @@ Three pillars per PHASE0.md §11 row 4:
 
 | File | State after Phase 3 | Phase 4 changes |
 |---|---|---|
-| `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> memory_handle`; `memory:add(kind, content, tags?, source?) -> id`; `memory:forget(id)`; `M.load_memory(path) -> items_table` (resolves tombstones). `memory_handle` is similar shape to `session_handle` — internal fd + monotonic counter. |
+| `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> handle\|nil, err`; `handle:add(kind, content, tags?, source?) -> id`; `handle:forget(id)`; `handle:close()`; `M.load_memory(path) -> items_table` (resolves tombstones). Handle internals: fd (LuaJIT FFI int), next_id (scanned from existing JSONL), held flock. |
+| `ffi/libc.lua` | `chdir`, `errno`, `strerror`, plus Phase 1's waitpid/raw I/O/termios/poll, plus Phase 1's read/write/close/kill | Add `flock(2)` cdef (`int flock(int fd, int operation)`), constants `LOCK_EX = 2`, `LOCK_NB = 4`, `LOCK_UN = 8`. Wrapper `M.flock(fd, op) -> true\|false, errmsg`. Used by `history.M.open_memory` for the single-writer enforcement (R-B1). |
 | `context.lua` | system prompt + MCP block + NORRIS suffix toggle | Add a `memory_items` field on Context. `to_messages()` composes a dynamic "[background]" block on the system prompt when `memory_items` is non-empty AND not already in Norris mode (don't double-pile). Cap respected via the inject_max_chars budget. |
 | `repl.lua` | meta cmds + tool sub-loop + Norris driver | New meta: `:remember <text>` (shortcut for `:memory add fact <text>`); `:memory add <kind> <text>`; `:memory list`; `:memory forget <id>`; `:memory clear`; `:memory summarize`. At startup, after loading config + opening session, also open memory handle and inject the top-N items into `ctx.memory_items`. |
 | `broker.lua` | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
 | `config.lua` | example with mcp + safety blocks | Add commented-out `memory = { ... }` example. Default behavior is "no memory injection, no auto-summarize". |
 | `executor.lua` | unchanged | unchanged |
-| `safety.lua` | is_destructive + norris_step | unchanged |
+| `safety.lua` | is_destructive + norris_step | unchanged (Norris-side suppression of background block is in context.lua, not safety.lua) |
 
 No new module files. All Phase 4 functionality grows existing files —
 mostly `history.lua` and `repl.lua`.
@@ -145,6 +190,16 @@ list in the [background] block. Future phases may route them
 differently (e.g. `pref` into a system-prompt section, `context` into
 a user-style preamble). Today they're prose.
 
+### Item-id invariants (N3)
+
+- Items have `id ≥ 1`. The optional meta header line `{"meta":{...}}`
+  has no `id` field and is ignored during load.
+- Tombstones with non-matching `target` (id doesn't exist, or already
+  tombstoned) are no-ops at load — silently dropped from the active
+  set. The `:memory forget` meta handler also checks active-set
+  membership before appending a tombstone, surfacing a status when
+  the id isn't active.
+
 ---
 
 ## 5. Startup Injection
@@ -163,8 +218,7 @@ When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):
 `context.to_messages()` composition:
 
 ```
-<DEFAULT_SYSTEM_PROMPT>
-<Phase 2 MCP block>
+<DEFAULT_SYSTEM_PROMPT> (Phase 0 + Phase 2 MCP block, statically embedded)
 
 [background] (memory loaded at startup; managed via :memory)
 - (fact) User prefers terse responses; no end-of-turn summaries.
@@ -172,12 +226,17 @@ When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):
 ```
 
 Order of suffixes on the system prompt:
-1. Default Phase 0 prompt
-2. Phase 2 MCP guidance block (always present)
-3. Phase 4 [background] block (when memory_items non-empty)
-4. Phase 3 NORRIS MODE block (when norris_active)
+1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP guidance, currently
+   baked-in to the static constant — R-C3 note: not a separate dynamic
+   block in v1; future phases may split)
+2. Phase 4 [background] block (when memory_items non-empty AND NOT in
+   Norris mode — R-C1 suppression to avoid ~16K of redundant tokens
+   per Norris run)
+3. Phase 3 NORRIS MODE block (when norris_active)
 
-Norris is last so its instructions take precedence when active.
+When Norris is active the order becomes: DEFAULT → NORRIS (no background).
+Norris's planning loop already has the goal anchored in its suffix; the
+memory items rarely change step-to-step planning.
 
 ---
 
@@ -189,14 +248,22 @@ turns and propose candidate memory items.
 
 ### Flow
 
-1. Build a prompt: "Read the following conversation transcript. Extract
+1. **Source of truth is the session log file** (R-C2), not
+   `ctx:to_messages()`. `history.load(session_path)` returns all
+   turns; filter out turns tagged `meta = "summarize"` (set on the
+   assistant turn that emitted a prior summarize response) so the
+   summarizer can't feed on its own output across multiple calls.
+2. Build a prompt: "Read the following conversation transcript. Extract
    facts, preferences, or context worth remembering across future
    sessions. Output ONE candidate per line, prefixed with the kind:
    `fact: …`, `pref: …`, or `context: …`. Maximum 10 candidates."
-2. Send `ctx:to_messages()` minus the [background] suffix (avoid
-   feedback) + the user prompt above.
-3. Parse the response line-by-line for `(fact|pref|context):
-   <content>` shapes.
+3. Send the filtered transcript as a single user turn + the
+   instruction above. Use `cfg.memory.summarizer_model` if set (else
+   the active model). The resulting assistant turn gets logged
+   with `meta = "summarize"` so future :memory summarize calls
+   exclude it.
+4. Parse the response line-by-line for `(fact|pref|context):
+   <content>` shapes. Tolerate markdown bullet prefixes (`-`, `*`).
 4. For each candidate, prompt the user:
 
    ```
@@ -231,7 +298,7 @@ Manual gives the user the trigger. Q-list tracks auto-cadence options.
 | `:memory forget <id>` | Append a tombstone for `<id>` |
 | `:memory clear` | Forget all active items (with `[y/N]` confirm) |
 | `:memory summarize` | Extract candidate items from current session via LLM |
-| `:memory inject` | Re-inject current memory.jsonl items into Context (after edits) |
+| `:memory inject` | Replace `ctx.memory_items` from a fresh `load_memory()` + LRU-by-ts truncation. Same logic as startup injection. Useful after hand-editing `memory.jsonl` or after `:memory forget` to immediately reflect in the system prompt. |
 
 `:help` updated.
 
@@ -303,7 +370,7 @@ Specifically out of Phase 4 scope despite proximity:
 |---|---|---|---|
 | Q31 | Auto-summarize trigger: manual only (current), automatic at `:quit`, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
 | Q32 | Editing memory items in place: `:memory edit <id>` to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
-| Q33 | Memory injection while in Norris mode: does the [background] block stay, get suppressed, or merge with the Norris goal? Proposal: keep both; Norris is the last block and dominates. | context.lua | Phase 4 (plan) |
+| Q33 | ~~Memory injection while in Norris mode~~ | context.lua | **Resolved at review (R-C1)**: SUPPRESSED. Memory items aren't injected when `ctx.norris_active == true`. Norris has its goal anchor in the NORRIS suffix; 16K of redundant background per 8-step run is not worth the marginal context value. |
 | Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no — flat list. | context.lua + UX | Phase 5 if it bites |
 | Q35 | Privacy / redaction: `:memory summarize` could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + memory.lua | Phase 4 (verify) — review user-emergent risk |
 | Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can `:memory list` to spot). | history.lua | Phase 4 (verify) |
@@ -314,12 +381,23 @@ Specifically out of Phase 4 scope despite proximity:
 
 Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:
 
-1. **`history.lua` — memory store.** Add `M.open_memory`,
-   `memory:add(kind, content, tags?, source?)`, `memory:forget(id)`,
-   `M.load_memory(path)` with tombstone resolution. Persistent
-   monotonic counter via a sidecar `memory.id` file (or scan the JSONL
-   for max id at open time — pick at analyze). **Test in isolation**:
-   round-trip add/forget/load against a temp file.
+1. **`history.lua` — memory store + `ffi/libc.lua` flock (R-B1 bundled).**
+   - `ffi/libc.lua`: cdef `flock(2)` + LOCK_EX/LOCK_NB/LOCK_UN constants
+     + `M.flock(fd, op)` wrapper.
+   - `history.lua`: `M.open_memory(path)` opens the file (creating parent
+     dirs + meta-header line if empty), takes `flock(LOCK_EX | LOCK_NB)`
+     on the fd, scans the existing JSONL for max id → handle.next_id.
+     Returns `(handle, nil)` on success; `(nil, errmsg)` on lock-held.
+   - `handle:add(kind, content, tags?, source?)`: assigns next id,
+     appends JSON line, returns id.
+   - `handle:forget(id)`: appends a tombstone for id.
+   - `handle:close()`: releases flock + closes fd.
+   - `M.load_memory(path)`: reads all lines, builds forget-target set
+     from kind=="forget" entries, returns active items sorted by `ts`
+     descending. Drops items whose id is in the forget-set OR whose id
+     is nil (meta header).
+   **Test in isolation**: round-trip add/forget/load, lock-held
+   detection (open twice in same process, second should fail).
 
 2. **`context.lua` — memory injection.** Add `ctx.memory_items` and
    the `[background]` block composer in `to_messages()`. Cap by