docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp

Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs.

R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes
  scanning the same file compute identical next_ids. Resolution:
  flock(LOCK_EX | LOCK_NB) on the fd in M.open_memory, held until
  close. Bundled into commit #1 (per reviewer: cannot defer because
  adding flock retroactively means reopening the handle). Requires
  ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN
  constants + M.flock wrapper.

R-C1 (CONCERN, closes Q33): [background] block suppressed when
  ctx.norris_active. Avoids ~16K of redundant tokens per 8-step
  Norris run. Norris already anchors via its goal in the NORRIS
  suffix; memory items rarely change step-to-step planning.

R-C2 (CONCERN): summarizer self-amplification — running :memory
  summarize twice in one session would feed the prior summarize
  call's assistant turn into the next input. Resolution: operate
  on the session log file (history.load(session_path)) instead
  of ctx:to_messages(), and tag prior summarize turns with
  meta="summarize" so they're filterable.

R-C3 (CONCERN, cosmetic): §5 diagram clarified that
  DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block
  statically — not a separate dynamic block in v1.

NITs N1-N4 folded inline:
  N1 forget no-op for unknown id surfaces a status
  N2 path note: memory.jsonl is sibling of sessions/, no collision
  N3 item-id invariants: id >= 1; meta header has no id; tombstones
     with non-matching targets are no-ops
  N4 :memory inject semantics explicit (replace ctx.memory_items
     from a fresh load + LRU-by-ts truncation)

§3 module-changes table grew a new ffi/libc.lua row.
§12 commit #1 description tightened — flock work bundled inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-13 04:50:43 +00:00
parent 2146b909f8
commit ffead3986c
+103 -25
View File
@@ -2,9 +2,53 @@
**Project:** aish — AI-augmented conversational shell **Project:** aish — AI-augmented conversational shell
**Document:** Phase 4 Requirements, Architecture & Design Decisions **Document:** Phase 4 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; current tree at `bea7175` probed) **Status:** Plan (review fold-in 2026-05-13 — TOCTOU race + Norris suppression + summarizer self-amp resolved)
**Date:** 2026-05-13 **Date:** 2026-05-13
**Review fold-in (2026-05-13):**
R-B1. **TOCTOU race on memory.jsonl** — two aish processes against the
same `history.dir` would each compute the same `next_id` and
produce duplicate ids; tombstones become ambiguous. Resolution:
`M.open_memory` takes an `flock(LOCK_EX | LOCK_NB)` advisory lock
on the file descriptor. Held until handle close. Failure to
acquire → `nil, "memory.jsonl held by another aish process"`.
Requires extending `ffi/libc.lua` with `flock(2)` — one cdef +
two constants (LOCK_EX=2, LOCK_NB=4). The lock is the *enforcement*
of the single-writer assumption stated in §2; documented in §2 row.
R-C1. **System-prompt growth under Norris** — over an 8-step Norris run,
a 2KB [background] block adds ~16K redundant tokens. The Phase 0
§8 sliding window evicts user/asst pairs but keeps the system
prompt, so big system prompts displace conversation. Resolution
(Q33 closed): suppress [background] when `ctx.norris_active == true`.
Memory items rarely change Norris-step planning, and Norris has
its goal anchor via the NORRIS suffix already. §5 + §11 reflect.
R-C2. **Summarizer self-amplification** — running `:memory summarize`
twice in one session would feed the previous summarize call's
*assistant turn* back into the input, leading to drift (re-propose
accepted items, no signal about rejections). Resolution: operate
on the session log file (`history.load(session_path)`) rather
than `ctx:to_messages()`. The session log is the authoritative
"what was discussed" stream. Skip lines tagged
`{role:"assistant", meta:"summarize"}` (a new optional field on
the JSONL turn). §6 reflects.
R-C3. **DEFAULT_SYSTEM_PROMPT bakes MCP statically** — cosmetic. §5
diagram now reads "DEFAULT (Phase 0 + Phase 2 MCP) → [background]
→ NORRIS". No code change.
NITs folded inline:
N1. `:memory forget <id>` for an already-tombstoned id → no-op + status.
N2. §2 path note: memory.jsonl is sibling of sessions/, no collision.
N3. §4 invariant: items have id ≥ 1; meta header has no id and is
ignored; tombstones with non-matching targets are no-ops.
N4. §7 `:memory inject` semantics: replaces `ctx.memory_items` from
a fresh `load_memory()` + LRU-by-ts truncation (same as startup).
**Analyze findings (2026-05-13):**
**Analyze findings (2026-05-13):** **Analyze findings (2026-05-13):**
A1. **history.lua surface is clean**`M.open`/`Session:append`/ A1. **history.lua surface is clean**`M.open`/`Session:append`/
@@ -88,7 +132,7 @@ Three pillars per PHASE0.md §11 row 4:
| Decision | Choice | Rationale | | Decision | Choice | Rationale |
|---|---|---| |---|---|---|
| Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. | | Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
| Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. | | Storage location | `<config.history.dir>/memory.jsonl` (sibling to `sessions/`) | Co-located with session logs; users can back up one directory. Defaults to `~/.local/share/aish/memory.jsonl`. Path is a sibling of `sessions/` (not inside it), so `:save <name>` cannot collide. |
| Memory-item shape | `{id, ts, kind, content, tags?, source?}` | `id` is monotonic int (counter persisted in `memory.id`); `kind ∈ {"fact","pref","context"}` lightly typed for future routing; `content` is the body text; optional `tags` array; optional `source` carrying session-id provenance when auto-extracted. | | Memory-item shape | `{id, ts, kind, content, tags?, source?}` | `id` is monotonic int (counter persisted in `memory.id`); `kind ∈ {"fact","pref","context"}` lightly typed for future routing; `content` is the body text; optional `tags` array; optional `source` carrying session-id provenance when auto-extracted. |
| Forget semantics | **Append a tombstone**, don't rewrite the file (`{id, ts, kind:"forget", target:<other_id>}`) | Append-only preserves history. `M.load_memory` resolves tombstones during read — silently drops any item whose `id` appears as a forget-target. `:memory clear` writes one tombstone per active item; could also support a wildcard forget. | | Forget semantics | **Append a tombstone**, don't rewrite the file (`{id, ts, kind:"forget", target:<other_id>}`) | Append-only preserves history. `M.load_memory` resolves tombstones during read — silently drops any item whose `id` appears as a forget-target. `:memory clear` writes one tombstone per active item; could also support a wildcard forget. |
| Auto-summarize cadence | **Manual only in v1** (`:memory summarize`). Auto-trigger on `:quit` or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. | | Auto-summarize cadence | **Manual only in v1** (`:memory summarize`). Auto-trigger on `:quit` or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
@@ -97,7 +141,7 @@ Three pillars per PHASE0.md §11 row 4:
| Injection budget | `cfg.memory.inject_max_chars` (default 2000 chars total — roughly 500 tokens) | Cap so memory doesn't eat the whole context. LRU-by-`ts` selection if items exceed budget. | | Injection budget | `cfg.memory.inject_max_chars` (default 2000 chars total — roughly 500 tokens) | Cap so memory doesn't eat the whole context. LRU-by-`ts` selection if items exceed budget. |
| Pruning policy | Manual `:memory forget` + optional `cfg.memory.prune_older_than_days` (default unset — no auto-pruning) | Conservative defaults; user owns the lifecycle. | | Pruning policy | Manual `:memory forget` + optional `cfg.memory.prune_older_than_days` (default unset — no auto-pruning) | Conservative defaults; user owns the lifecycle. |
| Interaction with sessions | `memory.jsonl` is independent of `sessions/*.jsonl`. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". | | Interaction with sessions | `memory.jsonl` is independent of `sessions/*.jsonl`. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
| Concurrency | Single-writer assumed (one aish process per memory dir). Reader is the same process | Same assumption as session logs. Multi-process memory sharing is out of scope. | | Concurrency | Single-writer **enforced via `flock(LOCK_EX \| LOCK_NB)`** (R-B1) on the memory.jsonl file descriptor in `M.open_memory`. Held until close. Acquire failure → handle creation fails with a clear status message | Session logs got away with single-writer-by-uniqueness (timestamped filenames). memory.jsonl is one shared file, so the flock is the actual enforcement. The lock is advisory (Linux file-lock semantics) but every aish process honors it, which is sufficient for our trust model. |
--- ---
@@ -105,13 +149,14 @@ Three pillars per PHASE0.md §11 row 4:
| File | State after Phase 3 | Phase 4 changes | | File | State after Phase 3 | Phase 4 changes |
|---|---|---| |---|---|---|
| `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> memory_handle`; `memory:add(kind, content, tags?, source?) -> id`; `memory:forget(id)`; `M.load_memory(path) -> items_table` (resolves tombstones). `memory_handle` is similar shape to `session_handle` — internal fd + monotonic counter. | | `history.lua` | `M.open(path, meta)`, `session:append(turn)`, `M.load(path)`, `M.list_sessions(dir)` | Add memory functions alongside session functions: `M.open_memory(path) -> handle\|nil, err`; `handle:add(kind, content, tags?, source?) -> id`; `handle:forget(id)`; `handle:close()`; `M.load_memory(path) -> items_table` (resolves tombstones). Handle internals: fd (LuaJIT FFI int), next_id (scanned from existing JSONL), held flock. |
| `ffi/libc.lua` | `chdir`, `errno`, `strerror`, plus Phase 1's waitpid/raw I/O/termios/poll, plus Phase 1's read/write/close/kill | Add `flock(2)` cdef (`int flock(int fd, int operation)`), constants `LOCK_EX = 2`, `LOCK_NB = 4`, `LOCK_UN = 8`. Wrapper `M.flock(fd, op) -> true\|false, errmsg`. Used by `history.M.open_memory` for the single-writer enforcement (R-B1). |
| `context.lua` | system prompt + MCP block + NORRIS suffix toggle | Add a `memory_items` field on Context. `to_messages()` composes a dynamic "[background]" block on the system prompt when `memory_items` is non-empty AND not already in Norris mode (don't double-pile). Cap respected via the inject_max_chars budget. | | `context.lua` | system prompt + MCP block + NORRIS suffix toggle | Add a `memory_items` field on Context. `to_messages()` composes a dynamic "[background]" block on the system prompt when `memory_items` is non-empty AND not already in Norris mode (don't double-pile). Cap respected via the inject_max_chars budget. |
| `repl.lua` | meta cmds + tool sub-loop + Norris driver | New meta: `:remember <text>` (shortcut for `:memory add fact <text>`); `:memory add <kind> <text>`; `:memory list`; `:memory forget <id>`; `:memory clear`; `:memory summarize`. At startup, after loading config + opening session, also open memory handle and inject the top-N items into `ctx.memory_items`. | | `repl.lua` | meta cmds + tool sub-loop + Norris driver | New meta: `:remember <text>` (shortcut for `:memory add fact <text>`); `:memory add <kind> <text>`; `:memory list`; `:memory forget <id>`; `:memory clear`; `:memory summarize`. At startup, after loading config + opening session, also open memory handle and inject the top-N items into `ctx.memory_items`. |
| `broker.lua` | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). | | `broker.lua` | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
| `config.lua` | example with mcp + safety blocks | Add commented-out `memory = { ... }` example. Default behavior is "no memory injection, no auto-summarize". | | `config.lua` | example with mcp + safety blocks | Add commented-out `memory = { ... }` example. Default behavior is "no memory injection, no auto-summarize". |
| `executor.lua` | unchanged | unchanged | | `executor.lua` | unchanged | unchanged |
| `safety.lua` | is_destructive + norris_step | unchanged | | `safety.lua` | is_destructive + norris_step | unchanged (Norris-side suppression of background block is in context.lua, not safety.lua) |
No new module files. All Phase 4 functionality grows existing files — No new module files. All Phase 4 functionality grows existing files —
mostly `history.lua` and `repl.lua`. mostly `history.lua` and `repl.lua`.
@@ -145,6 +190,16 @@ list in the [background] block. Future phases may route them
differently (e.g. `pref` into a system-prompt section, `context` into differently (e.g. `pref` into a system-prompt section, `context` into
a user-style preamble). Today they're prose. a user-style preamble). Today they're prose.
### Item-id invariants (N3)
- Items have `id ≥ 1`. The optional meta header line `{"meta":{...}}`
has no `id` field and is ignored during load.
- Tombstones with non-matching `target` (id doesn't exist, or already
tombstoned) are no-ops at load — silently dropped from the active
set. The `:memory forget` meta handler also checks active-set
membership before appending a tombstone, surfacing a status when
the id isn't active.
--- ---
## 5. Startup Injection ## 5. Startup Injection
@@ -163,8 +218,7 @@ When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):
`context.to_messages()` composition: `context.to_messages()` composition:
``` ```
<DEFAULT_SYSTEM_PROMPT> <DEFAULT_SYSTEM_PROMPT> (Phase 0 + Phase 2 MCP block, statically embedded)
<Phase 2 MCP block>
[background] (memory loaded at startup; managed via :memory) [background] (memory loaded at startup; managed via :memory)
- (fact) User prefers terse responses; no end-of-turn summaries. - (fact) User prefers terse responses; no end-of-turn summaries.
@@ -172,12 +226,17 @@ When aish boots and `cfg.memory` is present (or `memory.jsonl` exists):
``` ```
Order of suffixes on the system prompt: Order of suffixes on the system prompt:
1. Default Phase 0 prompt 1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP guidance, currently
2. Phase 2 MCP guidance block (always present) baked-in to the static constant — R-C3 note: not a separate dynamic
3. Phase 4 [background] block (when memory_items non-empty) block in v1; future phases may split)
4. Phase 3 NORRIS MODE block (when norris_active) 2. Phase 4 [background] block (when memory_items non-empty AND NOT in
Norris mode — R-C1 suppression to avoid ~16K of redundant tokens
per Norris run)
3. Phase 3 NORRIS MODE block (when norris_active)
Norris is last so its instructions take precedence when active. When Norris is active the order becomes: DEFAULT → NORRIS (no background).
Norris's planning loop already has the goal anchored in its suffix; the
memory items rarely change step-to-step planning.
--- ---
@@ -189,14 +248,22 @@ turns and propose candidate memory items.
### Flow ### Flow
1. Build a prompt: "Read the following conversation transcript. Extract 1. **Source of truth is the session log file** (R-C2), not
`ctx:to_messages()`. `history.load(session_path)` returns all
turns; filter out turns tagged `meta = "summarize"` (set on the
assistant turn that emitted a prior summarize response) so the
summarizer can't feed on its own output across multiple calls.
2. Build a prompt: "Read the following conversation transcript. Extract
facts, preferences, or context worth remembering across future facts, preferences, or context worth remembering across future
sessions. Output ONE candidate per line, prefixed with the kind: sessions. Output ONE candidate per line, prefixed with the kind:
`fact: …`, `pref: …`, or `context: …`. Maximum 10 candidates." `fact: …`, `pref: …`, or `context: …`. Maximum 10 candidates."
2. Send `ctx:to_messages()` minus the [background] suffix (avoid 3. Send the filtered transcript as a single user turn + the
feedback) + the user prompt above. instruction above. Use `cfg.memory.summarizer_model` if set (else
3. Parse the response line-by-line for `(fact|pref|context): the active model). The resulting assistant turn gets logged
<content>` shapes. with `meta = "summarize"` so future :memory summarize calls
exclude it.
4. Parse the response line-by-line for `(fact|pref|context):
<content>` shapes. Tolerate markdown bullet prefixes (`-`, `*`).
4. For each candidate, prompt the user: 4. For each candidate, prompt the user:
``` ```
@@ -231,7 +298,7 @@ Manual gives the user the trigger. Q-list tracks auto-cadence options.
| `:memory forget <id>` | Append a tombstone for `<id>` | | `:memory forget <id>` | Append a tombstone for `<id>` |
| `:memory clear` | Forget all active items (with `[y/N]` confirm) | | `:memory clear` | Forget all active items (with `[y/N]` confirm) |
| `:memory summarize` | Extract candidate items from current session via LLM | | `:memory summarize` | Extract candidate items from current session via LLM |
| `:memory inject` | Re-inject current memory.jsonl items into Context (after edits) | | `:memory inject` | Replace `ctx.memory_items` from a fresh `load_memory()` + LRU-by-ts truncation. Same logic as startup injection. Useful after hand-editing `memory.jsonl` or after `:memory forget` to immediately reflect in the system prompt. |
`:help` updated. `:help` updated.
@@ -303,7 +370,7 @@ Specifically out of Phase 4 scope despite proximity:
|---|---|---|---| |---|---|---|---|
| Q31 | Auto-summarize trigger: manual only (current), automatic at `:quit`, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) | | Q31 | Auto-summarize trigger: manual only (current), automatic at `:quit`, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
| Q32 | Editing memory items in place: `:memory edit <id>` to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) | | Q32 | Editing memory items in place: `:memory edit <id>` to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
| Q33 | Memory injection while in Norris mode: does the [background] block stay, get suppressed, or merge with the Norris goal? Proposal: keep both; Norris is the last block and dominates. | context.lua | Phase 4 (plan) | | Q33 | ~~Memory injection while in Norris mode~~ | context.lua | **Resolved at review (R-C1)**: SUPPRESSED. Memory items aren't injected when `ctx.norris_active == true`. Norris has its goal anchor in the NORRIS suffix; 16K of redundant background per 8-step run is not worth the marginal context value. |
| Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no — flat list. | context.lua + UX | Phase 5 if it bites | | Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no — flat list. | context.lua + UX | Phase 5 if it bites |
| Q35 | Privacy / redaction: `:memory summarize` could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + memory.lua | Phase 4 (verify) — review user-emergent risk | | Q35 | Privacy / redaction: `:memory summarize` could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + memory.lua | Phase 4 (verify) — review user-emergent risk |
| Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can `:memory list` to spot). | history.lua | Phase 4 (verify) | | Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can `:memory list` to spot). | history.lua | Phase 4 (verify) |
@@ -314,12 +381,23 @@ Specifically out of Phase 4 scope despite proximity:
Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected: Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:
1. **`history.lua` — memory store.** Add `M.open_memory`, 1. **`history.lua` — memory store + `ffi/libc.lua` flock (R-B1 bundled).**
`memory:add(kind, content, tags?, source?)`, `memory:forget(id)`, - `ffi/libc.lua`: cdef `flock(2)` + LOCK_EX/LOCK_NB/LOCK_UN constants
`M.load_memory(path)` with tombstone resolution. Persistent + `M.flock(fd, op)` wrapper.
monotonic counter via a sidecar `memory.id` file (or scan the JSONL - `history.lua`: `M.open_memory(path)` opens the file (creating parent
for max id at open time — pick at analyze). **Test in isolation**: dirs + meta-header line if empty), takes `flock(LOCK_EX | LOCK_NB)`
round-trip add/forget/load against a temp file. on the fd, scans the existing JSONL for max id → handle.next_id.
Returns `(handle, nil)` on success; `(nil, errmsg)` on lock-held.
- `handle:add(kind, content, tags?, source?)`: assigns next id,
appends JSON line, returns id.
- `handle:forget(id)`: appends a tombstone for id.
- `handle:close()`: releases flock + closes fd.
- `M.load_memory(path)`: reads all lines, builds forget-target set
from kind=="forget" entries, returns active items sorted by `ts`
descending. Drops items whose id is in the forget-set OR whose id
is nil (meta header).
**Test in isolation**: round-trip add/forget/load, lock-held
detection (open twice in same process, second should fail).
2. **`context.lua` — memory injection.** Add `ctx.memory_items` and 2. **`context.lua` — memory injection.** Add `ctx.memory_items` and
the `[background]` block composer in `to_messages()`. Cap by the `[background]` block composer in `to_messages()`. Cap by