marfrit 2146b909f8 docs/PHASE4: analyze — surface confirmed, counter strategy locked
A1. history.lua surface lines up cleanly for the memory additions —
    no structural refactor; pure additive functions mirroring the
    session pattern.

A2. Counter persistence: scan at open, cache next_id in handle.
    O(n) load (n bounded by curation, ~hundreds), no sidecar file.
    Persisted ids let forget-tombstones target items even across
    restarts.

A3. System-prompt suffix order locked: DEFAULT (carrying Phase 2 MCP
    block baked in) → Phase 4 [background] → Phase 3 NORRIS. Token
    cost measured: default ~174 toks, +NORRIS ~364 toks, +NORRIS+2KB
    background ~865 toks. Well within typical context budgets.

No manifest amendments needed — §3/§5 already match. Findings recorded
inline as Phase 7 anchors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:47:01 +00:00


aish — Phase 4 Manifest

Project: aish (AI-augmented conversational shell)
Document: Phase 4 Requirements, Architecture & Design Decisions
Status: Analyze (formulate complete; current tree at bea7175 probed)
Date: 2026-05-13

Analyze findings (2026-05-13):

A1. history.lua surface is clean: M.open / Session:append / Session:close / M.load / M.list_sessions. The memory functions can mirror this exactly: M.open_memory / memory:add / memory:forget / memory:close / M.load_memory. No structural refactor needed; pure additions.

A2. Counter persistence — scan at open, cache in handle. Phase 1's session log writes a {"meta":{...}} header on first creation but doesn't track entry-id (turns aren't numbered). For memory, the monotonic id is needed for forget-targeting. Cheapest correct approach: on M.open_memory, read all lines once, find the max id field present (skipping the meta header if any), cache as handle.next_id. Subsequent add calls increment in-memory and persist on the next append. O(n) at open is acceptable since n is bounded by user curation (~hundreds, not millions). No sidecar.
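The scan-at-open strategy in A2 can be sketched as follows. This is a minimal illustration, not the history.lua implementation: scan_next_id is a hypothetical helper, it takes already-read lines rather than a file handle, and it pulls the id out with a Lua pattern where the real code would presumably decode each line with the vendored dkjson.

```lua
-- Hypothetical helper: recover the next memory id from JSONL lines.
-- Assumption: "id" is the first top-level field on each item line,
-- as in the manifest's sample file.
local function scan_next_id(lines)
    local max_id = 0
    for _, line in ipairs(lines) do
        -- Skip a {"meta":{...}} header line if one is present.
        if not line:find('^%s*{%s*"meta"') then
            local digits = line:match('"id"%s*:%s*(%d+)')
            local id = digits and tonumber(digits)
            if id and id > max_id then max_id = id end
        end
    end
    -- Tombstones carry ids too, so they count toward the maximum.
    return max_id + 1
end

local lines = {
    '{"meta":{"version":1}}',
    '{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"a"}',
    '{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"b"}',
    '{"id":4,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}',
}
print(scan_next_id(lines))  -- 5
print(scan_next_id({}))     -- 1 (empty file: first id is 1)
```

The value returned would be cached as handle.next_id, so the O(n) cost is paid once per open.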

A3. System-prompt suffix order, post-analyze: actual current composition is DEFAULT_SYSTEM_PROMPT (which has Phase 2 MCP guidance already baked in as a static block) → optional NORRIS dynamic suffix. The Phase 2 MCP block is NOT computed dynamically; it is part of DEFAULT_SYSTEM_PROMPT. So Phase 4's [background] block lives between DEFAULT and NORRIS. Token cost measured:

  • DEFAULT: 697 chars (~174 tokens)
  • DEFAULT + NORRIS: 1458 chars (~364 tokens)
  • DEFAULT + 2KB background + NORRIS: ~3460 chars (~865 tokens)

Within typical 4-8K context budgets.
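The token figures in A3 are consistent with a rough estimate of 4 characters per token. That ratio is inferred from the numbers, not stated by the manifest, and est_tokens is a hypothetical helper; a real tokenizer will give different counts.

```lua
-- Rough token estimate, assuming ~4 characters per token.
local function est_tokens(chars)
    return math.floor(chars / 4)
end

print(est_tokens(697))   -- 174  (DEFAULT)
print(est_tokens(1458))  -- 364  (DEFAULT + NORRIS)
print(est_tokens(3460))  -- 865  (DEFAULT + 2KB background + NORRIS)
```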

These findings don't require manifest changes — the §3 module-changes table and §5 injection mechanism already match. Recording the measurements here so verify (Phase 7) has anchors.

PHASE0 is the locked substrate; PHASE1, PHASE2, PHASE3 are layered on top. This manifest specifies what Phase 4 adds — cross-session memory — and the user-facing surface for managing it.


1. Scope of Phase 4

Three pillars per PHASE0.md §11 row 4:

  1. memory.jsonl persistent store — a single append-only file (<config.history.dir>/memory.jsonl) carrying user-curated facts, preferences, and project context that survive aish restarts. Same storage convention as session logs but a separate file because the read pattern (load at startup) and write pattern (curated only) differ from session logs (append-every-turn).

  2. Startup context injection — at REPL boot, recent memory items are loaded into the live Context so the model sees them on the very first turn. Injection is bounded (token budget) and visible to the user via :memory list.

  3. :memory management surface + automatic candidate extraction — meta commands for add, list, forget, clear, plus an opt-in summarizer that runs at session end (or on demand) extracting candidate facts from the session log for the user to triage into memory.

Phase 4 is done when:

  • :remember <text> (alias for :memory add fact <text>) writes a line to memory.jsonl and the next REPL boot sees it in context.
  • :memory list shows current memory items with their IDs and ages.
  • :memory forget <id> removes one item; :memory clear removes all (with confirm).
  • At startup, the top-N most recent memory items are injected into the system prompt as a single [background] block (configurable cap).
  • :memory summarize runs the active model over the current session log and proposes candidate memory items; the user accepts/rejects per-candidate via prompt.
  • Existing configs without a memory section behave exactly like Phase 3 (no startup injection, no auto-summarize).

2. Technology Decisions (delta from Phase 3)

| Decision | Choice | Rationale |
|---|---|---|
| Storage format | Append-only JSONL, one item per line | Same convention as Phase 1's session logs. Greppable, robust to truncation, no parser dependency beyond vendored dkjson. |
| Storage location | <config.history.dir>/memory.jsonl (sibling to sessions/) | Co-located with session logs; users can back up one directory. Defaults to ~/.local/share/aish/memory.jsonl. |
| Memory-item shape | {id, ts, kind, content, tags?, source?} | id is a monotonic int (next id recovered by scanning memory.jsonl at open, per A2; no sidecar file); kind ∈ {"fact","pref","context"} is lightly typed for future routing; content is the body text; optional tags array; optional source carrying session-id provenance when auto-extracted. |
| Forget semantics | Append a tombstone, don't rewrite the file ({id, ts, kind:"forget", target:<other_id>}) | Append-only preserves history. M.load_memory resolves tombstones during read: it silently drops any item whose id appears as a forget-target. :memory clear writes one tombstone per active item; could also support a wildcard forget. |
| Auto-summarize cadence | Manual only in v1 (:memory summarize). Auto-trigger on :quit or by token count is Q-list material. | Conservative; users opt in. Avoids burning tokens on every session end. Manual surface lets the user QA candidates before they land. |
| Summarizer model | The fast preset by default (cheap; quality good enough for extraction); configurable via cfg.memory.summarizer_model | Summarization favors recall over precision: the fast model's tendency to err on the side of inclusion is fine because the user filters per-candidate. |
| Startup injection mechanism | A new dynamic block on the system prompt, appended by context.to_messages() when ctx.memory_items is non-empty | Same hybrid-prompt pattern as Phase 2's MCP block and Phase 3's NORRIS suffix. No new context structure beyond a list on the Context. |
| Injection budget | cfg.memory.inject_max_chars (default 2000 chars total, roughly 500 tokens) | Cap so memory doesn't eat the whole context. Selection is most-recent-first by ts if items exceed the budget. |
| Pruning policy | Manual :memory forget + optional cfg.memory.prune_older_than_days (default unset, meaning no auto-pruning) | Conservative defaults; user owns the lifecycle. |
| Interaction with sessions | memory.jsonl is independent of sessions/*.jsonl. Session JSONL stays the per-conversation log; memory is the curated cross-session knowledge | Distinct concerns. Session log answers "what did we talk about last Tuesday?"; memory answers "what does aish know about me/this-project?". |
| Concurrency | Single-writer assumed (one aish process per memory dir). Reader is the same process | Same assumption as session logs. Multi-process memory sharing is out of scope. |

3. Module Changes

| File | State after Phase 3 | Phase 4 changes |
|---|---|---|
| history.lua | M.open(path, meta), session:append(turn), M.load(path), M.list_sessions(dir) | Add memory functions alongside session functions: M.open_memory(path) -> memory_handle; memory:add(kind, content, tags?, source?) -> id; memory:forget(id); M.load_memory(path) -> items_table (resolves tombstones). memory_handle has a similar shape to session_handle: internal fd + monotonic counter. |
| context.lua | system prompt + MCP block + NORRIS suffix toggle | Add a memory_items field on Context. to_messages() composes a dynamic "[background]" block on the system prompt when memory_items is non-empty. Whether the block is suppressed in Norris mode is tracked as Q33; §5 and A3 currently keep both, with NORRIS last. Cap respected via the inject_max_chars budget. |
| repl.lua | meta cmds + tool sub-loop + Norris driver | New meta: :remember <text> (shortcut for :memory add fact <text>); :memory add <kind> <text>; :memory list; :memory forget <id>; :memory clear; :memory summarize. At startup, after loading config + opening session, also open memory handle and inject the top-N items into ctx.memory_items. |
| broker.lua | streaming chat + opts.tools/max_tokens/timeout_ms | No structural changes. Used by the summarizer (calls broker.chat with the session log as a single user turn). |
| config.lua | example with mcp + safety blocks | Add commented-out memory = { ... } example. Default behavior is "no memory injection, no auto-summarize". |
| executor.lua | unchanged | unchanged |
| safety.lua | is_destructive + norris_step | unchanged |

No new module files. All Phase 4 functionality grows existing files — mostly history.lua and repl.lua.


4. memory.jsonl Format

{"id":1,"ts":"2026-05-13T19:01:01Z","kind":"fact","content":"User prefers terse responses; no end-of-turn summaries."}
{"id":2,"ts":"2026-05-13T19:01:35Z","kind":"pref","content":"Default to :model deep for code reasoning tasks."}
{"id":3,"ts":"2026-05-13T19:02:00Z","kind":"context","content":"Current project: aish (LuaJIT REPL with MCP tools).","tags":["aish","luajit"]}
{"id":4,"ts":"2026-05-13T20:00:00Z","kind":"forget","target":2}

After load_memory, item id=2 is dropped because of the tombstone. Active items: 1, 3.
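The tombstone resolution described above can be sketched as a two-pass, set-based filter. This is an illustration under assumptions: resolve_tombstones is a hypothetical name, and the entries are shown as already-decoded Lua tables rather than parsed from the file.

```lua
-- Hypothetical sketch of tombstone resolution over decoded entries.
local function resolve_tombstones(entries)
    -- Pass 1: collect every id named by a forget entry.
    local forgotten = {}
    for _, e in ipairs(entries) do
        if e.kind == "forget" and e.target then
            forgotten[e.target] = true
        end
    end
    -- Pass 2: keep non-tombstone entries whose id was not forgotten.
    local active = {}
    for _, e in ipairs(entries) do
        if e.kind ~= "forget" and not forgotten[e.id] then
            active[#active + 1] = e
        end
    end
    return active
end

local entries = {
    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact",    content = "User prefers terse responses." },
    { id = 2, ts = "2026-05-13T19:01:35Z", kind = "pref",    content = "Default to :model deep." },
    { id = 3, ts = "2026-05-13T19:02:00Z", kind = "context", content = "Current project: aish." },
    { id = 4, ts = "2026-05-13T20:00:00Z", kind = "forget",  target = 2 },
}
for _, e in ipairs(resolve_tombstones(entries)) do
    print(e.id, e.kind)  -- ids 1 and 3 remain
end
```

Because the forget set is built before filtering, a tombstone that appears earlier in the file than its target still takes effect.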

kind values

  • fact — factual statement about the user, their environment, or project state.
  • pref — user preference for aish behavior (response style, default model, etc.).
  • context — project / domain context that helps the model orient on common tasks.
  • forget — tombstone; refers to another id via target.

v1 is lightly typed — the model sees all kinds identically as a flat list in the [background] block. Future phases may route them differently (e.g. pref into a system-prompt section, context into a user-style preamble). Today they're prose.


5. Startup Injection

When aish boots and cfg.memory is present (or memory.jsonl exists):

  1. history.load_memory(path) reads all items, applies tombstone resolution, returns active items sorted by ts descending (most recent first).
  2. Take items until cfg.memory.inject_max_chars (default 2000) is consumed. Older items are dropped from injection (still in the file).
  3. Store on ctx.memory_items as an array of {kind, content} (id and ts not needed at render-time).
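Steps 1 to 3 could look roughly like the sketch below. Several details are assumptions the manifest leaves open: select_for_injection is a hypothetical name, the budget is counted in bytes of content only (not the rendered "- (kind) " prefix), and selection stops at the first item that would overflow instead of skipping it and trying older ones.

```lua
-- Hypothetical sketch: pick the most recent active items within a budget.
local function select_for_injection(active, max_chars)
    -- Copy before sorting so the caller's table is left untouched.
    local sorted = {}
    for i, item in ipairs(active) do sorted[i] = item end
    -- ISO-8601 UTC timestamps in one format sort correctly as strings.
    table.sort(sorted, function(a, b) return a.ts > b.ts end)

    local picked, used = {}, 0
    for _, item in ipairs(sorted) do
        local cost = #item.content
        if used + cost > max_chars then break end
        used = used + cost
        -- id and ts are not needed at render time.
        picked[#picked + 1] = { kind = item.kind, content = item.content }
    end
    return picked, used
end

local active = {
    { id = 1, ts = "2026-05-13T19:01:01Z", kind = "fact",    content = "terse replies" },
    { id = 3, ts = "2026-05-13T19:02:00Z", kind = "context", content = "project: aish" },
}
local picked, used = select_for_injection(active, 20)
print(#picked, used, picked[1].kind)  -- 1, 13, context
```

With a 20-character budget only the newer item (13 characters) fits; the older one stays in the file but is not injected.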

context.to_messages() composition:

<DEFAULT_SYSTEM_PROMPT>
<Phase 2 MCP block>

[background] (memory loaded at startup; managed via :memory)
- (fact) User prefers terse responses; no end-of-turn summaries.
- (context) Current project: aish (LuaJIT REPL with MCP tools).

Order of suffixes on the system prompt:

  1. Default Phase 0 prompt
  2. Phase 2 MCP guidance block (always present; per A3 it is a static part of DEFAULT_SYSTEM_PROMPT)
  3. Phase 4 [background] block (when memory_items non-empty)
  4. Phase 3 NORRIS MODE block (when norris_active)

Norris is last so its instructions take precedence when active.
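A minimal sketch of that ordering, assuming the default prompt already contains the MCP guidance (as A3 reports). compose_system_prompt and its parameters are hypothetical; the actual logic lives in context.to_messages().

```lua
-- Hypothetical sketch: stack system-prompt blocks in the manifest's order.
local function compose_system_prompt(default_prompt, memory_items, norris_suffix)
    local parts = { default_prompt }  -- includes the static MCP block
    if memory_items and #memory_items > 0 then
        local lines = { "[background] (memory loaded at startup; managed via :memory)" }
        for _, item in ipairs(memory_items) do
            lines[#lines + 1] = ("- (%s) %s"):format(item.kind, item.content)
        end
        parts[#parts + 1] = table.concat(lines, "\n")
    end
    -- NORRIS goes last so its instructions take precedence when active.
    if norris_suffix then parts[#parts + 1] = norris_suffix end
    return table.concat(parts, "\n\n")
end

local prompt = compose_system_prompt(
    "DEFAULT",
    { { kind = "fact", content = "User prefers terse responses." } },
    "NORRIS MODE"
)
print(prompt)
```

With an empty memory_items list and no Norris suffix the function returns the default prompt unchanged, which matches the Phase 3 behavior for configs without memory.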


6. :memory summarize (Manual Auto-Extraction)

:memory summarize triggers the active model (or cfg.memory.summarizer_model if set) to read the current session's turns and propose candidate memory items.

Flow

  1. Build a prompt: "Read the following conversation transcript. Extract facts, preferences, or context worth remembering across future sessions. Output ONE candidate per line, prefixed with the kind: fact: …, pref: …, or context: …. Maximum 10 candidates."

  2. Send ctx:to_messages() minus the [background] suffix (avoid feedback) + the user prompt above.

  3. Parse the response line-by-line for (fact|pref|context): <content> shapes.

  4. For each candidate, prompt the user:

    [memory] candidate (fact): User prefers terse responses; no end-of-turn summaries.
    keep? [y/N/edit]
    
    • y → write to memory.jsonl.
    • N (or empty) → drop.
    • edit → readline-edit the content before write.
  5. Status when done: [aish] memory: added N candidates.
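Step 3 of the flow, parsing the model's response, might be sketched like this. parse_candidates is a hypothetical name. Lua patterns have no alternation, so a regex such as (fact|pref|context) cannot be written directly; the sketch captures any leading word before the colon and checks it against a set of allowed kinds.

```lua
-- Hypothetical sketch: extract "kind: content" candidates from a response.
local KINDS = { fact = true, pref = true, context = true }

local function parse_candidates(response, max)
    local out = {}
    for line in response:gmatch("[^\r\n]+") do
        -- Optional markdown bullet, then a word, a colon, and the body.
        local kind, content = line:match("^%s*[-*]?%s*(%a+):%s*(.-)%s*$")
        if kind and KINDS[kind:lower()] and content ~= "" then
            out[#out + 1] = { kind = kind:lower(), content = content }
            if #out >= max then break end
        end
    end
    return out
end

local response = table.concat({
    "Here are candidates:",
    "- fact: User prefers terse responses.",
    "pref: Default to deep model  ",
    "* context: Project is aish",
    "note: ignore me",
    "fact:",
}, "\n")

for _, c in ipairs(parse_candidates(response, 10)) do
    print(c.kind, c.content)  -- three candidates: fact, pref, context
end
```

Lines with an unknown kind, an empty body, or plain chatter are dropped without error, so a sloppy response yields fewer candidates and does not abort the command.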

Why manual not automatic in v1

An auto-summarize that runs at every :quit, even a successful one, would:

  • be expensive (tokens on every exit)
  • drift over time if the model picks up noise
  • compete with the user's intentional :remember <text> curation

Manual gives the user the trigger. Q-list tracks auto-cadence options.


7. Meta Commands (Phase 4 additions)

| Command | Action |
|---|---|
| :remember <text> | Shortcut for :memory add fact <text> |
| :memory add <kind> <text> | Append a memory item (kind ∈ fact, pref, context) |
| :memory list | Show all active memory items (id + ts + kind + content) |
| :memory forget <id> | Append a tombstone for <id> |
| :memory clear | Forget all active items (with [y/N] confirm) |
| :memory summarize | Extract candidate items from current session via LLM |
| :memory inject | Re-inject current memory.jsonl items into Context (after edits) |

:help updated.


8. Configuration Schema (Phase 4 example block)

memory = {
    -- Path defaults to <history.dir>/memory.jsonl. Override per fleet
    -- if you want shared memory (read-only is safer than write-shared).
    -- path = (history.dir or "~/.local/share/aish") .. "/memory.jsonl",

    -- Cap on how much memory content is injected into the system prompt
    -- at startup. Roughly 2000 chars ≈ 500 tokens. Older items are
    -- dropped from injection if exceeded; they remain in the file.
    inject_max_chars = 2000,

    -- Which model to use for :memory summarize. Defaults to the active
    -- model when nil. Use "fast" for speed; "deep" for better quality.
    summarizer_model = "fast",

    -- Auto-prune items older than N days at startup. nil = never auto-prune.
    -- Manual :memory forget always works regardless.
    -- prune_older_than_days = 90,
}
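One way the defaults in this block could be resolved at startup is sketched below. resolve_memory_cfg is a hypothetical helper, and the sketch follows the reading that a config without a memory section keeps Phase 3 behavior; whether an existing memory.jsonl alone should enable injection is not settled by this example.

```lua
-- Hypothetical sketch: apply Phase 4 defaults to a loaded config table.
local function resolve_memory_cfg(cfg)
    -- No memory section: behave exactly like Phase 3.
    if type(cfg.memory) ~= "table" then return nil end
    local m = cfg.memory
    local dir = (cfg.history and cfg.history.dir) or "~/.local/share/aish"
    return {
        path = m.path or (dir .. "/memory.jsonl"),
        inject_max_chars = m.inject_max_chars or 2000,
        summarizer_model = m.summarizer_model,            -- nil: use the active model
        prune_older_than_days = m.prune_older_than_days,  -- nil: never auto-prune
    }
end

local resolved = resolve_memory_cfg({
    history = { dir = "/tmp/aish" },
    memory = { summarizer_model = "fast" },
})
print(resolved.path, resolved.inject_max_chars)
```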

9. Migration from Phase 3

User-visible:

  • :remember, :memory list / forget / clear / summarize are new meta commands.
  • A [background] block in the system prompt appears when memory items exist.
  • Existing configs without memory = {...} continue to work — no injection, no auto-summarize. Phase 3 behavior intact.

Substrate (PHASE0.md §3) invariants: unchanged.

The [background] system-prompt suffix is composed dynamically by context.to_messages() (same pattern as the Phase 3 NORRIS suffix; per A3 the Phase 2 MCP block is a static part of DEFAULT_SYSTEM_PROMPT). No new substrate contract.


10. Out of Scope (Phase 4)

Per PHASE0.md §11 these belong to later phases:

  • Multi-model routing / cloud fallback (Phase 5).
  • Tree-sitter syntax highlighting (Phase 6).

Specifically out of Phase 4 scope despite proximity:

  • Multi-process memory sharing (single-writer assumed v1).
  • Retrieval-augmented injection (RAG over memory.jsonl) — v1 just LRU.
  • Auto-trigger of :memory summarize at :quit (Q-list).
  • Memory categories beyond fact/pref/context — minimal typing v1.
  • Cross-aish-instance memory sync (memory.jsonl in a synced dir works coincidentally; not designed for it).
  • Encryption at rest — same posture as session logs (none in v1).

11. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q31 | Auto-summarize trigger: manual only (current), automatic at :quit, automatic on token-budget eviction, or config-flagged threshold? | history.lua + repl.lua | Phase 4 (analyze) |
| Q32 | Editing memory items in place: :memory edit <id> to rewrite content? Append-only means edit = new id + forget old. Worth the extra meta? | history.lua + UX | Phase 4 (analyze) |
| Q33 | Memory injection while in Norris mode: does the [background] block stay, get suppressed, or merge with the Norris goal? Proposal: keep both; Norris is the last block and dominates. | context.lua | Phase 4 (plan) |
| Q34 | Memory kinds: stick with fact/pref/context or split prefs into a dedicated section of the system prompt (where they're more impactful)? v1 says no: flat list. | context.lua + UX | Phase 5 if it bites |
| Q35 | Privacy / redaction: :memory summarize could capture sensitive tokens from a chat (passwords, paths). Should it auto-redact? Strip command-history-style? | safety.lua + history.lua | Phase 4 (verify); review user-emergent risk |
| Q36 | Memory deduplication: user adds the same fact twice. Detect and warn, dedupe silently, or allow? v1: allow (cheap; user can :memory list to spot). | history.lua | Phase 4 (verify) |

12. Implementation Plan (commit-by-commit)

Bottom-up, same cadence as Phase 0/1/2/3. Five commits expected:

  1. history.lua: memory store. Add M.open_memory, memory:add(kind, content, tags?, source?), memory:forget(id), M.load_memory(path) with tombstone resolution. Monotonic counter recovered by scanning the JSONL for the max id at open time (decided at analyze, see A2; no sidecar memory.id file). Test in isolation: round-trip add/forget/load against a temp file.

  2. context.lua: memory injection. Add ctx.memory_items and the [background] block composer in to_messages(). Cap by inject_max_chars. Test in isolation: assert composition order (MCP → background → Norris); cap honored.

  3. repl.lua: meta commands. Add :remember + :memory list / add / forget / clear / inject. At startup, after MCP setup, open the memory handle and load the most recent items within budget. Hook the meta dispatch. No summarize yet. End-to-end: run aish, :remember X, :quit, restart, :memory list shows X, :history shows X in [background].

  4. :memory summarize: manual extraction. Bundle a system prompt for the summarizer model; parse response; per-candidate confirm prompt; append accepted items. End-to-end: short conversation, summarize, accept one of two candidates, restart, verify the accepted one persists.

  5. config.lua: example memory block. Documentation-only; commented-out example. Final commit.

Risk / non-obvious

  • Counter persistence: memory:add needs a monotonic id. Options considered: (a) sidecar memory.id file with a single integer, (b) scan the JSONL on open for max id, (c) use timestamp as id (no monotonic guarantee across rapid adds). Decision (A2): (b), scan once at open and cache in the handle. Overflow is not a practical concern: Lua numbers are doubles, so integer ids stay exact up to 2^53.
  • Tombstone resolution at load: build a set of forget-target ids from kind=="forget" entries; filter active items to exclude. Order doesn't matter (tombstones can appear before their targets if the file is hand-edited; the resolution is set-based).
  • Empty file at open vs nonexistent file: both should yield an empty memory handle. Phase 1's history.open already handles file creation; extend the pattern.
  • System prompt growth: the suffix-stacking pattern is up to 4 blocks now (default + MCP + background + Norris). Estimated size: ~200 + ~80 + 2000 + ~250 = ~2530 chars baseline before any user/asst turns. Worth measuring at baseline phase; A3 measured ~3460 chars for DEFAULT + 2KB background + NORRIS.
  • :memory summarize parse robustness: small models may emit "fact: ..." sometimes with markdown bullets, sometimes without. Parser should tolerate ^[-*]?\s*(fact|pref|context):\s*(.+).
  • :memory clear with confirm: same UX as Phase 3 destructive prompts. [y/N] default-no.

Open at plan; resolve at review

  • Whether :remember should append to the LIVE ctx.memory_items immediately (so the model sees it on the next turn without restart) or only on next session boot. v1 says yes — append both to file AND to live ctx for immediate visibility.
  • Whether the summarizer should be fed the FULL session log or just recent turns (token budget). v1 says full minus the [background] suffix; cap at session-log size <= 64KB or last N turns.
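If the summarizer input ends up capped, keeping the most recent turns that fit a byte budget is one possible shape. This is only a sketch of that option: tail_within_budget is a hypothetical name, turns are assumed to be tables with a content string, and the manifest has not chosen between a size cap and a last-N-turns cap.

```lua
-- Hypothetical sketch: keep the newest turns that fit within max_bytes.
local function tail_within_budget(turns, max_bytes)
    local start, used = #turns + 1, 0
    -- Walk backwards from the newest turn until the budget is exhausted.
    for i = #turns, 1, -1 do
        local cost = #turns[i].content
        if used + cost > max_bytes then break end
        used = used + cost
        start = i
    end
    -- Copy the kept suffix in original (chronological) order.
    local out = {}
    for i = start, #turns do out[#out + 1] = turns[i] end
    return out
end

local turns = {
    { role = "user",      content = string.rep("a", 10) },
    { role = "assistant", content = string.rep("b", 20) },
    { role = "user",      content = string.rep("c", 30) },
}
print(#tail_within_budget(turns, 50))         -- 2 (the 20- and 30-byte turns)
print(#tail_within_budget(turns, 64 * 1024))  -- 3 (everything fits in 64KB)
```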

End of Phase 4 Manifest — aish