test-case: :memory summarize against a compliant model #42

Closed
opened 2026-05-13 08:21:39 +00:00 by claude-noether · 0 comments
Collaborator

Steps

  1. Boot aish with :model deep or :model cloud (the fast 1.5B model produces noise — verified during commit #4 testing).
  2. Have a short conversation where you state at least one fact, preference, or context worth remembering. Example:
    :ask I'm working on a LuaJIT REPL called aish; the codebase lives in ~/src/aish.
    :ask I prefer terse responses, no end-of-turn summaries.
    :ask The deep model preset maps to qwen3-30b on RK3588 currently.
  3. :memory summarize
  4. For each [memory] candidate (kind): <text> prompt:
    • Type y to accept the candidate
    • Type N (or Enter) to reject
    • Type edit, then provide a rewritten text
  5. :memory list after the summarize completes — verify accepted candidates are present.
  6. Run :memory summarize AGAIN immediately. Expected: model sees only the original conversation (filtered to exclude the prior summarize's assistant turn) and emits roughly the same candidates.

Expected

  • Step 3: status summarizing via <model> ..., then candidate prompts one at a time.
  • Step 4 edits work: the prompt asks for replacement text, accepts it, persists.
  • Step 5: :memory list shows the accepted/edited items with auto-assigned ids.
  • Step 6: no self-amplification — the second summarize doesn't re-propose-and-treat-as-new the items from the first run.

What this exercises

  • broker.chat with max_tokens=1024, timeout_ms=90000
  • Response parsing (tolerates markdown bullets per N1 in re-review)
  • Per-candidate y/N/edit branch
  • meta="summarize" tagging on the assistant turn so subsequent summarize calls filter it
  • R-C2 resolution: source-of-truth is session log file, not ctx:to_messages()

Likely failure modes

  • No candidates parsed → check the response shape. Model may not follow the kind: text format reliably with the fast preset.
  • Edit branch crashes → readline prompt for replacement text
  • Second summarize re-proposes the SAME items as if they were fresh → R-C2 filter not applied; check the t.meta ~= "summarize" skip in the summarize handler.
## Steps 1. Boot aish with `:model deep` or `:model cloud` (the fast 1.5B model produces noise — verified during commit #4 testing). 2. Have a short conversation where you state at least one fact, preference, or context worth remembering. Example: `:ask I'm working on a LuaJIT REPL called aish; the codebase lives in ~/src/aish.` `:ask I prefer terse responses, no end-of-turn summaries.` `:ask The deep model preset maps to qwen3-30b on RK3588 currently.` 3. `:memory summarize` 4. For each `[memory] candidate (kind): <text>` prompt: - Type `y` to accept the candidate - Type `N` (or Enter) to reject - Type `edit`, then provide a rewritten text 5. `:memory list` after the summarize completes — verify accepted candidates are present. 6. Run `:memory summarize` AGAIN immediately. Expected: model sees only the original conversation (filtered to exclude the prior summarize's assistant turn) and emits roughly the same candidates. ## Expected - Step 3: status `summarizing via <model> ...`, then candidate prompts one at a time. - Step 4 edits work: the prompt asks for replacement text, accepts it, persists. - Step 5: `:memory list` shows the accepted/edited items with auto-assigned ids. - Step 6: no self-amplification — the second summarize doesn't re-propose-and-treat-as-new the items from the first run. ## What this exercises - broker.chat with max_tokens=1024, timeout_ms=90000 - Response parsing (tolerates markdown bullets per N1 in re-review) - Per-candidate y/N/edit branch - meta="summarize" tagging on the assistant turn so subsequent summarize calls filter it - R-C2 resolution: source-of-truth is session log file, not ctx:to_messages() ## Likely failure modes - No candidates parsed → check the response shape. Model may not follow the `kind: text` format reliably with the fast preset. - Edit branch crashes → readline prompt for replacement text - Second summarize re-proposes the SAME items as if they were fresh → R-C2 filter not applied; check the `t.meta ~= "summarize"` skip in the summarize handler.
claude-noether added the test-case label 2026-05-13 08:21:39 +00:00
Sign in to join this conversation.