test-case: :memory summarize against a compliant model #42

2026-05-13T08:21:39Z

claude-noether commented

## Steps

1. Boot aish with `:model deep` or `:model cloud`. (The fast 1.5B model produces noisy output; this was verified during commit #4 testing.)
2. Have a short conversation in which you state at least one fact, preference, or piece of context worth remembering. Example:
   - `:ask I'm working on a LuaJIT REPL called aish; the codebase lives in ~/src/aish.`
   - `:ask I prefer terse responses, no end-of-turn summaries.`
   - `:ask The deep model preset maps to qwen3-30b on RK3588 currently.`
3. `:memory summarize`
4. For each `[memory] candidate (kind): <text>` prompt:
   - Type `y` to accept the candidate.
   - Type `N` (or press Enter) to reject it.
   - Type `edit`, then provide the rewritten text.
5. After the summarize completes, run `:memory list` and verify that the accepted candidates are present.
6. Run `:memory summarize` AGAIN immediately. Expected: the model sees only the original conversation (filtered to exclude the prior summarize's assistant turn) and emits roughly the same candidates.
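A minimal sketch of the step-4 decision branch, assuming the prompt reads one line of input per candidate. `review_candidate` and the injected `read_line` reader are hypothetical names for illustration, not aish's actual handler; injecting the reader lets the branch be exercised without a live terminal.

```lua
-- Hypothetical sketch of the per-candidate y/N/edit branch; not aish's real code.
-- read_line is injected so the branch runs without a terminal; in the REPL it
-- might wrap readline or io.read("*l").
-- Assumption: answers are compared case-insensitively after trimming.

-- Returns one of: "accept", text | "edit", new_text | "reject", nil
local function review_candidate(cand, read_line)
  io.write(string.format("[memory] candidate (%s): %s [y/N/edit] ", cand.kind, cand.text))
  local answer = (read_line() or ""):match("^%s*(.-)%s*$"):lower()
  if answer == "y" then
    return "accept", cand.text
  elseif answer == "edit" then
    io.write("replacement text: ")
    local replacement = read_line()
    -- Assumption: an empty or missing replacement counts as a rejection.
    if replacement == nil or replacement:match("^%s*$") then
      return "reject", nil
    end
    return "edit", replacement
  end
  -- "N", Enter (empty input), and anything unrecognised reject: N is the default.
  return "reject", nil
end

-- Helper that replays scripted answers instead of reading from a terminal.
local function scripted(lines)
  local i = 0
  return function()
    i = i + 1
    return lines[i]
  end
end

-- Example: choose "edit", then supply the rewritten text.
local action, text = review_candidate(
  { kind = "preference", text = "prefers terse responses" },
  scripted({ "edit", "prefers terse responses without end-of-turn summaries" }))
```

Keeping the default on the reject path means a stray Enter can never persist a memory, which matches the `y/N` capitalisation in the prompt.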

## Expected

- Step 3: the status line `summarizing via <model> ...` appears, followed by candidate prompts one at a time.
- Step 4: edits work; the prompt asks for replacement text, accepts it, and persists it.
- Step 5: `:memory list` shows the accepted and edited items with auto-assigned ids.
- Step 6: no self-amplification; the second summarize does not re-propose the first run's items and treat them as new.

## What this exercises

- `broker.chat` with `max_tokens=1024`, `timeout_ms=90000`
- Response parsing (tolerates markdown bullets, per N1 in the re-review)
- The per-candidate y/N/edit branch
- `meta="summarize"` tagging on the assistant turn, so that subsequent summarize calls filter it out
- R-C2 resolution: the source of truth is the session log file, not `ctx:to_messages()`
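A minimal sketch of bullet-tolerant response parsing, assuming each candidate arrives on its own line as `kind: text`, optionally prefixed by a markdown bullet (`-`, `*`, `+`). `parse_candidates` is a hypothetical name, and restricting kinds to the three categories named in step 2 is an assumption; aish's real parser may accept a different set.

```lua
-- Hypothetical sketch of tolerant candidate parsing; not aish's real parser.
-- Assumption: one candidate per line as "kind: text", where kind is one of the
-- three categories named in step 2 of this issue.
local KNOWN_KINDS = { fact = true, preference = true, context = true }

local function parse_candidates(reply)
  local out = {}
  -- Append "\n" so the final line is captured even without a trailing newline.
  for line in (reply .. "\n"):gmatch("([^\n]*)\n") do
    -- Drop one leading markdown bullet ("- ", "* ", "+ ") if present.
    local body = line:gsub("^%s*[%-%*%+]%s+", "")
    -- "kind: text", trimming whitespace (including a stray "\r") around text.
    local kind, text = body:match("^%s*(%a+)%s*:%s*(.-)%s*$")
    kind = kind and kind:lower()
    if kind and KNOWN_KINDS[kind] and text ~= "" then
      out[#out + 1] = { kind = kind, text = text }
    end
  end
  return out
end
```

Lines that do not fit the shape (preambles such as "Here are the candidates:", stray prose) are skipped rather than treated as errors, so a chatty but compliant model still yields candidates.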

## Likely failure modes

- No candidates parsed → check the response shape. The model may not follow the `kind: text` format reliably with the fast preset.
- Edit branch crashes → check the readline prompt for the replacement text.
- Second summarize re-proposes the SAME items as if they were fresh → the R-C2 filter was not applied; check the `t.meta ~= "summarize"` skip in the summarize handler.
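A minimal sketch of the tagging and filtering behind R-C2, assuming the session log has already been decoded into an array of turn tables with `role`, `content`, and an optional `meta` field. `record_summarize_turn`, `build_summarize_input`, and the turn layout are hypothetical; only the `t.meta ~= "summarize"` condition comes from this issue.

```lua
-- Hypothetical sketch of the R-C2 tag-and-filter; not aish's actual handler.
-- Assumption: the session log is decoded into an array of turns shaped
-- { role = "user" | "assistant", content = "...", meta = optional string }.

-- Tag the summarize reply so that later summarize runs can recognise it.
local function record_summarize_turn(log, reply)
  log[#log + 1] = { role = "assistant", content = reply, meta = "summarize" }
end

-- Build the conversation a summarize call should see: every logged turn
-- except those produced by an earlier summarize.
local function build_summarize_input(log)
  local messages = {}
  for _, t in ipairs(log) do
    if t.meta ~= "summarize" then
      messages[#messages + 1] = { role = t.role, content = t.content }
    end
  end
  return messages
end
```

Because the tagged reply is skipped, a second run is built from the same turns as the first, which is why step 6 expects roughly the same candidates rather than an amplified set.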

claude-noether added the test-case label 2026-05-13 08:21:39 +00:00

claude-noether closed this issue

2026-05-13 12:26:56 +00:00

Reference: marfrit/aish#42