docs/PHASE5: analyze — surface clean, summary lives on ctx.summary not turns

A1. router.lua surface clean; classify_model is a natural sibling of
    classify. No structural refactor.

A2. broker error message shapes confirmed: all transport errors carry
    "transport: " prefix; "api: " for SSE-framed semantic errors;
    "broker: " for config bugs. Fallback matcher must strip the prefix
    before testing — list of eligible patterns tightened in §5.

A3. Q38 RESOLVED — summary doesn't go in ctx.turns (would create
    system/system back-to-back, same gotcha as PHASE0 §6 user/user).
    Instead lives on ctx.summary (string) and composes into the
    system message between [background] and NORRIS suffix. No new
    role:"system" turn; no alternation risk. §3 + §6 reflect.

Module-changes table updated to specify ctx.summary string field +
the to_messages composition order. Storage shape diagram in §6
rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-13 11:12:50 +00:00
parent 4453b93ab5
commit 555fdd7717
+74 -18
View File
@@ -2,9 +2,43 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 5 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Status:** Analyze (formulate complete; current tree at `4453b93` probed)
**Date:** 2026-05-13
**Analyze findings (2026-05-13):**
A1. **router.lua surface clean** — already a pure-Lua module with
`M.classify(line, config) -> (kind, payload)`. Adding
`M.classify_model(text, cfg) -> name | nil` is a natural sibling.
No structural refactor.
A2. **broker error message shapes** all carry transport-stage prefixes
that the fallback matcher must account for. The actual shapes
callers see:
"transport: HTTP %d%d%d: <body-snippet>" (from post_sse status>=400)
"transport: Timeout was reached"
"transport: Couldn't resolve host"
"transport: Connection refused"
"transport: HTTP response code said error" (rare; from FAILONERROR)
"api: <error.message>" (SSE-framed error envelope)
"broker: model_cfg.endpoint and .model required" (config bug)
Fallback patterns in §5 should match against the "transport: "
prefix explicitly. "api: ..." errors don't fall back (they're
semantic — bad request shape, not server failure). "broker: ..."
errors don't fall back either (config bug).
A3. **Q38 resolved at analyze** — placing the rolling summary as
`turns[1]` with `role:"system"` would produce system/system
back-to-back in to_messages output (msg[1] is the composed
system prompt; msg[2] would be the summary as another system
message). Strict templates may reject this same way they reject
user/user (PHASE0 §6). Resolution: render the summary INSIDE the
composed system message (same pattern as the [background] and
NORRIS blocks). Storage stays simple — keep `_summary` text on
`ctx.summary` (NOT in `ctx.turns`), append to the system prompt
in `to_messages` alongside the [background] and NORRIS blocks.
§6 + §3 reflect.
PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest
specifies what Phase 5 adds — **multi-model routing**, **cloud fallback**,
and **context summarization on eviction**.
@@ -77,7 +111,7 @@ Three pillars per PHASE0.md §11 row 5:
| File | State after Phase 4 | Phase 5 changes |
|---|---|---|
| `router.lua` | `classify(line, config)``(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg) -> name | nil`. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. |
| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed in via `Context.new(opts.summarize_fn)`) when about to evict. If callback returns text, prepend a `{role="system", _summary=true, content=...}` turn (replacing prior `_summary` turn if present). If callback returns nil, fall back to silent eviction. The summarize callback itself lives in repl.lua (because it needs broker.chat). |
| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed via `Context.new(opts.summarize_fn)`) when about to evict. Store the returned summary as `ctx.summary` (string) — NOT a turn (A3 — avoids system/system alternation). `to_messages` composes it into the system message alongside `[background]` and NORRIS, between them: `system → [background] → [earlier summary] → NORRIS`. New evictions append to `ctx.summary`; when its length exceeds `max_summary_chars` (default 2000), the callback is invoked AGAIN with `(prior_summary, new_evicted_turns)` to re-summarize. Silent eviction is the fallback when the callback returns nil. |
| `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.cloud_fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. |
| `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged — Phase 5 composes on top of the existing surface. |
| `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. |
@@ -150,17 +184,23 @@ ONLY when `cfg.routing.cloud_fallback == true`. Otherwise returns false.
### Fallback-eligible error patterns
| Pattern | Meaning |
All patterns match against the err string AS IT ARRIVES from broker.lua,
which is prefixed `"transport: "` for libcurl/HTTP issues (A2 confirmed).
The matcher strips the prefix before testing.
| Pattern (after prefix strip) | Meaning |
|---|---|
| `HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) |
| `HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) |
| `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `Couldn'?t resolve host` | DNS / unreachable local broker |
| `Connection refused` | broker not listening |
| `Timeout was reached` | broker too slow |
Errors NOT matched (and therefore NOT retried):
Errors NOT matched (NOT retried):
- HTTP 401 / 403 (auth failure — won't get better on cloud)
- HTTP 400 (bad request — schema issue)
- `^api:` errors (semantic — bad request shape)
- `^broker:` errors (config bug — endpoint/model missing)
- Lua-level errors (broker pipeline bug, not transport)
---
@@ -172,28 +212,44 @@ nil` closure. When set AND `enforce_budget` would evict, the callback
is invoked with the evicted slice; the returned summary (if non-nil)
replaces the rolling summary turn.
### Storage shape
### Storage shape (post-A3 resolution)
Index 1 of `ctx.turns` is reserved for the summary turn when present:
The rolling summary lives on `ctx.summary` (a string), NOT in `ctx.turns`:
```lua
{ role = "system", _summary = true,
content = "[earlier conversation summary]\n<summary text>" }
ctx.summary = "Earlier conversation: user discussed X, asked about Y, "
.. "agreed to Z. Later asked..."
```
`to_messages()` renders this normally (system role; model sees it).
`to_messages()` composes it into the system message between `[background]`
and the NORRIS suffix:
```
DEFAULT_SYSTEM_PROMPT
[background] (memory items)
- (fact) ...
[earlier conversation summary]
<ctx.summary>
[NORRIS MODE] (if active)
...
```
No new role:"system" message at turns[1] — avoids system/system alternation.
### Summary update flow
1. enforce_budget identifies the oldest 2 turns to evict (user + assistant).
2. If `summarize_fn` is set, call it with those 2 turns.
2. If `summarize_fn` is set, call it with `(prior_summary, evicted_turns)`.
3. If summary text returned:
- If `ctx.turns[1]._summary` exists, append to its content
(truncate at `max_summary_chars` default 2000 — re-summarize via
same callback if exceeded).
- Else insert a new `_summary` turn at index 1.
- Replace `ctx.summary` with the new text.
- If `#ctx.summary > max_summary_chars` (default 2000), invoke the
callback once more with `(ctx.summary, {})` to re-summarize for
compactness. Lossy by design — Q40 documents this trade-off.
4. Remove the evicted turns from `ctx.turns`.
5. If callback returned nil → silent eviction (Phase 0 behavior).
5. If callback returned nil → silent eviction; `ctx.summary` unchanged.
### Failure handling
@@ -261,7 +317,7 @@ Specifically out of Phase 5:
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
| Q38 | Summary turn placement: index 1 (right after a putative system prompt — but to_messages prepends the system prompt fresh each call) vs index 0 in self.turns. Index 1 in self.turns means the system message comes from `to_messages()` and the `_summary` turn lives next as a `role:"system"` message. Strict templates may reject system/system back-to-back. | context.lua | Phase 5 (analyze) |
| Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. |
| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |