diff --git a/docs/PHASE5.md b/docs/PHASE5.md index fae4f8c..ef2a93e 100644 --- a/docs/PHASE5.md +++ b/docs/PHASE5.md @@ -2,9 +2,43 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 5 Requirements, Architecture & Design Decisions -**Status:** Formulate (pre-analyze) +**Status:** Analyze (formulate complete; current tree at `4453b93` probed) **Date:** 2026-05-13 +**Analyze findings (2026-05-13):** + +A1. **router.lua surface clean** — already a pure-Lua module with + `M.classify(line, config) -> (kind, payload)`. Adding + `M.classify_model(text, cfg) -> name | nil` is a natural sibling. + No structural refactor. + +A2. **broker error message shapes** all carry transport-stage prefixes + that the fallback matcher must account for. The actual shapes + callers see: + "transport: HTTP %d%d%d: " (from post_sse status>=400) + "transport: Timeout was reached" + "transport: Couldn't resolve host" + "transport: Connection refused" + "transport: HTTP response code said error" (rare; from FAILONERROR) + "api: " (SSE-framed error envelope) + "broker: model_cfg.endpoint and .model required" (config bug) + Fallback patterns in §5 should match against the "transport: " + prefix explicitly. "api: ..." errors don't fall back (they're + semantic — bad request shape, not server failure). "broker: ..." + errors don't fall back either (config bug). + +A3. **Q38 resolved at analyze** — placing the rolling summary as + `turns[1]` with `role:"system"` would produce system/system + back-to-back in to_messages output (msg[1] is the composed + system prompt; msg[2] would be the summary as another system + message). Strict templates may reject this same way they reject + user/user (PHASE0 §6). Resolution: render the summary INSIDE the + composed system message (same pattern as the [background] and + NORRIS blocks). Storage stays simple — keep `_summary` text on + `ctx.summary` (NOT in `ctx.turns`), append to the system prompt + in `to_messages` alongside the [background] and NORRIS blocks. + §6 + §3 reflect. + PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest specifies what Phase 5 adds — **multi-model routing**, **cloud fallback**, and **context summarization on eviction**. @@ -77,7 +111,7 @@ Three pillars per PHASE0.md §11 row 5: | File | State after Phase 4 | Phase 5 changes | |---|---|---| | `router.lua` | `classify(line, config)` → `(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg) -> name | nil`. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. | -| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed in via `Context.new(opts.summarize_fn)`) when about to evict. If callback returns text, prepend a `{role="system", _summary=true, content=...}` turn (replacing prior `_summary` turn if present). If callback returns nil, fall back to silent eviction. The summarize callback itself lives in repl.lua (because it needs broker.chat). | +| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed via `Context.new(opts.summarize_fn)`) when about to evict. Store the returned summary as `ctx.summary` (string) — NOT a turn (A3 — avoids system/system alternation). `to_messages` composes it into the system message alongside `[background]` and NORRIS, between them: `system → [background] → [earlier summary] → NORRIS`. New evictions append to `ctx.summary`; when its length exceeds `max_summary_chars` (default 2000), the callback is invoked AGAIN with `(prior_summary, new_evicted_turns)` to re-summarize. Silent eviction is the fallback when the callback returns nil. | | `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.cloud_fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. | | `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged — Phase 5 composes on top of the existing surface. | | `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. | @@ -150,17 +184,23 @@ ONLY when `cfg.routing.cloud_fallback == true`. Otherwise returns false. ### Fallback-eligible error patterns -| Pattern | Meaning | +All patterns match against the err string AS IT ARRIVES from broker.lua, +which is prefixed `"transport: "` for libcurl/HTTP issues (A2 confirmed). +The matcher strips the prefix before testing. + +| Pattern (after prefix strip) | Meaning | |---|---| -| `HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) | -| `HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend | +| `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) | +| `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend | | `Couldn'?t resolve host` | DNS / unreachable local broker | | `Connection refused` | broker not listening | | `Timeout was reached` | broker too slow | -Errors NOT matched (and therefore NOT retried): +Errors NOT matched (NOT retried): - HTTP 401 / 403 (auth failure — won't get better on cloud) - HTTP 400 (bad request — schema issue) +- `^api:` errors (semantic — bad request shape) +- `^broker:` errors (config bug — endpoint/model missing) - Lua-level errors (broker pipeline bug, not transport) --- @@ -172,28 +212,44 @@ nil` closure. When set AND `enforce_budget` would evict, the callback is invoked with the evicted slice; the returned summary (if non-nil) replaces the rolling summary turn. -### Storage shape +### Storage shape (post-A3 resolution) -Index 1 of `ctx.turns` is reserved for the summary turn when present: +The rolling summary lives on `ctx.summary` (a string), NOT in `ctx.turns`: ```lua -{ role = "system", _summary = true, - content = "[earlier conversation summary]\n" } +ctx.summary = "Earlier conversation: user discussed X, asked about Y, " + .. "agreed to Z. Later asked..." ``` -`to_messages()` renders this normally (system role; model sees it). +`to_messages()` composes it into the system message between `[background]` +and the NORRIS suffix: + +``` +DEFAULT_SYSTEM_PROMPT + +[background] (memory items) +- (fact) ... + +[earlier conversation summary] + + +[NORRIS MODE] (if active) +... +``` + +No new role:"system" message at turns[1] — avoids system/system alternation. ### Summary update flow 1. enforce_budget identifies the oldest 2 turns to evict (user + assistant). -2. If `summarize_fn` is set, call it with those 2 turns. +2. If `summarize_fn` is set, call it with `(prior_summary, evicted_turns)`. 3. If summary text returned: - - If `ctx.turns[1]._summary` exists, append to its content - (truncate at `max_summary_chars` default 2000 — re-summarize via - same callback if exceeded). - - Else insert a new `_summary` turn at index 1. + - Replace `ctx.summary` with the new text. + - If `#ctx.summary > max_summary_chars` (default 2000), invoke the + callback once more with `(ctx.summary, {})` to re-summarize for + compactness. Lossy by design — Q40 documents this trade-off. 4. Remove the evicted turns from `ctx.turns`. -5. If callback returned nil → silent eviction (Phase 0 behavior). +5. If callback returned nil → silent eviction; `ctx.summary` unchanged. ### Failure handling @@ -261,7 +317,7 @@ Specifically out of Phase 5: | # | Question | Impact | Resolve by | |---|---|---|---| | Q37 | Should routing apply to `:ask ` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) | -| Q38 | Summary turn placement: index 1 (right after a putative system prompt — but to_messages prepends the system prompt fresh each call) vs index 0 in self.turns. Index 1 in self.turns means the system message comes from `to_messages()` and the `_summary` turn lives next as a `role:"system"` message. Strict templates may reject system/system back-to-back. | context.lua | Phase 5 (analyze) | +| Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. | | Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) | | Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) | | Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |