docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults

Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions: B1 BLOCKER: summary callback signature was inconsistent across §3 and §6. Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil dispatching on the two args: (nil, [turns]) — first-time summarize (str, [turns]) — additive (extend prior summary with new evictions) (str, nil) — compress (re-summarize the prior summary itself) C1: re-summarize trigger now uses the (str, nil) compress signal rather than degenerate (str, {}). C2: routing decision is taken once on entry to ask_ai. The chosen active_cfg is used for every tool-sub-loop iteration. Original active_cfg restored after ask_ai returns. C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed at :norris launch time; planner stays on it for every iteration. Q39 resolved. Per-iteration fallback still gated by cfg.routing.fallback — retries the failing call against cloud without permanently switching the planner. C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for the [background] block). Both are "earlier context" the planner generally doesn't need. C5: Fallback pattern coverage expanded — added HTTP 408 (Q41 resolved) and "Operation timed out" (libcurl version variant). Dropped "HTTP response code said error" from A2 — FAILONERROR was removed in Phase 4 f26cbd9. NITs folded: N1 :route check <text> always runs heuristic; suffix "(routing currently disabled)" when cfg.routing.auto = false N2 reasoning → nil by default (not → "cloud"); user explicitly opts in to map reasoning to a paid model. Same cost-safety rationale as confirm_cmd default true. N3 "Retry only when no deltas have arrived" promoted to §5 normative rule (was in §11 risk row). N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to align with the :fallback meta verb. Reviewer verdict: commit #1 (router.classify_model) is implement- ready; B1/C1 resolution required before commit #2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:39 +00:00
parent 555fdd7717
commit 2e389c1475
1 changed files with 60 additions and 5 deletions
@@ -2,9 +2,60 @@

 **Project:** aish — AI-augmented conversational shell
 **Document:** Phase 5 Requirements, Architecture & Design Decisions
-**Status:** Analyze (formulate complete; current tree at `4453b93` probed)
+**Status:** Plan (review fold-in 2026-05-13 — callback signature, Norris suppression, cost defaults resolved)
 **Date:** 2026-05-13

+**Review fold-in (2026-05-13):**
+
+R-B1. **Summary callback signature canonical**: the closure is
+    `summarize_fn(prior_summary, evicted_turns) -> string | nil`.
+    `prior_summary` is `nil` on the first ever summarize; otherwise
+    the current `ctx.summary` string. `evicted_turns` is `nil` for
+    the re-summarize-compress trigger (C1 resolution); otherwise the
+    array of evicted turn tables. The closure dispatches:
+      first-time:  prior=nil,    evicted=[...]   → "summarize these turns"
+      additive:    prior=str,    evicted=[...]   → "extend the prior summary"
+      compress:    prior=str,    evicted=nil     → "compress the prior summary"
+
+R-C2. **Routing taken once per ask_ai**: the model decision happens
+    on entry to `ask_ai`. The chosen `active_cfg` is used for every
+    iteration of the tool-call sub-loop. Original `active_cfg` is
+    restored after `ask_ai` returns. NOT per-broker-call.
+
+R-C3. **AUTO-routing does NOT fire inside Norris**: `run_norris`
+    operates on a fixed model (whatever the user set via `:model`
+    before launching). The auto-router would otherwise switch models
+    mid-plan, which loses planning continuity and costs tokens
+    rebuilding context. State explicit in §4 + §10.
+
+R-C4. **Summary block suppressed under Norris**: mirrors Phase 4
+    R-C1 ([background] suppression). Both blocks are "earlier context"
+    the planner generally doesn't need mid-iteration. §6 + §3 reflect.
+
+R-C5. **Fallback pattern coverage**:
+    - Add `HTTP 408` to §5 patterns (Q41 moves from open to resolved).
+    - Add `Operation timed out` (curl variant of "Timeout was reached").
+    - Drop "HTTP response code said error" from A2 — FAILONERROR was
+      removed in Phase 4 commit `f26cbd9`, this shape no longer fires.
+
+NITs folded:
+  N1. `:route check <text>` always runs the heuristic regardless of
+      `cfg.routing.auto` — debug aid surfaces the class + would-be
+      model + "(routing currently disabled)" suffix when auto is off.
+  N2. **`reasoning → nil` by default** — the v1 heuristic that maps
+      "explain" / "why" / "how does" to a model is too aggressive
+      paired with `nil = keep current` semantics. User must
+      EXPLICITLY map `routing.classes.reasoning = "cloud"` to send
+      reasoning prompts to paid API. Same cost-safety rationale as
+      `cfg.routing.auto = false`.
+  N3. "Retry only when no deltas have arrived" promoted to normative
+      rule in §5 (was in §11 risk row).
+  N4. Config key renamed `cfg.routing.cloud_fallback` →
+      `cfg.routing.fallback` to align with the `:fallback` meta verb.
+      Single-source naming.
+
+**Analyze findings (2026-05-13):**
+
 **Analyze findings (2026-05-13):**

 A1. **router.lua surface clean** — already a pure-Lua module with
@@ -95,9 +146,11 @@ Three pillars per PHASE0.md §11 row 5:
 |---|---|---|
 | Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. |
 | Classification mechanism | **Pure-Lua heuristics** in `router.classify_model` — keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift. |
-| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. |
+| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three classes for the first cut. **Defaults (N2 fold-in)**: `code → "deep"`, `reasoning → nil` (heuristic still fires but no override unless user maps it), `default → nil`. The aggressive `reasoning → "cloud"` default sent ordinary "why does ..." prompts to a paid API; user must opt in explicitly to pay for reasoning. Same cost-safety rationale as `cfg.routing.auto = false`. |
+| Routing cost-safety | `cfg.routing.auto = false` default | Same rationale as `confirm_cmd = true` and `llm_second_opinion = true`: a default-on routing maps "explain ..." prompts to whatever class maps to `"cloud"`, spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in. |
 | Fallback trigger | Transport-error pattern match against `err` string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. |
 | Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
+| Fallback timing | **Only retry when no deltas have arrived yet** (N3 fold-in) | If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an `any_delta` flag in the on_delta callback; only retries when false. |
 | Fallback announcement | Status line `[aish] local <name> failed (<reason>); retrying via <fallback_name>` | Visibility — user always knows when a fallback fired. |
 | Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement* not an addition; total turn count stays bounded. |
 | Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n<summary>", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). |
@@ -192,9 +245,11 @@ The matcher strips the prefix before testing.
 |---|---|
 | `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) |
 | `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
+| `^HTTP 408` | Request Timeout (gateway-level; some proxies emit this — Q41 resolved) |
 | `Couldn'?t resolve host` | DNS / unreachable local broker |
 | `Connection refused` | broker not listening |
-| `Timeout was reached` | broker too slow |
+| `Timeout was reached` | libcurl's internal timeout phrasing |
+| `Operation timed out` | curl variant of timeout (libcurl version-dependent) |

 Errors NOT matched (NOT retried):
 - HTTP 401 / 403 (auth failure — won't get better on cloud)
@@ -318,9 +373,9 @@ Specifically out of Phase 5:
 |---|---|---|---|
 | Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
 | Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. |
-| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
+| Q39 | ~~Fallback under Norris~~ | repl.lua + safety.lua | **Resolved at review (R-C3)**: AUTO-routing does NOT fire inside the Norris loop. The model is fixed at `:norris <goal>` launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by `cfg.routing.fallback`; that retries the failed call against cloud but doesn't permanently switch the planner. |
 | Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
-| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |
+| Q41 | ~~HTTP 408 / Operation timed out eligibility~~ | repl.lua | **Resolved at review (R-C5)**: both added to §5 patterns. |
 | Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) |

 ---