From 2e389c14756b7c8948ce9dd20508e364787fa59a Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Wed, 13 May 2026 11:15:39 +0000 Subject: [PATCH] =?UTF-8?q?docs/PHASE5:=20review=20fold-in=20=E2=80=94=20c?= =?UTF-8?q?allback=20signature,=20Norris=20suppression,=20cost=20defaults?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions: B1 BLOCKER: summary callback signature was inconsistent across §3 and §6. Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil dispatching on the two args: (nil, [turns]) — first-time summarize (str, [turns]) — additive (extend prior summary with new evictions) (str, nil) — compress (re-summarize the prior summary itself) C1: re-summarize trigger now uses the (str, nil) compress signal rather than degenerate (str, {}). C2: routing decision is taken once on entry to ask_ai. The chosen active_cfg is used for every tool-sub-loop iteration. Original active_cfg restored after ask_ai returns. C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed at :norris launch time; planner stays on it for every iteration. Q39 resolved. Per-iteration fallback still gated by cfg.routing.fallback — retries the failing call against cloud without permanently switching the planner. C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for the [background] block). Both are "earlier context" the planner generally doesn't need. C5: Fallback pattern coverage expanded — added HTTP 408 (Q41 resolved) and "Operation timed out" (libcurl version variant). Dropped "HTTP response code said error" from A2 — FAILONERROR was removed in Phase 4 f26cbd9. NITs folded: N1 :route check always runs heuristic; suffix "(routing currently disabled)" when cfg.routing.auto = false N2 reasoning → nil by default (not → "cloud"); user explicitly opts in to map reasoning to a paid model. Same cost-safety rationale as confirm_cmd default true. N3 "Retry only when no deltas have arrived" promoted to §5 normative rule (was in §11 risk row). N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to align with the :fallback meta verb. Reviewer verdict: commit #1 (router.classify_model) is implement- ready; B1/C1 resolution required before commit #2. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PHASE5.md | 65 ++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 60 insertions(+), 5 deletions(-) diff --git a/docs/PHASE5.md b/docs/PHASE5.md index ef2a93e..8ad3faa 100644 --- a/docs/PHASE5.md +++ b/docs/PHASE5.md @@ -2,9 +2,60 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 5 Requirements, Architecture & Design Decisions -**Status:** Analyze (formulate complete; current tree at `4453b93` probed) +**Status:** Plan (review fold-in 2026-05-13 — callback signature, Norris suppression, cost defaults resolved) **Date:** 2026-05-13 +**Review fold-in (2026-05-13):** + +R-B1. **Summary callback signature canonical**: the closure is + `summarize_fn(prior_summary, evicted_turns) -> string | nil`. + `prior_summary` is `nil` on the first ever summarize; otherwise + the current `ctx.summary` string. `evicted_turns` is `nil` for + the re-summarize-compress trigger (C1 resolution); otherwise the + array of evicted turn tables. The closure dispatches: + first-time: prior=nil, evicted=[...] → "summarize these turns" + additive: prior=str, evicted=[...] → "extend the prior summary" + compress: prior=str, evicted=nil → "compress the prior summary" + +R-C2. **Routing taken once per ask_ai**: the model decision happens + on entry to `ask_ai`. The chosen `active_cfg` is used for every + iteration of the tool-call sub-loop. Original `active_cfg` is + restored after `ask_ai` returns. NOT per-broker-call. + +R-C3. **AUTO-routing does NOT fire inside Norris**: `run_norris` + operates on a fixed model (whatever the user set via `:model` + before launching). The auto-router would otherwise switch models + mid-plan, which loses planning continuity and costs tokens + rebuilding context. State explicit in §4 + §10. + +R-C4. **Summary block suppressed under Norris**: mirrors Phase 4 + R-C1 ([background] suppression). Both blocks are "earlier context" + the planner generally doesn't need mid-iteration. §6 + §3 reflect. + +R-C5. **Fallback pattern coverage**: + - Add `HTTP 408` to §5 patterns (Q41 moves from open to resolved). + - Add `Operation timed out` (curl variant of "Timeout was reached"). + - Drop "HTTP response code said error" from A2 — FAILONERROR was + removed in Phase 4 commit `f26cbd9`, this shape no longer fires. + +NITs folded: + N1. `:route check ` always runs the heuristic regardless of + `cfg.routing.auto` — debug aid surfaces the class + would-be + model + "(routing currently disabled)" suffix when auto is off. + N2. **`reasoning → nil` by default** — the v1 heuristic that maps + "explain" / "why" / "how does" to a model is too aggressive + paired with `nil = keep current` semantics. User must + EXPLICITLY map `routing.classes.reasoning = "cloud"` to send + reasoning prompts to paid API. Same cost-safety rationale as + `cfg.routing.auto = false`. + N3. "Retry only when no deltas have arrived" promoted to normative + rule in §5 (was in §11 risk row). + N4. Config key renamed `cfg.routing.cloud_fallback` → + `cfg.routing.fallback` to align with the `:fallback` meta verb. + Single-source naming. + +**Analyze findings (2026-05-13):** + **Analyze findings (2026-05-13):** A1. **router.lua surface clean** — already a pure-Lua module with @@ -95,9 +146,11 @@ Three pillars per PHASE0.md §11 row 5: |---|---|---| | Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. | | Classification mechanism | **Pure-Lua heuristics** in `router.classify_model` — keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift. | -| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. | +| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three classes for the first cut. **Defaults (N2 fold-in)**: `code → "deep"`, `reasoning → nil` (heuristic still fires but no override unless user maps it), `default → nil`. The aggressive `reasoning → "cloud"` default sent ordinary "why does ..." prompts to a paid API; user must opt in explicitly to pay for reasoning. Same cost-safety rationale as `cfg.routing.auto = false`. | +| Routing cost-safety | `cfg.routing.auto = false` default | Same rationale as `confirm_cmd = true` and `llm_second_opinion = true`: a default-on routing maps "explain ..." prompts to whatever class maps to `"cloud"`, spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in. | | Fallback trigger | Transport-error pattern match against `err` string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. | | Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. | +| Fallback timing | **Only retry when no deltas have arrived yet** (N3 fold-in) | If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an `any_delta` flag in the on_delta callback; only retries when false. | | Fallback announcement | Status line `[aish] local failed (); retrying via ` | Visibility — user always knows when a fallback fired. | | Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement* not an addition; total turn count stays bounded. | | Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). | @@ -192,9 +245,11 @@ The matcher strips the prefix before testing. |---|---| | `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) | | `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend | +| `^HTTP 408` | Request Timeout (gateway-level; some proxies emit this — Q41 resolved) | | `Couldn'?t resolve host` | DNS / unreachable local broker | | `Connection refused` | broker not listening | -| `Timeout was reached` | broker too slow | +| `Timeout was reached` | libcurl's internal timeout phrasing | +| `Operation timed out` | curl variant of timeout (libcurl version-dependent) | Errors NOT matched (NOT retried): - HTTP 401 / 403 (auth failure — won't get better on cloud) @@ -318,9 +373,9 @@ Specifically out of Phase 5: |---|---|---|---| | Q37 | Should routing apply to `:ask ` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) | | Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. | -| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) | +| Q39 | ~~Fallback under Norris~~ | repl.lua + safety.lua | **Resolved at review (R-C3)**: AUTO-routing does NOT fire inside the Norris loop. The model is fixed at `:norris ` launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by `cfg.routing.fallback`; that retries the failed call against cloud but doesn't permanently switch the planner. | | Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) | -| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) | +| Q41 | ~~HTTP 408 / Operation timed out eligibility~~ | repl.lua | **Resolved at review (R-C5)**: both added to §5 patterns. | | Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) | ---