docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults

Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:

B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
  Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
  dispatching on the two args:
    (nil, [turns])  — first-time summarize
    (str, [turns])  — additive (extend prior summary with new evictions)
    (str, nil)      — compress (re-summarize the prior summary itself)

C1: re-summarize trigger now uses the (str, nil) compress signal
  rather than degenerate (str, {}).

C2: routing decision is taken once on entry to ask_ai. The chosen
  active_cfg is used for every tool-sub-loop iteration. Original
  active_cfg restored after ask_ai returns.

C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
  at :norris launch time; planner stays on it for every iteration.
  Q39 resolved. Per-iteration fallback still gated by
  cfg.routing.fallback — retries the failing call against cloud
  without permanently switching the planner.

C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
  the [background] block). Both are "earlier context" the planner
  generally doesn't need.

C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
  resolved) and "Operation timed out" (libcurl version variant).
  Dropped "HTTP response code said error" from A2 — FAILONERROR
  was removed in Phase 4 f26cbd9.

NITs folded:
  N1 :route check <text> always runs heuristic; suffix
     "(routing currently disabled)" when cfg.routing.auto = false
  N2 reasoning → nil by default (not → "cloud"); user explicitly
     opts in to map reasoning to a paid model. Same cost-safety
     rationale as confirm_cmd default true.
  N3 "Retry only when no deltas have arrived" promoted to §5
     normative rule (was in §11 risk row).
  N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
     align with the :fallback meta verb.

Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-13 11:15:39 +00:00
parent 555fdd7717
commit 2e389c1475
+60 -5
View File
@@ -2,9 +2,60 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 5 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; current tree at `4453b93` probed)
**Status:** Plan (review fold-in 2026-05-13 — callback signature, Norris suppression, cost defaults resolved)
**Date:** 2026-05-13
**Review fold-in (2026-05-13):**
R-B1. **Summary callback signature canonical**: the closure is
`summarize_fn(prior_summary, evicted_turns) -> string | nil`.
`prior_summary` is `nil` on the first ever summarize; otherwise
the current `ctx.summary` string. `evicted_turns` is `nil` for
the re-summarize-compress trigger (C1 resolution); otherwise the
array of evicted turn tables. The closure dispatches:
first-time: prior=nil, evicted=[...] → "summarize these turns"
additive: prior=str, evicted=[...] → "extend the prior summary"
compress: prior=str, evicted=nil → "compress the prior summary"
R-C2. **Routing taken once per ask_ai**: the model decision happens
on entry to `ask_ai`. The chosen `active_cfg` is used for every
iteration of the tool-call sub-loop. Original `active_cfg` is
restored after `ask_ai` returns. NOT per-broker-call.
R-C3. **AUTO-routing does NOT fire inside Norris**: `run_norris`
operates on a fixed model (whatever the user set via `:model`
before launching). The auto-router would otherwise switch models
mid-plan, which loses planning continuity and costs tokens
rebuilding context. State explicit in §4 + §10.
R-C4. **Summary block suppressed under Norris**: mirrors Phase 4
R-C1 ([background] suppression). Both blocks are "earlier context"
the planner generally doesn't need mid-iteration. §6 + §3 reflect.
R-C5. **Fallback pattern coverage**:
- Add `HTTP 408` to §5 patterns (Q41 moves from open to resolved).
- Add `Operation timed out` (curl variant of "Timeout was reached").
- Drop "HTTP response code said error" from A2 — FAILONERROR was
removed in Phase 4 commit `f26cbd9`, this shape no longer fires.
NITs folded:
N1. `:route check <text>` always runs the heuristic regardless of
`cfg.routing.auto` — debug aid surfaces the class + would-be
model + "(routing currently disabled)" suffix when auto is off.
N2. **`reasoning → nil` by default** — the v1 heuristic that maps
"explain" / "why" / "how does" to a model is too aggressive
paired with `nil = keep current` semantics. User must
EXPLICITLY map `routing.classes.reasoning = "cloud"` to send
reasoning prompts to paid API. Same cost-safety rationale as
`cfg.routing.auto = false`.
N3. "Retry only when no deltas have arrived" promoted to normative
rule in §5 (was in §11 risk row).
N4. Config key renamed `cfg.routing.cloud_fallback`
`cfg.routing.fallback` to align with the `:fallback` meta verb.
Single-source naming.
**Analyze findings (2026-05-13):**
**Analyze findings (2026-05-13):**
A1. **router.lua surface clean** — already a pure-Lua module with
@@ -95,9 +146,11 @@ Three pillars per PHASE0.md §11 row 5:
|---|---|---|
| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. |
| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model` — keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift. |
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. |
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three classes for the first cut. **Defaults (N2 fold-in)**: `code → "deep"`, `reasoning → nil` (heuristic still fires but no override unless user maps it), `default → nil`. The aggressive `reasoning → "cloud"` default sent ordinary "why does ..." prompts to a paid API; user must opt in explicitly to pay for reasoning. Same cost-safety rationale as `cfg.routing.auto = false`. |
| Routing cost-safety | `cfg.routing.auto = false` default | Same rationale as `confirm_cmd = true` and `llm_second_opinion = true`: a default-on routing maps "explain ..." prompts to whatever class maps to `"cloud"`, spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in. |
| Fallback trigger | Transport-error pattern match against `err` string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. |
| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
| Fallback timing | **Only retry when no deltas have arrived yet** (N3 fold-in) | If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an `any_delta` flag in the on_delta callback; only retries when false. |
| Fallback announcement | Status line `[aish] local <name> failed (<reason>); retrying via <fallback_name>` | Visibility — user always knows when a fallback fired. |
| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement* not an addition; total turn count stays bounded. |
| Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n<summary>", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). |
@@ -192,9 +245,11 @@ The matcher strips the prefix before testing.
|---|---|
| `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) |
| `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `^HTTP 408` | Request Timeout (gateway-level; some proxies emit this — Q41 resolved) |
| `Couldn'?t resolve host` | DNS / unreachable local broker |
| `Connection refused` | broker not listening |
| `Timeout was reached` | broker too slow |
| `Timeout was reached` | libcurl's internal timeout phrasing |
| `Operation timed out` | curl variant of timeout (libcurl version-dependent) |
Errors NOT matched (NOT retried):
- HTTP 401 / 403 (auth failure — won't get better on cloud)
@@ -318,9 +373,9 @@ Specifically out of Phase 5:
|---|---|---|---|
| Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
| Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. |
| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
| Q39 | ~~Fallback under Norris~~ | repl.lua + safety.lua | **Resolved at review (R-C3)**: AUTO-routing does NOT fire inside the Norris loop. The model is fixed at `:norris <goal>` launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by `cfg.routing.fallback`; that retries the failed call against cloud but doesn't permanently switch the planner. |
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |
| Q41 | ~~HTTP 408 / Operation timed out eligibility~~ | repl.lua | **Resolved at review (R-C5)**: both added to §5 patterns. |
| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) |
---