docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults
Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:
B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
dispatching on the two args:
(nil, [turns]) — first-time summarize
(str, [turns]) — additive (extend prior summary with new evictions)
(str, nil) — compress (re-summarize the prior summary itself)
C1: re-summarize trigger now uses the (str, nil) compress signal
rather than degenerate (str, {}).
C2: routing decision is taken once on entry to ask_ai. The chosen
active_cfg is used for every tool-sub-loop iteration. Original
active_cfg restored after ask_ai returns.
C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
at :norris launch time; planner stays on it for every iteration.
Q39 resolved. Per-iteration fallback still gated by
cfg.routing.fallback — retries the failing call against cloud
without permanently switching the planner.
C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
the [background] block). Both are "earlier context" the planner
generally doesn't need.
C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
resolved) and "Operation timed out" (libcurl version variant).
Dropped "HTTP response code said error" from A2 — FAILONERROR
was removed in Phase 4 f26cbd9.
NITs folded:
N1 :route check <text> always runs heuristic; suffix
"(routing currently disabled)" when cfg.routing.auto = false
N2 reasoning → nil by default (not → "cloud"); user explicitly
opts in to map reasoning to a paid model. Same cost-safety
rationale as confirm_cmd default true.
N3 "Retry only when no deltas have arrived" promoted to §5
normative rule (was in §11 risk row).
N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
align with the :fallback meta verb.
Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+60
-5
@@ -2,9 +2,60 @@
|
||||
|
||||
**Project:** aish — AI-augmented conversational shell
|
||||
**Document:** Phase 5 Requirements, Architecture & Design Decisions
|
||||
**Status:** Analyze (formulate complete; current tree at `4453b93` probed)
|
||||
**Status:** Plan (review fold-in 2026-05-13 — callback signature, Norris suppression, cost defaults resolved)
|
||||
**Date:** 2026-05-13
|
||||
|
||||
**Review fold-in (2026-05-13):**
|
||||
|
||||
R-B1. **Summary callback signature canonical**: the closure is
|
||||
`summarize_fn(prior_summary, evicted_turns) -> string | nil`.
|
||||
`prior_summary` is `nil` on the first ever summarize; otherwise
|
||||
the current `ctx.summary` string. `evicted_turns` is `nil` for
|
||||
the re-summarize-compress trigger (C1 resolution); otherwise the
|
||||
array of evicted turn tables. The closure dispatches:
|
||||
first-time: prior=nil, evicted=[...] → "summarize these turns"
|
||||
additive: prior=str, evicted=[...] → "extend the prior summary"
|
||||
compress: prior=str, evicted=nil → "compress the prior summary"
|
||||
|
||||
R-C2. **Routing taken once per ask_ai**: the model decision happens
|
||||
on entry to `ask_ai`. The chosen `active_cfg` is used for every
|
||||
iteration of the tool-call sub-loop. Original `active_cfg` is
|
||||
restored after `ask_ai` returns. NOT per-broker-call.
|
||||
|
||||
R-C3. **AUTO-routing does NOT fire inside Norris**: `run_norris`
|
||||
operates on a fixed model (whatever the user set via `:model`
|
||||
before launching). The auto-router would otherwise switch models
|
||||
mid-plan, which loses planning continuity and costs tokens
|
||||
rebuilding context. State explicit in §4 + §10.
|
||||
|
||||
R-C4. **Summary block suppressed under Norris**: mirrors Phase 4
|
||||
R-C1 ([background] suppression). Both blocks are "earlier context"
|
||||
the planner generally doesn't need mid-iteration. §6 + §3 reflect.
|
||||
|
||||
R-C5. **Fallback pattern coverage**:
|
||||
- Add `HTTP 408` to §5 patterns (Q41 moves from open to resolved).
|
||||
- Add `Operation timed out` (curl variant of "Timeout was reached").
|
||||
- Drop "HTTP response code said error" from A2 — FAILONERROR was
|
||||
removed in Phase 4 commit `f26cbd9`, this shape no longer fires.
|
||||
|
||||
NITs folded:
|
||||
N1. `:route check <text>` always runs the heuristic regardless of
|
||||
`cfg.routing.auto` — debug aid surfaces the class + would-be
|
||||
model + "(routing currently disabled)" suffix when auto is off.
|
||||
N2. **`reasoning → nil` by default** — the v1 heuristic that maps
|
||||
"explain" / "why" / "how does" to a model is too aggressive
|
||||
paired with `nil = keep current` semantics. User must
|
||||
EXPLICITLY map `routing.classes.reasoning = "cloud"` to send
|
||||
reasoning prompts to paid API. Same cost-safety rationale as
|
||||
`cfg.routing.auto = false`.
|
||||
N3. "Retry only when no deltas have arrived" promoted to normative
|
||||
rule in §5 (was in §11 risk row).
|
||||
N4. Config key renamed `cfg.routing.cloud_fallback` →
|
||||
`cfg.routing.fallback` to align with the `:fallback` meta verb.
|
||||
Single-source naming.
|
||||
|
||||
**Analyze findings (2026-05-13):**
|
||||
|
||||
**Analyze findings (2026-05-13):**
|
||||
|
||||
A1. **router.lua surface clean** — already a pure-Lua module with
|
||||
@@ -95,9 +146,11 @@ Three pillars per PHASE0.md §11 row 5:
|
||||
|---|---|---|
|
||||
| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. |
|
||||
| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model` — keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift. |
|
||||
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. |
|
||||
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three classes for the first cut. **Defaults (N2 fold-in)**: `code → "deep"`, `reasoning → nil` (heuristic still fires but no override unless user maps it), `default → nil`. The aggressive `reasoning → "cloud"` default sent ordinary "why does ..." prompts to a paid API; user must opt in explicitly to pay for reasoning. Same cost-safety rationale as `cfg.routing.auto = false`. |
|
||||
| Routing cost-safety | `cfg.routing.auto = false` default | Same rationale as `confirm_cmd = true` and `llm_second_opinion = true`: a default-on routing maps "explain ..." prompts to whatever class maps to `"cloud"`, spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in. |
|
||||
| Fallback trigger | Transport-error pattern match against `err` string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. |
|
||||
| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
|
||||
| Fallback timing | **Only retry when no deltas have arrived yet** (N3 fold-in) | If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an `any_delta` flag in the on_delta callback; only retries when false. |
|
||||
| Fallback announcement | Status line `[aish] local <name> failed (<reason>); retrying via <fallback_name>` | Visibility — user always knows when a fallback fired. |
|
||||
| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement* not an addition; total turn count stays bounded. |
|
||||
| Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n<summary>", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). |
|
||||
@@ -192,9 +245,11 @@ The matcher strips the prefix before testing.
|
||||
|---|---|
|
||||
| `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) |
|
||||
| `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
|
||||
| `^HTTP 408` | Request Timeout (gateway-level; some proxies emit this — Q41 resolved) |
|
||||
| `Couldn'?t resolve host` | DNS / unreachable local broker |
|
||||
| `Connection refused` | broker not listening |
|
||||
| `Timeout was reached` | broker too slow |
|
||||
| `Timeout was reached` | libcurl's internal timeout phrasing |
|
||||
| `Operation timed out` | curl variant of timeout (libcurl version-dependent) |
|
||||
|
||||
Errors NOT matched (NOT retried):
|
||||
- HTTP 401 / 403 (auth failure — won't get better on cloud)
|
||||
@@ -318,9 +373,9 @@ Specifically out of Phase 5:
|
||||
|---|---|---|---|
|
||||
| Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
|
||||
| Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER — summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk. |
|
||||
| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
|
||||
| Q39 | ~~Fallback under Norris~~ | repl.lua + safety.lua | **Resolved at review (R-C3)**: AUTO-routing does NOT fire inside the Norris loop. The model is fixed at `:norris <goal>` launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by `cfg.routing.fallback`; that retries the failed call against cloud but doesn't permanently switch the planner. |
|
||||
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
|
||||
| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |
|
||||
| Q41 | ~~HTTP 408 / Operation timed out eligibility~~ | repl.lua | **Resolved at review (R-C5)**: both added to §5 patterns. |
|
||||
| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) |
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user