Files
marfrit 2e389c1475 docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults
Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:

B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
  Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
  dispatching on the two args:
    (nil, [turns])  — first-time summarize
    (str, [turns])  — additive (extend prior summary with new evictions)
    (str, nil)      — compress (re-summarize the prior summary itself)

C1: re-summarize trigger now uses the (str, nil) compress signal
  rather than degenerate (str, {}).

C2: routing decision is taken once on entry to ask_ai. The chosen
  active_cfg is used for every tool-sub-loop iteration. Original
  active_cfg restored after ask_ai returns.

C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
  at :norris launch time; planner stays on it for every iteration.
  Q39 resolved. Per-iteration fallback still gated by
  cfg.routing.fallback — retries the failing call against cloud
  without permanently switching the planner.

C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
  the [background] block). Both are "earlier context" the planner
  generally doesn't need.

C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
  resolved) and "Operation timed out" (libcurl version variant).
  Dropped "HTTP response code said error" from A2 — FAILONERROR
  was removed in Phase 4 f26cbd9.

NITs folded:
  N1 :route check <text> always runs heuristic; suffix
     "(routing currently disabled)" when cfg.routing.auto = false
  N2 reasoning → nil by default (not → "cloud"); user explicitly
     opts in to map reasoning to a paid model. Same cost-safety
     rationale as confirm_cmd default true.
  N3 "Retry only when no deltas have arrived" promoted to §5
     normative rule (was in §11 risk row).
  N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
     align with the :fallback meta verb.

Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:39 +00:00

23 KiB

aish — Phase 5 Manifest

Project: aish — AI-augmented conversational shell Document: Phase 5 Requirements, Architecture & Design Decisions Status: Plan (review fold-in 2026-05-13 — callback signature, Norris suppression, cost defaults resolved) Date: 2026-05-13

Review fold-in (2026-05-13):

R-B1. Summary callback signature canonical: the closure is summarize_fn(prior_summary, evicted_turns) -> string | nil. prior_summary is nil on the first ever summarize; otherwise the current ctx.summary string. evicted_turns is nil for the re-summarize-compress trigger (C1 resolution); otherwise the array of evicted turn tables. The closure dispatches: first-time: prior=nil, evicted=[...] → "summarize these turns" additive: prior=str, evicted=[...] → "extend the prior summary" compress: prior=str, evicted=nil → "compress the prior summary"

R-C2. Routing taken once per ask_ai: the model decision happens on entry to ask_ai. The chosen active_cfg is used for every iteration of the tool-call sub-loop. Original active_cfg is restored after ask_ai returns. NOT per-broker-call.

R-C3. AUTO-routing does NOT fire inside Norris: run_norris operates on a fixed model (whatever the user set via :model before launching). The auto-router would otherwise switch models mid-plan, which loses planning continuity and costs tokens rebuilding context. State explicit in §4 + §10.

R-C4. Summary block suppressed under Norris: mirrors Phase 4 R-C1 ([background] suppression). Both blocks are "earlier context" the planner generally doesn't need mid-iteration. §6 + §3 reflect.

R-C5. Fallback pattern coverage: - Add HTTP 408 to §5 patterns (Q41 moves from open to resolved). - Add Operation timed out (curl variant of "Timeout was reached"). - Drop "HTTP response code said error" from A2 — FAILONERROR was removed in Phase 4 commit f26cbd9, this shape no longer fires.

NITs folded: N1. :route check <text> always runs the heuristic regardless of cfg.routing.auto — debug aid surfaces the class + would-be model + "(routing currently disabled)" suffix when auto is off. N2. reasoning → nil by default — the v1 heuristic that maps "explain" / "why" / "how does" to a model is too aggressive paired with nil = keep current semantics. User must EXPLICITLY map routing.classes.reasoning = "cloud" to send reasoning prompts to paid API. Same cost-safety rationale as cfg.routing.auto = false. N3. "Retry only when no deltas have arrived" promoted to normative rule in §5 (was in §11 risk row). N4. Config key renamed cfg.routing.cloud_fallbackcfg.routing.fallback to align with the :fallback meta verb. Single-source naming.

Analyze findings (2026-05-13):

Analyze findings (2026-05-13):

A1. router.lua surface clean — already a pure-Lua module with M.classify(line, config) -> (kind, payload). Adding M.classify_model(text, cfg) -> name | nil is a natural sibling. No structural refactor.

A2. broker error message shapes all carry transport-stage prefixes that the fallback matcher must account for. The actual shapes callers see: "transport: HTTP %d%d%d: " (from post_sse status>=400) "transport: Timeout was reached" "transport: Couldn't resolve host" "transport: Connection refused" "transport: HTTP response code said error" (rare; from FAILONERROR) "api: <error.message>" (SSE-framed error envelope) "broker: model_cfg.endpoint and .model required" (config bug) Fallback patterns in §5 should match against the "transport: " prefix explicitly. "api: ..." errors don't fall back (they're semantic — bad request shape, not server failure). "broker: ..." errors don't fall back either (config bug).

A3. Q38 resolved at analyze — placing the rolling summary as turns[1] with role:"system" would produce system/system back-to-back in to_messages output (msg[1] is the composed system prompt; msg[2] would be the summary as another system message). Strict templates may reject this same way they reject user/user (PHASE0 §6). Resolution: render the summary INSIDE the composed system message (same pattern as the [background] and NORRIS blocks). Storage stays simple — keep _summary text on ctx.summary (NOT in ctx.turns), append to the system prompt in to_messages alongside the [background] and NORRIS blocks. §6 + §3 reflect.

PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest specifies what Phase 5 adds — multi-model routing, cloud fallback, and context summarization on eviction.


1. Scope of Phase 5

Three pillars per PHASE0.md §11 row 5:

  1. Multi-model routing by task typerouter.lua extended with a per-request classify_model(text, cfg) that suggests a model preset based on lightweight heuristics over the user input. Opt-in via cfg.routing.auto = true; default off (explicit :model stays the only switch).

  2. Cloud fallback on local failure — when the active broker call returns nil, err for a transport reason that looks like "local backend down" (HTTP 502 / 503 / 404 model-not-found / libcurl connection-refused / timeout), automatically retry once against the configured cloud preset, surfacing a status line so the user knows what happened. Opt-in via cfg.routing.cloud_fallback = true; default off (single-shot only).

  3. Context summarization on eviction — when context.enforce_budget() would evict the oldest turn pair, instead send those turns to the fast model (or cfg.context.summarizer_model) with "summarize these turns in 2-3 sentences", then replace them with a synthetic role:"system"-adjacent turn carrying the summary. Subsequent evictions append to or re-summarize the rolling summary. Opt-in via cfg.context.summarize_on_evict = true; default off (Phase 0 silent eviction stays the default).

Phase 5 is done when:

  • With cfg.routing.auto = true, a prompt like "explain this Python traceback ..." gets routed to deep while "ls /tmp" or "what time is it?" stays on fast — visible status [aish] routed to deep.
  • With cfg.routing.cloud_fallback = true, killing the local llama.cpp upstream and asking a question yields a single retry on the cloud preset + a status line.
  • With cfg.context.summarize_on_evict = true, a long conversation that exceeds max_turns no longer silently drops history — the evicted span is summarized into a single rolling turn the model still sees.
  • Existing configs without cfg.routing or cfg.context.summarize_on_evict behave exactly like Phase 4 (Phase 4 regression coverage).

2. Technology Decisions (delta from Phase 4)

Decision Choice Rationale
Routing trigger Per-request, in repl.ask_ai, BEFORE the broker call Same hook point as the tool-sub-loop entry. Decision is one function call (router.classify_model) that returns the resolved (name, cfg) pair OR nil = keep current.
Classification mechanism Pure-Lua heuristics in router.classify_model — keyword/length thresholds, no LLM call Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift.
Routing classes (v1) code, reasoning, default → mapped to model presets via cfg.routing.classes Three classes for the first cut. Defaults (N2 fold-in): code → "deep", reasoning → nil (heuristic still fires but no override unless user maps it), default → nil. The aggressive reasoning → "cloud" default sent ordinary "why does ..." prompts to a paid API; user must opt in explicitly to pay for reasoning. Same cost-safety rationale as cfg.routing.auto = false.
Routing cost-safety cfg.routing.auto = false default Same rationale as confirm_cmd = true and llm_second_opinion = true: a default-on routing maps "explain ..." prompts to whatever class maps to "cloud", spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in.
Fallback trigger Transport-error pattern match against err string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1.
Fallback target cfg.routing.fallback_model (default "cloud" when present) One-hop fallback only; if cloud also fails, surface the error normally. No retry loops.
Fallback timing Only retry when no deltas have arrived yet (N3 fold-in) If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an any_delta flag in the on_delta callback; only retries when false.
Fallback announcement Status line [aish] local <name> failed (<reason>); retrying via <fallback_name> Visibility — user always knows when a fallback fired.
Summarize trigger Inside context.enforce_budget(), when it would otherwise table.remove Same place the eviction status fires. The summarize is a replacement not an addition; total turn count stays bounded.
Summary turn shape Single rolling {role = "system", content = "[earlier conversation]\n<summary>", _summary = true} turn at index 1 (after the system prompt) One synthetic turn carries all evicted history. New evictions either append to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000).
Summary model cfg.context.summarizer_model (default "fast") Same pattern as cfg.memory.summarizer_model. Fast model is cheap enough to summarize on every eviction.
Summary failure handling If broker returns nil, fall back to silent eviction (Phase 0 behavior) and status-log once. Don't block the user's main request. Best-effort; never let summarization break the REPL.

3. Module Changes

File State after Phase 4 Phase 5 changes
router.lua classify(line, config)(kind, payload) for shell/AI/meta dispatch Add `M.classify_model(text, cfg) -> name
context.lua turns + memory_items + Norris suffix Extend enforce_budget() to invoke a callback (passed via Context.new(opts.summarize_fn)) when about to evict. Store the returned summary as ctx.summary (string) — NOT a turn (A3 — avoids system/system alternation). to_messages composes it into the system message alongside [background] and NORRIS, between them: system → [background] → [earlier summary] → NORRIS. New evictions append to ctx.summary; when its length exceeds max_summary_chars (default 2000), the callback is invoked AGAIN with (prior_summary, new_evicted_turns) to re-summarize. Silent eviction is the fallback when the callback returns nil.
repl.lua tool-sub-loop + meta + memory injection (a) Pre-broker hook: if cfg.routing.auto, call router.classify_model(text, cfg) and switch active_cfg for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND cfg.routing.cloud_fallback, retry against the fallback model once. (c) Wire Context.new with a summarize_fn = function(turns) ... end closure that calls broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300}).
broker.lua streaming + opts.tools/max_tokens/timeout_ms Unchanged — Phase 5 composes on top of the existing surface.
config.lua example with mcp/safety/memory blocks Add commented-out routing = {...} and context.summarize_on_evict = true example.

No new module files. All Phase 5 functionality grows existing files — mostly repl.lua and router.lua.


4. Routing Heuristics (v1)

router.classify_model(text, cfg) returns a model NAME (looked up in cfg.routing.classes) or nil (use the user-set active model).

Heuristics, in order — first hit wins:

  1. Code class if any of:

    • Triple-backtick code fence anywhere
    • Token "traceback" / "stacktrace" / "stack trace" (case-insensitive)
    • Token "error:" or "exception:" near beginning
    • Text contains a path-like ./|/usr|~/ + .py|.lua|.c|.js|.go|.rs
    • More than 4 lines AND has indentation (looks like a paste)
  2. Reasoning class if any of:

    • Token "explain" / "why" / "how does" / "compare"
    • Question mark + > 100 chars total
  3. Default class otherwise.

Each class maps to a model name via cfg.routing.classes:

routing = {
    auto = true,
    classes = {
        code      = "deep",     -- code questions to deep
        reasoning = "cloud",    -- reasoning to cloud (best quality)
        default   = nil,        -- nil = keep current active model
    },
    cloud_fallback = true,
    fallback_model = "cloud",
}

When auto = false, classify_model returns nil always — equivalent to not setting a routing block. The heuristic functions live behind the flag.


5. Cloud Fallback Flow

In repl.ask_ai after the broker call:

local ok, err = broker.chat_stream(active_cfg, msgs, on_delta, opts)
if not ok and should_fallback(err, cfg) then
    renderer.status(("local %s failed (%s); retrying via %s")
                    :format(active_name, fallback_reason(err),
                            cfg.routing.fallback_model))
    local fb_cfg = cfg.models[cfg.routing.fallback_model]
    if fb_cfg then
        ok, err = broker.chat_stream(fb_cfg, msgs, on_delta, opts)
    end
end

should_fallback(err, cfg) matches err against fallback patterns ONLY when cfg.routing.cloud_fallback == true. Otherwise returns false.

Fallback-eligible error patterns

All patterns match against the err string AS IT ARRIVES from broker.lua, which is prefixed "transport: " for libcurl/HTTP issues (A2 confirmed). The matcher strips the prefix before testing.

Pattern (after prefix strip) Meaning
^HTTP 5%d%d server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout)
^HTTP 404.*model_not_found the routed model isn't loaded on the local backend
^HTTP 408 Request Timeout (gateway-level; some proxies emit this — Q41 resolved)
Couldn'?t resolve host DNS / unreachable local broker
Connection refused broker not listening
Timeout was reached libcurl's internal timeout phrasing
Operation timed out curl variant of timeout (libcurl version-dependent)

Errors NOT matched (NOT retried):

  • HTTP 401 / 403 (auth failure — won't get better on cloud)
  • HTTP 400 (bad request — schema issue)
  • ^api: errors (semantic — bad request shape)
  • ^broker: errors (config bug — endpoint/model missing)
  • Lua-level errors (broker pipeline bug, not transport)

6. Context Summarization on Eviction

Context.new(opts) accepts an optional summarize_fn(turns) -> string | nil closure. When set AND enforce_budget would evict, the callback is invoked with the evicted slice; the returned summary (if non-nil) replaces the rolling summary turn.

Storage shape (post-A3 resolution)

The rolling summary lives on ctx.summary (a string), NOT in ctx.turns:

ctx.summary = "Earlier conversation: user discussed X, asked about Y, "
           .. "agreed to Z. Later asked..."

to_messages() composes it into the system message between [background] and the NORRIS suffix:

DEFAULT_SYSTEM_PROMPT

[background] (memory items)
- (fact) ...

[earlier conversation summary]
<ctx.summary>

[NORRIS MODE] (if active)
...

No new role:"system" message at turns[1] — avoids system/system alternation.

Summary update flow

  1. enforce_budget identifies the oldest 2 turns to evict (user + assistant).
  2. If summarize_fn is set, call it with (prior_summary, evicted_turns).
  3. If summary text returned:
    • Replace ctx.summary with the new text.
    • If #ctx.summary > max_summary_chars (default 2000), invoke the callback once more with (ctx.summary, {}) to re-summarize for compactness. Lossy by design — Q40 documents this trade-off.
  4. Remove the evicted turns from ctx.turns.
  5. If callback returned nil → silent eviction; ctx.summary unchanged.

Failure handling

Inside the callback (in repl.lua):

local summary, err = broker.chat(summarizer_cfg, {
    {role="system", content="Summarize the following conversation in 2-3 sentences."},
    {role="user",   content=render_turns_compact(evicted)},
}, {max_tokens=300, timeout_ms=30000})
return summary  -- nil propagates; context.lua falls back to silent eviction

7. Meta Commands (Phase 5 additions)

Command Action
:route on / :route off Toggle cfg.routing.auto at runtime (overrides config)
:route classes Show the active class → model mapping
:route check <text> Print which class a given text would be routed to (debug aid)
:fallback on / :fallback off Toggle cfg.routing.cloud_fallback at runtime

:help updated.


8. Migration from Phase 4

User-visible:

  • New :route and :fallback meta commands.
  • With cfg.routing.auto, the active model may CHANGE per-request as the heuristic fires. Prompt color tag could vary (Phase 6 maybe).
  • With cfg.context.summarize_on_evict, eviction now spends a fast- model round-trip instead of silently dropping turns.

Existing configs without routing or context.summarize_on_evict continue exactly as Phase 4 — defaults are OFF.

Substrate (PHASE0.md §3) invariants: unchanged. The CMD: extraction marker, cd interception, and the entire system-prompt suffix order from Phase 4 stay the same.


9. Out of Scope (Phase 5)

Per PHASE0.md §11 these belong to Phase 6:

  • Tree-sitter syntax highlighting hooks
  • Diff-aware code injection
  • Project-level context (file tree summary)

Specifically out of Phase 5:

  • LLM-based classification (heuristics-only v1).
  • Multi-hop fallback chains (one retry only).
  • Per-class temperature overrides (use the model preset's default).
  • Cost accounting for cloud calls (Q-list candidate).
  • Auto-router learning from user :model overrides (Phase 6+).

10. Open Questions

# Question Impact Resolve by
Q37 Should routing apply to :ask <text> (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. repl.lua Phase 5 (plan)
Q38 Summary turn placement: index 1 vs index 0 context.lua Resolved at analyze (A3): NEITHER — summary lives on ctx.summary (string) and composes into the SYSTEM MESSAGE alongside [background] and NORRIS suffix. No new role:"system" message; no alternation risk.
Q39 Fallback under Norris repl.lua + safety.lua Resolved at review (R-C3): AUTO-routing does NOT fire inside the Norris loop. The model is fixed at :norris <goal> launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by cfg.routing.fallback; that retries the failed call against cloud but doesn't permanently switch the planner.
Q40 Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. context.lua Phase 5 (verify)
Q41 HTTP 408 / Operation timed out eligibility repl.lua Resolved at review (R-C5): both added to §5 patterns.
Q42 Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. repl.lua Phase 5 (plan)

11. Implementation Plan (commit-by-commit)

Five commits expected:

  1. router.luaclassify_model. Pure-Lua heuristics; no IO. Returns model name or nil. Module-local pattern set so tests can introspect. Test in isolation: ~30-case corpus of (input → expected class).

  2. context.lua — eviction callback. Add opts.summarize_fn, _summary index-1 turn convention, to_messages() rendering (which Just Works since _summary turns have role + content). Test in isolation: mock summarize_fn returning "(summary N)", build a context that exceeds budget, verify the summary turn appears and accumulates.

  3. repl.lua — fallback + routing wiring. Pre-broker classify_model hook (gated by cfg.routing.auto); post-error fallback retry (gated by cfg.routing.cloud_fallback); wire summarize_fn at Context.new time. Test against hossenfelder: prompt classified as "code" → routes to deep; deliberately misconfigure local endpoint → fallback fires.

  4. :route and :fallback meta commands. Standalone — config toggles via runtime cmds. End-to-end: boot, :route on, issue a query, observe routing status; :route off, query again, no routing.

  5. config.lua — routing + summarize_on_evict example. Documentation-only; commented-out example block. Final commit.

Risk / non-obvious

  • Heuristic false positives: a normal conversational question containing the word "explain" gets routed to cloud. Conservative defaults (reasoning → nil by default? then user opts in explicitly per class) might be safer. Default mapping in §4 is aggressive; tone down at plan if user prefers.
  • Active-model state after routing: the per-request routing switches active_cfg momentarily. The prompt() function reads active_name which IS reverted post-request, so the prompt label stays accurate.
  • Fallback during streaming: if the local broker fails MID-stream (e.g. emits some text then 5xx), the user has already seen partial text. Retrying via cloud means duplicated prefix. v1 only retries on errors BEFORE any deltas arrived (we can detect by tracking whether on_delta was called).
  • Summarize during Norris: Norris's planning loop generates many turns. Eviction during Norris means summarizing mid-plan — the model loses context about its earlier steps. Risky. v1 disables summarize when ctx.norris_active.
  • Memory items + summary turn: both are dynamic system-context additions. The summary is role:"system" in turns[1]; memory is the [background] block in the actual system message. Compatible — no overlap.

End of Phase 5 Manifest — aish