From 4453b93ab586e66b89e87d5cee1b078375be50bb Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Wed, 13 May 2026 11:11:26 +0000 Subject: [PATCH] =?UTF-8?q?docs/PHASE5:=20formulate=20=E2=80=94=20multi-mo?= =?UTF-8?q?del=20routing=20+=20cloud=20fallback=20+=20summarize-on-evict?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5: heuristic-based per-request model routing, single-hop cloud fallback on local transport failure, and fast-model summarization at sliding- window eviction time. Resolutions baked in via §2: - Routing trigger: per-request in repl.ask_ai, gated by cfg.routing.auto (default off) - Classification: pure-Lua heuristics (length, keywords, code-fence detection, exception markers) — no LLM probe in v1 - Classes: code → deep, reasoning → cloud, default → keep active - Fallback trigger: string-match on err for HTTP 5xx / model_not_found / "Connection refused" / DNS / timeout - Fallback: one retry against cfg.routing.fallback_model (default "cloud" if configured); status line on every retry - Summarize: enforce_budget invokes summarize_fn callback wired by repl.lua to broker.chat with the fast model - Summary turn: single rolling _summary at turns[1], appended to on each eviction, re-summarized when it exceeds max_summary_chars Open questions (Q37-Q42) in §10: Q37 routing for :ask explicit ask Q38 summary turn vs system-role alternation Q39 fallback under Norris (proposal: single-request only) Q40 summary re-summarize fidelity loss (lossy by design) Q41 HTTP 408 pattern eligibility (default yes) Q42 routing inside tool-call sub-loop (proposal: fix at entry) 5-commit roadmap in §11. No new module files; mostly repl.lua and router.lua growth. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PHASE5.md | 329 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 329 insertions(+) create mode 100644 docs/PHASE5.md diff --git a/docs/PHASE5.md b/docs/PHASE5.md new file mode 100644 index 0000000..fae4f8c --- /dev/null +++ b/docs/PHASE5.md @@ -0,0 +1,329 @@ +# aish — Phase 5 Manifest + +**Project:** aish — AI-augmented conversational shell +**Document:** Phase 5 Requirements, Architecture & Design Decisions +**Status:** Formulate (pre-analyze) +**Date:** 2026-05-13 + +PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest +specifies what Phase 5 adds — **multi-model routing**, **cloud fallback**, +and **context summarization on eviction**. + +--- + +## 1. Scope of Phase 5 + +Three pillars per PHASE0.md §11 row 5: + +1. **Multi-model routing by task type** — `router.lua` extended with a + per-request `classify_model(text, cfg)` that suggests a model + preset based on lightweight heuristics over the user input. + Opt-in via `cfg.routing.auto = true`; default off (explicit `:model` + stays the only switch). + +2. **Cloud fallback on local failure** — when the active broker call + returns `nil, err` for a transport reason that looks like + "local backend down" (HTTP 502 / 503 / 404 model-not-found / + libcurl connection-refused / timeout), automatically retry once + against the configured `cloud` preset, surfacing a status line so + the user knows what happened. Opt-in via `cfg.routing.cloud_fallback = true`; + default off (single-shot only). + +3. **Context summarization on eviction** — when + `context.enforce_budget()` would evict the oldest turn pair, instead + send those turns to the `fast` model (or `cfg.context.summarizer_model`) + with "summarize these turns in 2-3 sentences", then replace them + with a synthetic `role:"system"`-adjacent turn carrying the summary. + Subsequent evictions append to or re-summarize the rolling summary. + Opt-in via `cfg.context.summarize_on_evict = true`; default off + (Phase 0 silent eviction stays the default). + +**Phase 5 is done when:** + +- With `cfg.routing.auto = true`, a prompt like "explain this Python + traceback ..." gets routed to `deep` while "ls /tmp" or "what time + is it?" stays on `fast` — visible status `[aish] routed to deep`. +- With `cfg.routing.cloud_fallback = true`, killing the local + llama.cpp upstream and asking a question yields a single retry on + the cloud preset + a status line. +- With `cfg.context.summarize_on_evict = true`, a long conversation + that exceeds `max_turns` no longer silently drops history — the + evicted span is summarized into a single rolling turn the model + still sees. +- Existing configs without `cfg.routing` or `cfg.context.summarize_on_evict` + behave exactly like Phase 4 (Phase 4 regression coverage). + +--- + +## 2. Technology Decisions (delta from Phase 4) + +| Decision | Choice | Rationale | +|---|---|---| +| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. | +| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model` — keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill v1; can be added in Phase 6+ if heuristics drift. | +| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. | +| Fallback trigger | Transport-error pattern match against `err` string — HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the four shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. | +| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. | +| Fallback announcement | Status line `[aish] local failed (); retrying via ` | Visibility — user always knows when a fallback fired. | +| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement* not an addition; total turn count stays bounded. | +| Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). | +| Summary model | `cfg.context.summarizer_model` (default `"fast"`) | Same pattern as `cfg.memory.summarizer_model`. Fast model is cheap enough to summarize on every eviction. | +| Summary failure handling | If broker returns nil, fall back to *silent eviction* (Phase 0 behavior) and status-log once. Don't block the user's main request. | Best-effort; never let summarization break the REPL. | + +--- + +## 3. Module Changes + +| File | State after Phase 4 | Phase 5 changes | +|---|---|---| +| `router.lua` | `classify(line, config)` → `(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg) -> name | nil`. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. | +| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed in via `Context.new(opts.summarize_fn)`) when about to evict. If callback returns text, prepend a `{role="system", _summary=true, content=...}` turn (replacing prior `_summary` turn if present). If callback returns nil, fall back to silent eviction. The summarize callback itself lives in repl.lua (because it needs broker.chat). | +| `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.cloud_fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. | +| `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged — Phase 5 composes on top of the existing surface. | +| `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. | + +No new module files. All Phase 5 functionality grows existing files — +mostly `repl.lua` and `router.lua`. + +--- + +## 4. Routing Heuristics (v1) + +`router.classify_model(text, cfg)` returns a model NAME (looked up in +`cfg.routing.classes`) or `nil` (use the user-set active model). + +Heuristics, in order — first hit wins: + +1. **Code class** if any of: + - Triple-backtick code fence anywhere + - Token "traceback" / "stacktrace" / "stack trace" (case-insensitive) + - Token "error:" or "exception:" near beginning + - Text contains a path-like `./|/usr|~/` + `.py|.lua|.c|.js|.go|.rs` + - More than 4 lines AND has indentation (looks like a paste) + +2. **Reasoning class** if any of: + - Token "explain" / "why" / "how does" / "compare" + - Question mark + > 100 chars total + +3. **Default class** otherwise. + +Each class maps to a model name via `cfg.routing.classes`: + +```lua +routing = { + auto = true, + classes = { + code = "deep", -- code questions to deep + reasoning = "cloud", -- reasoning to cloud (best quality) + default = nil, -- nil = keep current active model + }, + cloud_fallback = true, + fallback_model = "cloud", +} +``` + +When `auto = false`, `classify_model` returns nil always — equivalent to +not setting a routing block. The heuristic functions live behind the +flag. + +--- + +## 5. Cloud Fallback Flow + +In `repl.ask_ai` after the broker call: + +```lua +local ok, err = broker.chat_stream(active_cfg, msgs, on_delta, opts) +if not ok and should_fallback(err, cfg) then + renderer.status(("local %s failed (%s); retrying via %s") + :format(active_name, fallback_reason(err), + cfg.routing.fallback_model)) + local fb_cfg = cfg.models[cfg.routing.fallback_model] + if fb_cfg then + ok, err = broker.chat_stream(fb_cfg, msgs, on_delta, opts) + end +end +``` + +`should_fallback(err, cfg)` matches `err` against fallback patterns +ONLY when `cfg.routing.cloud_fallback == true`. Otherwise returns false. + +### Fallback-eligible error patterns + +| Pattern | Meaning | +|---|---| +| `HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Unavailable, 504 Timeout) | +| `HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend | +| `Couldn'?t resolve host` | DNS / unreachable local broker | +| `Connection refused` | broker not listening | +| `Timeout was reached` | broker too slow | + +Errors NOT matched (and therefore NOT retried): +- HTTP 401 / 403 (auth failure — won't get better on cloud) +- HTTP 400 (bad request — schema issue) +- Lua-level errors (broker pipeline bug, not transport) + +--- + +## 6. Context Summarization on Eviction + +`Context.new(opts)` accepts an optional `summarize_fn(turns) -> string | +nil` closure. When set AND `enforce_budget` would evict, the callback +is invoked with the evicted slice; the returned summary (if non-nil) +replaces the rolling summary turn. + +### Storage shape + +Index 1 of `ctx.turns` is reserved for the summary turn when present: + +```lua +{ role = "system", _summary = true, + content = "[earlier conversation summary]\n" } +``` + +`to_messages()` renders this normally (system role; model sees it). + +### Summary update flow + +1. enforce_budget identifies the oldest 2 turns to evict (user + assistant). +2. If `summarize_fn` is set, call it with those 2 turns. +3. If summary text returned: + - If `ctx.turns[1]._summary` exists, append to its content + (truncate at `max_summary_chars` default 2000 — re-summarize via + same callback if exceeded). + - Else insert a new `_summary` turn at index 1. +4. Remove the evicted turns from `ctx.turns`. +5. If callback returned nil → silent eviction (Phase 0 behavior). + +### Failure handling + +Inside the callback (in `repl.lua`): + +```lua +local summary, err = broker.chat(summarizer_cfg, { + {role="system", content="Summarize the following conversation in 2-3 sentences."}, + {role="user", content=render_turns_compact(evicted)}, +}, {max_tokens=300, timeout_ms=30000}) +return summary -- nil propagates; context.lua falls back to silent eviction +``` + +--- + +## 7. Meta Commands (Phase 5 additions) + +| Command | Action | +|---|---| +| `:route on` / `:route off` | Toggle `cfg.routing.auto` at runtime (overrides config) | +| `:route classes` | Show the active class → model mapping | +| `:route check ` | Print which class a given text would be routed to (debug aid) | +| `:fallback on` / `:fallback off` | Toggle `cfg.routing.cloud_fallback` at runtime | + +`:help` updated. + +--- + +## 8. Migration from Phase 4 + +User-visible: +- New `:route` and `:fallback` meta commands. +- With `cfg.routing.auto`, the active model may CHANGE per-request as + the heuristic fires. Prompt color tag could vary (Phase 6 maybe). +- With `cfg.context.summarize_on_evict`, eviction now spends a fast- + model round-trip instead of silently dropping turns. + +Existing configs without `routing` or `context.summarize_on_evict` +continue exactly as Phase 4 — defaults are OFF. + +Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction +marker, `cd` interception, and the entire system-prompt suffix order +from Phase 4 stay the same. + +--- + +## 9. Out of Scope (Phase 5) + +Per PHASE0.md §11 these belong to Phase 6: +- Tree-sitter syntax highlighting hooks +- Diff-aware code injection +- Project-level context (file tree summary) + +Specifically out of Phase 5: +- LLM-based classification (heuristics-only v1). +- Multi-hop fallback chains (one retry only). +- Per-class temperature overrides (use the model preset's default). +- Cost accounting for cloud calls (Q-list candidate). +- Auto-router learning from user `:model` overrides (Phase 6+). + +--- + +## 10. Open Questions + +| # | Question | Impact | Resolve by | +|---|---|---|---| +| Q37 | Should routing apply to `:ask ` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) | +| Q38 | Summary turn placement: index 1 (right after a putative system prompt — but to_messages prepends the system prompt fresh each call) vs index 0 in self.turns. Index 1 in self.turns means the system message comes from `to_messages()` and the `_summary` turn lives next as a `role:"system"` message. Strict templates may reject system/system back-to-back. | context.lua | Phase 5 (analyze) | +| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) | +| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) | +| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes — pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) | +| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry — model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) | + +--- + +## 11. Implementation Plan (commit-by-commit) + +Five commits expected: + +1. **`router.lua` — `classify_model`.** Pure-Lua heuristics; no IO. Returns + model name or nil. Module-local pattern set so tests can introspect. + **Test in isolation**: ~30-case corpus of (input → expected class). + +2. **`context.lua` — eviction callback.** Add `opts.summarize_fn`, + `_summary` index-1 turn convention, `to_messages()` rendering + (which Just Works since `_summary` turns have `role` + `content`). + **Test in isolation**: mock summarize_fn returning "(summary N)", + build a context that exceeds budget, verify the summary turn + appears and accumulates. + +3. **`repl.lua` — fallback + routing wiring.** Pre-broker + classify_model hook (gated by cfg.routing.auto); post-error + fallback retry (gated by cfg.routing.cloud_fallback); wire + summarize_fn at Context.new time. **Test against hossenfelder**: + prompt classified as "code" → routes to deep; deliberately + misconfigure local endpoint → fallback fires. + +4. **`:route` and `:fallback` meta commands.** Standalone — config + toggles via runtime cmds. **End-to-end**: boot, `:route on`, + issue a query, observe routing status; `:route off`, query + again, no routing. + +5. **`config.lua` — routing + summarize_on_evict example.** + Documentation-only; commented-out example block. Final commit. + +### Risk / non-obvious + +- **Heuristic false positives**: a normal conversational question + containing the word "explain" gets routed to cloud. Conservative + defaults (`reasoning → nil` by default? then user opts in + explicitly per class) might be safer. Default mapping in §4 is + aggressive; tone down at plan if user prefers. +- **Active-model state after routing**: the per-request routing + switches `active_cfg` momentarily. The `prompt()` function reads + `active_name` which IS reverted post-request, so the prompt label + stays accurate. +- **Fallback during streaming**: if the local broker fails MID-stream + (e.g. emits some text then 5xx), the user has already seen partial + text. Retrying via cloud means duplicated prefix. v1 only retries + on errors BEFORE any deltas arrived (we can detect by tracking + whether on_delta was called). +- **Summarize during Norris**: Norris's planning loop generates many + turns. Eviction during Norris means summarizing mid-plan — the + model loses context about its earlier steps. Risky. v1 disables + summarize when ctx.norris_active. +- **Memory items + summary turn**: both are dynamic system-context + additions. The summary is `role:"system"` in turns[1]; memory + is the `[background]` block in the actual system message. + Compatible — no overlap. + +--- + +*End of Phase 5 Manifest — aish*