# aish: Phase 5 Manifest

**Project:** aish, an AI-augmented conversational shell
**Document:** Phase 5 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-13

PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest specifies what Phase 5 adds: **multi-model routing**, **cloud fallback**, and **context summarization on eviction**.

---

## 1. Scope of Phase 5

Three pillars per PHASE0.md §11 row 5:

1. **Multi-model routing by task type.** `router.lua` is extended with a per-request `classify_model(text, cfg)` that suggests a model preset based on lightweight heuristics over the user input. Opt-in via `cfg.routing.auto = true`; default off (explicit `:model` stays the only switch).
2. **Cloud fallback on local failure.** When the active broker call returns `nil, err` for a transport reason that looks like "local backend down" (HTTP 502 / 503 / 404 model-not-found / libcurl connection-refused / timeout), automatically retry once against the configured `cloud` preset, surfacing a status line so the user knows what happened. Opt-in via `cfg.routing.cloud_fallback = true`; default off (single-shot only).
3. **Context summarization on eviction.** When `context.enforce_budget()` would evict the oldest turn pair, instead send those turns to the `fast` model (or `cfg.context.summarizer_model`) with "summarize these turns in 2-3 sentences", then replace them with a synthetic `role:"system"`-adjacent turn carrying the summary. Subsequent evictions append to or re-summarize the rolling summary. Opt-in via `cfg.context.summarize_on_evict = true`; default off (Phase 0 silent eviction stays the default).

**Phase 5 is done when:**

- With `cfg.routing.auto = true`, a prompt like "explain this Python traceback ..." gets routed to `deep` while "ls /tmp" or "what time is it?" stays on `fast`, with a visible status `[aish] routed to deep`.
- With `cfg.routing.cloud_fallback = true`, killing the local llama.cpp upstream and asking a question yields a single retry on the cloud preset plus a status line.
- With `cfg.context.summarize_on_evict = true`, a long conversation that exceeds `max_turns` no longer silently drops history: the evicted span is summarized into a single rolling turn the model still sees.
- Existing configs without `cfg.routing` or `cfg.context.summarize_on_evict` behave exactly like Phase 4 (Phase 4 regression coverage).

---

## 2. Technology Decisions (delta from Phase 4)

| Decision | Choice | Rationale |
|---|---|---|
| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the resolved (name, cfg) pair OR nil = keep current. |
| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model`: keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill for v1; can be added in Phase 6+ if heuristics drift. |
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → cfg.default_model`. User can remap. |
| Fallback trigger | Transport-error pattern match against `err` string: HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the five shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. |
| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
| Fallback announcement | Status line `[aish] local failed (); retrying via ` | Visibility: the user always knows when a fallback fired. |
| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement*, not an addition; total turn count stays bounded. |
| Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). |
| Summary model | `cfg.context.summarizer_model` (default `"fast"`) | Same pattern as `cfg.memory.summarizer_model`. Fast model is cheap enough to summarize on every eviction. |
| Summary failure handling | If broker returns nil, fall back to *silent eviction* (Phase 0 behavior) and status-log once. Don't block the user's main request. | Best-effort; never let summarization break the REPL. |

---

## 3. Module Changes

| File | State after Phase 4 | Phase 5 changes |
|---|---|---|
| `router.lua` | `classify(line, config)` → `(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg) -> name \| nil`. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. |
| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed in via `Context.new(opts.summarize_fn)`) when about to evict. If the callback returns text, prepend a `{role="system", _summary=true, content=...}` turn (replacing the prior `_summary` turn if present). If the callback returns nil, fall back to silent eviction. The summarize callback itself lives in repl.lua (because it needs broker.chat). |
| `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.cloud_fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. |
| `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged. Phase 5 composes on top of the existing surface. |
| `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. |

No new module files. All Phase 5 functionality grows existing files, mostly `repl.lua` and `router.lua`.

---

## 4. Routing Heuristics (v1)

`router.classify_model(text, cfg)` returns a model NAME (looked up in `cfg.routing.classes`) or `nil` (use the user-set active model).

Heuristics, in order (first hit wins):

1. **Code class** if any of:
   - Triple-backtick code fence anywhere
   - Token "traceback" / "stacktrace" / "stack trace" (case-insensitive)
   - Token "error:" or "exception:" near beginning
   - Text contains a path-like `./|/usr|~/` + `.py|.lua|.c|.js|.go|.rs`
   - More than 4 lines AND has indentation (looks like a paste)
2. **Reasoning class** if any of:
   - Token "explain" / "why" / "how does" / "compare"
   - Question mark + > 100 chars total
3. **Default class** otherwise.

Each class maps to a model name via `cfg.routing.classes`:

```lua
routing = {
  auto = true,
  classes = {
    code      = "deep",   -- code questions to deep
    reasoning = "cloud",  -- reasoning to cloud (best quality)
    default   = nil,      -- nil = keep current active model
  },
  cloud_fallback = true,
  fallback_model = "cloud",
}
```

When `auto = false`, `classify_model` always returns nil, which is equivalent to not setting a routing block.
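
The ordered heuristics above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `HEAD_CHARS` (how far "near beginning" reaches), the helpers `has_word` and `looks_like_path`, and the exact Lua patterns are all hypothetical choices the manifest does not specify.

```lua
local M = {}

-- Hypothetical tuning values; the manifest leaves these unspecified.
local HEAD_CHARS = 80
local FENCE = ("`"):rep(3)
local PATH_PREFIXES = { "%./", "/usr", "~/" }
local CODE_EXTS = { "py", "lua", "c", "js", "go", "rs" }

-- True if `s` contains `word` with non-alphanumeric boundaries on both sides.
local function has_word(s, word)
  return s:find("%f[%w]" .. word .. "%f[%W]") ~= nil
end

-- True if the text holds something like ./x.py, /usr/lib/y.lua or ~/z.rs.
local function looks_like_path(text)
  for _, prefix in ipairs(PATH_PREFIXES) do
    for _, ext in ipairs(CODE_EXTS) do
      if text:find(prefix .. "%S*%." .. ext .. "%f[%W]") then
        return true
      end
    end
  end
  return false
end

local function is_code(text, lower)
  if text:find(FENCE, 1, true) then return true end
  if has_word(lower, "traceback") or has_word(lower, "stacktrace")
      or has_word(lower, "stack trace") then
    return true
  end
  local head = lower:sub(1, HEAD_CHARS)
  if head:find("error:", 1, true) or head:find("exception:", 1, true) then
    return true
  end
  if looks_like_path(text) then return true end
  -- "More than 4 lines" is approximated as at least 4 newline characters.
  local _, newlines = text:gsub("\n", "\n")
  return newlines >= 4 and text:find("\n[ \t]+%S") ~= nil
end

local function is_reasoning(lower)
  if has_word(lower, "explain") or has_word(lower, "why")
      or has_word(lower, "how does") or has_word(lower, "compare") then
    return true
  end
  return lower:find("?", 1, true) ~= nil and #lower > 100
end

-- Returns the mapped model name, or nil to keep the active model.
function M.classify_model(text, cfg)
  local routing = cfg and cfg.routing
  if not (routing and routing.auto) then return nil end
  local lower = text:lower()
  local class = "default"
  if is_code(text, lower) then
    class = "code"
  elseif is_reasoning(lower) then
    class = "reasoning"
  end
  return (routing.classes or {})[class]
end
```

Because the code check runs first, a prompt such as "explain this Python traceback ..." lands in the code class even though it also contains "explain", which matches the first-hit-wins ordering. A class mapped to `nil` simply falls through to the active model.
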
The heuristic functions live behind the flag.

---

## 5. Cloud Fallback Flow

In `repl.ask_ai` after the broker call:

```lua
local ok, err = broker.chat_stream(active_cfg, msgs, on_delta, opts)
if not ok and should_fallback(err, cfg) then
  -- Look up the fallback preset first so the status line is only shown
  -- when a retry will actually happen.
  local fb_cfg = cfg.models[cfg.routing.fallback_model]
  if fb_cfg then
    renderer.status(("local %s failed (%s); retrying via %s")
      :format(active_name, fallback_reason(err), cfg.routing.fallback_model))
    -- One-hop retry only; a second failure surfaces through ok/err as usual.
    ok, err = broker.chat_stream(fb_cfg, msgs, on_delta, opts)
  end
end
```

`should_fallback(err, cfg)` matches `err` against the fallback patterns ONLY when `cfg.routing.cloud_fallback == true`. Otherwise it returns false.

### Fallback-eligible error patterns

| Pattern | Meaning |
|---|---|
| `HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout) |
| `HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `Couldn'?t resolve host` | DNS / unreachable local broker |
| `Connection refused` | broker not listening |
| `Timeout was reached` | broker too slow |

Errors NOT matched (and therefore NOT retried):

- HTTP 401 / 403 (auth failure; won't get better on cloud)
- HTTP 400 (bad request; schema issue)
- Lua-level errors (broker pipeline bug, not transport)

---

## 6. Context Summarization on Eviction

`Context.new(opts)` accepts an optional `summarize_fn(turns) -> string | nil` closure. When it is set AND `enforce_budget` would evict, the callback is invoked with the evicted slice; the returned summary (if non-nil) is folded into the rolling summary turn as described in the update flow.

### Storage shape

Index 1 of `ctx.turns` is reserved for the summary turn when present:

```lua
{ role = "system", _summary = true, content = "[earlier conversation summary]\n" }
```

`to_messages()` renders this normally (system role; model sees it).

### Summary update flow

1. `enforce_budget` identifies the oldest 2 turns to evict (user + assistant).
2. If `summarize_fn` is set, call it with those 2 turns.
3. If summary text is returned:
   - If `ctx.turns[1]._summary` exists, append to its content (capped at `max_summary_chars`, default 2000; re-summarize via the same callback if exceeded).
   - Else insert a new `_summary` turn at index 1.
4. Remove the evicted turns from `ctx.turns`.
5. If the callback returned nil → silent eviction (Phase 0 behavior).

### Failure handling

Inside the callback (in `repl.lua`):

```lua
local summary, err = broker.chat(summarizer_cfg, {
  {role="system", content="Summarize the following conversation in 2-3 sentences."},
  {role="user", content=render_turns_compact(evicted)},
}, {max_tokens=300, timeout_ms=30000})
return summary -- nil propagates; context.lua falls back to silent eviction
```

---

## 7. Meta Commands (Phase 5 additions)

| Command | Action |
|---|---|
| `:route on` / `:route off` | Toggle `cfg.routing.auto` at runtime (overrides config) |
| `:route classes` | Show the active class → model mapping |
| `:route check ` | Print which class a given text would be routed to (debug aid) |
| `:fallback on` / `:fallback off` | Toggle `cfg.routing.cloud_fallback` at runtime |

`:help` updated.

---

## 8. Migration from Phase 4

User-visible:

- New `:route` and `:fallback` meta commands.
- With `cfg.routing.auto`, the active model may CHANGE per-request as the heuristic fires. Prompt color tag could vary (Phase 6 maybe).
- With `cfg.context.summarize_on_evict`, eviction now spends a fast-model round-trip instead of silently dropping turns.

Existing configs without `routing` or `context.summarize_on_evict` continue exactly as Phase 4; defaults are OFF.

Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction marker, `cd` interception, and the entire system-prompt suffix order from Phase 4 stay the same.

---

## 9. Out of Scope (Phase 5)

Per PHASE0.md §11 these belong to Phase 6:

- Tree-sitter syntax highlighting hooks
- Diff-aware code injection
- Project-level context (file tree summary)

Specifically out of Phase 5:

- LLM-based classification (heuristics-only v1).
- Multi-hop fallback chains (one retry only).
- Per-class temperature overrides (use the model preset's default).
- Cost accounting for cloud calls (Q-list candidate).
- Auto-router learning from user `:model` overrides (Phase 6+).

---

## 10. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q37 | Should routing apply to `:ask ` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
| Q38 | Summary turn placement: index 1 (right after a putative system prompt, although to_messages prepends the system prompt fresh each call) vs index 0 in self.turns. Index 1 in self.turns means the system message comes from `to_messages()` and the `_summary` turn lives next as a `role:"system"` message. Strict templates may reject system/system back-to-back. | context.lua | Phase 5 (analyze) |
| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past max_summary_chars. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from Timeout was reached) be matched? Some servers emit it. Default: yes, with pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |
| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry, since model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) |

---

## 11. Implementation Plan (commit-by-commit)

Five commits expected:

1. **`router.lua`: `classify_model`.** Pure-Lua heuristics; no IO. Returns model name or nil. Module-local pattern set so tests can introspect. **Test in isolation**: ~30-case corpus of (input → expected class).
2. **`context.lua`: eviction callback.** Add `opts.summarize_fn`, `_summary` index-1 turn convention, `to_messages()` rendering (which Just Works since `_summary` turns have `role` + `content`). **Test in isolation**: mock summarize_fn returning "(summary N)", build a context that exceeds budget, verify the summary turn appears and accumulates.
3. **`repl.lua`: fallback + routing wiring.** Pre-broker classify_model hook (gated by cfg.routing.auto); post-error fallback retry (gated by cfg.routing.cloud_fallback); wire summarize_fn at Context.new time. **Test against hossenfelder**: prompt classified as "code" → routes to deep; deliberately misconfigure local endpoint → fallback fires.
4. **`:route` and `:fallback` meta commands.** Standalone: config toggles via runtime cmds. **End-to-end**: boot, `:route on`, issue a query, observe routing status; `:route off`, query again, no routing.
5. **`config.lua`: routing + summarize_on_evict example.** Documentation-only; commented-out example block. Final commit.

### Risk / non-obvious

- **Heuristic false positives**: a normal conversational question containing the word "explain" gets routed to cloud. Conservative defaults (`reasoning → nil` by default, with the user opting in explicitly per class) might be safer. The default mapping in §4 is aggressive; tone it down at plan if the user prefers.
- **Active-model state after routing**: the per-request routing switches `active_cfg` momentarily. The `prompt()` function reads `active_name`, which IS reverted post-request, so the prompt label stays accurate.
- **Fallback during streaming**: if the local broker fails MID-stream (e.g. emits some text then 5xx), the user has already seen partial text. Retrying via cloud means a duplicated prefix. v1 only retries on errors BEFORE any deltas arrived (we can detect this by tracking whether on_delta was called).
- **Summarize during Norris**: Norris's planning loop generates many turns. Eviction during Norris means summarizing mid-plan, so the model loses context about its earlier steps. Risky. v1 disables summarize when ctx.norris_active.
- **Memory items + summary turn**: both are dynamic system-context additions. The summary is `role:"system"` in turns[1]; memory is the `[background]` block in the actual system message. Compatible, with no overlap.

---

*End of Phase 5 Manifest (aish)*
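
As a companion to the §5 pattern table and the "Fallback during streaming" risk, the eligibility check might look like the sketch below. It assumes the broker's `err` is a plain string and adds a hypothetical third argument, `saw_delta`, which the caller would set once `on_delta` has fired; neither detail is fixed by this manifest.

```lua
-- Patterns copied from the §5 table (Lua pattern syntax).
local FALLBACK_PATTERNS = {
  "HTTP 5%d%d",
  "HTTP 404.*model_not_found",
  "Couldn'?t resolve host",
  "Connection refused",
  "Timeout was reached",
}

-- Returns true only when fallback is enabled, no partial output has been
-- shown (saw_delta is a hypothetical flag), and err matches a pattern.
local function should_fallback(err, cfg, saw_delta)
  local routing = cfg and cfg.routing
  if not (routing and routing.cloud_fallback == true) then
    return false
  end
  if saw_delta or type(err) ~= "string" then
    return false
  end
  for _, pat in ipairs(FALLBACK_PATTERNS) do
    if err:find(pat) then
      return true
    end
  end
  return false
end
```

Keeping the patterns in one module-local table makes the Q41 decision (adding `HTTP 408`) a one-line change, and auth or bad-request errors such as HTTP 401 or 400 fall through to `false` without needing an explicit deny list.
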