docs/PHASE5: formulate — multi-model routing + cloud fallback + summarize-on-evict
Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5:
heuristic-based per-request model routing, single-hop cloud fallback
on local transport failure, and fast-model summarization at sliding-
window eviction time.
Resolutions baked in via §2:
- Routing trigger: per-request in repl.ask_ai, gated by
cfg.routing.auto (default off)
- Classification: pure-Lua heuristics (length, keywords, code-fence
detection, exception markers) — no LLM probe in v1
- Classes: code → deep, reasoning → cloud, default → keep active
- Fallback trigger: string-match on err for HTTP 5xx /
model_not_found / "Connection refused" / DNS / timeout
- Fallback: one retry against cfg.routing.fallback_model (default
"cloud" if configured); status line on every retry
- Summarize: enforce_budget invokes summarize_fn callback wired
by repl.lua to broker.chat with the fast model
- Summary turn: single rolling _summary at turns[1], appended to
on each eviction, re-summarized when it exceeds max_summary_chars
Open questions (Q37-Q42) in §10:
Q37 routing for :ask explicit ask
Q38 summary turn vs system-role alternation
Q39 fallback under Norris (proposal: single-request only)
Q40 summary re-summarize fidelity loss (lossy by design)
Q41 HTTP 408 pattern eligibility (default yes)
Q42 routing inside tool-call sub-loop (proposal: fix at entry)
5-commit roadmap in §11. No new module files; mostly repl.lua and
router.lua growth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# aish — Phase 5 Manifest
**Project:** aish, an AI-augmented conversational shell
**Document:** Phase 5 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-13

PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest
specifies what Phase 5 adds: **multi-model routing**, **cloud fallback**,
and **context summarization on eviction**.

---

## 1. Scope of Phase 5

Three pillars per PHASE0.md §11 row 5:

1. **Multi-model routing by task type.** `router.lua` is extended with a
   per-request `classify_model(text, cfg)` that suggests a model
   preset based on lightweight heuristics over the user input.
   Opt-in via `cfg.routing.auto = true`; default off (explicit `:model`
   stays the only switch).

2. **Cloud fallback on local failure.** When the active broker call
   returns `nil, err` for a transport reason that looks like
   "local backend down" (HTTP 502 / 503 / 404 model-not-found /
   libcurl connection-refused / timeout), automatically retry once
   against `cfg.routing.fallback_model` (default: the `cloud` preset),
   surfacing a status line so the user knows what happened. Opt-in via
   `cfg.routing.cloud_fallback = true`; default off (single-shot only).

3. **Context summarization on eviction.** When
   `context.enforce_budget()` would evict the oldest turn pair, instead
   send those turns to `cfg.context.summarizer_model` (default `fast`)
   with "summarize these turns in 2-3 sentences", then replace them
   with a synthetic `role = "system"` turn, placed right after the
   system prompt, that carries the summary.
   Subsequent evictions append to or re-summarize the rolling summary.
   Opt-in via `cfg.context.summarize_on_evict = true`; default off
   (Phase 0 silent eviction stays the default).

**Phase 5 is done when:**

- With `cfg.routing.auto = true`, a prompt like "explain this Python
  traceback ..." gets routed to `deep` while "ls /tmp" or "what time
  is it?" stays on the active model (`fast` in this example), with a
  visible status `[aish] routed to deep`.
- With `cfg.routing.cloud_fallback = true`, killing the local
  llama.cpp upstream and asking a question yields a single retry on
  the cloud preset plus a status line.
- With `cfg.context.summarize_on_evict = true`, a long conversation
  that exceeds `max_turns` no longer silently drops history: the
  evicted span is summarized into a single rolling turn the model
  still sees.
- Existing configs without `cfg.routing` or `cfg.context.summarize_on_evict`
  behave exactly like Phase 4 (Phase 4 regression coverage).

---

## 2. Technology Decisions (delta from Phase 4)

| Decision | Choice | Rationale |
|---|---|---|
| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. Decision is one function call (`router.classify_model`) that returns the model name to use, or nil = keep current. |
| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model`: keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill for v1; it can be added in Phase 6+ if heuristics drift. |
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three is enough for the first cut. The defaults map `code → deep`, `reasoning → cloud`, `default → keep the active model`. User can remap. |
| Fallback trigger | Transport-error pattern match against `err` string: HTTP 5xx, model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached" | These are the five shapes the broker actually emits. Library-error patterns are stable enough that string-match is fine for v1. |
| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
| Fallback announcement | Status line `[aish] local <name> failed (<reason>); retrying via <fallback_name>` | Visibility: the user always knows when a fallback fired. |
| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement*, not an addition; total turn count stays bounded. |
| Summary turn shape | Single rolling `{role = "system", content = "[earlier conversation]\n<summary>", _summary = true}` turn at index 1 (after the system prompt) | One synthetic turn carries all evicted history. New evictions either *append* to it (cheap) or trigger a re-summarize when the summary itself exceeds a char cap (default 2000). |
| Summary model | `cfg.context.summarizer_model` (default `"fast"`) | Same pattern as `cfg.memory.summarizer_model`. Fast model is cheap enough to summarize on every eviction. |
| Summary failure handling | If broker returns nil, fall back to *silent eviction* (Phase 0 behavior) and status-log once. Don't block the user's main request. | Best-effort; never let summarization break the REPL. |

---

## 3. Module Changes

| File | State after Phase 4 | Phase 5 changes |
|---|---|---|
| `router.lua` | `classify(line, config)` → `(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg)`, which returns a model name or nil. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. |
| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (passed in as `opts.summarize_fn` to `Context.new(opts)`) when about to evict. If the callback returns text, insert a `{role="system", _summary=true, content=...}` turn at index 1, or update the existing `_summary` turn if present. If the callback returns nil, fall back to silent eviction. The summarize callback itself lives in repl.lua (because it needs broker.chat). |
| `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.cloud_fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. |
| `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged. Phase 5 composes on top of the existing surface. |
| `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. |

No new module files. All Phase 5 functionality grows existing files,
mostly `repl.lua` and `router.lua`.
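
The per-request switch in item (a) of the `repl.lua` row can be sketched as below. This is a minimal illustration, not the actual `repl.lua` code: the `state` table, the stub `router`, and `resolve_for_request` are hypothetical names. Resolving the model into locals means the session's active model is never mutated, so there is nothing to revert after the request.

```lua
-- Hypothetical stand-ins for the real modules; names are illustrative only.
local cfg = {
  routing = { auto = true },
  models  = { fast = { name = "fast" }, deep = { name = "deep" } },
}
local router = {
  -- Stub classifier: treat anything mentioning "traceback" as a code question.
  classify_model = function(text, _cfg)
    if text:lower():find("traceback", 1, true) then return "deep" end
    return nil
  end,
}

-- Session state: what `:model` last selected.
local state = { active_name = "fast", active_cfg = cfg.models.fast }

-- Resolve the model for ONE request without touching the session state.
local function resolve_for_request(text)
  local name, mcfg = state.active_name, state.active_cfg
  if cfg.routing and cfg.routing.auto then
    local routed = router.classify_model(text, cfg)
    local routed_cfg = routed and cfg.models[routed]
    -- Ignore names that do not resolve to a configured preset.
    if routed_cfg and routed ~= name then
      name, mcfg = routed, routed_cfg
    end
  end
  return name, mcfg
end

local routed_name = resolve_for_request("explain this Python traceback")
print(routed_name, state.active_name)
```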

---

## 4. Routing Heuristics (v1)

`router.classify_model(text, cfg)` returns a model NAME (looked up in
`cfg.routing.classes`) or `nil` (use the user-set active model).

Heuristics, in order; first hit wins:

1. **Code class** if any of:
   - Triple-backtick code fence anywhere
   - Token "traceback" / "stacktrace" / "stack trace" (case-insensitive)
   - Token "error:" or "exception:" near beginning
   - Text contains a path-like prefix (`./`, `/usr`, `~/`) together with
     a source extension (`.py`, `.lua`, `.c`, `.js`, `.go`, `.rs`)
   - More than 4 lines AND has indentation (looks like a paste)

2. **Reasoning class** if any of:
   - Token "explain" / "why" / "how does" / "compare"
   - Question mark + > 100 chars total

3. **Default class** otherwise.

Each class maps to a model name via `cfg.routing.classes`:

```lua
routing = {
  auto = true,
  classes = {
    code      = "deep",   -- code questions to deep
    reasoning = "cloud",  -- reasoning to cloud (best quality)
    default   = nil,      -- nil = keep current active model
  },
  cloud_fallback = true,
  fallback_model = "cloud",
}
```

When `auto = false`, `classify_model` always returns nil, which is
equivalent to not setting a routing block. The heuristic functions live
behind the flag.
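
The ordered heuristics above could be sketched in pure Lua roughly as follows. Several details are assumptions rather than decisions from this manifest: "near beginning" is read as the first 80 characters, keywords are matched on word boundaries, and the helper names (`has_source_path`, `looks_like_paste`, `classify_class`) are hypothetical.

```lua
local M = {}

-- Source extensions named in the heuristic list.
local CODE_EXTS = { "py", "lua", "c", "js", "go", "rs" }

local function has_source_path(text)
  local has_prefix = text:find("./", 1, true) or text:find("/usr", 1, true)
    or text:find("~/", 1, true)
  if not has_prefix then return false end
  for _, ext in ipairs(CODE_EXTS) do
    -- %f[^%w] stops ".c" from matching inside ".cpp".
    if text:find("%." .. ext .. "%f[^%w]") then return true end
  end
  return false
end

local function looks_like_paste(text)
  local _, newlines = text:gsub("\n", "")
  local indented = text:find("^[ \t]+%S") or text:find("\n[ \t]+%S")
  return newlines + 1 > 4 and indented ~= nil
end

local function is_code(text)
  local lower = text:lower()
  if text:find("```", 1, true) then return true end
  if lower:find("traceback", 1, true) or lower:find("stacktrace", 1, true)
    or lower:find("stack trace", 1, true) then
    return true
  end
  -- Assumption: "near beginning" means within the first 80 characters.
  local head = lower:sub(1, 80)
  if head:find("error:", 1, true) or head:find("exception:", 1, true) then
    return true
  end
  return has_source_path(text) or looks_like_paste(text)
end

local function is_reasoning(text)
  local lower = text:lower()
  for _, kw in ipairs({ "explain", "why", "how does", "compare" }) do
    -- Frontier patterns match the keyword only as a whole word.
    if lower:find("%f[%w]" .. kw .. "%f[^%w]") then return true end
  end
  return text:find("?", 1, true) ~= nil and #text > 100
end

-- Hypothetical helper: returns the class name; first hit wins.
function M.classify_class(text)
  if is_code(text) then return "code" end
  if is_reasoning(text) then return "reasoning" end
  return "default"
end

function M.classify_model(text, cfg)
  local routing = cfg and cfg.routing
  if not (routing and routing.auto) then return nil end
  return (routing.classes or {})[M.classify_class(text)]
end

local cfg = { routing = { auto = true,
  classes = { code = "deep", reasoning = "cloud" } } }
print(M.classify_model("explain this Python traceback ...", cfg))
print(M.classify_model("why does the sky look blue", cfg))
print(M.classify_model("ls /tmp", cfg))
```

Keeping the class decision separate from the class-to-model lookup is one way to let a debug command report the class even when the class maps to nil.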

---

## 5. Cloud Fallback Flow

In `repl.ask_ai` after the broker call:

```lua
-- Track whether any delta reached the user: v1 never retries mid-stream (§11).
local saw_delta = false
local function tracked_delta(...)
  saw_delta = true
  return on_delta(...)
end

local ok, err = broker.chat_stream(active_cfg, msgs, tracked_delta, opts)
if not ok and not saw_delta and should_fallback(err, cfg) then
  local fb_name = cfg.routing.fallback_model
  local fb_cfg = fb_name and cfg.models[fb_name]
  if fb_cfg then
    -- Announce only when a retry will actually happen.
    renderer.status(("local %s failed (%s); retrying via %s")
      :format(active_name, fallback_reason(err), fb_name))
    ok, err = broker.chat_stream(fb_cfg, msgs, on_delta, opts)
  end
end
```

`should_fallback(err, cfg)` matches `err` against fallback patterns
ONLY when `cfg.routing.cloud_fallback == true`. Otherwise it returns false.

### Fallback-eligible error patterns

| Pattern | Meaning |
|---|---|
| `HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout) |
| `HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `Couldn'?t resolve host` | DNS / unreachable local broker |
| `Connection refused` | broker not listening |
| `Timeout was reached` | broker too slow |

Errors NOT matched (and therefore NOT retried):
- HTTP 401 / 403 (auth failure; won't get better on cloud)
- HTTP 400 (bad request; schema issue)
- Lua-level errors (broker pipeline bug, not transport)
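
A possible shape for `should_fallback` and `fallback_reason`, using the patterns from the table as Lua patterns. This is a sketch under assumptions: the exact text of the broker's `err` strings is not specified here, and the short reason labels are invented for illustration.

```lua
-- Patterns from the table, each paired with a short label for the status line.
-- The labels are illustrative and not taken from the manifest.
local FALLBACK_PATTERNS = {
  { "HTTP 5%d%d",                "server error" },
  { "HTTP 404.*model_not_found", "model not loaded" },
  { "Couldn'?t resolve host",    "DNS failure" },
  { "Connection refused",        "connection refused" },
  { "Timeout was reached",       "timeout" },
}

-- Returns a label for the first matching pattern, or nil when none match.
local function fallback_reason(err)
  if type(err) ~= "string" then return nil end
  for _, entry in ipairs(FALLBACK_PATTERNS) do
    if err:find(entry[1]) then return entry[2] end
  end
  return nil
end

-- Gated by the opt-in flag; non-transport errors never qualify.
local function should_fallback(err, cfg)
  local routing = cfg and cfg.routing
  if not (routing and routing.cloud_fallback == true) then return false end
  return fallback_reason(err) ~= nil
end

local on = { routing = { cloud_fallback = true } }
print(should_fallback("HTTP 503 Service Unavailable", on))
print(should_fallback("HTTP 401 Unauthorized", on))
```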

---

## 6. Context Summarization on Eviction

`Context.new(opts)` accepts an optional `summarize_fn(turns)` closure
that returns a summary string or `nil`. When it is set AND
`enforce_budget` would evict, the callback is invoked with the evicted
slice; the returned summary (if non-nil) is folded into the rolling
summary turn.
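
For orientation, the matching config fragment might look like the following. Only `summarize_on_evict` and `summarizer_model` are named in this manifest; placing `max_summary_chars` under `context` is an assumption.

```lua
-- Sketch of a config.lua fragment; max_summary_chars placement is assumed.
context = {
  summarize_on_evict = true,    -- default off (Phase 0 silent eviction)
  summarizer_model   = "fast",  -- preset used for the summary call
  max_summary_chars  = 2000,    -- re-summarize the rolling summary above this
},
```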

### Storage shape

Index 1 of `ctx.turns` is reserved for the summary turn when present:

```lua
{ role = "system", _summary = true,
  content = "[earlier conversation summary]\n<summary text>" }
```

`to_messages()` renders this normally (system role; model sees it).

### Summary update flow

1. `enforce_budget` identifies the oldest 2 turns to evict (user + assistant).
2. If `summarize_fn` is set, call it with those 2 turns.
3. If summary text is returned:
   - If `ctx.turns[1]._summary` exists, append the new text to its content.
     The content is capped at `max_summary_chars` (default 2000); when the
     cap is exceeded, re-summarize via the same callback.
   - Else insert a new `_summary` turn at index 1.
4. Remove the evicted turns from `ctx.turns`.
5. If the callback returned nil, step 3 is skipped and the turns are
   evicted silently (Phase 0 behavior).
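
The update flow can be sketched as a small `Context` class. This is an illustration under assumptions, not the real `context.lua`: the `max_turns` option name, the `_absorb` helper, excluding the summary turn from the turn budget, and truncating when re-summarization fails are all choices made for the sketch.

```lua
local Context = {}
Context.__index = Context

local PREFIX = "[earlier conversation summary]\n"

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    turns = {},
    max_turns = opts.max_turns or 20,                  -- assumed option name
    max_summary_chars = opts.max_summary_chars or 2000,
    summarize_fn = opts.summarize_fn,                  -- optional callback
  }, Context)
end

-- Hypothetical helper: fold new summary text into the rolling turn at index 1.
function Context:_absorb(summary)
  local first = self.turns[1]
  if not (first and first._summary) then
    table.insert(self.turns, 1,
      { role = "system", _summary = true, content = PREFIX .. summary })
    return
  end
  local body = first.content:sub(#PREFIX + 1) .. "\n" .. summary
  if #body > self.max_summary_chars then
    -- Re-summarize the summary itself (lossy); truncate if that fails.
    local shorter = self.summarize_fn({ { role = "system", content = body } })
    body = (shorter or body):sub(1, self.max_summary_chars)
  end
  first.content = PREFIX .. body
end

function Context:enforce_budget()
  local function summary_offset()
    return (self.turns[1] and self.turns[1]._summary) and 1 or 0
  end
  -- Assumption: the summary turn does not count toward max_turns.
  while #self.turns - summary_offset() > self.max_turns do
    local start = summary_offset() + 1
    local evicted = { self.turns[start], self.turns[start + 1] }
    local summary = self.summarize_fn and self.summarize_fn(evicted)
    -- Remove the oldest pair; a nil summary means silent eviction.
    table.remove(self.turns, start)
    if self.turns[start] then table.remove(self.turns, start) end
    if summary then self:_absorb(summary) end
  end
end

-- Usage: a mock summarizer, as in the commit-2 test plan.
local calls = 0
local ctx = Context.new({
  max_turns = 2,
  summarize_fn = function(_turns)
    calls = calls + 1
    return ("(summary %d)"):format(calls)
  end,
})
for i = 1, 3 do
  table.insert(ctx.turns, { role = "user", content = "q" .. i })
  table.insert(ctx.turns, { role = "assistant", content = "a" .. i })
  ctx:enforce_budget()
end
print(#ctx.turns, ctx.turns[1].content)
```

After three exchanges with `max_turns = 2`, the context holds the rolling summary plus the newest user/assistant pair.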

### Failure handling

Inside the callback (in `repl.lua`):

```lua
local summary, err = broker.chat(summarizer_cfg, {
  {role="system", content="Summarize the following conversation in 2-3 sentences."},
  {role="user", content=render_turns_compact(evicted)},
}, {max_tokens=300, timeout_ms=30000})
if not summary and not warned_summary_failed then
  -- Status-log once (§2). `warned_summary_failed` is an assumed upvalue.
  warned_summary_failed = true
  renderer.status("summarize failed (" .. tostring(err) .. "); evicting silently")
end
return summary  -- nil propagates; context.lua falls back to silent eviction
```
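
`render_turns_compact` is referenced above but not defined in this manifest. One plausible shape, offered only as a sketch, flattens each turn to a single `role: content` line and caps its length so the summarizer prompt stays small; the 500-character cap is an assumption.

```lua
-- Hypothetical helper: flatten evicted turns into "role: content" lines.
local function render_turns_compact(turns, max_chars_per_turn)
  max_chars_per_turn = max_chars_per_turn or 500  -- assumed cap
  local lines = {}
  for _, turn in ipairs(turns) do
    -- Collapse runs of whitespace so each turn stays on one line.
    local content = tostring(turn.content or ""):gsub("%s+", " ")
    if #content > max_chars_per_turn then
      -- Byte-based cut; may split a multi-byte UTF-8 character.
      content = content:sub(1, max_chars_per_turn) .. "..."
    end
    lines[#lines + 1] = turn.role .. ": " .. content
  end
  return table.concat(lines, "\n")
end

local rendered = render_turns_compact({
  { role = "user", content = "ls  /tmp\nplease" },
  { role = "assistant", content = "CMD: ls /tmp" },
})
print(rendered)
```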

---

## 7. Meta Commands (Phase 5 additions)

| Command | Action |
|---|---|
| `:route on` / `:route off` | Toggle `cfg.routing.auto` at runtime (overrides config) |
| `:route classes` | Show the active class → model mapping |
| `:route check <text>` | Print which class a given text would be routed to (debug aid) |
| `:fallback on` / `:fallback off` | Toggle `cfg.routing.cloud_fallback` at runtime |

`:help` updated.
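
A handler for the `:route` subcommands might look like the sketch below. The function name `handle_route`, its return-a-string convention, and the injected `classify` function are assumptions for illustration; how meta commands are actually dispatched in `repl.lua` is not described in this manifest.

```lua
-- Hypothetical handler: `args` is the text after ":route";
-- `classify` is any function mapping text to a class name.
local function handle_route(args, cfg, classify)
  cfg.routing = cfg.routing or {}
  local sub, rest = args:match("^(%S+)%s*(.*)$")
  if sub == "on" or sub == "off" then
    cfg.routing.auto = (sub == "on")  -- runtime toggle overrides config
    return "routing " .. sub
  elseif sub == "classes" then
    local out = {}
    for _, class in ipairs({ "code", "reasoning", "default" }) do
      local target = (cfg.routing.classes or {})[class] or "(keep active)"
      out[#out + 1] = class .. " -> " .. target
    end
    return table.concat(out, "\n")
  elseif sub == "check" and rest ~= "" then
    return "class: " .. classify(rest)
  end
  return "usage: :route on|off|classes|check <text>"
end

local demo_cfg = { routing = { auto = false, classes = { code = "deep" } } }
local stub = function(_text) return "reasoning" end
print(handle_route("on", demo_cfg, stub))
print(handle_route("classes", demo_cfg, stub))
```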

---

## 8. Migration from Phase 4

User-visible:
- New `:route` and `:fallback` meta commands.
- With `cfg.routing.auto`, the model used may CHANGE per-request as
  the heuristic fires. The prompt color tag could vary to reflect this
  (maybe in Phase 6).
- With `cfg.context.summarize_on_evict`, eviction now spends a fast-model
  round-trip instead of silently dropping turns.

Existing configs without `routing` or `context.summarize_on_evict`
continue exactly as in Phase 4; defaults are OFF.

Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction
marker, `cd` interception, and the entire system-prompt suffix order
from Phase 4 stay the same.

---

## 9. Out of Scope (Phase 5)

Per PHASE0.md §11 these belong to Phase 6:
- Tree-sitter syntax highlighting hooks
- Diff-aware code injection
- Project-level context (file tree summary)

Specifically out of Phase 5:
- LLM-based classification (heuristics-only v1).
- Multi-hop fallback chains (one retry only).
- Per-class temperature overrides (use the model preset's default).
- Cost accounting for cloud calls (Q-list candidate).
- Auto-router learning from user `:model` overrides (Phase 6+).

---

## 10. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
| Q38 | Summary turn placement: index 1 vs index 0 in `self.turns`. `to_messages()` prepends the system prompt fresh on each call, so with index 1 in `self.turns` the system message comes from `to_messages()` and the `_summary` turn follows it as a second `role:"system"` message. Strict templates may reject system/system back-to-back. | context.lua | Phase 5 (analyze) |
| Q39 | When fallback fires AND the user is in Norris mode, does the fallback model also drive the planning loop? Or single-request-only fallback? v1 says single-request only (Norris stays on its configured model). | repl.lua + safety.lua | Phase 5 (plan) |
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past `max_summary_chars`. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
| Q41 | Eligibility patterns for fallback: should HTTP 408 (Request Timeout, distinct from "Timeout was reached") be matched? Some servers emit it. Default: yes, with pattern `HTTP 408`. Phase 7 verify will adjust. | repl.lua | Phase 7 (verify) |
| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry, since model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan) |

---

## 11. Implementation Plan (commit-by-commit)

Five commits expected:

1. **`router.lua`: `classify_model`.** Pure-Lua heuristics; no IO. Returns
   model name or nil. Module-local pattern set so tests can introspect.
   **Test in isolation**: ~30-case corpus of (input → expected class).

2. **`context.lua`: eviction callback.** Add `opts.summarize_fn`,
   `_summary` index-1 turn convention, `to_messages()` rendering
   (which Just Works since `_summary` turns have `role` + `content`).
   **Test in isolation**: mock summarize_fn returning "(summary N)",
   build a context that exceeds budget, verify the summary turn
   appears and accumulates.

3. **`repl.lua`: fallback + routing wiring.** Pre-broker
   classify_model hook (gated by cfg.routing.auto); post-error
   fallback retry (gated by cfg.routing.cloud_fallback); wire
   summarize_fn at Context.new time. **Test against hossenfelder**:
   prompt classified as "code" → routes to deep; deliberately
   misconfigure local endpoint → fallback fires.

4. **`:route` and `:fallback` meta commands.** Standalone: config
   toggles via runtime cmds. **End-to-end**: boot, `:route on`,
   issue a query, observe routing status; `:route off`, query
   again, no routing.

5. **`config.lua`: routing + summarize_on_evict example.**
   Documentation-only; commented-out example block. Final commit.

### Risk / non-obvious

- **Heuristic false positives**: a normal conversational question
  containing the word "explain" gets routed to cloud. Conservative
  defaults (for example `reasoning → nil`, so the user opts in
  explicitly per class) might be safer. The default mapping in §4 is
  aggressive; tone it down at plan time if the user prefers.
- **Active-model state after routing**: the per-request routing
  switches `active_cfg` momentarily. The `prompt()` function reads
  `active_name`, which IS reverted post-request, so the prompt label
  stays accurate.
- **Fallback during streaming**: if the local broker fails MID-stream
  (e.g. emits some text, then a 5xx), the user has already seen partial
  text. Retrying via cloud would duplicate that prefix. v1 only retries
  on errors BEFORE any deltas arrived (detectable by tracking
  whether on_delta was called).
- **Summarize during Norris**: Norris's planning loop generates many
  turns. Eviction during Norris means summarizing mid-plan, so the
  model loses context about its earlier steps. Risky. v1 disables
  summarize when ctx.norris_active.
- **Memory items + summary turn**: both are dynamic system-context
  additions. The summary is `role:"system"` in turns[1]; memory
  is the `[background]` block in the actual system message.
  Compatible, with no overlap.

---
*End of Phase 5 Manifest — aish*