2e389c1475
Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:
B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
dispatching on the two args:
(nil, [turns]) — first-time summarize
(str, [turns]) — additive (extend prior summary with new evictions)
(str, nil) — compress (re-summarize the prior summary itself)
C1: re-summarize trigger now uses the (str, nil) compress signal
rather than degenerate (str, {}).
C2: routing decision is taken once on entry to ask_ai. The chosen
active_cfg is used for every tool-sub-loop iteration. Original
active_cfg restored after ask_ai returns.
C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
at :norris launch time; planner stays on it for every iteration.
Q39 resolved. Per-iteration fallback still gated by
cfg.routing.fallback — retries the failing call against cloud
without permanently switching the planner.
C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
the [background] block). Both are "earlier context" the planner
generally doesn't need.
C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
resolved) and "Operation timed out" (libcurl version variant).
Dropped "HTTP response code said error" from A2 — FAILONERROR
was removed in Phase 4 f26cbd9.
NITs folded:
N1 :route check <text> always runs heuristic; suffix
"(routing currently disabled)" when cfg.routing.auto = false
N2 reasoning → nil by default (not → "cloud"); user explicitly
opts in to map reasoning to a paid model. Same cost-safety
rationale as confirm_cmd default true.
N3 "Retry only when no deltas have arrived" promoted to §5
normative rule (was in §11 risk row).
N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
align with the :fallback meta verb.
Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
441 lines
23 KiB
Markdown
# aish — Phase 5 Manifest

**Project:** aish, an AI-augmented conversational shell
**Document:** Phase 5 Requirements, Architecture & Design Decisions
**Status:** Plan (review fold-in 2026-05-13: callback signature, Norris suppression, cost defaults resolved)
**Date:** 2026-05-13

**Review fold-in (2026-05-13):**

R-B1. **Summary callback signature canonical**: the closure is
`summarize_fn(prior_summary, evicted_turns) -> string | nil`.
`prior_summary` is `nil` on the first-ever summarize; otherwise it is
the current `ctx.summary` string. `evicted_turns` is `nil` for
the re-summarize-compress trigger (C1 resolution); otherwise it is the
array of evicted turn tables. The closure dispatches:

- first-time: `prior=nil, evicted=[...]` → "summarize these turns"
- additive: `prior=str, evicted=[...]` → "extend the prior summary"
- compress: `prior=str, evicted=nil` → "compress the prior summary"

R-C2. **Routing taken once per ask_ai**: the model decision happens
on entry to `ask_ai`. The chosen `active_cfg` is used for every
iteration of the tool-call sub-loop. The original `active_cfg` is
restored after `ask_ai` returns. The decision is NOT made per broker call.

R-C3. **AUTO-routing does NOT fire inside Norris**: `run_norris`
operates on a fixed model (whatever the user set via `:model`
before launching). The auto-router would otherwise switch models
mid-plan, which loses planning continuity and costs tokens
rebuilding context. Stated explicitly in §4 and §10.

R-C4. **Summary block suppressed under Norris**: mirrors Phase 4
R-C1 (`[background]` suppression). Both blocks are "earlier context"
the planner generally doesn't need mid-iteration. §6 and §3 reflect this.

R-C5. **Fallback pattern coverage**:

- Add `HTTP 408` to §5 patterns (Q41 moves from open to resolved).
- Add `Operation timed out` (curl variant of "Timeout was reached").
- Drop "HTTP response code said error" from A2. FAILONERROR was
  removed in Phase 4 commit `f26cbd9`, so this shape no longer fires.

NITs folded:

N1. `:route check <text>` always runs the heuristic regardless of
`cfg.routing.auto`. The debug aid surfaces the class, the would-be
model, and a "(routing currently disabled)" suffix when auto is off.

N2. **`reasoning → nil` by default**: the v1 heuristic that maps
"explain" / "why" / "how does" to a model is too aggressive for a
paid default, so it is paired with `nil = keep current` semantics.
The user must EXPLICITLY map `routing.classes.reasoning = "cloud"` to
send reasoning prompts to a paid API. Same cost-safety rationale as
`cfg.routing.auto = false`.

N3. "Retry only when no deltas have arrived" promoted to a normative
rule in §5 (was in the §11 risk row).

N4. Config key renamed `cfg.routing.cloud_fallback` →
`cfg.routing.fallback` to align with the `:fallback` meta verb.
Single-source naming.

**Analyze findings (2026-05-13):**

A1. **router.lua surface clean**: it is already a pure-Lua module with
`M.classify(line, config) -> (kind, payload)`. Adding
`M.classify_model(text, cfg) -> name | nil` is a natural sibling.
No structural refactor.

A2. **broker error message shapes** all carry stage prefixes
(`transport:`, `api:`, `broker:`) that the fallback matcher must
account for. The actual shapes callers see:

- `"transport: HTTP %d%d%d: <body-snippet>"` (from `post_sse`, status >= 400)
- `"transport: Timeout was reached"`
- `"transport: Couldn't resolve host"`
- `"transport: Connection refused"`
- `"api: <error.message>"` (SSE-framed error envelope)
- `"broker: model_cfg.endpoint and .model required"` (config bug)

Fallback patterns in §5 should match against the `"transport: "`
prefix explicitly. `"api: ..."` errors don't fall back (they're
semantic: bad request shape, not server failure). `"broker: ..."`
errors don't fall back either (config bug).

A3. **Q38 resolved at analyze**: placing the rolling summary as
`turns[1]` with `role:"system"` would produce system/system
back-to-back in `to_messages` output (msg[1] is the composed
system prompt; msg[2] would be the summary as another system
message). Strict templates may reject this the same way they reject
user/user (PHASE0 §6). Resolution: render the summary INSIDE the
composed system message (same pattern as the `[background]` and
NORRIS blocks). Storage stays simple: keep the summary text on
`ctx.summary` (NOT in `ctx.turns`) and append it to the system prompt
in `to_messages` alongside the `[background]` and NORRIS blocks.
§6 and §3 reflect this.

PHASE0 is the locked substrate; PHASE1-4 are layered on top. This manifest
specifies what Phase 5 adds: **multi-model routing**, **cloud fallback**,
and **context summarization on eviction**.

---

## 1. Scope of Phase 5

Three pillars per PHASE0.md §11 row 5:

1. **Multi-model routing by task type**: `router.lua` is extended with a
   per-request `classify_model(text, cfg)` that suggests a model
   preset based on lightweight heuristics over the user input.
   Opt-in via `cfg.routing.auto = true`; default off (explicit `:model`
   stays the only switch).

2. **Cloud fallback on local failure**: when the active broker call
   returns `nil, err` for a transport reason that looks like
   "local backend down" (HTTP 502 / 503 / 404 model-not-found /
   libcurl connection-refused / timeout), automatically retry once
   against the configured fallback preset (`cfg.routing.fallback_model`,
   default `"cloud"`), surfacing a status line so the user knows what
   happened. Opt-in via `cfg.routing.fallback = true` (renamed from
   `cloud_fallback`, N4); default off (single-shot only).

3. **Context summarization on eviction**: when
   `context.enforce_budget()` would evict the oldest turn pair, instead
   send those turns to the `fast` model (or `cfg.context.summarizer_model`)
   with "summarize these turns in 2-3 sentences", then fold the result
   into a rolling summary stored on `ctx.summary` and rendered inside
   the system message (A3). Subsequent evictions extend or re-summarize
   the rolling summary. Opt-in via `cfg.context.summarize_on_evict = true`;
   default off (Phase 0 silent eviction stays the default).

**Phase 5 is done when:**

- With `cfg.routing.auto = true`, a prompt like "explain this Python
  traceback ..." gets routed to `deep` while "ls /tmp" or "what time
  is it?" stays on `fast`, with a visible status `[aish] routed to deep`.
- With `cfg.routing.fallback = true`, killing the local
  llama.cpp upstream and asking a question yields a single retry on
  the fallback preset plus a status line.
- With `cfg.context.summarize_on_evict = true`, a long conversation
  that exceeds `max_turns` no longer silently drops history: the
  evicted span is folded into a single rolling summary the model
  still sees.
- Existing configs without `cfg.routing` or `cfg.context.summarize_on_evict`
  behave exactly like Phase 4 (Phase 4 regression coverage).

---

## 2. Technology Decisions (delta from Phase 4)

| Decision | Choice | Rationale |
|---|---|---|
| Routing trigger | Per-request, in `repl.ask_ai`, BEFORE the broker call | Same hook point as the tool-sub-loop entry. The decision is one function call (`router.classify_model`) that returns the model name, or nil = keep current. It is taken once per `ask_ai` (R-C2). |
| Classification mechanism | **Pure-Lua heuristics** in `router.classify_model`: keyword/length thresholds, no LLM call | Fast (no network), deterministic, debuggable. An LLM-based classifier is overkill for v1; it can be added in Phase 6+ if heuristics drift. |
| Routing classes (v1) | `code`, `reasoning`, `default` → mapped to model presets via `cfg.routing.classes` | Three classes for the first cut. **Defaults (N2 fold-in)**: `code → "deep"`, `reasoning → nil` (heuristic still fires but no override unless the user maps it), `default → nil`. The aggressive `reasoning → "cloud"` default sent ordinary "why does ..." prompts to a paid API; the user must opt in explicitly to pay for reasoning. Same cost-safety rationale as `cfg.routing.auto = false`. |
| Routing cost-safety | `cfg.routing.auto = false` default | Same rationale as `confirm_cmd = true` and `llm_second_opinion = true`: default-on routing maps "explain ..." prompts to whatever class maps to `"cloud"`, spending paid-API tokens on prompts the user typed for what they thought was their local model. Default off; user opts in. |
| Fallback trigger | Transport-error pattern match against the `err` string: HTTP 5xx, HTTP 408, HTTP 404 model_not_found, "Connection refused", "Couldn't resolve host", "Timeout was reached", "Operation timed out" | These are the transport-stage shapes the broker actually emits (A2, R-C5). Library-error patterns are stable enough that string-match is fine for v1. |
| Fallback target | `cfg.routing.fallback_model` (default `"cloud"` when present) | One-hop fallback only; if cloud also fails, surface the error normally. No retry loops. |
| Fallback timing | **Only retry when no deltas have arrived yet** (N3 fold-in) | If the local broker emits partial text then 5xx's mid-stream, the user has seen prose; retrying via cloud would duplicate the prefix and confuse the user. The retry path checks an `any_delta` flag set in the `on_delta` callback and only retries when it is false. |
| Fallback announcement | Status line `[aish] local <name> failed (<reason>); retrying via <fallback_name>` | Visibility: the user always knows when a fallback fired. |
| Summarize trigger | Inside `context.enforce_budget()`, when it would otherwise `table.remove` | Same place the eviction status fires. The summarize is a *replacement*, not an addition; total turn count stays bounded. |
| Summary storage shape | Single rolling string on `ctx.summary`, rendered as an `[earlier conversation summary]` block inside the system message (A3); NOT a turn | One string carries all evicted history. New evictions extend it through the additive call; when it exceeds a char cap (`max_summary_chars`, default 2000) a compress call re-summarizes it. |
| Summary model | `cfg.context.summarizer_model` (default `"fast"`) | Same pattern as `cfg.memory.summarizer_model`. The fast model is cheap enough to summarize on every eviction. |
| Summary failure handling | If the broker returns nil, fall back to *silent eviction* (Phase 0 behavior) and status-log once. Don't block the user's main request. | Best-effort; never let summarization break the REPL. |

---

## 3. Module Changes

| File | State after Phase 4 | Phase 5 changes |
|---|---|---|
| `router.lua` | `classify(line, config)` → `(kind, payload)` for shell/AI/meta dispatch | Add `M.classify_model(text, cfg) -> name \| nil`. Heuristics: line length > N, presence of code-fence backticks, keywords like "traceback", "stacktrace", "explain", "why does", etc. Returns the model NAME (string) or nil = keep current. |
| `context.lua` | turns + memory_items + Norris suffix | Extend `enforce_budget()` to invoke a callback (`opts.summarize_fn`, passed via `Context.new(opts)`) when about to evict. Store the returned summary as `ctx.summary` (string), NOT a turn (A3: avoids system/system alternation). `to_messages` composes it into the system message alongside `[background]` and NORRIS, between them: `system → [background] → [earlier summary] → NORRIS`; the summary block is suppressed under Norris (R-C4). New evictions extend `ctx.summary` through the additive call `(prior_summary, evicted_turns)`; when its length exceeds `max_summary_chars` (default 2000), the callback is invoked AGAIN with `(ctx.summary, nil)`, the compress signal (C1), to re-summarize. Silent eviction is the fallback when the callback returns nil. |
| `repl.lua` | tool-sub-loop + meta + memory injection | (a) Pre-broker hook: if `cfg.routing.auto`, call `router.classify_model(text, cfg)` and switch `active_cfg` for THIS request only (revert after). (b) Post-broker error hook: if err matches a fallback pattern AND `cfg.routing.fallback`, retry against the fallback model once. (c) Wire `Context.new` with a `summarize_fn = function(prior_summary, evicted_turns) ... end` closure that calls `broker.chat(cfg.models[cfg.context.summarizer_model], ..., {max_tokens=300})`. |
| `broker.lua` | streaming + opts.tools/max_tokens/timeout_ms | Unchanged. Phase 5 composes on top of the existing surface. |
| `config.lua` | example with mcp/safety/memory blocks | Add commented-out `routing = {...}` and `context.summarize_on_evict = true` example. |

No new module files. All Phase 5 functionality grows existing files,
mostly `repl.lua` and `router.lua`.

---

## 4. Routing Heuristics (v1)

`router.classify_model(text, cfg)` returns a model NAME (looked up in
`cfg.routing.classes`) or `nil` (use the user-set active model).

Heuristics, in order; first hit wins:

1. **Code class** if any of:
   - Triple-backtick code fence anywhere
   - Token "traceback" / "stacktrace" / "stack trace" (case-insensitive)
   - Token "error:" or "exception:" near the beginning
   - Text contains a path-like `./|/usr|~/` + `.py|.lua|.c|.js|.go|.rs`
   - More than 4 lines AND has indentation (looks like a paste)

2. **Reasoning class** if any of:
   - Token "explain" / "why" / "how does" / "compare"
   - Question mark + > 100 chars total

3. **Default class** otherwise.

Each class maps to a model name via `cfg.routing.classes`:

```lua
routing = {
  auto = true,
  classes = {
    code      = "deep",   -- code questions to deep
    reasoning = "cloud",  -- explicit opt-in to a paid model; the default is nil (N2)
    default   = nil,      -- nil = keep current active model
  },
  fallback = true,        -- renamed from cloud_fallback (N4)
  fallback_model = "cloud",
}
```

When `auto = false`, `classify_model` always returns nil, which is
equivalent to not setting a routing block. For routing purposes the
heuristics live behind the flag; only `:route check` runs them
regardless (N1). Auto-routing also never fires inside the Norris
loop (R-C3).

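The ordered heuristics in this section could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped `router.lua`: the helper names (`looks_like_code`, `looks_like_reasoning`, `classify_class`), the 80-character window standing in for "near the beginning", and the exact Lua patterns are all hypothetical. Only `classify_model(text, cfg)` and the `cfg.routing` keys come from the manifest.

```lua
-- Sketch only: helper names, thresholds and patterns are assumptions.
local M = {}

local FENCE = ("`"):rep(3)  -- triple backtick, built indirectly
local CODE_EXTS = { "%.py", "%.lua", "%.c", "%.js", "%.go", "%.rs" }
local REASONING_KEYWORDS = { "explain", "why", "how does", "compare" }

local function looks_like_code(text)
  local lower = text:lower()
  if text:find(FENCE, 1, true) then return true end
  if lower:find("traceback", 1, true) or lower:find("stacktrace", 1, true)
     or lower:find("stack trace", 1, true) then
    return true
  end
  -- "near the beginning" is taken to mean the first 80 characters (assumed).
  local head = lower:sub(1, 80)
  if head:find("error:", 1, true) or head:find("exception:", 1, true) then
    return true
  end
  -- Path-like prefix plus a known source-file extension.
  if text:find("./", 1, true) or text:find("/usr", 1, true)
     or text:find("~/", 1, true) then
    for _, ext in ipairs(CODE_EXTS) do
      if text:find(ext .. "%f[%W]") then return true end
    end
  end
  -- More than 4 lines with at least one indented line looks like a paste.
  local _, newlines = text:gsub("\n", "")
  return newlines >= 4 and text:find("\n[ \t]+%S") ~= nil
end

local function looks_like_reasoning(text)
  local lower = text:lower()
  for _, kw in ipairs(REASONING_KEYWORDS) do
    if lower:find("%f[%w]" .. kw .. "%f[%W]") then return true end
  end
  return #text > 100 and text:find("?", 1, true) ~= nil
end

-- Class only; usable by `:route check` even when auto-routing is off (N1).
function M.classify_class(text)
  if looks_like_code(text) then return "code" end
  if looks_like_reasoning(text) then return "reasoning" end
  return "default"
end

-- Model name for the class, or nil to keep the active model.
function M.classify_model(text, cfg)
  local routing = cfg and cfg.routing
  if not (routing and routing.auto) then return nil end
  local classes = routing.classes or {}
  return classes[M.classify_class(text)]
end
```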

---

## 5. Cloud Fallback Flow

In `repl.ask_ai` after the broker call:

```lua
local ok, err = broker.chat_stream(active_cfg, msgs, on_delta, opts)
-- N3: retry only if no delta has arrived (any_delta is set inside on_delta).
if not ok and not any_delta and should_fallback(err, cfg) then
  local fb_cfg = cfg.models[cfg.routing.fallback_model]
  if fb_cfg then
    -- Announce only when a retry will actually happen.
    renderer.status(("local %s failed (%s); retrying via %s")
      :format(active_name, fallback_reason(err),
              cfg.routing.fallback_model))
    ok, err = broker.chat_stream(fb_cfg, msgs, on_delta, opts)
  end
end
```

`should_fallback(err, cfg)` matches `err` against fallback patterns
ONLY when `cfg.routing.fallback == true`. Otherwise it returns false.
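
The "retry only when no deltas have arrived" rule (N3) might be wired as a small wrapper around the streaming call. The sketch below is hypothetical: `stream_with_fallback`, its parameter list, and the injected `chat_stream` and `eligible` functions are illustrative stand-ins for `broker.chat_stream` and `should_fallback`, not names from the codebase.

```lua
-- Sketch only: retries once against the fallback preset, and only if the
-- first attempt failed before emitting any delta.
local function stream_with_fallback(chat_stream, active_cfg, fb_cfg,
                                    msgs, on_delta, opts, eligible)
  local any_delta = false
  local function tracking_delta(chunk)
    any_delta = true          -- the user has now seen output
    return on_delta(chunk)
  end

  local ok, err = chat_stream(active_cfg, msgs, tracking_delta, opts)
  if ok or any_delta or not fb_cfg or not eligible(err) then
    return ok, err, false     -- third value: whether a retry happened
  end

  ok, err = chat_stream(fb_cfg, msgs, tracking_delta, opts)
  return ok, err, true
end
```

Passing the broker function in as an argument keeps the wrapper testable without a network; a real integration would also emit the status line before the second call.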

### Fallback-eligible error patterns

All patterns match against the err string AS IT ARRIVES from broker.lua,
which is prefixed `"transport: "` for libcurl/HTTP issues (A2 confirmed).
The matcher strips the prefix before testing.

| Pattern (after prefix strip) | Meaning |
|---|---|
| `^HTTP 5%d%d` | server-side error (502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout) |
| `^HTTP 404.*model_not_found` | the routed model isn't loaded on the local backend |
| `^HTTP 408` | Request Timeout (gateway-level; some proxies emit this; Q41 resolved) |
| `Couldn'?t resolve host` | DNS / unreachable local broker |
| `Connection refused` | broker not listening |
| `Timeout was reached` | libcurl's internal timeout phrasing |
| `Operation timed out` | curl variant of timeout (libcurl version-dependent) |

Errors NOT matched (NOT retried):

- HTTP 401 / 403 (auth failure; won't get better on cloud)
- HTTP 400 (bad request; schema issue)
- `^api:` errors (semantic; bad request shape)
- `^broker:` errors (config bug; endpoint/model missing)
- Lua-level errors (broker pipeline bug, not transport)
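
A matcher over the pattern table above could look like the sketch below. It assumes the renamed `cfg.routing.fallback` key (N4) and the `"transport: "` prefix described in A2; the `FALLBACK_PATTERNS` table name and the function body are illustrative, not the actual `repl.lua` code. Errors without the `transport:` prefix (`api:`, `broker:`, Lua-level) never match because the prefix capture fails first.

```lua
-- Sketch only: patterns are tested after the "transport: " prefix is stripped.
local FALLBACK_PATTERNS = {
  "^HTTP 5%d%d",
  "^HTTP 404.*model_not_found",
  "^HTTP 408",
  "Couldn'?t resolve host",
  "Connection refused",
  "Timeout was reached",
  "Operation timed out",
}

local function should_fallback(err, cfg)
  local routing = cfg and cfg.routing
  if not (routing and routing.fallback) then return false end
  if type(err) ~= "string" then return false end
  -- Only transport-stage errors qualify; "api: " and "broker: " do not.
  local rest = err:match("^transport: (.*)$")
  if not rest then return false end
  for _, pat in ipairs(FALLBACK_PATTERNS) do
    if rest:find(pat) then return true end
  end
  return false
end
```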

---

## 6. Context Summarization on Eviction

`Context.new(opts)` accepts an optional
`summarize_fn(prior_summary, evicted_turns) -> string | nil` closure
(R-B1). When set AND `enforce_budget` would evict, the callback
is invoked with the prior summary and the evicted slice; the returned
summary (if non-nil) replaces `ctx.summary`.

### Storage shape (post-A3 resolution)

The rolling summary lives on `ctx.summary` (a string), NOT in `ctx.turns`:

```lua
ctx.summary = "Earlier conversation: user discussed X, asked about Y, "
           .. "agreed to Z. Later asked..."
```

`to_messages()` composes it into the system message between `[background]`
and the NORRIS suffix:

```
DEFAULT_SYSTEM_PROMPT

[background]  (memory items)
- (fact) ...

[earlier conversation summary]
<ctx.summary>

[NORRIS MODE]  (if active)
...
```

No new role:"system" message at turns[1], which avoids system/system
alternation. Under Norris the summary block is suppressed, mirroring the
`[background]` suppression from Phase 4 R-C1 (R-C4).
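
One way the composition order could be implemented is sketched below. Only `ctx.summary` and `ctx.norris_active` appear in this manifest; `compose_system` and the `system_prompt`, `background`, and `norris_suffix` fields are hypothetical names chosen for the illustration. The sketch also assumes, per R-C4 and Phase 4 R-C1, that both the `[background]` and summary blocks are dropped while Norris is active.

```lua
-- Sketch only: builds one system message, so no system/system alternation.
local function compose_system(ctx)
  local parts = { ctx.system_prompt }
  local in_norris = ctx.norris_active == true

  if not in_norris and ctx.background and ctx.background ~= "" then
    parts[#parts + 1] = "[background]\n" .. ctx.background
  end
  if not in_norris and ctx.summary and ctx.summary ~= "" then
    parts[#parts + 1] = "[earlier conversation summary]\n" .. ctx.summary
  end
  if in_norris and ctx.norris_suffix then
    parts[#parts + 1] = ctx.norris_suffix
  end

  -- Blocks are separated by one blank line, matching the layout above.
  return table.concat(parts, "\n\n")
end
```

Because the summary is appended to the single system message, the turn list itself is untouched and strict chat templates see the same role sequence as in Phase 4.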

### Summary update flow

1. `enforce_budget` identifies the oldest 2 turns to evict (user + assistant).
2. If `summarize_fn` is set, call it with `(prior_summary, evicted_turns)`.
3. If summary text is returned:
   - Replace `ctx.summary` with the new text.
   - If `#ctx.summary > max_summary_chars` (default 2000), invoke the
     callback once more with `(ctx.summary, nil)`, the compress signal
     (C1), to re-summarize for compactness. Lossy by design; Q40
     documents this trade-off.
4. Remove the evicted turns from `ctx.turns`.
5. If the callback returned nil → silent eviction; `ctx.summary` unchanged.
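
The five steps above can be sketched as a loop over the oldest turn pair. This is an illustration under assumptions rather than the `context.lua` implementation: storing `max_turns`, `max_summary_chars`, and `summarize_fn` directly on `ctx` is a simplification, and keeping the uncompressed text when the compress call returns nil is a choice made for the sketch.

```lua
-- Sketch only: field layout on ctx is assumed for illustration.
local function enforce_budget(ctx)
  while #ctx.turns > ctx.max_turns do
    -- Step 1: oldest pair, one user turn and the assistant reply.
    local evicted = { ctx.turns[1], ctx.turns[2] }

    -- Step 2: summarization is skipped while Norris is active (see §11 risks).
    if ctx.summarize_fn and not ctx.norris_active then
      local summary = ctx.summarize_fn(ctx.summary, evicted)
      if summary then
        -- Step 3: compress with the (str, nil) signal when over the cap.
        if #summary > ctx.max_summary_chars then
          summary = ctx.summarize_fn(summary, nil) or summary
        end
        ctx.summary = summary
      end
      -- Step 5: a nil result means silent eviction; ctx.summary is untouched.
    end

    -- Step 4: drop the evicted pair either way, so the turn count stays bounded.
    table.remove(ctx.turns, 1)
    table.remove(ctx.turns, 1)
  end
end
```

Eviction proceeds even when summarization fails, which keeps the Phase 0 budget guarantee intact.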

### Failure handling

Inside the callback (in `repl.lua`):

```lua
local summary, err = broker.chat(summarizer_cfg, {
  {role="system", content="Summarize the following conversation in 2-3 sentences."},
  {role="user",   content=render_turns_compact(evicted)},
}, {max_tokens=300, timeout_ms=30000})
return summary  -- nil propagates; context.lua falls back to silent eviction
```
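
The three-way dispatch from R-B1 could sit in front of that broker call roughly as sketched here. Everything in the block is an assumption made for illustration: `build_summary_messages` and `make_summarize_fn` are hypothetical names, the additive and compress prompt wordings are invented, `render_turns_compact` is a guess at the helper referenced above, and `chat` is an injected stand-in for `broker.chat`.

```lua
-- Sketch only: prompt texts and helper names are illustrative.
local function render_turns_compact(turns)
  local lines = {}
  for _, t in ipairs(turns) do
    lines[#lines + 1] = t.role .. ": " .. t.content
  end
  return table.concat(lines, "\n")
end

-- Returns (mode, messages), or nil when there is nothing to summarize.
local function build_summary_messages(prior_summary, evicted_turns)
  if prior_summary == nil and evicted_turns then
    return "first-time", {
      { role = "system", content = "Summarize the following conversation in 2-3 sentences." },
      { role = "user", content = render_turns_compact(evicted_turns) },
    }
  elseif prior_summary and evicted_turns then
    return "additive", {
      { role = "system", content = "Extend the prior summary with the new turns; stay brief." },
      { role = "user", content = "Prior summary:\n" .. prior_summary
          .. "\n\nNew turns:\n" .. render_turns_compact(evicted_turns) },
    }
  elseif prior_summary then
    return "compress", {
      { role = "system", content = "Compress the following summary into 2-3 sentences." },
      { role = "user", content = prior_summary },
    }
  end
  return nil
end

local function make_summarize_fn(chat, summarizer_cfg)
  return function(prior_summary, evicted_turns)
    local _, msgs = build_summary_messages(prior_summary, evicted_turns)
    if not msgs then return nil end
    -- Keep only the first return value; nil means "fall back to silent eviction".
    local summary = chat(summarizer_cfg, msgs, { max_tokens = 300, timeout_ms = 30000 })
    return summary
  end
end
```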

---

## 7. Meta Commands (Phase 5 additions)

| Command | Action |
|---|---|
| `:route on` / `:route off` | Toggle `cfg.routing.auto` at runtime (overrides config) |
| `:route classes` | Show the active class → model mapping |
| `:route check <text>` | Print which class and model a given text would be routed to (debug aid). Always runs the heuristic; appends "(routing currently disabled)" when `cfg.routing.auto = false` (N1) |
| `:fallback on` / `:fallback off` | Toggle `cfg.routing.fallback` at runtime |

`:help` is updated to list these commands.
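
As a rough illustration of the N1 behaviour of `:route check`, the handler might format its answer as below. The function name `route_check`, the injected `classify_class` callback, and the `class=... model=...` output format are all assumptions; the manifest only fixes that the heuristic always runs and that the "(routing currently disabled)" suffix appears when auto is off.

```lua
-- Sketch only: output format and names are hypothetical.
local function route_check(text, cfg, classify_class)
  local routing = cfg.routing or {}
  local class = classify_class(text)            -- runs even when auto is off
  local model = (routing.classes or {})[class] or "(keep current)"
  local line = ("class=%s model=%s"):format(class, model)
  if not routing.auto then
    line = line .. " (routing currently disabled)"
  end
  return line
end
```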

---

## 8. Migration from Phase 4

User-visible:

- New `:route` and `:fallback` meta commands.
- With `cfg.routing.auto`, the active model may CHANGE per-request as
  the heuristic fires. The prompt color tag could vary to reflect this
  (possibly in Phase 6).
- With `cfg.context.summarize_on_evict`, eviction now spends a
  fast-model round-trip instead of silently dropping turns.

Existing configs without `routing` or `context.summarize_on_evict`
continue exactly as in Phase 4; the defaults are OFF.

Substrate (PHASE0.md §3) invariants: unchanged. The `CMD:` extraction
marker, `cd` interception, and the entire system-prompt suffix order
from Phase 4 stay the same.

---

## 9. Out of Scope (Phase 5)

Per PHASE0.md §11 these belong to Phase 6:

- Tree-sitter syntax highlighting hooks
- Diff-aware code injection
- Project-level context (file tree summary)

Specifically out of Phase 5:

- LLM-based classification (heuristics-only v1).
- Multi-hop fallback chains (one retry only).
- Per-class temperature overrides (use the model preset's default).
- Cost accounting for cloud calls (Q-list candidate).
- Auto-router learning from user `:model` overrides (Phase 6+).

---

## 10. Open Questions

| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q37 | Should routing apply to `:ask <text>` (explicit AI route) the same way it does to bare prompts? Yes seems obvious but worth documenting. | repl.lua | Phase 5 (plan) |
| Q38 | ~~Summary turn placement: index 1 vs index 0~~ | context.lua | **Resolved at analyze (A3)**: NEITHER. The summary lives on `ctx.summary` (string) and composes into the SYSTEM MESSAGE alongside `[background]` and the NORRIS suffix. No new role:"system" message; no alternation risk. |
| Q39 | ~~Fallback under Norris~~ | repl.lua + safety.lua | **Resolved at review (R-C3)**: AUTO-routing does NOT fire inside the Norris loop. The model is fixed at `:norris <goal>` launch time; the planner stays on it for every iteration. Per-iteration fallback (if a local broker call inside Norris fails) is still gated by `cfg.routing.fallback`; that retries the failed call against cloud but doesn't permanently switch the planner. |
| Q40 | Summarizer recursion: the summary itself might be summarized later when it grows past `max_summary_chars`. Does the re-summarize lose fidelity? Probably yes; acceptable trade-off. Note the lossy-by-design contract in §6. | context.lua | Phase 5 (verify) |
| Q41 | ~~HTTP 408 / Operation timed out eligibility~~ | repl.lua | **Resolved at review (R-C5)**: both added to §5 patterns. |
| Q42 | Auto-router decisions inside the tool-call sub-loop: does each sub-iteration re-classify, or does the first user turn fix the model for the whole sub-loop? Proposal: fix at sub-loop entry. Model switching mid-tool-call would confuse the model AND cost tokens by rebuilding context. | repl.lua | Phase 5 (plan); R-C2 adopts the proposal (decision taken once on entry to `ask_ai`). |
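
The "decide once, restore afterwards" behaviour from R-C2 and Q42 can be sketched as a scope helper around the body of `ask_ai`. The names `with_routed_model`, `state`, and `body` are hypothetical, and `classify_model` is passed in rather than required so the sketch stays self-contained; only `active_cfg`, `active_name`, and `cfg.models` come from the manifest. The `pcall` is there so the original model is restored even if the sub-loop raises.

```lua
-- Sketch only: routes once on entry, holds the model for the whole
-- tool-call sub-loop, then restores the user's model.
local function with_routed_model(state, cfg, text, classify_model, body)
  local saved_name, saved_cfg = state.active_name, state.active_cfg

  -- R-C3: no auto-routing while the Norris loop is active.
  local routed = not state.norris_active and classify_model(text, cfg) or nil
  if routed and cfg.models[routed] then
    state.active_name, state.active_cfg = routed, cfg.models[routed]
  end

  local ok, a, b = pcall(body, state)   -- body runs every sub-loop iteration
  state.active_name, state.active_cfg = saved_name, saved_cfg
  if not ok then error(a, 0) end
  return a, b
end
```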

---

## 11. Implementation Plan (commit-by-commit)

Five commits expected:

1. **`router.lua`: `classify_model`.** Pure-Lua heuristics; no IO. Returns
   a model name or nil. Module-local pattern set so tests can introspect.
   **Test in isolation**: ~30-case corpus of (input → expected class).

2. **`context.lua`: eviction callback.** Add `opts.summarize_fn`,
   `ctx.summary` string storage (A3), and `to_messages()` rendering of
   the `[earlier conversation summary]` block inside the system message.
   **Test in isolation**: mock summarize_fn returning "(summary N)",
   build a context that exceeds budget, verify `ctx.summary` is set
   and accumulates.

3. **`repl.lua`: fallback + routing wiring.** Pre-broker
   classify_model hook (gated by cfg.routing.auto); post-error
   fallback retry (gated by cfg.routing.fallback); wire
   summarize_fn at Context.new time. **Test against hossenfelder**:
   prompt classified as "code" → routes to deep; deliberately
   misconfigure the local endpoint → fallback fires.

4. **`:route` and `:fallback` meta commands.** Standalone: config
   toggles via runtime commands. **End-to-end**: boot, `:route on`,
   issue a query, observe routing status; `:route off`, query
   again, no routing.

5. **`config.lua`: routing + summarize_on_evict example.**
   Documentation-only; commented-out example block. Final commit.

### Risk / non-obvious

- **Heuristic false positives**: a normal conversational question
  containing the word "explain" gets classified as reasoning.
  Mitigated by N2: `reasoning → nil` by default, so nothing is routed
  to cloud unless the user maps the class explicitly.
- **Active-model state after routing**: the per-request routing
  switches `active_cfg` momentarily. The `prompt()` function reads
  `active_name`, which IS reverted post-request, so the prompt label
  stays accurate.
- **Fallback during streaming**: if the local broker fails MID-stream
  (e.g. emits some text then 5xx), the user has already seen partial
  text. Retrying via cloud means a duplicated prefix. v1 only retries
  on errors BEFORE any deltas arrived (detected by tracking
  whether on_delta was called); this is the normative §5 rule (N3).
- **Summarize during Norris**: Norris's planning loop generates many
  turns. Eviction during Norris means summarizing mid-plan, so the
  model loses context about its earlier steps. Risky. v1 disables
  summarize when ctx.norris_active.
- **Memory items + summary**: both are dynamic system-context
  additions rendered inside the system message: memory as the
  `[background]` block, the summary as the
  `[earlier conversation summary]` block after it (A3).
  Compatible; no overlap.

---

*End of Phase 5 Manifest — aish*