diff --git a/docs/PHASE0.md b/docs/PHASE0.md
index 810b52b..14d76a0 100644
--- a/docs/PHASE0.md
+++ b/docs/PHASE0.md
@@ -325,6 +325,7 @@ from somewhere else.
 | **7** | Cost / usage observability: broker captures `usage` + `cost`; per-session accumulator on ctx; `:cost` reporter; optional warn thresholds |
 | **8** | Accurate tokenization: per-endpoint `/tokenize` probe (cached); `broker.token_count`; `Context:estimate_tokens` widened; `:cost detail` est-vs-actual annotation |
 | **9** | Project-local config overlay (`.aish.lua` walk-up from cwd to $HOME, sha256-pinned trust prompt, shallow merge over user config); `:config show` meta |
+| **10** | Cloud preplanner + local executor split for Norris (`cfg.norris.preplanner` emits TASK list once; `cfg.norris.executor` runs each step); `extract_task_lines`; `ctx.norris_tasks` anchor (survives eviction); cost category `"norris-preplan"` |
 
 ---
 
diff --git a/docs/PHASE10.md b/docs/PHASE10.md
new file mode 100644
index 0000000..7dd5edf
--- /dev/null
+++ b/docs/PHASE10.md
@@ -0,0 +1,331 @@
+# aish — Phase 10 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 10 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analyze)
+**Date:** 2026-05-17
+
+PHASE0 is the locked substrate; PHASE1-9 are layered on top. This
+manifest specifies what Phase 10 adds — **Cloud preplanner → local
+executor split** for Norris autonomous mode. Resolves Gitea issue #89.
+
+Today Norris runs entirely on ONE model: pick cloud (capable but slow
+per step + costs per step) OR local (fast + free per step but easily
+distracted on multi-step planning). Phase 10 splits the planning and
+execution roles: cloud emits a TASK list ONCE per Norris session;
+local model executes each task. Most tasks are simple shell ops the
+local model handles fine; cloud is used only at the planning layer
+that benefits from its reasoning.
+
+PHASE0 §11 amendment to add Phase 10 row lands in the same commit
+as this formulate doc.
+
+---
+
+## 1. Scope of Phase 10
+
+Four pillars:
+
+1. **Preplan call** — on `:norris ` launch, if `cfg.norris.preplanner`
+   names a configured model preset, fire ONE broker.chat call against
+   that preset with a system-prompt asking for `TASK: ` lines.
+   Parse them into a list; cap at `cfg.norris.tasks_max` (default 16).
+   Stash the list + current index on ctx (separate from ctx.turns so
+   eviction can't lose them — mirrors the ctx.norris_goal anchor).
+
+2. **Executor loop** — `safety.norris_step` already iterates per-step;
+   extend its prompt to include the CURRENT task. Synthesize a user-
+   turn-shaped `[task k/N] ` block fed alongside the
+   existing NORRIS suffix. When all tasks consumed (or executor signals
+   GOAL: complete early), Norris exits.
+
+3. **Cost + secrets composition** — preplan call goes through the
+   normal scrub_messages + on_delta usage callbacks. Category
+   `"norris-preplan"`; executor steps keep `"norris"`. `:cost detail`
+   surfaces both as separate rows.
+
+4. **Graceful fall-back** — if `cfg.norris.preplanner` is unset OR
+   the preplan call fails (transport err, parse failure, empty list),
+   Norris runs as today: single model handles both planning and
+   execution via the existing in-loop reasoning. No regression for
+   users without Phase 10 config.
+
+**Phase 10 is done when:**
+
+- `:norris find files larger than 10MB in /var/log and report sizes`
+  launched with `cfg.norris.preplanner = "cloud"` + `cfg.norris.executor
+  = "fast"`:
+  1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
+     `TASK: stat -c "%n %s" `; `TASK: format and report`).
+  2. Status: `[aish] preplanned 3 tasks via cloud`
+  3. Per-step execution by `fast`: each step shows the task it's
+     working on; existing HALT protocol still gates destructive ops.
+- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
+  (no regression for existing users).
+- Preplan failure (broken cloud endpoint) → status log + fall back
+  to single-model Norris.
+- `:cost detail` after a Norris session shows BOTH
+  `cloud / norris-preplan` (one row) and ` / norris`
+  (one row).
+
+---
+
+## 2. Technology Decisions (delta from Phase 9)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Preplan trigger | ONCE at `:norris ` launch (run_norris in repl.lua) | One round-trip per Norris session keeps cost predictable. Re-planning mid-flight deferred to a future iteration. |
+| Preplan model selection | `cfg.norris.preplanner` (string; matches a key in cfg.models) | Same shape as `cfg.safety.llm_model`. Optional; absent = no split, existing behavior. |
+| Executor model selection | `cfg.norris.executor` (string; matches cfg.models key) | Optional; absent = active_cfg (the user's `:model` choice at launch — existing behavior). |
+| Preplan system prompt | Static template baked into safety.lua: "Decompose the goal into single-step imperative TASKs. Output format: TASK: . Maximum N tasks." with N = cfg.norris.tasks_max | Predictable parse; small surface. Override via cfg.norris.preplan_system if user wants. |
+| TASK line parsing | `^TASK:%s*(.+)$` per line; trim whitespace; filter empty | Same shape as the existing CMD: / DELEGATE: / CMD&: extractors in executor.lua. Trivially adapt extract_*_lines. |
+| Task storage | `ctx.norris_tasks = { current = 1, list = {...} }` (NEW field, separate from ctx.turns) | Survives eviction (mirrors ctx.norris_goal anchor); cleared at Norris exit. |
+| Step-prompt synthesis | `safety.norris_step` reads `ctx.norris_tasks.list[current]` and prepends `[task k/N] ` to the rendered messages (system block? or synth user turn?). Decision: prepend to the NORRIS suffix already in the system prompt. | Keeps user-turn alternation legal; NORRIS suffix already exists and is per-turn re-composed. |
+| Per-task advance | After `safety.norris_step` returns "continue", repl.lua's run_norris bumps `ctx.norris_tasks.current`. When current > #list, Norris exits with status "tasks_complete". | Same as the existing step counter; just tied to the task list now. |
+| Goal anchor + task layered together | Both visible in the NORRIS suffix: `goal:` line (existing) + `current task k/N:` line (new) | Planner-executor still sees the global goal AND the current focus. |
+| Preplan parse failure | Status log + fall back to single-model Norris (no tasks) | Robust; user can re-launch :norris if preplan was wonky. |
+| Preplan empty result | Same as parse failure — fall back | Robust. |
+| tasks_max cap | Default 16; cfg.norris.tasks_max overrides | Bounded blast radius; matches the existing max_norris_steps cap intent. |
+| Cost category | "norris-preplan" for the preplan call; "norris" for executor steps (existing) | `:cost detail` surfaces them as separate rows. |
+| Secrets/scrub | Preplan call goes through scrub_messages + rehydrate (matches all other broker calls in repl.lua) | No special-case. |
+| Norris HALT protocol | Unchanged — per executor step | Existing safety.is_destructive + halt-proceed/skip/abort still gates. |
+| Skip semantics | If user halts and skips at task k, advance to task k+1 (NOT re-try) | Predictable; user can :norris off + relaunch with refined goal if they need full re-plan. |
+
+---
+
+## 3. Module Changes
+
+| File | State after Phase 9 | Phase 10 changes |
+|---|---|---|
+| `repl.lua` | `run_norris(goal)` builds helpers, runs while loop calling safety.norris_step | Pre-loop: if `cfg.norris.preplanner` set, fire one broker.chat against that preset; parse TASK lines; set `ctx.norris_tasks`. Per-iteration: bump `ctx.norris_tasks.current` after each non-terminal result; exit "tasks_complete" when exhausted. |
+| `safety.lua` | norris_step composes the NORRIS suffix; uses model_cfg for broker call | Read `ctx.norris_tasks` if set; embed `[task k/N] ` into the suffix template OR pass via opts. Use `cfg.norris.executor` (resolved by repl.lua at run_norris launch) for the per-step broker call. |
+| `context.lua` | system prompt composition + ctx.norris_active/norris_goal/norris_consecutive_skips | Add `ctx.norris_tasks` field (table or nil); clear on :reset (matches norris_goal lifecycle). NORRIS_SUFFIX_TEMPLATE extended to optionally show current task. |
+| `executor.lua` | extract_cmd_lines, extract_cmd_bg_lines, extract_delegate_lines | Add `extract_task_lines(text)` — pure function. |
+| `config.lua` | Phase 9 .aish.lua header + existing example blocks | Add commented-out `norris = { preplanner = "cloud", executor = "fast", tasks_max = 16 }` block. |
+| `docs/PHASE0.md` | §11 lists phases 0-9 | Amendment: add Phase 10 row. |
+
+No new module files.
+
+---
+
+## 4. Pillar 1 — Preplan call
+
+```lua
+-- repl.lua run_norris, pre-loop block:
+local tasks
+if config.norris and config.norris.preplanner then
+  local pre_name = config.norris.preplanner
+  local pre_cfg = config.models and config.models[pre_name]
+  if pre_cfg then
+    local sys = (config.norris and config.norris.preplan_system) or [[
+You are a task decomposer. Given the user's goal, decompose it into a
+sequence of single-step imperative TASKs. Output format: one TASK per
+line, EXACTLY this shape:
+
+  TASK: 
+
+Output AT MOST N tasks. No prose; no numbering; no commentary outside
+the TASK: lines.
+]]
+    sys = sys:gsub("%f[%w]N%f[%W]", tostring(config.norris.tasks_max or 16)) -- standalone N only; a bare "N" pattern would also rewrite the "N" in "No prose"
+    local msgs = scrub_messages({
+      { role = "system", content = sys },
+      { role = "user", content = goal },
+    }, secrets_mode_for(pre_cfg))
+    local text, usage = broker.chat(pre_cfg, msgs,
+      { category = "norris-preplan",
+        max_tokens = 800, timeout_ms = 60000 })
+    if text then
+      if secrets_session then text = secrets_session:rehydrate(text) end
+      if usage then _record_usage(usage.model, usage.category, usage) end
+      local parsed = executor.extract_task_lines(text)
+      local cap = config.norris.tasks_max or 16
+      if #parsed > cap then
+        -- trim and warn
+        for i = #parsed, cap + 1, -1 do parsed[i] = nil end
+        renderer.status(("preplan emitted >%d tasks; truncated"):format(cap))
+      end
+      if #parsed > 0 then
+        tasks = parsed
+        renderer.status(("preplanned %d tasks via %s"):format(#tasks, pre_name))
+      else
+        renderer.status("preplan produced no TASK lines; running single-model")
+      end
+    else -- text is nil: the second return value (named `usage` here) is assumed to carry the error
+      renderer.status("preplan failed: " .. tostring(usage)
+        .. "; running single-model")
+    end
+  end
+end
+if tasks then
+  ctx.norris_tasks = { current = 1, list = tasks }
+end
+```
+
+---
+
+## 5. Pillar 2 — Executor loop
+
+`safety.norris_step` extension: if `ctx.norris_tasks` is set, embed
+the current task into the system suffix. The existing while loop in
+`run_norris` already calls `norris_step` once per iteration; after
+each `result.status == "continue"`, bump
+`ctx.norris_tasks.current = ctx.norris_tasks.current + 1`. When
+`current > #ctx.norris_tasks.list`, the loop exits with a
+synthesized `"tasks_complete"` final status.
+
+System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):
+
+```lua
+local NORRIS_SUFFIX_TEMPLATE = [[
+
+
+[NORRIS MODE] You are operating autonomously toward the following goal:
+
+  %s
+
+%s
+
+Plan and execute step by step ...
+]]
+
+-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
+local function compose_norris_suffix(self)
+  local task_hint = ""
+  if self.norris_tasks and self.norris_tasks.list then
+    local k = self.norris_tasks.current
+    local n = #self.norris_tasks.list
+    if self.norris_tasks.list[k] then
+      task_hint = string.format(
+        "Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k])
+    end
+  end
+  return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
+end
+```
+
+---
+
+## 6. Pillar 3 — Cost + secrets composition
+
+Preplan call goes through the same `broker.chat` API as Phase 7 cost-
+accumulator wiring. `category = "norris-preplan"` tags it for
+`:cost detail` separation:
+
+```
+[aish] session usage detail (total=$0.000119, 312/45 tokens):
+  anthropic/claude-haiku-4.5  norris-preplan  1 calls, 180 / 35 tokens, $0.000099
+  qwen-coder-7b-snappy-8k     norris          5 calls, 132 / 10 tokens, $0.000000 (local)
+[aish] estimated session ctx: 412 tokens; token_budget=4096 (10.1% used)
+```
+
+Secrets scrub fires before broker.chat sees the messages; rehydrate
+on reply — same path as other call sites.
+
+---
+
+## 7. Pillar 4 — Graceful fall-back
+
+If `cfg.norris.preplanner` is unset → `tasks = nil` → Norris behaves
+as Phase 6 (single-model loop; existing semantics).
+
+If preplan call fails (transport err, parse failure, empty list) →
+status log + `tasks = nil` → same fall-back.
+
+If executor model lookup fails (`cfg.norris.executor` names a
+non-existent preset) → status log + use active_cfg (existing
+behavior). User can fix config and re-launch.
+
+If `:reset` clears the conversation mid-Norris → existing behavior
+clears turns; `ctx.norris_tasks` should ALSO clear since the goal
+context is gone. Document in §9.
+
+---
+
+## 8. UX Surface Summary
+
+| Config | Default | Effect |
+|---|---|---|
+| `cfg.norris.preplanner` | nil | Name of model preset for the preplan call; absent = no split |
+| `cfg.norris.executor` | nil (uses active model) | Name of model preset for per-step execution |
+| `cfg.norris.tasks_max` | 16 | Cap on TASK list size (parse-time trim) |
+| `cfg.norris.preplan_system` | (built-in template) | Override preplan system prompt |
+
+| Startup status | Behavior |
+|---|---|
+| (preplan unset) | nothing — existing single-model Norris |
+| (preplan success) | `[aish] preplanned N tasks via ` |
+| (preplan failed) | `[aish] preplan failed: ; running single-model` |
+| (preplan over cap) | `[aish] preplan emitted >N tasks; truncated` |
+
+No new meta commands in v1. Inspect via `:cost detail` (separate
+norris-preplan row) and the existing `:history` (preplan call + reply
+become assistant turns visible there).
+
+---
+
+## 9. Out of Scope (Phase 10)
+
+- **Mid-flight re-plan** — preplan fires ONCE per Norris launch.
+  Re-plan based on per-step results would be a separate iteration;
+  user can `:norris off` + re-launch with refined goal for v1.
+- **Adaptive task decomposition** — TASKs are fixed at launch; the
+  executor doesn't get to refine them. v1 trusts the preplanner's
+  parse.
+- **Multi-step task = sub-tasks** — flat list only. Nested TASK
+  hierarchies are a future shape.
+- **Skip-then-retry** — skip at HALT advances to the next task; no
+  retry mechanism. User re-launches if they need a retry.
+- **Per-task model selection** — single executor model for the whole
+  session. Per-task routing (e.g. some tasks → cloud, some → local)
+  is interesting but bigger surface; defer.
+- **Preplan-while-executing** — sequential: preplan first, THEN
+  execute. Streaming overlap is a future optimization.
+
+---
+
+## 10. Risks
+
+| Risk | Mitigation |
+|---|---|
+| Preplan model emits malformed output (no `TASK:` lines, or wraps in markdown) | extract_task_lines tolerates leading whitespace + ignores non-TASK lines. If zero TASKs parsed, fall back to single-model. |
+| Preplanner cost surprises user (silent paid call on every :norris launch) | Phase 7 cost meter accounts it under `norris-preplan` category; warn_at_dollars still fires. Default = unset (no automatic cost). |
+| Task list is wrong / off-goal | Executor still has the global GOAL in the NORRIS suffix; can deviate per-step. Skip-budget per Phase 3 still escalates. User retains `:norris off` abort. |
+| Local executor can't actually do a planned step (model too weak) | Same as today's Norris-on-local case — model emits something useless; HALT prompt lets user skip or abort. Phase 10 doesn't fix this; preplan + execute split makes the failure mode more visible (you can SEE which TASK is stuck). |
+| ctx.norris_tasks survives across non-:reset session boundaries | Cleared at Norris exit (in run_norris's finally-equivalent) so re-launching Norris in same session starts fresh. |
+| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
+| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
+| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
+
+---
+
+## 11. Open Questions (Phase 10)
+
+| # | Question | Impact | Resolution target |
+|---|---|---|---|
+| Q-PP1 | Should `cfg.norris.executor` ALSO apply when `cfg.norris.preplanner` is unset (i.e., "always use this model for Norris regardless of :model selection")? | Norris model selection scope | Analyze |
+| Q-PP2 | Preplan call uses `broker.chat` (non-streaming) — for cloud Haiku that's ~2-3 seconds before Norris begins. Should we stream the TASKs as they're emitted (chat_stream) so the user sees the plan forming? | UX latency | Analyze (probably non-streaming is fine; 16 TASKs at 5 tokens each = 80 tokens; ~1s on cloud) |
+| Q-PP3 | If user runs `:norris foo` then later `:norris bar` in the same session, does preplanning fire again? | Re-launch semantics | Analyze (yes — each :norris launch fires preplan + populates new ctx.norris_tasks) |
+| Q-PP4 | Does the executor see the FULL goal AND current task, or just current task? | Executor context | Analyze (both; per §5 the NORRIS suffix composes both) |
+| Q-PP5 | Should `:norris` (without args) report `ctx.norris_tasks` state for an active session? | Introspection | Analyze (small ergonomics; could be a v2 polish) |
+| Q-PP6 | What if preplan returns 1 task ("Do the thing") that's basically the whole goal? Worth running? | Degenerate case | Analyze (yes — degenerate case still works; one-task execution is identical to no-preplan single-model run, just with an extra ~1s cost. Don't special-case.) |
+
+---
+
+## 12. Phase 10 → Phase 11+ Out-of-band
+
+Candidate follow-ups (non-binding):
+
+- **Phase 11**: cross-session cost rollup (Phase 7 §12 option 1 —
+  long-deferred).
+- **Cost preflight enforcement** (Phase 7 §12 option 2 — also long-
+  deferred; Phase 8's accurate counts are the prerequisite).
+- **Mid-flight Norris re-plan** — preplanner gets to re-decompose
+  based on executor progress. Real value, but needs careful
+  state-machine design (when to re-plan, how to preserve already-
+  completed work).
+- **Per-task model selection** — task could carry a model hint
+  emitted by the preplanner.
+
+Phase 10 itself is self-contained — depends on Phase 3 (Norris) +
+Phase 7 (cost accumulator) which are both implemented.
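
Reviewer sketch (not part of the patch): the manifest names `executor.extract_task_lines(text)` as a new pure function but does not show a body. The following is one possible shape, assuming the behavior described in §2 and §10: match `TASK:` at the start of a line, tolerate leading whitespace, trim the task text, drop empty entries, and ignore every other line. The function body and the sample reply are illustrative assumptions, not the project's implementation.

```lua
-- Hypothetical sketch of executor.extract_task_lines; the real function
-- may differ. Returns an array of task strings parsed from a reply.
local function extract_task_lines(text)
  local tasks = {}
  -- Append "\n" so the final line is captured even when the reply has
  -- no terminating newline; "\r?" tolerates CRLF line endings.
  for line in ((text or "") .. "\n"):gmatch("(.-)\r?\n") do
    -- Leading whitespace is allowed; the TASK: prefix itself is strict,
    -- so prose and markdown-numbered lines are skipped.
    local body = line:match("^%s*TASK:%s*(.-)%s*$")
    if body and body ~= "" then
      tasks[#tasks + 1] = body
    end
  end
  return tasks
end

-- Illustrative reply mixing valid TASK lines with noise.
local reply = table.concat({
  "Here is my plan:",
  "  TASK: find /var/log -size +10M",
  "1. list the files",
  "TASK:   ",
  "TASK: format and report  ",
}, "\n")

local tasks = extract_task_lines(reply)
for i, task in ipairs(tasks) do
  print(i, task)
end
```

Only the two well-formed lines survive in this sample; the prose line, the numbered line, and the empty `TASK:` line are dropped, which is the condition under which §7 falls back to single-model Norris when nothing at all parses.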