cb2f948e76

Analysis resolves 6 OQs from the formulate:

- executor cfg independent of preplanner cfg (Q-PP1)
- preplan non-streaming for v1 (Q-PP2)
- re-launch fires preplan again, naturally (Q-PP3)
- executor sees goal + current task (Q-PP4)
- :norris introspection out-of-scope v1 (Q-PP5)
- 1-task degenerate case runs as normal (Q-PP6)

Code-reading findings: safety.norris_step signature unchanged (executor
cfg flows in as model_cfg param); NORRIS_SUFFIX_TEMPLATE stays stable
(task hint appends after); renderer.norris_step already accepts descr
(just unused by safety.norris_step today).

Plan: 5 commits — executor / context / safety / repl / config-and-memory.
Each commit verifiable in isolation; the orchestration lights up at C4
(repl preplan wiring); C5 documents.

Sonnet review next (per ~/.claude/projects/.../memory rule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# aish — Phase 10 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 10 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-17

PHASE0 is the locked substrate; PHASE1-9 are layered on top. This
manifest specifies what Phase 10 adds — **Cloud preplanner → local
executor split** for Norris autonomous mode. Resolves Gitea issue #89.

Today Norris runs entirely on ONE model: pick cloud (capable, but slow
and billed on every step) OR local (fast and free per step, but easily
distracted during multi-step planning). Phase 10 splits the planning and
execution roles: the cloud model emits a TASK list ONCE per Norris
session, and the local model executes each task. Most tasks are simple
shell ops the local model handles fine; the cloud model is used only at
the planning layer, where its reasoning pays off.

The PHASE0 §11 amendment adding the Phase 10 row lands in the same commit
as this formulate doc.

---
## 1. Scope of Phase 10
Four pillars:

1. **Preplan call** — on `:norris <goal>` launch, if `cfg.norris.preplanner`
   names a configured model preset, fire ONE broker.chat call against
   that preset with a system prompt asking for `TASK: <imperative>` lines.
   Parse them into a list; cap at `cfg.norris.tasks_max` (default 16).
   Stash the list + current index on ctx (separate from ctx.turns so
   eviction can't lose them — mirrors the ctx.norris_goal anchor).

2. **Executor loop** — `run_norris` already calls `safety.norris_step`
   once per step; extend the step prompt to include the CURRENT task as a
   `[task k/N] <task text>` hint placed alongside the existing NORRIS
   suffix in the system prompt. When all tasks are consumed (or the
   executor signals GOAL: complete early), Norris exits.

3. **Cost + secrets composition** — the preplan call goes through the
   normal `scrub_messages` + usage-recording path. Category
   `"norris-preplan"`; executor steps keep `"norris"`. `:cost detail`
   surfaces both as separate rows.

4. **Graceful fall-back** — if `cfg.norris.preplanner` is unset OR
   the preplan call fails (transport error, parse failure, empty list),
   Norris runs as today: a single model handles both planning and
   execution via the existing in-loop reasoning. No regression for
   users without Phase 10 config.

**Phase 10 is done when:**

- `:norris find files larger than 10MB in /var/log and report sizes`
  launched with `cfg.norris.preplanner = "cloud"` + `cfg.norris.executor = "fast"`:
  1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
     `TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
  2. Status: `[aish] preplanned 3 tasks via cloud`
  3. Per-step execution by `fast`: each step shows the task it's
     working on; the existing HALT protocol still gates destructive ops.
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
  (no regression for existing users).
- Preplan failure (broken cloud endpoint) → status log + fall back
  to single-model Norris.
- `:cost detail` after a Norris session shows BOTH
  `cloud / norris-preplan` (one row) and `<executor model> / norris`
  (one row).

---
## 2. Technology Decisions (delta from Phase 9)
| Decision | Choice | Rationale |
|---|---|---|
| Preplan trigger | ONCE at `:norris <goal>` launch (run_norris in repl.lua) | One round-trip per Norris session keeps cost predictable. Re-planning mid-flight deferred to a future iteration. |
| Preplan model selection | `cfg.norris.preplanner` (string; matches a key in cfg.models) | Same shape as `cfg.safety.llm_model`. Optional; absent = no split, existing behavior. |
| Executor model selection | `cfg.norris.executor` (string; matches cfg.models key) | Optional; absent = active_cfg (the user's `:model` choice at launch — existing behavior). |
| Preplan system prompt | Static template in repl.lua's run_norris (see §4): "Decompose the goal into single-step imperative TASKs. Output format: TASK: <imperative sentence, max 80 chars>. Maximum N tasks." with N = cfg.norris.tasks_max | Predictable parse; small surface. Override via cfg.norris.preplan_system if the user wants. |
| TASK line parsing | Trim each line, then match `^TASK:%s*(.+)$`; filter empty | Same shape as the existing CMD: / DELEGATE: / CMD&: extractors in executor.lua; `extract_*_lines` adapts trivially. |
| Task storage | `ctx.norris_tasks = { current = 1, list = {...} }` (NEW field, separate from ctx.turns) | Survives eviction (mirrors ctx.norris_goal anchor); cleared at Norris exit. |
| Step-prompt synthesis | `safety.norris_step` reads `ctx.norris_tasks.list[current]` and prepends `[task k/N] <text>` to the rendered messages (system block? or synth user turn?). Decision: prepend to the NORRIS suffix already in the system prompt. | Keeps user-turn alternation legal; NORRIS suffix already exists and is re-composed per turn. |
| Per-task advance | After `safety.norris_step` returns "continue", repl.lua's run_norris bumps `ctx.norris_tasks.current`. When current > #list, Norris exits with status "tasks_complete". | Same as the existing step counter; just tied to the task list now. |
| Goal anchor + task layered together | Both visible in the NORRIS suffix: `goal:` line (existing) + `current task k/N:` line (new) | Planner-executor still sees the global goal AND the current focus. |
| Preplan parse failure | Status log + fall back to single-model Norris (no tasks) | Robust; user can re-launch :norris if preplan was wonky. |
| Preplan empty result | Same as parse failure — fall back | Robust. |
| tasks_max cap | Default 16; cfg.norris.tasks_max overrides | Bounded blast radius; matches the existing max_norris_steps cap intent. |
| Cost category | "norris-preplan" for the preplan call; "norris" for executor steps (existing) | `:cost detail` surfaces them as separate rows. |
| Secrets/scrub | Preplan call goes through scrub_messages + rehydrate (matches all other broker calls in repl.lua) | No special-case. |
| Norris HALT protocol | Unchanged — per executor step | Existing safety.is_destructive + halt-proceed/skip/abort still gates. |
| Skip semantics | If user halts and skips at task k, advance to task k+1 (NOT re-try) | Predictable; user can :norris off + relaunch with a refined goal if they need a full re-plan. |

---
## 3. Module Changes
| File | State after Phase 9 | Phase 10 changes |
|---|---|---|
| `repl.lua` | `run_norris(goal)` builds helpers, runs while loop calling safety.norris_step | Pre-loop: if `cfg.norris.preplanner` set, fire one broker.chat against that preset; parse TASK lines; set `ctx.norris_tasks`. Per-iteration: bump `ctx.norris_tasks.current` after each non-terminal result; exit "tasks_complete" when exhausted. |
| `safety.lua` | norris_step composes the NORRIS suffix; uses model_cfg for broker call | Read `ctx.norris_tasks` if set; embed `[task k/N] <text>` into the suffix template OR pass via opts. Use `cfg.norris.executor` (resolved by repl.lua at run_norris launch) for the per-step broker call. |
| `context.lua` | system prompt composition + ctx.norris_active/norris_goal/norris_consecutive_skips | Add `ctx.norris_tasks` field (table or nil); cleared at `run_norris` exit, not by `:reset` (same lifecycle as norris_goal; see §11). NORRIS_SUFFIX_TEMPLATE extended to optionally show current task. |
| `executor.lua` | extract_cmd_lines, extract_cmd_bg_lines, extract_delegate_lines | Add `extract_task_lines(text)` — pure function. |
| `config.lua` | Phase 9 .aish.lua header + existing example blocks | Add commented-out `norris = { preplanner = "cloud", executor = "fast", tasks_max = 16 }` block. |
| `docs/PHASE0.md` | §11 lists phases 0-9 | Amendment: add Phase 10 row. |

No new module files.

---
## 4. Pillar 1 — Preplan call
```lua
-- repl.lua run_norris, pre-loop block:
local tasks
if config.norris and config.norris.preplanner then
  local pre_name = config.norris.preplanner
  local pre_cfg = config.models and config.models[pre_name]
  local cap = config.norris.tasks_max or 16
  if pre_cfg then
    -- "{N}" is an explicit placeholder: a bare gsub on "N" would also
    -- rewrite the capital N in "No prose".
    local sys = config.norris.preplan_system or [[
You are a task decomposer. Given the user's goal, decompose it into a
sequence of single-step imperative TASKs. Output format: one TASK per
line, EXACTLY this shape:

TASK: <imperative sentence, max 80 chars>

Output AT MOST {N} tasks. No prose; no numbering; no commentary outside
the TASK: lines.
]]
    sys = sys:gsub("{N}", tostring(cap))
    local msgs = scrub_messages({
      { role = "system", content = sys },
      { role = "user",   content = goal },
    }, secrets_mode_for(pre_cfg))
    -- Assumed convention (the failure branch relies on it): on error,
    -- broker.chat returns nil and puts the error in the second slot.
    local text, usage = broker.chat(pre_cfg, msgs,
      { category = "norris-preplan",
        max_tokens = 800, timeout_ms = 60000 })
    if text then
      if secrets_session then text = secrets_session:rehydrate(text) end
      if usage then _record_usage(usage.model, usage.category, usage) end
      local parsed = executor.extract_task_lines(text)
      if #parsed > cap then
        -- trim and warn
        for i = #parsed, cap + 1, -1 do parsed[i] = nil end
        renderer.status(("preplan emitted >%d tasks; truncated"):format(cap))
      end
      if #parsed > 0 then
        tasks = parsed
        renderer.status(("preplanned %d tasks via %s"):format(#tasks, pre_name))
      else
        renderer.status("preplan produced no TASK lines; running single-model")
      end
    else
      renderer.status("preplan failed: " .. tostring(usage)
        .. "; running single-model")
    end
  else
    -- Preplanner names a preset missing from cfg.models: log, fall back.
    local why = ("no model preset %q"):format(tostring(pre_name))
    renderer.status("preplan failed: " .. why .. "; running single-model")
  end
end
if tasks then
  ctx.norris_tasks = { current = 1, list = tasks }
end
```
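The pre-loop block calls `executor.extract_task_lines`, which §2 specifies as "trim, match `^TASK:%s*(.+)$`, drop empties". Below is a minimal sketch of that pure function. It assumes the function takes the raw reply string and returns a plain array of task strings; the real extractor would follow whatever shape the existing `extract_*_lines` helpers in executor.lua return. Capping at `tasks_max` stays with the caller. Markdown-numbered replies such as `1. list files` yield no tasks, which is what triggers the single-model fall-back.

```lua
-- Sketch of executor.extract_task_lines (assumed shape: string -> array
-- of task strings). Capping at cfg.norris.tasks_max is the caller's job.
local function trim(s)
  return s:match("^%s*(.-)%s*$")
end

local function extract_task_lines(text)
  local tasks = {}
  if type(text) ~= "string" then
    return tasks
  end
  -- The appended "\n" lets the last line match even without a newline.
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    -- Trimming first tolerates leading whitespace and stray "\r", and
    -- means (.+) can never capture pure whitespace, so empties drop out.
    local body = trim(line):match("^TASK:%s*(.+)$")
    if body then
      tasks[#tasks + 1] = body
    end
  end
  return tasks
end
```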
---
## 5. Pillar 2 — Executor loop
`safety.norris_step` extension: if `ctx.norris_tasks` is set, embed
the current task into the system suffix. The existing while loop in
`run_norris` already calls `norris_step` once per iteration; after
each `result.status == "continue"`, bump
`ctx.norris_tasks.current = ctx.norris_tasks.current + 1`. When
`current > #ctx.norris_tasks.list`, the loop exits with a
synthesized `"tasks_complete"` final status.

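The per-task advance can be isolated as a small pure helper, so the loop body in `run_norris` gains only one call. The sketch below assumes `norris_step` reports a status string. `"continue"` comes from this section; `"skipped"` is a hypothetical name for the HALT-skip outcome, which §2 says also advances to task k+1. Any other status (abort, the early GOAL: complete exit, and so on) passes through untouched, and a `nil` task table leaves single-model Norris unaffected. `advance_task` itself is an illustrative name, not an existing function.

```lua
-- Hypothetical helper for run_norris's loop. "continue" is from the
-- manifest; "skipped" is an assumed name for the HALT-skip outcome.
local ADVANCES = { continue = true, skipped = true }

-- Returns the status the loop should act on after one executor step.
local function advance_task(tasks, status)
  if tasks == nil or not ADVANCES[status] then
    return status                -- no preplan, or a terminal status
  end
  tasks.current = tasks.current + 1
  if tasks.current > #tasks.list then
    return "tasks_complete"      -- every TASK consumed
  end
  return status
end
```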
System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):

```lua
local NORRIS_SUFFIX_TEMPLATE = [[


[NORRIS MODE] You are operating autonomously toward the following goal:

%s

%s

Plan and execute step by step ...
]]

-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
local function compose_norris_suffix(self)
  local task_hint = ""
  if self.norris_tasks and self.norris_tasks.list then
    local k = self.norris_tasks.current
    local n = #self.norris_tasks.list
    if self.norris_tasks.list[k] then
      task_hint = string.format(
        "Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k])
    end
  end
  return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
end
```
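§11 later settles on a different variant for commit 2. `NORRIS_SUFFIX_TEMPLATE` keeps its single goal slot, and a separate `compose_norris_task_hint(self)` output is appended after the formatted suffix. Below is a minimal sketch of that variant, reusing the `Current step k/N` wording from above. The one-line template is a stand-in for the real (elided) text, and `self` is a plain table rather than a Context instance. The helper returns `""` when there are no tasks, when the list is empty, or when the cursor is past the end, matching the C2 cases. It never mutates `self`.

```lua
-- Stand-in for the real template: it keeps a single %s slot for the goal.
local NORRIS_SUFFIX_TEMPLATE = "\n\n[NORRIS MODE] Goal:\n\n%s\n"

-- Returns the task-hint block, or "" when there is nothing to show.
-- Reads self.norris_tasks only; never mutates it.
local function compose_norris_task_hint(self)
  local t = self.norris_tasks
  if not (t and t.list) then
    return ""                    -- no preplan ran
  end
  local k = t.current
  local task = k and t.list[k]
  if not task then
    return ""                    -- empty list, or every task consumed
  end
  return string.format("\nCurrent step %d/%d:\n %s\n", k, #t.list, task)
end

-- The template is formatted exactly as before; the hint is appended after.
local function compose_norris_suffix(self)
  return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
    .. compose_norris_task_hint(self)
end
```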
---
## 6. Pillar 3 — Cost + secrets composition
The preplan call goes through the same `broker.chat` API as the Phase 7
cost-accumulator wiring. `category = "norris-preplan"` tags it for
`:cost detail` separation:

```
[aish] session usage detail (total=$0.000099, 312/45 tokens):
  anthropic/claude-haiku-4.5 norris-preplan 1 calls, 180 / 35 tokens, $0.000099
  qwen-coder-7b-snappy-8k norris 5 calls, 132 / 10 tokens, $0.000000 (local)
[aish] estimated session ctx: 412 tokens; token_budget=4096 (10.1% used)
```
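The two rows come from keying the accumulator on (model, category), and the header totals are the sum over rows. Below is a sketch of that bookkeeping with hypothetical names throughout: `record_usage` stands in for `_record_usage`, and the usage fields `input_tokens`, `output_tokens` and `cost` are assumptions about what `broker.chat` reports. Feeding it the sample's numbers gives two rows totalling 312/45 tokens.

```lua
-- Hypothetical accumulator: one bucket per (model, category) pair, which
-- is what gives :cost detail separate norris-preplan and norris rows.
-- The usage field names (input_tokens, output_tokens, cost) are assumed.
local function record_usage(acc, model, category, usage)
  local key = model .. "\0" .. category
  local row = acc[key]
  if not row then
    row = { model = model, category = category,
            calls = 0, input = 0, output = 0, cost = 0 }
    acc[key] = row
  end
  row.calls  = row.calls + 1
  row.input  = row.input + (usage.input_tokens or 0)
  row.output = row.output + (usage.output_tokens or 0)
  row.cost   = row.cost + (usage.cost or 0)
end

-- Header totals are the plain sum over every row.
local function totals(acc)
  local t = { calls = 0, input = 0, output = 0, cost = 0 }
  for _, row in pairs(acc) do
    t.calls  = t.calls + row.calls
    t.input  = t.input + row.input
    t.output = t.output + row.output
    t.cost   = t.cost + row.cost
  end
  return t
end
```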
The secrets scrub runs before broker.chat sees the messages, and
rehydration runs on the reply: the same path as the other call sites.

---
## 7. Pillar 4 — Graceful fall-back
If `cfg.norris.preplanner` is unset → `tasks = nil` → Norris behaves
as Phase 6 (single-model loop; existing semantics).

If the preplan call fails (transport error, parse failure, empty list) →
status log + `tasks = nil` → same fall-back.

If executor model lookup fails (`cfg.norris.executor` names a
non-existent preset) → status log + use active_cfg (existing
behavior). The user can fix the config and re-launch.

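Q-PP1 makes executor resolution independent of the preplanner, so `run_norris` can pick the per-step `model_cfg` once before the loop and pass it to `safety.norris_step` on every iteration. Below is a minimal sketch. It assumes a `status` callback in place of `renderer.status`. The helper name `resolve_executor_cfg` and the message text are illustrative, not taken from repl.lua.

```lua
-- Hypothetical helper: resolve the executor model_cfg once, pre-loop.
-- `status` stands in for renderer.status.
local function resolve_executor_cfg(config, active_cfg, status)
  local name = config.norris and config.norris.executor
  if not name then
    return active_cfg                  -- unset: existing behavior
  end
  local preset = config.models and config.models[name]
  if preset then
    return preset                      -- applies with or without preplanner
  end
  status(("executor %q not in cfg.models; using active model"):format(name))
  return active_cfg                    -- bad name: log and fall back
end
```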
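If `:reset` is attempted mid-Norris → not possible: inside Norris there's
no readline prompt, so meta commands aren't reachable, and
`Context:reset()` doesn't touch Norris state anyway. `ctx.norris_tasks`
therefore follows the `norris_goal` lifecycle: created at preplan,
cleared at `run_norris` exit (see §11).
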
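§10 says `ctx.norris_tasks` is cleared in `run_norris`'s "finally-equivalent". Lua has no `try`/`finally`. One way to get that guarantee is to run the loop body under `pcall`, reset Norris state unconditionally, then re-raise any error. Below is a sketch with a hypothetical `with_norris_cleanup` wrapper. Resetting `norris_goal` and `norris_active` alongside `norris_tasks` follows §11's lifecycle note, but the exact fields and values `run_norris` resets are an assumption.

```lua
-- Hypothetical wrapper giving run_norris a "finally": Norris state is
-- cleared whether the loop returns normally or raises.
local function with_norris_cleanup(ctx, body)
  local ok, result = pcall(body)   -- keeps body's first return value only
  ctx.norris_tasks = nil
  ctx.norris_goal = nil            -- assumed: cleared with the tasks
  ctx.norris_active = false        -- assumed value for "not running"
  if not ok then
    error(result, 0)               -- level 0: re-raise message unchanged
  end
  return result
end
```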
---
## 8. UX Surface Summary
| Config | Default | Effect |
|---|---|---|
| `cfg.norris.preplanner` | nil | Name of model preset for the preplan call; absent = no split |
| `cfg.norris.executor` | nil (uses active model) | Name of model preset for per-step execution |
| `cfg.norris.tasks_max` | 16 | Cap on TASK list size (parse-time trim) |
| `cfg.norris.preplan_system` | (built-in template) | Override preplan system prompt |

| Case | Startup status |
|---|---|
| (preplan unset) | nothing — existing single-model Norris |
| (preplan success) | `[aish] preplanned N tasks via <preplanner>` |
| (preplan failed) | `[aish] preplan failed: <reason>; running single-model` |
| (preplan over cap) | `[aish] preplan emitted >N tasks; truncated` |

No new meta commands in v1. Inspect via `:cost detail` (separate
norris-preplan row) and the existing `:history` (the preplan call and
reply become assistant turns visible there).

---
## 9. Out of Scope (Phase 10)
- **Mid-flight re-plan** — preplan fires ONCE per Norris launch.
  Re-planning based on per-step results would be a separate iteration;
  for v1 the user can `:norris off` + re-launch with a refined goal.
- **Adaptive task decomposition** — TASKs are fixed at launch; the
  executor doesn't get to refine them. v1 trusts the preplanner's
  output.
- **Multi-step task = sub-tasks** — flat list only. Nested TASK
  hierarchies are a future shape.
- **Skip-then-retry** — skip at HALT advances to the next task; no
  retry mechanism. User re-launches if they need a retry.
- **Per-task model selection** — single executor model for the whole
  session. Per-task routing (e.g. some tasks → cloud, some → local)
  is interesting but a bigger surface; defer.
- **Preplan-while-executing** — sequential: preplan first, THEN
  execute. Streaming overlap is a future optimization.

---
## 10. Risks
| Risk | Mitigation |
|---|---|
| Preplan model emits malformed output (no `TASK:` lines, or wraps in markdown) | extract_task_lines tolerates leading whitespace + ignores non-TASK lines. If zero TASKs parsed, fall back to single-model. |
| Preplanner cost surprises user (silent paid call on every :norris launch) | Phase 7 cost meter accounts it under `norris-preplan` category; warn_at_dollars still fires. Default = unset (no automatic cost). |
| Task list is wrong / off-goal | Executor still has the global GOAL in the NORRIS suffix; can deviate per-step. Skip-budget per Phase 3 still escalates. User retains `:norris off` abort. |
| Local executor can't actually do a planned step (model too weak) | Same as today's Norris-on-local case — model emits something useless; HALT prompt lets user skip or abort. Phase 10 doesn't fix this; preplan + execute split makes the failure mode more visible (you can SEE which TASK is stuck). |
| ctx.norris_tasks survives across non-:reset session boundaries | Cleared at Norris exit (in run_norris's finally-equivalent) so re-launching Norris in same session starts fresh. |
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |

---
## 11. Open Questions — RESOLVED (analyze step)
| # | Question | Resolution |
|---|---|---|
| Q-PP1 | `cfg.norris.executor` applies even without preplanner? | **YES.** Resolving the executor is independent of preplan. If `cfg.norris.executor` names a valid preset, `run_norris` uses it for `safety.norris_step` regardless of preplanner state. Preplanner unset + executor set = "always use cloud-haiku for Norris steps even though my interactive `:model` is qwen-coder". Useful split. |
| Q-PP2 | Stream the preplan TASKs as they're emitted? | **NO (v1 = non-streaming).** Use `broker.chat` (non-streaming) for preplan. Preplan emits ~16 × ~10 tokens = ~160 tokens total; on cloud Haiku that's <2s. Print the full TASK list at completion (`[aish] preplanned N tasks via cloud`) rather than streaming letter-by-letter. Streaming adds latency variance + screen flicker for a sub-2s win. Reconsider if real-world preplan latency exceeds 5s. |
| Q-PP3 | Re-launch fires preplan again? | **YES, naturally.** Each `:norris <goal>` re-enters `run_norris`. The pre-loop preplan block runs (different goal → different decomposition). `ctx.norris_tasks` is overwritten. No special re-launch logic needed; falls out of lifecycle. |
| Q-PP4 | Executor sees full goal AND current task? | **BOTH.** Goal anchor in NORRIS suffix (existing) + a NEW optional task-hint block appended right after. The executor planner can use the goal to detect off-track tasks and adjust its CMD: emission. |
| Q-PP5 | `:norris` (no args) reports tasks state? | **No — out-of-scope v1.** Inside Norris there's no readline prompt; meta commands aren't reachable. After exit, `ctx.norris_tasks` is cleared. The renderer's per-step `[step k/N: <task>]` line is the user-facing readout. Reconsider if users ask for a "task plan preview before execution" mode. |
| Q-PP6 | 1-task degenerate case? | **Run as normal, no special case.** Functionally identical to single-model Norris (executor sees goal + single TASK hint). Preplanner cost is the only delta. Acceptable. |

**Additional findings from code reading:**

- `safety.norris_step(ctx, model_cfg, ...)` takes `model_cfg` as a parameter. **Implication:** `run_norris` resolves the executor cfg ONCE pre-loop and passes it in every iteration. No signature change to safety.lua. The "executor" is just a different `model_cfg` than `active_cfg`.
- `Context:reset()` does NOT touch `norris_goal`/`norris_active` (Norris state is owned by `run_norris`, set on entry + cleared on exit). `ctx.norris_tasks` follows the same lifecycle: created at preplan, cleared at `run_norris` exit, NOT by `:reset` (which is unreachable mid-Norris anyway).
- `NORRIS_SUFFIX_TEMPLATE` has one `%s` slot for goal. Don't change the template; **append** a `compose_norris_task_hint(self)` helper output AFTER the formatted suffix. Keeps the template stable; the hint block is additive.
- Preplan call lives in `repl.lua` (not `safety.lua`) — keeps safety's invariant "single broker round-trip per call". Repl already orchestrates multi-call flows (Norris loop, secrets rehydration, routing); preplan is one more pre-loop hook.
- The renderer needs a per-step prefix showing `[step k/N: <task>]`. `renderer.norris_step` already has `descr` in its signature, `(n, max_n, descr)`, per the helpers contract (line 339 of safety.lua), but today it is only called with `(n, max_n)`. Phase 10 wiring fills that gap.

---
## 11b. Plan — commit-by-commit roadmap (5 commits)
| # | Commit subject | Files | Why this slice |
|---|---|---|---|
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). |
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. |
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |

Each commit must leave the tree in a state where `luajit main.lua` runs
and existing tests pass. Commits 1-3 add code that nothing calls yet,
commit 4 lights it up, and commit 5 documents.

### Per-commit verification
- **C1**: 6 inline unit cases for `extract_task_lines`: empty input → {}, single TASK → {it}, mixed CMD+TASK → only TASKs, leading whitespace tolerated, blank lines ignored, > tasks_max → caller's job to cap (the function itself just parses). Tests run from the repo root.
- **C2**: 5 inline unit cases for `compose_norris_task_hint`: nil tasks → "", empty list → "", current=1 of 3 → contains "step 1/3", current > #list → "" (completed), full to_messages render with tasks shows the hint in system content. self.turns and self.norris_tasks are left unmutated.
- **C3**: safety_test snapshot still 87/87 (no behavior change for the no-tasks path). Manual run of single-model Norris to confirm no regression.
- **C4**: E2E with cfg.norris.preplanner=cloud + executor=fast. Goal: `find files larger than 10MB in /var/log and report sizes`. Verify preplan emits 2-5 tasks; executor runs each. :cost detail shows two model rows. Fall-back E2E with preplanner pointing to a bogus model → status log + normal Norris.
- **C5**: visual inspection of config.lua. MEMORY.md + project_phase_status.md updated to "Phase 0-10 done".

### Resolved review tickets folded into the plan
(None yet — Sonnet review runs after this manifest is committed.)

---
## 12. Phase 10 → Phase 11+ Out-of-band
Candidate follow-ups (non-binding):

- **Phase 11**: cross-session cost rollup (Phase 7 §12 option 1 —
  long-deferred).
- **Cost preflight enforcement** (Phase 7 §12 option 2 — also
  long-deferred; Phase 8's accurate counts are the prerequisite).
- **Mid-flight Norris re-plan** — preplanner gets to re-decompose
  based on executor progress. Real value, but needs careful
  state-machine design (when to re-plan, how to preserve
  already-completed work).
- **Per-task model selection** — a task could carry a model hint
  emitted by the preplanner.

Phase 10 itself is self-contained: it depends only on Phase 3 (Norris)
and Phase 7 (cost accumulator), both already implemented.