# aish — Phase 10 Manifest

**Project:** aish — AI-augmented conversational shell
**Document:** Phase 10 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-17

PHASE0 is the locked substrate; PHASE1-9 are layered on top. This
manifest specifies what Phase 10 adds — **Cloud preplanner → local
executor split** for Norris autonomous mode. Resolves Gitea issue #89.

Today Norris runs entirely on ONE model: pick cloud (capable but slow
per step + costs per step) OR local (fast + free per step but easily
distracted on multi-step planning). Phase 10 splits the planning and
execution roles: cloud emits a TASK list ONCE per Norris session;
local model executes each task. Most tasks are simple shell ops the
local model handles fine; cloud is used only at the planning layer
that benefits from its reasoning.

PHASE0 §11 amendment to add Phase 10 row lands in the same commit
as this formulate doc.

---

## 1. Scope of Phase 10

Four pillars:
1. **Preplan call** — on `:norris <goal>` launch, if `cfg.norris.preplanner`
   names a configured model preset, fire ONE broker.chat call against
   that preset with a system-prompt asking for `TASK: <imperative>` lines.
   Parse them into a list; cap at `cfg.norris.tasks_max` (default 16).
   Stash the list + current index on ctx (separate from ctx.turns so
   eviction can't lose them — mirrors the ctx.norris_goal anchor).
2. **Executor loop** — `safety.norris_step` already iterates per-step;
   extend its prompt to include the CURRENT task by appending a
   `Current step k/N: <task text>` hint after the existing NORRIS
   suffix (see §5). When all tasks are consumed (or the executor signals
   GOAL: complete early), Norris exits.
3. **Cost + secrets composition** — preplan call goes through the
   normal scrub_messages + on_delta usage callbacks. Category
   `"norris-preplan"`; executor steps keep `"norris"`. `:cost detail`
   surfaces both as separate rows.
4. **Graceful fall-back** — if `cfg.norris.preplanner` is unset OR
   the preplan call fails (transport err, parse failure, empty list),
   Norris runs as today: single model handles both planning and
   execution via the existing in-loop reasoning. No regression for
   users without Phase 10 config.

**Phase 10 is done when:**

- `:norris find files larger than 10MB in /var/log and report sizes`
  launched with `cfg.norris.preplanner = "cloud"` +
  `cfg.norris.executor = "fast"`:
  1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
     `TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
  2. Terminal output: `[aish] preplanned 3 tasks via cloud` (R8: was "Status:")
  3. Per-step execution by `fast`: each step shows the task it's
     working on; existing HALT protocol still gates destructive ops.
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
  (no regression for existing users).
- Preplan failure (broken cloud endpoint) → status log + fall back
  to single-model Norris.
- `:cost detail` after a Norris session shows BOTH
  `cloud / norris-preplan` (one row) and `<executor model> / norris`
  (one row).

---

## 2. Technology Decisions (delta from Phase 9)
| Decision | Choice | Rationale |
|---|---|---|
| Preplan trigger | ONCE at `:norris <goal>` launch (run_norris in repl.lua) | One round-trip per Norris session keeps cost predictable. Re-planning mid-flight deferred to a future iteration. |
| Preplan model selection | `cfg.norris.preplanner` (string; matches a key in cfg.models) | Same shape as `cfg.safety.llm_model`. Optional; absent = no split, existing behavior. |
| Executor model selection | `cfg.norris.executor` (string; matches cfg.models key) | Optional; absent = active_cfg (the user's `:model` choice at launch — existing behavior). |
| Preplan system prompt | Static template baked into safety.lua: "Decompose the goal into single-step imperative TASKs. Output format: TASK: <imperative sentence, max 80 chars>. Maximum N tasks." with N = cfg.norris.tasks_max | Predictable parse; small surface. Override via cfg.norris.preplan_system if user wants. |
| TASK line parsing | `^%s*TASK:%s*(.+)$` per line (leading whitespace tolerated, per §10 and the C1 cases); trim whitespace; filter empty | Same shape as the existing CMD: / DELEGATE: / CMD&: extractors in executor.lua. Trivially adapt extract_*_lines. |
| Task storage | `ctx.norris_tasks = { current = 1, list = {...} }` (NEW field, separate from ctx.turns) | Survives eviction (mirrors ctx.norris_goal anchor); cleared at Norris exit. |
| Step-prompt synthesis | `Context:to_messages` reads `ctx.norris_tasks.list[current]` and appends a `Current step k/N: <text>` hint after the NORRIS suffix already in the system prompt (§5), rather than synthesizing a user turn. | Keeps user-turn alternation legal; NORRIS suffix already exists and is per-turn re-composed. |
| Per-task advance | After `safety.norris_step` returns "continue", repl.lua's run_norris bumps `ctx.norris_tasks.current`. When current > #list, Norris exits with status "tasks_complete". | Same as the existing step counter; just tied to the task list now. |
| Goal anchor + task layered together | Both visible in the NORRIS suffix: `goal:` line (existing) + `current task k/N:` line (new) | Planner-executor still sees the global goal AND the current focus. |
| Preplan parse failure | Status log + fall back to single-model Norris (no tasks) | Robust; user can re-launch :norris if preplan was wonky. |
| Preplan empty result | Same as parse failure — fall back | Robust. |
| tasks_max cap | Default 16; cfg.norris.tasks_max overrides | Bounded blast radius; matches the existing max_norris_steps cap intent. |
| Cost category | "norris-preplan" for the preplan call; "norris" for executor steps (existing) | `:cost detail` surfaces them as separate rows. |
| Secrets/scrub | Preplan call goes through scrub_messages + rehydrate (matches all other broker calls in repl.lua) | No special-case. |
| Norris HALT protocol | Unchanged — per executor step | Existing safety.is_destructive + halt-proceed/skip/abort still gates. |
| Skip semantics | If user halts and skips at task k, advance to task k+1 (NOT re-try) | Predictable; user can :norris off + relaunch with refined goal if they need full re-plan. |
---
## 3. Module Changes
| File | State after Phase 9 | Phase 10 changes |
|---|---|---|
| `repl.lua` | `run_norris(goal)` builds helpers, runs while loop calling safety.norris_step | Pre-loop: if `cfg.norris.preplanner` set, fire one broker.chat against that preset; parse TASK lines; set `ctx.norris_tasks`. Per-iteration: bump `ctx.norris_tasks.current` after each non-terminal result; exit "tasks_complete" when exhausted. |
| `safety.lua` | norris_step composes the NORRIS suffix; uses model_cfg for broker call | Read `ctx.norris_tasks` if set and pass the current task text as `descr` to `helpers.render_step`. Use the executor `model_cfg` that repl.lua resolves from `cfg.norris.executor` at run_norris launch for the per-step broker call; no signature change. |
| `context.lua` | system prompt composition + ctx.norris_active/norris_goal/norris_consecutive_skips | Add `ctx.norris_tasks` field (table or nil); clear it in `Context:reset()` as a defensive measure (R6). NORRIS_SUFFIX_TEMPLATE stays unchanged; a new `compose_norris_task_hint(self)` helper appends the current task after the formatted suffix (§5). |
| `executor.lua` | extract_cmd_lines, extract_cmd_bg_lines, extract_delegate_lines | Add `extract_task_lines(text)` — pure function. |
| `config.lua` | Phase 9 .aish.lua header + existing example blocks | Add commented-out `norris = { preplanner = "cloud", executor = "fast", tasks_max = 16 }` block. |
| `docs/PHASE0.md` | §11 lists phases 0-9 | Amendment: add Phase 10 row. |

No new module files.

---

## 4. Pillar 1 — Preplan call
```lua
-- repl.lua run_norris, pre-loop block:
local tasks
if config.norris and config.norris.preplanner then
  local pre_name = config.norris.preplanner
  local pre_cfg = config.models and config.models[pre_name]
  if pre_cfg then
    -- The long string stays at column 0 so the prompt carries no indentation.
    local sys = (config.norris and config.norris.preplan_system) or [[
You are a task decomposer. Given the user's goal, decompose it into a
sequence of single-step imperative TASKs. Output format: one TASK per
line, EXACTLY this shape:
TASK: <imperative sentence, max 80 chars>
Output AT MOST %d tasks. No prose; no numbering; no commentary outside
the TASK: lines.
]]
    -- R1 fix: %d via string.format; gsub("N", ...) would corrupt
    -- "No prose / No commentary / No numbering" → "16o prose" etc.
    sys = string.format(sys, config.norris.tasks_max or 16)
    local msgs = scrub_messages({
      { role = "system", content = sys },
      { role = "user", content = goal },
    }, secrets_mode_for(pre_cfg))
    -- On failure `text` is nil and the second return value carries the reason.
    local text, usage = broker.chat(pre_cfg, msgs,
      { category = "norris-preplan",
        max_tokens = 800,
        -- R7 fix: respect the model's configured timeout
        timeout_ms = pre_cfg.timeout_ms or 60000 })
    if text then
      if secrets_session then text = secrets_session:rehydrate(text) end
      if usage then _record_usage(usage.model, usage.category, usage) end
      local parsed = executor.extract_task_lines(text)
      local cap = config.norris.tasks_max or 16
      if #parsed > cap then
        -- trim and warn
        for i = #parsed, cap + 1, -1 do parsed[i] = nil end
        renderer.status(("preplan emitted >%d tasks; truncated"):format(cap))
      end
      if #parsed > 0 then
        tasks = parsed
        renderer.status(("preplanned %d tasks via %s"):format(#tasks, pre_name))
      else
        renderer.status("preplan produced no TASK lines; running single-model")
      end
    else
      renderer.status("preplan failed: " .. tostring(usage)
        .. "; running single-model")
    end
  else
    -- Preplanner names a preset that is not in config.models: log it
    -- (C4 fall-back E2E expects a status line) and run single-model.
    renderer.status("preplan failed: unknown model preset '"
      .. tostring(pre_name) .. "'; running single-model")
  end
end
if tasks then
  ctx.norris_tasks = { current = 1, list = tasks }
end
```
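The preplan block above calls `executor.extract_task_lines`, which §3 describes only as a pure function. The following is a minimal sketch of one way it could look, assuming the leading-whitespace tolerance called for in §10 and the C1 cases. The body is illustrative and is not the shipped implementation; capping at `tasks_max` stays with the caller, as C1 states.

```lua
-- Hypothetical sketch of executor.extract_task_lines (pure function).
local function extract_task_lines(text)
  local tasks = {}
  if type(text) ~= "string" then return tasks end
  -- Append a newline so the last line is visited even without a trailing one.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    -- Allow leading whitespace, require the literal TASK: prefix,
    -- and trim whitespace on both sides of the task body.
    local body = line:match("^%s*TASK:%s*(.-)%s*$")
    if body and body ~= "" then
      tasks[#tasks + 1] = body
    end
  end
  return tasks
end
```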
---
## 5. Pillar 2 — Executor loop
`safety.norris_step` extension: if `ctx.norris_tasks` is set, embed
the current task into the system suffix. The existing while loop in
`run_norris` already calls `norris_step` once per iteration; after
each `result.status == "continue"`, bump
`ctx.norris_tasks.current = ctx.norris_tasks.current + 1`. When
`current > #ctx.norris_tasks.list`, the loop exits with a
synthesized `"tasks_complete"` final status.

System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE
**unchanged**; append a task-hint block AFTER the existing format):
```lua
-- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE:
local function compose_norris_task_hint(self)
  if not (self.norris_tasks and self.norris_tasks.list) then return "" end
  local k = self.norris_tasks.current
  local n = #self.norris_tasks.list
  local task = self.norris_tasks.list[k]
  if not task then return "" end -- exhausted → no hint
  return string.format(
    "\n\nCurrent step %d/%d:\n %s", k, n, task)
end

-- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...)
-- block, append the hint:
if self.norris_active and self.norris_goal then
  sys_content = sys_content
    .. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
    .. compose_norris_task_hint(self)
end
```
Also (R6 fix) defensive clear in `Context:reset()`:
```lua
function Context:reset()
  self.turns = {}
  self.pending_exec_output = nil
  self.summary = nil
  self.norris_tasks = nil -- R6: defensive; :reset is unreachable
                          -- mid-Norris but cheap to be safe.
end
```
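The advance rule from the start of this section, together with the R4 clear-at-top from §7, can be sketched as a small driver. This is an illustration under stated assumptions: `step_fn` stands in for `safety.norris_step`, the function name `run_norris_sketch` and the `"max_steps"` status are invented for the example, and HALT/skip handling is left out.

```lua
-- Sketch only; not the real run_norris in repl.lua.
local function run_norris_sketch(ctx, goal, tasks, step_fn, max_steps)
  -- R4: start clean even if a previous session crashed mid-run.
  ctx.norris_active, ctx.norris_goal, ctx.norris_tasks = nil, nil, nil

  ctx.norris_active, ctx.norris_goal = true, goal
  if tasks then ctx.norris_tasks = { current = 1, list = tasks } end

  local status, steps = "max_steps", 0
  while steps < max_steps do
    local t = ctx.norris_tasks
    if t and t.current > #t.list then
      status = "tasks_complete"   -- every planned task was consumed
      break
    end
    steps = steps + 1
    local result = step_fn(ctx)
    if result.status ~= "continue" then
      status = result.status      -- terminal result from the step
      break
    end
    if t then t.current = t.current + 1 end
  end

  -- Normal exit: clear the Norris anchors again.
  ctx.norris_active, ctx.norris_goal, ctx.norris_tasks = nil, nil, nil
  return status, steps
end
```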
---
## 6. Pillar 3 — Cost + secrets composition
The preplan call goes through the same `broker.chat` API as the Phase 7
cost-accumulator wiring. `category = "norris-preplan"` tags it for
`:cost detail` separation:
```
[aish] session usage detail (total=$0.000099, 312/45 tokens):
anthropic/claude-haiku-4.5 norris-preplan 1 calls, 180 / 35 tokens, $0.000099
qwen-coder-7b-snappy-8k norris 5 calls, 132 / 10 tokens, $0.000000 (local)
[aish] estimated session ctx: 412 tokens; token_budget=4096 (10.1% used)
```
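The separate rows in the sample output follow from keying usage by model and category. A minimal sketch of that grouping is below; it is not the Phase 7 accumulator, and the function name plus the `tokens_in` / `tokens_out` field names are assumptions made for the example.

```lua
-- Hypothetical accumulator: one row per (model, category) pair.
local function new_usage_table()
  local by_key, rows = {}, {}
  local function record(model, category, usage)
    local key = model .. " / " .. category
    local row = by_key[key]
    if not row then
      row = { model = model, category = category,
              calls = 0, tokens_in = 0, tokens_out = 0 }
      by_key[key] = row
      rows[#rows + 1] = row   -- keep first-seen order for display
    end
    row.calls = row.calls + 1
    row.tokens_in = row.tokens_in + (usage.tokens_in or 0)
    row.tokens_out = row.tokens_out + (usage.tokens_out or 0)
  end
  return record, rows
end

-- One preplan call plus five executor steps yields two rows.
local record, rows = new_usage_table()
record("cloud", "norris-preplan", { tokens_in = 180, tokens_out = 35 })
for _ = 1, 5 do
  record("fast", "norris", { tokens_in = 26, tokens_out = 2 })
end
for _, row in ipairs(rows) do
  print(("%s  %s  %d calls, %d / %d tokens"):format(
    row.model, row.category, row.calls, row.tokens_in, row.tokens_out))
end
```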
Secrets scrub fires before broker.chat sees the messages; rehydrate
on reply — same path as other call sites.

---

## 7. Pillar 4 — Graceful fall-back
If `cfg.norris.preplanner` is unset → `tasks = nil` → Norris behaves
as Phase 6 (single-model loop; existing semantics).

If preplan call fails (transport err, parse failure, empty list) →
status log + `tasks = nil` → same fall-back.

If executor model lookup fails (`cfg.norris.executor` names a
non-existent preset) → status log + use active_cfg (existing
behavior). User can fix config and re-launch.

If `:reset` is invoked → unreachable mid-Norris (no readline prompt
while the planner is running). Out-of-Norris, `Context:reset()` now
also clears `self.norris_tasks` as defensive coding (R6 fix).

R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/
`ctx.norris_tasks` at the **top** of the function, BEFORE the preplan
block. This guarantees a fresh launch starts clean even if a prior
Norris session crashed with stale state. Cheaper than wrapping the
whole driver in pcall.

---

## 8. UX Surface Summary
| Config | Default | Effect |
|---|---|---|
| `cfg.norris.preplanner` | nil | Name of model preset for the preplan call; absent = no split |
| `cfg.norris.executor` | nil (uses active model) | Name of model preset for per-step execution |
| `cfg.norris.tasks_max` | 16 | Cap on TASK list size (parse-time trim) |
| `cfg.norris.preplan_system` | (built-in template) | Override preplan system prompt |

| Startup status | Behavior |
|---|---|
| (preplan unset) | nothing — existing single-model Norris |
| (preplan success) | `[aish] preplanned N tasks via <preplanner>` |
| (preplan failed) | `[aish] preplan failed: <reason>; running single-model` |
| (preplan over cap) | `[aish] preplan emitted >N tasks; truncated` |

No new meta commands in v1. Inspect via `:cost detail` (separate
norris-preplan row) and the existing `:history` (preplan call + reply
become assistant turns visible there).
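For orientation, the config keys from the first table might be combined as in the fragment below. It is a sketch, not the shipped `.aish.lua` example: the timeout values are placeholders, preset fields other than `timeout_ms` are omitted because this document does not specify them, and the top-level `local config` shape is an assumption. One detail follows from §4: the override prompt is passed through `string.format`, so a literal percent sign in `preplan_system` would need to be written `%%`.

```lua
-- Hypothetical config fragment for the Phase 10 split.
local config = {
  models = {
    cloud = { timeout_ms = 60000 },  -- placeholder value
    fast  = { timeout_ms = 15000 },  -- placeholder value
  },
  norris = {
    preplanner = "cloud",  -- must match a key in models; absent = no split
    executor   = "fast",   -- optional; absent = the active model
    tasks_max  = 16,       -- cap applied when the TASK list is parsed
    -- preplan_system = "...",  -- optional override; keep one %d slot
  },
}
```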
---
## 9. Out of Scope (Phase 10)
- **Mid-flight re-plan** — preplan fires ONCE per Norris launch.
Re-plan based on per-step results would be a separate iteration;
user can `:norris off` + re-launch with refined goal for v1.
- **Adaptive task decomposition** — TASKs are fixed at launch; the
executor doesn't get to refine them. v1 trusts the preplanner's
parse.
- **Multi-step task = sub-tasks** — flat list only. Nested TASK
hierarchies are a future shape.
- **Skip-then-retry** — skip at HALT advances to the next task; no
retry mechanism. User re-launches if they need a retry.
- **Per-task model selection** — single executor model for the whole
session. Per-task routing (e.g. some tasks → cloud, some → local)
is interesting but bigger surface; defer.
- **Preplan-while-executing** — sequential: preplan first, THEN
execute. Streaming overlap is a future optimization.
---
## 10. Risks
| Risk | Mitigation |
|---|---|
| Preplan model emits malformed output (no `TASK:` lines, or wraps in markdown) | extract_task_lines tolerates leading whitespace + ignores non-TASK lines. If zero TASKs parsed, fall back to single-model. |
| Preplanner cost surprises user (silent paid call on every :norris launch) | Phase 7 cost meter accounts it under `norris-preplan` category; warn_at_dollars still fires. Default = unset (no automatic cost). |
| Task list is wrong / off-goal | Executor still has the global GOAL in the NORRIS suffix; can deviate per-step. Skip-budget per Phase 3 still escalates. User retains `:norris off` abort. |
| Local executor can't actually do a planned step (model too weak) | Same as today's Norris-on-local case — model emits something useless; HALT prompt lets user skip or abort. Phase 10 doesn't fix this; preplan + execute split makes the failure mode more visible (you can SEE which TASK is stuck). |
| ctx.norris_tasks survives across non-:reset session boundaries | Cleared at Norris exit and, per R4, again at the top of `run_norris`, so re-launching Norris in the same session starts fresh even if the previous run crashed. |
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
| R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. |
---
## 11. Open Questions — RESOLVED (analyze step)
| # | Question | Resolution |
|---|---|---|
| Q-PP1 | `cfg.norris.executor` applies even without preplanner? | **YES.** Resolving the executor is independent of preplan. If `cfg.norris.executor` names a valid preset, `run_norris` uses it for `safety.norris_step` regardless of preplanner state. Preplanner unset + executor set = "always use cloud-haiku for Norris steps even though my interactive `:model` is qwen-coder". Useful split. |
| Q-PP2 | Stream the preplan TASKs as they're emitted? | **NO (v1 = non-streaming).** Use `broker.chat` (non-streaming) for preplan. Preplan emits ~16 × ~10 tokens = ~160 tokens total; on cloud Haiku that's <2s. Print the full TASK list at completion (`[aish] preplanned N tasks via cloud`) rather than streaming letter-by-letter. Streaming adds latency variance + screen flicker for sub-2s win. Reconsider if real-world preplan latency exceeds 5s. |
| Q-PP3 | Re-launch fires preplan again? | **YES, naturally.** Each `:norris <goal>` re-enters `run_norris`. The pre-loop preplan block runs (different goal → different decomposition). `ctx.norris_tasks` is overwritten. No special re-launch logic needed; falls out of lifecycle. |
| Q-PP4 | Executor sees full goal AND current task? | **BOTH.** Goal anchor in NORRIS suffix (existing) + a NEW optional task-hint block appended right after. The executor planner can use the goal to detect off-track tasks and adjust its CMD: emission. |
| Q-PP5 | `:norris` (no args) reports tasks state? | **No — out-of-scope v1.** Inside Norris there's no readline prompt; meta commands aren't reachable. After exit, `ctx.norris_tasks` is cleared. The renderer's per-step `[step k/N: <task>]` line is the user-facing readout. Re-consider if users ask for a "task plan preview before execution" mode. |
| Q-PP6 | 1-task degenerate case? | **Run as normal, no special case.** Functionally identical to single-model Norris (executor sees goal + single TASK hint). Preplanner cost is the only delta. Acceptable. |

**Additional findings from code reading:**
- `safety.norris_step(ctx, model_cfg, ...)` takes `model_cfg` as a parameter. **Implication:** `run_norris` resolves the executor cfg ONCE pre-loop and passes it in every iteration. No signature change to safety.lua. The "executor" is just a different `model_cfg` than `active_cfg`.
- `Context:reset()` does NOT touch `norris_goal`/`norris_active` (Norris state is owned by `run_norris`, set on entry + cleared on exit). `ctx.norris_tasks` follows the same lifecycle: created at preplan, cleared at `run_norris` exit. Per R6, `Context:reset()` additionally clears `norris_tasks` as a defensive measure, even though `:reset` is unreachable mid-Norris.
- `NORRIS_SUFFIX_TEMPLATE` has one `%s` slot for goal. Don't change the template; **append** a `compose_norris_task_hint(self)` helper output AFTER the formatted suffix. Keeps the template stable; the hint block is additive.
- Preplan call lives in `repl.lua` (not `safety.lua`) — keeps safety's invariant "single broker round-trip per call". Repl already orchestrates multi-call flows (Norris loop, secrets rehydration, routing); preplan is one more pre-loop hook.
- The renderer needs a per-step prefix showing `[step k/N: <task>]`. `renderer.norris_step` currently takes `(n, max_n)`; extend to `(n, max_n, descr)` — descr was already in the signature per the helpers contract above (line 339 of safety.lua), but `run_norris` doesn't pass it today. Phase 10 wiring fills that gap.
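The "resolve the executor cfg ONCE pre-loop" finding, combined with the lookup-failure fall-back from §7 and Q-PP1, can be sketched as a small helper. The helper name, the injected `status` callback, and the message text are assumptions for illustration; the real wiring lives in `run_norris`.

```lua
-- Hypothetical helper: pick the model_cfg handed to safety.norris_step.
local function resolve_executor_cfg(config, active_cfg, status)
  local name = config.norris and config.norris.executor
  if not name then
    return active_cfg            -- executor unset: keep the :model choice
  end
  local cfg = config.models and config.models[name]
  if cfg then
    return cfg                   -- valid preset, independent of preplanner
  end
  -- Unknown preset: log it and fall back to the active model.
  status(("norris executor %q not found; using active model"):format(name))
  return active_cfg
end
```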
---
## 11b. Plan — commit-by-commit roadmap (5 commits)
| # | Commit subject | Files | Why this slice |
|---|---|---|---|
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
| 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). |
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. |
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |

Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
### Per-commit verification
- **C1**: 6 inline unit cases for `extract_task_lines`: empty input → {}, single TASK → {it}, mixed CMD+TASK → only TASKs, leading whitespace tolerated, blank lines ignored, > tasks_max → caller's job to cap (function itself just parses). test runs from repo root.
- **C2**: 5 inline unit cases for `compose_norris_task_hint`: nil tasks → "", empty list → "", current=1 of 3 → contains "step 1/3", current > #list → "" (completed), full to_messages render with tasks shows hint in system content. self.turns + self.norris_tasks unmutated.
- **C3**: safety_test snapshot still 87/87 (no behavior change for the no-tasks path). Manual run of single-model Norris to confirm no regression.
- **C4**: E2E with cfg.norris.preplanner=cloud + executor=fast. Goal: `find files larger than 10MB in /var/log and report sizes`. Verify preplan emits 2-5 tasks; executor runs each. :cost detail shows two model rows. Fall-back E2E with preplanner pointing to bogus model → status log + normal Norris.
- **C5**: visual inspection of config.lua. MEMORY.md + project_phase_status.md updated to "Phase 0-10 done".
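The C2 cases for `compose_norris_task_hint` could be written table-driven, roughly as below. The helper is copied from §5 so the snippet runs standalone; the case table and its names are illustrative, and the full `to_messages` render case is left out because it needs the real Context.

```lua
-- Copy of the §5 helper so the cases run without context.lua.
local function compose_norris_task_hint(self)
  if not (self.norris_tasks and self.norris_tasks.list) then return "" end
  local k = self.norris_tasks.current
  local n = #self.norris_tasks.list
  local task = self.norris_tasks.list[k]
  if not task then return "" end
  return string.format("\n\nCurrent step %d/%d:\n %s", k, n, task)
end

local three = { "find", "stat", "report" }
local cases = {
  { name = "nil tasks", ctx = {}, want = "" },
  { name = "empty list",
    ctx = { norris_tasks = { current = 1, list = {} } }, want = "" },
  { name = "current past the end",
    ctx = { norris_tasks = { current = 4, list = three } }, want = "" },
  { name = "first of three",
    ctx = { norris_tasks = { current = 1, list = three } },
    want = "\n\nCurrent step 1/3:\n find" },
}

local passed = 0
for _, c in ipairs(cases) do
  local got = compose_norris_task_hint(c.ctx)
  assert(got == c.want, c.name)   -- the case name doubles as the failure message
  passed = passed + 1
end
```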
### Resolved review tickets folded into the plan
**Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.**
- **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call.
- **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 now shows the helper-append approach matching §11.
- **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks.
- **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector.
- **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3 touches safety.lua ONLY; its single change is passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated.
- **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral.
- **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode.
- **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:".
- **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action.
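The R1 failure mode is easy to reproduce in isolation. The snippet below uses a shortened stand-in for the preplan prompt (not the §4 template itself) to show why a bare `gsub("N", ...)` corrupts the text while a `%d` slot does not.

```lua
-- Shortened stand-in for the preplan prompt.
local template = "Output AT MOST N tasks. No prose; no numbering."

-- gsub replaces every capital N, not only the placeholder.
local broken = template:gsub("N", "16")
-- broken == "Output AT MOST 16 tasks. 16o prose; no numbering."

-- string.format only touches the explicit %d slot.
local fixed = ("Output AT MOST %d tasks. No prose; no numbering."):format(16)
-- fixed == "Output AT MOST 16 tasks. No prose; no numbering."
```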
---
## 12. Phase 10 → Phase 11+ Out-of-band
Candidate follow-ups (non-binding):
- **Phase 11**: cross-session cost rollup (Phase 7 §12 option 1 —
  long-deferred).
- **Cost preflight enforcement** (Phase 7 §12 option 2 — also
  long-deferred; Phase 8's accurate counts are the prerequisite).
- **Mid-flight Norris re-plan** — preplanner gets to re-decompose
based on executor progress. Real value, but needs careful
state-machine design (when to re-plan, how to preserve already-
completed work).
- **Per-task model selection** — task could carry a model hint
  emitted by the preplanner.

Phase 10 itself is self-contained — depends on Phase 3 (Norris) +
Phase 7 (cost accumulator) which are both implemented.