phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits

All 8 actionable findings accepted; R9-R11 were confirmations. Blockers: - R1: sys:gsub("N", ...) would corrupt "No prose / commentary / numbering" → "16o prose" etc. Switch to %d + string.format. - R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that contradicted §11's "don't change the template; append helper output after". §5 now shows the helper-append approach. Important: - R3: preplan bypasses call_broker (no fallback retry) — keep that by design; retry would silently swap planning models. Documented in §10 Risks so it doesn't get "fixed" later. - R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks can leak across launches if a Norris step crashes. Fix: clear all three at the TOP of run_norris before preplan. Cheaper than full pcall wrap; handles the stale-tasks vector. - R5: clarified C3 commit scope — safety.lua ONLY in C3; the executor cfg resolution + preplan wiring lands in C4. - R6: Context:reset() now also clears self.norris_tasks (defensive; :reset is unreachable mid-Norris but one line is cheap). Nits: - R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the configured per-model timeout). - R8: "Status:" → "Terminal output:" in §1 acceptance criterion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:17:30 +00:00
parent cb2f948e76
commit cbef05ff40
1 changed files with 61 additions and 34 deletions
@@ -57,7 +57,7 @@ Four pillars:
  = "fast"`:
  1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
     `TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
-  2. Status: `[aish] preplanned 3 tasks via cloud`
+  2. Terminal output: `[aish] preplanned 3 tasks via cloud`  (R8: was "Status:")
  3. Per-step execution by `fast`: each step shows the task it's
     working on; existing HALT protocol still gates destructive ops.
 - Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
@@ -124,17 +124,21 @@ line, EXACTLY this shape:
  TASK: <imperative sentence, max 80 chars>
-Output AT MOST N tasks. No prose; no numbering; no commentary outside
+Output AT MOST %d tasks. No prose; no numbering; no commentary outside
 the TASK: lines.
 ]]
-        sys = sys:gsub("N", tostring(config.norris.tasks_max or 16))
+        -- R1 fix: %d via string.format; gsub("N", ...) would corrupt
        -- "No prose / No commentary / No numbering" → "16o prose" etc.
        sys = string.format(sys, config.norris.tasks_max or 16)
        local msgs = scrub_messages({
            { role = "system", content = sys },
            { role = "user",   content = goal },
        }, secrets_mode_for(pre_cfg))
        local text, usage = broker.chat(pre_cfg, msgs,
            { category   = "norris-preplan",
-              max_tokens = 800, timeout_ms = 60000 })
+              max_tokens = 800,
              -- R7 fix: respect the model's configured timeout
              timeout_ms = pre_cfg.timeout_ms or 60000 })
        if text then
            if secrets_session then text = secrets_session:rehydrate(text) end
            if usage then _record_usage(usage.model, usage.category, usage) end
@@ -174,33 +178,38 @@ each `result.status == "continue"`, bump
 `current > #ctx.norris_tasks.list`, the loop exits with a
 synthesized `"tasks_complete"` final status.
-System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):
+System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE
 **unchanged**; append a task-hint block AFTER the existing format):
 ```lua
-local NORRIS_SUFFIX_TEMPLATE = [[
+-- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE:
-
+local function compose_norris_task_hint(self)
-
+    if not (self.norris_tasks and self.norris_tasks.list) then return "" end
 [NORRIS MODE] You are operating autonomously toward the following goal:
    %s
 %s
 Plan and execute step by step ...
 ]]
 -- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
 local function compose_norris_suffix(self)
    local task_hint = ""
    if self.norris_tasks and self.norris_tasks.list then
    local k = self.norris_tasks.current
    local n = #self.norris_tasks.list
-        if self.norris_tasks.list[k] then
+    local task = self.norris_tasks.list[k]
-            task_hint = string.format(
+    if not task then return "" end  -- exhausted → no hint
-                "Current step %d/%d:\n    %s\n", k, n, self.norris_tasks.list[k])
+    return string.format(
        "\n\nCurrent step %d/%d:\n    %s", k, n, task)
 end
 -- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...)
 -- block, append the hint:
 if self.norris_active and self.norris_goal then
    sys_content = sys_content
        .. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
        .. compose_norris_task_hint(self)
 end
-    return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
+```
 Also (R6 fix) defensive clear in `Context:reset()`:
 ```lua
 function Context:reset()
    self.turns = {}
    self.pending_exec_output = nil
    self.summary = nil
    self.norris_tasks = nil  -- R6: defensive; :reset is unreachable
                             --     mid-Norris but cheap to be safe.
 end
 ```
@@ -236,9 +245,15 @@ If executor model lookup fails (`cfg.norris.executor` names a
 non-existent preset) → status log + use active_cfg (existing
 behavior). User can fix config and re-launch.
-If `:reset` clears the conversation mid-Norris → existing behavior
+If `:reset` is invoked → unreachable mid-Norris (no readline prompt
-clears turns; `ctx.norris_tasks` should ALSO clear since the goal
+while the planner is running). Out-of-Norris, `Context:reset()` now
-context is gone. Document in §9.
+also clears `self.norris_tasks` as defensive coding (R6 fix).
 R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/
 `ctx.norris_tasks` at the **top** of the function, BEFORE the preplan
 block. This guarantees a fresh launch starts clean even if a prior
 Norris session crashed with stale state. Cheaper than wrapping the
 whole driver in pcall.
 ---
@@ -296,6 +311,7 @@ become assistant turns visible there).
 | Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
 | Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
 | Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
 | R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. |
 ---
@@ -326,8 +342,8 @@ become assistant turns visible there).
 |---|---|---|---|
 | 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
 | 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
-| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). |
+| 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). |
-| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. |
+| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. |
 | 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
 Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
@@ -342,7 +358,18 @@ Each commit must leave the tree in a state where `luajit main.lua` runs and exis
 ### Resolved review tickets folded into the plan
-(None yet — Sonnet review runs after this manifest is committed.)
+**Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.**
 - **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call.
 - **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 below now shows the helper-append approach matching §11.
 - **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks.
 - **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector.
 - **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3's "tiny repl.lua wiring" is ONLY passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated.
 - **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral.
 - **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode.
 - **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:".
 - **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action.
 ---
 ## 12. Phase 10 → Phase 11+ Out-of-band