phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits

All 8 actionable findings accepted; R9-R11 were confirmations.

Blockers:
  - R1: sys:gsub("N", ...) would corrupt "No prose / commentary /
    numbering" → "16o prose" etc. Switch to %d + string.format.
  - R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that
    contradicted §11's "don't change the template; append helper
    output after". §5 now shows the helper-append approach.

Important:
  - R3: preplan bypasses call_broker (no fallback retry) — keep that
    by design; retry would silently swap planning models. Documented
    in §10 Risks so it doesn't get "fixed" later.
  - R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks
    can leak across launches if a Norris step crashes. Fix: clear all
    three at the TOP of run_norris before preplan. Cheaper than full
    pcall wrap; handles the stale-tasks vector.
  - R5: clarified C3 commit scope — safety.lua ONLY in C3; the
    executor cfg resolution + preplan wiring lands in C4.
  - R6: Context:reset() now also clears self.norris_tasks (defensive;
    :reset is unreachable mid-Norris but one line is cheap).

Nits:
  - R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the
    configured per-model timeout).
  - R8: "Status:" → "Terminal output:" in §1 acceptance criterion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-17 08:17:30 +00:00
parent cb2f948e76
commit cbef05ff40
+58 -31
View File
@@ -57,7 +57,7 @@ Four pillars:
= "fast"`: = "fast"`:
1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`; 1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
`TASK: stat -c "%n %s" <results>`; `TASK: format and report`). `TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
2. Status: `[aish] preplanned 3 tasks via cloud` 2. Terminal output: `[aish] preplanned 3 tasks via cloud` (R8: was "Status:")
3. Per-step execution by `fast`: each step shows the task it's 3. Per-step execution by `fast`: each step shows the task it's
working on; existing HALT protocol still gates destructive ops. working on; existing HALT protocol still gates destructive ops.
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6 - Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
@@ -124,17 +124,21 @@ line, EXACTLY this shape:
TASK: <imperative sentence, max 80 chars> TASK: <imperative sentence, max 80 chars>
Output AT MOST N tasks. No prose; no numbering; no commentary outside Output AT MOST %d tasks. No prose; no numbering; no commentary outside
the TASK: lines. the TASK: lines.
]] ]]
sys = sys:gsub("N", tostring(config.norris.tasks_max or 16)) -- R1 fix: %d via string.format; gsub("N", ...) would corrupt
-- "No prose / No commentary / No numbering" → "16o prose" etc.
sys = string.format(sys, config.norris.tasks_max or 16)
local msgs = scrub_messages({ local msgs = scrub_messages({
{ role = "system", content = sys }, { role = "system", content = sys },
{ role = "user", content = goal }, { role = "user", content = goal },
}, secrets_mode_for(pre_cfg)) }, secrets_mode_for(pre_cfg))
local text, usage = broker.chat(pre_cfg, msgs, local text, usage = broker.chat(pre_cfg, msgs,
{ category = "norris-preplan", { category = "norris-preplan",
max_tokens = 800, timeout_ms = 60000 }) max_tokens = 800,
-- R7 fix: respect the model's configured timeout
timeout_ms = pre_cfg.timeout_ms or 60000 })
if text then if text then
if secrets_session then text = secrets_session:rehydrate(text) end if secrets_session then text = secrets_session:rehydrate(text) end
if usage then _record_usage(usage.model, usage.category, usage) end if usage then _record_usage(usage.model, usage.category, usage) end
@@ -174,33 +178,38 @@ each `result.status == "continue"`, bump
`current > #ctx.norris_tasks.list`, the loop exits with a `current > #ctx.norris_tasks.list`, the loop exits with a
synthesized `"tasks_complete"` final status. synthesized `"tasks_complete"` final status.
System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE): System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE
**unchanged**; append a task-hint block AFTER the existing format):
```lua ```lua
local NORRIS_SUFFIX_TEMPLATE = [[ -- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE:
local function compose_norris_task_hint(self)
if not (self.norris_tasks and self.norris_tasks.list) then return "" end
[NORRIS MODE] You are operating autonomously toward the following goal:
%s
%s
Plan and execute step by step ...
]]
-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
local function compose_norris_suffix(self)
local task_hint = ""
if self.norris_tasks and self.norris_tasks.list then
local k = self.norris_tasks.current local k = self.norris_tasks.current
local n = #self.norris_tasks.list local n = #self.norris_tasks.list
if self.norris_tasks.list[k] then local task = self.norris_tasks.list[k]
task_hint = string.format( if not task then return "" end -- exhausted → no hint
"Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k]) return string.format(
"\n\nCurrent step %d/%d:\n %s", k, n, task)
end end
-- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...)
-- block, append the hint:
if self.norris_active and self.norris_goal then
sys_content = sys_content
.. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
.. compose_norris_task_hint(self)
end end
return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint) ```
Also (R6 fix) defensive clear in `Context:reset()`:
```lua
function Context:reset()
self.turns = {}
self.pending_exec_output = nil
self.summary = nil
self.norris_tasks = nil -- R6: defensive; :reset is unreachable
-- mid-Norris but cheap to be safe.
end end
``` ```
@@ -236,9 +245,15 @@ If executor model lookup fails (`cfg.norris.executor` names a
non-existent preset) → status log + use active_cfg (existing non-existent preset) → status log + use active_cfg (existing
behavior). User can fix config and re-launch. behavior). User can fix config and re-launch.
If `:reset` clears the conversation mid-Norris → existing behavior If `:reset` is invoked → unreachable mid-Norris (no readline prompt
clears turns; `ctx.norris_tasks` should ALSO clear since the goal while the planner is running). Out-of-Norris, `Context:reset()` now
context is gone. Document in §9. also clears `self.norris_tasks` as defensive coding (R6 fix).
R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/
`ctx.norris_tasks` at the **top** of the function, BEFORE the preplan
block. This guarantees a fresh launch starts clean even if a prior
Norris session crashed with stale state. Cheaper than wrapping the
whole driver in pcall.
--- ---
@@ -296,6 +311,7 @@ become assistant turns visible there).
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. | | Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. | | Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. | | Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
| R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. |
--- ---
@@ -326,8 +342,8 @@ become assistant turns visible there).
|---|---|---|---| |---|---|---|---|
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. | | 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. | | 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). | | 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). |
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. | | 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. |
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. | | 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents. Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
@@ -342,7 +358,18 @@ Each commit must leave the tree in a state where `luajit main.lua` runs and exis
### Resolved review tickets folded into the plan ### Resolved review tickets folded into the plan
(None yet — Sonnet review runs after this manifest is committed.) **Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.**
- **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call.
- **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 below now shows the helper-append approach matching §11.
- **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks.
- **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector.
- **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3's "tiny repl.lua wiring" is ONLY passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated.
- **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral.
- **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode.
- **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:".
- **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action.
--- ---
## 12. Phase 10 → Phase 11+ Out-of-band ## 12. Phase 10 → Phase 11+ Out-of-band