phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits
All 8 actionable findings accepted; R9-R11 were confirmations.
Blockers:
- R1: sys:gsub("N", ...) would corrupt "No prose / commentary /
numbering" → "16o prose" etc. Switch to %d + string.format.
- R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that
contradicted §11's "don't change the template; append helper
output after". §5 now shows the helper-append approach.
Important:
- R3: preplan bypasses call_broker (no fallback retry) — keep that
by design; retry would silently swap planning models. Documented
in §10 Risks so it doesn't get "fixed" later.
- R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks
can leak across launches if a Norris step crashes. Fix: clear all
three at the TOP of run_norris before preplan. Cheaper than full
pcall wrap; handles the stale-tasks vector.
- R5: clarified C3 commit scope — safety.lua ONLY in C3; the
executor cfg resolution + preplan wiring lands in C4.
- R6: Context:reset() now also clears self.norris_tasks (defensive;
:reset is unreachable mid-Norris but one line is cheap).
Nits:
- R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the
configured per-model timeout).
- R8: "Status:" → "Terminal output:" in §1 acceptance criterion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+61
-34
@@ -57,7 +57,7 @@ Four pillars:
|
||||
= "fast"`:
|
||||
1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
|
||||
`TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
|
||||
2. Status: `[aish] preplanned 3 tasks via cloud`
|
||||
2. Terminal output: `[aish] preplanned 3 tasks via cloud` (R8: was "Status:")
|
||||
3. Per-step execution by `fast`: each step shows the task it's
|
||||
working on; existing HALT protocol still gates destructive ops.
|
||||
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
|
||||
@@ -124,17 +124,21 @@ line, EXACTLY this shape:
|
||||
|
||||
TASK: <imperative sentence, max 80 chars>
|
||||
|
||||
Output AT MOST N tasks. No prose; no numbering; no commentary outside
|
||||
Output AT MOST %d tasks. No prose; no numbering; no commentary outside
|
||||
the TASK: lines.
|
||||
]]
|
||||
sys = sys:gsub("N", tostring(config.norris.tasks_max or 16))
|
||||
-- R1 fix: %d via string.format; gsub("N", ...) would corrupt
|
||||
-- "No prose / No commentary / No numbering" → "16o prose" etc.
|
||||
sys = string.format(sys, config.norris.tasks_max or 16)
|
||||
local msgs = scrub_messages({
|
||||
{ role = "system", content = sys },
|
||||
{ role = "user", content = goal },
|
||||
}, secrets_mode_for(pre_cfg))
|
||||
local text, usage = broker.chat(pre_cfg, msgs,
|
||||
{ category = "norris-preplan",
|
||||
max_tokens = 800, timeout_ms = 60000 })
|
||||
{ category = "norris-preplan",
|
||||
max_tokens = 800,
|
||||
-- R7 fix: respect the model's configured timeout
|
||||
timeout_ms = pre_cfg.timeout_ms or 60000 })
|
||||
if text then
|
||||
if secrets_session then text = secrets_session:rehydrate(text) end
|
||||
if usage then _record_usage(usage.model, usage.category, usage) end
|
||||
@@ -174,33 +178,38 @@ each `result.status == "continue"`, bump
|
||||
`current > #ctx.norris_tasks.list`, the loop exits with a
|
||||
synthesized `"tasks_complete"` final status.
|
||||
|
||||
System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):
|
||||
System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE
|
||||
**unchanged**; append a task-hint block AFTER the existing format):
|
||||
|
||||
```lua
|
||||
local NORRIS_SUFFIX_TEMPLATE = [[
|
||||
-- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE:
|
||||
local function compose_norris_task_hint(self)
|
||||
if not (self.norris_tasks and self.norris_tasks.list) then return "" end
|
||||
local k = self.norris_tasks.current
|
||||
local n = #self.norris_tasks.list
|
||||
local task = self.norris_tasks.list[k]
|
||||
if not task then return "" end -- exhausted → no hint
|
||||
return string.format(
|
||||
"\n\nCurrent step %d/%d:\n %s", k, n, task)
|
||||
end
|
||||
|
||||
-- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...)
|
||||
-- block, append the hint:
|
||||
if self.norris_active and self.norris_goal then
|
||||
sys_content = sys_content
|
||||
.. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
|
||||
.. compose_norris_task_hint(self)
|
||||
end
|
||||
```
|
||||
|
||||
[NORRIS MODE] You are operating autonomously toward the following goal:
|
||||
|
||||
%s
|
||||
|
||||
%s
|
||||
|
||||
Plan and execute step by step ...
|
||||
]]
|
||||
|
||||
-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
|
||||
local function compose_norris_suffix(self)
|
||||
local task_hint = ""
|
||||
if self.norris_tasks and self.norris_tasks.list then
|
||||
local k = self.norris_tasks.current
|
||||
local n = #self.norris_tasks.list
|
||||
if self.norris_tasks.list[k] then
|
||||
task_hint = string.format(
|
||||
"Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k])
|
||||
end
|
||||
end
|
||||
return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
|
||||
Also (R6 fix) defensive clear in `Context:reset()`:
|
||||
```lua
|
||||
function Context:reset()
|
||||
self.turns = {}
|
||||
self.pending_exec_output = nil
|
||||
self.summary = nil
|
||||
self.norris_tasks = nil -- R6: defensive; :reset is unreachable
|
||||
-- mid-Norris but cheap to be safe.
|
||||
end
|
||||
```
|
||||
|
||||
@@ -236,9 +245,15 @@ If executor model lookup fails (`cfg.norris.executor` names a
|
||||
non-existent preset) → status log + use active_cfg (existing
|
||||
behavior). User can fix config and re-launch.
|
||||
|
||||
If `:reset` clears the conversation mid-Norris → existing behavior
|
||||
clears turns; `ctx.norris_tasks` should ALSO clear since the goal
|
||||
context is gone. Document in §9.
|
||||
If `:reset` is invoked → unreachable mid-Norris (no readline prompt
|
||||
while the planner is running). Out-of-Norris, `Context:reset()` now
|
||||
also clears `self.norris_tasks` as defensive coding (R6 fix).
|
||||
|
||||
R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/
|
||||
`ctx.norris_tasks` at the **top** of the function, BEFORE the preplan
|
||||
block. This guarantees a fresh launch starts clean even if a prior
|
||||
Norris session crashed with stale state. Cheaper than wrapping the
|
||||
whole driver in pcall.
|
||||
|
||||
---
|
||||
|
||||
@@ -296,6 +311,7 @@ become assistant turns visible there).
|
||||
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
|
||||
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
|
||||
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
|
||||
| R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. |
|
||||
|
||||
---
|
||||
|
||||
@@ -326,8 +342,8 @@ become assistant turns visible there).
|
||||
|---|---|---|---|
|
||||
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
|
||||
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
|
||||
| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). |
|
||||
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. |
|
||||
| 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). |
|
||||
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. |
|
||||
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
|
||||
|
||||
Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
|
||||
@@ -342,7 +358,18 @@ Each commit must leave the tree in a state where `luajit main.lua` runs and exis
|
||||
|
||||
### Resolved review tickets folded into the plan
|
||||
|
||||
(None yet — Sonnet review runs after this manifest is committed.)
|
||||
**Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.**
|
||||
|
||||
- **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call.
|
||||
- **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 below now shows the helper-append approach matching §11.
|
||||
- **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks.
|
||||
- **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector.
|
||||
- **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3's "tiny repl.lua wiring" is ONLY passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated.
|
||||
- **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral.
|
||||
- **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode.
|
||||
- **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:".
|
||||
- **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action.
|
||||
|
||||
---
|
||||
|
||||
## 12. Phase 10 → Phase 11+ Out-of-band
|
||||
|
||||
Reference in New Issue
Block a user