phase10: fold in Sonnet review — 2 blockers + 4 important + 2 nits
All 8 actionable findings accepted; R9-R11 were confirmations.
Blockers:
- R1: sys:gsub("N", ...) would corrupt "No prose / commentary /
numbering" → "16o prose" etc. Switch to %d + string.format.
- R2: §5 had a 2-slot NORRIS_SUFFIX_TEMPLATE redesign that
contradicted §11's "don't change the template; append helper
output after". §5 now shows the helper-append approach.
Important:
- R3: preplan bypasses call_broker (no fallback retry) — keep that
by design; retry would silently swap planning models. Documented
in §10 Risks so it doesn't get "fixed" later.
- R4: no pcall around run_norris → ctx.norris_active/_goal/_tasks
can leak across launches if a Norris step crashes. Fix: clear all
three at the TOP of run_norris before preplan. Cheaper than full
pcall wrap; handles the stale-tasks vector.
- R5: clarified C3 commit scope — safety.lua ONLY in C3; the
executor cfg resolution + preplan wiring lands in C4.
- R6: Context:reset() now also clears self.norris_tasks (defensive;
:reset is unreachable mid-Norris but one line is cheap).
Nits:
- R7: timeout_ms = pre_cfg.timeout_ms or 60000 (respect the
configured per-model timeout).
- R8: "Status:" → "Terminal output:" in §1 acceptance criterion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
+58
-31
@@ -57,7 +57,7 @@ Four pillars:
|
|||||||
= "fast"`:
|
= "fast"`:
|
||||||
1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
|
1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`;
|
||||||
`TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
|
`TASK: stat -c "%n %s" <results>`; `TASK: format and report`).
|
||||||
2. Status: `[aish] preplanned 3 tasks via cloud`
|
2. Terminal output: `[aish] preplanned 3 tasks via cloud` (R8: was "Status:")
|
||||||
3. Per-step execution by `fast`: each step shows the task it's
|
3. Per-step execution by `fast`: each step shows the task it's
|
||||||
working on; existing HALT protocol still gates destructive ops.
|
working on; existing HALT protocol still gates destructive ops.
|
||||||
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
|
- Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6
|
||||||
@@ -124,17 +124,21 @@ line, EXACTLY this shape:
|
|||||||
|
|
||||||
TASK: <imperative sentence, max 80 chars>
|
TASK: <imperative sentence, max 80 chars>
|
||||||
|
|
||||||
Output AT MOST N tasks. No prose; no numbering; no commentary outside
|
Output AT MOST %d tasks. No prose; no numbering; no commentary outside
|
||||||
the TASK: lines.
|
the TASK: lines.
|
||||||
]]
|
]]
|
||||||
sys = sys:gsub("N", tostring(config.norris.tasks_max or 16))
|
-- R1 fix: %d via string.format; gsub("N", ...) would corrupt
|
||||||
|
-- "No prose / No commentary / No numbering" → "16o prose" etc.
|
||||||
|
sys = string.format(sys, config.norris.tasks_max or 16)
|
||||||
local msgs = scrub_messages({
|
local msgs = scrub_messages({
|
||||||
{ role = "system", content = sys },
|
{ role = "system", content = sys },
|
||||||
{ role = "user", content = goal },
|
{ role = "user", content = goal },
|
||||||
}, secrets_mode_for(pre_cfg))
|
}, secrets_mode_for(pre_cfg))
|
||||||
local text, usage = broker.chat(pre_cfg, msgs,
|
local text, usage = broker.chat(pre_cfg, msgs,
|
||||||
{ category = "norris-preplan",
|
{ category = "norris-preplan",
|
||||||
max_tokens = 800, timeout_ms = 60000 })
|
max_tokens = 800,
|
||||||
|
-- R7 fix: respect the model's configured timeout
|
||||||
|
timeout_ms = pre_cfg.timeout_ms or 60000 })
|
||||||
if text then
|
if text then
|
||||||
if secrets_session then text = secrets_session:rehydrate(text) end
|
if secrets_session then text = secrets_session:rehydrate(text) end
|
||||||
if usage then _record_usage(usage.model, usage.category, usage) end
|
if usage then _record_usage(usage.model, usage.category, usage) end
|
||||||
@@ -174,33 +178,38 @@ each `result.status == "continue"`, bump
|
|||||||
`current > #ctx.norris_tasks.list`, the loop exits with a
|
`current > #ctx.norris_tasks.list`, the loop exits with a
|
||||||
synthesized `"tasks_complete"` final status.
|
synthesized `"tasks_complete"` final status.
|
||||||
|
|
||||||
System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):
|
System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE
|
||||||
|
**unchanged**; append a task-hint block AFTER the existing format):
|
||||||
|
|
||||||
```lua
|
```lua
|
||||||
local NORRIS_SUFFIX_TEMPLATE = [[
|
-- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE:
|
||||||
|
local function compose_norris_task_hint(self)
|
||||||
|
if not (self.norris_tasks and self.norris_tasks.list) then return "" end
|
||||||
[NORRIS MODE] You are operating autonomously toward the following goal:
|
|
||||||
|
|
||||||
%s
|
|
||||||
|
|
||||||
%s
|
|
||||||
|
|
||||||
Plan and execute step by step ...
|
|
||||||
]]
|
|
||||||
|
|
||||||
-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
|
|
||||||
local function compose_norris_suffix(self)
|
|
||||||
local task_hint = ""
|
|
||||||
if self.norris_tasks and self.norris_tasks.list then
|
|
||||||
local k = self.norris_tasks.current
|
local k = self.norris_tasks.current
|
||||||
local n = #self.norris_tasks.list
|
local n = #self.norris_tasks.list
|
||||||
if self.norris_tasks.list[k] then
|
local task = self.norris_tasks.list[k]
|
||||||
task_hint = string.format(
|
if not task then return "" end -- exhausted → no hint
|
||||||
"Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k])
|
return string.format(
|
||||||
|
"\n\nCurrent step %d/%d:\n %s", k, n, task)
|
||||||
end
|
end
|
||||||
|
|
||||||
|
-- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...)
|
||||||
|
-- block, append the hint:
|
||||||
|
if self.norris_active and self.norris_goal then
|
||||||
|
sys_content = sys_content
|
||||||
|
.. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
|
||||||
|
.. compose_norris_task_hint(self)
|
||||||
end
|
end
|
||||||
return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
|
```
|
||||||
|
|
||||||
|
Also (R6 fix) defensive clear in `Context:reset()`:
|
||||||
|
```lua
|
||||||
|
function Context:reset()
|
||||||
|
self.turns = {}
|
||||||
|
self.pending_exec_output = nil
|
||||||
|
self.summary = nil
|
||||||
|
self.norris_tasks = nil -- R6: defensive; :reset is unreachable
|
||||||
|
-- mid-Norris but cheap to be safe.
|
||||||
end
|
end
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -236,9 +245,15 @@ If executor model lookup fails (`cfg.norris.executor` names a
|
|||||||
non-existent preset) → status log + use active_cfg (existing
|
non-existent preset) → status log + use active_cfg (existing
|
||||||
behavior). User can fix config and re-launch.
|
behavior). User can fix config and re-launch.
|
||||||
|
|
||||||
If `:reset` clears the conversation mid-Norris → existing behavior
|
If `:reset` is invoked → unreachable mid-Norris (no readline prompt
|
||||||
clears turns; `ctx.norris_tasks` should ALSO clear since the goal
|
while the planner is running). Out-of-Norris, `Context:reset()` now
|
||||||
context is gone. Document in §9.
|
also clears `self.norris_tasks` as defensive coding (R6 fix).
|
||||||
|
|
||||||
|
R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/
|
||||||
|
`ctx.norris_tasks` at the **top** of the function, BEFORE the preplan
|
||||||
|
block. This guarantees a fresh launch starts clean even if a prior
|
||||||
|
Norris session crashed with stale state. Cheaper than wrapping the
|
||||||
|
whole driver in pcall.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -296,6 +311,7 @@ become assistant turns visible there).
|
|||||||
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
|
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
|
||||||
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
|
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
|
||||||
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
|
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
|
||||||
|
| R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -326,8 +342,8 @@ become assistant turns visible there).
|
|||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
|
| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. |
|
||||||
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
|
| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. |
|
||||||
| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). |
|
| 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). |
|
||||||
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. |
|
| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. |
|
||||||
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
|
| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
|
||||||
|
|
||||||
Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
|
Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents.
|
||||||
@@ -342,7 +358,18 @@ Each commit must leave the tree in a state where `luajit main.lua` runs and exis
|
|||||||
|
|
||||||
### Resolved review tickets folded into the plan
|
### Resolved review tickets folded into the plan
|
||||||
|
|
||||||
(None yet — Sonnet review runs after this manifest is committed.)
|
**Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.**
|
||||||
|
|
||||||
|
- **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call.
|
||||||
|
- **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 below now shows the helper-append approach matching §11.
|
||||||
|
- **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks.
|
||||||
|
- **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector.
|
||||||
|
- **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3's "tiny repl.lua wiring" is ONLY passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated.
|
||||||
|
- **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral.
|
||||||
|
- **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode.
|
||||||
|
- **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:".
|
||||||
|
- **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 12. Phase 10 → Phase 11+ Out-of-band
|
## 12. Phase 10 → Phase 11+ Out-of-band
|
||||||
|
|||||||
Reference in New Issue
Block a user