diff --git a/docs/PHASE10.md b/docs/PHASE10.md index 9710366..7c7b50c 100644 --- a/docs/PHASE10.md +++ b/docs/PHASE10.md @@ -57,7 +57,7 @@ Four pillars: = "fast"`: 1. Cloud emits a TASK list (e.g., `TASK: find /var/log -size +10M`; `TASK: stat -c "%n %s" `; `TASK: format and report`). - 2. Status: `[aish] preplanned 3 tasks via cloud` + 2. Terminal output: `[aish] preplanned 3 tasks via cloud` (R8: was "Status:") 3. Per-step execution by `fast`: each step shows the task it's working on; existing HALT protocol still gates destructive ops. - Without `cfg.norris.preplanner`, Norris behaves exactly as Phase 6 @@ -124,17 +124,21 @@ line, EXACTLY this shape: TASK: -Output AT MOST N tasks. No prose; no numbering; no commentary outside +Output AT MOST %d tasks. No prose; no numbering; no commentary outside the TASK: lines. ]] - sys = sys:gsub("N", tostring(config.norris.tasks_max or 16)) + -- R1 fix: %d via string.format; gsub("N", ...) would corrupt + -- "No prose / No commentary / No numbering" → "16o prose" etc. + sys = string.format(sys, config.norris.tasks_max or 16) local msgs = scrub_messages({ { role = "system", content = sys }, { role = "user", content = goal }, }, secrets_mode_for(pre_cfg)) local text, usage = broker.chat(pre_cfg, msgs, - { category = "norris-preplan", - max_tokens = 800, timeout_ms = 60000 }) + { category = "norris-preplan", + max_tokens = 800, + -- R7 fix: respect the model's configured timeout + timeout_ms = pre_cfg.timeout_ms or 60000 }) if text then if secrets_session then text = secrets_session:rehydrate(text) end if usage then _record_usage(usage.model, usage.category, usage) end @@ -174,33 +178,38 @@ each `result.status == "continue"`, bump `current > #ctx.norris_tasks.list`, the loop exits with a synthesized `"tasks_complete"` final status. -System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE): +System suffix extension (R2 fix — keep NORRIS_SUFFIX_TEMPLATE +**unchanged**; append a task-hint block AFTER the existing format): ```lua -local NORRIS_SUFFIX_TEMPLATE = [[ +-- New helper at module scope in context.lua, alongside NORRIS_SUFFIX_TEMPLATE: +local function compose_norris_task_hint(self) + if not (self.norris_tasks and self.norris_tasks.list) then return "" end + local k = self.norris_tasks.current + local n = #self.norris_tasks.list + local task = self.norris_tasks.list[k] + if not task then return "" end -- exhausted → no hint + return string.format( + "\n\nCurrent step %d/%d:\n %s", k, n, task) +end +-- In Context:to_messages, AFTER the existing string.format(NORRIS_SUFFIX...) +-- block, append the hint: +if self.norris_active and self.norris_goal then + sys_content = sys_content + .. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal) + .. compose_norris_task_hint(self) +end +``` -[NORRIS MODE] You are operating autonomously toward the following goal: - - %s - -%s - -Plan and execute step by step ... -]] - --- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks). -local function compose_norris_suffix(self) - local task_hint = "" - if self.norris_tasks and self.norris_tasks.list then - local k = self.norris_tasks.current - local n = #self.norris_tasks.list - if self.norris_tasks.list[k] then - task_hint = string.format( - "Current step %d/%d:\n %s\n", k, n, self.norris_tasks.list[k]) - end - end - return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint) +Also (R6 fix) defensive clear in `Context:reset()`: +```lua +function Context:reset() + self.turns = {} + self.pending_exec_output = nil + self.summary = nil + self.norris_tasks = nil -- R6: defensive; :reset is unreachable + -- mid-Norris but cheap to be safe. end ``` @@ -236,9 +245,15 @@ If executor model lookup fails (`cfg.norris.executor` names a non-existent preset) → status log + use active_cfg (existing behavior). User can fix config and re-launch. -If `:reset` clears the conversation mid-Norris → existing behavior -clears turns; `ctx.norris_tasks` should ALSO clear since the goal -context is gone. Document in §9. +If `:reset` is invoked → unreachable mid-Norris (no readline prompt +while the planner is running). Out-of-Norris, `Context:reset()` now +also clears `self.norris_tasks` as defensive coding (R6 fix). + +R4: `run_norris` clears `ctx.norris_active`/`ctx.norris_goal`/ +`ctx.norris_tasks` at the **top** of the function, BEFORE the preplan +block. This guarantees a fresh launch starts clean even if a prior +Norris session crashed with stale state. Cheaper than wrapping the +whole driver in pcall. --- @@ -296,6 +311,7 @@ become assistant turns visible there). | Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. | | Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. | | Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict `^TASK:` matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. | +| R3: preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper) | **By design** — retrying the preplan against `fallback_model` would produce a different decomposition from a different model. That's not a recovery; it's a silent semantics change. Hard-fail to single-model Norris is the safer fallback. Documented here so a future maintainer doesn't "fix" it by wiring `call_broker` and surprise users. | --- @@ -326,8 +342,8 @@ become assistant turns visible there). |---|---|---|---| | 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it. | | 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`; appends after the NORRIS suffix. ctx.norris_tasks is nil by default → no regression. | -| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source descr from ctx.norris_tasks. helpers.render_step already accepts descr (line 246 of renderer.lua). | -| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (active_cfg vs cfg.norris.executor); ctx.norris_tasks lifecycle. | +| 3 | `safety: pass current task descr to render_step from norris_step` | safety.lua ONLY | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks` and pass to `helpers.render_step(step_n, max_steps, descr)`. **No repl.lua change in this commit** (R5 clarification). | +| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fall-back paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle (clear-at-top per R4); pass executor_cfg to safety.norris_step instead of active_cfg. | | 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. | Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass; commits 1-3 ship behind a feature-unused-yet stance (nothing calls them), commit 4 lights them up, commit 5 documents. @@ -342,7 +358,18 @@ Each commit must leave the tree in a state where `luajit main.lua` runs and exis ### Resolved review tickets folded into the plan -(None yet — Sonnet review runs after this manifest is committed.) +**Sonnet review 2026-05-17 — 2 blockers + 4 important + 2 nits. All accepted.** + +- **R1 (blocker)** `sys:gsub("N", ...)` would corrupt "No prose", "No commentary", "No numbering" → "16o prose". **Fix**: use `string.format` with `%d` in the template, replace the gsub call. +- **R2 (blocker)** §5 pseudocode showed a 2-slot NORRIS_SUFFIX_TEMPLATE redesign, contradicting §11's "don't change the template; append helper output AFTER". **Fix**: §5 below now shows the helper-append approach matching §11. +- **R3 (important)** Preplan call bypasses `call_broker` (Phase 5 fallback-retry wrapper). **Decision: intentional** — fallback for a preplan call would produce a different decomposition from a different model, which is actively undesirable. Documented in §10 Risks. +- **R4 (important)** No pcall around `run_norris` → stale `ctx.norris_active`/`norris_goal`/`norris_tasks` on uncaught error. Pre-existing bug; Phase 10 adds one more leaky field. **Fix**: clear all three at the TOP of `run_norris` (before preplan) so a fresh launch always starts clean regardless of prior crash. Cheaper than full pcall wrap; sufficient for the stale-tasks vector. +- **R5 (important)** C3 commit scope ambiguity. **Clarification**: C3's "tiny repl.lua wiring" is ONLY passing `descr` to `render_step`. Executor cfg resolution (active_cfg vs cfg.norris.executor) lands in C4 alongside the preplan block. Table updated. +- **R6 (important)** `ctx.norris_tasks` lifecycle vs `Context:reset()`. **Fix**: add `self.norris_tasks = nil` to `Context:reset()` as defensive coding (one line, no regression). §7 amended to remove the contradictory "Document in §9" deferral. +- **R7 (nit)** Hardcoded `timeout_ms = 60000` ignores `pre_cfg.timeout_ms`. **Fix**: `pre_cfg.timeout_ms or 60000` in §4 pseudocode. +- **R8 (nit)** "Status:" label in §1 acceptance criterion could be misread as on-screen prefix. **Fix**: rename to "Terminal output:". +- **R9-R11**: confirmations of clean composition with #87 (compression doesn't fire during Norris steps — correct), #86/#88 (both scoped to ask_ai; can't leak into preplan call site). No action. + --- ## 12. Phase 10 → Phase 11+ Out-of-band