diff --git a/docs/PHASE10.md b/docs/PHASE10.md
index 7dd5edf..9710366 100644
--- a/docs/PHASE10.md
+++ b/docs/PHASE10.md
@@ -299,17 +299,50 @@ become assistant turns visible there).
 
 ---
 
-## 11. Open Questions (Phase 10)
+## 11. Open Questions — RESOLVED (analyze step)
 
-| # | Question | Impact | Resolution target |
+| # | Question | Resolution |
+|---|---|---|
+| Q-PP1 | Does `cfg.norris.executor` apply even without a preplanner? | **YES.** Resolving the executor is independent of preplanning. If `cfg.norris.executor` names a valid preset, `run_norris` uses it for `safety.norris_step` regardless of preplanner state. Preplanner unset + executor set means "always use cloud-haiku for Norris steps, even though my interactive `:model` is qwen-coder", which is a useful split. |
+| Q-PP2 | Stream the preplan TASKs as they're emitted? | **NO (v1 = non-streaming).** Use `broker.chat` (non-streaming) for the preplan. The preplan emits ~16 × ~10 tokens = ~160 tokens total; on cloud Haiku that's <2s. Print the full TASK list at completion (`[aish] preplanned N tasks via cloud`) rather than streaming it token by token. Streaming adds latency variance and screen flicker for a sub-2s win. Reconsider if real-world preplan latency exceeds 5s. |
+| Q-PP3 | Does a re-launch fire the preplan again? | **YES, naturally.** Each `:norris <goal>` re-enters `run_norris`. The pre-loop preplan block runs again (different goal, different decomposition) and overwrites `ctx.norris_tasks`. No special re-launch logic is needed; it falls out of the lifecycle. |
+| Q-PP4 | Does the executor see the full goal AND the current task? | **BOTH.** The goal anchor stays in the NORRIS suffix (existing), and a NEW optional task-hint block is appended right after it. The executor can use the goal to detect off-track tasks and adjust its `CMD:` emission. |
+| Q-PP5 | Does `:norris` (no args) report tasks state? | **No (out of scope for v1).** Inside Norris there's no readline prompt, so meta commands aren't reachable, and `ctx.norris_tasks` is cleared on exit. The renderer's per-step `[step k/N: <descr>]` line is the user-facing readout. Reconsider if users ask for a "task plan preview before execution" mode. |
+| Q-PP6 | Degenerate 1-task case? | **Run as normal, no special case.** Functionally identical to single-model Norris (the executor sees the goal plus a single TASK hint). The preplanner's cost is the only delta. Acceptable. |
+
+**Additional findings from code reading:**
+
+- `safety.norris_step(ctx, model_cfg, ...)` takes `model_cfg` as a parameter. **Implication:** `run_norris` resolves the executor cfg ONCE before the loop and passes it in on every iteration. No signature change to safety.lua. The "executor" is just a different `model_cfg` than `active_cfg`.
+- `Context:reset()` does NOT touch `norris_goal`/`norris_active` (Norris state is owned by `run_norris`: set on entry, cleared on exit). `ctx.norris_tasks` follows the same lifecycle: created at preplan, cleared at `run_norris` exit, NOT by `:reset` (which is unreachable mid-Norris anyway).
+- `NORRIS_SUFFIX_TEMPLATE` has one `%s` slot, for the goal. Don't change the template; **append** the output of a `compose_norris_task_hint(self)` helper AFTER the formatted suffix. This keeps the template stable; the hint block is additive.
+- The preplan call lives in `repl.lua` (not `safety.lua`), which preserves safety's invariant of "a single broker round-trip per call". Repl already orchestrates multi-call flows (Norris loop, secrets rehydration, routing); preplan is one more pre-loop hook.
+- The renderer needs a per-step prefix showing `[step k/N: <descr>]`. `renderer.norris_step` is currently called with `(n, max_n)`; extend that call to `(n, max_n, descr)`. `descr` is already in the signature per the helpers contract above (line 339 of safety.lua), but `run_norris` doesn't pass it today. Phase 10 wiring fills that gap.
+
+---
+
+## 11b. Plan — commit-by-commit roadmap (5 commits)
+
+| # | Commit subject | Files | Why this slice |
 |---|---|---|---|
-| Q-PP1 | Should `cfg.norris.executor` ALSO apply when `cfg.norris.preplanner` is unset (i.e., "always use this model for Norris regardless of :model selection")? | Norris model selection scope | Analyze |
-| Q-PP2 | Preplan call uses `broker.chat` (non-streaming) — for cloud Haiku that's ~2-3 seconds before Norris begins. Should we stream the TASKs as they're emitted (chat_stream) so the user sees the plan forming? | UX latency | Analyze (probably non-streaming is fine; 16 TASKs at 5 tokens each = 80 tokens; ~1s on cloud) |
-| Q-PP3 | If user runs `:norris foo` then later `:norris bar` in the same session, does preplanning fire again? | Re-launch semantics | Analyze (yes — each :norris launch fires preplan + populates new ctx.norris_tasks) |
-| Q-PP4 | Does the executor see the FULL goal AND current task, or just current task? | Executor context | Analyze (both; per §5 the NORRIS suffix composes both) |
-| Q-PP5 | Should `:norris` (without args) report `ctx.norris_tasks` state for an active session? | Introspection | Analyze (small ergonomics; could be a v2 polish) |
-| Q-PP6 | What if preplan returns 1 task ("Do the thing") that's basically the whole goal? Worth running? | Degenerate case | Analyze (yes — degenerate case still works; one-task execution is identical to no-preplan single-model run, just with an extra ~1s cost. Don't special-case.) |
+| 1 | `executor: extract_task_lines for Phase 10 preplan parsing` | executor.lua + inline test | Pure function; verifiable standalone. Locks the TASK: parse contract before the preplan call wires it in. |
+| 2 | `context: norris_tasks anchor + task-hint composition` | context.lua + inline test | New field on Context. Adds `compose_norris_task_hint(self)`, whose output is appended after the NORRIS suffix. `ctx.norris_tasks` is nil by default → no regression. |
+| 3 | `safety/renderer: pass current task descr through norris_step` | safety.lua + repl.lua tiny wiring | One-line tweak in safety.lua to source `descr` from `ctx.norris_tasks`. `helpers.render_step` already accepts `descr` (line 246 of renderer.lua). |
+| 4 | `repl: preplan + executor cfg resolution + tasks_max truncate (closes #89)` | repl.lua | The orchestration commit. Pre-loop preplan block; fallback paths; executor cfg resolution (`active_cfg` vs `cfg.norris.executor`); `ctx.norris_tasks` lifecycle. |
+| 5 | `phase10: config example + MEMORY index + project status` | config.lua, MEMORY.md, memory/project_phase_status.md | Documentation + persistent project state. Ships the user-visible config block. |
+Each commit must leave the tree in a state where `luajit main.lua` runs and existing tests pass. Commits 1-3 add code that nothing calls yet, commit 4 wires it up, and commit 5 documents it.
+
+### Per-commit verification
+
+- **C1**: 6 inline unit cases for `extract_task_lines`: empty input → {}, single TASK → {it}, mixed CMD+TASK → only TASKs, leading whitespace tolerated, blank lines ignored, > tasks_max → caller's job to cap (the function itself just parses). The test runs from the repo root.
+- **C2**: 5 inline unit cases for `compose_norris_task_hint`: nil tasks → "", empty list → "", current=1 of 3 → contains "step 1/3", current > #list → "" (completed), and a full `to_messages` render with tasks shows the hint in the system content. `self.turns` and `self.norris_tasks` stay unmutated.
+- **C3**: safety_test snapshot still 87/87 (no behavior change for the no-tasks path). A manual run of single-model Norris confirms no regression.
+- **C4**: E2E with `cfg.norris.preplanner = cloud` and `executor = fast`. Goal: `find files larger than 10MB in /var/log and report sizes`. Verify the preplan emits 2-5 tasks and the executor runs each one; `:cost detail` shows two model rows. Fallback E2E: with the preplanner pointing to a bogus model, expect a status log line followed by normal Norris.
+- **C5**: Visual inspection of config.lua; MEMORY.md and project_phase_status.md updated to "Phase 0-10 done".
+
+### Resolved review tickets folded into the plan
+
+(None yet; the Sonnet review runs after this manifest is committed.)
 
 ---
 
 ## 12. Phase 10 → Phase 11+ Out-of-band
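A minimal sketch of the C1 parsing contract in the Phase 10 plan above, assuming the preplanner emits one `TASK: <description>` per line. The module table `M`, the whitespace trimming, and dropping a bare `TASK:` with no text are assumptions for illustration, not the real executor.lua. Capping at `tasks_max` is left to the caller, as the plan specifies.

```lua
-- Hypothetical sketch of executor.extract_task_lines (Phase 10, C1).
-- Assumes one "TASK: <description>" per line; capping at tasks_max is the caller's job.
local M = {}

function M.extract_task_lines(text)
  local tasks = {}
  if text == nil or text == "" then
    return tasks -- empty input -> {}
  end
  -- Appending "\n" lets the pattern also capture the final line.
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    -- Leading whitespace is tolerated; CMD: lines and blank lines do not match.
    local descr = line:match("^%s*TASK:%s*(.-)%s*$")
    -- Assumption: a bare "TASK:" with no description is dropped.
    if descr and descr ~= "" then
      tasks[#tasks + 1] = descr
    end
  end
  return tasks
end
```

Keeping the function a pure string-to-list parser is what makes commit 1 verifiable on its own, before any broker call exists.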
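A sketch of the C2 helper from the plan above, under stated assumptions: `ctx.norris_tasks` is modeled as a hypothetical `{ list = {...}, current = k }` table, and both the template text and the hint wording are placeholders. `compose_norris_suffix` is likewise a hypothetical name. The sketch shows the shape the findings describe: the one-slot template is formatted unchanged, and the hint is appended after it, returning "" whenever there is nothing to hint.

```lua
-- Hypothetical sketch of the C2 additions to context.lua.
-- Template text, hint wording, and the norris_tasks shape are assumptions.
local NORRIS_SUFFIX_TEMPLATE = "\n\n[NORRIS] Goal: %s" -- one %s slot, left unchanged

local Context = {}
Context.__index = Context

function Context.new()
  -- norris_tasks stays nil by default, so the no-preplan path is unaffected.
  return setmetatable({ turns = {} }, Context)
end

-- Returns "" when there is nothing to hint, so callers can append unconditionally.
-- Reads self.norris_tasks but never mutates it.
function Context:compose_norris_task_hint()
  local tasks = self.norris_tasks
  if tasks == nil or tasks.list == nil or #tasks.list == 0 then
    return ""
  end
  local k, n = tasks.current or 1, #tasks.list
  if k > n then
    return "" -- every task has been completed
  end
  return string.format("\n[TASK] step %d/%d: %s", k, n, tasks.list[k])
end

-- Hypothetical helper: format the template first, then append the hint after it.
function Context:compose_norris_suffix()
  if self.norris_goal == nil then
    return ""
  end
  return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
    .. self:compose_norris_task_hint()
end
```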
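The C4 pre-loop orchestration described above, sketched with stand-ins: `presets` (a name to model-cfg table), `broker.chat(model_cfg, prompt)` returning the reply text, and the injected `extract`/`log` functions are all assumptions, not repl.lua's real interfaces. It covers Q-PP1 (executor resolution independent of the preplanner), the fallback paths, and the `tasks_max` truncation. A nil return means "run plain single-model Norris".

```lua
-- Hypothetical sketch of the C4 pre-loop block in repl.lua.
-- presets, broker.chat(model_cfg, prompt) -> string, extract and log are stand-ins.
local DEFAULT_TASKS_MAX = 16 -- assumption, taken from the ~16-task estimate in Q-PP2

-- Q-PP1: executor resolution does not depend on the preplanner being set.
local function resolve_executor_cfg(cfg, presets, active_cfg)
  local name = cfg.norris and cfg.norris.executor
  if name ~= nil and presets[name] ~= nil then
    return presets[name]
  end
  return active_cfg -- unset or unknown preset: keep the interactive :model
end

local function cap_tasks(tasks, max)
  local out = {}
  for i = 1, math.min(#tasks, max) do
    out[i] = tasks[i]
  end
  return out
end

-- Returns a task list, or nil meaning "run plain single-model Norris".
local function preplan(cfg, presets, broker, goal, extract, log)
  local name = cfg.norris and cfg.norris.preplanner
  if name == nil then
    return nil -- no preplanner configured
  end
  local pcfg = presets[name]
  if pcfg == nil then
    log("[aish] preplanner '" .. name .. "' not found; running Norris without a plan")
    return nil
  end
  -- Non-streaming call (Q-PP2); pcall turns broker errors into the fallback path.
  local ok, reply = pcall(broker.chat, pcfg, goal)
  if not ok or type(reply) ~= "string" then
    log("[aish] preplan failed; running Norris without a plan")
    return nil
  end
  local tasks = cap_tasks(extract(reply), cfg.norris.tasks_max or DEFAULT_TASKS_MAX)
  if #tasks == 0 then
    return nil -- nothing parsed: fall back silently in this sketch
  end
  log(string.format("[aish] preplanned %d tasks via %s", #tasks, name))
  return tasks
end
```

Injecting `broker`, `extract` and `log` keeps the sketch testable without a network; whether repl.lua passes them explicitly or uses module upvalues is not specified by the plan.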
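What the user-visible config block shipped in commit 5 might look like. The `norris` keys come from this plan and the preset names `cloud` and `fast` are the ones used in the C4 E2E run. The `tasks_max` value of 16 is an assumption based on the ~16-task estimate in Q-PP2, and where the block sits inside config.lua is not specified here.

```lua
-- Hypothetical excerpt; placement inside config.lua and the tasks_max value are assumptions.
local cfg = {
  norris = {
    preplanner = "cloud", -- preset called once, before the loop, to emit TASK: lines
    executor   = "fast",  -- preset used for every safety.norris_step; applies even without a preplanner (Q-PP1)
    tasks_max  = 16,      -- preplan output is truncated to this many tasks
  },
}
```

Leaving `preplanner` unset while keeping `executor` gives the "always use this preset for Norris steps" split resolved in Q-PP1.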