marfrit a7cbe22d1d phase10: formulate manifest — cloud preplanner / local executor split
Resolves direction for #89. Splits Norris into two roles:

- Preplanner (cloud) fires ONCE at :norris launch; emits TASK: list.
- Executor (local) handles each TASK; existing HALT protocol intact.

ctx.norris_tasks anchor survives eviction (mirrors ctx.norris_goal).
Cost category 'norris-preplan' separates the cloud preplan call
from per-step executor cost in :cost detail.

Graceful fall-back when cfg.norris.preplanner is unset OR preplan
call fails — Norris runs as today (single-model). No regression for
existing users.

PHASE0 §11 amended to add Phase 10 row.

Manifest declares 6 Open Questions for analyze step; 12 design
decisions table; module-touch table; 4-pillar plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:11:33 +00:00


aish — Phase 10 Manifest

Project: aish (AI-augmented conversational shell)
Document: Phase 10 Requirements, Architecture & Design Decisions
Status: Formulate (pre-analyze)
Date: 2026-05-17

PHASE0 is the locked substrate; PHASE1-9 are layered on top. This manifest specifies what Phase 10 adds — Cloud preplanner → local executor split for Norris autonomous mode. Resolves Gitea issue #89.

Today Norris runs entirely on ONE model: pick cloud (capable but slow per step + costs per step) OR local (fast + free per step but easily distracted on multi-step planning). Phase 10 splits the planning and execution roles: cloud emits a TASK list ONCE per Norris session; local model executes each task. Most tasks are simple shell ops the local model handles fine; cloud is used only at the planning layer that benefits from its reasoning.

PHASE0 §11 amendment to add Phase 10 row lands in the same commit as this formulate doc.


1. Scope of Phase 10

Four pillars:

  1. Preplan call — on :norris <goal> launch, if cfg.norris.preplanner names a configured model preset, fire ONE broker.chat call against that preset with a system-prompt asking for TASK: <imperative> lines. Parse them into a list; cap at cfg.norris.tasks_max (default 16). Stash the list + current index on ctx (separate from ctx.turns so eviction can't lose them — mirrors the ctx.norris_goal anchor).

  2. Executor loop: safety.norris_step already iterates per-step; extend its prompt to include the CURRENT task. Embed a [task k/N] <task text> line in the existing NORRIS suffix (the §2 decision, rather than a synthesized user turn). When all tasks are consumed (or the executor signals GOAL: complete early), Norris exits.

  3. Cost + secrets composition — preplan call goes through the normal scrub_messages + on_delta usage callbacks. Category "norris-preplan"; executor steps keep "norris". :cost detail surfaces both as separate rows.

  4. Graceful fall-back — if cfg.norris.preplanner is unset OR the preplan call fails (transport err, parse failure, empty list), Norris runs as today: single model handles both planning and execution via the existing in-loop reasoning. No regression for users without Phase 10 config.

Phase 10 is done when:

  • :norris find files larger than 10MB in /var/log and report sizes launched with cfg.norris.preplanner = "cloud" + cfg.norris.executor = "fast":
    1. Cloud emits a TASK list (e.g., TASK: find /var/log -size +10M; TASK: stat -c "%n %s" <results>; TASK: format and report).
    2. Status: [aish] preplanned 3 tasks via cloud
    3. Per-step execution by fast: each step shows the task it's working on; existing HALT protocol still gates destructive ops.
  • Without cfg.norris.preplanner, Norris behaves exactly as Phase 6 (no regression for existing users).
  • Preplan failure (broken cloud endpoint) → status log + fall back to single-model Norris.
  • :cost detail after a Norris session shows BOTH cloud / norris-preplan (one row) and <executor model> / norris (one row).

2. Technology Decisions (delta from Phase 9)

| Decision | Choice | Rationale |
| --- | --- | --- |
| Preplan trigger | ONCE at :norris <goal> launch (run_norris in repl.lua) | One round-trip per Norris session keeps cost predictable. Re-planning mid-flight deferred to a future iteration. |
| Preplan model selection | cfg.norris.preplanner (string; matches a key in cfg.models) | Same shape as cfg.safety.llm_model. Optional; absent = no split, existing behavior. |
| Executor model selection | cfg.norris.executor (string; matches cfg.models key) | Optional; absent = active_cfg (the user's :model choice at launch, i.e. existing behavior). |
| Preplan system prompt | Static template baked into safety.lua: "Decompose the goal into single-step imperative TASKs. Output format: TASK: <imperative sentence, max 80 chars>. Maximum N tasks." with N = cfg.norris.tasks_max | Predictable parse; small surface. Override via cfg.norris.preplan_system if user wants. |
| TASK line parsing | ^TASK:%s*(.+)$ per line; trim whitespace; filter empty | Same shape as the existing CMD: / DELEGATE: / CMD&: extractors in executor.lua. Trivially adapt extract_*_lines. |
| Task storage | ctx.norris_tasks = { current = 1, list = {...} } (NEW field, separate from ctx.turns) | Survives eviction (mirrors ctx.norris_goal anchor); cleared at Norris exit. |
| Step-prompt synthesis | safety.norris_step reads ctx.norris_tasks.list[current] and prepends [task k/N] <text> to the NORRIS suffix already in the system prompt (considered and rejected: a synthesized user turn). | Keeps user-turn alternation legal; the NORRIS suffix already exists and is re-composed per turn. |
| Per-task advance | After safety.norris_step returns "continue", repl.lua's run_norris bumps ctx.norris_tasks.current. When current > #list, Norris exits with status "tasks_complete". | Same as the existing step counter; just tied to the task list now. |
| Goal anchor + task layered together | Both visible in the NORRIS suffix: goal: line (existing) + current task k/N: line (new) | The executor still sees the global goal AND the current focus. |
| Preplan parse failure | Status log + fall back to single-model Norris (no tasks) | Robust; user can re-launch :norris if preplan was wonky. |
| Preplan empty result | Same as parse failure: fall back | Robust. |
| tasks_max cap | Default 16; cfg.norris.tasks_max overrides | Bounded blast radius; matches the existing max_norris_steps cap intent. |
| Cost category | "norris-preplan" for the preplan call; "norris" for executor steps (existing) | :cost detail surfaces them as separate rows. |
| Secrets/scrub | Preplan call goes through scrub_messages + rehydrate (matches all other broker calls in repl.lua) | No special-case. |
| Norris HALT protocol | Unchanged; applies per executor step | Existing safety.is_destructive + halt-proceed/skip/abort still gates. |
| Skip semantics | If user halts and skips at task k, advance to task k+1 (NOT re-try) | Predictable; user can :norris off + relaunch with refined goal if they need full re-plan. |

3. Module Changes

| File | State after Phase 9 | Phase 10 changes |
| --- | --- | --- |
| repl.lua | run_norris(goal) builds helpers, runs while loop calling safety.norris_step | Pre-loop: if cfg.norris.preplanner set, fire one broker.chat against that preset; parse TASK lines; set ctx.norris_tasks. Per-iteration: bump ctx.norris_tasks.current after each non-terminal result; exit "tasks_complete" when exhausted. |
| safety.lua | norris_step composes the NORRIS suffix; uses model_cfg for broker call | Read ctx.norris_tasks if set; embed [task k/N] <text> into the suffix template OR pass via opts. Use cfg.norris.executor (resolved by repl.lua at run_norris launch) for the per-step broker call. |
| context.lua | system prompt composition + ctx.norris_active/norris_goal/norris_consecutive_skips | Add ctx.norris_tasks field (table or nil); clear on :reset (matches norris_goal lifecycle). NORRIS_SUFFIX_TEMPLATE extended to optionally show current task. |
| executor.lua | extract_cmd_lines, extract_cmd_bg_lines, extract_delegate_lines | Add extract_task_lines(text), a pure function. |
| config.lua | Phase 9 .aish.lua header + existing example blocks | Add commented-out norris = { preplanner = "cloud", executor = "fast", tasks_max = 16 } block. |
| docs/PHASE0.md | §11 lists phases 0-9 | Amendment: add Phase 10 row. |

No new module files.
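
A minimal sketch of what extract_task_lines could look like, written as a standalone pure function. It is not taken from executor.lua; the one deliberate assumption is that leading whitespace before TASK: is tolerated (as the Risks table states), which relaxes the ^TASK: pattern in the decisions table slightly. Numbered or prose lines are ignored and empty tasks are filtered.

```lua
-- Sketch of executor.extract_task_lines: pure, no I/O.
-- Assumption: leading whitespace before "TASK:" is tolerated.
local function extract_task_lines(text)
    local tasks = {}
    -- Walk the reply line by line; the appended "\n" makes the final
    -- line match even when the reply has no trailing newline.
    for line in ((text or "") .. "\n"):gmatch("(.-)\r?\n") do
        -- Capture the text after "TASK:", trimmed on both sides.
        local task = line:match("^%s*TASK:%s*(.-)%s*$")
        if task and task ~= "" then
            tasks[#tasks + 1] = task
        end
    end
    return tasks
end

-- Illustrative reply mixing valid, indented, numbered and empty lines.
local reply = table.concat({
    "Here's my plan:",
    "TASK: find /var/log -size +10M",
    "  TASK:   stat the results  ",
    "1. TASK: numbered lines are ignored",
    "TASK:",
}, "\n")

local tasks = extract_task_lines(reply)
for i, t in ipairs(tasks) do print(i, t) end
```

Only the second and third lines survive here: the prose line and the numbered line fail the anchored match, and the bare TASK: line is dropped as empty.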


4. Pillar 1 — Preplan call

-- repl.lua run_norris, pre-loop block:
local tasks
if config.norris and config.norris.preplanner then
    local pre_name = config.norris.preplanner
    local pre_cfg  = config.models and config.models[pre_name]
    if pre_cfg then
        local sys = (config.norris and config.norris.preplan_system) or [[
You are a task decomposer. Given the user's goal, decompose it into a
sequence of single-step imperative TASKs. Output format: one TASK per
line, EXACTLY this shape:

  TASK: <imperative sentence, max 80 chars>

Output AT MOST N tasks. No prose; no numbering; no commentary outside
the TASK: lines.
]]
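        -- Substitute only the standalone N placeholder; a bare gsub("N", ...)
        -- would also rewrite the N in "No prose" (and in user overrides).
        sys = sys:gsub("%f[%w]N%f[%W]", tostring(config.norris.tasks_max or 16))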
        local msgs = scrub_messages({
            { role = "system", content = sys },
            { role = "user",   content = goal },
        }, secrets_mode_for(pre_cfg))
        local text, usage = broker.chat(pre_cfg, msgs,
            { category = "norris-preplan",
              max_tokens = 800, timeout_ms = 60000 })
        if text then
            if secrets_session then text = secrets_session:rehydrate(text) end
            if usage then _record_usage(usage.model, usage.category, usage) end
            local parsed = executor.extract_task_lines(text)
            local cap = config.norris.tasks_max or 16
            if #parsed > cap then
                -- trim and warn
                for i = #parsed, cap + 1, -1 do parsed[i] = nil end
                renderer.status(("preplan emitted >%d tasks; truncated"):format(cap))
            end
            if #parsed > 0 then
                tasks = parsed
                renderer.status(("preplanned %d tasks via %s"):format(#tasks, pre_name))
            else
                renderer.status("preplan produced no TASK lines; running single-model")
            end
        else
            -- text == nil: the second return value carries the failure reason
            renderer.status("preplan failed: " .. tostring(usage)
                            .. "; running single-model")
        end
    end
end
if tasks then
    ctx.norris_tasks = { current = 1, list = tasks }
end
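
The N substitution in the block above deserves care, because a literal N also occurs inside other words of the built-in template (for example "No prose"). The sketch below isolates that step in a hypothetical helper, fill_tasks_max (not an existing aish function), and compares a frontier-pattern match against a naive gsub.

```lua
-- Hypothetical helper: replace only a standalone "N" with the task cap.
local function fill_tasks_max(template, cap)
    -- %f[%w] and %f[%W] are frontier patterns: they anchor the match at
    -- word boundaries, so the N inside "No" is left alone.
    -- The outer parentheses drop gsub's second return value (the count).
    return (template:gsub("%f[%w]N%f[%W]", tostring(cap)))
end

local template = "Output AT MOST N tasks. No prose; no numbering."

print(fill_tasks_max(template, 16))
-- prints: Output AT MOST 16 tasks. No prose; no numbering.

print((template:gsub("N", "16")))
-- prints: Output AT MOST 16 tasks. 16o prose; no numbering.
```

A dedicated placeholder such as {N} would avoid the ambiguity altogether, at the price of changing the template wording.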

5. Pillar 2 — Executor loop

safety.norris_step extension: if ctx.norris_tasks is set, embed the current task into the system suffix. The existing while loop in run_norris already calls norris_step once per iteration; after each result.status == "continue", bump ctx.norris_tasks.current = ctx.norris_tasks.current + 1. When current > #ctx.norris_tasks.list, the loop exits with a synthesized "tasks_complete" final status.
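
The advance rule can be sketched in isolation. This is a small simulation, not the actual run_norris: norris_step is passed in as a stub, max_steps stands in for the existing step cap, and the "max_steps" status string is an assumption made for the example.

```lua
-- Sketch of the per-task advance. `norris_step` is a stub standing in for
-- safety.norris_step; it must return a table with a `status` field.
local function run_tasks(ctx, norris_step, max_steps)
    local steps = 0
    while steps < max_steps do
        local tasks = ctx.norris_tasks
        -- Exhausted task list: synthesize the final status.
        if tasks and tasks.current > #tasks.list then
            return "tasks_complete"
        end
        steps = steps + 1
        local result = norris_step(ctx)
        if result.status ~= "continue" then
            return result.status          -- terminal result ends the loop
        end
        if tasks then
            tasks.current = tasks.current + 1   -- advance, never retry
        end
    end
    return "max_steps"                    -- assumed name for the step cap exit
end

-- Usage: three tasks, a stub executor that always continues.
local ctx = { norris_tasks = { current = 1,
                               list = { "find files", "stat files", "report" } } }
local seen = {}
local status = run_tasks(ctx, function(c)
    seen[#seen + 1] = c.norris_tasks.list[c.norris_tasks.current]
    return { status = "continue" }
end, 10)
print(status, #seen)
```

With ctx.norris_tasks unset, the same loop degrades to the plain step counter, which is the single-model behavior the fall-back relies on.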

System suffix extension (context.lua NORRIS_SUFFIX_TEMPLATE):

local NORRIS_SUFFIX_TEMPLATE = [[


[NORRIS MODE] You are operating autonomously toward the following goal:

    %s

%s

Plan and execute step by step ...
]]

-- Compose: 1st %s = goal; 2nd %s = task hint (empty when no tasks).
local function compose_norris_suffix(self)
    local task_hint = ""
    if self.norris_tasks and self.norris_tasks.list then
        local k = self.norris_tasks.current
        local n = #self.norris_tasks.list
        if self.norris_tasks.list[k] then
            -- Wording matches the "current task k/N:" line named in §2.
            task_hint = string.format(
                "Current task %d/%d:\n    %s\n", k, n, self.norris_tasks.list[k])
        end
    end
    return string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal, task_hint)
end

6. Pillar 3 — Cost + secrets composition

The preplan call goes through the same broker.chat API as the Phase 7 cost-accumulator wiring. category = "norris-preplan" tags it so that :cost detail lists it on its own row:

[aish] session usage detail (total=$0.000099, 312/45 tokens):
  anthropic/claude-haiku-4.5  norris-preplan  1 calls,  180 / 35 tokens, $0.000099
  qwen-coder-7b-snappy-8k     norris          5 calls,  132 / 10 tokens, $0.000000  (local)
[aish] estimated session ctx: 412 tokens; token_budget=4096 (10.1% used)

Secrets scrub fires before broker.chat sees the messages; rehydrate on reply — same path as other call sites.
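
How two separate rows fall out of the category tag can be illustrated with a toy accumulator keyed by (model, category). Everything here is hypothetical: the real Phase 7 accumulator and the _record_usage signature may differ, and the input_tokens / output_tokens field names and model names are placeholders.

```lua
-- Toy accumulator: one row per (model, category) pair.
local rows, order = {}, {}

local function record_usage(model, category, usage)
    local key = model .. "|" .. category
    local row = rows[key]
    if not row then
        row = { model = model, category = category,
                calls = 0, input = 0, output = 0 }
        rows[key] = row
        order[#order + 1] = key       -- remember first-seen order for display
    end
    row.calls  = row.calls + 1
    row.input  = row.input + (usage.input_tokens or 0)
    row.output = row.output + (usage.output_tokens or 0)
end

-- One preplan call, then five executor steps on a different model.
record_usage("cloud-model", "norris-preplan",
             { input_tokens = 180, output_tokens = 35 })
for _ = 1, 5 do
    record_usage("local-model", "norris",
                 { input_tokens = 26, output_tokens = 2 })
end

for _, key in ipairs(order) do
    local r = rows[key]
    print(("%-12s %-15s %d calls, %d / %d tokens"):format(
        r.model, r.category, r.calls, r.input, r.output))
end
```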


7. Pillar 4 — Graceful fall-back

If cfg.norris.preplanner is unset → tasks = nil → Norris behaves as Phase 6 (single-model loop; existing semantics).

If preplan call fails (transport err, parse failure, empty list) → status log + tasks = nil → same fall-back.

If executor model lookup fails (cfg.norris.executor names a non-existent preset) → status log + use active_cfg (existing behavior). User can fix config and re-launch.

If :reset clears the conversation mid-Norris → existing behavior clears turns; ctx.norris_tasks must ALSO clear since the goal context is gone (the context.lua row in §3 covers this).
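
The executor lookup fall-back can be sketched as a small resolver. resolve_executor is a hypothetical name, status stands in for renderer.status, and the message text is illustrative rather than specified.

```lua
-- Hypothetical resolver for the per-step model, with both fall-back paths.
local function resolve_executor(config, active_cfg, status)
    local name = config.norris and config.norris.executor
    if not name then
        return active_cfg             -- unset: keep the user's :model choice
    end
    local preset = config.models and config.models[name]
    if not preset then
        -- Named preset does not exist: log and fall back.
        status(("norris executor %q not found; using active model"):format(name))
        return active_cfg
    end
    return preset
end

-- Usage with a stub status sink.
local logged = {}
local function status(msg) logged[#logged + 1] = msg end

local active = { name = "active" }
local config = { models = { fast = { name = "fast" } },
                 norris = { executor = "fast" } }

print(resolve_executor(config, active, status).name)
config.norris.executor = "missing"
print(resolve_executor(config, active, status).name, #logged)
```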


8. UX Surface Summary

| Config | Default | Effect |
| --- | --- | --- |
| cfg.norris.preplanner | nil | Name of model preset for the preplan call; absent = no split |
| cfg.norris.executor | nil (uses active model) | Name of model preset for per-step execution |
| cfg.norris.tasks_max | 16 | Cap on TASK list size (parse-time trim) |
| cfg.norris.preplan_system | (built-in template) | Override preplan system prompt |

| Startup status | Behavior |
| --- | --- |
| (preplan unset) | nothing; existing single-model Norris |
| (preplan success) | [aish] preplanned N tasks via <preplanner> |
| (preplan failed) | [aish] preplan failed: <reason>; running single-model |
| (preplan over cap) | [aish] preplan emitted >N tasks; truncated |

No new meta commands in v1. Inspect via :cost detail (separate norris-preplan row) and the existing :history (preplan call + reply become assistant turns visible there).
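
For orientation, a config sketch that combines the four keys above. It assumes the configuration evaluates to a plain Lua table with models and norris sub-tables; the preset bodies are placeholders, and the final check is an illustrative sanity test rather than something aish is known to perform.

```lua
-- Sketch only: preset contents depend on how cfg.models is already defined.
local config = {
    models = {
        cloud = { --[[ existing cloud preset fields ]] },
        fast  = { --[[ existing local preset fields ]] },
    },
    norris = {
        preplanner = "cloud",   -- must match a key in models; unset = no split
        executor   = "fast",    -- optional; unset = active :model choice
        tasks_max  = 16,        -- cap on parsed TASK lines
        -- preplan_system = "...custom prompt...",  -- optional override
    },
}

-- Illustrative sanity check: both names must resolve to presets.
assert(config.models[config.norris.preplanner], "preplanner must name a preset")
assert(config.models[config.norris.executor], "executor must name a preset")
```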


9. Out of Scope (Phase 10)

  • Mid-flight re-plan — preplan fires ONCE per Norris launch. Re-plan based on per-step results would be a separate iteration; user can :norris off + re-launch with refined goal for v1.
  • Adaptive task decomposition — TASKs are fixed at launch; the executor doesn't get to refine them. v1 trusts the preplanner's parse.
  • Multi-step task = sub-tasks — flat list only. Nested TASK hierarchies are a future shape.
  • Skip-then-retry — skip at HALT advances to the next task; no retry mechanism. User re-launches if they need a retry.
  • Per-task model selection — single executor model for the whole session. Per-task routing (e.g. some tasks → cloud, some → local) is interesting but bigger surface; defer.
  • Preplan-while-executing — sequential: preplan first, THEN execute. Streaming overlap is a future optimization.

10. Risks

| Risk | Mitigation |
| --- | --- |
| Preplan model emits malformed output (no TASK: lines, or wraps in markdown) | extract_task_lines tolerates leading whitespace + ignores non-TASK lines. If zero TASKs parsed, fall back to single-model. |
| Preplanner cost surprises user (silent paid call on every :norris launch) | Phase 7 cost meter accounts it under norris-preplan category; warn_at_dollars still fires. Default = unset (no automatic cost). |
| Task list is wrong / off-goal | Executor still has the global GOAL in the NORRIS suffix; can deviate per-step. Skip-budget per Phase 3 still escalates. User retains :norris off abort. |
| Local executor can't actually do a planned step (model too weak) | Same as today's Norris-on-local case: model emits something useless; HALT prompt lets user skip or abort. Phase 10 doesn't fix this; preplan + execute split makes the failure mode more visible (you can SEE which TASK is stuck). |
| ctx.norris_tasks survives across non-:reset session boundaries | Cleared at Norris exit (in run_norris's finally-equivalent) so re-launching Norris in same session starts fresh. |
| Eviction during long Norris session removes preplan + first executor turns | Tasks stored on ctx (NOT in turns); survive eviction. Per Phase 3 R-C3 the goal anchor in the NORRIS suffix also survives. |
| Preplan system prompt drift (user overrides badly) | Built-in fallback if cfg.norris.preplan_system absent; user override is opt-in. |
| Anthropic cloud preplan emits "Here's my plan:\n1. ...\n2. ..." (markdown numbering) instead of TASK: lines | extract_task_lines uses strict ^TASK: matcher; markdown lists are ignored. preplan_system explicitly demands the format. If real cloud models drift, document or refine prompt at impl time. |
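
Lua has no finally block, so the "finally-equivalent" cleanup mentioned above is usually built on pcall. A minimal sketch, assuming a hypothetical wrapper with_norris_cleanup and a body function that contains the preplan call and the step loop:

```lua
-- Sketch: guarantee ctx.norris_tasks is cleared on normal exit AND on error.
local function with_norris_cleanup(ctx, body)
    local ok, result = pcall(body, ctx)
    ctx.norris_tasks = nil            -- always runs, like a finally block
    if not ok then
        error(result, 0)              -- re-raise without adding position info
    end
    return result
end

-- Usage: a normal exit returns the body's result and clears the task list.
local ctx = { norris_tasks = { current = 1, list = { "only task" } } }
local final = with_norris_cleanup(ctx, function(c)
    return "tasks_complete"
end)
print(final, ctx.norris_tasks)
```

Because the wrapper re-raises, callers still see the original error; only the stale task list is removed before it propagates.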

11. Open Questions (Phase 10)

| # | Question | Impact | Resolution target |
| --- | --- | --- | --- |
| Q-PP1 | Should cfg.norris.executor ALSO apply when cfg.norris.preplanner is unset (i.e., "always use this model for Norris regardless of :model selection")? | Norris model selection scope | Analyze |
| Q-PP2 | Preplan call uses broker.chat (non-streaming); for cloud Haiku that's ~2-3 seconds before Norris begins. Should we stream the TASKs as they're emitted (chat_stream) so the user sees the plan forming? | UX latency | Analyze (probably non-streaming is fine; 16 TASKs at 5 tokens each = 80 tokens; ~1s on cloud) |
| Q-PP3 | If user runs :norris foo then later :norris bar in the same session, does preplanning fire again? | Re-launch semantics | Analyze (yes: each :norris launch fires preplan + populates new ctx.norris_tasks) |
| Q-PP4 | Does the executor see the FULL goal AND current task, or just current task? | Executor context | Analyze (both; per §5 the NORRIS suffix composes both) |
| Q-PP5 | Should :norris (without args) report ctx.norris_tasks state for an active session? | Introspection | Analyze (small ergonomics; could be a v2 polish) |
| Q-PP6 | What if preplan returns 1 task ("Do the thing") that's basically the whole goal? Worth running? | Degenerate case | Analyze (yes: the degenerate case still works; one-task execution is identical to no-preplan single-model run, just with an extra ~1s cost. Don't special-case.) |

12. Phase 10 → Phase 11+ Out-of-band

Candidate follow-ups (non-binding):

  • Phase 11: cross-session cost rollup (Phase 7 §12 option 1 — long-deferred).
  • Cost preflight enforcement (Phase 7 §12 option 2, also long-deferred; Phase 8's accurate counts are the prerequisite).
  • Mid-flight Norris re-plan: preplanner gets to re-decompose based on executor progress. Real value, but needs careful state-machine design (when to re-plan, how to preserve already-completed work).
  • Per-task model selection — task could carry a model hint emitted by the preplanner.

Phase 10 itself is self-contained — depends on Phase 3 (Norris) + Phase 7 (cost accumulator) which are both implemented.