Commit Graph

47 Commits

Author SHA1 Message Date
marfrit 76a8f97009 repl: cloud preplanner + local executor split for Norris (closes #89)
Phase 10 C4 — the orchestration commit. Splits Norris autonomous
mode into a one-shot cloud preplan + per-step local executor flow,
with graceful fall-back to single-model Norris when preplan is
disabled or fails.

run_norris additions (in order):

  1. R4 fix: clear ctx.norris_active/_goal/_tasks at the TOP so a
     prior crashed Norris can't leak stale state into the new launch.

  2. Preplan block (gated on cfg.norris.preplanner):
     - Look up the preplanner preset in cfg.models; warn + skip if
       absent.
     - Build a system prompt asking for TASK: <imperative> lines
       (R1: %d via string.format — gsub("N", ...) would corrupt
       "No prose / commentary / numbering" to "16o prose").
     - Scrub messages per the preplan model's redact policy; run
       broker.chat (non-streaming, per Q-PP2) with category
       "norris-preplan"; R7: respect pre_cfg.timeout_ms.
     - On success: rehydrate; record usage via _record_usage;
       extract_task_lines; cap to tasks_max; populate
       ctx.norris_tasks = { current = 1, list = parsed }.
     - On ANY failure (transport err / empty list / bogus preset):
       status log + leave ctx.norris_tasks nil → single-model
       fall-back. R3 design: NOT routed via call_broker; a fallback
       retry would silently swap planning models which is worse
       than a clean hard-fail.

  3. Executor cfg resolution (independent of preplan per Q-PP1):
     cfg.norris.executor names a preset → executor_cfg = that cfg.
     Unset / missing preset → executor_cfg = active_cfg (existing
     :model-selection behavior).

  4. Loop body: pass executor_cfg (not active_cfg) to
     safety.norris_step. After each "continue" result, advance
     ctx.norris_tasks.current. When current > #list, exit with
     synthesized status "tasks_complete" + reason "all N preplanned
     tasks executed".

  5. Exit cleanup: clear ctx.norris_tasks alongside the existing
     norris_active/_goal clears so a re-launch starts fresh.
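
The R1 note in step 2 can be reproduced in isolation. A minimal sketch, assuming a hypothetical prompt template (the real prompt text is not shown in this commit) and a hypothetical tasks_max of 16:

```lua
-- Hypothetical templates; only the "No prose / commentary / numbering"
-- fragment comes from the commit message.
local template_gsub = "Emit at most N tasks. No prose / commentary / numbering."
local template_fmt  = "Emit at most %d tasks. No prose / commentary / numbering."
local tasks_max = 16  -- assumed value for illustration

-- Naive substitution: gsub replaces EVERY literal "N", including the one in "No".
local broken = template_gsub:gsub("N", tostring(tasks_max))

-- string.format only touches the %d placeholder.
local fixed = string.format(template_fmt, tasks_max)

print(broken)  -- Emit at most 16 tasks. 16o prose / commentary / numbering.
print(fixed)   -- Emit at most 16 tasks. No prose / commentary / numbering.
```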

renderer.norris_end gains "tasks_complete" as a non-error status
(cyan, same as "done"). Distinct from "done" (executor said
GOAL: complete) — executor exhausted the plan but didn't confirm
goal, which is a clean exit, not an error.

E2E verified (preplanner=fast, executor=fast on hossenfelder:8082):

  :norris print the date and the current uptime
  → preplanned 2 tasks via fast
  → ─ step 1/3 ─ Print the current date.
  → CMD: date → Sun May 17 ...
  → ─ step 2/3 ─ Print the current uptime.
  → CMD: uptime → ... up 1 day ...
  → NORRIS TASKS COMPLETE: all 2 preplanned tasks executed

  :cost detail correctly shows two rows for the same model:
    norris-preplan  1 calls,  95 /  12 tokens
    norris          1 calls, 364 /   9 tokens

Fall-back verified:
  cfg.norris.preplanner = "doesnotexist" →
    "[aish] preplanner 'doesnotexist' is not in cfg.models;
     running single-model" → Norris runs as Phase 6.

No-preplan path verified (no cfg.norris block):
  Norris runs exactly as Phase 6, no behavior change.

Regression: 87/87 safety, 31/31 router_model, repl loads.

Closes #89.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 08:21:25 +00:00
marfrit c55077bc07 context + repl + config: route-aware context compression (closes #87)
Small local models effectively use a fraction of their advertised
context window. Per-request compression for routes that hit a
local-compress-flagged model preset: keeps only the last N turns
and tail-truncates oversized content. Cloud routes get the full
context unchanged.

Changes:

- context.lua _compress_turns(turns, keep, max_chars): returns a
  new list (self.turns NEVER mutated) with the last `keep` turns
  preserved + content tail-truncated to `max_chars`. Defensive:
  drops tool turns at the slice head (orphaned without their
  assistant-with-tool_calls anchor — strict chat templates would
  reject them; same gotcha PHASE0 §6 warned about for user/user).

- Context:to_messages(opts) — opts.compress = { keep_turns,
  max_turn_chars } swaps the turn iterable for the compressed
  view. Affects BOTH the use_tool_role=true path and the
  use_tool_role=false fallback (PHASE2.md Q18 strict-template
  workaround). Persistence + display via :history see the full
  uncompressed ctx.turns.

- repl.lua ask_ai: when req_cfg (the routed model's cfg) has
  `local_compress = true`, build compress_opts from
  config.context.compress (defaults keep_turns=2, max_turn_chars=800).
  Pass through ctx:to_messages alongside the existing
  system_prompt_override (#86) — orthogonal opts that compose.

- Norris unaffected: safety.norris_step builds its own messages
  array; the planner needs full history per PHASE3 design.

- config.lua gains a header comment explaining the per-model opt-in
  + the context.compress defaults block + the documented tool-turn
  truncation trade-off.
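
A rough sketch of the compression described above, not the actual context.lua code. The turn shape ({role, content}) is an assumption, and "tail-truncated" is read here as "keep the head, cut the tail":

```lua
-- Sketch only: returns a NEW list; the input turns are never mutated.
local function compress_turns(turns, keep, max_chars)
  local out = {}
  local first = math.max(1, #turns - keep + 1)
  -- Drop tool turns at the head of the slice: their assistant anchor
  -- fell outside the kept window, so they would be orphaned.
  while first <= #turns and turns[first].role == "tool" do
    first = first + 1
  end
  for i = first, #turns do
    local t = turns[i]
    local content = t.content or ""
    if #content > max_chars then
      content = content:sub(1, max_chars)  -- assumed reading of tail-truncation
    end
    out[#out + 1] = { role = t.role, content = content }
  end
  return out
end

local turns = {
  { role = "user",      content = "list files" },
  { role = "assistant", content = "calling tool" },
  { role = "tool",      content = string.rep("x", 50) },
  { role = "user",      content = "thanks" },
}
local last2 = compress_turns(turns, 2, 10)  -- tool turn at slice head is dropped
local last3 = compress_turns(turns, 3, 10)  -- tool turn kept with its anchor
print(#last2, #last3)  -- 1   3
```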

13 unit cases verified:
  - no opts -> full turn list (no regression)
  - keep_turns=2 -> exactly last 2 emitted
  - long content tail-truncated to max_chars
  - self.turns unchanged after render
  - orphan tool-turn at slice head dropped (no chat-template violation)
  - tool turn included WITH its assistant anchor when keep_turns >= 3

E2E against live local broker:
  - models.fast.local_compress = true; keep_turns=1; max=200
  - 4-turn session: each broker call sees ONLY the current turn
    (verified by short coherent CMD replies despite no cross-turn
    memory available to the model). FR-promised small-model
    friendliness in action; conversation continuity is the
    documented trade-off.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 07:50:07 +00:00
marfrit 74e4bffb37 broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)
llama.cpp constrains the sampler to ONLY emit tokens matching a
GBNF grammar. For small models this kills format drift at the
token level — `CMD: <cmd>` is enforced by the sampler rather than
hoped for via prompt discipline.

Probe finding (this commit's pre-implementation): cloud (Anthropic
via Bedrock) silently IGNORES the `grammar` field — returns normally
via standard sampling. Default passthrough is safe for all routes;
no per-model opt-in/opt-out needed in v1.

Changes:

- broker.lua build_request: `if opts.grammar then req.grammar =
  opts.grammar end`. Malformed grammar surfaces at request time
  via the existing transport-error path.

- repl.lua ask_ai: `grammar_override = config.routing.grammars
  [req_class]` (same gating shape as #86's system_prompts override).
  Passed via opts.grammar in the call_broker invocation.

- safety.lua is_destructive threads cfg.safety.probe_grammar through
  opts.grammar so llm_probe constrains the YES/NO output. Skips
  the regex-match dance entirely when the model can't drift.
  Caller-provided opts.grammar takes precedence over cfg.

- config.lua gains two commented examples:
  * routing.grammars per class
  * safety.probe_grammar for the destructive probe
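
The passthrough and the precedence rule can be sketched as follows. Only the `grammar` field and the `if opts.grammar` guard come from this commit; the other request fields, the helper name pick_probe_grammar, and the YES/NO grammar string are illustrative assumptions:

```lua
-- Sketch of the passthrough: the grammar field is only set when provided.
local function build_request(model_cfg, messages, opts)
  opts = opts or {}
  local req = { model = model_cfg.model, messages = messages }
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Hypothetical helper: caller-provided opts.grammar beats cfg.safety.probe_grammar.
local function pick_probe_grammar(cfg, opts)
  if opts and opts.grammar then return opts.grammar end
  return cfg.safety and cfg.safety.probe_grammar or nil
end

local cfg = { safety = { probe_grammar = 'root ::= "YES" | "NO"' } }
local plain = build_request({ model = "fast" }, {}, nil)
local constrained = build_request({ model = "fast" }, {},
  { grammar = pick_probe_grammar(cfg, nil) })
print(plain.grammar, constrained.grammar)
```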

6 unit cases verified (stubbed curl.post_sse / broker.chat):
  - default: no grammar in body
  - opts.grammar -> body contains grammar JSON-encoded
  - safety probe_grammar reaches llm_probe via opts
  - no probe_grammar configured -> opts.grammar nil
  - caller opts.grammar takes precedence over cfg.safety.probe_grammar

E2E against live local broker:
  - `routing.grammars.default = "root ::= \\"ACK\\""` configured;
    prompted "tell me a long story about a fox" -> model output
    EXACTLY "ACK" (sampler forced; would normally produce paragraphs).
    Grammar passthrough end-to-end confirmed.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 07:00:36 +00:00
marfrit 047d629a66 context + repl + config: per-class system_prompt override (closes #86)
Small local models follow precise structured instructions better than
natural language. Per-routing-class system_prompt override gives them
tighter instructions for THAT request while preserving ambient context.

Changes:

- Context:to_messages(opts) — opts.system_prompt_override REPLACES
  the base system_prompt for THIS render only (state unchanged).
  Dynamic blocks ([background], [project], [earlier summary], NORRIS
  suffix) still compose on top. opts is optional; nil-safe for old
  callers.

- repl.lua ask_ai — captures req_class from router.classify_model
  (already returned by Phase 5; previously discarded after the
  status line). Looks up config.routing.system_prompts[req_class];
  passes as opts.system_prompt_override to ctx:to_messages each
  iteration of the tool-sub-loop.

- Gating: override fires only when routing.auto is on (no class ->
  no override). If system_prompts[class] absent for a class, fall
  through to the default system_prompt (no surprise).

- Norris unaffected: safety.norris_step builds its own messages
  array; doesn't go through this path.

- config.lua gains a commented-out example showing routing.system_
  prompts with the code/default examples from the FR body.
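
A simplified sketch of the per-render override, assuming a stripped-down Context with only a [project] dynamic block. The real to_messages composes more blocks, and override_for is a hypothetical helper standing in for the gating in ask_ai:

```lua
local Context = {}
Context.__index = Context

function Context.new(system_prompt)
  return setmetatable({ system_prompt = system_prompt, turns = {} }, Context)
end

-- opts is optional; the override applies to this render only.
function Context:to_messages(opts)
  opts = opts or {}
  local parts = { opts.system_prompt_override or self.system_prompt }
  if self.project then
    parts[#parts + 1] = "[project]\n" .. self.project  -- dynamic block still composes
  end
  local msgs = { { role = "system", content = table.concat(parts, "\n\n") } }
  for _, t in ipairs(self.turns) do
    msgs[#msgs + 1] = { role = t.role, content = t.content }
  end
  return msgs
end

-- Hypothetical gating helper: no class (routing.auto off) means no override.
local function override_for(routing, req_class)
  if not (routing.auto and req_class) then return nil end
  return routing.system_prompts and routing.system_prompts[req_class]
end

local routing = { auto = true, system_prompts = { code = "Reply with code only." } }
local ctx = Context.new("You are a helpful shell assistant.")
ctx.project = "src/main.lua"
local msgs = ctx:to_messages({ system_prompt_override = override_for(routing, "code") })
print(msgs[1].content)
```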

Smoke verified:
  - 12-case context.lua unit test: opts nil/absent/present, override
    replaces base, dynamic blocks still compose, state unchanged
    after call, Norris-mode coexistence (suffix still present;
    background still suppressed).
  - E2E against cloud broker with routing.system_prompts.code set:
    triple-backtick prompt -> code class -> override fires; model
    emits terse code-only output. Non-code prompt -> default class
    -> no override -> normal verbose-ish reply.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 05:41:15 +00:00
marfrit 5b6ee553db repl: :config show meta + HELP (Phase 9 commit #3)
User-facing diagnostic for the project-overlay layer. Reads
config._sources (R3 cfg-embedded by main.lua's load_config_with_
overlay in commit #2) + the effective config; surfaces which file
contributed each top-level key.

:config show           top-level keys + which source set each
                        (nested tables collapsed to inner-key list)
:config show full      recursive dump with sensitive-key masking

Masking heuristic (any key containing token/secret/auth/key,
case-insensitive) -> "(set)" instead of the value. R6: applied
RECURSIVELY in full mode so the actual leak vector
(mcp.servers.<alias>.auth_token, models.<x>.auth_token) is caught.

Defensive depth cap (5) prevents pathological recursion.
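
The masking heuristic and depth cap might look roughly like this. A sketch under stated assumptions: the function names and the "(depth cap)" placeholder are made up for illustration; only the four substrings, the "(set)" marker and the cap of 5 come from the text above:

```lua
local SENSITIVE = { "token", "secret", "auth", "key" }
local MAX_DEPTH = 5

-- Case-insensitive substring match on the key name.
local function is_sensitive(key)
  local k = tostring(key):lower()
  for _, needle in ipairs(SENSITIVE) do
    if k:find(needle, 1, true) then return true end
  end
  return false
end

-- Returns a masked copy; applied recursively so nested auth_token is caught.
local function mask(value, depth)
  depth = depth or 0
  if type(value) ~= "table" then return value end
  if depth >= MAX_DEPTH then return "(depth cap)" end  -- hypothetical placeholder
  local out = {}
  for k, v in pairs(value) do
    if is_sensitive(k) then
      out[k] = "(set)"
    else
      out[k] = mask(v, depth + 1)
    end
  end
  return out
end

local masked = mask({
  default_model = "fast",
  token_budget = 800,  -- plain number, still masked (the N2 false positive)
  models = { cloud = { auth_token = "SECRET_VAL", model = "x" } },
})
print(masked.models.cloud.auth_token, masked.token_budget)  -- (set)   (set)
```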

When config._sources is absent (caller didn't go through
load_config_with_overlay), status: "(unknown — main didn't pass
_sources)" — meta still runs, just labels source as "?".

N2 known cosmetic false-positive: `key_env` / `auth_env` config
fields hold env-var NAMES (not secrets) but match the heuristic.
Future polish exempts `*_env` patterns. Same for `token_budget`
(contains "token") — also masked despite being a plain number.
Acceptable; errs toward over-masking.

HELP gains 1 :config line.

E2E verified across 4 scenarios with AISH_TRUST_FILE + isolated HOME:
  A. No project overlay: 6 user keys; nested tables collapsed.
     `secrets` masked as (set) at top level.
  B. Project overlay accepted: source map cleanly partitioned
     (user has 4 keys; project has 2 — default_model + models);
     each top-level row tagged [user] or [project].
  C. :config show full: nested dump; auth_token in models.cloud
     correctly masked as (set); SECRET_VAL never appears in
     output (grep count = 0).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Commit #4 next: config.lua template comment + PHASE9.md status
header -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:54:30 +00:00
marfrit 94b7d86926 repl: wire tokenize_fn + :cost detail estimate row (Phase 8 commit #4)
Activates Phase 8 pillars 2+3+5 end-to-end and adds the R3-revised
:cost detail trailing line.

Changes:

- When cfg.tokenize.use_endpoint is true, ctx_opts.tokenize_fn is
  set to `function(text) return broker.token_count(active_cfg, text) end`
  before Context.new fires. R4: the closure body references
  active_cfg DIRECTLY (upvalue) — Lua resolves upvalues at call
  time, so subsequent :model switches re-route to the new model's
  tokenizer automatically (verified by E2E: :model cloud after the
  fast call still produces clean estimate row).

- :cost detail gains a trailing line per R3:
    estimated session ctx: <N> tokens; token_budget=<M> (X.Y% used)
  N comes from ctx:estimate_tokens() (current in-memory snapshot,
  NOT a comparison against the accumulator sum above which is
  cumulative across calls + evicted turns). Gives at-a-glance
  budget utilization.
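
The R4 upvalue behaviour can be checked in isolation. In this sketch token_count is a stand-in for broker.token_count (the real one calls the tokenizer endpoint), and the cfg tables are hypothetical:

```lua
-- Stand-in tokenizer: tags the result with the model name so the routing is visible.
local function token_count(model_cfg, text)
  return model_cfg.name .. ":" .. #text
end

local active_cfg = { name = "fast" }

-- The closure captures the VARIABLE active_cfg, not its current value.
local tokenize_fn = function(text) return token_count(active_cfg, text) end

local before = tokenize_fn("hello")
active_cfg = { name = "cloud" }  -- roughly what a :model switch does
local after = tokenize_fn("hello")

print(before, after)  -- fast:5   cloud:5
```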

E2E verified against live broker:
  - fast model call -> 168 tokens estimated (real BPE via /tokenize)
  - :model cloud + cloud call -> 178 tokens estimated (closure
    follows :model switch correctly per R4)
  - 21% / 22.3% budget utilization shown
  - Accumulator sums and estimate are intentionally different
    (sums are cumulative, estimate is current snapshot) — R3-
    correctly displayed as separate lines

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

With this commit landed, Phase 8 is functionally complete; commit
#5 is config example + status bump.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:31:40 +00:00
marfrit 0d6ff93134 repl: :cost meta surface (Phase 7 commit #5)
User-facing reporter of the per-session accumulator. Three shapes:

  :cost            one-line summary (calls / tokens / cost)
  :cost detail     per-model + per-category breakdown
  :cost reset      zero the meter; clears warn flags

All read-only against ctx.usage_totals; no broker calls.

R6 — annotation uses the per-slot is_local sticky flag, NOT a fragile
cost==0 heuristic. Summary line classifies:

  cloud only -> "cost=$X.XXXXXX"
  cloud + local mix -> "cost=$X.XXXXXX (cloud only; local: tokens
                       but no cost field)"
  local only -> "cost=$X.XXXXXX (local only; no cost field)"

R7 — :cost detail rows sort by (cost desc, model asc, category asc).
Three-level key for deterministic output across equal-cost rows
(table.sort is unstable; identical costs would otherwise reorder).
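
A minimal sketch of the three-level comparator. The row shape is assumed, and the "local delegate" row is hypothetical, added only to exercise the tie-break:

```lua
local rows = {
  { model = "cloud", category = "probe",    cost = 0.000128 },
  { model = "local", category = "main",     cost = 0 },
  { model = "cloud", category = "main",     cost = 0.000219 },
  { model = "cloud", category = "delegate", cost = 0.000030 },
  { model = "local", category = "delegate", cost = 0 },  -- hypothetical row
}

-- cost desc, then model asc, then category asc. Returns false for equal
-- rows, which table.sort requires of a valid comparator.
table.sort(rows, function(a, b)
  if a.cost ~= b.cost then return a.cost > b.cost end
  if a.model ~= b.model then return a.model < b.model end
  return a.category < b.category
end)

for _, r in ipairs(rows) do
  print(string.format("%-8s %-10s $%.6f", r.model, r.category, r.cost))
end
```

Because every field takes part in the comparison, two rows only compare equal when they are identical, so the output order no longer depends on table.sort's internal behaviour.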

R10 — all dollar values use $%.6f formatting. Sub-cent precision is
critical: a Haiku call can cost $0.000028; $%.4f would round it to
$0.0000 — indistinguishable from local $0.
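
The R10 rounding difference, shown with the figure quoted above:

```lua
local haiku_call = 0.000028

local four = string.format("$%.4f", haiku_call)
local six  = string.format("$%.6f", haiku_call)

print(four)  -- $0.0000
print(six)   -- $0.000028
```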

Column width widened to %-26s to fit fully-qualified cloud model
names (e.g. "anthropic/claude-haiku-4.5" = 26 chars).

E2E verified against live cloud + local broker:

  :cost (empty session)          -> "0 calls, $0.000000"
  ...after mixed-mode session...
  :cost                          -> "5 calls, prompt=472 / completion=26
                                     tokens, cost=$0.000377 (cloud only;
                                     local: tokens but no cost field)"
  :cost detail                   -> 4 rows: main cloud $0.000219, probe
                                     cloud $0.000128, delegate cloud
                                     $0.000030, main local $0.000000
                                     (local). Sort by cost desc within
                                     model.
  :cost reset                    -> "cost meter reset"; subsequent
                                     :cost shows zeros.

All 5 calls appeared in the same session: main (twice — cloud
+ local), delegate, probe (x2 from :safety check). Warn-threshold
firing already verified in commit #3 + #4.

HELP gains 3 :cost lines.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:24 +00:00
marfrit b30212af0f safety + repl: opts.category for Norris + probe (Phase 7 commit #4)
Closes the last two broker call sites that flow through safety.lua.
Together with commits #1-#3, all 7 broker call sites in aish now
attribute usage to the cost accumulator under the right category.

Changes:

  safety.lua:

  - llm_probe (the YES/NO destructive checker) — broker.chat call
    gains opts.category = "probe". Captures (text, usage) via
    (reply, second) and, when opts.on_usage is provided AND the
    call succeeded, routes second through opts.on_usage(model,
    category, payload). N4 signature chain: opts already flowed
    through llm_second_opinion -> M.is_destructive from #52's
    work; opts.on_usage rides along naturally with no further
    signature change.

  - M.norris_step (Norris main broker round-trip):
      * opts to broker.chat_stream gains category = "norris"
      * probe_opts (passed to is_destructive inside the loop)
        gains on_usage = helpers.on_usage so the LLM probe's
        cost lands under "probe" too
      * on_delta wrapper adds elseif kind == "usage" branch that
        calls helpers.on_usage(payload.model, payload.category,
        payload). Coexists cleanly with the existing text (rehydrator)
        and tool_call branches.

  repl.lua:

  - Norris helpers table gains on_usage = _record_usage. The R5
    central chokepoint (commit #3) does the warn-threshold check
    AND ctx:add_usage atomically.

  - :safety check meta's probe_opts always carries on_usage now
    (independently of whether secrets_session is set). secrets-aware
    scrub_msgs/rehydrate added conditionally as before.

E2E verified against live broker (safety.llm_model = "cloud"):
  - :safety check ls -la /tmp -> 2 cloud probe calls
  - "[aish] session cost $0.000128 has crossed warn_at_dollars=$0.000100"
  - probe category visible in accumulator (would appear in :cost detail
    once commit #5 ships the meta).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:01:21 +00:00
marfrit 8adebd52cc repl: _record_usage helper + opts.category at 5 sites (Phase 7 commit #3)
Wires broker.lua's on_delta("usage", payload) and broker.chat's
(text, usage) return to the ctx accumulator via a single chokepoint.

Changes:

  - Forward decl `local _record_usage` near _bg_spawn — same pattern;
    the summarize-on-evict closure in make_summarize_fn (built at
    line 299) needs lexical access to _record_usage (assigned at
    line 695), so forward-declare and assign-without-`local`.

  - _record_usage(model, category, usage) — R5 central chokepoint:
    routes to ctx:add_usage, then checks the per-threshold warn
    state. R4: cost_warn_state has two independent flags (dollars
    and tokens) so first-to-fire doesn't suppress the other. R10:
    warn message uses $%.6f for sub-cent precision.

  - call_broker wrapper: wrapped on_delta now branches on
    kind == "usage" -> _record_usage(payload.model, payload.category,
    payload). R2: keys by payload.model (set inside broker.lua from
    model_cfg.model). When fallback fires, broker is called with
    fb_cfg, so payload.model IS the fallback's name automatically —
    wrapper doesn't track primary-vs-fallback itself.

  - 5 caller sites wired with opts.category:
      ask_ai call_broker             -> category="main"
      summarize-on-evict             -> category="summarize"
      DELEGATE: handler              -> category="delegate"
      :memory summarize              -> category="memory_summarize"
      :delegate meta                 -> category="delegate"

  - All 4 broker.chat call sites switched from
      local reply, err = broker.chat(...)
    to
      local reply, second = broker.chat(...)
    branching on reply nil-ness to interpret second (err on failure,
    usage on success). Captured usage routes through _record_usage.
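
The forward-declaration pattern and the two independent warn flags can be sketched together. The thresholds, the usage field names and the token warning wording are assumptions; the dollar warning follows the message format quoted in the E2E notes:

```lua
local _record_usage  -- forward declaration: closures below capture this local

local function make_summarize_fn()
  return function(usage)
    -- Resolved at call time, after the assignment further down has run.
    _record_usage("fast", "summarize", usage)
  end
end
local summarize = make_summarize_fn()  -- built BEFORE _record_usage is assigned

local totals   = { cost = 0, tokens = 0 }
local warn_at  = { dollars = 0.0001, tokens = 1000 }  -- hypothetical thresholds
local warned   = { dollars = false, tokens = false }  -- two independent flags
local warnings = {}

-- Assign WITHOUT `local`; a second `local` would shadow the declaration above.
_record_usage = function(model, category, usage)
  totals.cost = totals.cost + (usage.cost or 0)
  totals.tokens = totals.tokens
    + (usage.prompt_tokens or 0) + (usage.completion_tokens or 0)
  if not warned.dollars and totals.cost >= warn_at.dollars then
    warned.dollars = true
    warnings[#warnings + 1] = string.format(
      "session cost $%.6f has crossed warn_at_dollars=$%.6f",
      totals.cost, warn_at.dollars)
  end
  if not warned.tokens and totals.tokens >= warn_at.tokens then
    warned.tokens = true
    warnings[#warnings + 1] = string.format(
      "session tokens %d has crossed warn_at_tokens=%d",  -- assumed wording
      totals.tokens, warn_at.tokens)
  end
end

summarize({ cost = 0.000219, prompt_tokens = 90, completion_tokens = 10 })
summarize({ cost = 0, prompt_tokens = 2000 })
summarize({ cost = 0, prompt_tokens = 1 })
print(#warnings)  -- 2
```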

E2E verified against live cloud broker:
  - cloud prompt -> reply "Hi! 👋"
  - Warn fired: "session cost $0.000219 has crossed warn_at_dollars=$0.000010"
  - R10 sub-cent precision visible in both numbers.

Norris + safety paths still untouched — commit #4 wires those.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:00:06 +00:00
marfrit 955bd82efb safety + repl: wire secrets into safety.lua (closes #52)
Closes the last #13 gap — Norris broker call + is_destructive LLM
second-opinion probe were the two egress points NOT covered by the
scrub-at-egress design in commit d852aca.

Approach: option (b) per #52's fix sketch — callback-via-helpers/opts.
safety.lua does NOT gain a require("secrets") dependency (acceptance
criteria 3); integration is purely through the convention the rest
of the helpers table already uses.

safety.lua changes:

  - llm_probe gains an opts table. When opts.scrub_msgs is set, the
    {system, user(cmd)} message pair is scrubbed before broker.chat.
    When opts.rehydrate is set, the YES/NO reply is rehydrated before
    parsing (defensive — the verdict shouldn't carry placeholders but
    rehydration is a safe no-op if it doesn't).

  - llm_second_opinion threads opts through to llm_probe.

  - M.is_destructive(cmd, cfg, opts) — opts optional; nil-opts is
    backwards-compatible (no scrub, original behavior).

  - M.norris_step:
      * outbound broker.chat_stream message scrubbed via
        helpers.scrub_msgs(ctx:to_messages(), model_cfg) when provided.
      * on_delta wrapped with helpers.streaming_rehydrator():push /
        :flush so the user sees rehydrated text AND text_parts
        accumulates rehydrated chunks (parity with ask_ai in repl.lua).
      * both M.is_destructive call sites (tool_call probe + CMD: probe)
        now pass probe_opts = {scrub_msgs, rehydrate} when the
        helpers carry them.

repl.lua changes:

  - Norris helpers table gains scrub_msgs / rehydrate /
    streaming_rehydrator closures, all nil-safe (return identity /
    nil when secrets_session is nil).

  - :safety check meta passes probe_opts to is_destructive when
    secrets_session is configured. Without secrets, behavior unchanged.
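
A sketch of the callback-via-opts shape, with no secrets dependency inside the probe itself. The probe prompt, the injected chat argument, the stub names and the fixed placeholder number are all illustrative assumptions:

```lua
-- Probe sketch: scrub and rehydrate are optional callbacks supplied by the caller.
local function llm_probe(cmd, chat, opts)
  opts = opts or {}
  local msgs = {
    { role = "system", content = "Answer YES or NO: is this command destructive?" },
    { role = "user",   content = cmd },
  }
  if opts.scrub_msgs then msgs = opts.scrub_msgs(msgs) end
  local reply = chat(msgs)
  if reply and opts.rehydrate then reply = opts.rehydrate(reply) end
  return reply
end

-- Stubs for the demonstration.
local seen
local function stub_chat(msgs) seen = msgs; return "NO" end
local function scrub_msgs(msgs)
  local out = {}
  for i, m in ipairs(msgs) do
    local content = m.content:gsub("ghp_realsecretvalue",
      function() return "$AISH_SECRET_001" end)
    out[i] = { role = m.role, content = content }
  end
  return out
end

local cmd = "curl -H 'Authorization: ghp_realsecretvalue' https://example.invalid"
llm_probe(cmd, stub_chat)  -- control: no opts, secret reaches the probe
local control = seen[2].content
llm_probe(cmd, stub_chat, { scrub_msgs = scrub_msgs })
local scrubbed = seen[2].content
print(control)
print(scrubbed)
```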

Unit-test verified end-to-end:
  - Stubbed broker.chat captures the messages it receives.
  - Without opts: probe SEES `ghp_realsecretvalue_...` (control).
  - With opts: probe sees `$AISH_SECRET_NNN` (correct scrub).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:40:30 +00:00
marfrit 11d0e599cd repl + renderer: tree-sitter highlighter (Phase 6 commit #5)
The largest Phase 6 commit — fence-aware stream filter in renderer.lua
+ external tree-sitter dispatch + :highlight meta in repl.lua.

renderer.lua — fence-aware filter wrapping assistant_delta:

  M.set_highlight(enabled, detected, highlight_fn)
      Called by repl.lua at startup AND on every :highlight toggle.
      Stores state in module-locals (off by default).

  State machine inside _hl_push:
    outside: pass chunks through; HOLD trailing partial-fence chars
             (per R1 — local llama.cpp splits ```python as `'``'`
             then `'`python\n'`, so naive pass-through drops the
             leading "``" and never recovers).
    inside:  buffer cumulatively until "\n```" appears; emit
             highlight_fn(body, lang) then the closing fence verbatim.
             Recursive call handles "rest" after the closing fence.

  N1: fences only open at start-of-stream OR after a newline
      (`^```` or `\n```` only). Inline backticks in prose
      ("use ``` to mark code") do not open a fence.

  R3 (PTY raw-mode toggle per highlight call): no change here — every
      executor.exec call already toggles raw-mode (existing behavior
      since Phase 1). The risk is theoretical; smoke-test interactively
      after install if multi-fence renders show flicker.

  assistant_flush handles end-of-stream gracefully: drains any held
  partial-fence tail OR an unterminated inside-fence buffer.
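
The "HOLD trailing partial-fence chars" step of the outside state could be isolated like this. It is a sketch of one possible helper, not the renderer's actual _hl_push; the name split_partial_fence and its at_line_start argument are assumptions:

```lua
local FENCE = string.rep("`", 3)

-- Splits a chunk into (emit, held). `held` is a trailing run of one or two
-- backticks that sits at start-of-line and could still become a fence once
-- the next chunk arrives. Full fences are left to the caller.
local function split_partial_fence(buf, at_line_start)
  local ticks = buf:match("(`+)$")
  if not ticks or #ticks > 2 then return buf, "" end
  local head = buf:sub(1, #buf - #ticks)
  local anchored = (head == "" and at_line_start) or head:sub(-1) == "\n"
  if anchored then return head, ticks end
  return buf, ""  -- mid-line backticks are ordinary prose
end

-- The split reported for local llama.cpp: two backticks, then the rest.
local emit1, held1 = split_partial_fence("``", true)
local joined = held1 .. "`python\n"
print(emit1 == "", joined:sub(1, 3) == FENCE)  -- true   true

local emit2, held2 = split_partial_fence("use ``", false)
print(emit2, held2 == "")  -- use ``   true
```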

repl.lua — _detect_treesitter + highlighted + :highlight meta:

  _detect_treesitter()  one-shot popen probe of `tree-sitter --version`.
                        Run once at startup; cached as
                        highlight_detected.

  highlighted(body, lang_tag)   R2-placed in repl.lua (has _shq +
                                executor access). Translates the fence
                                tag (`py`, `python`, `lua`, etc.) to
                                a canonical lang via LANG_TAG, picks
                                the canonical extension via LANG_EXTENSION,
                                writes body to a tmpfile with that
                                extension, runs `tree-sitter highlight
                                <tmpfile>` via executor.exec, returns
                                the output. On ANY failure (CLI absent,
                                non-zero exit, empty output), returns
                                `body` unchanged — silent pass-through.

  R4 RESOLVED VIA REAL INSTALL: probed `tree-sitter highlight --help`
      on noether; confirmed:
        - NO `--lang` flag exists (formulate-time assumption wrong)
        - takes a PATH; language inferred from file extension
        - alternative `--scope source.X` exists but also unreliable
          without configured grammars
      Resolution: write tmpfile with `os.tmpname() .. LANG_EXTENSION[lang]`
      and pass the path. Matches the documented upstream contract.
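
A sketch of the tmpfile-with-extension step, assuming a small illustrative LANG_EXTENSION subset and a hypothetical helper name. On POSIX, os.tmpname() may create the base file itself, so the sketch removes it once the extension-suffixed file is written:

```lua
local LANG_EXTENSION = { python = ".py", lua = ".lua" }  -- illustrative subset

-- Writes `body` to a temp file whose extension lets the CLI infer the language.
-- Returns the path, or nil when the language is unknown or the write fails.
local function write_snippet(body, lang)
  local ext = LANG_EXTENSION[lang]
  if not ext then return nil end
  local base = os.tmpname()
  local path = base .. ext
  local f = io.open(path, "wb")
  if not f then
    os.remove(base)
    return nil
  end
  f:write(body)
  f:close()
  os.remove(base)  -- drop the extension-less file os.tmpname may have created
  return path
end

local path = write_snippet("x = 42\n", "python")
print(path)  -- path ends in .py; caller removes it after highlighting
```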

  B4-followup: even with the CLI installed, highlighting requires
      `~/.config/tree-sitter/config.json` parser-directories with
      cloned + built `tree-sitter-<lang>` grammars. Without parsers,
      every call exits non-zero and we silently pass through. The
      :highlight install hint surfaces all three install steps so the
      user knows what's actually needed.

  :highlight [on|off|status] meta:
      no arg     -> flip
      on/off     -> set explicit
      status     -> report toggle + CLI detection state
      When toggled on AND CLI absent: emit a 4-line install hint
        (CLI install, init-config, grammar clone reminder).
      When toggled on AND CLI present: emit a 1-line note that
        parser-directories must be set up for actual highlighting.

HELP gains :highlight entry.

Tested:
  10/10 unit cases on the renderer state machine, including:
    - plain prose passthrough
    - single-chunk fence
    - B2 split fence ("``" + "`python\n" + "x=42" + "\n```")
    - N1 SOL anchor (mid-line ``` does not open)
    - trailing \n properly emitted across chunks
    - SOL-only fence open
    - prose after closing fence preserved
    - two fences in one stream
    - highlight off = passthrough (callback never fires)

  E2E :highlight meta verified:
    :highlight status -> off / detected
    :highlight on     -> toggles + emits parser-dir reminder
    :highlight status -> on / detected
    :highlight off    -> off

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Pillars 1 + 2 + 3 of Phase 6 now all implemented. Commit #6 is config
example block + status -> Implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:27:04 +00:00
marfrit 0d63f01601 repl: expand_mentions tiered @<r1>..<r2> diff retry (Phase 6 commit #4)
Per A6 (tiered resolution): @<token> tries file lookup first; if the
file doesn't exist AND the token contains "..", retry as a git
ref-range and substitute with a fenced `diff` block. Preserves the
existing peel-on-trailing-punct logic (e.g., `@HEAD~1..HEAD,` peels
the comma, resolves the ref, restores the comma after the closing
fence).

Resolution order for @<token>:
  1. io.open(token, "rb")    -- file lookup, with trailing-punct peel
  2. if (1) fails and token contains "..":
        git --no-pager -c color.ui=never diff <r1>..<r2>
     on exit 0 + non-empty body: substitute as ```diff fenced block
  3. else: leave literal `@token` + emit "[aish] @X: not found" status

Examples:
  @README.md            -> file (path branch)
  @../sibling.txt       -> file (path branch; `..` only triggers retry
                                 when path lookup FAILS, so existing
                                 paths with `..` segments are unaffected)
  @HEAD~1..HEAD         -> diff (path fails, ref succeeds)
  @origin/main..feature -> diff (path fails — no such literal file;
                                 ref succeeds; `/` in ref is fine because
                                 we don't use the path's `/`-absence as
                                 a discriminator)
  @nonsense..gibberish  -> literal preserved (both fail)
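
The tiered order can be sketched with injected lookups, leaving out the trailing-punctuation peel. The function name and the read_file / git_diff callbacks are hypothetical stand-ins for io.open and the git invocation:

```lua
local FENCE = string.rep("`", 3)

-- Returns (kind, text): "file", "diff", or "literal".
local function resolve_mention(token, read_file, git_diff)
  local body = read_file(token)
  if body then return "file", body end
  -- Only after the file lookup fails is ".." treated as a ref-range hint.
  if token:find("..", 1, true) then
    local diff = git_diff(token)
    if diff and diff ~= "" then
      return "diff", FENCE .. "diff\n" .. diff .. "\n" .. FENCE
    end
  end
  return "literal", "@" .. token
end

-- Stubbed filesystem and git for the demonstration.
local files = { ["README.md"] = "# readme", ["../sibling.txt"] = "sibling" }
local function read_file(p) return files[p] end
local function git_diff(range)
  if range == "HEAD~1..HEAD" then return "diff --git a/x b/x" end
  return nil
end

for _, tok in ipairs({ "README.md", "../sibling.txt", "HEAD~1..HEAD", "nonsense..gibberish" }) do
  print(tok, (resolve_mention(tok, read_file, git_diff)))
end
```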

Required restructuring:
  - _shq and _git_clean_cmd lifted from M.run closure scope to module
    scope (above expand_mentions). Single source of truth for the
    B1 prefix shared with commit #3's :diff. The in-M.run duplicates
    are removed.
  - expand_mentions now references `executor` (already required at
    module scope on line 7) for the diff retry.

Status messages updated:
  - File expansion: "@<path> expanded (N bytes, truncated)"  (existing)
  - Diff expansion: "@<path> expanded (N bytes, diff)"        (new)

Tested with the 7 existing #7 cases + 7 new diff-retry cases (14/14):
  ref-range expansion shape, body contains `diff --git`, trailing
  prose preserved, @../path stays as file (not diff), neither-path-
  nor-ref preserves literal, trailing-comma peel composes with ref
  retry.

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:20:25 +00:00
marfrit 4d5f93aaa5 repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3)
User-driven git diff injection. The model sees the diff on the next
ask_ai turn through the existing exec_output channel.

Changes:
  - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree.
    B1: every git invocation that flows into context MUST use
    `--no-pager -c color.ui=never`. Forkpty makes git think stdout
    is a TTY, enabling both color and the pager's keypad/line-clear
    escapes — these would pollute the captured context block. The
    helper is the single chokepoint; commit #4's @<r1>..<r2> retry
    will reuse it.

  - :diff [<args>] meta:
      - Reads cwd at meta invocation (R6: differs from :tree's
        scan-time cwd capture; documented in §5).
      - Runs `_git_clean_cmd("diff " .. args)` via executor.exec.
      - Empty output -> "(no diff): <label>" status, no context append.
      - Non-zero exit -> "diff failed (exit N): <label>" status,
        no context append. git's stderr already streamed to the
        user via executor.exec's live multiplex, so the failure
        reason is visible.
      - Success -> appends "[diff <label>]\n<output>" via
        ctx:append_exec_output. Label is "(working tree)" for empty
        args, else verbatim args.
      - Status confirms injection size: "diff injected: <label> (N bytes)".

  - HELP gains :diff line with three example arg shapes; N3-resolved
    (no `staged` alias — the meta is thin pass-through to git's grammar).
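
A minimal sketch of the B1 chokepoint and the label rule. The function bodies are assumptions based on the description above; only the flags and the "(working tree)" label come from the commit:

```lua
-- Every git command that feeds context gets the same prefix:
-- no pager, no colour, regardless of what the PTY suggests.
local function git_clean_cmd(subcmd_and_args)
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Label used in status lines and the injected block header.
local function diff_label(args)
  if args == nil or args == "" then return "(working tree)" end
  return args
end

print(git_clean_cmd("diff " .. "--cached"))
print(diff_label(""), diff_label("HEAD~1..HEAD"))
```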

Smoke verified across four scenarios in an ephemeral test repo:
  - Working-tree dirty -> 110-byte diff injected, no ANSI escapes
  - --cached -> 118-byte staged diff injected, clean
  - garbage..nonexistent -> exit 128, status + skip
  - Clean working tree -> "(no diff)", status + skip

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:17:18 +00:00
marfrit d1dce832da repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2)
First user-visible Phase 6 verb. Builds on commit #1's compose_project
plumbing — sets ctx.project from either the :tree meta or the
cfg.project.auto_tree startup hook.

Changes:
  - _scan_project_tree(dir, opts) helper near _run_hook:
      git -C <dir> ls-files --cached --others --exclude-standard
      when <dir> is inside a git repo (N4: no subshell);
      find <dir> -mindepth 1 -maxdepth <depth+1> -type f
      -not -path '*/.*' otherwise.
    Returns (body, info={file_count, truncated, in_git}). Sorted
    paths, truncated to max_chars (default 4096 per cfg).

  - :tree [<depth>|refresh|off] meta:
      no arg     -> scan with config defaults; resets _project_opts
      <N>        -> scan with depth=N; caches as _project_opts
      refresh    -> re-scan with cached _project_opts (else defaults)
      off        -> clear ctx.project AND ctx._project_opts (R5)
    Status line reports file count + truncation flag + which backend
    fired (git/find).

  - cfg.project.auto_tree startup hook before the main loop:
      if true, scan libc.getcwd() once and set ctx.project.
      Failures status-logged once; REPL continues. Default off
      (existing configs unchanged).

  - HELP updated with three :tree lines.

Plan §12 deliberately defers the config.lua example block to commit #6
along with the status header bump (R9 single-owner).

Smoke (aish repo cwd):
  - :tree no-arg            -> "33 files (git ls-files)"
  - :tree refresh           -> same
  - :tree off               -> "project tree cleared"
  - :tree 1                 -> rescans
  - cfg.project.auto_tree=true at startup -> auto-injected status visible

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:14:36 +00:00
marfrit d852acadc2 repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args
Plumbs the secrets.lua module (commit e4b818b) into the conversation
pipeline. Hook points:

  ask_ai          — scrub_messages(ctx:to_messages(), mode) before
                    call_broker; rehydrate streamed deltas via
                    streaming_rehydrator so the user sees real values
                    while text_parts accumulates rehydrated chunks
                    (final_resp is plain — CMD: / DELEGATE: extractors
                    see plain values)

  MCP dispatch    — dispatch_tool_call rehydrates the args table before
                    sess:call_tool so the trusted MCP server receives
                    real values (the model emitted placeholders because
                    it saw a scrubbed context)

  DELEGATE: & :delegate
                  — scrub sub_msgs before broker.chat; rehydrate sub_text
                    before appending to context, so future turns see
                    real values restored

  Phase 5 summarize-on-evict
                  — scrub sum_msgs before broker.chat; rehydrate the
                    reply that becomes ctx.summary

  :memory summarize
                  — same scrub + rehydrate pair

Mode resolution per call: model_cfg.redact → config.secrets.default →
"vault+autodetect" if vault loaded, else "off".

ctx storage convention: PLAIN values throughout. The scrub happens at
the egress (broker call) per the active redact mode; ctx.turns never
holds placeholders for content the user typed or executor produced.
The model's own emissions (assistant tool_call arguments) may carry
placeholders because the model saw the scrubbed context — rehydrated
at MCP dispatch and otherwise harmless on re-serialization (idempotent
re-scrubbing).
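
The per-call mode resolution order above can be sketched as follows. The
function name and argument shapes are hypothetical; only the lookup order
and the two default strings come from this message.

```lua
-- Resolve the redact mode for one broker call:
-- model_cfg.redact, then config.secrets.default, then a vault-dependent default.
local function resolve_redact_mode(model_cfg, config, vault_loaded)
  if model_cfg and model_cfg.redact then
    return model_cfg.redact
  end
  local secrets = config and config.secrets
  if secrets and secrets.default then
    return secrets.default
  end
  return vault_loaded and "vault+autodetect" or "off"
end

print(resolve_redact_mode({}, {}, true))
```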

New meta:
  :secrets [status]         vault entries, placeholders allocated this
                            session, active broker mode. Never prints
                            actual values (vault file is itself a
                            secret per gotcha 7).
  :secrets check <text>     dry-run scrub against the active broker's
                            mode — shows the output transformation.

Documented in config.lua with a commented-out block + per-broker
redact field example.

Deferred to a follow-up issue (clearly scoped):
  - safety.lua broker call sites (Norris main loop, is_destructive
    LLM second-opinion probe) — same wiring pattern, but they don't
    currently see secrets_session; needs threading through helpers.
  - @-mention file content is appended PLAIN to ctx and scrubbed at
    egress alongside the rest of the user turn (covered by the
    ask_ai scrub).
  - exec output streamed live to terminal is pre-scrub (user sees
    real values in their own shell — by design); the captured-for-
    context copy is scrubbed at egress alongside the rest.

This is the "full scope" implementation chosen via AskUserQuestion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:38:23 +00:00
marfrit cdf4e86679 repl: sub-broker delegation via DELEGATE: marker (closes #6)
Cost and context-window control: a "heavy" preset's model can offload
work to a cheaper preset without spending its own tokens on the result.
Example: deep model is mid-conversation and asks fast to summarize a
20k-line build log; the summary comes back as exec-output for the
next turn, deep stays small.

Marker syntax: DELEGATE: <preset> "<prompt>"

(Single or double quotes; one DELEGATE per line; lines without the
quoted shape are dropped — let the user write about delegation in
prose without accidental dispatch.)
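
A rough sketch of how such an extractor could work. The real
extract_delegate_lines is not shown in this log; the line-start anchoring,
the preset character class, and the handling of empty prompts are all
assumptions.

```lua
-- Hypothetical DELEGATE: extractor. Returns a list of {preset, prompt}.
local function extract_delegate_lines(text)
  local found = {}
  for line in text:gmatch("[^\r\n]+") do
    -- %2 is a back-reference: the closing quote must equal the opening one.
    local preset, _, prompt =
      line:match("^%s*DELEGATE:%s+([%w_-]+)%s+([\"'])(.-)%2%s*$")
    -- Assumption: an empty prompt counts as malformed and is dropped.
    if preset and prompt ~= "" then
      found[#found + 1] = { preset = preset, prompt = prompt }
    end
  end
  return found
end

for _, d in ipairs(extract_delegate_lines('DELEGATE: fast "summarize the log"')) do
  print(d.preset, d.prompt)
end
```

Because the pattern is anchored at the start of the line and requires a
quoted prompt, a sentence that merely mentions DELEGATE: does not dispatch.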

Dispatch flow (mirrors CMD: / CMD&: extraction):
  1. ask_ai's stream completes
  2. extract_delegate_lines walks the final response
  3. For each {preset, prompt}: broker.chat(config.models[preset], ...)
     synchronously; result is appended via ctx:append_exec_output as
     "[delegate <preset>]: <result>"
  4. The model sees the delegate result on its next turn

Implementation choice — marker over tool: option 1 from the issue
("inline delegate marker") works with any model regardless of
tool_calls support. Option 2 (aish_delegate as a tool dispatched in
the existing Phase 2 sub-loop) is the better UX for capable models
since it returns the result mid-turn — filed as follow-up if needed.

Meta surface:
  :delegate <preset> <prompt>   one-shot direct invocation (useful for
                                testing without depending on the model
                                emitting DELEGATE:, and as a manual
                                "ask <preset> something" verb)

Scope:
  - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch
  - Norris: not extended; the planner's model anchor would conflict with
    mid-plan switching (R-C3-adjacent risk)
  - No self-delegation guard: each DELEGATE is a separate broker call,
    not recursive; a delegate result reaching the next turn could
    contain another DELEGATE but that's bounded by max_tool_depth-style
    iteration cap on the parent
  - No cost prompt: configuring a paid cloud preset already implies
    consent to spend on it
  - Unknown preset → error status + exec-output note "[delegate X failed:
    unknown preset]"

Extractor unit-tested with 8 cases (single-quote, double-quote, multi-
line prose, empty prompt, no-quotes, case-sensitive, wrong prefix).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:29:09 +00:00
marfrit f94d16fc89 repl: background CMD&: with handle/poll (closes #8)
Builds, long-running network calls, and file watches no longer block
the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL
to spawn the command in the background, return immediately, and poll
for completion between user inputs.

Process model: shell-wrapped to avoid needing fork()/execv() FFI.

  nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null
       >/dev/null 2>&1 & echo $!

The child is reparented to init; we hold only the PID and the path to
the .status sidecar. Completion is detected by the .status file
existing (the wrapper writes it as its last act). No waitpid needed —
the child isn't ours after the popen subshell exits.
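
A sketch of how the wrapper string and the completion check might be put
together. All function names are hypothetical, and the quoting helper is an
addition for illustration (the log does not say how the command is escaped).

```lua
-- Quote a string for POSIX sh by wrapping it in single quotes.
local function shquote(s)
  local escaped = s:gsub("'", "'\\''")
  return "'" .. escaped .. "'"
end

-- Build the detached wrapper: run <cmd>, capture output, write the exit
-- code to the .status sidecar as the last act, and print the PID.
local function build_bg_command(cmd, log_path, status_path)
  local inner = string.format("(%s) > %s 2>&1; echo $? > %s",
    cmd, shquote(log_path), shquote(status_path))
  return "nohup sh -c " .. shquote(inner)
    .. " </dev/null >/dev/null 2>&1 & echo $!"
end

-- Completion check: the job is done once the sidecar holds an exit code.
local function read_exit_code(status_path)
  local f = io.open(status_path, "r")
  if not f then return nil end        -- no sidecar yet: still running
  local code = tonumber((f:read("*l")))
  f:close()
  return code                         -- nil if the file is still empty
end

print(build_bg_command("make all", "/tmp/1.log", "/tmp/1.status"))
```

The resulting string would be handed to something like io.popen, whose
single line of output is the PID.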

Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is
created lazily at startup (mkdir -p). Requires history.dir to be
configured; without it CMD&: emits an error status and the model
sees a "[bg failed to start]" exec-output note.

check_bg_done() runs at the top of each main-loop iteration alongside
check_every_due(). When a job is detected as exited, the REPL:
  - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>"
  - appends the same string to ctx as exec output, so the model sees
    the completion on its next turn (natural follow-up: "ok the build
    finished; let me check the log")

Meta surface:
  :bg-spawn <cmd>       start a bg job directly (no AI needed; also
                        useful for testing without depending on the
                        model emitting CMD&:)
  :bg-list              show running/done jobs (id, pid, state, runtime, cmd)
  :bg-output <id>       dump the log file to stdout
  :bg-kill <id>         SIGTERM (note: only delivers if the PID is
                        still the actual command — long-lived shells
                        may need pkill by name)

Scope (deliberately limited for v1):
  - No callback-mode readline: bg completion detection is pre-prompt,
    not mid-readline. If a build finishes while the user is typing,
    notification comes when they hit Enter.
  - Permission policy DSL (#9) does NOT apply to CMD&: — the
    asynchronous gating model wasn't designed for the y/N flow.
    Filed as follow-up if needed.
  - Norris not extended: helpers.exec_cmd is still synchronous; the
    planner doesn't dispatch bg jobs.
  - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>"
    and a "[plan] would bg-run: <cmd>" exec-output note, no spawn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:25:55 +00:00
marfrit 67d80e1047 repl: :every recurring prompts via pre-prompt due-check (closes #11)
In-session timer that re-injects a prompt every N seconds. "Watch this
thing" workflows (`:every 5m "check journalctl -u nginx for errors"`)
without spawning a separate aish process.

Approach: minimum viable. check_every_due() runs at the top of each
main-loop iteration — timers fire BETWEEN user inputs, not during
readline waits or active broker calls. Mid-stream firing would require
rewriting ffi/readline to callback mode (substantial scope). If the
on-the-fly firing requirement matters in practice it can land as a
follow-up issue against the readline FFI.

Meta:
  :every <interval> <prompt>   schedule (interval: 30s | 5m | 2h | bare int)
  :every list                  show jobs (id, interval, time-until-next, model, prompt)
  :every cancel <id>           remove

Defaults:
  - Model: "fast" preset if defined in config.models, else active model
    (per the issue's "recurring prompts should default to fast preset").
  - In-memory only — jobs don't persist across restarts.
  - Suppressed while ctx.norris_active (planner stays on goal anchor).
  - Quotes around the prompt are stripped if present.
  - Each tick fires the job once, re-schedules next_fire = now + interval
    (no catch-up if the interval elapsed multiple times during a long
    user input).
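
The interval grammar and the no-catch-up rule can be sketched like this.
The function names are hypothetical, and treating a bare integer as seconds
and rejecting zero are assumptions.

```lua
local UNIT_SECONDS = { s = 1, m = 60, h = 3600 }

-- Parse "30s" | "5m" | "2h" | bare int into seconds; nil when malformed.
local function parse_interval(s)
  local n, unit = s:match("^(%d+)([smh]?)$")
  if not n then return nil end
  local secs = tonumber(n) * (UNIT_SECONDS[unit] or 1)  -- assumed: bare int = seconds
  if secs <= 0 then return nil end                      -- assumed: zero is rejected
  return secs
end

-- Pre-prompt due check: fire each overdue job once, then reschedule
-- from "now" so missed intervals are not replayed.
local function fire_due(jobs, now, fire)
  for _, job in ipairs(jobs) do
    if now >= job.next_fire then
      fire(job)
      job.next_fire = now + job.interval
    end
  end
end

print(parse_interval("5m"))
```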

Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed),
load via require, end-to-end :every list / cancel surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:23:07 +00:00
marfrit 17e62c0326 safety: permission policy DSL — allow/confirm/deny rule lists (closes #9)
The confirm_cmd boolean was too coarse: true interrupts every harmless
ls; false ungates everything. Most workflows want trust for read-only
ops while still gating writes/network/sudo.

New config:

    permissions = {
        allow   = { "^ls%s", "^cat%s", "^git status" },
        confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
        deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
    }

Verdict order: deny > confirm > allow. First match in the chosen
category wins. Unmatched defaults to "confirm". Patterns are Lua
patterns (not regex) per PHASE0.md §3 — no compiled extensions.

Verdict behavior in the interactive CMD: loop:
  - allow   → run without prompt
  - deny    → status line, skip
  - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true)

Backward compat:
  - permissions unset + confirm_cmd=true  → always confirm
  - permissions unset + confirm_cmd=false → always allow
  - permissions set                        → policy table is authoritative
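
A sketch of the verdict logic as described, including the backward-compat
paths. This is not the exported safety.classify_command; the signature and
return shape are assumptions.

```lua
-- Returns verdict ("allow" | "confirm" | "deny") and the matching rule, if any.
local function classify_command(cmd, permissions, confirm_cmd)
  if not permissions then
    -- Legacy path: the boolean alone decides.
    return confirm_cmd and "confirm" or "allow", nil
  end
  -- deny > confirm > allow; first matching Lua pattern in a category wins.
  for _, category in ipairs({ "deny", "confirm", "allow" }) do
    for _, pattern in ipairs(permissions[category] or {}) do
      if cmd:find(pattern) then
        return category, pattern
      end
    end
  end
  return "confirm", nil   -- unmatched commands default to "confirm"
end

local permissions = {
  allow   = { "^ls%s", "^cat%s", "^git status" },
  confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
  deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
}
print(classify_command("curl http://example.com", permissions))
```

Note that "^ls%s" requires whitespace after ls, so a bare "ls" falls
through to the default "confirm" under these example rules.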

Scope deliberately limited to the interactive AI-suggested CMD: gate.
Norris autonomous mode keeps its own safety.is_destructive machinery
(combining the two would double-gate or replace the LLM probe — both
non-obvious behavioral changes that belong in their own issues).
User-typed shell-routed lines (`router.classify → "shell"`) and
:exec also bypass the policy by design — those are direct user intent.

New introspection:
  :perms list           — show the configured rule lists
  :perms check <cmd>    — report verdict + matching rule (debug)

safety.classify_command is exported and unit-tested with 12 cases
covering each category, priority order (deny > allow on overlap),
and both fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:20:56 +00:00
marfrit 518c01a9f5 repl: user-defined skills loader (closes #2)
PHASE0.md §5.2 froze the meta-command set at compile time. Skills let
the user package repeatable workflows (project queries, prompt
templates, audit routines) without forking aish.

Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever
$AISH_SKILLS_DIR points at — used both by users with non-XDG layouts
and by CI). Each module exports:

    return {
        name        = "<meta-cmd-name>",     -- must match [%w_-]+
        description = "<one-line>",          -- shown by :skills
        run         = function(args, h) ... end,
    }

Helpers passed to run():
    h.ask(text)   — same path as :ask (with @path expansion)
    h.status(s)   — emit "[aish] s"
    h.exec(cmd)   — run a shell command (subject to plan_mode, hooks)
    h.model()     — current active model name
    h.ctx         — raw Context object (advanced)
    h.config      — the loaded config table

Validation rejects modules that lack name or run, use whitespace in the
name, or collide with an existing meta command (built-in or earlier
skill). Each rejection emits a status line so the user sees why a
skill didn't appear.

New meta command :skills lists what's loaded (sorted, with description).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:17:00 +00:00
marfrit fb15f7a690 repl: pre/post CMD hooks via config.hooks (closes #3)
Optional shell scripts run before and after every CMD: execution. Use cases:
audit logging, auto-format-after-edit, custom safety gates beyond the
existing confirm_cmd boolean.

Config shape:

    hooks = {
        pre_cmd  = "/path/to/pre-script",
        post_cmd = "/path/to/post-script",
    }

Contract per hook invocation:
  - The command line is piped to the hook on stdin.
  - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the
    moment of dispatch), AISH_CWD (libc.getcwd() result).
  - Hook stdout is streamed live to the terminal via executor.exec
    (so the user sees its output regardless of exit status).

Pre-hook: non-zero exit aborts the command and emits a status line
including the exit code. last_exec_code is set to the hook's exit
so the {last_status} prompt template variable reflects the abort.

Post-hook: exit code is ignored (the spec says so); only the visible
stdout matters. Runs after the command's exec_end frame.

Tested with success, abort, and stdin-matches-env paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:16:11 +00:00
marfrit ce1378edee repl: fix {name} pattern to accept underscores (#10 follow-up)
%w excludes underscore in Lua patterns, so {ctx_used}, {ctx_max},
{cwd_short}, {last_status} were left literal in the prompt. Use
[%w_] to accept identifiers with underscores.
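
A small illustration of the difference. The substitution helper and the
"{(%w+)}" form of the earlier pattern are assumptions; the message only
states that %w excludes underscore and that [%w_] is the fix.

```lua
-- Substitute {name} placeholders from a table; unknown names stay literal
-- because gsub keeps the original match when the callback returns nil.
local function render_prompt(template, vars)
  return (template:gsub("{([%w_]+)}", function(name)
    local value = vars[name]
    if value == nil then return nil end
    return tostring(value)
  end))
end

local vars = { model = "fast", ctx_used = 120, ctx_max = 4096 }
print(render_prompt("[{model} {ctx_used}/{ctx_max}t] > ", vars))

-- With %w alone, a name containing "_" never matches and is left as-is.
print((("{ctx_used}"):gsub("{(%w+)}", "X")))
```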

Surfaced during higgs smoke test of the new template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:57 +00:00
marfrit d738f339cb repl: configurable prompt template via config.shell.prompt (closes #10)
At-a-glance situational awareness: see the active model, context fill,
mode flags, and cwd in the prompt itself — prevents "wait, am I still
in plan mode?" surprises.

Example config:

    shell = {
        prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ",
    }

Variables (substituted via {name}):
  {model}        active preset name
  {ctx_used}     char/4 token heuristic (Phase 0 §8; accurate is Q1)
  {ctx_max}      config.context.token_budget
  {turn}         #ctx.turns
  {cwd}          libc.getcwd() (chdir-aware; PWD env may drift)
  {cwd_short}    cwd with $HOME -> ~
  {last_status}  last exec exit code, "" if none yet
  {mode}         "norris" | "plan" | "normal"

Default behavior is unchanged when shell.prompt is unset: the prompt keeps the
"[aish:<model>]>" form with its norris and plan markers.

Side wiring:
  - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD).
  - run_shell records exit code into last_exec_code for {last_status}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:43 +00:00
marfrit 10d2501cff repl: peel trailing punctuation from @path mentions (#7 follow-up)
Natural-language prose like "look at @README.md, then..." or
"@foo.lua." at sentence end previously failed to expand because the
trailing comma/period was included in the path.

Now: if the raw token doesn't resolve, peel trailing chars from
[.,;:?!)] one at a time until the path resolves or no more peels are
possible. On success, the peeled chars are emitted verbatim AFTER the
closing fence so the original punctuation is preserved.
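
The peel loop can be sketched as below. The function name is hypothetical
and file resolution is injected as an `exists` predicate so the example
stays self-contained.

```lua
-- Try the raw token first; on failure peel one trailing punctuation char
-- at a time. Returns (path, peeled_suffix) or nil if nothing resolves.
local function resolve_mention(token, exists)
  local path, peeled = token, ""
  while not exists(path) do
    local last = path:sub(-1)
    if not last:find("^[.,;:?!)]$") then
      return nil, ""              -- no more peels possible
    end
    path = path:sub(1, -2)
    peeled = last .. peeled       -- keep the original order for re-emission
  end
  return path, peeled
end

local function exists(p) return p == "README.md" end
print(resolve_mention("README.md,", exists))
```

Trying the unpeeled token first means a file whose name really ends in a
punctuation character still resolves.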

Surfaced during higgs smoke test (TC: "say the first line of
@README.md, then stop" — the trailing comma broke resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:11:22 +00:00
marfrit bb374c2ad2 repl: @path mention expansion in input lines (closes #7)
Saves the user from manual copy/paste: typing "show me @repl.lua" or
"compare @config.lua and @config.example.lua" auto-expands each mention
to a fenced code block carrying the file contents, language-tagged by
extension, and feeds the composed text to the broker.

Wired on the "ai" branch of the input loop and inside :ask. Meta and
shell branches pass through unchanged — "@foo" in shell context is a
literal program argument; meta commands store text verbatim.

Trigger rule: "@" must follow start-of-string or whitespace — avoids
false positives on email addresses ("user@example.com") and shell
short-options. Path extends to next whitespace.
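
One way to express the trigger rule with a single Lua pattern (a sketch
with a hypothetical function name, not the code in repl.lua):

```lua
-- Collect @path mentions: "@" must follow start-of-string or whitespace,
-- and the path runs to the next whitespace.
local function find_mentions(line)
  local paths = {}
  -- Prepending a space lets start-of-string behave like "after whitespace".
  for path in (" " .. line):gmatch("%s@(%S+)") do
    paths[#paths + 1] = path
  end
  return paths
end

for _, p in ipairs(find_mentions("compare @config.lua and @config.example.lua")) do
  print(p)
end
```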

Other behavior:
  - Language tag derived from extension via a small lookup; unknown
    extensions yield an untagged fence.
  - Files over 32 KB are truncated head/tail (16K + 8K) with a marker.
  - Missing files leave the literal "@path" token in place and emit
    a "[aish] @path: not found" status — non-fatal, lets the user
    correct the path and re-type.
  - Each successful expansion emits "[aish] @path expanded (N bytes
    [, truncated])" so the user sees what was inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:10:54 +00:00
marfrit dccd9e90cc repl: :plan toggle — CMD: lines become PLAN: notes (closes #5)
Plan mode is a safer entry point than going straight to Norris: the user
iterates with the model on what to do, sees each CMD: as a PLAN: line,
and the would-have-run notes feed back into the next-turn context so the
model can refine without side effects.

Toggle with :plan (flip), :plan on, :plan off. Off by default.

When plan_mode is true:
  - CMD: lines extracted from the assistant turn print as "PLAN: <cmd>"
  - The note "[plan] would run: <cmd>" is appended via the existing
    append_exec_output channel — same context flow as a real exec, so
    the model sees its proposed action on the next turn.
  - run_shell is NOT called; no executor, no cd intercept, no capture.

The prompt shows "[aish:<model> plan]>" while active (mirrors the
norris marker convention).

Orthogonal to Norris: plan_mode only gates the interactive CMD:
extraction path. Norris has its own halt protocol; combining them is
not supported (the planner would be confused by skipped actions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:09:08 +00:00
marfrit 0700dce881 repl: enforce budget per Norris step, not just post-loop (closes #51)
PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-session
if the loop runs long". The implementation called enforce_budget() only
once, after the planning loop exited. With a tight max_turns and a
multi-step Norris session, the model therefore saw the FULL conversation
throughout, which defeated context budgeting and prevented R-C3 (NORRIS
suffix goal anchor surviving eviction) from being exercised end-to-end.

Move status_evictions(ctx:enforce_budget()) inside the while loop so
it runs after every safety.norris_step return. Drop the now-redundant
post-loop call.

Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the
"oldest 4 turns evicted" status arrived AFTER NORRIS DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:34 +00:00
marfrit 0c93e31186 repl: warn on stale MCP auto_approve keys (closes #33)
Auto-approve policy keys that point at unconnected aliases, mistyped
tool names, or malformed forms were silently ignored — leaving the user
with surprise confirm prompts and no diagnostic.

validate_auto_approve() now walks config.mcp.auto_approve at startup
(after the MCP connect loop) and after each :mcp connect. For each key:

  - "alias__*"       — warn if alias has no live session
  - "alias__tool"    — warn if alias unknown OR tool not in registry
  - anything else    — warn as malformed (not in alias__tool form)

Non-fatal. The re-run on :mcp connect lets a key that referenced a
not-yet-connected alias become live without a restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:08 +00:00
marfrit 299dcce78f repl: validate MCP tool names against Bedrock regex (closes #32)
Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names.
We already moved the alias separator from "." to "__" (commit f26cbd9),
but a future MCP server could still register a tool whose name (or whose
combination with the alias) contains characters outside that class —
silently breaking calls to strict providers.

connect_mcp now warns at startup for:
  - aliases containing "__" (would misparse on tool dispatch)
  - emitted alias__name strings that violate the regex or exceed 128 chars
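
Lua patterns have no {1,128} quantifier, so a check along these lines needs
an explicit length test. This is a sketch with hypothetical function names,
not the code in connect_mcp.

```lua
-- Equivalent of ^[a-zA-Z0-9_-]{1,128}$ using a Lua pattern plus a length check.
local function is_strict_tool_name(name)
  return #name >= 1 and #name <= 128
    and name:find("^[A-Za-z0-9_-]+$") ~= nil
end

-- Collect warnings for one alias/tool pair; an empty list means clean.
local function check_tool(alias, tool)
  local warnings = {}
  if alias:find("__", 1, true) then
    warnings[#warnings + 1] = "alias contains '__': " .. alias
  end
  local full = alias .. "__" .. tool
  if not is_strict_tool_name(full) then
    warnings[#warnings + 1] = "name fails strict-provider check: " .. full
  end
  return warnings
end

print(#check_tool("fs", "read_file"), #check_tool("fs", "read.file"))
```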

Behavior preserved: validation is informational only. tools_schema() still
emits the offending tool; local llama.cpp setups accept lenient names,
and their users shouldn't be penalized for downstream strictness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:04:29 +00:00
marfrit 8e0e735e15 repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT)
Surfaced by autonomous run of TC #48: pointing models.fast at
http://localhost:9999 (port closed, host resolves) emits
"transport: Could not connect to server" — CURLE_COULDNT_CONNECT
(7) which the Phase 5 fallback pattern set didn't include.

Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua.
Now fallback fires for the full set of common libcurl/HTTP transport
failure shapes:

  HTTP 5xx              server-side
  HTTP 404 model_not_found
  HTTP 408              gateway request timeout
  Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST
  Could not connect to server   CURLE_COULDNT_CONNECT  (← added)
  Connection refused
  Timeout was reached   CURLE_OPERATION_TIMEDOUT (variant A)
  Operation timed out   CURLE_OPERATION_TIMEDOUT (variant B)

Re-tested #48 end-to-end:
  fast pointed at dead port → fast fails → status fires →
  cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:49:13 +00:00
marfrit 40ea0b49b0 repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3)
Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5
machinery into the REPL.

make_summarize_fn():
  Returns a closure that maps (prior_summary, evicted_turns) onto a
  broker.chat call against cfg.context.summarizer_model (default
  "fast"). Three dispatch paths matching the R-B1 callback contract:
    evicted == nil      → compress signal
    prior present       → additive ("extend the prior summary ...")
    prior nil           → first-time ("summarize the following turns")
  All use a system prompt enforcing "exactly one short paragraph",
  max_tokens=300, timeout_ms=30000. Broker failure returns nil so
  Context falls back to silent eviction. Renderer status is logged
  on failure for visibility.

Context construction:
  Build ctx_opts as a fresh table (copies config.context to avoid
  mutating it), adds summarize_fn ONLY when
  config.context.summarize_on_evict == true. Defaults stay OFF —
  Phase 4 regression coverage.

Fallback machinery:
  - FALLBACK_PATTERNS table with 7 transport-error signatures
    (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused,
    "Timeout was reached", "Operation timed out")
  - fallback_reason(err) strips the "transport: " prefix and matches.
  - should_fallback(err) gates on cfg.routing.fallback.
  - call_broker(cfg, name, msgs, on_delta, opts) wraps
    broker.chat_stream:
      • tracks any_delta via wrapped on_delta callback
      • retries ONCE against cfg.routing.fallback_model (default
        "cloud") when err matches AND no deltas arrived (N3:
        mid-stream failures aren't retried — partial text would
        duplicate)
      • emits "[aish] local <name> failed (<reason>); retrying via
        <fb>" status before the retry call

ask_ai routing:
  - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg
    locals carry the choice through every tool-sub-loop iteration.
  - active_name/active_cfg are NOT mutated — user's :model selection
    survives the request.
  - When config.routing.auto is true, classify_model(text, config) is
    invoked. Non-nil model + non-active → swap req_cfg + status line.
  - broker.chat_stream call replaced with call_broker (fallback wrap).
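
The retry-once rule from the fallback machinery above can be sketched as
follows. The signature differs from the real call_broker (the stream
function is injected here so the example runs standalone), the pattern list
is a subset, and the exact "HTTP <code>" text layout is an assumption.

```lua
local FALLBACK_PATTERNS = {
  "HTTP 5%d%d", "HTTP 408", "Couldn't resolve host",
  "Connection refused", "Timeout was reached", "Operation timed out",
}

-- Strip the "transport: " prefix and return the matched signature, or nil.
local function fallback_reason(err)
  if type(err) ~= "string" then return nil end
  local msg = err:gsub("^transport: ", "")
  for _, pattern in ipairs(FALLBACK_PATTERNS) do
    local hit = msg:match(pattern)
    if hit then return hit end
  end
  return nil
end

-- Retry once against the fallback model, but only if no delta was streamed.
local function call_broker(chat_stream, models, name, msgs, on_delta, routing)
  local any_delta = false
  local function tracked(chunk)
    any_delta = true
    if on_delta then on_delta(chunk) end
  end
  local resp, err = chat_stream(models[name], msgs, tracked)
  if resp or not (routing and routing.fallback) then return resp, err end
  if any_delta or not fallback_reason(err) then return nil, err end
  local fb = routing.fallback_model or "cloud"
  if fb == name or not models[fb] then return nil, err end
  return chat_stream(models[fb], msgs, on_delta)
end

print(fallback_reason("transport: Connection refused"))
```

Skipping the retry after a partial stream avoids showing the user the same
text twice.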

Meta cmds:
  :route on/off           — toggle cfg.routing.auto at runtime
  :route classes          — show class → model mapping
  :route check <text>     — report classify_model result with
                            "(routing currently disabled)" suffix when
                            auto is off (N1)
  :fallback on/off        — toggle cfg.routing.fallback at runtime

HELP updated with the four new commands.

Smoke-tested: aish boots, all four metas behave correctly, and classify_model
returns the reasoning class for "Explain how MMAP works on Linux" (the model
slot is nil because no classes are configured by default, per N2 cost-safety).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:31:14 +00:00
marfrit f22d21d754 repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4)
Phase 4 commit #4 per docs/PHASE4.md §6.

:memory summarize:
  1. Source-of-truth: session log file via history.load(session_path),
     NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize"
     so prior summarize exchanges don't self-amplify across multiple
     calls within the same session.
  2. Pick summarizer model from cfg.memory.summarizer_model (default
     active model).
  3. Build a transcript string ("role: content" per turn, 800 chars max
     per turn) and feed it as a single user turn alongside a system
     instruction asking for "(fact|pref|context): <content>" lines.
  4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep
     model can take a while; we don't want a 15s probe-cap here).
  5. Log the response as an assistant turn with meta="summarize" so the
     next :memory summarize call filters it out.
  6. Parse response lines tolerating markdown bullets and bold markup:
     ^%s*[-*]?%s*[*_]*(fact|pref|context)[*_]*:%s*(.+)$
  7. Per-candidate prompt: y / N / edit.
       y    → memory:add(kind, content)
       edit → readline prompt for replacement text
       any other → drop
  8. status: "summarize: added N / M candidates".
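
The expression in step 6 is written with alternation, which Lua patterns
do not support. One way to get the same effect is to capture any word and
filter it against the allowed kinds; this sketch uses a hypothetical
function name and is not the code in repl.lua.

```lua
local KINDS = { fact = true, pref = true, context = true }

-- Parse "(fact|pref|context): <content>" lines, tolerating a leading
-- markdown bullet and bold/italic markers around the kind.
local function parse_candidates(text)
  local out = {}
  for line in text:gmatch("[^\r\n]+") do
    local kind, content =
      line:match("^%s*[-*]?%s*[*_]*(%a+)[*_]*:%s*(.+)$")
    if kind and KINDS[kind] then
      out[#out + 1] = { kind = kind, content = content }
    end
  end
  return out
end

for _, c in ipairs(parse_candidates("- **fact**: user prefers tabs")) do
  print(c.kind, c.content)
end
```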

Live-tested against hossenfelder/fast:
  Pipeline correct end-to-end. Model emitted one candidate; user
  confirmation prompt fired; item persisted; :memory list showed it.
  Candidate quality from the 1.5B model is poor — typical
  small-model behavior; deep/cloud models would do better but this
  isn't an aish bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:53:36 +00:00
marfrit 3b074afaee repl: memory handle + :remember + :memory meta (Phase 4 commit #3)
Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring.

Startup:
  - Opens memory handle at <history.dir>/memory.jsonl via
    history.open_memory(). Status-logs failure (e.g. flock held by
    another aish) and continues without memory.
  - inject_memory(): loads via history.load_memory(), truncates by
    cfg.memory.inject_max_chars (default 2000), populates
    ctx.memory_items. Status line announces N items injected.
  - shutdown_session() now also closes memory (releases flock).

Meta commands:
  :remember <text>       — shortcut for :memory add fact <text>;
                            auto-refreshes ctx.memory_items so the
                            next AI turn sees the new item without
                            restart
  :memory list           — show id / ts / kind / content (truncated
                            at 80 chars per line)
  :memory add <kind> <t> — fact|pref|context required; rejects other
                            kinds
  :memory forget <id>    — N1: checks active-set first, surfaces
                            "id N not active (already forgotten or
                            never existed)" without appending if
                            the id isn't live
  :memory clear          — [y/N] confirm prompt; tombstones every
                            active item
  :memory inject         — N4: reload memory.jsonl into
                            ctx.memory_items, replacing existing.
                            Useful after manual file edits.

Help block extended with the new commands.

End-to-end verified:
  Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all
           three with timestamps
  Boot 2 → memory: 3 items injected (startup status); :memory list
           same three; ctx.turns empty (history is sessions/, memory
           is separate)
  Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not
           active" status without writing a tombstone; :memory list
           shows 2 items; :memory clear → confirm prompt → "cleared 2
           items"; :memory list → "(no memory items)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 05:11:48 +00:00
marfrit a404b2a152 repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5)
Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step
(commit #4) into the REPL with the user-facing surface.

ffi/readline.lua extensions (A1 + R-C4):
  - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text
    and M.redisplay wrappers exposed.
  - M.bind: removed `:free()` on previous callback. Now keeps every
    bound callback pinned for process lifetime in `_pinned` list
    (alongside `_bound[seq]` for current lookup). Avoids the
    use-after-free window between unbind and rebind that R-C4 flagged.
    Memory cost is bounded — one closure per key sequence binding.

context.lua Norris suffix (R-C3 / §8):
  - to_messages() composes a dynamic NORRIS MODE block onto the
    system prompt when ctx.norris_active is set. The block carries
    ctx.norris_goal so eviction of the user's "[norris] goal:" turn
    doesn't lose the anchor. Returns to plain system prompt when
    Norris exits.

repl.lua Norris driver:
  - prompt() now shows a marker when ctx.norris_active per PHASE0.md §9.
  - \C-n bound to a real handler — inserts ":norris " at the cursor
    (replaces Phase 1 status placeholder).
  - run_norris(goal) function: sets norris_active + norris_goal,
    appends a "[norris] <goal>" user turn, renders the banner, then
    loops calling safety.norris_step with an injected helpers table
    until a terminal status returns. Renders the closing banner.
  - norris_halt(): the [N] proceed/skip/abort prompt called by
    safety.norris_step via helpers.halt. Empty input → abort (safe).
  - dispatch_tool(): factored from the Phase 2 ask_ai code so
    safety.norris_step can call it.
  - norris_exec(): factored exec path for autonomous mode (skips
    the interactive run_shell cd-status renderer).
  - :norris <goal>  meta — launches autonomous mode
  - :norris off     meta — drops Norris flag (rare; usually 'abort')
  - :safety patterns meta — lists active is_destructive rules
  - :safety check <cmd> meta — probes a hypothetical command

End-to-end mock-driven test:
  Submitted ":norris find files in /tmp" → banner → step 1 emits
  tool_call (auto_approved per policy) → dispatched → frame rendered
  → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner.
  2 broker invocations, no stalls.

config.lua safety example block lands in commit #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:42:14 +00:00
marfrit f26cbd9a3a phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics
Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.
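
A minimal sketch of the split described above, assuming a standalone
helper (the name split_tool_name is hypothetical; only the pattern
"^(.-)__(.+)$" and the "missing alias prefix" message come from the
commit log). It shows why the non-greedy capture stops at the leftmost
"__", so tool names may contain further underscores:

```lua
-- Hypothetical helper; the real split lives inside dispatch_tool_call.
local function split_tool_name(full)
  -- (.-) is non-greedy, so the alias ends at the leftmost "__".
  -- Everything after it, including later "__" runs, is the tool name.
  local alias, tool = full:match("^(.-)__(.+)$")
  if not alias or alias == "" then
    return nil, "tool name missing alias prefix: " .. full
  end
  return alias, tool
end

print(split_tool_name("boltzmann__read_file"))
print(split_tool_name("boltzmann__read__raw"))
print(split_tool_name("list_dir"))
```

Because the alias itself can never contain "__", the leftmost match is
always the namespace boundary.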

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just first return slot stay correct.

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
marfrit 3fa6279f5b repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors
Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no
dot) printed "unknown alias: nil" instead of a useful diagnostic. The
parse failure was being conflated with the alias-not-found case.

Now:
  :mcp tool list_dir          -> tool name missing alias prefix: list_dir
  :mcp tool unknown_alias.x   -> unknown alias: unknown_alias
  :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:55:01 +00:00
marfrit 7e9cfff04d repl: tool-call sub-loop + :mcp meta + system-prompt augmentation
Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP
tool-call flow on top of broker/safety/context/renderer/mcp.

repl.lua additions:
  - mcp_sessions table populated from config.mcp.servers at startup.
    connect_mcp() helper does initialize + caches tools/list. Failures
    status-logged once; absent from mcp_sessions until manual reconnect
    (C4 — no auto-retry).
  - tools_schema() flattens connected sessions' tools into the OpenAI
    {type:"function", function:{name,description,parameters}} shape with
    "<alias>.<name>" namespacing.
  - flatten_content() concatenates content[type="text"] blocks; one-shot
    status warning when non-text blocks (image/resource) are dropped
    (§4 normative spec, v1 only handles text).
  - dispatch_tool_call(name, args_table) splits alias.tool, looks up
    session, calls. Returns (content_string, is_error). Errors of every
    flavor (missing alias, no session, rpc_error, transport_error)
    yield a synthesized "[aish] ..." string so callers always have a
    body for the role:"tool" turn — alternation preserved per C5/C7.
  - ask_ai rewritten as a sub-loop that re-issues the broker request
    until the model returns pure text or max_tool_depth (default 8) is
    hit. Each iteration: stream response → if tool_calls present,
    confirm-gate each → dispatch → append role:"tool" turn → continue.
    Argument-JSON parse failure produces a synthesized tool turn (C7).
    Decline at confirm produces "[aish] tool call declined by user"
    tool turn (alternation guarantee).
  - :mcp meta with sub-commands: list / tools / tool <a.n> / connect
    <url> [alias] / disconnect <alias>. HELP block extended.

context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8
(hybrid prompt: static frame about MCP + dynamic tools list in the
request body). Block is always present even when no MCP servers
configured — ~60 tokens for clarity that 'CMD:' remains the fallback.

CMD: extraction unchanged — runs on the FINAL pure-text response only
(not on intermediate iterations of the tool sub-loop). Substrate §3
invariant preserved.
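
The flatten_content behavior listed under repl.lua above could look
roughly like the sketch below. The function name is from the commit;
the return shape (text plus a count of dropped blocks) and the plain
concatenation without a separator are assumptions made for
illustration:

```lua
-- Sketch: keep only text blocks from a tool result's content array.
-- Returning the dropped count is an assumption; the commit only says a
-- one-shot status warning is raised when non-text blocks are dropped.
local function flatten_content(content)
  local parts, dropped = {}, 0
  for _, block in ipairs(content or {}) do
    if block.type == "text" and type(block.text) == "string" then
      parts[#parts + 1] = block.text
    else
      dropped = dropped + 1  -- image / resource blocks are not handled in v1
    end
  end
  return table.concat(parts), dropped
end

local text, dropped = flatten_content({
  { type = "text", text = "a.txt\n" },
  { type = "image", data = "..." },
  { type = "text", text = "b.txt\n" },
})
print(text, dropped)
```

A caller could use the dropped count to decide whether to emit the
warning once per session.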

End-to-end verified two ways:
  (1) Direct broker probe: aish's tools_schema fed through
      broker.chat_stream against hossenfelder → qwen-1.5b emits one
      tool_call payload with correct id + name="boltzmann.list_dir"
      + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string
      across fragmented deltas.
  (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock
      emits text + tool_call, sub-loop dispatches against LIVE
      boltzmann lmcp (auto_approve via policy), 80+ files rendered
      inside the tool_call frame, broker re-invoked with the
      extended context, mock returns pure text, sub-loop terminates.
      Total broker invocations: 2.

Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..."
suggestions even when an MCP tool is the better path; the small
model's system-prompt compliance is weak. Larger models and the
analyze-time direct probe confirm the tools_schema and tool_calls
flow is wire-correct — Phase 7 verify will exercise this against
qwen3-30b or cloud models when available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:20:42 +00:00
marfrit efdc7281c7 broker: opts.tools passthrough + streaming tool_call accumulator
Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows
tool-call support without taking a dependency on mcp.lua (caller
supplies the tools array — B5 from review).

chat_stream signature widens to (cfg, msgs, on_delta, opts):
  opts.tools  - optional array, passed to the request body as the
                OpenAI-shape tools field. OMITTED entirely when nil or
                empty (#tools == 0) — some servers reject "tools": [].

on_delta callback shape widens to (kind, payload):
  kind = "text",      payload = string         (Phase 1 path; unchanged
                                                semantics, signature
                                                changes from (delta) to
                                                ("text", delta))
  kind = "tool_call", payload = {id, name, arguments}
                                                emitted ONCE per call on
                                                finish_reason "tool_calls"
                                                after the streaming
                                                accumulator pulls
                                                fragmented JSON-string
                                                arguments together.

Accumulator behavior:
  - Keyed by delta.tool_calls[i].index.
  - If index is absent on a delta (some llama.cpp builds omit it on
    single-call streams; C2 in review), default to 0 with a one-shot
    stderr debug status per stream.
  - id and name captured from the opening delta of each slot.
  - function.arguments concatenated across all deltas as the raw
    JSON-string; caller (repl.lua / future Phase 2 commit #6) does
    dkjson.decode.
  - On finish_reason "tool_calls" the accumulator emits all collected
    calls in index order and resets.
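
As an illustration of the accumulator rules above, here is a small
self-contained sketch. The constructor name new_accumulator and the
feed/finish method names are hypothetical; the delta field names
follow the OpenAI streaming shape the commit refers to. The one-shot
stderr debug status for a missing index is left out:

```lua
-- Hypothetical accumulator; not the broker's actual code.
local function new_accumulator()
  local slots, max_index = {}, -1
  local acc = {}

  -- Feed one entry of delta.tool_calls.
  function acc.feed(tc)
    local idx = tc.index or 0            -- absent index defaults to slot 0
    local slot = slots[idx]
    if not slot then
      slot = { id = nil, name = nil, arguments = {} }
      slots[idx] = slot
      if idx > max_index then max_index = idx end
    end
    if tc.id then slot.id = slot.id or tc.id end
    local fn = tc["function"]            -- "function" is a Lua keyword
    if fn then
      if fn.name then slot.name = slot.name or fn.name end
      if fn.arguments then
        slot.arguments[#slot.arguments + 1] = fn.arguments
      end
    end
  end

  -- Call on finish_reason "tool_calls": emit in index order, then reset.
  function acc.finish()
    local calls = {}
    for i = 0, max_index do
      local slot = slots[i]
      if slot then
        calls[#calls + 1] = {
          id = slot.id,
          name = slot.name,
          arguments = table.concat(slot.arguments),  -- raw JSON string
        }
      end
    end
    slots, max_index = {}, -1
    return calls
  end

  return acc
end

local acc = new_accumulator()
acc.feed({ index = 0, id = "call_1",
           ["function"] = { name = "get_weather", arguments = '{"ci' } })
acc.feed({ index = 0, ["function"] = { arguments = 'ty":"Paris"}' } })
local calls = acc.finish()
print(calls[1].id, calls[1].name, calls[1].arguments)
```

The arguments stay a raw string here; decoding is left to the caller,
as the commit describes.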

M.chat external contract unchanged (C1): wrapper now uses the new
(kind, payload) shape internally but exposes the same text-string
return. No caller of M.chat passes opts.tools so tool_call kinds are
silently dropped.

repl.lua minimal companion edit: ask_ai's chat_stream callback updated
to the new shape. Text path unchanged; tool_call kinds are no-op
placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming
functional between #5 and #6.

Smoke-tested against hossenfelder/8082 (post-#23 fix):
  - text-only: ok=true, kind="text" deltas received
  - with opts.tools: model emitted one tool_call,
    accumulator collected id + name=get_weather + args={"city":"Paris"}
    correctly across fragmented deltas
  - opts.tools={}: server accepted (field omitted as required)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:20:32 +00:00
marfrit 7d62eb5659 review followups: pcall shield, :resume guard, shell quoting, nits
CONCERNs from the Phase 1 review pass:

ffi/curl.lua:
  - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or
    in the parse loop itself) is captured into cb_error and surfaced
    after curl_easy_perform rather than propagating across the FFI
    callback boundary (which LuaJIT documents as process-fatal). The
    EOS flush path gets the same shield. Errors return
    (nil, "callback: <msg>") from post_sse.
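
The shielding idea can be sketched in plain Lua without any FFI. The
helper name shielded and the state table are hypothetical; only the
cb_error variable and the "callback: <msg>" error prefix come from
the commit:

```lua
-- Sketch: wrap a callback so a Lua error is captured, not raised.
-- In the real code the wrapped function is the curl write callback.
local function shielded(on_event)
  local state = { cb_error = nil }
  local function cb(...)
    if state.cb_error then return false end  -- already failed; skip work
    local ok, err = pcall(on_event, ...)
    if not ok then
      state.cb_error = tostring(err)         -- remember the first error
      return false
    end
    return true
  end
  return cb, state
end

local cb, state = shielded(function(ev) error("bad event: " .. ev) end)
cb("x")
-- After the transfer finishes, surface the captured error to the caller.
if state.cb_error then
  print(nil, "callback: " .. state.cb_error)
end
```

The point of the design is that the error crosses back into ordinary
Lua code as a return value only after the C call has returned.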

history.lua:
  - sh_singlequote() escapes shell metacharacters; the mkdir -p and
    ls -1 shell-outs no longer double-quote (where $(...) and $VAR
    still expand) — single-quote with embedded-' escaping is the
    safe form.
  - M.load now returns (turns, meta) instead of (meta, turns). turns
    is ALWAYS a table on success, never nil-when-no-header; failure
    path is the unambiguous (nil, err). Callers can `if not turns
    then` without the previous ambiguity. repl.lua :resume updated
    to the new shape.
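
For reference, single-quote escaping of the kind described for
sh_singlequote() is commonly written as below. This is a sketch of the
general technique, not necessarily the exact body in history.lua:

```lua
-- Wrap s in single quotes; each embedded ' becomes '\'' (close the
-- quote, emit an escaped quote, reopen). Inside single quotes the shell
-- expands nothing, so $VAR and $(...) stay literal.
local function sh_singlequote(s)
  local escaped = s:gsub("'", "'\\''")  -- keep only gsub's first result
  return "'" .. escaped .. "'"
end

print(sh_singlequote("/tmp/$(whoami)/it's"))
```

The result can be appended to a command string, for example
"mkdir -p " .. sh_singlequote(dir).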

repl.lua :resume:
  - Refuse to resume into a non-empty ctx — silent overwrite was the
    Q15 default, but the review surfaced the no-undo / no-warning
    failure mode. User must :reset (or :save then re-launch) to
    express intent. The current session's on-disk log is unaffected
    either way.

NITs:
  - ffi/libc.lua READ_BUF: comment noting it's module-shared and
    Phase 1 has no reentrant readers; revisit when that changes.
  - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred
    from Phase 1 — no consumer here") rather than the previous
    dangling "(or here)".

Regression suite verifies:
  - history.load new signature on success + failure paths
  - shell-quoted history.dir with $ doesn't trip
  - aish scripted run: ctx with 2 turns refuses :resume anchor with
    a clear status; user must :reset first

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:05:23 +00:00
marfrit 1f1065157e review BLOCKER: PTY input forwarding + raw mode toggle
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).

ffi/libc:
  - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
  - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
    (nil, err) when fd isn't a tty (scripted / piped-stdin runs)
  - M.restore_termios(fd, saved)
  - struct pollfd + M.poll (POLLIN constant)

executor:
  - multiplex(sess): poll(stdin, master); reads master on any revents
    (POLLHUP fires when child closes its slave end, not POLLIN — the
    revents != 0 check catches both); forwards stdin keystrokes to
    master; loop exits when master read returns 0 (EOF / child gone)
  - stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
    piped-stdin runs (tests / scripted) would otherwise drain queued
    aish commands into the child of the *current* cmd, swallowing them
  - raw mode is restored before returning so the user lands back at the
    aish prompt in canonical mode

renderer + repl:
  - exec_output(out, code) split into exec_begin() (top rule, before
    spawn) + exec_end(code) (closing rule with exit, after wait). PTY
    multiplex streams the body live to stdout in between; the renderer
    never re-prints the body.

PHASE1.md §3:
  - tcgetattr/tcsetattr changed from "optional" to "required for
    single-key UIs to work — done-criteria #2"; poll added to the libc
    row description.

Verified:
  - non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
    printf multi-line) — all exit codes correct, output streamed live,
    a\nb\nc\n preserved byte-for-byte
  - scripted-stdin run reaches all expected lines (no stdin draining
    into a non-interactive child)
  - aish prompt + framed exec block + exit-code line all render in
    correct order

Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:00:53 +00:00
marfrit a75118b2ae readline: bind() via rl_bind_keyseq; repl reserves \C-n no-op
Phase 1 readline binding wiring per PHASE1.md §7.

ffi/readline:
  M.bind(seq, lua_fn) -> bool
    Wraps lua_fn as a C callback (signature `int (int, int)` per
    readline's rl_command_func_t) and registers it via
    rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0).
    Trampolines are pinned in module-local state so they outlive the
    bind call — readline retains the function pointer for the process
    lifetime. Rebinding the same seq frees the previous trampoline.
    Bound handlers are pcall-wrapped so a Lua error doesn't crash
    readline's input loop.

repl:
  Binds \C-n to a no-op that emits
    "[aish] Norris mode not yet implemented (Phase 3)"
  Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode)
  replaces the body with the actual toggle.

Smoke covers bind / rebind-same-seq (exercises the :free path) /
bind-different-seq with no errors. Live keyboard verification waits
on user-test.

Phase 1's 8(+1) inner loop is now functionally through `implement`;
next inner phase is `verify` (review pass) followed by memory-update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:26:58 +00:00
marfrit 9d586870e8 repl: session persistence wiring — auto-log, :save, :resume, :sessions
Phase 1 session log integration per PHASE1.md §6.

On every M.run(), open a session file at
  <config.history.dir>/sessions/<utc-iso8601>.jsonl
with a meta header (started, model, aish_version). If history.dir is
unset or unwritable, status-log the disable and continue without
persistence.

ask_ai logs the merged user turn (after pending exec output is folded
in) and the assistant turn (after streaming completes). run_shell does
NOT log [exec output] — that becomes part of the next user turn when
ctx.pending_exec_output is flushed.

New meta commands:
  :sessions       list session files; "*" marks the active one
  :save <name>    rename current session log to <name>.jsonl (auto-
                  appends .jsonl); reopens for continued append
  :resume <name>  load <name>.jsonl into ctx (replaces current turns
                  via ctx:reset + append loop). The current process's
                  own session log is unaffected — Phase 1 chooses
                  per-process logs over chained continuations.

:quit and EOF (Ctrl-D) both close the session file via shutdown_session
before exiting.

HELP text updated (no longer "Phase 0:" header since meta set has
grown). Q15 noted in PHASE1.md §10 (resume into non-empty context) is
resolved by the ctx:reset() in :resume — silent overwrite for Phase 1,
revisit if anyone cares.

End-to-end live verified: chat -> auto-log; :save renames; :sessions
lists; :resume + :history shows the round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:23:05 +00:00
marfrit a722f576ac repl + renderer: streaming assistant output (Phase 1)
repl.ask_ai now drives broker.chat_stream and pumps each delta into
renderer.assistant_delta(delta) as it arrives. renderer.assistant_flush
is called when the stream ends to add a trailing newline if missing.
The full reassembled response is then handed to executor.extract_cmd_lines
for the CMD: confirm-and-execute path (unchanged from Phase 0).

renderer.assistant() is kept for non-streaming callers (none in tree
right now, but cheap to keep around). assistant_delta/flush share no
state with assistant(); they use a module-local stream_buf that tracks
the in-progress streamed block.

Q12 deferred: incremental CMD: highlighting (cursor-positioning re-
render on flush) is not implemented in Phase 1 — deltas emit raw. The
§6 CMD: marker is still extractable on the reassembled string post-
stream, which is what executor cares about. Renderer's bold+cyan
treatment for CMD: lines stays available via M.assistant().

Broker error / SSE-framed api-error path still pops the user turn and
restores ctx.pending_exec_output. Order: assistant_flush always runs
(even on error) so the cursor lands on a fresh line before the broker-
error status renders.

Live verification: `Count one to ten` against hossenfelder fast streams
deltas through to stdout incrementally; CMD: extraction works on the
reassembled string; confirm gate intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:17:27 +00:00
marfrit 16490e6905 fix: buffer exec output for next user turn; alternation for strict templates
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:

    api: ... Error: Jinja Exception: After the optional system message,
    conversation roles must alternate user/assistant/user/assistant/...

Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.

Fix:

context.lua:
  - new field `pending_exec_output` (initially nil)
  - new method `:append_exec_output(out)` buffers (concat on subsequent
    captures so multi-shell-then-ai still merges everything)
  - new method `:append_user(content)` flushes buffered exec output as
    a `[exec output]\n...\n\n` prefix and appends a user turn
  - `:reset()` also clears the buffer
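
The buffer-and-prepend policy above can be sketched as follows. The
field and method names are taken from the commit; the constructor
new_context and the plain concatenation of successive captures are
assumptions:

```lua
-- Sketch of the context buffer; not the project's actual context.lua.
local Context = {}
Context.__index = Context

local function new_context()
  return setmetatable({ turns = {}, pending_exec_output = nil }, Context)
end

function Context:append_exec_output(out)
  -- Concatenate so several shell runs before the next AI turn all survive.
  self.pending_exec_output = (self.pending_exec_output or "") .. out
end

function Context:append_user(content)
  if self.pending_exec_output then
    -- Fold buffered output into this user turn instead of a separate one,
    -- so roles keep alternating user/assistant.
    content = "[exec output]\n" .. self.pending_exec_output
              .. "\n\n" .. content
    self.pending_exec_output = nil
  end
  self.turns[#self.turns + 1] = { role = "user", content = content }
end

function Context:reset()
  self.turns = {}
  self.pending_exec_output = nil
end

local ctx = new_context()
ctx:append_exec_output("lorem.txt\n")
ctx:append_user('Are there entries beginning with "lor"?')
print(#ctx.turns, ctx.turns[1].content)
```

With this shape the exec output never produces a user turn of its own,
which is what strict chat templates require.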

repl.lua:
  - run_shell calls ctx:append_exec_output(out) instead of
    ctx:append({role="user", content="[exec output]\n"..out})
  - ask_ai calls ctx:append_user(text) instead of raw :append; saves
    prev_pending so a broker error can restore the buffer for retry

PHASE0.md §6:
  - amended the role-injection paragraph to describe the buffer-and-
    prepend policy; the §3 invariants list is untouched (this was a §6
    design detail, not a locked invariant)

Verification:
  - context unit tests cover: alternation after the failing sequence,
    multi-shell merge, reset clears buffer, broker-error retry path
  - live reproduction against `deep` (mistral-nemo) of the exact
    user-reported sequence succeeds; model responds with a sensible
    `CMD: ls | grep '^lor'` instead of a Jinja exception

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:41:21 +00:00
marfrit abc993aa49 review followup: empty-input guards, ~/ symmetry, CMD: filter
Addresses three concerns + one nit from the Phase 0 review pass.

executor.lua:
  - M.exec guards empty / whitespace-only cmd up front, returns
    "(empty command)" / -1 instead of running the wrapper on nothing.
  - On sentinel-parse failure with empty output (typical of shell
    parse errors — the syntax error itself escapes to the popen
    parent's stderr because 2>&1 is inside the unparsable subshell),
    surface "(no output — possible shell parse error)" rather than
    a silent empty frame.
  - extract_cmd_lines now skips whitespace-only / empty bodies; a
    bare `CMD: ` line in assistant output no longer turns into an
    "execute ''? [y/N]" prompt.
  - "what" comments cleaned in maybe_chdir.
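
A rough sketch of the empty-body filter in extract_cmd_lines, assuming
the marker must start the line and that the body is trimmed (both are
assumptions; the commit only states that empty and whitespace-only
bodies are skipped):

```lua
-- Sketch: collect non-empty CMD: bodies from an assistant response.
local function extract_cmd_lines(text)
  local cmds = {}
  for line in text:gmatch("[^\n]+") do
    -- Capture the body with surrounding whitespace trimmed.
    local body = line:match("^CMD:%s*(.-)%s*$")
    if body and body ~= "" then
      cmds[#cmds + 1] = body
    end
  end
  return cmds
end

local cmds = extract_cmd_lines("Try this:\nCMD: ls -la\nCMD: \nCMD:   \ndone")
print(#cmds, cmds[1])
```

Only the "ls -la" line survives, so a bare "CMD: " never reaches the
confirm prompt.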

router.lua:
  - path_like now matches `~` and `~/foo` so `~/scripts/build.sh`
    classifies as shell (was: ai). Restores symmetry with executor's
    maybe_chdir, which already expands `~` on `cd`.

repl.lua:
  - :exec and :ask trim args and renderer.status a usage line on
    empty rather than running an empty cmd / sending an empty turn
    to broker.

Regression: full prior smoke suite still passes — known_commands
shell paths, all maybe_chdir branches, CMD: extraction with non-empty
bodies, exec exit-code recovery, all router branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:41:35 +00:00
marfrit e0e69f839b repl: readline loop, dispatch, all Phase 0 meta commands
Phase 0 implementation per PHASE0.md §5, §9.

Wires the lower-half modules into a single REPL:
  ffi/readline -> input + history
  router       -> classify(line) -> meta/shell/ai
  executor     -> run_shell with cd interception, frame output, capture
  broker       -> ask_ai, then extract+confirm CMD: lines from response
  context      -> turn list + eviction; status line on evict
  renderer     -> assistant text + exec frame + status

Prompt format `[aish:<model>]> ` per §9.

Meta commands all wired (§5.2): :quit/:q, :clear, :reset, :model <name>,
:models, :history, :exec <cmd>, :ask <text>, :help. Unknown meta names
report via renderer.status rather than crashing.

End-of-input (Ctrl-D on empty line) breaks the loop cleanly. Empty /
whitespace-only lines are skipped silently before dispatch — router
would otherwise classify them as ai with empty payload and pollute
context.

`CMD: ` extraction + confirm-and-execute is wired: when broker returns
an assistant turn, the response is scanned for §6 CMD: lines; each is
prompted via readline ("execute '...'? [y/N]") when config.shell
.confirm_cmd is true (default), else auto-executed.

On broker error, the user turn just appended is popped so the context
isn't polluted with a turn that has no assistant response.

Smoke covers :help, :models, shell exec via known_commands allowlist,
and Ctrl-D break. Live broker exchange deferred per issue #12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 15:17:40 +00:00
claude-noether 4310207738 Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
  with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
  snappy/32k + cloud via OpenRouter through hossenfelder)

File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:16:07 +00:00