Commit Graph

87 Commits

Author SHA1 Message Date
marfrit fb15f7a690 repl: pre/post CMD hooks via config.hooks (closes #3)
Optional shell scripts trigger around every CMD: execution. Use cases:
audit logging, auto-format-after-edit, custom safety gates beyond the
existing confirm_cmd boolean.

Config shape:

    hooks = {
        pre_cmd  = "/path/to/pre-script",
        post_cmd = "/path/to/post-script",
    }

Contract per hook invocation:
  - The command line is piped to the hook on stdin.
  - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the
    moment of dispatch), AISH_CWD (libc.getcwd() result).
  - Hook stdout is streamed live to the terminal via executor.exec
    (so the user sees its output regardless of exit status).

Pre-hook: non-zero exit aborts the command and emits a status line
including the exit code. last_exec_code is set to the hook's exit
so the {last_status} prompt template variable reflects the abort.

Post-hook: exit code is ignored (the spec says so); only the visible
stdout matters. Runs after the command's exec_end frame.

Tested with success, abort, and stdin-matches-env paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:16:11 +00:00
marfrit ce1378edee repl: fix {name} pattern to accept underscores (#10 follow-up)
%w excludes underscore in Lua patterns, so {ctx_used}, {ctx_max},
{cwd_short}, {last_status} were left literal in the prompt. Use
[%w_] to accept identifiers with underscores.

Surfaced during higgs smoke test of the new template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:57 +00:00
marfrit d738f339cb repl: configurable prompt template via config.shell.prompt (closes #10)
At-a-glance situational awareness: see the active model, context fill,
mode flags, and cwd in the prompt itself — prevents "wait, am I still
in plan mode?" surprises.

Example config:

    shell = {
        prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ",
    }

Variables (substituted via {name}):
  {model}        active preset name
  {ctx_used}     char/4 token heuristic (Phase 0 §8; accurate is Q1)
  {ctx_max}      config.context.token_budget
  {turn}         #ctx.turns
  {cwd}          libc.getcwd() (chdir-aware; PWD env may drift)
  {cwd_short}    cwd with $HOME -> ~
  {last_status}  last exec exit code, "" if none yet
  {mode}         "norris" | "plan" | "normal"

Default behavior unchanged when shell.prompt is unset — keeps the
"[aish:<model>]>" form with norris  and plan markers.

Side wiring:
  - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD).
  - run_shell records exit code into last_exec_code for {last_status}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:43 +00:00
marfrit 10d2501cff repl: peel trailing punctuation from @path mentions (#7 follow-up)
Natural-language prose like "look at @README.md, then..." or
"@foo.lua." at sentence end previously failed to expand because the
trailing comma/period was included in the path.

Now: if the raw token doesn't resolve, peel trailing chars from
[.,;:?!)] one at a time until the path resolves or no more peels are
possible. On success, the peeled chars are emitted verbatim AFTER the
closing fence so the original punctuation is preserved.

Surfaced during higgs smoke test (TC: "say the first line of
@README.md, then stop" — the trailing comma broke resolution).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:11:22 +00:00
marfrit bb374c2ad2 repl: @path mention expansion in input lines (closes #7)
Saves the user from manual copy/paste: typing "show me @repl.lua" or
"compare @config.lua and @config.example.lua" auto-expands each mention
to a fenced code block carrying the file contents, language-tagged by
extension, and feeds the composed text to the broker.

Wired on the "ai" branch of the input loop and inside :ask. Meta and
shell branches pass through unchanged — "@foo" in shell context is a
literal program argument; meta commands store text verbatim.

Trigger rule: "@" must follow start-of-string or whitespace — avoids
false positives on email addresses ("user@example.com") and shell
short-options. Path extends to next whitespace.

Other behavior:
  - Language tag derived from extension via a small lookup; unknown
    extensions yield an untagged fence.
  - Files over 32 KB are truncated head/tail (16K + 8K) with a marker.
  - Missing files leave the literal "@path" token in place and emit
    a "[aish] @path: not found" status — non-fatal, lets the user
    correct the path and re-type.
  - Each successful expansion emits "[aish] @path expanded (N bytes
    [, truncated])" so the user sees what was inlined.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:10:54 +00:00
marfrit dccd9e90cc repl: :plan toggle — CMD: lines become PLAN: notes (closes #5)
Plan mode is a safer entry point than going straight to Norris: the user
iterates with the model on what to do, sees each CMD: as a PLAN: line,
and the would-have-run notes feed back into the next-turn context so the
model can refine without side effects.

Toggle with :plan (flip), :plan on, :plan off. Off by default.

When plan_mode is true:
  - CMD: lines extracted from the assistant turn print as "PLAN: <cmd>"
  - The note "[plan] would run: <cmd>" is appended via the existing
    append_exec_output channel — same context flow as a real exec, so
    the model sees its proposed action on the next turn.
  - run_shell is NOT called; no executor, no cd intercept, no capture.

The prompt shows "[aish:<model> plan]>" while active (mirrors the
norris  marker convention).

Orthogonal to Norris: plan_mode only gates the interactive CMD:
extraction path. Norris has its own halt protocol; combining them is
not supported (the planner would be confused by skipped actions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:09:08 +00:00
marfrit 81c3b1b44a main: non-interactive -p/--prompt one-shot mode (closes #4)
Adds `aish -p "<text>"` for Unix-pipeline composability:

  tail app.log | aish -p "any anomalies?"
  aish -p "summarize: $(curl -sS https://...)"

The flag bypasses repl.lua entirely. On invocation:

  1. Stdin: when not a TTY, read to EOF and prepend to the prompt as a
     fenced block. ffi.libc.isatty(0) gates the read so interactive
     `aish -p "..."` (no pipe) doesn't hang.
  2. Resolve config.models[config.default_model].
  3. Stream broker.chat_stream replies to stdout; finalize with newline.
  4. Exit 0 on success, 1 on broker error, 2 on arg / config error.

Behavior NOT in -p mode (kept simple per the issue's "no repl.lua
involvement"):
  - No MCP, no tool loop, no Norris, no routing, no memory injection.
  - "CMD:" lines in the reply are printed verbatim, NOT executed —
    callers can grep / pipe them as they wish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:06:27 +00:00
marfrit 0700dce881 repl: enforce budget per Norris step, not just post-loop (closes #51)
PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-
session if the loop runs long". Implementation only called
enforce_budget() once, after the planning loop exited — so for a tight
max_turns with a multi-step Norris session the model saw the FULL
conversation throughout, defeating context budgeting and preventing
R-C3 (NORRIS suffix goal anchor surviving eviction) from being
exercised end-to-end.

Move status_evictions(ctx:enforce_budget()) inside the while loop so
it runs after every safety.norris_step return. Drop the now-redundant
post-loop call.

Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the
"oldest 4 turns evicted" status arrived AFTER NORRIS DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:34 +00:00
marfrit 0c93e31186 repl: warn on stale MCP auto_approve keys (closes #33)
Auto-approve policy keys that point at unconnected aliases, mistyped
tool names, or malformed forms were silently ignored — leaving the user
with surprise confirm prompts and no diagnostic.

validate_auto_approve() now walks config.mcp.auto_approve at startup
(after the MCP connect loop) and after each :mcp connect. For each key:

  - "alias__*"       — warn if alias has no live session
  - "alias__tool"    — warn if alias unknown OR tool not in registry
  - anything else    — warn as malformed (not in alias__tool form)

Non-fatal. The re-run on :mcp connect lets a key that referenced a
not-yet-connected alias become live without a restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:05:08 +00:00
marfrit 299dcce78f repl: validate MCP tool names against Bedrock regex (closes #32)
Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names.
We already moved the alias separator from "." to "__" (commit f26cbd9),
but a future MCP server could still register a tool whose name (or whose
combination with the alias) contains characters outside that class —
silently breaking calls to strict providers.

connect_mcp now warns at startup for:
  - aliases containing "__" (would misparse on tool dispatch)
  - emitted alias__name strings that violate the regex or exceed 128 chars

Behavior preserved: validation is informative-only. tools_schema() still
emits the offending tool; local llama.cpp users accept lenient names
and shouldn't be penalized for downstream strictness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:04:29 +00:00
marfrit 8e0e735e15 repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT)
Surfaced by autonomous run of TC #48: pointing models.fast at
http://localhost:9999 (port closed, host resolves) emits
"transport: Could not connect to server" — CURLE_COULDNT_CONNECT
(7) which the Phase 5 fallback pattern set didn't include.

Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua.
Now fallback fires for the full set of common libcurl/HTTP transport
failure shapes:

  HTTP 5xx              server-side
  HTTP 404 model_not_found
  HTTP 408              gateway request timeout
  Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST
  Could not connect to server   CURLE_COULDNT_CONNECT  (← added)
  Connection refused
  Timeout was reached   CURLE_OPERATION_TIMEDOUT (variant A)
  Operation timed out   CURLE_OPERATION_TIMEDOUT (variant B)

Re-tested #48 end-to-end:
  fast pointed at dead port → fast fails → status fires →
  cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:49:13 +00:00
marfrit d72689f709 config: deep model → deepseek-coder-v2-lite (temporary)
qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per
/v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B
active params; fast enough that the 30-min timeout from the
qwen3-30b config was wildly over-budget.

Switched to deepseek-coder-v2-lite for the time being. Restore
qwen3-30b when the slot is back up.

Live-probed: YES/NO destructive probe via the deep model preset
returns "YES." in ~4.8s — well within the new 5-min timeout, and
fast enough that the Phase 3 LLM second-opinion path is now
functional again without falling back to "fail-safe YES" on every
ambiguous command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:42:23 +00:00
marfrit a9b39cd435 config: Phase 5 routing + summarize-on-evict example (commit #5)
Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only;
commented-out example showing:
  - routing.auto            (per-request auto-routing toggle)
  - routing.classes         (class → model mapping; reasoning = nil
                             by default per R-N2 cost-safety)
  - routing.fallback        (single-hop retry to cloud on transport fail)
  - routing.fallback_model  (default "cloud" if uncommented)
  - context.summarize_on_evict + summarizer_model + max_summary_chars
    (shown INSIDE the context = {...} block above)

All defaults OFF — Phase 5 is opt-in across the board. Existing configs
without `routing` or `context.summarize_on_evict` behave identically to
Phase 4.

Phase 5 implementation complete:
  #1 3e57824  router.classify_model + 31-case corpus
  #2 03497b5  context summarize_fn callback + summary block in to_messages
  #3 40ea0b4  repl routing + fallback + summarize_fn wiring + :route/:fallback
  #4 -        (bundled into #3 since meta cmds are trivial additions)
  #5 (this)   config example block

Phase 5 verify-partial:
  - router.classify_model: 31/31 case corpus passes
  - context summarize-on-evict: mock callback fires correctly (additive
    + compress paths), summary suppressed under Norris, :reset clears it
  - repl meta cmds: :route on/off/classes/check + :fallback on/off all
    work; :route check reports class + "routing currently disabled"
    suffix when auto is off (N1)

Verify-pending: end-to-end with real broker (route a code question, see
it land on deep; kill local backend, see fallback fire to cloud).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:32:20 +00:00
marfrit 40ea0b49b0 repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3)
Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5
machinery into the REPL.

make_summarize_fn():
  Returns a closure that maps (prior_summary, evicted_turns) onto a
  broker.chat call against cfg.context.summarizer_model (default
  "fast"). Three dispatch paths matching the R-B1 callback contract:
    evicted == nil      → compress signal
    prior present       → additive ("extend the prior summary ...")
    prior nil           → first-time ("summarize the following turns")
  All use a system prompt enforcing "exactly one short paragraph",
  max_tokens=300, timeout_ms=30000. Broker failure returns nil so
  Context falls back to silent eviction. Renderer status is logged
  on failure for visibility.

Context construction:
  Build ctx_opts as a fresh table (copies config.context to avoid
  mutating it), adds summarize_fn ONLY when
  config.context.summarize_on_evict == true. Defaults stay OFF —
  Phase 4 regression coverage.

Fallback machinery:
  - FALLBACK_PATTERNS table with 7 transport-error signatures
    (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused,
    "Timeout was reached", "Operation timed out")
  - fallback_reason(err) strips the "transport: " prefix and matches.
  - should_fallback(err) gates on cfg.routing.fallback.
  - call_broker(cfg, name, msgs, on_delta, opts) wraps
    broker.chat_stream:
      • tracks any_delta via wrapped on_delta callback
      • retries ONCE against cfg.routing.fallback_model (default
        "cloud") when err matches AND no deltas arrived (N3:
        mid-stream failures aren't retried — partial text would
        duplicate)
      • emits "[aish] local <name> failed (<reason>); retrying via
        <fb>" status before the retry call

ask_ai routing:
  - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg
    locals carry the choice through every tool-sub-loop iteration.
  - active_name/active_cfg are NOT mutated — user's :model selection
    survives the request.
  - When config.routing.auto is true, classify_model(text, config) is
    invoked. Non-nil model + non-active → swap req_cfg + status line.
  - broker.chat_stream call replaced with call_broker (fallback wrap).

Meta cmds:
  :route on/off           — toggle cfg.routing.auto at runtime
  :route classes          — show class → model mapping
  :route check <text>     — report classify_model result with
                            "(routing currently disabled)" suffix when
                            auto is off (N1)
  :fallback on/off        — toggle cfg.routing.fallback at runtime

HELP updated with the four new commands.

Smoke-tested: aish boots, all four metas behave correctly, classify_model
returns reasoning class for "Explain how MMAP works on Linux" (the model
slot is nil because no classes are configured by default — N2 cost-safety).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:31:14 +00:00
marfrit 03497b5eea context: summarize-on-evict callback + summary block (Phase 5 commit #2)
Phase 5 commit #2 per docs/PHASE5.md §3 / §6.

Context.new opts additions:
  - summarize_fn(prior_summary, evicted_turns) -> string|nil callback
    per R-B1 canonical signature:
      (nil, [turns])  → first-time summarize
      (str, [turns])  → additive: extend prior summary
      (str, nil)      → compress: re-summarize the prior
    nil return → silent eviction (Phase 0 behavior preserved)
  - max_summary_chars (default 2000) — when ctx.summary grows past
    this, the callback is invoked AGAIN with the compress signal
    so the summary stays bounded across long sessions

Context.summary (string|nil) is the rolling summary state. Composed
into the SYSTEM MESSAGE (not as a turns[] entry — A3 resolution
avoids system/system back-to-back). compose_summary() emits:

  [earlier conversation summary]
  <ctx.summary>

between [background] and the NORRIS suffix. Both [background] and
[earlier summary] are SUPPRESSED when ctx.norris_active (R-C4 —
mirrors R-C1 from Phase 4; planner stays focused on its goal).

enforce_budget() rewrite:
  - Collects the evicted pair before removing.
  - Calls summarize_fn(self.summary, pair) under pcall — wraps any
    callback error so a broken summarizer can't crash the REPL.
  - Updates self.summary if callback returned non-empty string.
  - If new summary exceeds max_summary_chars, invokes compress pass
    (callback with evicted=nil).
  - Removes pair from turns (same final state as Phase 0).

Context:reset() clears the summary alongside turns + pending_exec_output.

Smoke-tested with a mock summarizer over a 10-turn context with
max_turns=4 and max_summary_chars=80:
  - 6 turns evicted to bring count down to 4
  - Callback fired 4 times (3 additive + 1 compress when summary
    crossed 80 chars)
  - to_messages includes [earlier conversation summary] block
  - Under norris_active=true, summary suppressed (block absent)
  - :reset clears ctx.summary

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:18:37 +00:00
marfrit 3e57824684 router: classify_model heuristic + 31-case corpus (Phase 5 commit #1)
Phase 5 commit #1 per docs/PHASE5.md §11. Pure-Lua per-request model
routing — no IO, no LLM probe in v1.

router.classify_model(text, cfg) -> (model_name | nil, class_label):
  1. classify_class(text) walks heuristics in priority order:
       code class:
         - triple-backtick fence anywhere
         - "traceback" / "stacktrace" / "stack trace" (ci)
         - "error:" / "exception:" in first 60 chars (ci)
         - path-with-code-extension token (.py/.lua/.c/.js/.go/.rs/.cpp/.h/.ts)
         - 5+ lines with indented content (looks like a paste)
       reasoning class (requires text >= 15 chars to skip bare keywords):
         - "explain" / "why " / "how does" / "compare" (ci)
         - "?" + length > 100 chars
       default class: everything else
  2. Map class via cfg.routing.classes[class] → model name (or nil = keep current).
  3. Return (model_name_or_nil, class_label).

ALWAYS evaluates regardless of cfg.routing.auto — caller (repl.ask_ai
in commit #3) gates on the flag. This separation lets `:route check`
introspect the heuristic even when routing is off (N1).

M._classify_class exposed for testing.

Test corpus (test_router_model.lua, 31 cases):
  - 13 code-class positives (fence, traceback, paths, multi-line paste)
  - 6 reasoning-class positives (explain/why/how does/compare/?+length)
  - 8 default-class (short queries, bare keywords below 15-char threshold,
    non-code paths like .md/.txt)
  - 3 model-mapping cases (code→"deep", reasoning→"cloud", default→nil)
  - 1 R-N2 default test: classes.reasoning=nil → reasoning text yields
    nil model override (heuristic still fires, no swap)
  - All 31 pass; 15-char threshold catches "how does ASLR work?" without
    false-positive on bare "explain".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:17:22 +00:00
marfrit 2e389c1475 docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults
Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:

B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
  Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
  dispatching on the two args:
    (nil, [turns])  — first-time summarize
    (str, [turns])  — additive (extend prior summary with new evictions)
    (str, nil)      — compress (re-summarize the prior summary itself)

C1: re-summarize trigger now uses the (str, nil) compress signal
  rather than degenerate (str, {}).

C2: routing decision is taken once on entry to ask_ai. The chosen
  active_cfg is used for every tool-sub-loop iteration. Original
  active_cfg restored after ask_ai returns.

C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
  at :norris launch time; planner stays on it for every iteration.
  Q39 resolved. Per-iteration fallback still gated by
  cfg.routing.fallback — retries the failing call against cloud
  without permanently switching the planner.

C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
  the [background] block). Both are "earlier context" the planner
  generally doesn't need.

C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
  resolved) and "Operation timed out" (libcurl version variant).
  Dropped "HTTP response code said error" from A2 — FAILONERROR
  was removed in Phase 4 f26cbd9.

NITs folded:
  N1 :route check <text> always runs heuristic; suffix
     "(routing currently disabled)" when cfg.routing.auto = false
  N2 reasoning → nil by default (not → "cloud"); user explicitly
     opts in to map reasoning to a paid model. Same cost-safety
     rationale as confirm_cmd default true.
  N3 "Retry only when no deltas have arrived" promoted to §5
     normative rule (was in §11 risk row).
  N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
     align with the :fallback meta verb.

Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:39 +00:00
marfrit 555fdd7717 docs/PHASE5: analyze — surface clean, summary lives on ctx.summary not turns
A1. router.lua surface clean; classify_model is a natural sibling of
    classify. No structural refactor.

A2. broker error message shapes confirmed: all transport errors carry
    "transport: " prefix; "api: " for SSE-framed semantic errors;
    "broker: " for config bugs. Fallback matcher must strip the prefix
    before testing — list of eligible patterns tightened in §5.

A3. Q38 RESOLVED — summary doesn't go in ctx.turns (would create
    system/system back-to-back, same gotcha as PHASE0 §6 user/user).
    Instead lives on ctx.summary (string) and composes into the
    system message between [background] and NORRIS suffix. No new
    role:"system" turn; no alternation risk. §3 + §6 reflect.

Module-changes table updated to specify ctx.summary string field +
the to_messages composition order. Storage shape diagram in §6
rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:12:50 +00:00
marfrit 4453b93ab5 docs/PHASE5: formulate — multi-model routing + cloud fallback + summarize-on-evict
Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5:
heuristic-based per-request model routing, single-hop cloud fallback
on local transport failure, and fast-model summarization at sliding-
window eviction time.

Resolutions baked in via §2:
  - Routing trigger: per-request in repl.ask_ai, gated by
    cfg.routing.auto (default off)
  - Classification: pure-Lua heuristics (length, keywords, code-fence
    detection, exception markers) — no LLM probe in v1
  - Classes: code → deep, reasoning → cloud, default → keep active
  - Fallback trigger: string-match on err for HTTP 5xx /
    model_not_found / "Connection refused" / DNS / timeout
  - Fallback: one retry against cfg.routing.fallback_model (default
    "cloud" if configured); status line on every retry
  - Summarize: enforce_budget invokes summarize_fn callback wired
    by repl.lua to broker.chat with the fast model
  - Summary turn: single rolling _summary at turns[1], appended to
    on each eviction, re-summarized when it exceeds max_summary_chars

Open questions (Q37-Q42) in §10:
  Q37 routing for :ask explicit ask
  Q38 summary turn vs system-role alternation
  Q39 fallback under Norris (proposal: single-request only)
  Q40 summary re-summarize fidelity loss (lossy by design)
  Q41 HTTP 408 pattern eligibility (default yes)
  Q42 routing inside tool-call sub-loop (proposal: fix at entry)

5-commit roadmap in §11. No new module files; mostly repl.lua and
router.lua growth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:11:26 +00:00
marfrit 27784f9b68 config: Phase 4 memory example block (commit #5)
Phase 4 commit #5 (final) per docs/PHASE4.md §12. Documentation-only;
commented-out example showing:
  - inject_max_chars (cap on startup injection; default 2000)
  - summarizer_model (which configured model :memory summarize uses)

The block is OFF by default. The :memory meta surface (:remember,
:memory list/forget/clear/inject/summarize) works without the block —
items persist to <history.dir>/memory.jsonl regardless. The block only
configures the injection-into-system-prompt behavior + summarizer model
choice.

Phase 4 implementation complete:
  #1 199dd87  history.lua memory store + ffi/libc.lua flock
  #2 c1a5c73  context.lua [background] block (suppressed in Norris)
  #3 3b074af  repl.lua memory handle + :remember + :memory meta
  #4 f22d21d  :memory summarize — LLM candidate extraction
  #5 (this)   config.lua memory example block

Phase 4 verify-partial:
  - history memory round-trip tests: add/forget/load all green
  - flock single-writer enforcement verified
  - context composition order (DEFAULT → [background] → NORRIS) +
    Norris suppression all green
  - End-to-end persistence across boots: :remember on boot 1 visible
    on boot 2 as injected memory items
  - :memory forget id-not-active surfaces clean status (N1)
  - :memory clear with [y/N] confirm gate works
  - :memory summarize wire-correct against fast model (candidate
    parsing tolerates bullets; per-candidate y/N/edit prompts fire)

Verify-pending: real-model summarizer quality test (deep/cloud);
multi-process flock contention test; long-running :memory inject
race with running broker stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:53:58 +00:00
marfrit f22d21d754 repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4)
Phase 4 commit #4 per docs/PHASE4.md §6.

:memory summarize:
  1. Source-of-truth: session log file via history.load(session_path),
     NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize"
     so prior summarize exchanges don't self-amplify across multiple
     calls within the same session.
  2. Pick summarizer model from cfg.memory.summarizer_model (default
     active model).
  3. Build a transcript string ("role: content" per turn, 800 chars max
     per turn) and feed it as a single user turn alongside a system
     instruction asking for "(fact|pref|context): <content>" lines.
  4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep
     model can take a while; we don't want a 15s probe-cap here).
  5. Log the response as an assistant turn with meta="summarize" so the
     next :memory summarize call filters it out.
  6. Parse response lines tolerating markdown bullets and bold markup:
     ^%s*[-*]?%s*[*_]*(fact|pref|context)[*_]*:%s*(.+)$
  7. Per-candidate prompt: y / N / edit.
       y    → memory:add(kind, content)
       edit → readline prompt for replacement text
       any other → drop
  8. status: "summarize: added N / M candidates".

Live-tested against hossenfelder/fast:
  Pipeline correct end-to-end. Model emitted one candidate; user
  confirmation prompt fired; item persisted; :memory list showed it.
  Candidate quality from the 1.5B model is poor — typical
  small-model behavior; deep/cloud models would do better but this
  isn't an aish bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:53:36 +00:00
marfrit 3b074afaee repl: memory handle + :remember + :memory meta (Phase 4 commit #3)
Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring.

Startup:
  - Opens memory handle at <history.dir>/memory.jsonl via
    history.open_memory(). Status-logs failure (e.g. flock held by
    another aish) and continues without memory.
  - inject_memory(): loads via history.load_memory(), truncates by
    cfg.memory.inject_max_chars (default 2000), populates
    ctx.memory_items. Status line announces N items injected.
  - shutdown_session() now also closes memory (releases flock).

Meta commands:
  :remember <text>       — shortcut for :memory add fact <text>;
                            auto-refreshes ctx.memory_items so the
                            next AI turn sees the new item without
                            restart
  :memory list           — show id / ts / kind / content (truncated
                            at 80 chars per line)
  :memory add <kind> <t> — fact|pref|context required; rejects other
                            kinds
  :memory forget <id>    — N1: checks active-set first, surfaces
                            "id N not active (already forgotten or
                            never existed)" without appending if
                            the id isn't live
  :memory clear          — [y/N] confirm prompt; tombstones every
                            active item
  :memory inject         — N4: reload memory.jsonl into
                            ctx.memory_items, replacing existing.
                            Useful after manual file edits.

Help block extended with the new commands.

End-to-end verified:
  Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all
           three with timestamps
  Boot 2 → memory: 3 items injected (startup status); :memory list
           same three; ctx.turns empty (history is sessions/, memory
           is separate)
  Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not
           active" status without writing a tombstone; :memory list
           shows 2 items; :memory clear → confirm prompt → "cleared 2
           items"; :memory list → "(no memory items)"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 05:11:48 +00:00
marfrit c1a5c736ec context: [background] memory injection block (Phase 4 commit #2)
Phase 4 commit #2 per docs/PHASE4.md §5/§12.

ctx.memory_items (array of {kind, content, ...}) loaded by repl.lua
at startup from history.load_memory(). When non-empty AND ctx not in
Norris mode, to_messages() appends a [background] block to the
system prompt:

  [background] (memory.jsonl; manage via :memory)
  - (fact) User prefers terse responses
  - (context) Project: aish (LuaJIT REPL)

Suppression under Norris (R-C1): when ctx.norris_active is true the
[background] block is omitted. Norris already anchors via its
NORRIS suffix carrying the goal; a 2KB background block per planning
iteration would add ~16K tokens of redundant input over an 8-step
run.

Suffix composition order is now:
  1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP, statically embedded)
  2. [background] block — when memory_items non-empty AND NOT norris_active
  3. NORRIS MODE block — when norris_active

repl.lua wiring (memory_items population at startup, :memory meta cmds,
:remember shortcut, :memory inject for live refresh) lands in commit #3.

Verified composition order with 4 cases:
  default-only         → 697 chars, no background, no norris
  memory_items only    → 824 chars, background YES, no norris
  memory + norris      → 1451 chars, background NO, norris YES (suppressed)
  norris only          → 1451 chars, background NO, norris YES

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:52:42 +00:00
marfrit 199dd87eaa history: memory.jsonl store + flock (Phase 4 commit #1)
Phase 4 commit #1 per docs/PHASE4.md §12. Two file changes bundled
because R-B1 (flock for race-free single-writer enforcement) cannot
be deferred — adding it retroactively means reopening the memory
handle.

ffi/libc.lua extensions:
  - cdef flock(int fd, int op), open(...), lseek(int, long, int)
  - constants LOCK_EX=2, LOCK_NB=4, LOCK_UN=8
  - M.flock(fd, op) wrapper returning (true) on success or
    (false, errmsg) — errmsg is the strerror text so callers can
    surface "Resource temporarily unavailable" cleanly to the user.

history.lua additions (Phase 4 section appended at end):
  - M.open_memory(path) -> handle | nil, err
    Opens the file via libc.open(2) (need integer fd for flock —
    io.open's FILE* doesn't expose it), takes flock(LOCK_EX | LOCK_NB).
    Returns "memory.jsonl held by another aish process" on lock-held.
    Scans existing content for max id; caches as handle.next_id.
    Writes meta header on first creation (no id, ignored at load).
  - handle:add(kind, content, tags?, source?) -> id
    Assigns next id; appends one JSONL item with auto-timestamp.
    kind ∈ {fact, pref, context} enforced via assert.
  - handle:forget(target_id)
    Appends a tombstone {id, ts, kind:"forget", target}.
  - handle:close()
    Releases fd (flock auto-released on close).
  - M.load_memory(path) -> items_table
    Reads all lines, builds forget-target set from kind=="forget"
    entries, returns active items as an array sorted by ts desc.
    Items without id (meta header) silently dropped. Tombstones with
    non-matching targets are no-ops (N3 invariant).

Round-trip test passes:
  - open empty file → next_id=1
  - add 3 items → ids 1, 2, 3
  - forget id 2 (appends tombstone)
  - reopen → next_id correctly advances past the tombstone (=5)
  - load_memory → 2 active items (id 1 + id 3); tombstone resolved
  - lock-held detection: second open while first held → fails with
    "memory.jsonl held by another aish process" message
  - close releases the lock; reopen after release succeeds

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:52:03 +00:00
marfrit ffead3986c docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp
Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs.

R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes
  scanning the same file compute identical next_ids. Resolution:
  flock(LOCK_EX | LOCK_NB) on the fd in M.open_memory, held until
  close. Bundled into commit #1 (per reviewer: cannot defer because
  adding flock retroactively means reopening the handle). Requires
  ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN
  constants + M.flock wrapper.

R-C1 (CONCERN, closes Q33): [background] block suppressed when
  ctx.norris_active. Avoids ~16K of redundant tokens per 8-step
  Norris run. Norris already anchors via its goal in the NORRIS
  suffix; memory items rarely change step-to-step planning.

R-C2 (CONCERN): summarizer self-amplification — running :memory
  summarize twice in one session would feed the prior summarize
  call's assistant turn into the next input. Resolution: operate
  on the session log file (history.load(session_path)) instead
  of ctx:to_messages(), and tag prior summarize turns with
  meta="summarize" so they're filterable.

R-C3 (CONCERN, cosmetic): §5 diagram clarified that
  DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block
  statically — not a separate dynamic block in v1.

NITs N1-N4 folded inline:
  N1 forget no-op for unknown id surfaces a status
  N2 path note: memory.jsonl is sibling of sessions/, no collision
  N3 item-id invariants: id >= 1; meta header has no id; tombstones
     with non-matching targets are no-ops
  N4 :memory inject semantics explicit (replace ctx.memory_items
     from a fresh load + LRU-by-ts truncation)

§3 module-changes table grew a new ffi/libc.lua row.
§12 commit #1 description tightened — flock work bundled inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:50:43 +00:00
marfrit 2146b909f8 docs/PHASE4: analyze — surface confirmed, counter strategy locked
A1. history.lua surface lines up cleanly for the memory additions —
    no structural refactor; pure additive functions mirroring the
    session pattern.

A2. Counter persistence: scan at open, cache next_id in handle.
    O(n) load (n bounded by curation, ~hundreds), no sidecar file.
    Persisted ids let forget-tombstones target items even across
    restarts.

A3. System-prompt suffix order locked: DEFAULT (carrying Phase 2 MCP
    block baked in) → Phase 4 [background] → Phase 3 NORRIS. Token
    cost measured: default ~174 toks, +NORRIS ~364 toks, +NORRIS+2KB
    background ~865 toks. Well within typical context budgets.

No manifest amendments needed — §3/§5 already match. Findings recorded
inline as Phase 7 anchors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:47:01 +00:00
marfrit bea717534c docs/PHASE4: formulate — memory.jsonl + startup injection + :memory meta
Phase 4 formulate manifest. Three pillars per PHASE0 §11 row 4:
memory.jsonl persistent cross-session store, startup context injection
into the system prompt, and the :memory management surface +
opt-in :memory summarize for candidate extraction.

Resolutions baked in via §2:
  - Storage:  append-only JSONL at <history.dir>/memory.jsonl
  - Format:   {id, ts, kind, content, tags?, source?}
  - Kinds:    fact / pref / context (lightly typed v1)
  - Forget:   tombstone append, resolve at load (set-based)
  - Cadence:  manual :memory summarize only in v1; auto-trigger Q-listed
  - Inject:   dynamic [background] block on system prompt, capped at
              2000 chars by default; LRU-by-ts selection if over-budget
  - Order:    DEFAULT → MCP block → [background] → NORRIS suffix
              (Norris last so it dominates when active)

New module surfaces:
  history.lua  M.open_memory / memory:add / memory:forget / M.load_memory
  context.lua  ctx.memory_items + [background] composer
  repl.lua     :remember, :memory add/list/forget/clear/inject/summarize
  config.lua   commented-out memory = {...} example

Open questions (Q31-Q36) tracked in §11:
  Q31 auto-summarize trigger (manual v1; auto-on-quit candidate)
  Q32 in-place edit vs forget+re-add
  Q33 Norris-mode interaction (proposal: both blocks stay)
  Q34 split prefs into a dedicated prompt section?
  Q35 redaction of sensitive content during summarize
  Q36 duplicate detection on :memory add

5-commit roadmap in §12 (history → context → repl → summarize → config).
No new module files. No substrate amendments to PHASE0 — entirely
additive on top of Phase 1's history.lua pattern and Phase 3's
dynamic-suffix pattern in context.lua.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:25:57 +00:00
marfrit 50666d092f config: Phase 3 safety example block (commit #6)
Phase 3 commit #6 (final) per docs/PHASE3.md §12. Documentation-only;
commented-out example showing the safety schema:
  - llm_second_opinion  (bool, default true)
  - llm_model           (string, default deep→default_model fallback)
  - max_norris_steps    (int,  default 8)

The block notes the model-selection trade-off (R-B2): cloud is the
independent-class fast option (costs money), deep is the local-but-slow
option, fast is self-policing and NOT recommended.

No behavior change to existing configs — safety defaults kick in when
the block is absent.

Phase 3 implementation complete:
  #1 bd59ce7  safety static patterns (34 rules) + 87-case test corpus
  #2 2abd5da  LLM second-opinion + session cache + opts.max_tokens
  #3 d2a53d2  renderer Norris frames
  #4 11b1f56  safety.norris_step planner (single iteration)
  #5 a404b2a  repl driver + \C-n real binding + :norris/:safety meta
              + readline rl_insert_text/rl_redisplay
  #6 (this)   config.lua safety example block

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:42:46 +00:00
marfrit a404b2a152 repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5)
Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step
(commit #4) into the REPL with the user-facing surface.

ffi/readline.lua extensions (A1 + R-C4):
  - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text
    and M.redisplay wrappers exposed.
  - M.bind: removed `:free()` on previous callback. Now keeps every
    bound callback pinned for process lifetime in `_pinned` list
    (alongside `_bound[seq]` for current lookup). Avoids the
    use-after-free window between unbind and rebind that R-C4 flagged.
    Memory cost is bounded — one closure per key sequence binding.

context.lua Norris suffix (R-C3 / §8):
  - to_messages() composes a dynamic NORRIS MODE block onto the
    system prompt when ctx.norris_active is set. The block carries
    ctx.norris_goal so eviction of the user's "[norris] goal:" turn
    doesn't lose the anchor. Returns to plain system prompt when
    Norris exits.

repl.lua Norris driver:
  - prompt() now shows  marker when ctx.norris_active per PHASE0.md §9.
  - \C-n bound to a real handler — inserts ":norris " at the cursor
    (replaces Phase 1 status placeholder).
  - run_norris(goal) function: sets norris_active + norris_goal,
    appends a "[norris] <goal>" user turn, renders the banner, then
    loops calling safety.norris_step with an injected helpers table
    until a terminal status returns. Renders the closing banner.
  - norris_halt(): the [N] proceed/skip/abort prompt called by
    safety.norris_step via helpers.halt. Empty input → abort (safe).
  - dispatch_tool(): factored from the Phase 2 ask_ai code so
    safety.norris_step can call it.
  - norris_exec(): factored exec path for autonomous mode (skips
    the interactive run_shell cd-status renderer).
  - :norris <goal>  meta — launches autonomous mode
  - :norris off     meta — drops Norris flag (rare; usually 'abort')
  - :safety patterns meta — lists active is_destructive rules
  - :safety check <cmd> meta — probes a hypothetical command

End-to-end mock-driven test:
  Submitted ":norris find files in /tmp" → banner → step 1 emits
  tool_call (auto_approved per policy) → dispatched → frame rendered
  → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner.
  2 broker invocations, no stalls.

config.lua safety example block lands in commit #6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:42:14 +00:00
marfrit 11b1f566b3 safety: norris_step planner (Phase 3 commit #4)
Phase 3 commit #4 per docs/PHASE3.md §12. Single-iteration planner.
The driver loop in repl.lua (commit #5) calls this in a while loop,
advancing step_n on every "continue" return.

M.norris_step(ctx, model_cfg, helpers, opts):
  1. One broker.chat_stream round-trip — text + tool_calls collected,
     text streamed via helpers.render_assistant_delta.
  2. Parse actions from response: tool_calls (already collected),
     CMD: lines (via helpers.extract_cmd_lines), GOAL: complete
     sentinel (line-level exact match per R-C5).
  3. Record the assistant turn (with tool_calls if any) and log it.
     If no actions AND no goal_done → status="stalled".
  4. Dispatch tool_calls (structured route first):
       - is_destructive check on serialized call.
       - If destructive → halt_fn(proceed/skip/abort).
       - Else → auto_approve lookup; absent → halt for consent (R-C6:
         Norris is conservative; auto_approve is the only consent
         bypass).
       - On skip: synthesize role:tool turn "[aish] tool call
         skipped by user" — alternation preserved per C5/C7.
       - On abort: return status="aborted".
       - On proceed: dispatch via helpers.dispatch_tool, append
         role:tool turn with result content.
       - Argument JSON parse failure also synthesizes a tool turn
         (same alternation rationale).
  5. Dispatch CMD: lines (legacy route):
       - is_destructive check.
       - Destructive → halt_fn.
       - Non-destructive → run directly (Norris user accepted
         autonomy for non-destructive shell).
       - skip → ctx:append_exec_output "[aish] CMD skipped by user".
       - proceed → exec via helpers.exec_cmd, frame via
         render_exec_begin/end.
  6. Skip-budget escalation (R-C1): after dispatch, if
     ctx.norris_consecutive_skips >= 3 → escalation halt; abort exits,
     proceed resets counter.
  7. Goal-done check AFTER all dispatch (R-C2 / Q25 resolution).
  8. Budget check: step_n >= max_steps → status="budget_exhausted".
  9. Otherwise → status="continue", driver advances.

Helpers are passed in as injected functions rather than directly
requiring repl/renderer/executor — keeps safety.lua's coupling clean
and norris_step testable with a mocked helpers table.

State carried across iterations on the ctx:
  - ctx.norris_consecutive_skips (resets on any successful proceed)
  - ctx.norris_goal / ctx.norris_active (set/cleared by the driver)

Existing test_safety.lua corpus (87 cases) still passes — norris_step
addition doesn't touch is_destructive's behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:37:53 +00:00
marfrit d2a53d2fc7 renderer: Norris autonomous-mode frames (Phase 3 commit #3)
Phase 3 commit #3 per docs/PHASE3.md §12. Four new renderer functions
for Norris mode visual feedback.

M.norris_begin(goal)
  Bold cyan banner on Norris entry, with the goal text on a dim
  indented line. Frames the start of the planning loop.

M.norris_step(n, max_n, descr)
  Compact one-line step counter ("─ step 3/16 ─") with optional
  description. Renders before each iteration of the planner.

M.norris_halt(step_n, max_n, reason, action)
  Bold red banner when the destructive-op gate fires. Three
  indented lines: step counter, reason (red), action text
  (truncated at 400 chars, newlines collapsed). The interactive
  proceed/skip/abort prompt is shown after this banner by repl.lua.

M.norris_end(status, reason)
  Closing banner. status ∈ {"done", "aborted", "budget_exhausted",
  "stalled", "broker_error"}. Color cyan on "done", red otherwise.
  Optional reason text on a dim line.

The interactive prompt `[aish:<model> ]>` activation lands in
commit #5 (repl.lua's prompt() function).

Smoke-tested all five frames visually — clean ANSI output, correct
truncation on long action strings, color discrimination on
done/aborted/budget_exhausted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:36:44 +00:00
marfrit 2abd5da3a6 safety: LLM second-opinion + session cache (Phase 3 commit #2)
Phase 3 commit #2 per docs/PHASE3.md §12. Adds the LLM-probe gate on
top of commit #1's static patterns. Together they form is_destructive.

broker.lua extension:
  - opts.max_tokens (A2) — passed through to the request body. Phase 3
    probes cap at 4 tokens for YES/NO replies.
  - opts.timeout_ms — overrides model_cfg.timeout_ms per-call. Probe
    uses 15000ms cap regardless of the model's normal timeout
    (the user's deep model has 1800000ms for long generations; the
    probe must stay snappy).
  - M.chat now accepts an opts table (same shape as chat_stream's).
    Backwards compatible — existing callers passing (cfg, msgs)
    unaffected.

safety.lua additions:
  - llm_probe(cfg, system, cmd): single broker.chat call returning
    "YES"/"NO"/"YES_FAILSAFE"/"YES_UNPARSEABLE" — fail-safe defaults.
  - llm_second_opinion(cmd, cfg): two-probe protocol per R-B2.
    Probe 1: "Is this destructive?" — YES → flag.
    Probe 2 (only if probe 1 said NO): "Is this safe?" inverted
    question — NO → flag (disagreement = HALT).
    Both NO → safe.
  - Session-scoped cache _llm_cache keyed by normalized command
    (lowercased + whitespace-collapsed). Mitigates Q23 latency for
    repeated commands within a Norris run.
  - Model-selection precedence: cfg.safety.llm_model (explicit)
    → cfg.models.deep (independent local class) → cfg.models[default].
    Fail-safe YES if none configured.
  - is_destructive(cmd, cfg): runs static patterns first (always),
    then LLM if cfg present + not explicitly opted-out. cfg=nil
    yields static-only mode (handy for tests).

End-to-end verified against hossenfelder using qwen-coder-7b-32k as
the deep probe (qwen3-30b-a3b-instruct in repo's config.lua isn't
currently loaded on the local backend):
  cat /etc/hostname              → hit=false (LLM: NO, NO inverted = safe)
  rm /tmp/x.log                  → hit=true  (LLM flagged; static missed
                                              because no -r/-f flags)
  cp /etc/passwd /tmp/passwd.bak → hit=false (safe copy)
  cache: second probe on same cmd → 0s wall time
  static-only (cfg=nil): rm -rf /tmp/x → static hit, no LLM call
  opt-out (llm_second_opinion=false): cp x y → hit=false, no probe

Test corpus (test_safety.lua, 87 cases) still all pass — cfg=nil
preserves the static-only behavior.

Note: production config.lua currently has `deep = qwen3-30b-a3b-instruct`
which isn't loaded on the proxy backend right now; Norris users will
hit the fail-safe (everything flagged destructive) until either the
deep model is brought up OR cfg.safety.llm_model = "cloud" is set
to route the probe through anthropic/claude-haiku-4.5. Update the
config or model deployment for production use — covered by Phase 3
verify test case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:36:06 +00:00
marfrit bd59ce7243 safety: is_destructive static pattern matcher (Phase 3 commit #1)
Phase 3 commit #1 per docs/PHASE3.md §12. Static-pattern destructive-op
heuristic; no LLM second-opinion yet (lands in commit #2).

Implementation:
  - 34 patterns in DESTRUCTIVE_PATTERNS table, grouped:
      9 shell-wrapper patterns (R-B1 — bash -c / sh -c / zsh -c / eval /
        python -c / perl -e / pipe-to-sh both forms / pipe-to-bash both
        forms / xargs ... rm). HALT on the wrapper itself; user reads
        the inner before proceeding.
     10 filesystem destructive (rm -rf, find -delete, dd to device, mkfs,
        shred, wipefs, truncate -s 0, ...).
      5 version-control destructive (git push --force/-f, git reset
        --hard, git clean -fd, git branch -D).
      5 database/process (DROP TABLE/DATABASE, TRUNCATE TABLE,
        kill/pkill -9).
      2 permission (chmod 777, chown on root path).
  - ci=true flag for case-insensitive SQL patterns; rule patterns must
    be lowercase when ci is set (matcher lowercases input).
  - pkill -9 ordered BEFORE kill -9; kill rule uses %f[%w] frontier so
    "pkill -9 nginx" reports "pkill -9" not "kill -9" substring match.
  - M._patterns exposes the rule table for :safety patterns meta (Phase
    3 commit #5) and for the test corpus.
  - M.norris_step stub stays — lands in commit #4.

Test corpus (test_safety.lua, 87 cases):
  - 49 destructive cases across all categories (incl. all 11 wrapper
    forms, the canonical curl|sh end-of-string bypass, sudo-prefixed
    rm -rf, etc.).
  - 38 safe cases (read-only commands, non-destructive variants
    of risky verbs like "git push" without --force, "find" without
    -delete, "chmod 644", "kill 1234" without -9, etc.).
  - Documented one accepted false positive: echo "rm -rf /" matches
    the rm pattern by substring — Norris user can proceed after
    reading; tradeoff between false positives and false negatives,
    biased toward false positives per §5.
  - Run from repo root: `luajit test_safety.lua`. Exit 0 on pass.
  - Verified all 87 pass at commit time.

R-C4 / readline rebind, broker opts.max_tokens, LLM second-opinion,
norris_step planner, repl driver, and the wider Norris UX land in
subsequent commits per §12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:47:10 +00:00
marfrit 125f800513 docs/PHASE3: re-review NIT fold-in — pipe-to-sh EOL, ci= note, §12 sync
Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded:

N1 BLOCKER: `|%s*sh%f[%s]` missed `curl x | sh` (end-of-string canonical
   wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace,
   which doesn't happen at EOL). Replaced with two patterns each for
   sh and bash: `|%s*sh%s` (followed by whitespace/args) and
   `|%s*sh%s*$` (end-of-string). Same for bash. Verified against 18
   wrapper-bypass test cases — all canonical idioms now HALT.

N2 CONCERN: `ci=true` rule flag had no implementation note. Added one
   sentence to §5 explaining the matcher lowercases the input string
   when ci is set.

N3 CONCERN: §12 commit #5 description was stale — still said
   "extends interactive CMD: extraction to consult is_destructive"
   which contradicts the R-B3 resolution (Norris-only). Rewrote
   commit #5 description to match R-B3, and bundled the
   ffi/readline.lua `_bound[seq]:free()` removal into commit #5's
   scope with explicit "Phase 1 amendment" callout. Same for the
   §12 risk note that still referenced the dropped behavior change.

Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model
swap interaction, N7 commit-attribution wording) are cosmetic and
will fold in-flight during implement if material.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:45:25 +00:00
marfrit 91ddcb005d docs/PHASE3: review fold-in — security-layer BLOCKERs resolved
Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against
the analyze-tier draft. Resolutions applied:

BLOCKERs:
  B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c,
     eval, pipe-to-shell, python -c, xargs|rm. Added 9 wrapper
     patterns to §5. Norris HALTs on any wrapper invocation; user
     reads the inner before proceed. The patterns are the
     conservative floor against the wrapper bypass class.
  B2 LLM second-opinion was self-policing — same model class
     generating actions then judging them. Switched probe model
     from `fast` to `deep` (qwen3-30b). Added re-roll inversion:
     if first probe says NO, ask "is this SAFE?". Disagreement
     between two probes → HALT. Cheap independent-class insurance.
  B3 `is_destructive` would have run on interactive CMD: extraction
     — a PHASE0 §6/§10 substrate amendment in disguise. Resolved
     Q24: heuristic runs ONLY when norris_active == true. No
     substrate change; interactive `confirm_cmd` semantics unchanged.

CONCERNs:
  C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips
     escalate to abort/force-proceed prompt.
  C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to
     dispatch ALL pending actions before checking GOAL: complete.
  C3 Norris-goal eviction: goal embedded directly in the dynamic
     system-prompt suffix; survives sliding-window eviction.
  C4 Readline use-after-free window: M.bind no longer frees old
     callbacks; pin for process lifetime (bounded memory cost).
  C5 GOAL: complete matcher: line-level scan, exact match after
     trim — substrate-aligned with CMD: rigor.
  C6 §4 step 4 tightened: auto_approve does NOT bypass destructive
     heuristic; tool_call without auto_approve still HALTs even
     when destructive-clear (Norris conservative).

NITs deferred or rolled into pattern table:
  - chown root-path pattern tightened (NIT 2 in-line)
  - Test corpus expansion noted in §12 commit #1 risk
  - Other NITs are wording-level

Status: Plan (review folded). Ready for commit #1 (safety static
patterns) once another review pass clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:42:58 +00:00
marfrit cf4d79dd9d docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state
Analyze findings folded into the manifest:

  A1. \C-n binding can't toggle mid-prompt without rl_insert_text /
      rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
      ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user
      types goal + Enter. Routes through existing meta dispatch.

  A2. broker has no max_tokens passthrough. Add opts.max_tokens for
      the LLM second-opinion path (terminates at ~2 tokens; verified
      proxy honors it).

  A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step
      is the per-iteration extraction; driver loop in repl.lua.

Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.

Baseline doc (PHASE3-baseline.md, 80 lines) captures:
  - LLM second-opinion latency: 425-1162ms per probe, all 5 test
    cases correct. Worst-case 16-step Norris = ~20s overhead; with
    static-pattern fast-path + session cache, ~5s realistic.
  - Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state
    per file before Phase 3 edits.
  - Six static-pattern Lua-match sanity checks (all correct).
  - Carries: aish#15 (still open), aish#14, aish#32/#33.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:37:58 +00:00
marfrit b58a842e49 docs/PHASE3: formulate — Norris autonomous mode + destructive-op gate
Phase 3 formulate manifest. Three pillars per PHASE0.md §11 row 3:
Chuck Norris autonomous mode (planning loop), destructive-op heuristic
(static patterns + LLM second-opinion), and HALT/confirm protocol.

Resolutions baked in via §2:
  Q2  iterative re-plan after each action (not top-down tree)
  Action sources    CMD: lines AND MCP tool_calls — Phase 2 contract honored
  HALT trigger      static-pattern hit OR LLM-second-opinion flag
  HALT shape        3-way: proceed / skip / abort
  Auto-approve under Norris  honors Phase 2 auto_approve policy
                             EXCEPT destructive-op heuristic always wins
  LLM second-opinion model   the `fast` preset (cheapest)
  Norris prompt suffix       appended to system prompt while active;
                             "GOAL: complete" sentinel for done

Key extensions:
  - safety.is_destructive: ~20 static shell-idiom patterns + LLM probe;
    runs on interactive CMD: extraction too (§9 — replaces bare
    confirm_cmd for known-destructive cases). Q24 worth challenging
    at analyze.
  - safety.norris_step: single-iteration of the planner. Driver loop
    in repl.lua. \C-n toggle (real binding, replaces Phase 1
    placeholder); :norris <goal> explicit launch.
  - renderer.norris_begin/step/halt/end: visual parity with exec
    and tool_call frames. Prompt becomes [aish:fast ]> per
    PHASE0.md §9.
  - context.to_messages dynamically appends NORRIS MODE suffix
    when norris_active.

New open questions (Q23–Q30) tracked in §11:
  Q23 LLM second-opinion latency budget (caching mitigation)
  Q24 interactive CMD: also subject to is_destructive? (proposal: yes)
  Q25 GOAL: complete + pending actions in same response — dispatch first
  Q26 context preservation on abort/done/budget — all preserve
  Q27 :norris continue (resume after abort) — deferred to v2
  Q28 side-effect MCP tools not in *__shell/*__write_file patterns
  Q29 goal-implies-authorization for destructive ops — no, always confirm
  Q30 :norris no-arg vs \C-n share goal-prompt path — yes, trivial

Module-layout (PHASE0 §4) untouched — all changes are growth of
existing files. 6 commits expected at implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:45:03 +00:00
marfrit f26cbd9a3a phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics
Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just first return slot stay correct.

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
marfrit 3fa6279f5b repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors
Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no
dot) printed "unknown alias: nil" instead of a useful diagnostic. The
parse failure was being conflated with the alias-not-found case.

Now:
  :mcp tool list_dir          -> tool name missing alias prefix: list_dir
  :mcp tool unknown_alias.x   -> unknown alias: unknown_alias
  :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:55:01 +00:00
marfrit 09800d192a config: Phase 2 mcp example block + deep model switch
Phase 2 commit #7 (final) per docs/PHASE2.md §12. Two changes bundled:

(1) commented-out mcp = {...} example block (~40 lines) at the end of
    config.lua showing the Phase 2 schema:
      - mcp.servers — alias → {url, auth_token | auth_env}
      - mcp.auto_approve — "<alias>.<tool>" or "<alias>.*" globs
      - mcp.max_tool_depth — sub-loop budget per ask_ai turn
    The block is OFF by default; uncomment + adjust per fleet to
    activate. Documentation-only; no behavior change to existing
    configs (mcp_sessions stays empty, tools_schema() returns [],
    broker omits the field — full Phase 1 compatibility).

(2) User-authored: deep model preset switched from
    mistral-nemo-12b-instruct to qwen3-30b-a3b-instruct, with a 10-min
    timeout_ms accommodating the larger model's RK3588 inference time.
    Reason: nemo backend is dormant per the proxy /v1/models discovery
    (aish#23 now returns 404 cleanly for unknown models instead of
    silent fallback); qwen3-30b is the practical "deep" alternative.

Phase 2 implementation is now complete — 7 of 7 commits landed:
  #1 6c194de  mcp.lua + ffi/curl status_code + PHASE0 §4 amendment
  #2 0fde77f  safety.lua confirm_tool_call
  #3 7c221a8  context.lua tool turns + use_tool_role fallback
  #4 c736d0e  renderer.lua tool-call frames
  #5 efdc728  broker.lua opts.tools + tool_call accumulator
  #6 7e9cfff  repl.lua sub-loop + :mcp meta + system-prompt block
  #7 (this)   config.lua example + deep model switch

Next phase-loop step: verify (Phase 7). Files written are wired and
isolated-tested; end-to-end model-driven verification waits on either
a more compliant model or explicit forcing of tool_calls from the
prompt — known to be marginal with the loaded qwen-1.5b but proven
correct against direct probes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:40:21 +00:00
marfrit 7e9cfff04d repl: tool-call sub-loop + :mcp meta + system-prompt augmentation
Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP
tool-call flow on top of broker/safety/context/renderer/mcp.

repl.lua additions:
  - mcp_sessions table populated from config.mcp.servers at startup.
    connect_mcp() helper does initialize + caches tools/list. Failures
    status-logged once; absent from mcp_sessions until manual reconnect
    (C4 — no auto-retry).
  - tools_schema() flattens connected sessions' tools into the OpenAI
    {type:"function", function:{name,description,parameters}} shape with
    "<alias>.<name>" namespacing.
  - flatten_content() concatenates content[type="text"] blocks; one-shot
    status warning when non-text blocks (image/resource) are dropped
    (§4 normative spec, v1 only handles text).
  - dispatch_tool_call(name, args_table) splits alias.tool, looks up
    session, calls. Returns (content_string, is_error). Errors of every
    flavor (missing alias, no session, rpc_error, transport_error)
    yield a synthesized "[aish] ..." string so callers always have a
    body for the role:"tool" turn — alternation preserved per C5/C7.
  - ask_ai rewritten as a sub-loop that re-issues the broker request
    until the model returns pure text or max_tool_depth (default 8) is
    hit. Each iteration: stream response → if tool_calls present,
    confirm-gate each → dispatch → append role:"tool" turn → continue.
    Argument-JSON parse failure produces a synthesized tool turn (C7).
    Decline at confirm produces "[aish] tool call declined by user"
    tool turn (alternation guarantee).
  - :mcp meta with sub-commands: list / tools / tool <a.n> / connect
    <url> [alias] / disconnect <alias>. HELP block extended.

context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8
(hybrid prompt: static frame about MCP + dynamic tools list in the
request body). Block is always present even when no MCP servers
configured — ~60 tokens for clarity that 'CMD:' remains the fallback.

CMD: extraction unchanged — runs on the FINAL pure-text response only
(not on intermediate iterations of the tool sub-loop). Substrate §3
invariant preserved.

End-to-end verified two ways:
  (1) Direct broker probe: aish's tools_schema fed through
      broker.chat_stream against hossenfelder → qwen-1.5b emits one
      tool_call payload with correct id + name="boltzmann.list_dir"
      + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string
      across fragmented deltas.
  (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock
      emits text + tool_call, sub-loop dispatches against LIVE
      boltzmann lmcp (auto_approve via policy), 80+ files rendered
      inside the tool_call frame, broker re-invoked with the
      extended context, mock returns pure text, sub-loop terminates.
      Total broker invocations: 2.

Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..."
suggestions even when an MCP tool is the better path; the small
model's system-prompt compliance is weak. Larger models and the
analyze-time direct probe confirm the tools_schema and tool_calls
flow is wire-correct — Phase 7 verify will exercise this against
qwen3-30b or cloud models when available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:20:42 +00:00
marfrit efdc7281c7 broker: opts.tools passthrough + streaming tool_call accumulator
Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows
tool-call support without taking a dependency on mcp.lua (caller
supplies the tools array — B5 from review).

chat_stream signature widens to (cfg, msgs, on_delta, opts):
  opts.tools  - optional array, passed to the request body as the
                OpenAI-shape tools field. OMITTED entirely when nil or
                empty (#tools == 0) — some servers reject "tools": [].

on_delta callback shape widens to (kind, payload):
  kind = "text",      payload = string         (Phase 1 path; unchanged
                                                semantics, signature
                                                changes from (delta) to
                                                ("text", delta))
  kind = "tool_call", payload = {id, name, arguments}
                                                emitted ONCE per call on
                                                finish_reason "tool_calls"
                                                after the streaming
                                                accumulator pulls
                                                fragmented JSON-string
                                                arguments together.

Accumulator behavior:
  - Keyed by delta.tool_calls[i].index.
  - If index is absent on a delta (some llama.cpp builds omit it on
    single-call streams; C2 in review), default to 0 with a one-shot
    stderr debug status per stream.
  - id and name captured from the opening delta of each slot.
  - function.arguments concatenated across all deltas as the raw
    JSON-string; caller (repl.lua / future Phase 2 commit #6) does
    dkjson.decode.
  - On finish_reason "tool_calls" the accumulator emits all collected
    calls in index order and resets.

M.chat external contract unchanged (C1): wrapper now uses the new
(kind, payload) shape internally but exposes the same text-string
return. No caller of M.chat passes opts.tools so tool_call kinds are
silently dropped.

repl.lua minimal companion edit: ask_ai's chat_stream callback updated
to the new shape. Text path unchanged; tool_call kinds are no-op
placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming
functional between #5 and #6.

Smoke-tested against hossenfelder/8082 (post-#23 fix):
  - text-only: ok=true, kind="text" deltas received
  - with opts.tools: model emitted one tool_call,
    accumulator collected id + name=get_weather + args={"city":"Paris"}
    correctly across fragmented deltas
  - opts.tools={}: server accepted (field omitted as required)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:20:32 +00:00
marfrit c736d0e129 renderer: tool-call begin/end frames
Phase 2 commit #4 per docs/PHASE2.md §12. Adds M.tool_call_begin(name, args)
and M.tool_call_end(content, is_error) for visual parity with the existing
exec_begin/exec_end frame.

Visual cadence:
  ─── tool: <name (cyan)> ───
  <args, dim, truncated at 200 chars; omitted if empty/"{}">
  <content>
  ─── ok ───            (dim, success)
  ─── error ───         (red status word inside dim rule, on is_error=true)

Same rule glyph (━) and ANSI palette as the exec frame so the user reads
tool dispatch and shell dispatch the same way.

Smoke-tested all five shapes: success with args / empty args / error /
long args truncated / empty content.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 14:11:42 +00:00
marfrit 7c221a8aae context: tool turns + tool_calls on assistant; use_tool_role fallback
Phase 2 commit #3 per docs/PHASE2.md §12. Three concrete edits per §3
context.lua row (the BLOCKER-fold-in from review):

  (a) Loosen Context:append shape-per-role: assistant may carry empty
      content if tool_calls is non-empty; role:"tool" requires
      tool_call_id + content.

  (b) Preserve tool_calls / tool_call_id on store (Phase 1 :append
      built {role, content} only and silently dropped extras).

  (c) Extend to_messages() with two emission modes selected by
      use_tool_role:
        true  (default) — OpenAI-standard role:"tool" + assistant
          turns with tool_calls (wrapped as {id, type:"function",
          function:{name, arguments}}).
        false (fallback) — collapse assistant-with-tool_calls + its
          following role:"tool" turns into a single assistant text
          turn with synthesized "[tool: name]\n<args>\n[result]\n
          <content>" body; merge consecutive assistant turns so the
          trailing post-tool-result text doesn't yield asst/asst
          back-to-back (same strict-template gotcha PHASE0.md §6
          warned about for user/user).

Alternation assert added (N4): role:"tool" turns must trace back
through zero-or-more prior tool turns to an assistant-with-tool_calls.
Catches sub-loop bugs at append time. Orphan tool turns rejected.

pending_exec_output behavior unchanged per §3 row: buffer persists
across tool-call sub-loops, flushes on next genuine user turn (B4).

Smoke-tested §12 verify-row #3:
  (i)   default mode round-trip — 5 OpenAI-shape messages, tool_calls
        + tool_call_id preserved.
  (ii)  fallback mode round-trip — collapsed into 3 messages
        (system/user/assistant), tool_calls + role:"tool" not emitted.
  (iii) multi-call: 2 tool_calls in one assistant turn followed by 2
        tool replies, both modes render correctly.
  (iv)  orphan tool turn after user — assertion fires.
  (v)   B4: pending_exec_output survives a tool sub-loop, flushes on
        next :append_user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:10:47 +00:00
marfrit 0fde77fe35 safety: confirm_tool_call gate with auto-approve policy
Phase 2 commit #2 per docs/PHASE2.md §12. Implements just the per-call
confirm-gate surface; Phase 3 stubs (is_destructive, norris_step) stay
unimplemented with their error() bodies.

M.confirm_tool_call(name, args, cfg) checks cfg.mcp.auto_approve for:
  - exact match on "<alias>.<tool>"
  - "<alias>.*" glob covering a whole server

Miss falls back to a [y/N] readline prompt. Empty or non-"y" answer
rejects (matches the existing confirm_cmd UX from PHASE0 §10).

Pretty-printing renders args as compact JSON, truncated at 80 chars
with "..." suffix so one-line prompts stay readable.

Smoke-test passes all eight cases per §12 verify-row #2:
  exact match / alias glob → auto-approve, no prompt
  miss + y / n / empty / nil-cfg → prompt shown, expected verdict
  empty args / long args → clean rendering, truncation works

Note: PHASE0 §4 module-layout had a "lands in Phase 2" hint on the
norris_step stub; the actual landing is Phase 3 per PHASE0 §11 row 3.
Comment in safety.lua updated to clarify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:07:57 +00:00
marfrit 6c194deea0 mcp: JSON-RPC client + ffi/curl status_code; PHASE0 §4 amended
First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled:

mcp.lua (new, 153 lines):
  - M.connect(url, opts) returns a Session.
  - Session:initialize() round-trips initialize + notifications/initialized
    + tools/list. Caches tools for session lifetime (lmcp announces
    capabilities.tools.listChanged = false; no refetch).
  - Session:list_tools() returns the cached tool list.
  - Session:call_tool(name, args) returns (result_table, kind) where
    kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per
    the §4 error split. Folded HTTP-level failure into transport_error.
  - Per-server Bearer auth via opts.auth_token or opts.auth_env env-var
    indirection.
  - Captures protocolVersion mismatch as a warning string rather than
    aborting (lmcp doesn't negotiate — N3 in review).

ffi/curl.lua extension:
  - Add curl_easy_getinfo to ffi.cdef.
  - Pre-cast as getinfo_long; helper get_response_code() fetches
    CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFOTYPE_LONG | 2).
  - M.post now returns (body, status_code) on transport success;
    (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers
    reading only the first slot are unaffected.

docs/PHASE0.md §4:
  - Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9.
  - Module-stability invariant clarified: rename prohibition is what
    matters; adding new files is additive.

Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4:
  - initialize: ok (7 tools cached)
  - list_dir /tmp: ok (1.2KB content)
  - read_file /nonexistent: ok (boltzmann's baseline §3 quirk —
    isError:false even on failure; content is authoritative)
  - nope_tool: rpc_error (code=-32601)
  - wrong auth: transport_error (HTTP 401)
  - unreachable host: transport_error (DNS failure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:06:39 +00:00
marfrit f5daa6afc0 docs/PHASE2: re-review NITs — M.post shape, getinfo cdef, content flattening normative
Three follow-up NITs from the post-fold-in review:

  (1) Disambiguate M.post return shape: (body, status_code) on transport
      success regardless of status; (nil, errmsg) on libcurl failure
      stays unchanged. Phase 1 callers reading only the first slot are
      unaffected.

  (2) Note that the M.post extension requires extending ffi.cdef to
      include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal
      2097154, CURLINFOTYPE_LONG | 2) and a long[1] out-param shim.
      Implementation detail the commit #1 author will need.

  (3) Move the tool-result content-flattening rule from §12 risk note
      into §4 normative spec (forward-referenced both ways) — §4 is
      where a future reader looking for the tool-invocation contract
      will scan.

No design changes; clarifications only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:02:35 +00:00
marfrit d3570ccea4 docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs
Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:

BLOCKERs:
  B1 context.lua impact widened — Phase 1 :append asserts content and
     discards extra fields. Need (a) shape-per-role assert, (b) preserve
     tool_calls/tool_call_id on store, (c) emit from to_messages().
  B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
     401 returns a non-JSON-RPC body that would have been mis-decoded.
  B3 §3 typo schema -> inputSchema.
  B4 pending_exec_output × tool-call sub-loop interaction specified.
  B5 §3/§12 broker dependency contradiction — broker takes opts.tools
     from caller; no layering inversion.

CONCERNs:
  C1 M.chat return polymorphism dropped (no consumer).
  C2 tool_calls[].index absent fallback: default to 0.
  C3 Re-injection stores accumulated text, not hard-coded empty.
  C4 :mcp connect failure: no auto-retry, status-log once.
  C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
     role:"tool" turn — keeps strict-template alternation legal
     exactly the way PHASE0 §6 demanded for exec output.
  C6 §9 confirms §4 amendment is additive (preserves §3 invariant).

NITs:
  N3 protocolVersion fallback (lmcp doesn't negotiate).
  N4 Alternation assert in Context:append.
  N7 Model-routing bug filed as aish#23.
  N8 Day-one fallback test for use_tool_role=false in commit #3.

Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:00:07 +00:00
marfrit 447e430254 docs/PHASE2 §12: implementation plan — 7-commit roadmap
Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua
→ repl.lua → config.lua. Same cadence as Phase 0/1.

Risks called out explicitly:
- Empty tools array → omit field entirely (some servers reject [])
- isError:false on actual failure (baseline §3 finding) → pass content
  through regardless; let model read error text
- JSON-RPC error from tools/call → aish status only, no tool turn
  appended, no model recovery
- max_tool_depth=8 cap on tool-call sub-loop
- Argument JSON streaming may yield malformed JSON → status warn + skip
- Q18 fallback (use_tool_role=true default; prefix-injection plumbed
  but dead-coded; verify can flip)
- Connect-at-startup is sequential (~30ms × N); fine for N≤3

Two items left open for review: Q18 default flip vs ship-true-flip-on-fail,
and whether :mcp connect should re-fetch tools after the initial cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:37:27 +00:00
marfrit c5116bf129 docs/PHASE2-baseline: pre-implementation measurements
Phase 7 (verify) anchor. Captures:

- MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms
  on LAN; LLM is the latency floor, not the transport).
- 6 fixture responses saved to /tmp/aish-baseline/ covering initialize,
  notifications/initialized, tools/list, tools/call success, isError,
  and JSON-RPC unknown-tool error.
- Baseline design finding: boltzmann's read_file returns isError:false
  even on failure (error text in content). aish should treat content as
  authoritative, isError as advisory; feed both to the model. PHASE2.md
  §4's "pass-through" stance already accommodates; no manifest amendment
  needed.
- Streaming tool_calls delta shape verified against hossenfelder; matches
  PHASE2.md §5.
- Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence
  ignoring the CMD: contract — once MCP tools exist the model gets a
  structured path that doesn't depend on prose-formatting compliance.
- Module pre-state at Phase 1 head 5878f73: LOC + capability snapshot
  per module so Phase 2 diff has a reference frame.
- Two boltzmann-proxy blockers (SSE buffering, model-field routing)
  carried explicitly into Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:34:32 +00:00