marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	81c3b1b44a	main: non-interactive `-p`/`--prompt` one-shot mode (closes #4) Adds `aish -p "<text>"` for Unix-pipeline composability: `tail app.log | aish -p "any anomalies?"` and `aish -p "summarize: $(curl -sS https://...)"`. The flag bypasses repl.lua entirely. On invocation: 1. Stdin: when not a TTY, read to EOF and prepend to the prompt as a fenced block. ffi.libc.isatty(0) gates the read so interactive `aish -p "..."` (no pipe) doesn't hang. 2. Resolve config.models[config.default_model]. 3. Stream broker.chat_stream replies to stdout; finalize with newline. 4. Exit 0 on success, 1 on broker error, 2 on arg / config error. Behavior NOT in -p mode (kept simple per the issue's "no repl.lua involvement"): - No MCP, no tool loop, no Norris, no routing, no memory injection. - "CMD:" lines in the reply are printed verbatim, NOT executed; callers can grep / pipe them as they wish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:06:27 +00:00
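The stdin handling and exit codes described in this commit can be sketched as follows. This is an illustrative Python sketch, not the aish Lua code: `read_piped_stdin`, `build_prompt`, `one_shot`, the `chat` callback shape, and the exact fence layout are all assumptions made for the example.

```python
import sys
from typing import Callable, Optional, TextIO

EXIT_OK, EXIT_BROKER, EXIT_USAGE = 0, 1, 2  # exit codes named in the commit message

FENCE = "`" * 3  # assumed fence style for the piped block


def read_piped_stdin(stream: TextIO = sys.stdin) -> Optional[str]:
    """Read stdin to EOF only when it is not a terminal, so a bare -p never blocks."""
    if stream.isatty():
        return None
    return stream.read()


def build_prompt(prompt: str, piped: Optional[str]) -> str:
    """Prepend piped input to the prompt as a fenced block (layout is an assumption)."""
    if not piped:
        return prompt
    return f"{FENCE}\n{piped.rstrip()}\n{FENCE}\n\n{prompt}"


def one_shot(prompt: str, stream: TextIO,
             chat: Callable[[str, Callable[[str], object]], Optional[str]],
             out: TextIO = sys.stdout) -> int:
    """Run one request; `chat` is a hypothetical streamer returning None or an error string."""
    if not prompt:
        return EXIT_USAGE
    text = build_prompt(prompt, read_piped_stdin(stream))
    err = chat(text, out.write)  # stream deltas straight to the output
    out.write("\n")              # finalize with a newline
    return EXIT_BROKER if err else EXIT_OK
```

Gating the read on `isatty` is what keeps an interactive invocation without a pipe from waiting on EOF.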
marfrit	0700dce881	repl: enforce budget per Norris step, not just post-loop (closes #51) PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-session if the loop runs long". The implementation only called enforce_budget() once, after the planning loop exited, so for a tight max_turns with a multi-step Norris session the model saw the FULL conversation throughout, defeating context budgeting and preventing R-C3 (NORRIS suffix goal anchor surviving eviction) from being exercised end-to-end. Move status_evictions(ctx:enforce_budget()) inside the while loop so it runs after every safety.norris_step return. Drop the now-redundant post-loop call. Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the "oldest 4 turns evicted" status arrived AFTER NORRIS DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:34 +00:00
marfrit	0c93e31186	repl: warn on stale MCP auto_approve keys (closes #33) Auto-approve policy keys that point at unconnected aliases, mistyped tool names, or malformed forms were silently ignored, leaving the user with surprise confirm prompts and no diagnostic. validate_auto_approve() now walks config.mcp.auto_approve at startup (after the MCP connect loop) and after each :mcp connect. For each key: - "alias__*": warn if alias has no live session - "alias__tool": warn if alias unknown OR tool not in registry - anything else: warn as malformed (not in alias__tool form) Non-fatal. The re-run on :mcp connect lets a key that referenced a not-yet-connected alias become live without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:08 +00:00
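The three key shapes listed in this commit could be checked along these lines. This is a Python sketch for illustration only; the argument shapes (`sessions` as a set of live aliases, `registry` as a mapping from alias to tool names) and the warning wording are assumptions, not the repl.lua data structures.

```python
from typing import Dict, Iterable, List, Set


def validate_auto_approve(keys: Iterable[str],
                          sessions: Set[str],
                          registry: Dict[str, Set[str]]) -> List[str]:
    """Return one warning per stale or malformed auto_approve key (non-fatal)."""
    warnings = []
    for key in keys:
        alias, sep, tool = key.partition("__")
        if not sep or not alias or not tool:
            # Not in alias__tool form at all.
            warnings.append(f"{key}: malformed (expected alias__tool)")
        elif alias not in sessions:
            # Covers both "alias__*" and "alias__tool" with an unknown alias.
            warnings.append(f"{key}: alias '{alias}' has no live session")
        elif tool != "*" and tool not in registry.get(alias, set()):
            warnings.append(f"{key}: tool '{tool}' not in registry")
    return warnings
```

Because the function only returns warnings, it can be re-run after each connect without side effects, which matches the re-run behavior the commit describes.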
marfrit	299dcce78f	repl: validate MCP tool names against Bedrock regex (closes #32) Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names. We already moved the alias separator from "." to "__" (commit `f26cbd9`), but a future MCP server could still register a tool whose name (or whose combination with the alias) contains characters outside that class, silently breaking calls to strict providers. connect_mcp now warns at startup for: - aliases containing "__" (would misparse on tool dispatch) - emitted alias__name strings that violate the regex or exceed 128 chars Behavior preserved: validation is informative-only. tools_schema() still emits the offending tool; local llama.cpp users accept lenient names and shouldn't be penalized for downstream strictness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:29 +00:00
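A minimal sketch of the two startup checks named above, written in Python for illustration. The regex is the one quoted in the commit; the function name `tool_name_warnings` and the message texts are hypothetical.

```python
import re
from typing import List

# Tool-name constraint quoted in the commit message.
TOOL_NAME_RE = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")
SEPARATOR = "__"


def tool_name_warnings(alias: str, tool: str) -> List[str]:
    """Return informative warnings; the caller still emits the tool either way."""
    warnings = []
    if SEPARATOR in alias:
        warnings.append(f"alias '{alias}' contains '__' and would misparse on dispatch")
    emitted = f"{alias}{SEPARATOR}{tool}"
    if len(emitted) > 128:
        warnings.append(f"'{emitted[:32]}...' exceeds 128 chars")
    elif not TOOL_NAME_RE.fullmatch(emitted):
        warnings.append(f"'{emitted}' has characters outside [a-zA-Z0-9_-]")
    return warnings
```

`fullmatch` is used so that a trailing newline cannot slip past the `$` anchor, which is a Python-specific detail and not something the commit states.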
marfrit	8e0e735e15	repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT) Surfaced by autonomous run of TC #48: pointing models.fast at http://localhost:9999 (port closed, host resolves) emits "transport: Could not connect to server" — CURLE_COULDNT_CONNECT (7) which the Phase 5 fallback pattern set didn't include. Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua. Now fallback fires for the full set of common libcurl/HTTP transport failure shapes: HTTP 5xx server-side HTTP 404 model_not_found HTTP 408 gateway request timeout Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST Could not connect to server CURLE_COULDNT_CONNECT (← added) Connection refused Timeout was reached CURLE_OPERATION_TIMEDOUT (variant A) Operation timed out CURLE_OPERATION_TIMEDOUT (variant B) Re-tested #48 end-to-end: fast pointed at dead port → fast fails → status fires → cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:49:13 +00:00
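The prefix-strip-then-match behavior of the fallback patterns can be sketched as below. This is illustrative Python, not the FALLBACK_PATTERNS table from repl.lua: the libcurl message strings come from the commit, but the exact shape of the HTTP error strings (for example "HTTP 503 ...") is an assumption.

```python
import re
from typing import Optional

PREFIX = "transport: "

# (pattern, label) pairs; the HTTP string shapes are assumed for the example.
FALLBACK_PATTERNS = [
    (re.compile(r"HTTP 5\d\d"), "HTTP 5xx"),
    (re.compile(r"HTTP 404.*model_not_found"), "HTTP 404 model_not_found"),
    (re.compile(r"HTTP 408"), "HTTP 408"),
    (re.compile(r"Couldn't resolve host"), "Couldn't resolve host"),
    (re.compile(r"Could not connect to server"), "Could not connect to server"),
    (re.compile(r"Connection refused"), "Connection refused"),
    (re.compile(r"Timeout was reached"), "Timeout was reached"),
    (re.compile(r"Operation timed out"), "Operation timed out"),
]


def fallback_reason(err: str) -> Optional[str]:
    """Return a short reason label when err looks like a retryable transport failure."""
    if err.startswith(PREFIX):
        err = err[len(PREFIX):]  # strip the broker's prefix before matching
    for pattern, label in FALLBACK_PATTERNS:
        if pattern.search(err):
            return label
    return None
```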
marfrit	d72689f709	config: deep model → deepseek-coder-v2-lite (temporary) qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per /v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B active params; fast enough that the 30-min timeout from the qwen3-30b config was wildly over-budget. Switched to deepseek-coder-v2-lite for the time being. Restore qwen3-30b when the slot is back up. Live-probed: YES/NO destructive probe via the deep model preset returns "YES." in ~4.8s — well within the new 5-min timeout, and fast enough that the Phase 3 LLM second-opinion path is now functional again without falling back to "fail-safe YES" on every ambiguous command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:42:23 +00:00
marfrit	a9b39cd435	config: Phase 5 routing + summarize-on-evict example (commit #5 ) Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only; commented-out example showing: - routing.auto (per-request auto-routing toggle) - routing.classes (class → model mapping; reasoning = nil by default per R-N2 cost-safety) - routing.fallback (single-hop retry to cloud on transport fail) - routing.fallback_model (default "cloud" if uncommented) - context.summarize_on_evict + summarizer_model + max_summary_chars (shown INSIDE the context = {...} block above) All defaults OFF — Phase 5 is opt-in across the board. Existing configs without `routing` or `context.summarize_on_evict` behave identically to Phase 4. Phase 5 implementation complete: #1 `3e57824` router.classify_model + 31-case corpus #2 `03497b5` context summarize_fn callback + summary block in to_messages #3 `40ea0b4` repl routing + fallback + summarize_fn wiring + :route/:fallback #4 - (bundled into #3 since meta cmds are trivial additions) #5 (this) config example block Phase 5 verify-partial: - router.classify_model: 31/31 case corpus passes - context summarize-on-evict: mock callback fires correctly (additive + compress paths), summary suppressed under Norris, :reset clears it - repl meta cmds: :route on/off/classes/check + :fallback on/off all work; :route check reports class + "routing currently disabled" suffix when auto is off (N1) Verify-pending: end-to-end with real broker (route a code question, see it land on deep; kill local backend, see fallback fire to cloud). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:32:20 +00:00
marfrit	40ea0b49b0	repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3 ) Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5 machinery into the REPL. make_summarize_fn(): Returns a closure that maps (prior_summary, evicted_turns) onto a broker.chat call against cfg.context.summarizer_model (default "fast"). Three dispatch paths matching the R-B1 callback contract: evicted == nil → compress signal prior present → additive ("extend the prior summary ...") prior nil → first-time ("summarize the following turns") All use a system prompt enforcing "exactly one short paragraph", max_tokens=300, timeout_ms=30000. Broker failure returns nil so Context falls back to silent eviction. Renderer status is logged on failure for visibility. Context construction: Build ctx_opts as a fresh table (copies config.context to avoid mutating it), adds summarize_fn ONLY when config.context.summarize_on_evict == true. Defaults stay OFF — Phase 4 regression coverage. Fallback machinery: - FALLBACK_PATTERNS table with 7 transport-error signatures (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused, "Timeout was reached", "Operation timed out") - fallback_reason(err) strips the "transport: " prefix and matches. - should_fallback(err) gates on cfg.routing.fallback. - call_broker(cfg, name, msgs, on_delta, opts) wraps broker.chat_stream: • tracks any_delta via wrapped on_delta callback • retries ONCE against cfg.routing.fallback_model (default "cloud") when err matches AND no deltas arrived (N3: mid-stream failures aren't retried — partial text would duplicate) • emits "[aish] local <name> failed (<reason>); retrying via <fb>" status before the retry call ask_ai routing: - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg locals carry the choice through every tool-sub-loop iteration. - active_name/active_cfg are NOT mutated — user's :model selection survives the request. - When config.routing.auto is true, classify_model(text, config) is invoked. 
Non-nil model + non-active → swap req_cfg + status line. - broker.chat_stream call replaced with call_broker (fallback wrap). Meta cmds: :route on/off — toggle cfg.routing.auto at runtime :route classes — show class → model mapping :route check <text> — report classify_model result with "(routing currently disabled)" suffix when auto is off (N1) :fallback on/off — toggle cfg.routing.fallback at runtime HELP updated with the four new commands. Smoke-tested: aish boots, all four metas behave correctly, classify_model returns reasoning class for "Explain how MMAP works on Linux" (the model slot is nil because no classes are configured by default — N2 cost-safety). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:31:14 +00:00
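The single-hop retry rule in call_broker (retry once, and only when no deltas have arrived) might look like this in outline. This is a Python sketch under stated assumptions: the `(reply, err)` return convention, the `chat_stream` callable signature, and the `should_fallback` and `status` parameters are hypothetical stand-ins for the Lua functions the commit names.

```python
from typing import Callable, List, Optional, Tuple

Result = Tuple[Optional[str], Optional[str]]  # (reply, err); assumed convention


def call_broker(chat_stream: Callable[[str, List[dict], Callable[[str], None]], Result],
                name: str,
                fallback_name: str,
                msgs: List[dict],
                on_delta: Callable[[str], None],
                should_fallback: Callable[[str], bool],
                status: Callable[[str], None] = print) -> Result:
    """Call the model once; retry against the fallback only if nothing was streamed."""
    any_delta = False

    def tracked(delta: str) -> None:
        nonlocal any_delta
        any_delta = True  # remember that partial text already reached the user
        on_delta(delta)

    reply, err = chat_stream(name, msgs, tracked)
    if err is None:
        return reply, None
    # Mid-stream failures are not retried: partial text would be duplicated.
    if any_delta or fallback_name == name or not should_fallback(err):
        return None, err
    status(f"[aish] local {name} failed ({err}); retrying via {fallback_name}")
    return chat_stream(fallback_name, msgs, on_delta)
```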
marfrit	03497b5eea	context: summarize-on-evict callback + summary block (Phase 5 commit #2 ) Phase 5 commit #2 per docs/PHASE5.md §3 / §6. Context.new opts additions: - summarize_fn(prior_summary, evicted_turns) -> string\|nil callback per R-B1 canonical signature: (nil, [turns]) → first-time summarize (str, [turns]) → additive: extend prior summary (str, nil) → compress: re-summarize the prior nil return → silent eviction (Phase 0 behavior preserved) - max_summary_chars (default 2000) — when ctx.summary grows past this, the callback is invoked AGAIN with the compress signal so the summary stays bounded across long sessions Context.summary (string\|nil) is the rolling summary state. Composed into the SYSTEM MESSAGE (not as a turns[] entry — A3 resolution avoids system/system back-to-back). compose_summary() emits: [earlier conversation summary] <ctx.summary> between [background] and the NORRIS suffix. Both [background] and [earlier summary] are SUPPRESSED when ctx.norris_active (R-C4 — mirrors R-C1 from Phase 4; planner stays focused on its goal). enforce_budget() rewrite: - Collects the evicted pair before removing. - Calls summarize_fn(self.summary, pair) under pcall — wraps any callback error so a broken summarizer can't crash the REPL. - Updates self.summary if callback returned non-empty string. - If new summary exceeds max_summary_chars, invokes compress pass (callback with evicted=nil). - Removes pair from turns (same final state as Phase 0). Context:reset() clears the summary alongside turns + pending_exec_output. 
Smoke-tested with a mock summarizer over a 10-turn context with max_turns=4 and max_summary_chars=80: - 6 turns evicted to bring count down to 4 - Callback fired 4 times (3 additive + 1 compress when summary crossed 80 chars) - to_messages includes [earlier conversation summary] block - Under norris_active=true, summary suppressed (block absent) - :reset clears ctx.summary Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:18:37 +00:00
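The eviction loop with its three callback dispatch paths can be sketched as follows. This is an illustrative Python model, not context.lua: the class layout and helper names are assumptions, and only the callback contract (first-time, additive, compress, nil means silent eviction) is taken from the commit.

```python
from typing import Callable, List, Optional

SummarizeFn = Callable[[Optional[str], Optional[List[dict]]], Optional[str]]


class Context:
    def __init__(self, max_turns: int, summarize_fn: Optional[SummarizeFn] = None,
                 max_summary_chars: int = 2000):
        self.turns: List[dict] = []
        self.summary: Optional[str] = None
        self.max_turns = max_turns
        self.summarize_fn = summarize_fn
        self.max_summary_chars = max_summary_chars

    def _call(self, prior: Optional[str], evicted: Optional[List[dict]]) -> Optional[str]:
        """Invoke the callback defensively; any failure means silent eviction."""
        if self.summarize_fn is None:
            return None
        try:
            result = self.summarize_fn(prior, evicted)
        except Exception:
            return None
        return result if isinstance(result, str) and result else None

    def enforce_budget(self) -> int:
        """Evict the oldest pair until within budget; return the number of turns evicted."""
        evicted_count = 0
        while len(self.turns) > self.max_turns:
            pair = self.turns[:2]
            new_summary = self._call(self.summary, pair)  # first-time or additive
            if new_summary is not None:
                self.summary = new_summary
                if len(self.summary) > self.max_summary_chars:
                    compressed = self._call(self.summary, None)  # compress signal
                    if compressed is not None:
                        self.summary = compressed
            del self.turns[:2]
            evicted_count += len(pair)
        return evicted_count
```

With no callback, or with one that raises, the final `turns` state is the same as plain sliding-window eviction.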
marfrit	3e57824684	router: classify_model heuristic + 31-case corpus (Phase 5 commit #1 ) Phase 5 commit #1 per docs/PHASE5.md §11. Pure-Lua per-request model routing — no IO, no LLM probe in v1. router.classify_model(text, cfg) -> (model_name \| nil, class_label): 1. classify_class(text) walks heuristics in priority order: code class: - triple-backtick fence anywhere - "traceback" / "stacktrace" / "stack trace" (ci) - "error:" / "exception:" in first 60 chars (ci) - path-with-code-extension token (.py/.lua/.c/.js/.go/.rs/.cpp/.h/.ts) - 5+ lines with indented content (looks like a paste) reasoning class (requires text >= 15 chars to skip bare keywords): - "explain" / "why " / "how does" / "compare" (ci) - "?" + length > 100 chars default class: everything else 2. Map class via cfg.routing.classes[class] → model name (or nil = keep current). 3. Return (model_name_or_nil, class_label). ALWAYS evaluates regardless of cfg.routing.auto — caller (repl.ask_ai in commit #3) gates on the flag. This separation lets `:route check` introspect the heuristic even when routing is off (N1). M._classify_class exposed for testing. Test corpus (test_router_model.lua, 31 cases): - 13 code-class positives (fence, traceback, paths, multi-line paste) - 6 reasoning-class positives (explain/why/how does/compare/?+length) - 8 default-class (short queries, bare keywords below 15-char threshold, non-code paths like .md/.txt) - 3 model-mapping cases (code→"deep", reasoning→"cloud", default→nil) - 1 R-N2 default test: classes.reasoning=nil → reasoning text yields nil model override (heuristic still fires, no swap) - All 31 pass; 15-char threshold catches "how does ASLR work?" without false-positive on bare "explain". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:17:22 +00:00
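The priority-ordered heuristic described in this commit can be approximated as below. This is a Python sketch for illustration, not router.lua: the tokenization used for path detection and the indentation test are simplifications chosen for the example, so edge cases may differ from the 31-case corpus.

```python
from typing import Dict, Optional, Tuple

FENCE = "`" * 3
CODE_EXTS = (".py", ".lua", ".c", ".js", ".go", ".rs", ".cpp", ".h", ".ts")
TRACE_WORDS = ("traceback", "stacktrace", "stack trace")
REASONING_WORDS = ("explain", "why ", "how does", "compare")


def _has_code_path(text: str) -> bool:
    """True when some whitespace-separated token ends in a code extension."""
    for token in text.split():
        token = token.strip("\"'`()[]{},;:!?.").lower()
        if token.endswith(CODE_EXTS):
            return True
    return False


def _looks_pasted(text: str) -> bool:
    """True when five or more non-blank lines start with indentation."""
    indented = [ln for ln in text.splitlines() if ln[:1] in (" ", "\t") and ln.strip()]
    return len(indented) >= 5


def classify_class(text: str) -> str:
    lower = text.lower()
    if (FENCE in text
            or any(word in lower for word in TRACE_WORDS)
            or "error:" in lower[:60] or "exception:" in lower[:60]
            or _has_code_path(text)
            or _looks_pasted(text)):
        return "code"
    # The 15-char floor keeps bare keywords such as "explain" in the default class.
    if len(text) >= 15 and (any(word in lower for word in REASONING_WORDS)
                            or ("?" in text and len(text) > 100)):
        return "reasoning"
    return "default"


def classify_model(text: str, classes: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Map the class to a model name; None means keep the current model."""
    label = classify_class(text)
    return classes.get(label), label
```

Returning the class label even when no model is mapped is what lets a caller report the heuristic result while routing is off.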
marfrit	2e389c1475	docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions: B1 BLOCKER: summary callback signature was inconsistent across §3 and §6. Canonical now: summarize_fn(prior_summary, evicted_turns) -> string\|nil dispatching on the two args: (nil, [turns]) — first-time summarize (str, [turns]) — additive (extend prior summary with new evictions) (str, nil) — compress (re-summarize the prior summary itself) C1: re-summarize trigger now uses the (str, nil) compress signal rather than degenerate (str, {}). C2: routing decision is taken once on entry to ask_ai. The chosen active_cfg is used for every tool-sub-loop iteration. Original active_cfg restored after ask_ai returns. C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed at :norris launch time; planner stays on it for every iteration. Q39 resolved. Per-iteration fallback still gated by cfg.routing.fallback — retries the failing call against cloud without permanently switching the planner. C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for the [background] block). Both are "earlier context" the planner generally doesn't need. C5: Fallback pattern coverage expanded — added HTTP 408 (Q41 resolved) and "Operation timed out" (libcurl version variant). Dropped "HTTP response code said error" from A2 — FAILONERROR was removed in Phase 4 `f26cbd9`. NITs folded: N1 :route check <text> always runs heuristic; suffix "(routing currently disabled)" when cfg.routing.auto = false N2 reasoning → nil by default (not → "cloud"); user explicitly opts in to map reasoning to a paid model. Same cost-safety rationale as confirm_cmd default true. N3 "Retry only when no deltas have arrived" promoted to §5 normative rule (was in §11 risk row). N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to align with the :fallback meta verb. 
Reviewer verdict: commit #1 (router.classify_model) is implement-ready; B1/C1 resolution required before commit #2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:15:39 +00:00
marfrit	555fdd7717	docs/PHASE5: analyze — surface clean, summary lives on ctx.summary not turns A1. router.lua surface clean; classify_model is a natural sibling of classify. No structural refactor. A2. broker error message shapes confirmed: all transport errors carry "transport: " prefix; "api: " for SSE-framed semantic errors; "broker: " for config bugs. Fallback matcher must strip the prefix before testing — list of eligible patterns tightened in §5. A3. Q38 RESOLVED — summary doesn't go in ctx.turns (would create system/system back-to-back, same gotcha as PHASE0 §6 user/user). Instead lives on ctx.summary (string) and composes into the system message between [background] and NORRIS suffix. No new role:"system" turn; no alternation risk. §3 + §6 reflect. Module-changes table updated to specify ctx.summary string field + the to_messages composition order. Storage shape diagram in §6 rewritten. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:12:50 +00:00
marfrit	4453b93ab5	docs/PHASE5: formulate — multi-model routing + cloud fallback + summarize-on-evict Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5: heuristic-based per-request model routing, single-hop cloud fallback on local transport failure, and fast-model summarization at sliding- window eviction time. Resolutions baked in via §2: - Routing trigger: per-request in repl.ask_ai, gated by cfg.routing.auto (default off) - Classification: pure-Lua heuristics (length, keywords, code-fence detection, exception markers) — no LLM probe in v1 - Classes: code → deep, reasoning → cloud, default → keep active - Fallback trigger: string-match on err for HTTP 5xx / model_not_found / "Connection refused" / DNS / timeout - Fallback: one retry against cfg.routing.fallback_model (default "cloud" if configured); status line on every retry - Summarize: enforce_budget invokes summarize_fn callback wired by repl.lua to broker.chat with the fast model - Summary turn: single rolling _summary at turns[1], appended to on each eviction, re-summarized when it exceeds max_summary_chars Open questions (Q37-Q42) in §10: Q37 routing for :ask explicit ask Q38 summary turn vs system-role alternation Q39 fallback under Norris (proposal: single-request only) Q40 summary re-summarize fidelity loss (lossy by design) Q41 HTTP 408 pattern eligibility (default yes) Q42 routing inside tool-call sub-loop (proposal: fix at entry) 5-commit roadmap in §11. No new module files; mostly repl.lua and router.lua growth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:11:26 +00:00
marfrit	27784f9b68	config: Phase 4 memory example block (commit #5 ) Phase 4 commit #5 (final) per docs/PHASE4.md §12. Documentation-only; commented-out example showing: - inject_max_chars (cap on startup injection; default 2000) - summarizer_model (which configured model :memory summarize uses) The block is OFF by default. The :memory meta surface (:remember, :memory list/forget/clear/inject/summarize) works without the block — items persist to <history.dir>/memory.jsonl regardless. The block only configures the injection-into-system-prompt behavior + summarizer model choice. Phase 4 implementation complete: #1 `199dd87` history.lua memory store + ffi/libc.lua flock #2 `c1a5c73` context.lua [background] block (suppressed in Norris) #3 `3b074af` repl.lua memory handle + :remember + :memory meta #4 `f22d21d` :memory summarize — LLM candidate extraction #5 (this) config.lua memory example block Phase 4 verify-partial: - history memory round-trip tests: add/forget/load all green - flock single-writer enforcement verified - context composition order (DEFAULT → [background] → NORRIS) + Norris suppression all green - End-to-end persistence across boots: :remember on boot 1 visible on boot 2 as injected memory items - :memory forget id-not-active surfaces clean status (N1) - :memory clear with [y/N] confirm gate works - :memory summarize wire-correct against fast model (candidate parsing tolerates bullets; per-candidate y/N/edit prompts fire) Verify-pending: real-model summarizer quality test (deep/cloud); multi-process flock contention test; long-running :memory inject race with running broker stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:58 +00:00
marfrit	f22d21d754	repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4 ) Phase 4 commit #4 per docs/PHASE4.md §6. :memory summarize: 1. Source-of-truth: session log file via history.load(session_path), NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize" so prior summarize exchanges don't self-amplify across multiple calls within the same session. 2. Pick summarizer model from cfg.memory.summarizer_model (default active model). 3. Build a transcript string ("role: content" per turn, 800 chars max per turn) and feed it as a single user turn alongside a system instruction asking for "(fact\|pref\|context): <content>" lines. 4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep model can take a while; we don't want a 15s probe-cap here). 5. Log the response as an assistant turn with meta="summarize" so the next :memory summarize call filters it out. 6. Parse response lines tolerating markdown bullets and bold markup: ^%s[-]?%s[_](fact\|pref\|context)[_]:%s(.+)$ 7. Per-candidate prompt: y / N / edit. y → memory:add(kind, content) edit → readline prompt for replacement text any other → drop 8. status: "summarize: added N / M candidates". Live-tested against hossenfelder/fast: Pipeline correct end-to-end. Model emitted one candidate; user confirmation prompt fired; item persisted; :memory list showed it. Candidate quality from the 1.5B model is poor — typical small-model behavior; deep/cloud models would do better but this isn't an aish bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:36 +00:00
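Parsing the "(fact|pref|context): <content>" candidate lines while tolerating bullets and bold markup could be done roughly as follows. This Python regex is an independent sketch written for the example; it is not a reconstruction of the pattern quoted in the commit, and `parse_candidates` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Optional bullet, optional bold/italic markers around the kind, then the content.
CANDIDATE_RE = re.compile(
    r"^\s*[-*]?\s*[*_]*(fact|pref|context)[*_]*:[*_]*\s*(.+)$"
)


def parse_candidates(response: str) -> List[Tuple[str, str]]:
    """Extract (kind, content) pairs; lines that do not match are dropped."""
    candidates = []
    for line in response.splitlines():
        match = CANDIDATE_RE.match(line)
        if match:
            candidates.append((match.group(1), match.group(2).strip()))
    return candidates
```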
marfrit	3b074afaee	repl: memory handle + :remember + :memory meta (Phase 4 commit #3 ) Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring. Startup: - Opens memory handle at <history.dir>/memory.jsonl via history.open_memory(). Status-logs failure (e.g. flock held by another aish) and continues without memory. - inject_memory(): loads via history.load_memory(), truncates by cfg.memory.inject_max_chars (default 2000), populates ctx.memory_items. Status line announces N items injected. - shutdown_session() now also closes memory (releases flock). Meta commands: :remember <text> — shortcut for :memory add fact <text>; auto-refreshes ctx.memory_items so the next AI turn sees the new item without restart :memory list — show id / ts / kind / content (truncated at 80 chars per line) :memory add <kind> <t> — fact\|pref\|context required; rejects other kinds :memory forget <id> — N1: checks active-set first, surfaces "id N not active (already forgotten or never existed)" without appending if the id isn't live :memory clear — [y/N] confirm prompt; tombstones every active item :memory inject — N4: reload memory.jsonl into ctx.memory_items, replacing existing. Useful after manual file edits. Help block extended with the new commands. End-to-end verified: Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all three with timestamps Boot 2 → memory: 3 items injected (startup status); :memory list same three; ctx.turns empty (history is sessions/, memory is separate) Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not active" status without writing a tombstone; :memory list shows 2 items; :memory clear → confirm prompt → "cleared 2 items"; :memory list → "(no memory items)" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 05:11:48 +00:00
marfrit	c1a5c736ec	context: [background] memory injection block (Phase 4 commit #2 ) Phase 4 commit #2 per docs/PHASE4.md §5/§12. ctx.memory_items (array of {kind, content, ...}) loaded by repl.lua at startup from history.load_memory(). When non-empty AND ctx not in Norris mode, to_messages() appends a [background] block to the system prompt: [background] (memory.jsonl; manage via :memory) - (fact) User prefers terse responses - (context) Project: aish (LuaJIT REPL) Suppression under Norris (R-C1): when ctx.norris_active is true the [background] block is omitted. Norris already anchors via its NORRIS suffix carrying the goal; a 2KB background block per planning iteration would add ~16K tokens of redundant input over an 8-step run. Suffix composition order is now: 1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP, statically embedded) 2. [background] block — when memory_items non-empty AND NOT norris_active 3. NORRIS MODE block — when norris_active repl.lua wiring (memory_items population at startup, :memory meta cmds, :remember shortcut, :memory inject for live refresh) lands in commit #3. Verified composition order with 4 cases: default-only → 697 chars, no background, no norris memory_items only → 824 chars, background YES, no norris memory + norris → 1451 chars, background NO, norris YES (suppressed) norris only → 1451 chars, background NO, norris YES Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:52:42 +00:00
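The composition order and the Norris suppression rule can be sketched like this. It is an illustrative Python model: the [background] header line follows the example in the commit, while the line breaks, the NORRIS block wording, and the function name are assumptions.

```python
from typing import List, Optional


def compose_system_prompt(default_prompt: str,
                          memory_items: List[dict],
                          norris_active: bool = False,
                          norris_goal: Optional[str] = None) -> str:
    """Compose DEFAULT, then [background], then the NORRIS block, in that order."""
    parts = [default_prompt]
    # [background] is emitted only with items present AND outside Norris mode.
    if memory_items and not norris_active:
        lines = ["[background] (memory.jsonl; manage via :memory)"]
        lines += [f"- ({item['kind']}) {item['content']}" for item in memory_items]
        parts.append("\n".join(lines))
    if norris_active:
        # Hypothetical wording; the real block carries the goal as its anchor.
        parts.append(f"NORRIS MODE\ngoal: {norris_goal}")
    return "\n\n".join(parts)
```

Under this model, the "memory + norris" and "norris only" cases produce identical prompts, which is consistent with the equal character counts reported in the commit's four-case check.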
marfrit	199dd87eaa	history: memory.jsonl store + flock (Phase 4 commit #1 ) Phase 4 commit #1 per docs/PHASE4.md §12. Two file changes bundled because R-B1 (flock for race-free single-writer enforcement) cannot be deferred — adding it retroactively means reopening the memory handle. ffi/libc.lua extensions: - cdef flock(int fd, int op), open(...), lseek(int, long, int) - constants LOCK_EX=2, LOCK_NB=4, LOCK_UN=8 - M.flock(fd, op) wrapper returning (true) on success or (false, errmsg) — errmsg is the strerror text so callers can surface "Resource temporarily unavailable" cleanly to the user. history.lua additions (Phase 4 section appended at end): - M.open_memory(path) -> handle \| nil, err Opens the file via libc.open(2) (need integer fd for flock — io.open's FILE* doesn't expose it), takes flock(LOCK_EX \| LOCK_NB). Returns "memory.jsonl held by another aish process" on lock-held. Scans existing content for max id; caches as handle.next_id. Writes meta header on first creation (no id, ignored at load). - handle:add(kind, content, tags?, source?) -> id Assigns next id; appends one JSONL item with auto-timestamp. kind ∈ {fact, pref, context} enforced via assert. - handle:forget(target_id) Appends a tombstone {id, ts, kind:"forget", target}. - handle:close() Releases fd (flock auto-released on close). - M.load_memory(path) -> items_table Reads all lines, builds forget-target set from kind=="forget" entries, returns active items as an array sorted by ts desc. Items without id (meta header) silently dropped. Tombstones with non-matching targets are no-ops (N3 invariant). 
Round-trip test passes: - open empty file → next_id=1 - add 3 items → ids 1, 2, 3 - forget id 2 (appends tombstone) - reopen → next_id correctly advances past the tombstone (=5) - load_memory → 2 active items (id 1 + id 3); tombstone resolved - lock-held detection: second open while first held → fails with "memory.jsonl held by another aish process" message - close releases the lock; reopen after release succeeds Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:52:03 +00:00
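The tombstone resolution and id scan from this round-trip test can be modeled as below. This is a Python sketch, not history.lua: the record fields follow the commit, but the meta header contents and the integer timestamps are assumptions, and file locking is left out entirely.

```python
import json
from typing import Iterable, List


def load_memory(lines: Iterable[str]) -> List[dict]:
    """Return active items, newest first, after applying forget tombstones."""
    items, forgotten = [], set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("kind") == "forget":
            forgotten.add(record.get("target"))  # unknown targets are no-ops
        elif "id" in record:
            items.append(record)                 # records without id (meta header) are dropped
    active = [r for r in items if r["id"] not in forgotten]
    active.sort(key=lambda r: r["ts"], reverse=True)
    return active


def next_id(lines: Iterable[str]) -> int:
    """Scan every record, tombstones included, and return max id + 1."""
    highest = 0
    for line in lines:
        line = line.strip()
        if line:
            highest = max(highest, json.loads(line).get("id", 0))
    return highest + 1
```

Since the tombstone takes an id of its own, three adds followed by one forget leave the next id at 5, matching the round-trip result above.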
marfrit	ffead3986c	docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs. R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes scanning the same file compute identical next_ids. Resolution: flock(LOCK_EX \| LOCK_NB) on the fd in M.open_memory, held until close. Bundled into commit #1 (per reviewer: cannot defer because adding flock retroactively means reopening the handle). Requires ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN constants + M.flock wrapper. R-C1 (CONCERN, closes Q33): [background] block suppressed when ctx.norris_active. Avoids ~16K of redundant tokens per 8-step Norris run. Norris already anchors via its goal in the NORRIS suffix; memory items rarely change step-to-step planning. R-C2 (CONCERN): summarizer self-amplification — running :memory summarize twice in one session would feed the prior summarize call's assistant turn into the next input. Resolution: operate on the session log file (history.load(session_path)) instead of ctx:to_messages(), and tag prior summarize turns with meta="summarize" so they're filterable. R-C3 (CONCERN, cosmetic): §5 diagram clarified that DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block statically — not a separate dynamic block in v1. NITs N1-N4 folded inline: N1 forget no-op for unknown id surfaces a status N2 path note: memory.jsonl is sibling of sessions/, no collision N3 item-id invariants: id >= 1; meta header has no id; tombstones with non-matching targets are no-ops N4 :memory inject semantics explicit (replace ctx.memory_items from a fresh load + LRU-by-ts truncation) §3 module-changes table grew a new ffi/libc.lua row. §12 commit #1 description tightened — flock work bundled inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:50:43 +00:00
marfrit	2146b909f8	docs/PHASE4: analyze — surface confirmed, counter strategy locked A1. history.lua surface lines up cleanly for the memory additions — no structural refactor; pure additive functions mirroring the session pattern. A2. Counter persistence: scan at open, cache next_id in handle. O(n) load (n bounded by curation, ~hundreds), no sidecar file. Persisted ids let forget-tombstones target items even across restarts. A3. System-prompt suffix order locked: DEFAULT (carrying Phase 2 MCP block baked in) → Phase 4 [background] → Phase 3 NORRIS. Token cost measured: default ~174 toks, +NORRIS ~364 toks, +NORRIS+2KB background ~865 toks. Well within typical context budgets. No manifest amendments needed — §3/§5 already match. Findings recorded inline as Phase 7 anchors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:47:01 +00:00
marfrit	bea717534c	docs/PHASE4: formulate — memory.jsonl + startup injection + :memory meta Phase 4 formulate manifest. Three pillars per PHASE0 §11 row 4: memory.jsonl persistent cross-session store, startup context injection into the system prompt, and the :memory management surface + opt-in :memory summarize for candidate extraction. Resolutions baked in via §2: - Storage: append-only JSONL at <history.dir>/memory.jsonl - Format: {id, ts, kind, content, tags?, source?} - Kinds: fact / pref / context (lightly typed v1) - Forget: tombstone append, resolve at load (set-based) - Cadence: manual :memory summarize only in v1; auto-trigger Q-listed - Inject: dynamic [background] block on system prompt, capped at 2000 chars by default; LRU-by-ts selection if over-budget - Order: DEFAULT → MCP block → [background] → NORRIS suffix (Norris last so it dominates when active) New module surfaces: history.lua M.open_memory / memory:add / memory:forget / M.load_memory context.lua ctx.memory_items + [background] composer repl.lua :remember, :memory add/list/forget/clear/inject/summarize config.lua commented-out memory = {...} example Open questions (Q31-Q36) tracked in §11: Q31 auto-summarize trigger (manual v1; auto-on-quit candidate) Q32 in-place edit vs forget+re-add Q33 Norris-mode interaction (proposal: both blocks stay) Q34 split prefs into a dedicated prompt section? Q35 redaction of sensitive content during summarize Q36 duplicate detection on :memory add 5-commit roadmap in §12 (history → context → repl → summarize → config). No new module files. No substrate amendments to PHASE0 — entirely additive on top of Phase 1's history.lua pattern and Phase 3's dynamic-suffix pattern in context.lua. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:25:57 +00:00
marfrit	50666d092f	config: Phase 3 safety example block (commit #6 ) Phase 3 commit #6 (final) per docs/PHASE3.md §12. Documentation-only; commented-out example showing the safety schema: - llm_second_opinion (bool, default true) - llm_model (string, default deep→default_model fallback) - max_norris_steps (int, default 8) The block notes the model-selection trade-off (R-B2): cloud is the independent-class fast option (costs money), deep is the local-but-slow option, fast is self-policing and NOT recommended. No behavior change to existing configs — safety defaults kick in when the block is absent. Phase 3 implementation complete: #1 `bd59ce7` safety static patterns (34 rules) + 87-case test corpus #2 `2abd5da` LLM second-opinion + session cache + opts.max_tokens #3 `d2a53d2` renderer Norris frames #4 `11b1f56` safety.norris_step planner (single iteration) #5 `a404b2a` repl driver + \C-n real binding + :norris/:safety meta + readline rl_insert_text/rl_redisplay #6 (this) config.lua safety example block Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:46 +00:00
marfrit	a404b2a152	repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5 ) Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step (commit #4) into the REPL with the user-facing surface. ffi/readline.lua extensions (A1 + R-C4): - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text and M.redisplay wrappers exposed. - M.bind: removed `:free()` on previous callback. Now keeps every bound callback pinned for process lifetime in `_pinned` list (alongside `_bound[seq]` for current lookup). Avoids the use-after-free window between unbind and rebind that R-C4 flagged. Memory cost is bounded — one closure per key sequence binding. context.lua Norris suffix (R-C3 / §8): - to_messages() composes a dynamic NORRIS MODE block onto the system prompt when ctx.norris_active is set. The block carries ctx.norris_goal so eviction of the user's "[norris] goal:" turn doesn't lose the anchor. Returns to plain system prompt when Norris exits. repl.lua Norris driver: - prompt() now shows ⚡ marker when ctx.norris_active per PHASE0.md §9. - \C-n bound to a real handler — inserts ":norris " at the cursor (replaces Phase 1 status placeholder). - run_norris(goal) function: sets norris_active + norris_goal, appends a "[norris] <goal>" user turn, renders the banner, then loops calling safety.norris_step with an injected helpers table until a terminal status returns. Renders the closing banner. - norris_halt(): the [N] proceed/skip/abort prompt called by safety.norris_step via helpers.halt. Empty input → abort (safe). - dispatch_tool(): factored from the Phase 2 ask_ai code so safety.norris_step can call it. - norris_exec(): factored exec path for autonomous mode (skips the interactive run_shell cd-status renderer). 
- :norris <goal> meta — launches autonomous mode - :norris off meta — drops Norris flag (rare; usually 'abort') - :safety patterns meta — lists active is_destructive rules - :safety check <cmd> meta — probes a hypothetical command End-to-end mock-driven test: Submitted ":norris find files in /tmp" → banner → step 1 emits tool_call (auto_approved per policy) → dispatched → frame rendered → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner. 2 broker invocations, no stalls. config.lua safety example block lands in commit #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:14 +00:00
marfrit	11b1f566b3	safety: norris_step planner (Phase 3 commit #4 ) Phase 3 commit #4 per docs/PHASE3.md §12. Single-iteration planner. The driver loop in repl.lua (commit #5) calls this in a while loop, advancing step_n on every "continue" return. M.norris_step(ctx, model_cfg, helpers, opts): 1. One broker.chat_stream round-trip — text + tool_calls collected, text streamed via helpers.render_assistant_delta. 2. Parse actions from response: tool_calls (already collected), CMD: lines (via helpers.extract_cmd_lines), GOAL: complete sentinel (line-level exact match per R-C5). 3. Record the assistant turn (with tool_calls if any) and log it. If no actions AND no goal_done → status="stalled". 4. Dispatch tool_calls (structured route first): - is_destructive check on serialized call. - If destructive → halt_fn(proceed/skip/abort). - Else → auto_approve lookup; absent → halt for consent (R-C6: Norris is conservative; auto_approve is the only consent bypass). - On skip: synthesize role:tool turn "[aish] tool call skipped by user" — alternation preserved per C5/C7. - On abort: return status="aborted". - On proceed: dispatch via helpers.dispatch_tool, append role:tool turn with result content. - Argument JSON parse failure also synthesizes a tool turn (same alternation rationale). 5. Dispatch CMD: lines (legacy route): - is_destructive check. - Destructive → halt_fn. - Non-destructive → run directly (Norris user accepted autonomy for non-destructive shell). - skip → ctx:append_exec_output "[aish] CMD skipped by user". - proceed → exec via helpers.exec_cmd, frame via render_exec_begin/end. 6. Skip-budget escalation (R-C1): after dispatch, if ctx.norris_consecutive_skips >= 3 → escalation halt; abort exits, proceed resets counter. 7. Goal-done check AFTER all dispatch (R-C2 / Q25 resolution). 8. Budget check: step_n >= max_steps → status="budget_exhausted". 9. Otherwise → status="continue", driver advances. 
Helpers are passed in as injected functions rather than directly requiring repl/renderer/executor — keeps safety.lua's coupling clean and norris_step testable with a mocked helpers table. State carried across iterations on the ctx: - ctx.norris_consecutive_skips (resets on any successful proceed) - ctx.norris_goal / ctx.norris_active (set/cleared by the driver) Existing test_safety.lua corpus (87 cases) still passes — norris_step addition doesn't touch is_destructive's behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:37:53 +00:00
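The "GOAL: complete" sentinel check described in step 2 above (line-level, exact match after trim) might look something like this. The function name is hypothetical and the real matcher in safety.lua may differ in detail.

```lua
-- Hypothetical sketch of a line-level sentinel scan.
-- A line counts only if, after trimming whitespace, it is exactly the sentinel.
local function goal_done(text)
  -- Append a newline so the final line is also visited by the iterator.
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    local trimmed = line:match("^%s*(.-)%s*$")
    if trimmed == "GOAL: complete" then
      return true
    end
  end
  return false
end
```

The point of the exact-line match is that a reply which merely mentions the sentinel inside a sentence, or writes "GOAL: completed", does not end the session.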
marfrit	d2a53d2fc7	renderer: Norris autonomous-mode frames (Phase 3 commit #3) Phase 3 commit #3 per docs/PHASE3.md §12. Four new renderer functions for Norris mode visual feedback. M.norris_begin(goal) Bold cyan banner on Norris entry, with the goal text on a dim indented line. Frames the start of the planning loop. M.norris_step(n, max_n, descr) Compact one-line step counter ("─ step 3/16 ─") with optional description. Renders before each iteration of the planner. M.norris_halt(step_n, max_n, reason, action) Bold red banner when the destructive-op gate fires. Three indented lines: step counter, reason (red), action text (truncated at 400 chars, newlines collapsed). The interactive proceed/skip/abort prompt is shown after this banner by repl.lua. M.norris_end(status, reason) Closing banner. status ∈ {"done", "aborted", "budget_exhausted", "stalled", "broker_error"}. Color cyan on "done", red otherwise. Optional reason text on a dim line. The interactive prompt `[aish:<model> ⚡]>` activation lands in commit #5 (repl.lua's prompt() function). Smoke-tested all five frames visually — clean ANSI output, correct truncation on long action strings, color discrimination on done/aborted/budget_exhausted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:36:44 +00:00
marfrit	2abd5da3a6	safety: LLM second-opinion + session cache (Phase 3 commit #2 ) Phase 3 commit #2 per docs/PHASE3.md §12. Adds the LLM-probe gate on top of commit #1's static patterns. Together they form is_destructive. broker.lua extension: - opts.max_tokens (A2) — passed through to the request body. Phase 3 probes cap at 4 tokens for YES/NO replies. - opts.timeout_ms — overrides model_cfg.timeout_ms per-call. Probe uses 15000ms cap regardless of the model's normal timeout (the user's deep model has 1800000ms for long generations; the probe must stay snappy). - M.chat now accepts an opts table (same shape as chat_stream's). Backwards compatible — existing callers passing (cfg, msgs) unaffected. safety.lua additions: - llm_probe(cfg, system, cmd): single broker.chat call returning "YES"/"NO"/"YES_FAILSAFE"/"YES_UNPARSEABLE" — fail-safe defaults. - llm_second_opinion(cmd, cfg): two-probe protocol per R-B2. Probe 1: "Is this destructive?" — YES → flag. Probe 2 (only if probe 1 said NO): "Is this safe?" inverted question — NO → flag (disagreement = HALT). Both NO → safe. - Session-scoped cache _llm_cache keyed by normalized command (lowercased + whitespace-collapsed). Mitigates Q23 latency for repeated commands within a Norris run. - Model-selection precedence: cfg.safety.llm_model (explicit) → cfg.models.deep (independent local class) → cfg.models[default]. Fail-safe YES if none configured. - is_destructive(cmd, cfg): runs static patterns first (always), then LLM if cfg present + not explicitly opted-out. cfg=nil yields static-only mode (handy for tests). 
End-to-end verified against hossenfelder using qwen-coder-7b-32k as the deep probe (qwen3-30b-a3b-instruct in repo's config.lua isn't currently loaded on the local backend): cat /etc/hostname → hit=false (LLM: NO, NO inverted = safe) rm /tmp/x.log → hit=true (LLM flagged; static missed because no -r/-f flags) cp /etc/passwd /tmp/passwd.bak → hit=false (safe copy) cache: second probe on same cmd → 0s wall time static-only (cfg=nil): rm -rf /tmp/x → static hit, no LLM call opt-out (llm_second_opinion=false): cp x y → hit=false, no probe Test corpus (test_safety.lua, 87 cases) still all pass — cfg=nil preserves the static-only behavior. Note: production config.lua currently has `deep = qwen3-30b-a3b-instruct` which isn't loaded on the proxy backend right now; Norris users will hit the fail-safe (everything flagged destructive) until either the deep model is brought up OR cfg.safety.llm_model = "cloud" is set to route the probe through anthropic/claude-haiku-4.5. Update the config or model deployment for production use — covered by Phase 3 verify test case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:36:06 +00:00
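The two-probe protocol and the normalized session cache described above can be sketched as below. This is an approximation under stated assumptions: `probe(question, cmd)` is a hypothetical injected function that returns the model's raw "YES" or "NO" answer to the question asked, and any other answer is treated as a fail-safe flag. The real llm_probe goes through broker.chat and is not reproduced here.

```lua
-- Hypothetical sketch: two-probe second opinion with a session-scoped cache.
local cache = {}

-- Cache key: lowercased, whitespace-collapsed, trimmed command.
local function normalize(cmd)
  return (cmd:lower():gsub("%s+", " "):gsub("^ ", ""):gsub(" $", ""))
end

local function second_opinion(cmd, probe)
  local key = normalize(cmd)
  if cache[key] ~= nil then return cache[key] end

  local flagged
  local first = probe("Is this destructive?", cmd)
  if first ~= "NO" then
    -- YES, or anything unparseable: fail safe and flag.
    flagged = true
  else
    -- Inverted question. Only a clear "YES, it is safe" agrees with probe 1;
    -- any other answer is a disagreement and flags the command.
    local second = probe("Is this safe?", cmd)
    flagged = (second ~= "YES")
  end

  cache[key] = flagged
  return flagged
end
```

A repeated command that differs only in case or spacing hits the cache, so no second broker round-trip is made within the session.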
marfrit	bd59ce7243	safety: is_destructive static pattern matcher (Phase 3 commit #1 ) Phase 3 commit #1 per docs/PHASE3.md §12. Static-pattern destructive-op heuristic; no LLM second-opinion yet (lands in commit #2). Implementation: - 34 patterns in DESTRUCTIVE_PATTERNS table, grouped: 9 shell-wrapper patterns (R-B1 — bash -c / sh -c / zsh -c / eval / python -c / perl -e / pipe-to-sh both forms / pipe-to-bash both forms / xargs ... rm). HALT on the wrapper itself; user reads the inner before proceeding. 10 filesystem destructive (rm -rf, find -delete, dd to device, mkfs, shred, wipefs, truncate -s 0, ...). 5 version-control destructive (git push --force/-f, git reset --hard, git clean -fd, git branch -D). 5 database/process (DROP TABLE/DATABASE, TRUNCATE TABLE, kill/pkill -9). 2 permission (chmod 777, chown on root path). - ci=true flag for case-insensitive SQL patterns; rule patterns must be lowercase when ci is set (matcher lowercases input). - pkill -9 ordered BEFORE kill -9; kill rule uses %f[%w] frontier so "pkill -9 nginx" reports "pkill -9" not "kill -9" substring match. - M._patterns exposes the rule table for :safety patterns meta (Phase 3 commit #5) and for the test corpus. - M.norris_step stub stays — lands in commit #4. Test corpus (test_safety.lua, 87 cases): - 49 destructive cases across all categories (incl. all 11 wrapper forms, the canonical curl\|sh end-of-string bypass, sudo-prefixed rm -rf, etc.). - 38 safe cases (read-only commands, non-destructive variants of risky verbs like "git push" without --force, "find" without -delete, "chmod 644", "kill 1234" without -9, etc.). - Documented one accepted false positive: echo "rm -rf /" matches the rm pattern by substring — Norris user can proceed after reading; tradeoff between false positives and false negatives, biased toward false positives per §5. - Run from repo root: `luajit test_safety.lua`. Exit 0 on pass. - Verified all 87 pass at commit time. 
R-C4 / readline rebind, broker opts.max_tokens, LLM second-opinion, norris_step planner, repl driver, and the wider Norris UX land in subsequent commits per §12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:47:10 +00:00
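A small sketch of how an ordered rule table with a `ci` flag and a `%f[%w]` frontier behaves. The four rules below are illustrative stand-ins written for this example; the actual DESTRUCTIVE_PATTERNS entries are not shown in the log and may be phrased differently.

```lua
-- Hypothetical subset of a static destructive-pattern table.
-- ASSUMPTION: these pattern strings are illustrative, not the project's rules.
local RULES = {
  { name = "rm -rf",     pat = "rm%s+%-rf" },
  { name = "pkill -9",   pat = "pkill%s+%-9" },
  -- %f[%w] requires a non-alphanumeric (or start of string) before "kill",
  -- so the "kill" inside "pkill" cannot match this rule.
  { name = "kill -9",    pat = "%f[%w]kill%s+%-9" },
  -- ci rules are written in lowercase; the matcher lowercases the input.
  { name = "drop table", pat = "drop%s+table", ci = true },
}

local function is_destructive_static(cmd)
  for _, rule in ipairs(RULES) do
    local subject = rule.ci and cmd:lower() or cmd
    if subject:find(rule.pat) then
      return true, rule.name
    end
  end
  return false
end
```

With these rules, `echo "rm -rf /"` is still reported as a hit, which mirrors the accepted substring false positive noted in the commit message.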
marfrit	125f800513	docs/PHASE3: re-review NIT fold-in — pipe-to-sh EOL, ci= note, §12 sync Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded: N1 BLOCKER: `\|%ssh%f[%s]` missed `curl x \| sh` (end-of-string canonical wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace, which doesn't happen at EOL). Replaced with two patterns each for sh and bash: `\|%ssh%s` (followed by whitespace/args) and `\|%ssh%s$` (end-of-string). Same for bash. Verified against 18 wrapper-bypass test cases — all canonical idioms now HALT. N2 CONCERN: `ci=true` rule flag had no implementation note. Added one sentence to §5 explaining the matcher lowercases the input string when ci is set. N3 CONCERN: §12 commit #5 description was stale — still said "extends interactive CMD: extraction to consult is_destructive" which contradicts the R-B3 resolution (Norris-only). Rewrote commit #5 description to match R-B3, and bundled the ffi/readline.lua `_bound[seq]:free()` removal into commit #5's scope with explicit "Phase 1 amendment" callout. Same for the §12 risk note that still referenced the dropped behavior change. Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model swap interaction, N7 commit-attribution wording) are cosmetic and will fold in-flight during implement if material. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:45:25 +00:00
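The N1 finding above turns on how Lua's frontier pattern treats the end of the subject: the position past the last character is seen as `\0`, which is not in `%s`, so `%f[%s]` cannot match there. The sketch below demonstrates that and one possible two-pattern replacement. The end-anchored pattern `|%ssh$` is this example's own choice and is not claimed to be the exact pattern used in the project.

```lua
-- Frontier form: matches only when "sh" is followed by real whitespace.
local frontier = "|%ssh%f[%s]"
local with_args = ("curl x | sh -s"):find(frontier) ~= nil   -- frontier fires
local at_eol    = ("curl x | sh"):find(frontier) ~= nil      -- frontier misses

-- Hypothetical replacement: one pattern for "sh <args>", one for "sh" at EOL.
local function pipe_to_sh(cmd)
  return cmd:find("|%ssh%s") ~= nil or cmd:find("|%ssh$") ~= nil
end
```

Note that `curl x | ssh host` is not caught by either replacement pattern, since the characters after the space are `ss`, not `sh`.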
marfrit	91ddcb005d	docs/PHASE3: review fold-in — security-layer BLOCKERs resolved Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against the analyze-tier draft. Resolutions applied: BLOCKERs: B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c, eval, pipe-to-shell, python -c, xargs\|rm. Added 9 wrapper patterns to §5. Norris HALTs on any wrapper invocation; user reads the inner before proceed. The patterns are the conservative floor against the wrapper bypass class. B2 LLM second-opinion was self-policing — same model class generating actions then judging them. Switched probe model from `fast` to `deep` (qwen3-30b). Added re-roll inversion: if first probe says NO, ask "is this SAFE?". Disagreement between two probes → HALT. Cheap independent-class insurance. B3 `is_destructive` would have run on interactive CMD: extraction — a PHASE0 §6/§10 substrate amendment in disguise. Resolved Q24: heuristic runs ONLY when norris_active == true. No substrate change; interactive `confirm_cmd` semantics unchanged. CONCERNs: C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips escalate to abort/force-proceed prompt. C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to dispatch ALL pending actions before checking GOAL: complete. C3 Norris-goal eviction: goal embedded directly in the dynamic system-prompt suffix; survives sliding-window eviction. C4 Readline use-after-free window: M.bind no longer frees old callbacks; pin for process lifetime (bounded memory cost). C5 GOAL: complete matcher: line-level scan, exact match after trim — substrate-aligned with CMD: rigor. C6 §4 step 4 tightened: auto_approve does NOT bypass destructive heuristic; tool_call without auto_approve still HALTs even when destructive-clear (Norris conservative). 
NITs deferred or rolled into pattern table: - chown root-path pattern tightened (NIT 2 in-line) - Test corpus expansion noted in §12 commit #1 risk - Other NITs are wording-level Status: Plan (review folded). Ready for commit #1 (safety static patterns) once another review pass clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:42:58 +00:00
marfrit	cf4d79dd9d	docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state Analyze findings folded into the manifest: A1. \C-n binding can't toggle mid-prompt without rl_insert_text / rl_redisplay. Solution: bind those (one cdef + 2 wrappers in ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user types goal + Enter. Routes through existing meta dispatch. A2. broker has no max_tokens passthrough. Add opts.max_tokens for the LLM second-opinion path (terminates at ~2 tokens; verified proxy honors it). A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step is the per-iteration extraction; driver loop in repl.lua. Module-changes table (§3) updated with the rl_insert_text and max_tokens rows. Baseline doc (PHASE3-baseline.md, 80 lines) captures: - LLM second-opinion latency: 425-1162ms per probe, all 5 test cases correct. Worst-case 16-step Norris = ~20s overhead; with static-pattern fast-path + session cache, ~5s realistic. - Module pre-state at commit `f26cbd9` (Phase 2 tip): LOC + state per file before Phase 3 edits. - Six static-pattern Lua-match sanity checks (all correct). - Carries: aish#15 (still open), aish#14, aish#32/#33. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:37:58 +00:00
marfrit	b58a842e49	docs/PHASE3: formulate — Norris autonomous mode + destructive-op gate Phase 3 formulate manifest. Three pillars per PHASE0.md §11 row 3: Chuck Norris autonomous mode (planning loop), destructive-op heuristic (static patterns + LLM second-opinion), and HALT/confirm protocol. Resolutions baked in via §2: Q2 iterative re-plan after each action (not top-down tree) Action sources CMD: lines AND MCP tool_calls — Phase 2 contract honored HALT trigger static-pattern hit OR LLM-second-opinion flag HALT shape 3-way: proceed / skip / abort Auto-approve under Norris honors Phase 2 auto_approve policy EXCEPT destructive-op heuristic always wins LLM second-opinion model the `fast` preset (cheapest) Norris prompt suffix appended to system prompt while active; "GOAL: complete" sentinel for done Key extensions: - safety.is_destructive: ~20 static shell-idiom patterns + LLM probe; runs on interactive CMD: extraction too (§9 — replaces bare confirm_cmd for known-destructive cases). Q24 worth challenging at analyze. - safety.norris_step: single-iteration of the planner. Driver loop in repl.lua. \C-n toggle (real binding, replaces Phase 1 placeholder); :norris <goal> explicit launch. - renderer.norris_begin/step/halt/end: visual parity with exec and tool_call frames. Prompt becomes [aish:fast ⚡]> per PHASE0.md §9. - context.to_messages dynamically appends NORRIS MODE suffix when norris_active. New open questions (Q23–Q30) tracked in §11: Q23 LLM second-opinion latency budget (caching mitigation) Q24 interactive CMD: also subject to is_destructive? 
(proposal: yes) Q25 GOAL: complete + pending actions in same response — dispatch first Q26 context preservation on abort/done/budget — all preserve Q27 :norris continue (resume after abort) — deferred to v2 Q28 side-effect MCP tools not in __shell/__write_file patterns Q29 goal-implies-authorization for destructive ops — no, always confirm Q30 :norris no-arg vs \C-n share goal-prompt path — yes, trivial Module-layout (PHASE0 §4) untouched — all changes are growth of existing files. 6 commits expected at implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:45:03 +00:00
marfrit	f26cbd9a3a	phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. 
End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:04:57 +00:00
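The non-greedy split quoted above (`"^(.-)__(.+)$"`) and the Bedrock name constraint can be illustrated with a short sketch. The helper names are hypothetical; the name check approximates the regex `^[a-zA-Z0-9_-]{1,128}$` using a Lua character class, which assumes the default C locale for `%w`.

```lua
-- Split "alias__tool" at the leftmost "__" (the lazy (.-) stops at the first one).
local function split_tool_name(full)
  local alias, tool = full:match("^(.-)__(.+)$")
  return alias, tool
end

-- Approximation of ^[a-zA-Z0-9_-]{1,128}$ (assumes C locale for %w).
local function bedrock_safe(name)
  return #name >= 1 and #name <= 128 and name:match("^[%w_%-]+$") ~= nil
end
```

Because the split is leftmost, tool names may contain further underscores (even `__`), while aliases may not, which matches the constraint stated in the commit message.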
marfrit	3fa6279f5b	repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no dot) printed "unknown alias: nil" instead of a useful diagnostic. The parse failure was being conflated with the alias-not-found case. Now: :mcp tool list_dir -> tool name missing alias prefix: list_dir :mcp tool unknown_alias.x -> unknown alias: unknown_alias :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:55:01 +00:00
marfrit	09800d192a	config: Phase 2 mcp example block + deep model switch Phase 2 commit #7 (final) per docs/PHASE2.md §12. Two changes bundled: (1) commented-out mcp = {...} example block (~40 lines) at the end of config.lua showing the Phase 2 schema: - mcp.servers — alias → {url, auth_token \| auth_env} - mcp.auto_approve — "<alias>.<tool>" or "<alias>.*" globs - mcp.max_tool_depth — sub-loop budget per ask_ai turn The block is OFF by default; uncomment + adjust per fleet to activate. Documentation-only; no behavior change to existing configs (mcp_sessions stays empty, tools_schema() returns [], broker omits the field — full Phase 1 compatibility). (2) User-authored: deep model preset switched from mistral-nemo-12b-instruct to qwen3-30b-a3b-instruct, with a 10-min timeout_ms accommodating the larger model's RK3588 inference time. Reason: nemo backend is dormant per the proxy /v1/models discovery (aish#23 now returns 404 cleanly for unknown models instead of silent fallback); qwen3-30b is the practical "deep" alternative. Phase 2 implementation is now complete — 7 of 7 commits landed: #1 `6c194de` mcp.lua + ffi/curl status_code + PHASE0 §4 amendment #2 `0fde77f` safety.lua confirm_tool_call #3 `7c221a8` context.lua tool turns + use_tool_role fallback #4 `c736d0e` renderer.lua tool-call frames #5 `efdc728` broker.lua opts.tools + tool_call accumulator #6 `7e9cfff` repl.lua sub-loop + :mcp meta + system-prompt block #7 (this) config.lua example + deep model switch Next phase-loop step: verify (Phase 7). Files written are wired and isolated-tested; end-to-end model-driven verification waits on either a more compliant model or explicit forcing of tool_calls from the prompt — known to be marginal with the loaded qwen-1.5b but proven correct against direct probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:40:21 +00:00
marfrit	7e9cfff04d	repl: tool-call sub-loop + :mcp meta + system-prompt augmentation Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP tool-call flow on top of broker/safety/context/renderer/mcp. repl.lua additions: - mcp_sessions table populated from config.mcp.servers at startup. connect_mcp() helper does initialize + caches tools/list. Failures status-logged once; absent from mcp_sessions until manual reconnect (C4 — no auto-retry). - tools_schema() flattens connected sessions' tools into the OpenAI {type:"function", function:{name,description,parameters}} shape with "<alias>.<name>" namespacing. - flatten_content() concatenates content[type="text"] blocks; one-shot status warning when non-text blocks (image/resource) are dropped (§4 normative spec, v1 only handles text). - dispatch_tool_call(name, args_table) splits alias.tool, looks up session, calls. Returns (content_string, is_error). Errors of every flavor (missing alias, no session, rpc_error, transport_error) yield a synthesized "[aish] ..." string so callers always have a body for the role:"tool" turn — alternation preserved per C5/C7. - ask_ai rewritten as a sub-loop that re-issues the broker request until the model returns pure text or max_tool_depth (default 8) is hit. Each iteration: stream response → if tool_calls present, confirm-gate each → dispatch → append role:"tool" turn → continue. Argument-JSON parse failure produces a synthesized tool turn (C7). Decline at confirm produces "[aish] tool call declined by user" tool turn (alternation guarantee). - :mcp meta with sub-commands: list / tools / tool <a.n> / connect <url> [alias] / disconnect <alias>. HELP block extended. context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8 (hybrid prompt: static frame about MCP + dynamic tools list in the request body). Block is always present even when no MCP servers configured — ~60 tokens for clarity that 'CMD:' remains the fallback. 
CMD: extraction unchanged — runs on the FINAL pure-text response only (not on intermediate iterations of the tool sub-loop). Substrate §3 invariant preserved. End-to-end verified two ways: (1) Direct broker probe: aish's tools_schema fed through broker.chat_stream against hossenfelder → qwen-1.5b emits one tool_call payload with correct id + name="boltzmann.list_dir" + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string across fragmented deltas. (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock emits text + tool_call, sub-loop dispatches against LIVE boltzmann lmcp (auto_approve via policy), 80+ files rendered inside the tool_call frame, broker re-invoked with the extended context, mock returns pure text, sub-loop terminates. Total broker invocations: 2. Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..." suggestions even when an MCP tool is the better path; the small model's system-prompt compliance is weak. Larger models and the analyze-time direct probe confirm the tools_schema and tool_calls flow is wire-correct — Phase 7 verify will exercise this against qwen3-30b or cloud models when available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:20:42 +00:00
marfrit	efdc7281c7	broker: opts.tools passthrough + streaming tool_call accumulator Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows tool-call support without taking a dependency on mcp.lua (caller supplies the tools array — B5 from review). chat_stream signature widens to (cfg, msgs, on_delta, opts): opts.tools - optional array, passed to the request body as the OpenAI-shape tools field. OMITTED entirely when nil or empty (#tools == 0) — some servers reject "tools": []. on_delta callback shape widens to (kind, payload): kind = "text", payload = string (Phase 1 path; unchanged semantics, signature changes from (delta) to ("text", delta)) kind = "tool_call", payload = {id, name, arguments} emitted ONCE per call on finish_reason "tool_calls" after the streaming accumulator pulls fragmented JSON-string arguments together. Accumulator behavior: - Keyed by delta.tool_calls[i].index. - If index is absent on a delta (some llama.cpp builds omit it on single-call streams; C2 in review), default to 0 with a one-shot stderr debug status per stream. - id and name captured from the opening delta of each slot. - function.arguments concatenated across all deltas as the raw JSON-string; caller (repl.lua / future Phase 2 commit #6) does dkjson.decode. - On finish_reason "tool_calls" the accumulator emits all collected calls in index order and resets. M.chat external contract unchanged (C1): wrapper now uses the new (kind, payload) shape internally but exposes the same text-string return. No caller of M.chat passes opts.tools so tool_call kinds are silently dropped. repl.lua minimal companion edit: ask_ai's chat_stream callback updated to the new shape. Text path unchanged; tool_call kinds are no-op placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming functional between #5 and #6. 
Smoke-tested against hossenfelder/8082 (post-#23 fix): - text-only: ok=true, kind="text" deltas received - with opts.tools: model emitted one tool_call, accumulator collected id + name=get_weather + args={"city":"Paris"} correctly across fragmented deltas - opts.tools={}: server accepted (field omitted as required) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:20:32 +00:00
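The accumulator behavior listed above (keyed by index, default index 0, id and name from the opening delta, arguments concatenated, emit in index order on finish) can be sketched as follows. The delta shape follows the OpenAI streaming format; `new_accumulator`, `feed`, and `finish` are hypothetical names, and the real broker.lua code is not shown in the log.

```lua
-- Hypothetical sketch of a streaming tool_call accumulator.
local function new_accumulator()
  local slots, max_index = {}, -1
  local acc = {}

  -- Feed the delta.tool_calls array from one streamed chunk.
  function acc.feed(tool_call_deltas)
    for _, d in ipairs(tool_call_deltas) do
      local idx = d.index or 0            -- some servers omit index
      local slot = slots[idx]
      if not slot then
        slot = { id = nil, name = nil, arguments = {} }
        slots[idx] = slot
        if idx > max_index then max_index = idx end
      end
      -- id and name arrive on the opening delta; later deltas leave them nil.
      slot.id = slot.id or d.id
      local fn = d["function"] or {}
      slot.name = slot.name or fn.name
      if fn.arguments then
        slot.arguments[#slot.arguments + 1] = fn.arguments
      end
    end
  end

  -- Called on finish_reason "tool_calls": emit in index order, then reset.
  function acc.finish()
    local calls = {}
    for i = 0, max_index do
      local s = slots[i]
      if s then
        calls[#calls + 1] = {
          id = s.id, name = s.name,
          arguments = table.concat(s.arguments),  -- raw JSON string
        }
      end
    end
    slots, max_index = {}, -1
    return calls
  end

  return acc
end
```

The arguments stay a raw JSON string here; decoding is left to the caller, as the commit message describes.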
marfrit	c736d0e129	renderer: tool-call begin/end frames Phase 2 commit #4 per docs/PHASE2.md §12. Adds M.tool_call_begin(name, args) and M.tool_call_end(content, is_error) for visual parity with the existing exec_begin/exec_end frame. Visual cadence: ─── tool: <name (cyan)> ─── <args, dim, truncated at 200 chars; omitted if empty/"{}"> <content> ─── ok ─── (dim, success) ─── error ─── (red status word inside dim rule, on is_error=true) Same rule glyph (━) and ANSI palette as the exec frame so the user reads tool dispatch and shell dispatch the same way. Smoke-tested all five shapes: success with args / empty args / error / long args truncated / empty content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:11:42 +00:00
marfrit	7c221a8aae	context: tool turns + tool_calls on assistant; use_tool_role fallback Phase 2 commit #3 per docs/PHASE2.md §12. Three concrete edits per §3 context.lua row (the BLOCKER-fold-in from review): (a) Loosen Context:append shape-per-role: assistant may carry empty content if tool_calls is non-empty; role:"tool" requires tool_call_id + content. (b) Preserve tool_calls / tool_call_id on store (Phase 1 :append built {role, content} only and silently dropped extras). (c) Extend to_messages() with two emission modes selected by use_tool_role: true (default) — OpenAI-standard role:"tool" + assistant turns with tool_calls (wrapped as {id, type:"function", function:{name, arguments}}). false (fallback) — collapse assistant-with-tool_calls + its following role:"tool" turns into a single assistant text turn with synthesized "[tool: name]\n<args>\n[result]\n <content>" body; merge consecutive assistant turns so the trailing post-tool-result text doesn't yield asst/asst back-to-back (same strict-template gotcha PHASE0.md §6 warned about for user/user). Alternation assert added (N4): role:"tool" turns must trace back through zero-or-more prior tool turns to an assistant-with-tool_calls. Catches sub-loop bugs at append time. Orphan tool turns rejected. pending_exec_output behavior unchanged per §3 row: buffer persists across tool-call sub-loops, flushes on next genuine user turn (B4). Smoke-tested §12 verify-row #3: (i) default mode round-trip — 5 OpenAI-shape messages, tool_calls + tool_call_id preserved. (ii) fallback mode round-trip — collapsed into 3 messages (system/user/assistant), tool_calls + role:"tool" not emitted. (iii) multi-call: 2 tool_calls in one assistant turn followed by 2 tool replies, both modes render correctly. (iv) orphan tool turn after user — assertion fires. (v) B4: pending_exec_output survives a tool sub-loop, flushes on next :append_user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:10:47 +00:00
marfrit	0fde77fe35	safety: confirm_tool_call gate with auto-approve policy Phase 2 commit #2 per docs/PHASE2.md §12. Implements just the per-call confirm-gate surface; Phase 3 stubs (is_destructive, norris_step) stay unimplemented with their error() bodies. M.confirm_tool_call(name, args, cfg) checks cfg.mcp.auto_approve for: - exact match on "<alias>.<tool>" - "<alias>.*" glob covering a whole server Miss falls back to a [y/N] readline prompt. Empty or non-"y" answer rejects (matches the existing confirm_cmd UX from PHASE0 §10). Pretty-printing renders args as compact JSON, truncated at 80 chars with "..." suffix so one-line prompts stay readable. Smoke-test passes all eight cases per §12 verify-row #2: exact match / alias glob → auto-approve, no prompt miss + y / n / empty / nil-cfg → prompt shown, expected verdict empty args / long args → clean rendering, truncation works Note: PHASE0 §4 module-layout had a "lands in Phase 2" hint on the norris_step stub; the actual landing is Phase 3 per PHASE0 §11 row 3. Comment in safety.lua updated to clarify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:07:57 +00:00
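The auto-approve lookup described above (exact match, then a whole-server glob) reduces to two table lookups. The sketch uses the "." separator that was current at this commit and assumes the policy is a table mapping key strings to `true`; the function name and that table shape are assumptions.

```lua
-- Hypothetical sketch of the auto-approve check.
-- ASSUMPTION: policy is a map such as { ["boltzmann.list_dir"] = true, ["fs.*"] = true }.
local function auto_approved(name, policy)
  if not policy then return false end
  -- Exact "<alias>.<tool>" match.
  if policy[name] then return true end
  -- "<alias>.*" glob covering every tool on that server.
  local alias = name:match("^([^.]+)%.")
  return alias ~= nil and policy[alias .. ".*"] == true
end
```

A miss (return value `false`) is where the [y/N] prompt would take over.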
marfrit	6c194deea0	mcp: JSON-RPC client + ffi/curl status_code; PHASE0 §4 amended First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled: mcp.lua (new, 153 lines): - M.connect(url, opts) returns a Session. - Session:initialize() round-trips initialize + notifications/initialized + tools/list. Caches tools for session lifetime (lmcp announces capabilities.tools.listChanged = false; no refetch). - Session:list_tools() returns the cached tool list. - Session:call_tool(name, args) returns (result_table, kind) where kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per the §4 error split. Folded HTTP-level failure into transport_error. - Per-server Bearer auth via opts.auth_token or opts.auth_env env-var indirection. - Captures protocolVersion mismatch as a warning string rather than aborting (lmcp doesn't negotiate — N3 in review). ffi/curl.lua extension: - Add curl_easy_getinfo to ffi.cdef. - Pre-cast as getinfo_long; helper get_response_code() fetches CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFOTYPE_LONG \| 2). - M.post now returns (body, status_code) on transport success; (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers reading only the first slot are unaffected. docs/PHASE0.md §4: - Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9. - Module-stability invariant clarified: rename prohibition is what matters; adding new files is additive. Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4: - initialize: ok (7 tools cached) - list_dir /tmp: ok (1.2KB content) - read_file /nonexistent: ok (boltzmann's baseline §3 quirk — isError:false even on failure; content is authoritative) - nope_tool: rpc_error (code=-32601) - wrong auth: transport_error (HTTP 401) - unreachable host: transport_error (DNS failure) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:06:39 +00:00
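Two details from the commit above lend themselves to a short sketch: the numeric value of `CURLINFO_RESPONSE_CODE`, and the four-way `kind` split returned by `call_tool`. The Python below is illustrative only. curl.h defines the constant as `CURLINFO_LONG + 2`, which matches the decimal 2097154 quoted in the commit. The mapping of `result.isError` to `handler_error` and of a non-JSON body to `transport_error` are assumptions, and `classify` is a hypothetical helper name.

```python
import json

CURLINFO_LONG = 0x200000                    # type tag for long-valued info
CURLINFO_RESPONSE_CODE = CURLINFO_LONG + 2  # 2097154


def classify(body, status):
    """Map a (body, status) pair to (payload, kind).

    body is None when the transport itself failed; status then holds
    the error message, mirroring a (nil, errmsg) return.
    """
    if body is None:
        return status, "transport_error"
    if not 200 <= status < 300:
        # HTTP-level failures (e.g. 401) are folded into transport_error.
        return body, "transport_error"
    try:
        msg = json.loads(body)
    except ValueError:
        return body, "transport_error"
    if not isinstance(msg, dict):
        return body, "transport_error"
    if "error" in msg:
        return msg["error"], "rpc_error"
    result = msg.get("result") or {}
    if result.get("isError"):
        return result, "handler_error"
    return result, "ok"
```

Checking the HTTP status before decoding is what keeps a non-JSON-RPC 401 body from being mis-decoded.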
marfrit	f5daa6afc0	docs/PHASE2: re-review NITs — M.post shape, getinfo cdef, content flattening normative Three follow-up NITs from the post-fold-in review: (1) Disambiguate M.post return shape: (body, status_code) on transport success regardless of status; (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers reading only the first slot are unaffected. (2) Note that the M.post extension requires extending ffi.cdef to include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal 2097154, CURLINFOTYPE_LONG \| 2) and a long[1] out-param shim. Implementation detail the commit #1 author will need. (3) Move the tool-result content-flattening rule from §12 risk note into §4 normative spec (forward-referenced both ways) — §4 is where a future reader looking for the tool-invocation contract will scan. No design changes; clarifications only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:02:35 +00:00
marfrit	d3570ccea4	docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs Independent review of the formulate+analyze+plan draft surfaced design gaps that would have shipped as silent bugs. Resolutions applied: BLOCKERs: B1 context.lua impact widened — Phase 1 :append asserts content and discards extra fields. Need (a) shape-per-role assert, (b) preserve tool_calls/tool_call_id on store, (c) emit from to_messages(). B2 ffi/curl.M.post extended to return (body, status_code). lmcp's 401 returns a non-JSON-RPC body that would have been mis-decoded. B3 §3 typo schema -> inputSchema. B4 pending_exec_output × tool-call sub-loop interaction specified. B5 §3/§12 broker dependency contradiction — broker takes opts.tools from caller; no layering inversion. CONCERNs: C1 M.chat return polymorphism dropped (no consumer). C2 tool_calls[].index absent fallback: default to 0. C3 Re-injection stores accumulated text, not hard-coded empty. C4 :mcp connect failure: no auto-retry, status-log once. C5/C7 JSON-RPC error AND argument-parse failure both synthesize a role:"tool" turn — keeps strict-template alternation legal exactly the way PHASE0 §6 demanded for exec output. C6 §9 confirms §4 amendment is additive (preserves §3 invariant). NITs: N3 protocolVersion fallback (lmcp doesn't negotiate). N4 Alternation assert in Context:append. N7 Model-routing bug filed as aish#23. N8 Day-one fallback test for use_tool_role=false in commit #3. Manifest status: Plan (review folded). Status line and Resolutions sections updated; commit-by-commit roadmap reflects revised specs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:00:07 +00:00
marfrit	447e430254	docs/PHASE2 §12: implementation plan — 7-commit roadmap Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua → repl.lua → config.lua. Same cadence as Phase 0/1. Risks called out explicitly: - Empty tools array → omit field entirely (some servers reject []) - isError:false on actual failure (baseline §3 finding) → pass content through regardless; let model read error text - JSON-RPC error from tools/call → aish status only, no tool turn appended, no model recovery - max_tool_depth=8 cap on tool-call sub-loop - Argument JSON streaming may yield malformed JSON → status warn + skip - Q18 fallback (use_tool_role=true default; prefix-injection plumbed but dead-coded; verify can flip) - Connect-at-startup is sequential (~30ms × N); fine for N≤3 Two items left open for review: Q18 default flip vs ship-true-flip-on-fail, and whether :mcp connect should re-fetch tools after the initial cache. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:37:27 +00:00
marfrit	c5116bf129	docs/PHASE2-baseline: pre-implementation measurements Phase 7 (verify) anchor. Captures: - MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms on LAN; LLM is the latency floor, not the transport). - 6 fixture responses saved to /tmp/aish-baseline/ covering initialize, notifications/initialized, tools/list, tools/call success, isError, and JSON-RPC unknown-tool error. - Baseline design finding: boltzmann's read_file returns isError:false even on failure (error text in content). aish should treat content as authoritative, isError as advisory; feed both to the model. PHASE2.md §4's "pass-through" stance already accommodates; no manifest amendment needed. - Streaming tool_calls delta shape verified against hossenfelder; matches PHASE2.md §5. - Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence ignoring the CMD: contract — once MCP tools exist the model gets a structured path that doesn't depend on prose-formatting compliance. - Module pre-state at Phase 1 head `5878f73`: LOC + capability snapshot per module so Phase 2 diff has a reference frame. - Two boltzmann-proxy blockers (SSE buffering, model-field routing) carried explicitly into Phase 7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:34:32 +00:00
marfrit	5878f7347b	docs/PHASE2: analyze — lmcp v0.5.4 probed, transport simplified Live-probed against lmcp v0.5.4 (boltzmann) + hossenfelder broker proxy: Transport simpler than spec: - lmcp only implements POST-per-RPC with Connection: close; no held-open SSE channel. Combined with capabilities.tools.listChanged=false, no client-side listener is needed in v1. Drops the planned M.get_sse addition to ffi/curl.lua — Phase 1's M.post covers MCP. Bearer auth is universal across the fleet — config schema grew auth_token (literal) and auth_env (env-var indirection) fields per server, mirroring PHASE0 §10's key_env convention. Streaming tool_calls delta shape verified — accumulator by `index`, function.arguments arrives as chunked JSON-string. Matches the formulate-phase assumption in §5. Resolutions: Q17 transport abstraction — POST-only, no SSE channel for lmcp. Q21 error mapping — result.isError (model-recoverable, feed back as tool turn) vs JSON-RPC error (unknown method/tool, transport-level). Q18 role:"tool" turn — accepted at protocol level (live-probed). Mistral-nemo template verification blocked by the hossenfelder model-field routing bug; full closure carried to Phase 7 verify. Open-end recorded in §11: the hossenfelder proxy routes every request to the loaded fast model regardless of model field, blocking Phase 2 testing against mistral-nemo specifically. Parallel to the SSE buffering issue at marfrit/aish#15; same root (boltzmann proxy code). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:51:03 +00:00
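The "accumulator by `index`" behaviour noted in the commit above can be sketched like this. It is a Python illustration under the assumption that each streamed delta follows the OpenAI chat-completions shape (`tool_calls[].index`, `id`, `function.name`, `function.arguments`). The function name `accumulate` is hypothetical, and the default of 0 for a missing `index` follows the C2 resolution recorded in the later review fold-in commit.

```python
def accumulate(deltas):
    """Fold streamed tool_calls fragments into complete calls.

    Each delta may carry tool_calls fragments; the id and function name
    typically arrive once, while function.arguments arrives as chunks
    of one JSON string that must be concatenated in order.
    """
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            idx = frag.get("index", 0)  # absent index: default to 0
            slot = calls.setdefault(
                idx, {"id": None, "name": None, "arguments": ""}
            )
            if frag.get("id"):
                slot["id"] = frag["id"]
            fn = frag.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

The concatenated `arguments` string should only be parsed as JSON once the stream has ended, since individual chunks are not valid JSON on their own.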
marfrit	ec6793c93c	docs/PHASE2: formulate — MCP client + tool-calling bridge Phase 2 formulate manifest. Three pillars per PHASE0.md §11 row 2: mcp.lua (JSON-RPC 2.0 over HTTP+SSE, target: lmcp), tool-calling bridge (OpenAI tools field <-> MCP tools/call), and the safety.lua authorization gate (per-call confirm + auto_approve policy). Resolves PHASE0.md §13 Q6–Q10: Q6 CMD: + tool-calls coexist; substrate §3 unchanged Q7 config-declared servers + runtime :mcp connect Q8 per-call confirm default, auto_approve policy in config Q9 hybrid system prompt: static frame + dynamic tools body field Q10 streaming-from-day-one on Phase 1 SSE; on_delta widens to (kind, payload) New questions tracked in §11 (Q17–Q22): transport abstraction, role:tool vs prefix injection (mistral-nemo template verification needed), large tool-result handling, parallel dispatch, error mapping, aish-as-MCP-server (parked). §4 module layout amended: mcp.lua slots between broker.lua and router.lua. The amendment is documented in this manifest; the actual §4 table edit lands when implementation starts (Phase 2 implement phase). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:23:53 +00:00
marfrit	f7c3c32aa2	.claude: project-shared permission allowlist for read-only MCP/Bash Adds .claude/settings.json: 10 read-only entries (mcp__*__read_file, mcp__hub-tools__remote_list_hosts, Bash(ping *), Bash(dig *)) auto-allowed in any aish session, reducing per-call permission prompts during routine file-reading and host probing. Generated via /fewer-permission-prompts. settings.local.json stays user-private (per-user ad-hoc grants); .gitignore now covers it so it doesn't accidentally land in commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 08:08:26 +00:00
marfrit	7d62eb5659	review followups: pcall shield, :resume guard, shell quoting, nits CONCERNs from the Phase 1 review pass: ffi/curl.lua: - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or in the parse loop itself) is captured into cb_error and surfaced after curl_easy_perform rather than propagating across the FFI callback boundary (which LuaJIT documents as process-fatal). The EOS flush path gets the same shield. Errors return (nil, "callback: <msg>") from post_sse. history.lua: - sh_singlequote() escapes shell metacharacters; the mkdir -p and ls -1 shell-outs no longer double-quote (where $(...) and $VAR still expand) — single-quote with embedded-' escaping is the safe form. - M.load now returns (turns, meta) instead of (meta, turns). turns is ALWAYS a table on success, never nil-when-no-header; failure path is the unambiguous (nil, err). Callers can `if not turns then` without the previous ambiguity. repl.lua :resume updated to the new shape. repl.lua :resume: - Refuse to resume into a non-empty ctx — silent overwrite was the Q15 default, but the review surfaced the no-undo / no-warning failure mode. User must :reset (or :save then re-launch) to express intent. The current session's on-disk log is unaffected either way. NITs: - ffi/libc.lua READ_BUF: comment noting it's module-shared and Phase 1 has no reentrant readers; revisit when that changes. - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred from Phase 1 — no consumer here") rather than the previous dangling "(or here)". Regression suite verifies: - history.load new signature on success + failure paths - shell-quoted history.dir with $ doesn't trip - aish scripted run: ctx with 2 turns refuses :resume anchor with a clear status; user must :reset first Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:05:23 +00:00
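The single-quote escaping that the commit above attributes to `sh_singlequote()` is the standard POSIX idiom: wrap the string in single quotes and rewrite each embedded `'` as `'\''`. The Python below is a sketch of that idiom, not the project's Lua function. The round-trip through `shlex.split` is included only as a sanity check.

```python
import shlex


def sh_singlequote(s):
    """Quote s so a POSIX shell treats it as one literal word.

    Inside single quotes nothing expands ($VAR, $(...), backticks), so
    the only character needing care is the single quote itself: close
    the quote, emit an escaped quote, and reopen.
    """
    return "'" + s.replace("'", "'\\''") + "'"


if __name__ == "__main__":
    tricky = "dir with $HOME and $(id) and 'quotes'"
    quoted = sh_singlequote(tricky)
    # shlex.split parses with POSIX shell quoting rules, so the quoted
    # form should come back as exactly one word equal to the input.
    assert shlex.split(quoted) == [tricky]
    print(quoted)
```

Double quotes would not be enough here, because `$(...)` and `$VAR` still expand inside them.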
marfrit	1f1065157e	review BLOCKER: PTY input forwarding + raw mode toggle Phase 1 review caught a structural gap: executor.exec only drained the PTY master fd, never forwarded user keystrokes — vim/less/htop/nano would render and hang on input. PHASE1.md §5 specified bidirectional multiplex but only the read leg landed. tcgetattr/tcsetattr were also missing, so even with input forwarding the parent's line discipline would buffer until newline (breaking single-key UIs). ffi/libc: - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or (nil, err) when fd isn't a tty (scripted / piped-stdin runs) - M.restore_termios(fd, saved) - struct pollfd + M.poll (POLLIN constant) executor: - multiplex(sess): poll(stdin, master); reads master on any revents (POLLHUP fires when child closes its slave end, not POLLIN — the revents != 0 check catches both); forwards stdin keystrokes to master; loop exits when master read returns 0 (EOF / child gone) - stdin polling is only enabled when stdin_is_tty (set_raw succeeded); piped-stdin runs (tests / scripted) would otherwise drain queued aish commands into the child of the current cmd, swallowing them - raw mode is restored before returning so the user lands back at the aish prompt in canonical mode renderer + repl: - exec_output(out, code) split into exec_begin() (top rule, before spawn) + exec_end(code) (closing rule with exit, after wait). PTY multiplex streams the body live to stdout in between; the renderer never re-prints the body. PHASE1.md §3: - tcgetattr/tcsetattr changed from "optional" to "required for single-key UIs to work — done-criteria #2"; poll added to the libc row description. 
Verified: - non-interactive smoke (echo / false / exit 7 / ls /nonexistent / printf multi-line) — all exit codes correct, output streamed live, a\nb\nc\n preserved byte-for-byte - scripted-stdin run reaches all expected lines (no stdin draining into a non-interactive child) - aish prompt + framed exec block + exit-code line all render in correct order Live interactive verification (vim / less / htop in a real terminal) still needs a user-test pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:00:53 +00:00
marfrit	a75118b2ae	readline: bind() via rl_bind_keyseq; repl reserves \C-n no-op Phase 1 readline binding wiring per PHASE1.md §7. ffi/readline: M.bind(seq, lua_fn) -> bool Wraps lua_fn as a C callback (signature `int (int, int)` per readline's rl_command_func_t) and registers it via rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0). Trampolines are pinned in module-local state so they outlive the bind call — readline retains the function pointer for the process lifetime. Rebinding the same seq frees the previous trampoline. Bound handlers are pcall-wrapped so a Lua error doesn't crash readline's input loop. repl: Binds \C-n to a no-op that emits "[aish] Norris mode not yet implemented (Phase 3)" Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode) replaces the body with the actual toggle. Smoke covers bind / rebind-same-seq (exercises the :free path) / bind-different-seq with no errors. Live keyboard verification waits on user-test. Phase 1's 8(+1) inner loop is now functionally through `implement`; next inner phase is `verify` (review pass) followed by memory-update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:26:58 +00:00

81 Commits