marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	4d5f93aaa5	repl: :diff meta + _git_clean_cmd helper (Phase 6 commit #3 ) User-driven git diff injection. The model sees the diff on the next ask_ai turn through the existing exec_output channel. Changes: - _git_clean_cmd(subcmd_and_args) helper near _scan_project_tree. B1: every git invocation that flows into context MUST use `--no-pager -c color.ui=never`. Forkpty makes git think stdout is a TTY, enabling both color and the pager's keypad/line-clear escapes — these would pollute the captured context block. The helper is the single chokepoint; commit #4's @<r1>..<r2> retry will reuse it. - :diff [<args>] meta: - Reads cwd at meta invocation (R6: differs from :tree's scan-time cwd capture; documented in §5). - Runs `_git_clean_cmd("diff " .. args)` via executor.exec. - Empty output -> "(no diff): <label>" status, no context append. - Non-zero exit -> "diff failed (exit N): <label>" status, no context append. git's stderr already streamed to the user via executor.exec's live multiplex, so the failure reason is visible. - Success -> appends "[diff <label>]\n<output>" via ctx:append_exec_output. Label is "(working tree)" for empty args, else verbatim args. - Status confirms injection size: "diff injected: <label> (N bytes)". - HELP gains :diff line with three example arg shapes; N3-resolved (no `staged` alias — the meta is thin pass-through to git's grammar). Smoke verified across four scenarios in an ephemeral test repo: - Working-tree dirty -> 110-byte diff injected, no ANSI escapes - --cached -> 118-byte staged diff injected, clean - garbage..nonexistent -> exit 128, status + skip - Clean working tree -> "(no diff)", status + skip Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:17:18 +00:00
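The helper and the status strings described in this commit could be sketched as follows. This is a hypothetical reconstruction, not the code in repl.lua: only the flag pair, the labels and the status formats come from the message; the function names `diff_label` and `diff_result`, the check order, and counting N as output bytes are assumptions.

```lua
-- Hypothetical sketch of the single git chokepoint (B1).
local function git_clean_cmd(subcmd_and_args)
  -- --no-pager and color.ui=never keep pager and color escapes out of context.
  return "git --no-pager -c color.ui=never " .. subcmd_and_args
end

-- Label is "(working tree)" for empty args, else the args verbatim.
local function diff_label(args)
  if args == nil or args:match("^%s*$") then return "(working tree)" end
  return args
end

-- Maps (args, output, exit_code) to (context_block or nil, status_line).
-- Assumption: the exit code is checked before the empty-output case.
local function diff_result(args, output, exit_code)
  local label = diff_label(args)
  if exit_code ~= 0 then
    return nil, ("diff failed (exit %d): %s"):format(exit_code, label)
  end
  if output == "" then
    return nil, "(no diff): " .. label
  end
  local block = "[diff " .. label .. "]\n" .. output
  return block, ("diff injected: %s (%d bytes)"):format(label, #output)
end
```

A caller would run `git_clean_cmd("diff " .. args)` through its executor and append the returned block only when it is non-nil.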
marfrit	d1dce832da	repl: _scan_project_tree + :tree meta + auto_tree (Phase 6 commit #2) First user-visible Phase 6 verb. Builds on commit #1's compose_project plumbing — sets ctx.project from either the :tree meta or the cfg.project.auto_tree startup hook. Changes: - _scan_project_tree(dir, opts) helper near _run_hook: git -C <dir> ls-files --cached --others --exclude-standard when <dir> is inside a git repo (N4: no subshell); find <dir> -mindepth 1 -maxdepth <depth+1> -type f -not -path '*/.*' otherwise. Returns (body, info={file_count, truncated, in_git}). Sorted paths, truncated to max_chars (default 4096 per cfg). - :tree [<depth>|refresh|off] meta: no arg -> scan with config defaults, resets _project_opts; <N> -> scan with depth=N, caches as _project_opts; refresh -> re-scan with cached _project_opts (else defaults); off -> clear ctx.project AND ctx._project_opts (R5). Status line reports file count + truncation flag + which backend fired (git/find). - cfg.project.auto_tree startup hook before the main loop: if true, scan libc.getcwd() once and set ctx.project. Failures status-logged once; REPL continues. Default off (existing configs unchanged). - HELP updated with three :tree lines. Plan §12 deliberately defers the config.lua example block to commit #6 along with the status header bump (R9 single-owner). Smoke (aish repo cwd): - :tree no-arg -> "33 files (git ls-files)" - :tree refresh -> same - :tree off -> "project tree cleared" - :tree 1 -> rescans - cfg.project.auto_tree=true at startup -> auto-injected status visible Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:14:36 +00:00
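The "sorted paths, truncated to max_chars" step can be illustrated in isolation. The sketch below is an assumption about shape only: `build_tree_body` is a hypothetical name, it takes an already-collected path list rather than running git or find, truncates on whole-path boundaries, and omits the `in_git` field.

```lua
-- Hypothetical sketch: sort paths, then cut the listing at max_chars
-- without splitting a path in the middle.
local function build_tree_body(paths, max_chars)
  max_chars = max_chars or 4096            -- default cap named in the commit
  local sorted = {}
  for i, p in ipairs(paths) do sorted[i] = p end
  table.sort(sorted)

  local lines, used, truncated = {}, 0, false
  for _, p in ipairs(sorted) do
    local cost = #p + 1                    -- path plus one newline
    if used + cost > max_chars then
      truncated = true
      break
    end
    lines[#lines + 1] = p
    used = used + cost
  end
  return table.concat(lines, "\n"),
         { file_count = #sorted, truncated = truncated }
end
```

Here `file_count` reports the number of files found, not the number that survived the cap, so a status line can show both the count and the truncation flag.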
marfrit	c4fc7fde01	context: [project] block plumbing (Phase 6 commit #1 ) Foundation for Phase 6 — adds the field + composer + composition order with no callers yet. Nothing sets ctx.project; the meta hookup and startup auto-inject land in commit #2. Changes: - Context.new gains `project` (string, nil) and `_project_opts` (cached scan opts for `:tree refresh`; R7). - compose_project(text) helper mirrors compose_background / compose_summary. Returns "" for nil/empty; otherwise emits "\n\n[project]\n" + text. - to_messages inserts compose_project BETWEEN compose_background and compose_summary so the model reads memory facts -> project tree -> earlier conversation -> NORRIS suffix. - Same Norris-suppression guard as the other two dynamic blocks (R-C1 / R-C4 parity; planner stays on goal anchor). - Context:reset preserves ctx.project (R8 — matches the Phase 4 memory_items rule; startup-injected facts survive a user-driven context reset). Smoke verified (14/14 inline cases): - project nil -> no [project] block in sys_content - project set -> block present with contents - ordering: [background] < [project] < [earlier conversation summary] - norris_active suppresses all three; NORRIS suffix still appears - :reset clears turns/pending_exec_output/summary; preserves memory_items AND project Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:08:54 +00:00
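The composition order and the Norris guard described here might look roughly like this. `compose_project` follows the message (empty string for nil/empty, otherwise "\n\n[project]\n" plus text); `compose_system` and the pre-rendered `background_block`, `summary_block` and `norris_suffix` fields are hypothetical stand-ins for the real to_messages internals.

```lua
-- Returns "" for nil/empty text, otherwise a [project] block.
local function compose_project(text)
  if text == nil or text == "" then return "" end
  return "\n\n[project]\n" .. text
end

-- Hypothetical assembly: base, [background], [project], summary, suffix.
-- All three dynamic blocks are dropped while Norris is active.
local function compose_system(base, ctx)
  local parts = { base }
  if not ctx.norris_active then
    parts[#parts + 1] = ctx.background_block or ""
    parts[#parts + 1] = compose_project(ctx.project)
    parts[#parts + 1] = ctx.summary_block or ""
  end
  parts[#parts + 1] = ctx.norris_suffix or ""
  return table.concat(parts)
end
```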
marfrit	261b230be8	docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs Independent agent review of PHASE6 (manifest + baseline + plan at `4407029`). Status header: Plan -> Plan + review fold-in. BLOCKERs (RESOLVED in-place): R1. §4 fence detector's `outside`-state dropped the leading `'``'` chunk of a split fence — contradicted B2's local-model split-fence requirement (4-char median chunk size). Algorithm rewritten: outside-state now holds a tail (up to 10 chars) when the chunk's suffix could be a fence prefix; flushes on next push. Same accumulator pattern as the secrets streaming rehydrator. R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives in repl.lua (where _shq and executor are accessible); renderer.lua exposes set_highlight(enabled, detected, highlight_fn) and calls back. Keeps renderer.lua free of the executor require. CONCERNs (FOLDED): R3. PTY raw-mode toggle on every code-block render — smoke-test for cursor flicker / SIGWINCH races before locking in. Risk row 5. R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream CLI canonically takes a path with extension. Implement-time check required; fallback path documented (extension-based tmpfile + path arg). Added to risk row 5 + open-at-plan. R5. :tree off semantics clarified — one-shot clear of ctx.project + ctx._project_opts; no "disabled" flag. R6. cwd-coupling difference between :diff (call-time) and :tree (scan-time) now documented in §5. R7. :tree refresh opts caching specified — caches ctx._project_opts; `:tree refresh` reuses last explicit opts. R8. :reset preserves ctx.project (parity with memory_items per Phase 4). §12 commit 1 smoke updated. R9. Status-bump duplication between §12 commits 5e and 6 resolved — commit 6 owns the bump. NITs (APPLIED): N1. §4 algorithm pseudocode now includes SOL/post-newline anchor (mid-line backticks in prose don't open a fence). N2. 
_detect_treesitter() gained a comment explaining the popen pattern doesn't gate on exit code (B3). N3. :diff staged shorthand dropped — meta is a thin pass-through to git's own grammar. N4. _scan_project_tree switched from `cd && git ...` to `git -C <dir> ...` — no subshell, more idiomatic. N5. Open-at-plan dir-arg bullet dropped (already decided in §6); replaced with R3 + R4 implement-time verification items. N6. §11 wording on #52 left as-is (cosmetic only). PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for implementation phase 6 of the inner loop pending user gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:06:19 +00:00
marfrit	4407029296	docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap Status header: Analyze -> Plan. Baseline findings folded into the design sections: §1 (highlighter pillar) gains B4: tree-sitter absent on every probed host; :highlight on emits install-hint when missing. §4 (highlighter sketch) revised per B3: io.popen():close() doesn't expose exit codes in LuaJIT. Route via executor.exec("cat tmp | tree-sitter ...") which uses pty.spawn+waitpid and returns code reliably. Tmpfile design retained (avoids ARGMAX + shell-escape). §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git invocation must use `--no-pager -c color.ui=never` to suppress the color/keypad/line-clear escapes forkpty triggers. Factored recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff and the @-mention diff retry. New §12 Implementation Plan — 6 commits, bottom-up: 1. context.lua: ctx.project + compose_project + composition order 2. repl.lua: _scan_project_tree helper + :tree meta 3. repl.lua: :diff meta + _git_clean_cmd helper (B1) 4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6) 5. renderer.lua + repl.lua: tree-sitter detect + fence filter + :highlight meta (B3-revised tmpfile dispatch) 6. config.lua project example + status -> Implement Per-commit risk index + smoke criteria. Highlighter (commit 5) is the largest experimental surface — placed last so the rest of Phase 6 ships even if highlighter slips. Order is independent enough that swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each commit individually green. Things deliberately not split: _shq reuse, lang map duplication for v1, streaming-rehydration order (rehydrate -> highlight -> emit inherits naturally from existing chunk pipeline). Two items open at plan time, resolve at implement: _scan_project_tree dir-arg vs hardcoded getcwd; :highlight status probing tree-sitter --print-langs. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:01:40 +00:00
marfrit	9f50206ca6	docs/PHASE6-baseline: substrate probes ahead of implementation Six findings from probing the world before tree-sitter / diff / project tree implementation lands: B1. `git` subcommands through executor.exec emit ANSI color + DEC keypad/line-clear escapes by default (forkpty enables interactive mode). `:diff` impl MUST use `git --no-pager -c color.ui=never <args>`. Same flags apply to any future git verbs. B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks (median 4 chars, max 13) AND splits code fences across boundaries (`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers big chunks (median 26 chars), fences intact. The §4 fence-aware filter accumulator design covers both — confirmed necessary by local-model behavior. B3. LuaJIT io.popen():close() does NOT return exit codes — Lua 5.1 contract, not 5.2+. Breaks the A4 highlighter resolution. Revised: route via `executor.exec("cat tmp | tree-sitter ...")` which uses pty.spawn + waitpid and returns (out, code) reliably. B4. tree-sitter CLI absent on both probed hosts (noether, higgs). Highlighter is opt-in by design; absent-CLI path should emit a clear install hint, not silently no-op. B5. Project-tree envelope: aish 32 files / 449 chars; similar local repos 15-25 files; scan time ~1-5ms. The 4096-char default cap accommodates ~290 typical paths. Large repos handled via tree_depth or cap tuning per existing §9 risk row. B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for the B3-revised tmpfile-roundtrip pattern. No structural changes to formulate/analyze. B1, B3, B4 will fold into PHASE6.md §4 / §5 / §1 during plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:58:56 +00:00
marfrit	ad52fe4538	docs/PHASE6: analyze — substrate probes + Q resolutions in-place Analyze pass against tree at `f596743`. All 6 formulate-time questions resolved without structural changes; pillar shapes intact. A1. renderer.lua surface clean — assistant_delta/flush accumulate via stream_buf; fence-aware filter slots in between chunk receipt and emit without touching anything else. A2. executor.exec via pty.spawn already handles git diff / find; cwd-aware (inherits from libc.chdir). No new IO model. A3. context composition order locked: base + [background] + [earlier summary] + NORRIS. [project] inserts between [background] and [earlier summary]; Norris-suppression guard inherited. A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3 (io.popen("w") + redirect stdout to tmp file; io.open reads back). Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per code block. A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only; matches :history / :sessions / :safety check. A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first, then ref-range retry when path fails AND token contains "..". @origin/main..feature works naturally; @../sibling.txt unaffected. A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1. @-mention echo via readline is a different code path; deferred to v2 (added to §8 out-of-scope). A8. Q-T1 RESOLVED: project tree captured at scan time, not auto-refreshed on cd. v1 verb is :tree refresh; cd-intercept auto-refresh deferred to v2. A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard` in repos; find fallback outside. Custom globs deferred to v2. A10. expand_mentions punct-peel doesn't strip "/", so a token like "HEAD~1..HEAD," peels the trailing comma cleanly and the diff retry catches the cleaned token. A11. Auto-injection ordering: memory load → tree scan → first ask_ai. Composition reads memory facts before file tree. A12. [project] Norris-suppressed (parity with R-C1/R-C4). 
§3 module-changes table: context.lua row updated (project string + compose_project + ordering note + Norris suppression). §4 highlighter code sample replaced with the tmpfile-roundtrip resolved form. §5 @-mention section rewritten as tiered-resolution with worked examples. §8 out-of-scope gained three v2-polish items (echo highlight, cd-intercept auto-refresh, custom globs) so they're tracked. §10 Open Questions table now shows all 6 Qs with their resolutions inline. §9 Risks row for @-mention collision updated to point at A6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:53:58 +00:00
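The tiered @<token> resolution from A6 can be expressed as a small decision function. This is a sketch under assumptions: `resolve_mention` and the injected `file_exists` predicate are hypothetical names, and the actual diff retry (running git) is left to the caller.

```lua
-- Hypothetical sketch of A6: file lookup first, then a ref-range retry
-- only when the path fails AND the token contains "..".
local function resolve_mention(token, file_exists)
  if file_exists(token) then
    return "file", token
  end
  if token:find("..", 1, true) then   -- plain find, no pattern magic
    return "diff", token
  end
  return nil, token                    -- leave the literal @token in place
end
```

With this ordering a token such as `../sibling.txt` that exists on disk never reaches the diff branch, while `origin/main..feature` falls through to it.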
marfrit	f596743834	docs/PHASE6: formulate — tree-sitter highlight + diff + project tree Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6: 1. Tree-sitter syntax highlighting hooks External `tree-sitter` CLI when present, no-op otherwise. Honors PHASE0 §3 (no compiled extensions). Toggleable at runtime; off by default so existing UX is unchanged. 2. Diff-aware code injection :diff [args] meta + @<ref1>..<ref2> @-mention extension. Shells out to `git diff`; output flows through the existing exec-output context channel. 3. Project-level file-tree context :tree meta + optional cfg.project.auto_tree startup inject. git ls-files in a repo, find fallback otherwise. Composed into the system prompt as a new [project] block between [background] and [earlier summary]. Suppressed under Norris (R-C1 / R-C4 parity). Module changes: renderer.lua (fence-aware highlight filter), context.lua (compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions extension). No new module files in v1. Doc covers: scope + done-when criteria, tech decisions table, module changes table, per-pillar deep dive with example code, UX surface summary, out-of-scope list, risks, and 6 open questions to resolve in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree). Scope confirmed via AskUserQuestion: all three subsurfaces in scope; tree-sitter approach is external CLI w/ no-op fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:47:00 +00:00
marfrit	d852acadc2	repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args Plumbs the secrets.lua module (commit `e4b818b`) into the conversation pipeline. Hook points: ask_ai — scrub_messages(ctx:to_messages(), mode) before call_broker; rehydrate streamed deltas via streaming_rehydrator so the user sees real values while text_parts accumulates rehydrated chunks (final_resp is plain — CMD: / DELEGATE: extractors see plain values) MCP dispatch — dispatch_tool_call rehydrates the args table before sess:call_tool so the trusted MCP server receives real values (the model emitted placeholders because it saw a scrubbed context) DELEGATE: & :delegate — scrub sub_msgs before broker.chat; rehydrate sub_text before appending to context, so future turns see real values restored Phase 5 summarize-on-evict — scrub sum_msgs before broker.chat; rehydrate the reply that becomes ctx.summary :memory summarize — same scrub + rehydrate pair Mode resolution per call: model_cfg.redact → config.secrets.default → "vault+autodetect" if vault loaded, else "off". ctx storage convention: PLAIN values throughout. The scrub happens at the egress (broker call) per the active redact mode; ctx.turns never holds placeholders for content the user typed or executor produced. The model's own emissions (assistant tool_call arguments) may carry placeholders because the model saw the scrubbed context — rehydrated at MCP dispatch and otherwise harmless on re-serialization (idempotent re-scrubbing). New meta: :secrets [status] vault entries, placeholders allocated this session, active broker mode. Never prints actual values (vault file is itself a secret per gotcha 7). :secrets check <text> dry-run scrub against the active broker's mode — shows the output transformation. Documented in config.lua with a commented-out block + per-broker redact field example. 
Deferred to a follow-up issue (clearly scoped): - safety.lua broker call sites (Norris main loop, is_destructive LLM second-opinion probe) — same wiring pattern, but they don't currently see secrets_session; needs threading through helpers. - @-mention file content is appended PLAIN to ctx and scrubbed at egress alongside the rest of the user turn (covered by the ask_ai scrub). - exec output streamed live to terminal is pre-scrub (user sees real values in their own shell — by design); the captured-for-context copy is scrubbed at egress alongside the rest. This is the "full scope" implementation chosen via AskUserQuestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:38:23 +00:00
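The per-call mode resolution chain (model_cfg.redact, then config.secrets.default, then a vault-dependent fallback) is simple enough to sketch directly. The function name `resolve_redact_mode` and the `vault_loaded` boolean argument are assumptions made for illustration.

```lua
-- Hypothetical sketch of the mode resolution order from the commit:
-- model_cfg.redact -> config.secrets.default -> vault-dependent fallback.
local function resolve_redact_mode(model_cfg, config, vault_loaded)
  if model_cfg and model_cfg.redact then
    return model_cfg.redact
  end
  local secrets_cfg = config and config.secrets
  if secrets_cfg and secrets_cfg.default then
    return secrets_cfg.default
  end
  return vault_loaded and "vault+autodetect" or "off"
end
```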
marfrit	e4b818b0e9	secrets: vault loader + scrub/rehydrate + autodetect (#13 commit 1) Standalone module — no wiring yet. Lands the substrate for issue #13: secrets.load(path) — vault file loader; refuses non-0600 secrets.make_session(vault) — per-conversation scrub/rehydrate state session:scrub(text, mode) — substitute literals (+ autodetect) session:rehydrate(text) — restore placeholders secrets.streaming_rehydrator — chunk-boundary-tolerant streaming wrapper Mode semantics (chosen per call by the caller): "off" — identity, no mapping "vault" — vault literals only, placeholders, rehydratable "vault+autodetect" — + heuristic regexes, placeholders, rehydratable "stealth" — + heuristic regexes, opaque decoys, one-way Placeholders are stable across the session: the same literal always maps to the same $AISH_SECRET_NNN slot, so re-scrubbing the same context is idempotent and the model sees a consistent vocabulary. AUTODETECT_PATTERNS (ordered; longer prefixes first): sk-or-v<N>-... OpenRouter ghp_/gho_/ghs_ GitHub PATs AKIA<16> AWS access keys eyJ...x.y.z JWTs sk-... OpenAI (generic; matched after openrouter) -----BEGIN ... PRIVATE KEY----- SSH/GPG key headers Streaming rehydrator: tolerates a placeholder split across SSE chunks ($AISH_SE then CRET_001). It holds back the trailing partial-match in a buffer, emits the rest, and resolves on the next push or flush. Verified with 20 unit cases (vault sub, stable mapping, autodetect across all label kinds, stealth decoys, mode=off, streaming with mid-placeholder splits, non-placeholder $-prose pass-through). Vault file mode enforcement: 0600 only — matches ssh's behavior for ~/.ssh/id_rsa. Loud failure (status + skip) if mode is wider. Next commit (issue #13 follow-up): wire into broker / tool dispatch / display, add per-broker `redact` policy, :secrets meta, config example block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:36:39 +00:00
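The hold-back behaviour of the streaming rehydrator can be sketched as below. This is not the secrets.lua implementation: `new_rehydrator` and `is_partial` are hypothetical names, the placeholder is assumed to be exactly `$AISH_SECRET_` followed by three digits, and `map` is assumed to be a table from placeholder to real value.

```lua
local PREFIX = "$AISH_SECRET_"
local FULL_LEN = #PREFIX + 3   -- assumption: NNN is exactly three digits

-- True when `tail` could still grow into a complete placeholder.
local function is_partial(tail)
  if #tail >= FULL_LEN then return false end
  if #tail <= #PREFIX then
    return PREFIX:sub(1, #tail) == tail
  end
  return tail:sub(1, #PREFIX) == PREFIX
     and tail:sub(#PREFIX + 1):match("^%d+$") ~= nil
end

local function new_rehydrator(map)
  local held = ""
  local function rehydrate(text)
    -- Unknown placeholders map to nil and are therefore left untouched.
    return (text:gsub("%$AISH_SECRET_%d%d%d", map))
  end
  local r = {}
  function r.push(chunk)
    local buf = held .. chunk
    local pos = buf:match(".*()%$")          -- index of the last "$", if any
    if pos and is_partial(buf:sub(pos)) then
      held = buf:sub(pos)                    -- keep the partial match back
      buf = buf:sub(1, pos - 1)
    else
      held = ""
    end
    return rehydrate(buf)
  end
  function r.flush()
    local out = rehydrate(held)
    held = ""
    return out
  end
  return r
end
```

Ordinary prose containing "$" passes through immediately unless the "$" is the last thing in the chunk, in which case it is released on the next push or on flush.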
marfrit	cdf4e86679	repl: sub-broker delegation via DELEGATE: marker (closes #6 ) Cost and context-window control: a "heavy" preset's model can offload work to a cheaper preset without spending its own tokens on the result. Example: deep model is mid-conversation and asks fast to summarize a 20k-line build log; the summary comes back as exec-output for the next turn, deep stays small. Marker syntax: DELEGATE: <preset> "<prompt>" (Single or double quotes; one DELEGATE per line; lines without the quoted shape are dropped — let the user write about delegation in prose without accidental dispatch.) Dispatch flow (mirrors CMD: / CMD&: extraction): 1. ask_ai's stream completes 2. extract_delegate_lines walks the final response 3. For each {preset, prompt}: broker.chat(config.models[preset], ...) synchronously; result is appended via ctx:append_exec_output as "[delegate <preset>]: <result>" 4. The model sees the delegate result on its next turn Implementation choice — marker over tool: option 1 from the issue ("inline delegate marker") works with any model regardless of tool_calls support. Option 2 (aish_delegate as a tool dispatched in the existing Phase 2 sub-loop) is the better UX for capable models since it returns the result mid-turn — filed as follow-up if needed. 
Meta surface: :delegate <preset> <prompt> one-shot direct invocation (useful for testing without depending on the model emitting DELEGATE:, and as a manual "ask <preset> something" verb) Scope: - Plan mode: emits "PLAN: DELEGATE <preset> <prompt>" without dispatch - Norris: not extended; the planner's model anchor would conflict with mid-plan switching (R-C3-adjacent risk) - No self-delegation guard: each DELEGATE is a separate broker call, not recursive; a delegate result reaching the next turn could contain another DELEGATE but that's bounded by max_tool_depth-style iteration cap on the parent - No cost prompt: configuring a paid cloud preset already implies consent to spend on it - Unknown preset → error status + exec-output note "[delegate X failed: unknown preset]" Extractor unit-tested with 8 cases (single-quote, double-quote, multi-line prose, empty prompt, no-quotes, case-sensitive, wrong prefix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:29:09 +00:00
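A line-based extractor for the `DELEGATE: <preset> "<prompt>"` marker could be sketched with one Lua pattern and a back-reference for the closing quote. The details are assumptions: the marker is taken to start at column one, the preset charset is guessed as `[%w_-]`, and empty prompts are dropped; the real extract_delegate_lines may differ on each point.

```lua
-- Hypothetical sketch: one DELEGATE per line, quoted prompt required.
local function extract_delegate_lines(response)
  local found = {}
  for line in response:gmatch("[^\n]+") do
    -- Capture 1: preset, capture 2: opening quote, capture 3: prompt.
    -- %2 requires the closing quote to equal the opening one.
    local preset, _, prompt =
      line:match("^DELEGATE:%s+([%w_-]+)%s+([\"'])(.-)%2%s*$")
    if preset and prompt ~= "" then
      found[#found + 1] = { preset = preset, prompt = prompt }
    end
  end
  return found
end
```

Lines that mention delegation in prose, omit the quotes, or use a different case do not match and are ignored.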
marfrit	f94d16fc89	repl: background CMD&: with handle/poll (closes #8 ) Builds, long-running network calls, and file watches no longer block the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL to spawn the command in the background, return immediately, and poll for completion between user inputs. Process model: shell-wrapped to avoid needing fork()/execv() FFI. nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $! The child is reparented to init; we hold only the PID and the path to the .status sidecar. Completion is detected by the .status file existing (the wrapper writes it as its last act). No waitpid needed — the child isn't ours after the popen subshell exits. Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is created lazily at startup (mkdir -p). Requires history.dir to be configured; without it CMD&: emits an error status and the model sees an "[bg failed to start]" exec-output note. check_bg_done() runs at the top of each main-loop iteration alongside check_every_due(). When a job is detected as exited, the REPL: - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>" - appends the same string to ctx as exec output, so the model sees the completion on its next turn (natural follow-up: "ok the build finished; let me check the log") Meta surface: :bg-spawn <cmd> start a bg job directly (no AI needed; also useful for testing without depending on the model emitting CMD&:) :bg-list show running/done jobs (id, pid, state, runtime, cmd) :bg-output <id> dump the log file to stdout :bg-kill <id> SIGTERM (note: only delivers if the PID is still the actual command — long-lived shells may need pkill by name) Scope (deliberately limited for v1): - No callback-mode readline: bg completion detection is pre-prompt, not mid-readline. If a build finishes while the user is typing, notification comes when they hit Enter. 
- Permission policy DSL (#9) does NOT apply to CMD&: — the asynchronous gating model wasn't designed for the y/N flow. Filed as follow-up if needed. - Norris not extended: helpers.exec_cmd is still synchronous; the planner doesn't dispatch bg jobs. - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>" and a "[plan] would bg-run: <cmd>" exec-output note, no spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:25:55 +00:00
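The shell wrapper and the .status sidecar check described in this commit might be built as follows. `shq`, `bg_command` and `read_exit_status` are hypothetical names; the wrapper string follows the shape quoted in the message, with the log and status paths single-quoted as an added precaution.

```lua
-- POSIX single-quote escaping: ' becomes '\''.
local function shq(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Builds the detached wrapper; the trailing `echo $!` prints the PID.
local function bg_command(cmd, log_path, status_path)
  local inner = ("(%s) > %s 2>&1; echo $? > %s")
    :format(cmd, shq(log_path), shq(status_path))
  return "nohup sh -c " .. shq(inner)
      .. " </dev/null >/dev/null 2>&1 & echo $!"
end

-- A job counts as done once its .status file exists.
-- Returns the exit code, or nil while the job is still running
-- (or the file is not yet readable as a number).
local function read_exit_status(status_path)
  local f = io.open(status_path, "r")
  if not f then return nil end
  local line = f:read("*l") or ""
  f:close()
  return tonumber(line)
end
```

A poll loop in the style of check_bg_done() would call `read_exit_status` for every running job before showing the next prompt.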
marfrit	67d80e1047	repl: :every recurring prompts via pre-prompt due-check (closes #11) In-session timer that re-injects a prompt every N seconds. "Watch this thing" workflows (`:every 5m "check journalctl -u nginx for errors"`) without spawning a separate aish process. Approach: minimum viable. check_every_due() runs at the top of each main-loop iteration — timers fire BETWEEN user inputs, not during readline waits or active broker calls. Mid-stream firing would require rewriting ffi/readline to callback mode (substantial scope). If the on-the-fly firing requirement matters in practice it can land as a follow-up issue against the readline FFI. Meta: :every <interval> <prompt> schedule (interval: 30s | 5m | 2h | bare int) :every list show jobs (id, interval, time-until-next, model, prompt) :every cancel <id> remove Defaults: - Model: "fast" preset if defined in config.models, else active model (per the issue's "recurring prompts should default to fast preset"). - In-memory only — jobs don't persist across restarts. - Suppressed while ctx.norris_active (planner stays on goal anchor). - Quotes around the prompt are stripped if present. - Each tick fires the job once, re-schedules next_fire = now + interval (no catch-up if the interval elapsed multiple times during a long user input). Tested: 11 interval-parse cases (30s, 5m, 2h, bare int, malformed), load via require, end-to-end :every list / cancel surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:23:07 +00:00
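Interval parsing and the no-catch-up rescheduling rule can be sketched together. The names `parse_interval` and `tick` are hypothetical, and treating a bare integer as seconds is an assumption (the message only says "every N seconds" in its summary).

```lua
local UNIT_SECONDS = { s = 1, m = 60, h = 3600 }

-- "30s" | "5m" | "2h" | bare int -> seconds, or nil when malformed.
-- Assumption: a bare integer means seconds.
local function parse_interval(text)
  local n, unit = text:match("^(%d+)([smh]?)$")
  if not n then return nil end
  local secs = tonumber(n) * (UNIT_SECONDS[unit] or 1)
  if secs <= 0 then return nil end
  return secs
end

-- Fires each due job once and schedules the next run from `now`,
-- so missed intervals are not replayed.
local function tick(jobs, now, fire)
  for _, job in ipairs(jobs) do
    if now >= job.next_fire then
      fire(job)
      job.next_fire = now + job.interval
    end
  end
end
```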
marfrit	17e62c0326	safety: permission policy DSL — allow/confirm/deny rule lists (closes #9 ) The confirm_cmd boolean was too coarse: true interrupts every harmless ls; false ungates everything. Most workflows want trust for read-only ops while still gating writes/network/sudo. New config: permissions = { allow = { "^ls%s", "^cat%s", "^git status" }, confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" }, deny = { "^ssh%s+root@", "^curl%s+http[^s]" }, } Verdict order: deny > confirm > allow. First match in the chosen category wins. Unmatched defaults to "confirm". Patterns are Lua patterns (not regex) per PHASE0.md §3 — no compiled extensions. Verdict behavior in the interactive CMD: loop: - allow → run without prompt - deny → status line, skip - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true) Backward compat: - permissions unset + confirm_cmd=true → always confirm - permissions unset + confirm_cmd=false → always allow - permissions set → policy table is authoritative Scope deliberately limited to the interactive AI-suggested CMD: gate. Norris autonomous mode keeps its own safety.is_destructive machinery (combining the two would double-gate or replace the LLM probe — both non-obvious behavioral changes that belong in their own issues). User-typed shell-routed lines (`router.classify → "shell"`) and :exec also bypass the policy by design — those are direct user intent. New introspection: :perms list — show the configured rule lists :perms check <cmd> — report verdict + matching rule (debug) safety.classify_command is exported and unit-tested with 12 cases covering each category, priority order (deny > allow on overlap), and both fallback paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:20:56 +00:00
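The verdict order (deny > confirm > allow, first match wins within a category, unmatched defaults to "confirm") and the backward-compat rules can be sketched as below. The commit names safety.classify_command but does not show its signature; the argument list and the second return value (the matching pattern) are assumptions.

```lua
-- Returns the first Lua pattern in `patterns` that matches `cmd`, or nil.
local function first_match(cmd, patterns)
  for _, pat in ipairs(patterns or {}) do
    if cmd:find(pat) then return pat end
  end
  return nil
end

-- Hypothetical signature: (cmd, permissions table or nil, legacy boolean).
local function classify_command(cmd, permissions, confirm_cmd)
  if not permissions then
    -- Legacy behaviour when no policy table is configured.
    return (confirm_cmd and "confirm" or "allow"), nil
  end
  for _, verdict in ipairs({ "deny", "confirm", "allow" }) do
    local pat = first_match(cmd, permissions[verdict])
    if pat then return verdict, pat end
  end
  return "confirm", nil   -- unmatched commands still ask
end
```

Because the patterns are plain Lua patterns, `"^ls%s"` matches `ls -la` but not a bare `ls`, which therefore falls through to the default "confirm".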
marfrit	518c01a9f5	repl: user-defined skills loader (closes #2 ) PHASE0.md §5.2 froze the meta-command set at compile time. Skills let the user package repeatable workflows (project queries, prompt templates, audit routines) without forking aish. Discovery: scan ~/.config/aish/skills/*.lua at startup (or whatever $AISH_SKILLS_DIR points at — used both by users with non-XDG layouts and by CI). Each module exports: return { name = "<meta-cmd-name>", -- must match [%w_-]+ description = "<one-line>", -- shown by :skills run = function(args, h) ... end, } Helpers passed to run(): h.ask(text) — same path as :ask (with @path expansion) h.status(s) — emit "[aish] s" h.exec(cmd) — run a shell command (subject to plan_mode, hooks) h.model() — current active model name h.ctx — raw Context object (advanced) h.config — the loaded config table Validation rejects modules that miss name/run, use whitespace in the name, or collide with an existing meta command (built-in or earlier skill). Each rejection emits a status line so the user sees why a skill didn't appear. New meta command :skills lists what's loaded (sorted, with description). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:17:00 +00:00
marfrit	fb15f7a690	repl: pre/post CMD hooks via config.hooks (closes #3 ) Optional shell scripts trigger around every CMD: execution. Use cases: audit logging, auto-format-after-edit, custom safety gates beyond the existing confirm_cmd boolean. Config shape: hooks = { pre_cmd = "/path/to/pre-script", post_cmd = "/path/to/post-script", } Contract per hook invocation: - The command line is piped to the hook on stdin. - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the moment of dispatch), AISH_CWD (libc.getcwd() result). - Hook stdout is streamed live to the terminal via executor.exec (so the user sees its output regardless of exit status). Pre-hook: non-zero exit aborts the command and emits a status line including the exit code. last_exec_code is set to the hook's exit so the {last_status} prompt template variable reflects the abort. Post-hook: exit code is ignored (the spec says so); only the visible stdout matters. Runs after the command's exec_end frame. Tested with success, abort, and stdin-matches-env paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:16:11 +00:00
marfrit	ce1378edee	repl: fix {name} pattern to accept underscores (#10 follow-up) %w excludes underscore in Lua patterns, so {ctx_used}, {ctx_max}, {cwd_short}, {last_status} were left literal in the prompt. Use [%w_] to accept identifiers with underscores. Surfaced during higgs smoke test of the new template. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:57 +00:00
marfrit	d738f339cb	repl: configurable prompt template via config.shell.prompt (closes #10) At-a-glance situational awareness: see the active model, context fill, mode flags, and cwd in the prompt itself — prevents "wait, am I still in plan mode?" surprises. Example config: shell = { prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ", } Variables (substituted via {name}): {model} active preset name {ctx_used} char/4 token heuristic (Phase 0 §8; accurate is Q1) {ctx_max} config.context.token_budget {turn} #ctx.turns {cwd} libc.getcwd() (chdir-aware; PWD env may drift) {cwd_short} cwd with $HOME -> ~ {last_status} last exec exit code, "" if none yet {mode} "norris" | "plan" | "normal" Default behavior unchanged when shell.prompt is unset — keeps the "[aish:<model>]>" form with norris ⚡ and plan markers. Side wiring: - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD). - run_shell records exit code into last_exec_code for {last_status}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:43 +00:00
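The `{name}` substitution, including the `[%w_]` fix from the follow-up commit, can be sketched with a single gsub. `render_prompt` and `cwd_short` are hypothetical names, and leaving unknown variables literal is an assumption about the desired fallback.

```lua
-- Substitutes {name} from `vars`; unknown names are left literal.
-- [%w_] is needed because %w alone excludes the underscore.
local function render_prompt(template, vars)
  return (template:gsub("{([%w_]+)}", function(name)
    local v = vars[name]
    if v == nil then return nil end   -- nil keeps the original match
    return tostring(v)
  end))
end

-- cwd with $HOME replaced by "~", only on a path-component boundary.
local function cwd_short(cwd, home)
  if home and home ~= "" and cwd:sub(1, #home) == home then
    local rest = cwd:sub(#home + 1)
    if rest == "" or rest:sub(1, 1) == "/" then
      return "~" .. rest
    end
  end
  return cwd
end
```

With the earlier `{(%w+)}` pattern, a variable such as `{ctx_used}` stops matching at the underscore and stays in the prompt verbatim, which is the symptom the follow-up commit describes.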
marfrit	10d2501cff	repl: peel trailing punctuation from @path mentions (#7 follow-up) Natural-language prose like "look at @README.md, then..." or "@foo.lua." at sentence end previously failed to expand because the trailing comma/period was included in the path. Now: if the raw token doesn't resolve, peel trailing chars from [.,;:?!)] one at a time until the path resolves or no more peels are possible. On success, the peeled chars are emitted verbatim AFTER the closing fence so the original punctuation is preserved. Surfaced during higgs smoke test (TC: "say the first line of @README.md, then stop" — the trailing comma broke resolution). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:11:22 +00:00
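The peel loop reads naturally as code. In this sketch `resolve_with_peel` is a hypothetical name and file existence is injected as a predicate so the logic can be exercised without touching the filesystem.

```lua
-- Peels trailing characters from [.,;:?!)] one at a time until the
-- path resolves. Returns (path, peeled_suffix), or nil when no peel helps.
local function resolve_with_peel(token, exists)
  local path, tail = token, ""
  while true do
    if exists(path) then return path, tail end
    local last = path:sub(-1)
    if last == "" or not last:find("^[.,;:?!)]$") then
      return nil
    end
    path = path:sub(1, -2)
    tail = last .. tail        -- preserve original order of peeled chars
  end
end
```

The returned suffix is what the caller would emit verbatim after the closing fence so the sentence punctuation survives.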
marfrit	bb374c2ad2	repl: @path mention expansion in input lines (closes #7 ) Saves the user from manual copy/paste: typing "show me @repl.lua" or "compare @config.lua and @config.example.lua" auto-expands each mention to a fenced code block carrying the file contents, language-tagged by extension, and feeds the composed text to the broker. Wired on the "ai" branch of the input loop and inside :ask. Meta and shell branches pass through unchanged — "@foo" in shell context is a literal program argument; meta commands store text verbatim. Trigger rule: "@" must follow start-of-string or whitespace — avoids false positives on email addresses ("user@example.com") and shell short-options. Path extends to next whitespace. Other behavior: - Language tag derived from extension via a small lookup; unknown extensions yield an untagged fence. - Files over 32 KB are truncated head/tail (16K + 8K) with a marker. - Missing files leave the literal "@path" token in place and emit a "[aish] @path: not found" status — non-fatal, lets the user correct the path and re-type. - Each successful expansion emits "[aish] @path expanded (N bytes [, truncated])" so the user sees what was inlined. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:10:54 +00:00
marfrit	dccd9e90cc	repl: :plan toggle — CMD: lines become PLAN: notes (closes #5 ) Plan mode is a safer entry point than going straight to Norris: the user iterates with the model on what to do, sees each CMD: as a PLAN: line, and the would-have-run notes feed back into the next-turn context so the model can refine without side effects. Toggle with :plan (flip), :plan on, :plan off. Off by default. When plan_mode is true: - CMD: lines extracted from the assistant turn print as "PLAN: <cmd>" - The note "[plan] would run: <cmd>" is appended via the existing append_exec_output channel — same context flow as a real exec, so the model sees its proposed action on the next turn. - run_shell is NOT called; no executor, no cd intercept, no capture. The prompt shows "[aish:<model> plan]>" while active (mirrors the norris ⚡ marker convention). Orthogonal to Norris: plan_mode only gates the interactive CMD: extraction path. Norris has its own halt protocol; combining them is not supported (the planner would be confused by skipped actions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:09:08 +00:00
marfrit	81c3b1b44a	main: non-interactive `-p`/`--prompt` one-shot mode (closes #4) Adds `aish -p "<text>"` for Unix-pipeline composability: tail app.log | aish -p "any anomalies?" aish -p "summarize: $(curl -sS https://...)" The flag bypasses repl.lua entirely. On invocation: 1. Stdin: when not a TTY, read to EOF and prepend to the prompt as a fenced block. ffi.libc.isatty(0) gates the read so interactive `aish -p "..."` (no pipe) doesn't hang. 2. Resolve config.models[config.default_model]. 3. Stream broker.chat_stream replies to stdout; finalize with newline. 4. Exit 0 on success, 1 on broker error, 2 on arg / config error. Behavior NOT in -p mode (kept simple per the issue's "no repl.lua involvement"): - No MCP, no tool loop, no Norris, no routing, no memory injection. - "CMD:" lines in the reply are printed verbatim, NOT executed; callers can grep / pipe them as they wish. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:06:27 +00:00
marfrit	0700dce881	repl: enforce budget per Norris step, not just post-loop (closes #51) PHASE3.md §2 specifies sliding-window eviction "including mid-Norris-session if the loop runs long". The implementation only called enforce_budget() once, after the planning loop exited, so for a tight max_turns with a multi-step Norris session the model saw the FULL conversation throughout, defeating context budgeting and preventing R-C3 (NORRIS suffix goal anchor surviving eviction) from being exercised end-to-end. Move status_evictions(ctx:enforce_budget()) inside the while loop so it runs after every safety.norris_step return. Drop the now-redundant post-loop call. Surfaced during TC #38 (Qwen3-30B-A3B, max_turns=4) where the "oldest 4 turns evicted" status arrived AFTER NORRIS DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:34 +00:00
marfrit	0c93e31186	repl: warn on stale MCP auto_approve keys (closes #33) Auto-approve policy keys that point at unconnected aliases, mistyped tool names, or malformed forms were silently ignored, leaving the user with surprise confirm prompts and no diagnostic. validate_auto_approve() now walks config.mcp.auto_approve at startup (after the MCP connect loop) and after each :mcp connect. For each key: - "alias__*": warn if alias has no live session - "alias__tool": warn if alias unknown OR tool not in registry - anything else: warn as malformed (not in alias__tool form) Non-fatal. The re-run on :mcp connect lets a key that referenced a not-yet-connected alias become live without a restart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:05:08 +00:00
marfrit	299dcce78f	repl: validate MCP tool names against Bedrock regex (closes #32) Anthropic-via-Bedrock enforces ^[a-zA-Z0-9_-]{1,128}$ on tool names. We already moved the alias separator from "." to "__" (commit `f26cbd9`), but a future MCP server could still register a tool whose name (or whose combination with the alias) contains characters outside that class, silently breaking calls to strict providers. connect_mcp now warns at startup for: - aliases containing "__" (would misparse on tool dispatch) - emitted alias__name strings that violate the regex or exceed 128 chars Behavior preserved: validation is informative-only. tools_schema() still emits the offending tool; local llama.cpp users accept lenient names and shouldn't be penalized for downstream strictness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:04:29 +00:00
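Lua patterns have no bounded `{1,128}` quantifier, so a check equivalent to the regex above needs a separate length test. The following is a sketch under that assumption; `is_strict_tool_name` and `check_alias_tool` are illustrative names, not functions confirmed to exist in repl.lua.

```lua
-- Equivalent of ^[a-zA-Z0-9_-]{1,128}$ using a Lua pattern plus a length check.
local function is_strict_tool_name(name)
  return #name >= 1 and #name <= 128
    and name:find("^[A-Za-z0-9_%-]+$") ~= nil
end

-- Hypothetical helper: collect warnings for one alias/tool pair.
local function check_alias_tool(alias, tool)
  local warnings = {}
  if alias:find("__", 1, true) then
    warnings[#warnings + 1] = "alias contains '__' (would misparse on dispatch)"
  end
  local full = alias .. "__" .. tool
  if not is_strict_tool_name(full) then
    warnings[#warnings + 1] = "name rejected by strict providers: " .. full
  end
  return warnings
end

for _, w in ipairs(check_alias_tool("fs", "read.file")) do
  print("[warn] " .. w)
end
```

The explicit character class is used instead of `%w` because `%w` follows the C locale and could accept characters outside ASCII letters and digits.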
marfrit	8e0e735e15	repl: fallback patterns — add 'Could not connect to server' (CURLE_COULDNT_CONNECT) Surfaced by autonomous run of TC #48: pointing models.fast at http://localhost:9999 (port closed, host resolves) emits "transport: Could not connect to server" — CURLE_COULDNT_CONNECT (7) which the Phase 5 fallback pattern set didn't include. Added "Could not connect to server" to FALLBACK_PATTERNS in repl.lua. Now fallback fires for the full set of common libcurl/HTTP transport failure shapes: HTTP 5xx server-side HTTP 404 model_not_found HTTP 408 gateway request timeout Couldn't resolve host CURLE_COULDNT_RESOLVE_HOST Could not connect to server CURLE_COULDNT_CONNECT (← added) Connection refused Timeout was reached CURLE_OPERATION_TIMEDOUT (variant A) Operation timed out CURLE_OPERATION_TIMEDOUT (variant B) Re-tested #48 end-to-end: fast pointed at dead port → fast fails → status fires → cloud (anthropic/claude-haiku-4.5 via openrouter) responds normally Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:49:13 +00:00
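A matcher for these transport failures could be sketched as below. The pattern strings are taken from the list above, but the exact contents and shape of FALLBACK_PATTERNS in repl.lua are an assumption; the HTTP status entries are left out because their literal strings are not given here.

```lua
-- Assumed pattern strings; the real table in repl.lua may differ.
local FALLBACK_PATTERNS = {
  "Couldn't resolve host",
  "Could not connect to server",
  "Connection refused",
  "Timeout was reached",
  "Operation timed out",
}

-- Strip the "transport: " prefix, then look for a known failure signature.
-- Returns the matched signature, or nil when the error is not fallback-eligible.
local function fallback_reason(err)
  if type(err) ~= "string" then return nil end
  local body = err:gsub("^transport: ", "")
  for _, pat in ipairs(FALLBACK_PATTERNS) do
    if body:find(pat, 1, true) then -- plain find: no pattern magic
      return pat
    end
  end
  return nil
end

print(fallback_reason("transport: Could not connect to server"))
```

Plain-text `find` avoids having to escape characters such as `'` or `.` that would otherwise need care in a Lua pattern.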
marfrit	d72689f709	config: deep model → deepseek-coder-v2-lite (temporary) qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per /v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B active params; fast enough that the 30-min timeout from the qwen3-30b config was wildly over-budget. Switched to deepseek-coder-v2-lite for the time being. Restore qwen3-30b when the slot is back up. Live-probed: YES/NO destructive probe via the deep model preset returns "YES." in ~4.8s — well within the new 5-min timeout, and fast enough that the Phase 3 LLM second-opinion path is now functional again without falling back to "fail-safe YES" on every ambiguous command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:42:23 +00:00
marfrit	a9b39cd435	config: Phase 5 routing + summarize-on-evict example (commit #5 ) Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only; commented-out example showing: - routing.auto (per-request auto-routing toggle) - routing.classes (class → model mapping; reasoning = nil by default per R-N2 cost-safety) - routing.fallback (single-hop retry to cloud on transport fail) - routing.fallback_model (default "cloud" if uncommented) - context.summarize_on_evict + summarizer_model + max_summary_chars (shown INSIDE the context = {...} block above) All defaults OFF — Phase 5 is opt-in across the board. Existing configs without `routing` or `context.summarize_on_evict` behave identically to Phase 4. Phase 5 implementation complete: #1 `3e57824` router.classify_model + 31-case corpus #2 `03497b5` context summarize_fn callback + summary block in to_messages #3 `40ea0b4` repl routing + fallback + summarize_fn wiring + :route/:fallback #4 - (bundled into #3 since meta cmds are trivial additions) #5 (this) config example block Phase 5 verify-partial: - router.classify_model: 31/31 case corpus passes - context summarize-on-evict: mock callback fires correctly (additive + compress paths), summary suppressed under Norris, :reset clears it - repl meta cmds: :route on/off/classes/check + :fallback on/off all work; :route check reports class + "routing currently disabled" suffix when auto is off (N1) Verify-pending: end-to-end with real broker (route a code question, see it land on deep; kill local backend, see fallback fire to cloud). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:32:20 +00:00
marfrit	40ea0b49b0	repl: routing + fallback + summarize_fn wiring (Phase 5 commit #3 ) Phase 5 commit #3 per docs/PHASE5.md §3 / §11. Wires the Phase 5 machinery into the REPL. make_summarize_fn(): Returns a closure that maps (prior_summary, evicted_turns) onto a broker.chat call against cfg.context.summarizer_model (default "fast"). Three dispatch paths matching the R-B1 callback contract: evicted == nil → compress signal prior present → additive ("extend the prior summary ...") prior nil → first-time ("summarize the following turns") All use a system prompt enforcing "exactly one short paragraph", max_tokens=300, timeout_ms=30000. Broker failure returns nil so Context falls back to silent eviction. Renderer status is logged on failure for visibility. Context construction: Build ctx_opts as a fresh table (copies config.context to avoid mutating it), adds summarize_fn ONLY when config.context.summarize_on_evict == true. Defaults stay OFF — Phase 4 regression coverage. Fallback machinery: - FALLBACK_PATTERNS table with 7 transport-error signatures (HTTP 5xx, 408, 404-model_not_found, DNS, connection refused, "Timeout was reached", "Operation timed out") - fallback_reason(err) strips the "transport: " prefix and matches. - should_fallback(err) gates on cfg.routing.fallback. - call_broker(cfg, name, msgs, on_delta, opts) wraps broker.chat_stream: • tracks any_delta via wrapped on_delta callback • retries ONCE against cfg.routing.fallback_model (default "cloud") when err matches AND no deltas arrived (N3: mid-stream failures aren't retried — partial text would duplicate) • emits "[aish] local <name> failed (<reason>); retrying via <fb>" status before the retry call ask_ai routing: - Routing decision taken ONCE on entry (R-C2). req_name/req_cfg locals carry the choice through every tool-sub-loop iteration. - active_name/active_cfg are NOT mutated — user's :model selection survives the request. - When config.routing.auto is true, classify_model(text, config) is invoked. 
Non-nil model + non-active → swap req_cfg + status line. - broker.chat_stream call replaced with call_broker (fallback wrap). Meta cmds: :route on/off — toggle cfg.routing.auto at runtime :route classes — show class → model mapping :route check <text> — report classify_model result with "(routing currently disabled)" suffix when auto is off (N1) :fallback on/off — toggle cfg.routing.fallback at runtime HELP updated with the four new commands. Smoke-tested: aish boots, all four metas behave correctly, classify_model returns reasoning class for "Explain how MMAP works on Linux" (the model slot is nil because no classes are configured by default — N2 cost-safety). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:31:14 +00:00
marfrit	03497b5eea	context: summarize-on-evict callback + summary block (Phase 5 commit #2 ) Phase 5 commit #2 per docs/PHASE5.md §3 / §6. Context.new opts additions: - summarize_fn(prior_summary, evicted_turns) -> string\|nil callback per R-B1 canonical signature: (nil, [turns]) → first-time summarize (str, [turns]) → additive: extend prior summary (str, nil) → compress: re-summarize the prior nil return → silent eviction (Phase 0 behavior preserved) - max_summary_chars (default 2000) — when ctx.summary grows past this, the callback is invoked AGAIN with the compress signal so the summary stays bounded across long sessions Context.summary (string\|nil) is the rolling summary state. Composed into the SYSTEM MESSAGE (not as a turns[] entry — A3 resolution avoids system/system back-to-back). compose_summary() emits: [earlier conversation summary] <ctx.summary> between [background] and the NORRIS suffix. Both [background] and [earlier summary] are SUPPRESSED when ctx.norris_active (R-C4 — mirrors R-C1 from Phase 4; planner stays focused on its goal). enforce_budget() rewrite: - Collects the evicted pair before removing. - Calls summarize_fn(self.summary, pair) under pcall — wraps any callback error so a broken summarizer can't crash the REPL. - Updates self.summary if callback returned non-empty string. - If new summary exceeds max_summary_chars, invokes compress pass (callback with evicted=nil). - Removes pair from turns (same final state as Phase 0). Context:reset() clears the summary alongside turns + pending_exec_output. 
Smoke-tested with a mock summarizer over a 10-turn context with max_turns=4 and max_summary_chars=80: - 6 turns evicted to bring count down to 4 - Callback fired 4 times (3 additive + 1 compress when summary crossed 80 chars) - to_messages includes [earlier conversation summary] block - Under norris_active=true, summary suppressed (block absent) - :reset clears ctx.summary Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:18:37 +00:00
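The eviction flow and the three-way callback contract can be sketched as follows. This is a simplified stand-in for context.lua, not its actual code: the `Context` table, the `safe_summarize` helper, and the mock summarizer are illustrative assumptions.

```lua
local Context = {}
Context.__index = Context

function Context.new(opts)
  return setmetatable({
    turns = {},
    summary = nil,
    max_turns = opts.max_turns,
    max_summary_chars = opts.max_summary_chars or 2000,
    summarize_fn = opts.summarize_fn,
  }, Context)
end

-- Run the callback under pcall; return a non-empty string or nil.
local function safe_summarize(fn, prior, evicted)
  local ok, result = pcall(fn, prior, evicted)
  if ok and type(result) == "string" and result ~= "" then
    return result
  end
  return nil
end

function Context:enforce_budget()
  local evicted_count = 0
  while #self.turns > self.max_turns do
    -- Evict the oldest pair of turns.
    local pair = { table.remove(self.turns, 1), table.remove(self.turns, 1) }
    evicted_count = evicted_count + #pair
    if self.summarize_fn then
      local s = safe_summarize(self.summarize_fn, self.summary, pair)
      if s then self.summary = s end
      -- Compress pass: evicted == nil signals "re-summarize the prior summary".
      if self.summary and #self.summary > self.max_summary_chars then
        local c = safe_summarize(self.summarize_fn, self.summary, nil)
        if c then self.summary = c end
      end
    end
  end
  return evicted_count
end

-- Mock summarizer that records which dispatch path was taken.
local calls = {}
local ctx = Context.new({
  max_turns = 4,
  max_summary_chars = 80,
  summarize_fn = function(prior, evicted)
    if evicted == nil then
      calls[#calls + 1] = "compress"
      return prior:sub(1, 20)
    end
    calls[#calls + 1] = prior and "additive" or "first"
    return (prior or "") .. string.rep("x", 30)
  end,
})
for i = 1, 10 do
  ctx.turns[i] = { role = (i % 2 == 1) and "user" or "assistant", content = "t" .. i }
end
print(ctx:enforce_budget(), #ctx.turns, table.concat(calls, ","))
```

Because a failed or empty callback result leaves `self.summary` untouched, a broken summarizer degrades to plain eviction rather than raising an error.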
marfrit	3e57824684	router: classify_model heuristic + 31-case corpus (Phase 5 commit #1 ) Phase 5 commit #1 per docs/PHASE5.md §11. Pure-Lua per-request model routing — no IO, no LLM probe in v1. router.classify_model(text, cfg) -> (model_name \| nil, class_label): 1. classify_class(text) walks heuristics in priority order: code class: - triple-backtick fence anywhere - "traceback" / "stacktrace" / "stack trace" (ci) - "error:" / "exception:" in first 60 chars (ci) - path-with-code-extension token (.py/.lua/.c/.js/.go/.rs/.cpp/.h/.ts) - 5+ lines with indented content (looks like a paste) reasoning class (requires text >= 15 chars to skip bare keywords): - "explain" / "why " / "how does" / "compare" (ci) - "?" + length > 100 chars default class: everything else 2. Map class via cfg.routing.classes[class] → model name (or nil = keep current). 3. Return (model_name_or_nil, class_label). ALWAYS evaluates regardless of cfg.routing.auto — caller (repl.ask_ai in commit #3) gates on the flag. This separation lets `:route check` introspect the heuristic even when routing is off (N1). M._classify_class exposed for testing. Test corpus (test_router_model.lua, 31 cases): - 13 code-class positives (fence, traceback, paths, multi-line paste) - 6 reasoning-class positives (explain/why/how does/compare/?+length) - 8 default-class (short queries, bare keywords below 15-char threshold, non-code paths like .md/.txt) - 3 model-mapping cases (code→"deep", reasoning→"cloud", default→nil) - 1 R-N2 default test: classes.reasoning=nil → reasoning text yields nil model override (heuristic still fires, no swap) - All 31 pass; 15-char threshold catches "how does ASLR work?" without false-positive on bare "explain". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:17:22 +00:00
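A reduced version of the heuristic order described above is sketched here. It is not the code from router.lua: the "5+ indented lines" rule is omitted for brevity, and `has_code_path` is a hypothetical helper.

```lua
local CODE_EXTS = { py = true, lua = true, c = true, js = true, go = true,
                    rs = true, cpp = true, h = true, ts = true }
local FENCE = string.rep("`", 3) -- triple-backtick fence marker

-- True when any whitespace-delimited token ends in a known code extension.
local function has_code_path(text)
  for token in text:gmatch("%S+") do
    local bare = token:gsub("[%.,;:?!)]+$", "") -- drop trailing punctuation
    local ext = bare:match("%.(%a+)$")
    if ext and CODE_EXTS[ext:lower()] then return true end
  end
  return false
end

local function classify_class(text)
  local lower = text:lower()
  if text:find(FENCE, 1, true) then return "code" end
  if lower:find("traceback", 1, true) or lower:find("stacktrace", 1, true)
     or lower:find("stack trace", 1, true) then
    return "code"
  end
  local head = lower:sub(1, 60)
  if head:find("error:", 1, true) or head:find("exception:", 1, true) then
    return "code"
  end
  if has_code_path(text) then return "code" end
  if #text >= 15 then -- threshold skips bare keywords such as "explain"
    if lower:find("explain", 1, true) or lower:find("why ", 1, true)
       or lower:find("how does", 1, true) or lower:find("compare", 1, true) then
      return "reasoning"
    end
    if #text > 100 and text:find("?", 1, true) then return "reasoning" end
  end
  return "default"
end

-- Returns (model_name or nil, class_label); nil means "keep the current model".
local function classify_model(text, cfg)
  local class = classify_class(text)
  local classes = cfg.routing and cfg.routing.classes or {}
  return classes[class], class
end

print(classify_model("how does ASLR work?", { routing = { classes = { code = "deep" } } }))
```

Keeping the class label as a second return value is what allows a check command to report the heuristic result even when no model is mapped to that class.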
marfrit	2e389c1475	docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions: B1 BLOCKER: summary callback signature was inconsistent across §3 and §6. Canonical now: summarize_fn(prior_summary, evicted_turns) -> string\|nil dispatching on the two args: (nil, [turns]) — first-time summarize (str, [turns]) — additive (extend prior summary with new evictions) (str, nil) — compress (re-summarize the prior summary itself) C1: re-summarize trigger now uses the (str, nil) compress signal rather than degenerate (str, {}). C2: routing decision is taken once on entry to ask_ai. The chosen active_cfg is used for every tool-sub-loop iteration. Original active_cfg restored after ask_ai returns. C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed at :norris launch time; planner stays on it for every iteration. Q39 resolved. Per-iteration fallback still gated by cfg.routing.fallback — retries the failing call against cloud without permanently switching the planner. C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for the [background] block). Both are "earlier context" the planner generally doesn't need. C5: Fallback pattern coverage expanded — added HTTP 408 (Q41 resolved) and "Operation timed out" (libcurl version variant). Dropped "HTTP response code said error" from A2 — FAILONERROR was removed in Phase 4 `f26cbd9`. NITs folded: N1 :route check <text> always runs heuristic; suffix "(routing currently disabled)" when cfg.routing.auto = false N2 reasoning → nil by default (not → "cloud"); user explicitly opts in to map reasoning to a paid model. Same cost-safety rationale as confirm_cmd default true. N3 "Retry only when no deltas have arrived" promoted to §5 normative rule (was in §11 risk row). N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to align with the :fallback meta verb. 
Reviewer verdict: commit #1 (router.classify_model) is implement-ready; B1/C1 resolution required before commit #2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:15:39 +00:00
marfrit	555fdd7717	docs/PHASE5: analyze — surface clean, summary lives on ctx.summary not turns A1. router.lua surface clean; classify_model is a natural sibling of classify. No structural refactor. A2. broker error message shapes confirmed: all transport errors carry "transport: " prefix; "api: " for SSE-framed semantic errors; "broker: " for config bugs. Fallback matcher must strip the prefix before testing — list of eligible patterns tightened in §5. A3. Q38 RESOLVED — summary doesn't go in ctx.turns (would create system/system back-to-back, same gotcha as PHASE0 §6 user/user). Instead lives on ctx.summary (string) and composes into the system message between [background] and NORRIS suffix. No new role:"system" turn; no alternation risk. §3 + §6 reflect. Module-changes table updated to specify ctx.summary string field + the to_messages composition order. Storage shape diagram in §6 rewritten. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:12:50 +00:00
marfrit	4453b93ab5	docs/PHASE5: formulate — multi-model routing + cloud fallback + summarize-on-evict Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5: heuristic-based per-request model routing, single-hop cloud fallback on local transport failure, and fast-model summarization at sliding- window eviction time. Resolutions baked in via §2: - Routing trigger: per-request in repl.ask_ai, gated by cfg.routing.auto (default off) - Classification: pure-Lua heuristics (length, keywords, code-fence detection, exception markers) — no LLM probe in v1 - Classes: code → deep, reasoning → cloud, default → keep active - Fallback trigger: string-match on err for HTTP 5xx / model_not_found / "Connection refused" / DNS / timeout - Fallback: one retry against cfg.routing.fallback_model (default "cloud" if configured); status line on every retry - Summarize: enforce_budget invokes summarize_fn callback wired by repl.lua to broker.chat with the fast model - Summary turn: single rolling _summary at turns[1], appended to on each eviction, re-summarized when it exceeds max_summary_chars Open questions (Q37-Q42) in §10: Q37 routing for :ask explicit ask Q38 summary turn vs system-role alternation Q39 fallback under Norris (proposal: single-request only) Q40 summary re-summarize fidelity loss (lossy by design) Q41 HTTP 408 pattern eligibility (default yes) Q42 routing inside tool-call sub-loop (proposal: fix at entry) 5-commit roadmap in §11. No new module files; mostly repl.lua and router.lua growth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:11:26 +00:00
marfrit	27784f9b68	config: Phase 4 memory example block (commit #5 ) Phase 4 commit #5 (final) per docs/PHASE4.md §12. Documentation-only; commented-out example showing: - inject_max_chars (cap on startup injection; default 2000) - summarizer_model (which configured model :memory summarize uses) The block is OFF by default. The :memory meta surface (:remember, :memory list/forget/clear/inject/summarize) works without the block — items persist to <history.dir>/memory.jsonl regardless. The block only configures the injection-into-system-prompt behavior + summarizer model choice. Phase 4 implementation complete: #1 `199dd87` history.lua memory store + ffi/libc.lua flock #2 `c1a5c73` context.lua [background] block (suppressed in Norris) #3 `3b074af` repl.lua memory handle + :remember + :memory meta #4 `f22d21d` :memory summarize — LLM candidate extraction #5 (this) config.lua memory example block Phase 4 verify-partial: - history memory round-trip tests: add/forget/load all green - flock single-writer enforcement verified - context composition order (DEFAULT → [background] → NORRIS) + Norris suppression all green - End-to-end persistence across boots: :remember on boot 1 visible on boot 2 as injected memory items - :memory forget id-not-active surfaces clean status (N1) - :memory clear with [y/N] confirm gate works - :memory summarize wire-correct against fast model (candidate parsing tolerates bullets; per-candidate y/N/edit prompts fire) Verify-pending: real-model summarizer quality test (deep/cloud); multi-process flock contention test; long-running :memory inject race with running broker stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:58 +00:00
marfrit	f22d21d754	repl: :memory summarize — LLM candidate extraction (Phase 4 commit #4 ) Phase 4 commit #4 per docs/PHASE4.md §6. :memory summarize: 1. Source-of-truth: session log file via history.load(session_path), NOT ctx:to_messages() (R-C2). Skips turns tagged meta="summarize" so prior summarize exchanges don't self-amplify across multiple calls within the same session. 2. Pick summarizer model from cfg.memory.summarizer_model (default active model). 3. Build a transcript string ("role: content" per turn, 800 chars max per turn) and feed it as a single user turn alongside a system instruction asking for "(fact\|pref\|context): <content>" lines. 4. broker.chat with max_tokens=1024 + timeout_ms=90000 (the deep model can take a while; we don't want a 15s probe-cap here). 5. Log the response as an assistant turn with meta="summarize" so the next :memory summarize call filters it out. 6. Parse response lines tolerating markdown bullets and bold markup: ^%s[-]?%s[_](fact\|pref\|context)[_]:%s(.+)$ 7. Per-candidate prompt: y / N / edit. y → memory:add(kind, content) edit → readline prompt for replacement text any other → drop 8. status: "summarize: added N / M candidates". Live-tested against hossenfelder/fast: Pipeline correct end-to-end. Model emitted one candidate; user confirmation prompt fired; item persisted; :memory list showed it. Candidate quality from the 1.5B model is poor — typical small-model behavior; deep/cloud models would do better but this isn't an aish bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:36 +00:00
marfrit	3b074afaee	repl: memory handle + :remember + :memory meta (Phase 4 commit #3 ) Phase 4 commit #3 per docs/PHASE4.md §12. End-to-end memory wiring. Startup: - Opens memory handle at <history.dir>/memory.jsonl via history.open_memory(). Status-logs failure (e.g. flock held by another aish) and continues without memory. - inject_memory(): loads via history.load_memory(), truncates by cfg.memory.inject_max_chars (default 2000), populates ctx.memory_items. Status line announces N items injected. - shutdown_session() now also closes memory (releases flock). Meta commands: :remember <text> — shortcut for :memory add fact <text>; auto-refreshes ctx.memory_items so the next AI turn sees the new item without restart :memory list — show id / ts / kind / content (truncated at 80 chars per line) :memory add <kind> <t> — fact\|pref\|context required; rejects other kinds :memory forget <id> — N1: checks active-set first, surfaces "id N not active (already forgotten or never existed)" without appending if the id isn't live :memory clear — [y/N] confirm prompt; tombstones every active item :memory inject — N4: reload memory.jsonl into ctx.memory_items, replacing existing. Useful after manual file edits. Help block extended with the new commands. End-to-end verified: Boot 1 → :remember×2 + :memory add → 3 items, :memory list shows all three with timestamps Boot 2 → memory: 3 items injected (startup status); :memory list same three; ctx.turns empty (history is sessions/, memory is separate) Boot 3 → :memory forget 2 succeeds; :memory forget 99 → "not active" status without writing a tombstone; :memory list shows 2 items; :memory clear → confirm prompt → "cleared 2 items"; :memory list → "(no memory items)" Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 05:11:48 +00:00
marfrit	c1a5c736ec	context: [background] memory injection block (Phase 4 commit #2 ) Phase 4 commit #2 per docs/PHASE4.md §5/§12. ctx.memory_items (array of {kind, content, ...}) loaded by repl.lua at startup from history.load_memory(). When non-empty AND ctx not in Norris mode, to_messages() appends a [background] block to the system prompt: [background] (memory.jsonl; manage via :memory) - (fact) User prefers terse responses - (context) Project: aish (LuaJIT REPL) Suppression under Norris (R-C1): when ctx.norris_active is true the [background] block is omitted. Norris already anchors via its NORRIS suffix carrying the goal; a 2KB background block per planning iteration would add ~16K tokens of redundant input over an 8-step run. Suffix composition order is now: 1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP, statically embedded) 2. [background] block — when memory_items non-empty AND NOT norris_active 3. NORRIS MODE block — when norris_active repl.lua wiring (memory_items population at startup, :memory meta cmds, :remember shortcut, :memory inject for live refresh) lands in commit #3. Verified composition order with 4 cases: default-only → 697 chars, no background, no norris memory_items only → 824 chars, background YES, no norris memory + norris → 1451 chars, background NO, norris YES (suppressed) norris only → 1451 chars, background NO, norris YES Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:52:42 +00:00
marfrit	199dd87eaa	history: memory.jsonl store + flock (Phase 4 commit #1 ) Phase 4 commit #1 per docs/PHASE4.md §12. Two file changes bundled because R-B1 (flock for race-free single-writer enforcement) cannot be deferred — adding it retroactively means reopening the memory handle. ffi/libc.lua extensions: - cdef flock(int fd, int op), open(...), lseek(int, long, int) - constants LOCK_EX=2, LOCK_NB=4, LOCK_UN=8 - M.flock(fd, op) wrapper returning (true) on success or (false, errmsg) — errmsg is the strerror text so callers can surface "Resource temporarily unavailable" cleanly to the user. history.lua additions (Phase 4 section appended at end): - M.open_memory(path) -> handle \| nil, err Opens the file via libc.open(2) (need integer fd for flock — io.open's FILE* doesn't expose it), takes flock(LOCK_EX \| LOCK_NB). Returns "memory.jsonl held by another aish process" on lock-held. Scans existing content for max id; caches as handle.next_id. Writes meta header on first creation (no id, ignored at load). - handle:add(kind, content, tags?, source?) -> id Assigns next id; appends one JSONL item with auto-timestamp. kind ∈ {fact, pref, context} enforced via assert. - handle:forget(target_id) Appends a tombstone {id, ts, kind:"forget", target}. - handle:close() Releases fd (flock auto-released on close). - M.load_memory(path) -> items_table Reads all lines, builds forget-target set from kind=="forget" entries, returns active items as an array sorted by ts desc. Items without id (meta header) silently dropped. Tombstones with non-matching targets are no-ops (N3 invariant). 
Round-trip test passes: - open empty file → next_id=1 - add 3 items → ids 1, 2, 3 - forget id 2 (appends tombstone) - reopen → next_id correctly advances past the tombstone (=5) - load_memory → 2 active items (id 1 + id 3); tombstone resolved - lock-held detection: second open while first held → fails with "memory.jsonl held by another aish process" message - close releases the lock; reopen after release succeeds Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:52:03 +00:00
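The tombstone resolution performed at load time can be illustrated with a small sketch that works on already-decoded records. JSON parsing and file IO are left out, and `resolve_active` is a hypothetical name rather than the function in history.lua.

```lua
-- Given decoded JSONL records in file order, return the active items,
-- newest first. Records without an id (the meta header) are dropped.
local function resolve_active(records)
  local forgotten = {}
  for _, r in ipairs(records) do
    if r.kind == "forget" and r.target ~= nil then
      forgotten[r.target] = true
    end
  end
  local active = {}
  for _, r in ipairs(records) do
    if r.id and r.kind ~= "forget" and not forgotten[r.id] then
      active[#active + 1] = r
    end
  end
  table.sort(active, function(a, b) return a.ts > b.ts end)
  return active
end

local records = {
  { kind = "meta" },                                   -- header, no id
  { id = 1, ts = 10, kind = "fact", content = "a" },
  { id = 2, ts = 20, kind = "pref", content = "b" },
  { id = 3, ts = 30, kind = "context", content = "c" },
  { id = 4, ts = 40, kind = "forget", target = 2 },    -- tombstone for id 2
  { id = 5, ts = 50, kind = "forget", target = 99 },   -- non-matching: no-op
}
for _, item in ipairs(resolve_active(records)) do
  print(item.id, item.kind, item.content)
end
```

Collecting the forget targets in a first pass means a tombstone applies regardless of where it appears relative to its target, which matches the set-based resolution the log describes.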
marfrit	ffead3986c	docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs. R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes scanning the same file compute identical next_ids. Resolution: flock(LOCK_EX \| LOCK_NB) on the fd in M.open_memory, held until close. Bundled into commit #1 (per reviewer: cannot defer because adding flock retroactively means reopening the handle). Requires ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN constants + M.flock wrapper. R-C1 (CONCERN, closes Q33): [background] block suppressed when ctx.norris_active. Avoids ~16K of redundant tokens per 8-step Norris run. Norris already anchors via its goal in the NORRIS suffix; memory items rarely change step-to-step planning. R-C2 (CONCERN): summarizer self-amplification — running :memory summarize twice in one session would feed the prior summarize call's assistant turn into the next input. Resolution: operate on the session log file (history.load(session_path)) instead of ctx:to_messages(), and tag prior summarize turns with meta="summarize" so they're filterable. R-C3 (CONCERN, cosmetic): §5 diagram clarified that DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block statically — not a separate dynamic block in v1. NITs N1-N4 folded inline: N1 forget no-op for unknown id surfaces a status N2 path note: memory.jsonl is sibling of sessions/, no collision N3 item-id invariants: id >= 1; meta header has no id; tombstones with non-matching targets are no-ops N4 :memory inject semantics explicit (replace ctx.memory_items from a fresh load + LRU-by-ts truncation) §3 module-changes table grew a new ffi/libc.lua row. §12 commit #1 description tightened — flock work bundled inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:50:43 +00:00
marfrit	2146b909f8	docs/PHASE4: analyze — surface confirmed, counter strategy locked A1. history.lua surface lines up cleanly for the memory additions — no structural refactor; pure additive functions mirroring the session pattern. A2. Counter persistence: scan at open, cache next_id in handle. O(n) load (n bounded by curation, ~hundreds), no sidecar file. Persisted ids let forget-tombstones target items even across restarts. A3. System-prompt suffix order locked: DEFAULT (carrying Phase 2 MCP block baked in) → Phase 4 [background] → Phase 3 NORRIS. Token cost measured: default ~174 toks, +NORRIS ~364 toks, +NORRIS+2KB background ~865 toks. Well within typical context budgets. No manifest amendments needed — §3/§5 already match. Findings recorded inline as Phase 7 anchors. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:47:01 +00:00
marfrit	bea717534c	docs/PHASE4: formulate — memory.jsonl + startup injection + :memory meta Phase 4 formulate manifest. Three pillars per PHASE0 §11 row 4: memory.jsonl persistent cross-session store, startup context injection into the system prompt, and the :memory management surface + opt-in :memory summarize for candidate extraction. Resolutions baked in via §2: - Storage: append-only JSONL at <history.dir>/memory.jsonl - Format: {id, ts, kind, content, tags?, source?} - Kinds: fact / pref / context (lightly typed v1) - Forget: tombstone append, resolve at load (set-based) - Cadence: manual :memory summarize only in v1; auto-trigger Q-listed - Inject: dynamic [background] block on system prompt, capped at 2000 chars by default; LRU-by-ts selection if over-budget - Order: DEFAULT → MCP block → [background] → NORRIS suffix (Norris last so it dominates when active) New module surfaces: history.lua M.open_memory / memory:add / memory:forget / M.load_memory context.lua ctx.memory_items + [background] composer repl.lua :remember, :memory add/list/forget/clear/inject/summarize config.lua commented-out memory = {...} example Open questions (Q31-Q36) tracked in §11: Q31 auto-summarize trigger (manual v1; auto-on-quit candidate) Q32 in-place edit vs forget+re-add Q33 Norris-mode interaction (proposal: both blocks stay) Q34 split prefs into a dedicated prompt section? Q35 redaction of sensitive content during summarize Q36 duplicate detection on :memory add 5-commit roadmap in §12 (history → context → repl → summarize → config). No new module files. No substrate amendments to PHASE0 — entirely additive on top of Phase 1's history.lua pattern and Phase 3's dynamic-suffix pattern in context.lua. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 04:25:57 +00:00
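The storage rules in the manifest above (append-only JSONL, tombstones resolved at load with a set, newest-first selection under a character cap) might look like this in outline. It is a Python sketch under stated assumptions: the tombstone shape `{"forget": <id>}` is hypothetical, since the log does not spell it out, and "LRU-by-ts" is read here as "drop the oldest items first".

```python
import json


def load_memory(lines):
    """Resolve an append-only log into live items (set-based forget)."""
    items, forgotten = [], set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        rec = json.loads(raw)
        if "forget" in rec:        # hypothetical tombstone shape
            forgotten.add(rec["forget"])
        elif "id" in rec:          # records without an id are skipped
            items.append(rec)
    # Tombstones that target unknown ids simply match nothing.
    return [it for it in items if it["id"] not in forgotten]


def select_background(items, max_chars=2000):
    """Keep the newest items (by ts) whose contents fit within max_chars."""
    chosen, used = [], 0
    for it in sorted(items, key=lambda r: r["ts"], reverse=True):
        cost = len(it["content"]) + 1   # +1 for the joining newline
        if used + cost > max_chars:
            break
        chosen.append(it)
        used += cost
    chosen.sort(key=lambda r: r["ts"])  # present oldest-first
    return chosen
```

Resolving tombstones only at load keeps every write an append, so forgetting an item never requires rewriting the file.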
marfrit	50666d092f	config: Phase 3 safety example block (commit #6 ) Phase 3 commit #6 (final) per docs/PHASE3.md §12. Documentation-only; commented-out example showing the safety schema: - llm_second_opinion (bool, default true) - llm_model (string, default deep→default_model fallback) - max_norris_steps (int, default 8) The block notes the model-selection trade-off (R-B2): cloud is the independent-class fast option (costs money), deep is the local-but-slow option, fast is self-policing and NOT recommended. No behavior change to existing configs — safety defaults kick in when the block is absent. Phase 3 implementation complete: #1 `bd59ce7` safety static patterns (34 rules) + 87-case test corpus #2 `2abd5da` LLM second-opinion + session cache + opts.max_tokens #3 `d2a53d2` renderer Norris frames #4 `11b1f56` safety.norris_step planner (single iteration) #5 `a404b2a` repl driver + \C-n real binding + :norris/:safety meta + readline rl_insert_text/rl_redisplay #6 (this) config.lua safety example block Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:46 +00:00
marfrit	a404b2a152	repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5 ) Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step (commit #4) into the REPL with the user-facing surface. ffi/readline.lua extensions (A1 + R-C4): - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text and M.redisplay wrappers exposed. - M.bind: removed `:free()` on previous callback. Now keeps every bound callback pinned for process lifetime in `_pinned` list (alongside `_bound[seq]` for current lookup). Avoids the use-after-free window between unbind and rebind that R-C4 flagged. Memory cost is bounded — one closure per key sequence binding. context.lua Norris suffix (R-C3 / §8): - to_messages() composes a dynamic NORRIS MODE block onto the system prompt when ctx.norris_active is set. The block carries ctx.norris_goal so eviction of the user's "[norris] goal:" turn doesn't lose the anchor. Returns to plain system prompt when Norris exits. repl.lua Norris driver: - prompt() now shows ⚡ marker when ctx.norris_active per PHASE0.md §9. - \C-n bound to a real handler — inserts ":norris " at the cursor (replaces Phase 1 status placeholder). - run_norris(goal) function: sets norris_active + norris_goal, appends a "[norris] <goal>" user turn, renders the banner, then loops calling safety.norris_step with an injected helpers table until a terminal status returns. Renders the closing banner. - norris_halt(): the [N] proceed/skip/abort prompt called by safety.norris_step via helpers.halt. Empty input → abort (safe). - dispatch_tool(): factored from the Phase 2 ask_ai code so safety.norris_step can call it. - norris_exec(): factored exec path for autonomous mode (skips the interactive run_shell cd-status renderer). 
- :norris <goal> meta — launches autonomous mode - :norris off meta — drops Norris flag (rare; usually 'abort') - :safety patterns meta — lists active is_destructive rules - :safety check <cmd> meta — probes a hypothetical command End-to-end mock-driven test: Submitted ":norris find files in /tmp" → banner → step 1 emits tool_call (auto_approved per policy) → dispatched → frame rendered → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner. 2 broker invocations, no stalls. config.lua safety example block lands in commit #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:14 +00:00
marfrit	11b1f566b3	safety: norris_step planner (Phase 3 commit #4 ) Phase 3 commit #4 per docs/PHASE3.md §12. Single-iteration planner. The driver loop in repl.lua (commit #5) calls this in a while loop, advancing step_n on every "continue" return. M.norris_step(ctx, model_cfg, helpers, opts): 1. One broker.chat_stream round-trip — text + tool_calls collected, text streamed via helpers.render_assistant_delta. 2. Parse actions from response: tool_calls (already collected), CMD: lines (via helpers.extract_cmd_lines), GOAL: complete sentinel (line-level exact match per R-C5). 3. Record the assistant turn (with tool_calls if any) and log it. If no actions AND no goal_done → status="stalled". 4. Dispatch tool_calls (structured route first): - is_destructive check on serialized call. - If destructive → halt_fn(proceed/skip/abort). - Else → auto_approve lookup; absent → halt for consent (R-C6: Norris is conservative; auto_approve is the only consent bypass). - On skip: synthesize role:tool turn "[aish] tool call skipped by user" — alternation preserved per C5/C7. - On abort: return status="aborted". - On proceed: dispatch via helpers.dispatch_tool, append role:tool turn with result content. - Argument JSON parse failure also synthesizes a tool turn (same alternation rationale). 5. Dispatch CMD: lines (legacy route): - is_destructive check. - Destructive → halt_fn. - Non-destructive → run directly (Norris user accepted autonomy for non-destructive shell). - skip → ctx:append_exec_output "[aish] CMD skipped by user". - proceed → exec via helpers.exec_cmd, frame via render_exec_begin/end. 6. Skip-budget escalation (R-C1): after dispatch, if ctx.norris_consecutive_skips >= 3 → escalation halt; abort exits, proceed resets counter. 7. Goal-done check AFTER all dispatch (R-C2 / Q25 resolution). 8. Budget check: step_n >= max_steps → status="budget_exhausted". 9. Otherwise → status="continue", driver advances. 
Helpers are passed in as injected functions rather than directly requiring repl/renderer/executor — keeps safety.lua's coupling clean and norris_step testable with a mocked helpers table. State carried across iterations on the ctx: - ctx.norris_consecutive_skips (resets on any successful proceed) - ctx.norris_goal / ctx.norris_active (set/cleared by the driver) Existing test_safety.lua corpus (87 cases) still passes — norris_step addition doesn't touch is_destructive's behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:37:53 +00:00
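Two of the smaller rules in the planner above lend themselves to a short illustration: the line-level `GOAL: complete` matcher (exact match after trimming) and the skip-budget counter that escalates after three consecutive skips and resets on a proceed. The Python below is only a sketch of that logic; the `SkipBudget` class and its method names are hypothetical and do not appear in the project.

```python
def goal_done(text):
    """True if any line, after trimming, is exactly the sentinel."""
    return any(line.strip() == "GOAL: complete" for line in text.splitlines())


class SkipBudget:
    """Counts consecutive user skips; any proceed resets the count."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive = 0

    def record(self, decision):
        """Record 'skip' or 'proceed'; return True when escalation is due."""
        if decision == "skip":
            self.consecutive += 1
        elif decision == "proceed":
            self.consecutive = 0
        return self.consecutive >= self.threshold
```

Matching whole lines rather than substrings means a model that merely mentions the sentinel in a sentence does not end the run by accident.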
marfrit	d2a53d2fc7	renderer: Norris autonomous-mode frames (Phase 3 commit #3 ) Phase 3 commit #3 per docs/PHASE3.md §12. Four new renderer functions for Norris mode visual feedback. M.norris_begin(goal) Bold cyan banner on Norris entry, with the goal text on a dim indented line. Frames the start of the planning loop. M.norris_step(n, max_n, descr) Compact one-line step counter ("─ step 3/16 ─") with optional description. Renders before each iteration of the planner. M.norris_halt(step_n, max_n, reason, action) Bold red banner when the destructive-op gate fires. Three indented lines: step counter, reason (red), action text (truncated at 400 chars, newlines collapsed). The interactive proceed/skip/abort prompt is shown after this banner by repl.lua. M.norris_end(status, reason) Closing banner. status ∈ {"done", "aborted", "budget_exhausted", "stalled", "broker_error"}. Color cyan on "done", red otherwise. Optional reason text on a dim line. The interactive prompt `[aish:<model> ⚡]>` activation lands in commit #5 (repl.lua's prompt() function). Smoke-tested all five frames visually — clean ANSI output, correct truncation on long action strings, color discrimination on done/aborted/budget_exhausted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:36:44 +00:00
marfrit	2abd5da3a6	safety: LLM second-opinion + session cache (Phase 3 commit #2 ) Phase 3 commit #2 per docs/PHASE3.md §12. Adds the LLM-probe gate on top of commit #1's static patterns. Together they form is_destructive. broker.lua extension: - opts.max_tokens (A2) — passed through to the request body. Phase 3 probes cap at 4 tokens for YES/NO replies. - opts.timeout_ms — overrides model_cfg.timeout_ms per-call. Probe uses 15000ms cap regardless of the model's normal timeout (the user's deep model has 1800000ms for long generations; the probe must stay snappy). - M.chat now accepts an opts table (same shape as chat_stream's). Backwards compatible — existing callers passing (cfg, msgs) unaffected. safety.lua additions: - llm_probe(cfg, system, cmd): single broker.chat call returning "YES"/"NO"/"YES_FAILSAFE"/"YES_UNPARSEABLE" — fail-safe defaults. - llm_second_opinion(cmd, cfg): two-probe protocol per R-B2. Probe 1: "Is this destructive?" — YES → flag. Probe 2 (only if probe 1 said NO): "Is this safe?" inverted question — NO → flag (disagreement = HALT). Both NO → safe. - Session-scoped cache _llm_cache keyed by normalized command (lowercased + whitespace-collapsed). Mitigates Q23 latency for repeated commands within a Norris run. - Model-selection precedence: cfg.safety.llm_model (explicit) → cfg.models.deep (independent local class) → cfg.models[default]. Fail-safe YES if none configured. - is_destructive(cmd, cfg): runs static patterns first (always), then LLM if cfg present + not explicitly opted-out. cfg=nil yields static-only mode (handy for tests). 
End-to-end verified against hossenfelder using qwen-coder-7b-32k as the deep probe (qwen3-30b-a3b-instruct in repo's config.lua isn't currently loaded on the local backend): cat /etc/hostname → hit=false (LLM: NO, NO inverted = safe) rm /tmp/x.log → hit=true (LLM flagged; static missed because no -r/-f flags) cp /etc/passwd /tmp/passwd.bak → hit=false (safe copy) cache: second probe on same cmd → 0s wall time static-only (cfg=nil): rm -rf /tmp/x → static hit, no LLM call opt-out (llm_second_opinion=false): cp x y → hit=false, no probe Test corpus (test_safety.lua, 87 cases) still all pass — cfg=nil preserves the static-only behavior. Note: production config.lua currently has `deep = qwen3-30b-a3b-instruct` which isn't loaded on the proxy backend right now; Norris users will hit the fail-safe (everything flagged destructive) until either the deep model is brought up OR cfg.safety.llm_model = "cloud" is set to route the probe through anthropic/claude-haiku-4.5. Update the config or model deployment for production use — covered by Phase 3 verify test case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:36:06 +00:00
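One possible reading of the two-probe protocol and session cache described above, written as a Python sketch rather than the project's Lua: the first probe asks whether the command is destructive, the second (run only after a clean "NO") asks the inverted question, and any answer other than the expected clean one is treated as a flag. The `probe` callable and `make_second_opinion` are hypothetical names; the real `llm_probe` return values (`YES_FAILSAFE`, `YES_UNPARSEABLE`) are folded into the "anything else flags" rule here.

```python
def normalize(cmd):
    """Cache key: lowercased with runs of whitespace collapsed."""
    return " ".join(cmd.lower().split())


def make_second_opinion(probe):
    """probe(question, cmd) -> raw reply string; supplied by the caller."""
    cache = {}

    def is_flagged(cmd):
        key = normalize(cmd)
        if key in cache:
            return cache[key]
        # Probe 1: anything other than a clean "NO" is treated as destructive.
        flagged = probe("Is this destructive?", cmd) != "NO"
        if not flagged:
            # Probe 2 (inverted): only a clean "YES" confirms the command is safe.
            flagged = probe("Is this safe?", cmd) != "YES"
        cache[key] = flagged
        return flagged

    return is_flagged
```

The second probe costs one extra round-trip only on commands the first probe cleared, and the cache removes both round-trips for repeats within a session.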
marfrit	bd59ce7243	safety: is_destructive static pattern matcher (Phase 3 commit #1 ) Phase 3 commit #1 per docs/PHASE3.md §12. Static-pattern destructive-op heuristic; no LLM second-opinion yet (lands in commit #2). Implementation: - 34 patterns in DESTRUCTIVE_PATTERNS table, grouped: 9 shell-wrapper patterns (R-B1 — bash -c / sh -c / zsh -c / eval / python -c / perl -e / pipe-to-sh both forms / pipe-to-bash both forms / xargs ... rm). HALT on the wrapper itself; user reads the inner before proceeding. 10 filesystem destructive (rm -rf, find -delete, dd to device, mkfs, shred, wipefs, truncate -s 0, ...). 5 version-control destructive (git push --force/-f, git reset --hard, git clean -fd, git branch -D). 5 database/process (DROP TABLE/DATABASE, TRUNCATE TABLE, kill/pkill -9). 2 permission (chmod 777, chown on root path). - ci=true flag for case-insensitive SQL patterns; rule patterns must be lowercase when ci is set (matcher lowercases input). - pkill -9 ordered BEFORE kill -9; kill rule uses %f[%w] frontier so "pkill -9 nginx" reports "pkill -9" not "kill -9" substring match. - M._patterns exposes the rule table for :safety patterns meta (Phase 3 commit #5) and for the test corpus. - M.norris_step stub stays — lands in commit #4. Test corpus (test_safety.lua, 87 cases): - 49 destructive cases across all categories (incl. all 11 wrapper forms, the canonical curl\|sh end-of-string bypass, sudo-prefixed rm -rf, etc.). - 38 safe cases (read-only commands, non-destructive variants of risky verbs like "git push" without --force, "find" without -delete, "chmod 644", "kill 1234" without -9, etc.). - Documented one accepted false positive: echo "rm -rf /" matches the rm pattern by substring — Norris user can proceed after reading; tradeoff between false positives and false negatives, biased toward false positives per §5. - Run from repo root: `luajit test_safety.lua`. Exit 0 on pass. - Verified all 87 pass at commit time. 
R-C4 / readline rebind, broker opts.max_tokens, LLM second-opinion, norris_step planner, repl driver, and the wider Norris UX land in subsequent commits per §12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:47:10 +00:00
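The ordering and case-folding rules for the static matcher can be illustrated with a tiny rule table. This is a Python sketch with a hypothetical four-rule subset, not the project's 34 Lua patterns: a negative lookbehind stands in for Lua's `%f[%w]` frontier on the `kill` rule, and the `ci` flag lowercases the input before matching, as the commit describes.

```python
import re

# (label, regex, ci): a small hypothetical subset, listed in match order.
RULES = [
    ("rm -rf", r"rm\s+-rf", False),
    ("pkill -9", r"pkill\s+-9", False),
    ("kill -9", r"(?<!\w)kill\s+-9", False),   # must not match inside "pkill"
    ("drop table", r"drop\s+table", True),     # pattern is lowercase because ci is set
]


def is_destructive_static(cmd):
    """Return (hit, label) for the first rule that matches cmd."""
    for label, pattern, ci in RULES:
        subject = cmd.lower() if ci else cmd
        if re.search(pattern, subject):
            return True, label
    return False, None
```

As in the commit, a quoted string such as `echo "rm -rf /"` still matches by substring; the sketch keeps that accepted false positive rather than trying to parse shell quoting.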
marfrit	125f800513	docs/PHASE3: re-review NIT fold-in — pipe-to-sh EOL, ci= note, §12 sync Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded: N1 BLOCKER: `\|%ssh%f[%s]` missed `curl x \| sh` (end-of-string canonical wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace, which doesn't happen at EOL). Replaced with two patterns each for sh and bash: `\|%ssh%s` (followed by whitespace/args) and `\|%ssh%s$` (end-of-string). Same for bash. Verified against 18 wrapper-bypass test cases — all canonical idioms now HALT. N2 CONCERN: `ci=true` rule flag had no implementation note. Added one sentence to §5 explaining the matcher lowercases the input string when ci is set. N3 CONCERN: §12 commit #5 description was stale — still said "extends interactive CMD: extraction to consult is_destructive" which contradicts the R-B3 resolution (Norris-only). Rewrote commit #5 description to match R-B3, and bundled the ffi/readline.lua `_bound[seq]:free()` removal into commit #5's scope with explicit "Phase 1 amendment" callout. Same for the §12 risk note that still referenced the dropped behavior change. Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model swap interaction, N7 commit-attribution wording) are cosmetic and will fold in-flight during implement if material. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:45:25 +00:00
marfrit	91ddcb005d	docs/PHASE3: review fold-in — security-layer BLOCKERs resolved Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against the analyze-tier draft. Resolutions applied: BLOCKERs: B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c, eval, pipe-to-shell, python -c, xargs\|rm. Added 9 wrapper patterns to §5. Norris HALTs on any wrapper invocation; user reads the inner before proceed. The patterns are the conservative floor against the wrapper bypass class. B2 LLM second-opinion was self-policing — same model class generating actions then judging them. Switched probe model from `fast` to `deep` (qwen3-30b). Added re-roll inversion: if first probe says NO, ask "is this SAFE?". Disagreement between two probes → HALT. Cheap independent-class insurance. B3 `is_destructive` would have run on interactive CMD: extraction — a PHASE0 §6/§10 substrate amendment in disguise. Resolved Q24: heuristic runs ONLY when norris_active == true. No substrate change; interactive `confirm_cmd` semantics unchanged. CONCERNs: C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips escalate to abort/force-proceed prompt. C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to dispatch ALL pending actions before checking GOAL: complete. C3 Norris-goal eviction: goal embedded directly in the dynamic system-prompt suffix; survives sliding-window eviction. C4 Readline use-after-free window: M.bind no longer frees old callbacks; pin for process lifetime (bounded memory cost). C5 GOAL: complete matcher: line-level scan, exact match after trim — substrate-aligned with CMD: rigor. C6 §4 step 4 tightened: auto_approve does NOT bypass destructive heuristic; tool_call without auto_approve still HALTs even when destructive-clear (Norris conservative). 
NITs deferred or rolled into pattern table: - chown root-path pattern tightened (NIT 2 in-line) - Test corpus expansion noted in §12 commit #1 risk - Other NITs are wording-level Status: Plan (review folded). Ready for commit #1 (safety static patterns) once another review pass clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:42:58 +00:00


102 Commits