marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	ff5a545404	config: example for context.summarize_every_n_turns (#101) Documents the new cadence-summarization keys (summarize_every_n_turns, summarize_keep_recent) inline with the existing Phase 5 eviction summarize block. Notes the composition with summarize_on_evict and the Norris suppression parity. Config-only commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:21:16 +00:00
marfrit	c9009399d6	config: example block for cfg.memory.auto_summarize_on_quit (#102) Documents the new opt-in keys (auto_summarize_on_quit, min_turns_for_summary, summary_model) inline with the existing Phase 4 memory block. Notes that the older summarizer_model key is still honored for back-compat. Config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 09:18:15 +00:00
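A minimal sketch of how the three quit-time keys could compose, in Lua to match config.lua. Assumptions, not taken from aish: the helper name `quit_summary_plan`, a missing `min_turns_for_summary` meaning 0, and `summary_model` winning over the legacy `summarizer_model` when both are set.

```lua
-- Hypothetical helper: should aish summarize at quit, and with which model?
-- Precedence and defaults below are assumptions, not aish's actual code.
local function quit_summary_plan(mem_cfg, turn_count)
  mem_cfg = mem_cfg or {}
  if not mem_cfg.auto_summarize_on_quit then
    return nil, "auto_summarize_on_quit is off"   -- opt-in: off by default
  end
  local min_turns = mem_cfg.min_turns_for_summary or 0  -- assumed default
  if turn_count < min_turns then
    return nil, "fewer than " .. min_turns .. " turns"
  end
  -- Back-compat (assumed order): new key first, then the legacy key.
  -- A nil model means "let the caller pick its default".
  return { model = mem_cfg.summary_model or mem_cfg.summarizer_model }
end
```

Returning `nil` plus a reason keeps the opt-in explicit: with the block absent, nothing happens at quit.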
marfrit	cb37fa861a	phase10: config.lua example for cfg.norris.{preplanner,executor} Documents the new Phase 10 / #89 surface for users: when, why, and how to set cfg.norris.preplanner + .executor + .tasks_max. Notes the graceful single-model fall-back when preplanner is unset OR fails, and the design choice that preplan does NOT route via call_broker (retry would silently swap planning models). C5: config-only commit; no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 08:22:22 +00:00
marfrit	c55077bc07	context + repl + config: route-aware context compression (closes #87) Small local models effectively use a fraction of their advertised context window. Per-request compression for routes that hit a local-compress-flagged model preset: keeps only the last N turns and tail-truncates oversized content. Cloud routes get the full context unchanged. Changes: - context.lua _compress_turns(turns, keep, max_chars): returns a new list (self.turns NEVER mutated) with the last `keep` turns preserved + content tail-truncated to `max_chars`. Defensive: drops tool turns at the slice head (orphaned without their assistant-with-tool_calls anchor — strict chat templates would reject them; same gotcha PHASE0 §6 warned about for user/user). - Context:to_messages(opts) — opts.compress = { keep_turns, max_turn_chars } swaps the turn iterable for the compressed view. Affects BOTH the use_tool_role=true path and the use_tool_role=false fallback (PHASE2.md Q18 strict-template workaround). Persistence + display via :history see the full uncompressed ctx.turns. - repl.lua ask_ai: when req_cfg (the routed model's cfg) has `local_compress = true`, build compress_opts from config.context.compress (defaults keep_turns=2, max_turn_chars=800). Pass through ctx:to_messages alongside the existing system_prompt_override (#86) — orthogonal opts that compose. - Norris unaffected: safety.norris_step builds its own messages array; the planner needs full history per PHASE3 design. - config.lua gains a header comment explaining the per-model opt-in + the context.compress defaults block + the documented tool-turn truncation trade-off.
13 unit cases verified: - no opts -> full turn list (no regression) - keep_turns=2 -> exactly last 2 emitted - long content tail-truncated to max_chars - self.turns unchanged after render - orphan tool-turn at slice head dropped (no chat-template violation) - tool turn included WITH its assistant anchor when keep_turns >= 3 E2E against live local broker: - models.fast.local_compress = true; keep_turns=1; max=200 - 4-turn session: each broker call sees ONLY the current turn (verified by short coherent CMD replies despite no cross-turn memory available to the model). FR-promised small-model friendliness in action; conversation continuity is the documented trade-off. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:50:07 +00:00
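The compression rule in #87 can be sketched as a pure function over a turn list. This is an illustration under assumptions: turns are `{role=..., content=...}` tables, and "tail-truncated" is read as keeping the first `max_chars` characters (if aish keeps the tail instead, swap the `sub` call). The name `compress_turns` mirrors `_compress_turns` but is not its actual body.

```lua
-- Sketch of a non-mutating "keep last N turns" compressor.
-- Turn shape {role=..., content=...} is an assumption.
local function compress_turns(turns, keep, max_chars)
  local out = {}
  local first = math.max(1, #turns - keep + 1)
  for i = first, #turns do
    local t = turns[i]
    local content = t.content
    if type(content) == "string" and max_chars and #content > max_chars then
      content = content:sub(1, max_chars)   -- assumption: drop the tail
    end
    -- Shallow-copy each turn so the caller's list (ctx.turns) is untouched.
    local copy = {}
    for k, v in pairs(t) do copy[k] = v end
    copy.content = content
    out[#out + 1] = copy
  end
  -- Drop tool turns orphaned at the head of the slice: their anchoring
  -- assistant-with-tool_calls turn was cut off, and strict chat templates
  -- reject a tool message with no preceding tool call.
  while out[1] and out[1].role == "tool" do
    table.remove(out, 1)
  end
  return out
end
```

With `keep = 2` the tool result loses its anchoring assistant turn and is dropped; with `keep = 3` the pair survives together, matching the two tool-turn cases listed above.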
marfrit	74e4bffb37	broker + repl + safety: GBNF grammar-sampling passthrough (closes #88) llama.cpp constrains the sampler to ONLY emit tokens matching a GBNF grammar. For small models this kills format drift at the token level — `CMD: <cmd>` is enforced by the sampler rather than hoped for via prompt discipline. Probe finding (this commit's pre-implementation): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field — returns normally via standard sampling. Default passthrough is safe for all routes; no per-model opt-in/opt-out needed in v1. Changes: - broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. Malformed grammar surfaces at request time via the existing transport-error path. - repl.lua ask_ai: `grammar_override = config.routing.grammars[req_class]` (same gating shape as #86's system_prompts override). Passed via opts.grammar in the call_broker invocation. - safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. Skips the regex-match dance entirely when the model can't drift. Caller-provided opts.grammar takes precedence over cfg. - config.lua gains two commented examples: * routing.grammars per class * safety.probe_grammar for the destructive probe 6 unit cases verified (stubbed curl.post_sse / broker.chat): - default: no grammar in body - opts.grammar -> body contains grammar JSON-encoded - safety probe_grammar reaches llm_probe via opts - no probe_grammar configured -> opts.grammar nil - caller opts.grammar takes precedence over cfg.safety.probe_grammar E2E against live local broker: - `routing.grammars.default = "root ::= \\"ACK\\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough end-to-end confirmed. Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:00:36 +00:00
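A hedged sketch of the two rules in #88: the request gains a `grammar` key only when one is supplied, and for the destructive probe a caller's `opts.grammar` beats `cfg.safety.probe_grammar`. The function names and the request shape are assumptions; the GBNF string `root ::= "YES" | "NO"` is just an example grammar for a YES/NO probe.

```lua
-- Hypothetical request builder: "grammar" appears only when supplied, so
-- default bodies are unchanged and routes that ignore it are unaffected.
local function build_request(model, messages, opts)
  opts = opts or {}
  local req = { model = model, messages = messages }
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Destructive-probe precedence: caller's opts.grammar, then
-- cfg.safety.probe_grammar, else nil (plain sampling).
local function probe_grammar(cfg, opts)
  if opts and opts.grammar then return opts.grammar end
  return cfg and cfg.safety and cfg.safety.probe_grammar or nil
end
```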
marfrit	047d629a66	context + repl + config: per-class system_prompt override (closes #86) Small local models follow precise structured instructions better than natural language. Per-routing-class system_prompt override gives them tighter instructions for THAT request while preserving ambient context. Changes: - Context:to_messages(opts) — opts.system_prompt_override REPLACES the base system_prompt for THIS render only (state unchanged). Dynamic blocks ([background], [project], [earlier summary], NORRIS suffix) still compose on top. opts is optional; nil-safe for old callers. - repl.lua ask_ai — captures req_class from router.classify_model (already returned by Phase 5; previously discarded after the status line). Looks up config.routing.system_prompts[req_class]; passes as opts.system_prompt_override to ctx:to_messages each iteration of the tool-sub-loop. - Gating: override fires only when routing.auto is on (no class -> no override). If system_prompts[class] absent for a class, fall through to the default system_prompt (no surprise). - Norris unaffected: safety.norris_step builds its own messages array; doesn't go through this path. - config.lua gains a commented-out example showing routing.system_prompts with the code/default examples from the FR body. Smoke verified: - 12-case context.lua unit test: opts nil/absent/present, override replaces base, dynamic blocks still compose, state unchanged after call, Norris-mode coexistence (suffix still present; background still suppressed). - E2E against cloud broker with routing.system_prompts.code set: triple-backtick prompt -> code class -> override fires; model emits terse code-only output. Non-code prompt -> default class -> no override -> normal verbose-ish reply. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 05:41:15 +00:00
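A sketch of the override semantics from #86, assuming a context table with a `system_prompt` string and a hypothetical `dynamic_blocks` list standing in for [background], [project], [earlier summary] and the Norris suffix. It shows that only the base prompt is swapped, and that the gate needs both `routing.auto` and a matching `routing.system_prompts` entry.

```lua
-- Sketch: the override replaces only the base prompt for one render;
-- the (hypothetical) dynamic blocks are appended after it either way.
local function render_system(ctx, opts)
  local base = (opts and opts.system_prompt_override) or ctx.system_prompt
  local parts = { base }
  for _, block in ipairs(ctx.dynamic_blocks or {}) do
    parts[#parts + 1] = block
  end
  return table.concat(parts, "\n\n")
end

-- Gating: no override unless routing.auto produced a class and
-- routing.system_prompts has an entry for that class.
local function pick_override(routing, req_class)
  if not (routing and routing.auto and req_class) then return nil end
  return routing.system_prompts and routing.system_prompts[req_class]
end
```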
marfrit	df59ee2f2c	config + docs/PHASE9: template comment + status -> Implement (Phase 9 commit #4) config.lua header gains a Phase 9 paragraph documenting the project-overlay feature + the R7 shallow-merge warning ("if your .aish.lua sets a top-level block, it REPLACES the user's entire block — list every entry OR omit the block"). Inspect at runtime via `:config show`. docs/PHASE9.md status header bumped: "Plan + review fold-in" -> "Implement". Lists the 4 implement commits inline: `e525063` history: trust file helpers `34b465d` main: project-overlay loader `5b6ee55` repl: :config show meta + HELP this config template comment + status bump Phase 9 implementation complete. Next inner-loop step: verify (file TCs, run autonomous, close) + memory-update. Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:54:53 +00:00
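Why the R7 warning matters, as a minimal sketch of a top-level-only merge. The helper name is hypothetical; it illustrates shallow-merge semantics rather than quoting the project-overlay loader.

```lua
-- Shallow (top-level only) overlay: any key the project file sets
-- replaces the user's value for that key wholesale, nested tables included.
local function shallow_merge(user_cfg, project_cfg)
  local merged = {}
  for k, v in pairs(user_cfg) do merged[k] = v end
  for k, v in pairs(project_cfg) do merged[k] = v end  -- whole value replaced
  return merged
end
```

With a user config defining `models.fast` and `models.deep`, a project file that sets only `models.fast` leaves `models.deep` undefined, because the project's `models` table replaced the user's whole `models` table.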
marfrit	08dba69fce	config + docs/PHASE8: example block + status -> Implement (Phase 8 commit #5) config.lua: - Commented-out `tokenize = { use_endpoint = true }` block with parity to the Phase 1-7 example blocks. - Documents the two consequences: (1) per-turn network cost (~30ms first time, cached after) and (2) token_budget is now actually enforced — sessions that fit under char/4 may evict earlier under accurate counts. - Notes cloud /tokenize 404 fallback path. docs/PHASE8.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 5 implement commits inline for traceability: `7ef2a6e` broker: token_count + endpoint cache `8502517` context: tokenize_fn + _tokens cache `db26d0c` context: enforce_budget honors token_budget (R2 guard) `94b7d86` repl: wire tokenize_fn + :cost detail estimate row this config example + status bump Phase 8 implementation is complete. Resolves Q1 (PHASE0 §13, originally Phase 3, deferred forward). Next inner-loop step: verify (7) — file test cases, run autonomous, close. Then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:32:16 +00:00
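A sketch of the fallback shape described for `tokenize.use_endpoint`: try an exact count, memoize it, and otherwise use the char/4 heuristic. `count_via_endpoint` is a hypothetical stand-in for the /tokenize call (returning `nil` on a 404 or transport error), and rounding up is an assumption.

```lua
-- Hypothetical counter factory: exact counts are cached per string;
-- heuristic counts are not, so a later successful endpoint call can win.
local function make_token_counter(count_via_endpoint)
  local cache = {}
  return function(text)
    local hit = cache[text]
    if hit then return hit end
    local n = count_via_endpoint and count_via_endpoint(text)
    if n then
      cache[text] = n            -- only exact counts are memoized
      return n
    end
    return math.ceil(#text / 4)  -- char/4 heuristic; rounding is assumed
  end
end
```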
marfrit	1f34b6dce8	config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE7.md). N5: PHASE0 §11 amendment landed in commit `3bad07b` (formulate); not re-applied here. config.lua: - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block with parity to the Phase 1-6 example blocks. - Notes warn flags are independent (R4) and per-turn usage flows to session/*.jsonl for after-the-fact analysis. docs/PHASE7.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits inline for traceability: `7364963` broker: usage capture + opts widening `7b4a9be` context: accumulator helpers `8adebd5` repl: _record_usage + opts.category at 5 sites `b30212a` safety + repl: opts.category for Norris + probe `0d6ff93` repl: :cost meta surface this config example + status bump Phase 7 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:02:55 +00:00
marfrit	ac58b19da2	config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6) R9-resolved single-owner of the status bump (commit #5 didn't touch PHASE6.md per the review fold-in). config.lua: - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }` block with the same shape as the Phase 1-5 example blocks. - Note that :diff / :tree / :highlight all work without config; the `project` block ONLY controls the startup auto-inject. - Note about :highlight v1 having no config flag (runtime-only), cross-references the in-REPL install hint. docs/PHASE6.md: - Status header bumped: "Plan + review fold-in" -> "Implement" - Lists the 6 implement commits in the header for traceability: `c4fc7fd` context: compose_project plumbing `d1dce83` _scan_project_tree + :tree + auto_tree hook `4d5f93a` :diff + _git_clean_cmd (B1 helper) `0d63f01` expand_mentions @<r1>..<r2> tiered resolution `11d0e59` tree-sitter highlighter (renderer fence filter + highlighted dispatch + :highlight meta) this config example + status bump Phase 6 implementation is complete. Next inner-loop step is verify (7) — user-driven smoke tests against the live broker on each pillar plus filing of issues for any defects, then memory-update (8). Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:27:58 +00:00
marfrit	d852acadc2	repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args Plumbs the secrets.lua module (commit `e4b818b`) into the conversation pipeline. Hook points: ask_ai — scrub_messages(ctx:to_messages(), mode) before call_broker; rehydrate streamed deltas via streaming_rehydrator so the user sees real values while text_parts accumulates rehydrated chunks (final_resp is plain — CMD: / DELEGATE: extractors see plain values) MCP dispatch — dispatch_tool_call rehydrates the args table before sess:call_tool so the trusted MCP server receives real values (the model emitted placeholders because it saw a scrubbed context) DELEGATE: & :delegate — scrub sub_msgs before broker.chat; rehydrate sub_text before appending to context, so future turns see real values restored Phase 5 summarize-on-evict — scrub sum_msgs before broker.chat; rehydrate the reply that becomes ctx.summary :memory summarize — same scrub + rehydrate pair Mode resolution per call: model_cfg.redact → config.secrets.default → "vault+autodetect" if vault loaded, else "off". ctx storage convention: PLAIN values throughout. The scrub happens at the egress (broker call) per the active redact mode; ctx.turns never holds placeholders for content the user typed or executor produced. The model's own emissions (assistant tool_call arguments) may carry placeholders because the model saw the scrubbed context — rehydrated at MCP dispatch and otherwise harmless on re-serialization (idempotent re-scrubbing). New meta: :secrets [status] vault entries, placeholders allocated this session, active broker mode. Never prints actual values (vault file is itself a secret per gotcha 7). :secrets check <text> dry-run scrub against the active broker's mode — shows the output transformation. Documented in config.lua with a commented-out block + per-broker redact field example. 
Deferred to a follow-up issue (clearly scoped): - safety.lua broker call sites (Norris main loop, is_destructive LLM second-opinion probe) — same wiring pattern, but they don't currently see secrets_session; needs threading through helpers. - @-mention file content is appended PLAIN to ctx and scrubbed at egress alongside the rest of the user turn (covered by the ask_ai scrub). - exec output streamed live to terminal is pre-scrub (user sees real values in their own shell — by design); the captured-for- context copy is scrubbed at egress alongside the rest. This is the "full scope" implementation chosen via AskUserQuestion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:38:23 +00:00
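A sketch of the mode resolution order and the scrub/rehydrate pair from #13. The placeholder format `<<SECRET_n>>`, the helper names and the vault-as-list input are all hypothetical, and autodetect is not modeled. Because scrubbing replaces vault values and placeholders are not themselves vault values, re-scrubbing already-scrubbed text is a no-op, which is the idempotence the commit relies on.

```lua
-- Mode resolution order: model_cfg.redact, then config.secrets.default,
-- then "vault+autodetect" if a vault is loaded, else "off".
local function resolve_mode(model_cfg, config, vault_loaded)
  if model_cfg and model_cfg.redact then return model_cfg.redact end
  if config and config.secrets and config.secrets.default then
    return config.secrets.default
  end
  return vault_loaded and "vault+autodetect" or "off"
end

-- Literal (non-pattern) replace-all, so secrets containing %, ., etc. are safe.
local function plain_replace(s, needle, repl)
  if needle == "" then return s end
  local out, pos = {}, 1
  while true do
    local i, j = s:find(needle, pos, true)
    if not i then break end
    out[#out + 1] = s:sub(pos, i - 1)
    out[#out + 1] = repl
    pos = j + 1
  end
  out[#out + 1] = s:sub(pos)
  return table.concat(out)
end

-- Hypothetical session: one placeholder per vault value, both directions.
-- (Vault values that overlap each other would need longest-first ordering;
-- not handled here.)
local function new_session(vault_values)
  local s = { by_value = {}, by_ph = {} }
  for i, v in ipairs(vault_values) do
    local ph = "<<SECRET_" .. i .. ">>"   -- hypothetical placeholder format
    s.by_value[v] = ph
    s.by_ph[ph] = v
  end
  return s
end

local function scrub(s, text)        -- egress: plain values -> placeholders
  for v, ph in pairs(s.by_value) do text = plain_replace(text, v, ph) end
  return text
end

local function rehydrate(s, text)    -- ingress: placeholders -> plain values
  for ph, v in pairs(s.by_ph) do text = plain_replace(text, ph, v) end
  return text
end
```

A streaming rehydrator additionally has to buffer a partial placeholder that straddles two SSE chunks; this non-streaming version does not.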
marfrit	f94d16fc89	repl: background CMD&: with handle/poll (closes #8) Builds, long-running network calls, and file watches no longer block the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL to spawn the command in the background, return immediately, and poll for completion between user inputs. Process model: shell-wrapped to avoid needing fork()/execv() FFI. nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $! The child is reparented to init; we hold only the PID and the path to the .status sidecar. Completion is detected by the .status file existing (the wrapper writes it as its last act). No waitpid needed — the child isn't ours after the popen subshell exits. Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is created lazily at startup (mkdir -p). Requires history.dir to be configured; without it CMD&: emits an error status and the model sees an "[bg failed to start]" exec-output note. check_bg_done() runs at the top of each main-loop iteration alongside check_every_due(). When a job is detected as exited, the REPL: - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>" - appends the same string to ctx as exec output, so the model sees the completion on its next turn (natural follow-up: "ok the build finished; let me check the log") Meta surface: :bg-spawn <cmd> start a bg job directly (no AI needed; also useful for testing without depending on the model emitting CMD&:) :bg-list show running/done jobs (id, pid, state, runtime, cmd) :bg-output <id> dump the log file to stdout :bg-kill <id> SIGTERM (note: only delivers if the PID is still the actual command — long-lived shells may need pkill by name) Scope (deliberately limited for v1): - No callback-mode readline: bg completion detection is pre-prompt, not mid-readline. If a build finishes while the user is typing, notification comes when they hit Enter.
- Permission policy DSL (#9) does NOT apply to CMD&: — the asynchronous gating model wasn't designed for the y/N flow. Filed as follow-up if needed. - Norris not extended: helpers.exec_cmd is still synchronous; the planner doesn't dispatch bg jobs. - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>" and a "[plan] would bg-run: <cmd>" exec-output note, no spawn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:25:55 +00:00
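A sketch of the CMD&: wrapper string and the .status completion check. It follows the `nohup sh -c` shape quoted above but adds POSIX single-quote escaping; `bg_wrapper` and `read_status` are hypothetical names. The command itself is run by the inner `sh`, and single quotes inside it survive because the whole script is re-quoted for the outer shell.

```lua
-- POSIX single-quote quoting: wrap in '...' and turn each ' into '\''.
local function sq(s)
  return "'" .. s:gsub("'", "'\\''") .. "'"
end

-- Hypothetical builder for the detached wrapper quoted in the commit:
--   nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null >/dev/null 2>&1 & echo $!
local function bg_wrapper(cmd, log_path, status_path)
  local script = "(" .. cmd .. ") > " .. sq(log_path)
    .. " 2>&1; echo $? > " .. sq(status_path)
  return "nohup sh -c " .. sq(script)
    .. " </dev/null >/dev/null 2>&1 & echo $!"
end

-- Completion check: the wrapper writes <status> as its last act, so an
-- existing file with a number in it means the job has exited. A missing
-- file, or an empty one (write still in flight), reads as nil.
local function read_status(path)
  local f = io.open(path, "r")
  if not f then return nil end
  local code = tonumber((f:read("*a") or ""):match("%d+"))
  f:close()
  return code
end
```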
marfrit	17e62c0326	safety: permission policy DSL — allow/confirm/deny rule lists (closes #9) The confirm_cmd boolean was too coarse: true interrupts every harmless ls; false ungates everything. Most workflows want trust for read-only ops while still gating writes/network/sudo. New config: permissions = { allow = { "^ls%s", "^cat%s", "^git status" }, confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" }, deny = { "^ssh%s+root@", "^curl%s+http[^s]" }, } Verdict order: deny > confirm > allow. First match in the chosen category wins. Unmatched defaults to "confirm". Patterns are Lua patterns (not regex) per PHASE0.md §3 — no compiled extensions. Verdict behavior in the interactive CMD: loop: - allow → run without prompt - deny → status line, skip - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true) Backward compat: - permissions unset + confirm_cmd=true → always confirm - permissions unset + confirm_cmd=false → always allow - permissions set → policy table is authoritative Scope deliberately limited to the interactive AI-suggested CMD: gate. Norris autonomous mode keeps its own safety.is_destructive machinery (combining the two would double-gate or replace the LLM probe — both non-obvious behavioral changes that belong in their own issues). User-typed shell-routed lines (`router.classify → "shell"`) and :exec also bypass the policy by design — those are direct user intent. New introspection: :perms list — show the configured rule lists :perms check <cmd> — report verdict + matching rule (debug) safety.classify_command is exported and unit-tested with 12 cases covering each category, priority order (deny > allow on overlap), and both fallback paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:20:56 +00:00
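The verdict order can be sketched as below, using the patterns from the example config. This is an illustration, not the exported `safety.classify_command`; in particular the `(verdict, rule)` return shape and the `confirm_cmd` argument are assumptions.

```lua
-- First rule (a Lua pattern) in a list that matches cmd, or nil.
local function first_match(rules, cmd)
  for _, pat in ipairs(rules or {}) do
    if cmd:find(pat) then return pat end
  end
  return nil
end

-- deny > confirm > allow; first match within a category wins;
-- anything unmatched falls back to "confirm".
local function classify_command(perms, cmd, confirm_cmd)
  if not perms then
    -- Backward compat: without a policy table the legacy boolean decides.
    return confirm_cmd and "confirm" or "allow", nil
  end
  for _, verdict in ipairs({ "deny", "confirm", "allow" }) do
    local rule = first_match(perms[verdict], cmd)
    if rule then return verdict, rule end
  end
  return "confirm", nil
end
```

Note that `curl https://...` matches no rule (the deny pattern requires a non-`s` after `http`), so it lands on the "confirm" default rather than "allow".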
marfrit	fb15f7a690	repl: pre/post CMD hooks via config.hooks (closes #3) Optional shell scripts trigger around every CMD: execution. Use cases: audit logging, auto-format-after-edit, custom safety gates beyond the existing confirm_cmd boolean. Config shape: hooks = { pre_cmd = "/path/to/pre-script", post_cmd = "/path/to/post-script", } Contract per hook invocation: - The command line is piped to the hook on stdin. - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the moment of dispatch), AISH_CWD (libc.getcwd() result). - Hook stdout is streamed live to the terminal via executor.exec (so the user sees its output regardless of exit status). Pre-hook: non-zero exit aborts the command and emits a status line including the exit code. last_exec_code is set to the hook's exit so the {last_status} prompt template variable reflects the abort. Post-hook: exit code is ignored (the spec says so); only the visible stdout matters. Runs after the command's exec_end frame. Tested with success, abort, and stdin-matches-env paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:16:11 +00:00
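A sketch of one way to assemble the hook invocation so the command arrives on stdin and the three env vars are set, with every value single-quoted for the shell. `hook_cmdline` and `pre_hook_allows` are hypothetical; aish runs hooks through executor.exec, whose interface is not shown here.

```lua
-- POSIX single-quote quoting: wrap in '...' and turn each ' into '\''.
local function sq(s)
  return "'" .. s:gsub("'", "'\\''") .. "'"
end

-- Command on the hook's stdin, plus AISH_CMD / AISH_TURN / AISH_CWD in the
-- hook's environment (VAR=value prefixes apply to the command after them).
local function hook_cmdline(hook_path, cmd, turn, cwd)
  return string.format("printf '%%s\\n' %s | AISH_CMD=%s AISH_TURN=%d AISH_CWD=%s %s",
    sq(cmd), sq(cmd), turn, sq(cwd), sq(hook_path))
end

-- Pre-hook contract: a non-zero exit aborts the command, and the hook's
-- code is what {last_status} should show afterwards.
local function pre_hook_allows(exit_code)
  return exit_code == 0, exit_code
end
```

Only the pre-hook's exit code is acted on; per the commit, a post-hook's exit code is ignored.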
marfrit	d738f339cb	repl: configurable prompt template via config.shell.prompt (closes #10) At-a-glance situational awareness: see the active model, context fill, mode flags, and cwd in the prompt itself — prevents "wait, am I still in plan mode?" surprises. Example config: shell = { prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ", } Variables (substituted via {name}): {model} active preset name {ctx_used} char/4 token heuristic (Phase 0 §8; accurate is Q1) {ctx_max} config.context.token_budget {turn} #ctx.turns {cwd} libc.getcwd() (chdir-aware; PWD env may drift) {cwd_short} cwd with $HOME -> ~ {last_status} last exec exit code, "" if none yet {mode} "norris" | "plan" | "normal" Default behavior unchanged when shell.prompt is unset — keeps the "[aish:<model>]>" form with norris ⚡ and plan markers. Side wiring: - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD). - run_shell records exit code into last_exec_code for {last_status}. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 21:14:43 +00:00
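A sketch of the `{name}` substitution and the `$HOME -> ~` shortening. Leaving unknown names as-is is an assumption here, and both helper names are hypothetical.

```lua
-- Replace each {name} with tostring(vars[name]); a nil return from the
-- gsub callback keeps the original "{name}" text unchanged.
local function render_prompt(template, vars)
  return (template:gsub("{([%w_]+)}", function(name)
    local v = vars[name]
    if v == nil then return nil end
    return tostring(v)
  end))
end

-- cwd with a leading $HOME replaced by "~" (only on a path boundary).
local function cwd_short(cwd, home)
  if home and home ~= "" and cwd:sub(1, #home) == home then
    local rest = cwd:sub(#home + 1)
    if rest == "" or rest:sub(1, 1) == "/" then return "~" .. rest end
  end
  return cwd
end
```

Usage with the example template from the commit:

```lua
local prompt = render_prompt("[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ",
  { model = "fast", ctx_used = 1200, ctx_max = 8000, turn = 4, mode = "normal",
    cwd_short = cwd_short("/home/u/src/aish", "/home/u") })
-- prompt == "[fast 1200/8000t T4 normal] ~/src/aish > "
```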
marfrit	d72689f709	config: deep model → deepseek-coder-v2-lite (temporary) qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per /v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B active params; fast enough that the 30-min timeout from the qwen3-30b config was wildly over-budget. Switched to deepseek-coder-v2-lite for the time being. Restore qwen3-30b when the slot is back up. Live-probed: YES/NO destructive probe via the deep model preset returns "YES." in ~4.8s — well within the new 5-min timeout, and fast enough that the Phase 3 LLM second-opinion path is now functional again without falling back to "fail-safe YES" on every ambiguous command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:42:23 +00:00
marfrit	a9b39cd435	config: Phase 5 routing + summarize-on-evict example (commit #5) Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only; commented-out example showing: - routing.auto (per-request auto-routing toggle) - routing.classes (class → model mapping; reasoning = nil by default per R-N2 cost-safety) - routing.fallback (single-hop retry to cloud on transport fail) - routing.fallback_model (default "cloud" if uncommented) - context.summarize_on_evict + summarizer_model + max_summary_chars (shown INSIDE the context = {...} block above) All defaults OFF — Phase 5 is opt-in across the board. Existing configs without `routing` or `context.summarize_on_evict` behave identically to Phase 4. Phase 5 implementation complete: #1 `3e57824` router.classify_model + 31-case corpus #2 `03497b5` context summarize_fn callback + summary block in to_messages #3 `40ea0b4` repl routing + fallback + summarize_fn wiring + :route/:fallback #4 - (bundled into #3 since meta cmds are trivial additions) #5 (this) config example block Phase 5 verify-partial: - router.classify_model: 31/31 case corpus passes - context summarize-on-evict: mock callback fires correctly (additive + compress paths), summary suppressed under Norris, :reset clears it - repl meta cmds: :route on/off/classes/check + :fallback on/off all work; :route check reports class + "routing currently disabled" suffix when auto is off (N1) Verify-pending: end-to-end with real broker (route a code question, see it land on deep; kill local backend, see fallback fire to cloud). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 11:32:20 +00:00
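A sketch of the single-hop rule: one retry on `routing.fallback_model` (default "cloud") after a failure, and no further hops. `chat` is a stand-in for the broker call returning `(reply)` or `(nil, err)`. Unlike the commit, which limits fallback to transport failures, this sketch treats every error as one.

```lua
-- Hypothetical wrapper: returns reply, err, and the model actually used.
local function call_with_fallback(chat, routing, model, messages)
  local reply, err = chat(model, messages)
  if reply or not (routing and routing.fallback) then
    return reply, err, model
  end
  local fb = routing.fallback_model or "cloud"
  if fb == model then
    return nil, err, model        -- already on the fallback model: no loop
  end
  local reply2, err2 = chat(fb, messages)   -- exactly one extra hop
  return reply2, err2, fb
end
```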
marfrit	27784f9b68	config: Phase 4 memory example block (commit #5) Phase 4 commit #5 (final) per docs/PHASE4.md §12. Documentation-only; commented-out example showing: - inject_max_chars (cap on startup injection; default 2000) - summarizer_model (which configured model :memory summarize uses) The block is OFF by default. The :memory meta surface (:remember, :memory list/forget/clear/inject/summarize) works without the block — items persist to <history.dir>/memory.jsonl regardless. The block only configures the injection-into-system-prompt behavior + summarizer model choice. Phase 4 implementation complete: #1 `199dd87` history.lua memory store + ffi/libc.lua flock #2 `c1a5c73` context.lua [background] block (suppressed in Norris) #3 `3b074af` repl.lua memory handle + :remember + :memory meta #4 `f22d21d` :memory summarize — LLM candidate extraction #5 (this) config.lua memory example block Phase 4 verify-partial: - history memory round-trip tests: add/forget/load all green - flock single-writer enforcement verified - context composition order (DEFAULT → [background] → NORRIS) + Norris suppression all green - End-to-end persistence across boots: :remember on boot 1 visible on boot 2 as injected memory items - :memory forget id-not-active surfaces clean status (N1) - :memory clear with [y/N] confirm gate works - :memory summarize wire-correct against fast model (candidate parsing tolerates bullets; per-candidate y/N/edit prompts fire) Verify-pending: real-model summarizer quality test (deep/cloud); multi-process flock contention test; long-running :memory inject race with running broker stream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 07:53:58 +00:00
marfrit	50666d092f	config: Phase 3 safety example block (commit #6) Phase 3 commit #6 (final) per docs/PHASE3.md §12. Documentation-only; commented-out example showing the safety schema: - llm_second_opinion (bool, default true) - llm_model (string, default deep→default_model fallback) - max_norris_steps (int, default 8) The block notes the model-selection trade-off (R-B2): cloud is the independent-class fast option (costs money), deep is the local-but-slow option, fast is self-policing and NOT recommended. No behavior change to existing configs — safety defaults kick in when the block is absent. Phase 3 implementation complete: #1 `bd59ce7` safety static patterns (34 rules) + 87-case test corpus #2 `2abd5da` LLM second-opinion + session cache + opts.max_tokens #3 `d2a53d2` renderer Norris frames #4 `11b1f56` safety.norris_step planner (single iteration) #5 `a404b2a` repl driver + \C-n real binding + :norris/:safety meta + readline rl_insert_text/rl_redisplay #6 (this) config.lua safety example block Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:42:46 +00:00
marfrit	f26cbd9a3a	phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. 
End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:04:57 +00:00
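A sketch of the `__` convention: build `alias__name`, split on the leftmost `__` with the same non-greedy pattern, and check the Bedrock name rule. `tool_name`, `split_tool_name` and `bedrock_ok` are hypothetical helpers; `bedrock_ok` approximates `^[a-zA-Z0-9_-]{1,128}$` with a Lua pattern, assuming the C locale so that `%w` means ASCII letters and digits.

```lua
-- alias__name; aliases must not contain "__" or the split is ambiguous.
local function tool_name(alias, name)
  assert(not alias:find("__", 1, true), "alias must not contain '__'")
  return alias .. "__" .. name
end

-- Non-greedy (.-) stops at the leftmost "__", so tool names may contain
-- further underscores, including "__".
local function split_tool_name(full)
  return full:match("^(.-)__(.+)$")
end

-- Length 1..128, characters limited to letters, digits, "_" and "-".
local function bedrock_ok(full)
  return #full >= 1 and #full <= 128 and full:find("^[%w_%-]+$") ~= nil
end
```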
marfrit	09800d192a	config: Phase 2 mcp example block + deep model switch Phase 2 commit #7 (final) per docs/PHASE2.md §12. Two changes bundled: (1) commented-out mcp = {...} example block (~40 lines) at the end of config.lua showing the Phase 2 schema: - mcp.servers — alias → {url, auth_token | auth_env} - mcp.auto_approve — "<alias>.<tool>" or "<alias>.*" globs - mcp.max_tool_depth — sub-loop budget per ask_ai turn The block is OFF by default; uncomment + adjust per fleet to activate. Documentation-only; no behavior change to existing configs (mcp_sessions stays empty, tools_schema() returns [], broker omits the field — full Phase 1 compatibility). (2) User-authored: deep model preset switched from mistral-nemo-12b-instruct to qwen3-30b-a3b-instruct, with a 10-min timeout_ms accommodating the larger model's RK3588 inference time. Reason: nemo backend is dormant per the proxy /v1/models discovery (aish#23 now returns 404 cleanly for unknown models instead of silent fallback); qwen3-30b is the practical "deep" alternative. Phase 2 implementation is now complete — 7 of 7 commits landed: #1 `6c194de` mcp.lua + ffi/curl status_code + PHASE0 §4 amendment #2 `0fde77f` safety.lua confirm_tool_call #3 `7c221a8` context.lua tool turns + use_tool_role fallback #4 `c736d0e` renderer.lua tool-call frames #5 `efdc728` broker.lua opts.tools + tool_call accumulator #6 `7e9cfff` repl.lua sub-loop + :mcp meta + system-prompt block #7 (this) config.lua example + deep model switch Next phase-loop step: verify (Phase 7). Files written are wired and isolated-tested; end-to-end model-driven verification waits on either a more compliant model or explicit forcing of tool_calls from the prompt — known to be marginal with the loaded qwen-1.5b but proven correct against direct probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:40:21 +00:00
marfrit	8870eb0451	config: route all presets through hossenfelder per issue #12 Resolves issue #12 by partial-accept of the recommendation. What landed: - Single broker URL: http://hossenfelder.fritz.box:8082 for all three presets (fast / deep / cloud). Server-side model-aware routing; no client-side cloud auth (proxy holds the OpenRouter bearer). - Models from hossenfelder's /v1/models inventory: fast -> qwen2.5-coder-1.5b-q4_k_m.gguf (boltzmann local) deep -> mistral-nemo-12b-instruct (boltzmann local) cloud -> anthropic/claude-haiku-4.5 (OpenRouter route) - `cloud` was already pointing at hossenfelder but with https://; flipped to http:// so it matches the proxy's actual scheme. What deferred: - Schema rename `models` -> `brokers` (and the 5-cloud-preset shape suggested in #12) — would touch repl.lua + broker.lua. Not blocking Phase 7. If multi-preset becomes useful in practice, file a separate issue for the rename then. Phase 7 verification (live broker test): - broker.chat(fast, [user="say pong"]) -> "CMD: echo pong" in ~3s - multi-turn arithmetic (7*8=56, *2=112) preserved across turns Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:06:08 +00:00
claude-noether	4310207738	Phase 0: scaffold tree + manifest - README, .gitignore, CLAUDE.md (project conventions) - docs/PHASE0.md — full Phase 0 manifest (locked substrate) - 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented with module-scoped responsibilities matching the manifest - config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b snappy/32k + cloud via OpenRouter through hossenfelder) File names match docs/PHASE0.md §4 exactly. Module bodies fill in across later phases; the tree shape is locked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:16:07 +00:00

23 Commits