marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	125f800513	docs/PHASE3: re-review NIT fold-in — pipe-to-sh EOL, ci= note, §12 sync Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded: N1 BLOCKER: `|%ssh%f[%s]` missed `curl x | sh` (end-of-string canonical wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace, which doesn't happen at EOL). Replaced with two patterns each for sh and bash: `|%ssh%s` (followed by whitespace/args) and `|%ssh%s$` (end-of-string). Same for bash. Verified against 18 wrapper-bypass test cases — all canonical idioms now HALT. N2 CONCERN: `ci=true` rule flag had no implementation note. Added one sentence to §5 explaining the matcher lowercases the input string when ci is set. N3 CONCERN: §12 commit #5 description was stale — still said "extends interactive CMD: extraction to consult is_destructive" which contradicts the R-B3 resolution (Norris-only). Rewrote commit #5 description to match R-B3, and bundled the ffi/readline.lua `_bound[seq]:free()` removal into commit #5's scope with explicit "Phase 1 amendment" callout. Same for the §12 risk note that still referenced the dropped behavior change. Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model swap interaction, N7 commit-attribution wording) are cosmetic and will fold in-flight during implement if material. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:45:25 +00:00
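The end-of-line gap described in N1 can be reproduced with a minimal Lua sketch. The function names are hypothetical (not aish's safety.lua API), and the end-of-string pattern is written here as `|%ssh%s*$`, which is an assumption for illustration; the authoritative patterns live in the PHASE3 manifest.

```lua
-- Sketch only: function names are hypothetical, not aish's safety.lua API.

-- Frontier form: %f[%s] needs a whitespace character *after* "sh",
-- so it cannot match when "sh" is the last thing in the string.
local function pipes_to_sh_frontier(cmd)
  return cmd:find("|%ssh%f[%s]") ~= nil
end

-- Split form: one pattern for "sh" followed by whitespace (args follow),
-- one for "sh" at end-of-string (assumed form, tolerates trailing blanks).
local function pipes_to_sh(cmd)
  return cmd:find("|%ssh%s") ~= nil or cmd:find("|%ssh%s*$") ~= nil
end

print(pipes_to_sh_frontier("curl x | sh"))     -- false: the EOL case leaks
print(pipes_to_sh("curl x | sh"))              -- true
print(pipes_to_sh("curl x | sh -s -- --yes"))  -- true
print(pipes_to_sh("ls | shuf"))                -- false: "shuf" is not "sh"
```

The frontier fails at end-of-string because Lua treats the position past the last character as `\0`, which is not in `%s`.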
marfrit	91ddcb005d	docs/PHASE3: review fold-in — security-layer BLOCKERs resolved Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against the analyze-tier draft. Resolutions applied: BLOCKERs: B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c, eval, pipe-to-shell, python -c, xargs|rm. Added 9 wrapper patterns to §5. Norris HALTs on any wrapper invocation; user reads the inner before proceed. The patterns are the conservative floor against the wrapper bypass class. B2 LLM second-opinion was self-policing — same model class generating actions then judging them. Switched probe model from `fast` to `deep` (qwen3-30b). Added re-roll inversion: if first probe says NO, ask "is this SAFE?". Disagreement between two probes → HALT. Cheap independent-class insurance. B3 `is_destructive` would have run on interactive CMD: extraction — a PHASE0 §6/§10 substrate amendment in disguise. Resolved Q24: heuristic runs ONLY when norris_active == true. No substrate change; interactive `confirm_cmd` semantics unchanged. CONCERNs: C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips escalate to abort/force-proceed prompt. C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to dispatch ALL pending actions before checking GOAL: complete. C3 Norris-goal eviction: goal embedded directly in the dynamic system-prompt suffix; survives sliding-window eviction. C4 Readline use-after-free window: M.bind no longer frees old callbacks; pin for process lifetime (bounded memory cost). C5 GOAL: complete matcher: line-level scan, exact match after trim — substrate-aligned with CMD: rigor. C6 §4 step 4 tightened: auto_approve does NOT bypass destructive heuristic; tool_call without auto_approve still HALTs even when destructive-clear (Norris conservative). 
NITs deferred or rolled into pattern table: - chown root-path pattern tightened (NIT 2 in-line) - Test corpus expansion noted in §12 commit #1 risk - Other NITs are wording-level Status: Plan (review folded). Ready for commit #1 (safety static patterns) once another review pass clears. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:42:58 +00:00
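The C5 rule (line-level scan, exact match after trim) could look roughly like the following. This is a sketch under assumptions: `goal_complete` is a hypothetical name, and only the sentinel text `GOAL: complete` is taken from the commit message.

```lua
-- Hypothetical helper; only the sentinel text comes from the commit message.
local function goal_complete(response)
  -- Append a newline so the final line is scanned even without a trailing "\n".
  for line in (response .. "\n"):gmatch("(.-)\r?\n") do
    -- Trim leading and trailing whitespace, then require an exact match.
    local trimmed = line:match("^%s*(.-)%s*$")
    if trimmed == "GOAL: complete" then
      return true
    end
  end
  return false
end

print(goal_complete("All done.\n  GOAL: complete  \n"))           -- true
print(goal_complete("The GOAL: complete marker comes later."))    -- false
```

Matching whole lines rather than substrings keeps prose that merely mentions the sentinel from ending the loop.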
marfrit	cf4d79dd9d	docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state Analyze findings folded into the manifest: A1. \C-n binding can't toggle mid-prompt without rl_insert_text / rl_redisplay. Solution: bind those (one cdef + 2 wrappers in ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user types goal + Enter. Routes through existing meta dispatch. A2. broker has no max_tokens passthrough. Add opts.max_tokens for the LLM second-opinion path (terminates at ~2 tokens; verified proxy honors it). A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step is the per-iteration extraction; driver loop in repl.lua. Module-changes table (§3) updated with the rl_insert_text and max_tokens rows. Baseline doc (PHASE3-baseline.md, 80 lines) captures: - LLM second-opinion latency: 425-1162ms per probe, all 5 test cases correct. Worst-case 16-step Norris = ~20s overhead; with static-pattern fast-path + session cache, ~5s realistic. - Module pre-state at commit `f26cbd9` (Phase 2 tip): LOC + state per file before Phase 3 edits. - Six static-pattern Lua-match sanity checks (all correct). - Carries: aish#15 (still open), aish#14, aish#32/#33. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 22:37:58 +00:00
marfrit	b58a842e49	docs/PHASE3: formulate — Norris autonomous mode + destructive-op gate Phase 3 formulate manifest. Three pillars per PHASE0.md §11 row 3: Chuck Norris autonomous mode (planning loop), destructive-op heuristic (static patterns + LLM second-opinion), and HALT/confirm protocol. Resolutions baked in via §2: Q2 iterative re-plan after each action (not top-down tree) Action sources CMD: lines AND MCP tool_calls — Phase 2 contract honored HALT trigger static-pattern hit OR LLM-second-opinion flag HALT shape 3-way: proceed / skip / abort Auto-approve under Norris honors Phase 2 auto_approve policy EXCEPT destructive-op heuristic always wins LLM second-opinion model the `fast` preset (cheapest) Norris prompt suffix appended to system prompt while active; "GOAL: complete" sentinel for done Key extensions: - safety.is_destructive: ~20 static shell-idiom patterns + LLM probe; runs on interactive CMD: extraction too (§9 — replaces bare confirm_cmd for known-destructive cases). Q24 worth challenging at analyze. - safety.norris_step: single-iteration of the planner. Driver loop in repl.lua. \C-n toggle (real binding, replaces Phase 1 placeholder); :norris <goal> explicit launch. - renderer.norris_begin/step/halt/end: visual parity with exec and tool_call frames. Prompt becomes [aish:fast ⚡]> per PHASE0.md §9. - context.to_messages dynamically appends NORRIS MODE suffix when norris_active. New open questions (Q23–Q30) tracked in §11: Q23 LLM second-opinion latency budget (caching mitigation) Q24 interactive CMD: also subject to is_destructive? 
(proposal: yes) Q25 GOAL: complete + pending actions in same response — dispatch first Q26 context preservation on abort/done/budget — all preserve Q27 :norris continue (resume after abort) — deferred to v2 Q28 side-effect MCP tools not in __shell/__write_file patterns Q29 goal-implies-authorization for destructive ops — no, always confirm Q30 :norris no-arg vs \C-n share goal-prompt path — yes, trivial Module-layout (PHASE0 §4) untouched — all changes are growth of existing files. 6 commits expected at implement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:45:03 +00:00
marfrit	f26cbd9a3a	phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics Phase 7 verify finding from TC #26 against :model cloud: HTTP 400 from openrouter→Amazon Bedrock: "tools.0.custom.name: String should match pattern '^[a-zA-Z0-9_-]{1,128}$'" Anthropic via Bedrock validates tool names against that regex and rejects dots. PHASE2 originally chose "." as the namespace separator ("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not. Separator switched to "__" (two underscores) everywhere — internal API matches on-wire shape, no transformation layer: - repl.lua: - tools_schema builds "alias__name" - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __) - :mcp tool parser uses same split - :mcp tools formatter prints "alias__name" - HELP block shows <alias__name> - safety.lua confirm_tool_call: alias.* glob → alias__* glob - config.lua example block: keys rewritten - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua row, §5 wire-shape JSON examples, §6 auto_approve schema, §7 meta-cmd table, §12 plan all updated. Original "." references preserved in commit history. Constraint: aliases must not themselves contain "__" so the parse stays unambiguous. Tool names from MCP servers may have underscores freely. Second fix bundled — uninformative broker error: Previously "broker error: transport: HTTP response code said error" Now "broker error: transport: HTTP 400: {full body snippet}" ffi/curl.lua M.post_sse changes: - FAILONERROR no longer set (was hiding the response body). - raw_body accumulator added alongside the SSE buffer; captures every byte regardless of SSE shape. - After perform, check status_code via curl_easy_getinfo. On >=400, return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged. - End-of-stream SSE flush only runs on 2xx (no false event on error bodies that aren't SSE-shaped). - Phase 1 callers reading just first return slot stay correct. 
End-to-end verified: - :model cloud + tools=[boltzmann__read_file ...] + "Use boltzmann__read_file with path=/etc/hostname" → Claude emits tool_call with name="boltzmann__read_file", args='{"path": "/etc/hostname"}'. ok=true, transport clean. - Force-bad tool name "bad.name.with.dots" → err string carries the full bedrock 400 with the regex-pattern message visible. TC #26 (sub-loop end-to-end) is now testable against cloud — the error that blocked it is resolved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 20:04:57 +00:00
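The `__` split and the Bedrock name constraint can be illustrated with a short sketch. The helper names are hypothetical; the split pattern `^(.-)__(.+)$` and the regex `^[a-zA-Z0-9_-]{1,128}$` are the ones quoted in the commit message.

```lua
-- Hypothetical helpers illustrating the "__" namespacing described above.

-- Split "alias__tool" at the leftmost "__"; returns nil when no separator
-- is present. The non-greedy (.-) keeps the alias as short as possible.
local function split_tool_name(full)
  return full:match("^(.-)__(.+)$")
end

-- Lua patterns have no {1,128} quantifier, so the length is checked
-- separately from the character class.
local function bedrock_safe(name)
  return #name >= 1 and #name <= 128
    and name:match("^[A-Za-z0-9_%-]+$") ~= nil
end

local alias, tool = split_tool_name("boltzmann__read_file")
print(alias, tool)                          -- boltzmann and read_file
print(bedrock_safe("boltzmann__read_file")) -- true
print(bedrock_safe("boltzmann.read_file"))  -- false: dots are rejected
```

Because the alias capture is non-greedy, a tool name that itself contains `__` stays intact on the right-hand side, which is why only aliases carry the no-`__` constraint.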
marfrit	3fa6279f5b	repl: :mcp tool — disambiguate "no alias" vs "unknown alias" errors Surfaced by Phase 7 verify test case #29: typing :mcp tool list_dir (no dot) printed "unknown alias: nil" instead of a useful diagnostic. The parse failure was being conflated with the alias-not-found case. Now: :mcp tool list_dir -> tool name missing alias prefix: list_dir :mcp tool unknown_alias.x -> unknown alias: unknown_alias :mcp tool known_alias.bogus -> unknown tool: known_alias.bogus Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 18:55:01 +00:00
marfrit	09800d192a	config: Phase 2 mcp example block + deep model switch Phase 2 commit #7 (final) per docs/PHASE2.md §12. Two changes bundled: (1) commented-out mcp = {...} example block (~40 lines) at the end of config.lua showing the Phase 2 schema: - mcp.servers — alias → {url, auth_token | auth_env} - mcp.auto_approve — "<alias>.<tool>" or "<alias>.*" globs - mcp.max_tool_depth — sub-loop budget per ask_ai turn The block is OFF by default; uncomment + adjust per fleet to activate. Documentation-only; no behavior change to existing configs (mcp_sessions stays empty, tools_schema() returns [], broker omits the field — full Phase 1 compatibility). (2) User-authored: deep model preset switched from mistral-nemo-12b-instruct to qwen3-30b-a3b-instruct, with a 10-min timeout_ms accommodating the larger model's RK3588 inference time. Reason: nemo backend is dormant per the proxy /v1/models discovery (aish#23 now returns 404 cleanly for unknown models instead of silent fallback); qwen3-30b is the practical "deep" alternative. Phase 2 implementation is now complete — 7 of 7 commits landed: #1 `6c194de` mcp.lua + ffi/curl status_code + PHASE0 §4 amendment #2 `0fde77f` safety.lua confirm_tool_call #3 `7c221a8` context.lua tool turns + use_tool_role fallback #4 `c736d0e` renderer.lua tool-call frames #5 `efdc728` broker.lua opts.tools + tool_call accumulator #6 `7e9cfff` repl.lua sub-loop + :mcp meta + system-prompt block #7 (this) config.lua example + deep model switch Next phase-loop step: verify (Phase 7). Files written are wired and isolated-tested; end-to-end model-driven verification waits on either a more compliant model or explicit forcing of tool_calls from the prompt — known to be marginal with the loaded qwen-1.5b but proven correct against direct probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:40:21 +00:00
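For orientation, an `mcp` block following the schema listed in this commit might look like the fragment below. Every value is a placeholder: the URL, the environment-variable name, and the array shape of `auto_approve` are assumptions, not the project's shipped example. It uses the `.` separator as this commit describes it; the later `f26cbd9` amendment in this log rewrites the keys to `__`.

```lua
-- Illustrative values only: the URL, env-var name, and tool names are
-- placeholders, not the project's real fleet configuration.
local config = {
  mcp = {
    servers = {
      -- alias -> { url, auth_token | auth_env }
      boltzmann = {
        url = "http://boltzmann.example:8080/mcp",
        auth_env = "AISH_MCP_TOKEN",  -- Bearer token read from this env var
      },
    },
    -- Assumed to be an array of "<alias>.<tool>" or "<alias>.*" entries.
    auto_approve = {
      "boltzmann.read_file",
      -- "boltzmann.*",               -- would cover every tool on the server
    },
    max_tool_depth = 8,               -- sub-loop budget per ask_ai turn
  },
}
```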
marfrit	7e9cfff04d	repl: tool-call sub-loop + :mcp meta + system-prompt augmentation Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP tool-call flow on top of broker/safety/context/renderer/mcp. repl.lua additions: - mcp_sessions table populated from config.mcp.servers at startup. connect_mcp() helper does initialize + caches tools/list. Failures status-logged once; absent from mcp_sessions until manual reconnect (C4 — no auto-retry). - tools_schema() flattens connected sessions' tools into the OpenAI {type:"function", function:{name,description,parameters}} shape with "<alias>.<name>" namespacing. - flatten_content() concatenates content[type="text"] blocks; one-shot status warning when non-text blocks (image/resource) are dropped (§4 normative spec, v1 only handles text). - dispatch_tool_call(name, args_table) splits alias.tool, looks up session, calls. Returns (content_string, is_error). Errors of every flavor (missing alias, no session, rpc_error, transport_error) yield a synthesized "[aish] ..." string so callers always have a body for the role:"tool" turn — alternation preserved per C5/C7. - ask_ai rewritten as a sub-loop that re-issues the broker request until the model returns pure text or max_tool_depth (default 8) is hit. Each iteration: stream response → if tool_calls present, confirm-gate each → dispatch → append role:"tool" turn → continue. Argument-JSON parse failure produces a synthesized tool turn (C7). Decline at confirm produces "[aish] tool call declined by user" tool turn (alternation guarantee). - :mcp meta with sub-commands: list / tools / tool <a.n> / connect <url> [alias] / disconnect <alias>. HELP block extended. context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8 (hybrid prompt: static frame about MCP + dynamic tools list in the request body). Block is always present even when no MCP servers configured — ~60 tokens for clarity that 'CMD:' remains the fallback. 
CMD: extraction unchanged — runs on the FINAL pure-text response only (not on intermediate iterations of the tool sub-loop). Substrate §3 invariant preserved. End-to-end verified two ways: (1) Direct broker probe: aish's tools_schema fed through broker.chat_stream against hossenfelder → qwen-1.5b emits one tool_call payload with correct id + name="boltzmann.list_dir" + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string across fragmented deltas. (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock emits text + tool_call, sub-loop dispatches against LIVE boltzmann lmcp (auto_approve via policy), 80+ files rendered inside the tool_call frame, broker re-invoked with the extended context, mock returns pure text, sub-loop terminates. Total broker invocations: 2. Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..." suggestions even when an MCP tool is the better path; the small model's system-prompt compliance is weak. Larger models and the analyze-time direct probe confirm the tools_schema and tool_calls flow is wire-correct — Phase 7 verify will exercise this against qwen3-30b or cloud models when available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:20:42 +00:00
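The text-only flattening rule for tool results can be sketched as follows. This is not the repl.lua implementation: the function body is a guess at the described behaviour, and joining blocks with a newline is an assumption, since the commit only says the text blocks are concatenated.

```lua
-- Hypothetical sketch of the text-only flattening rule.
-- Returns the joined text and the number of non-text blocks dropped,
-- so a caller can emit its one-shot warning when dropped > 0.
local function flatten_content(content)
  local parts, dropped = {}, 0
  for _, block in ipairs(content or {}) do
    if block.type == "text" and type(block.text) == "string" then
      parts[#parts + 1] = block.text
    else
      dropped = dropped + 1   -- image / resource blocks are not rendered in v1
    end
  end
  -- Joining with "\n" is an assumption; the commit only says "concatenates".
  return table.concat(parts, "\n"), dropped
end

local text, dropped = flatten_content({
  { type = "text",  text = "file-a" },
  { type = "image", data = "..." },
  { type = "text",  text = "file-b" },
})
print(text, dropped)
```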
marfrit	efdc7281c7	broker: opts.tools passthrough + streaming tool_call accumulator Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows tool-call support without taking a dependency on mcp.lua (caller supplies the tools array — B5 from review). chat_stream signature widens to (cfg, msgs, on_delta, opts): opts.tools - optional array, passed to the request body as the OpenAI-shape tools field. OMITTED entirely when nil or empty (#tools == 0) — some servers reject "tools": []. on_delta callback shape widens to (kind, payload): kind = "text", payload = string (Phase 1 path; unchanged semantics, signature changes from (delta) to ("text", delta)) kind = "tool_call", payload = {id, name, arguments} emitted ONCE per call on finish_reason "tool_calls" after the streaming accumulator pulls fragmented JSON-string arguments together. Accumulator behavior: - Keyed by delta.tool_calls[i].index. - If index is absent on a delta (some llama.cpp builds omit it on single-call streams; C2 in review), default to 0 with a one-shot stderr debug status per stream. - id and name captured from the opening delta of each slot. - function.arguments concatenated across all deltas as the raw JSON-string; caller (repl.lua / future Phase 2 commit #6) does dkjson.decode. - On finish_reason "tool_calls" the accumulator emits all collected calls in index order and resets. M.chat external contract unchanged (C1): wrapper now uses the new (kind, payload) shape internally but exposes the same text-string return. No caller of M.chat passes opts.tools so tool_call kinds are silently dropped. repl.lua minimal companion edit: ask_ai's chat_stream callback updated to the new shape. Text path unchanged; tool_call kinds are no-op placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming functional between #5 and #6. 
Smoke-tested against hossenfelder/8082 (post-#23 fix): - text-only: ok=true, kind="text" deltas received - with opts.tools: model emitted one tool_call, accumulator collected id + name=get_weather + args={"city":"Paris"} correctly across fragmented deltas - opts.tools={}: server accepted (field omitted as required) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:20:32 +00:00
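The accumulator behaviour listed above (keyed by index, index defaulting to 0, arguments concatenated, emitted in index order) can be sketched as below. The constructor and method names are hypothetical, and the one-shot debug status is omitted; field names follow the OpenAI streaming delta shape.

```lua
-- Hypothetical accumulator; "function" is a Lua keyword, hence tc["function"].
local function new_accumulator()
  local slots = {}   -- index -> { id, name, arguments }
  local acc = {}

  -- Feed one entry of delta.tool_calls into the accumulator.
  function acc.feed(tc)
    local fn = tc["function"] or {}
    local idx = tc.index or 0            -- absent index: default to slot 0
    local slot = slots[idx]
    if not slot then
      -- id and name are captured from the opening delta of each slot.
      slot = { id = tc.id, name = fn.name, arguments = "" }
      slots[idx] = slot
    end
    if fn.arguments then
      slot.arguments = slot.arguments .. fn.arguments  -- raw JSON-string pieces
    end
  end

  -- On finish_reason "tool_calls": return calls in index order, then reset.
  function acc.flush()
    local indices = {}
    for idx in pairs(slots) do indices[#indices + 1] = idx end
    table.sort(indices)
    local calls = {}
    for _, idx in ipairs(indices) do calls[#calls + 1] = slots[idx] end
    slots = {}
    return calls
  end

  return acc
end

local acc = new_accumulator()
acc.feed({ index = 0, id = "call_1", ["function"] = { name = "get_weather", arguments = "" } })
acc.feed({ index = 0, ["function"] = { arguments = '{"city":' } })
acc.feed({ ["function"] = { arguments = '"Paris"}' } })  -- index omitted
local calls = acc.flush()
print(calls[1].name, calls[1].arguments)
```

The arguments stay a raw string here; decoding is left to the caller, as the commit describes.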
marfrit	c736d0e129	renderer: tool-call begin/end frames Phase 2 commit #4 per docs/PHASE2.md §12. Adds M.tool_call_begin(name, args) and M.tool_call_end(content, is_error) for visual parity with the existing exec_begin/exec_end frame. Visual cadence: ─── tool: <name (cyan)> ─── <args, dim, truncated at 200 chars; omitted if empty/"{}"> <content> ─── ok ─── (dim, success) ─── error ─── (red status word inside dim rule, on is_error=true) Same rule glyph (━) and ANSI palette as the exec frame so the user reads tool dispatch and shell dispatch the same way. Smoke-tested all five shapes: success with args / empty args / error / long args truncated / empty content. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:11:42 +00:00
marfrit	7c221a8aae	context: tool turns + tool_calls on assistant; use_tool_role fallback Phase 2 commit #3 per docs/PHASE2.md §12. Three concrete edits per §3 context.lua row (the BLOCKER-fold-in from review): (a) Loosen Context:append shape-per-role: assistant may carry empty content if tool_calls is non-empty; role:"tool" requires tool_call_id + content. (b) Preserve tool_calls / tool_call_id on store (Phase 1 :append built {role, content} only and silently dropped extras). (c) Extend to_messages() with two emission modes selected by use_tool_role: true (default) — OpenAI-standard role:"tool" + assistant turns with tool_calls (wrapped as {id, type:"function", function:{name, arguments}}). false (fallback) — collapse assistant-with-tool_calls + its following role:"tool" turns into a single assistant text turn with synthesized "[tool: name]\n<args>\n[result]\n <content>" body; merge consecutive assistant turns so the trailing post-tool-result text doesn't yield asst/asst back-to-back (same strict-template gotcha PHASE0.md §6 warned about for user/user). Alternation assert added (N4): role:"tool" turns must trace back through zero-or-more prior tool turns to an assistant-with-tool_calls. Catches sub-loop bugs at append time. Orphan tool turns rejected. pending_exec_output behavior unchanged per §3 row: buffer persists across tool-call sub-loops, flushes on next genuine user turn (B4). Smoke-tested §12 verify-row #3: (i) default mode round-trip — 5 OpenAI-shape messages, tool_calls + tool_call_id preserved. (ii) fallback mode round-trip — collapsed into 3 messages (system/user/assistant), tool_calls + role:"tool" not emitted. (iii) multi-call: 2 tool_calls in one assistant turn followed by 2 tool replies, both modes render correctly. (iv) orphan tool turn after user — assertion fires. (v) B4: pending_exec_output survives a tool sub-loop, flushes on next :append_user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:10:47 +00:00
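The alternation assert (N4) amounts to a backwards walk over the stored turns. A minimal sketch, assuming turns are plain tables with `role` and an optional `tool_calls` array; the function name is hypothetical and the real check lives inside Context:append.

```lua
-- Hypothetical check, run before appending a role:"tool" turn to `turns`.
-- Walk back over any directly preceding tool turns; the first non-tool turn
-- must be an assistant turn that carries a non-empty tool_calls array.
local function tool_turn_allowed(turns)
  local i = #turns
  while i >= 1 and turns[i].role == "tool" do
    i = i - 1
  end
  local anchor = turns[i]   -- nil when the walk ran off the front
  return anchor ~= nil
    and anchor.role == "assistant"
    and type(anchor.tool_calls) == "table"
    and #anchor.tool_calls > 0
end

local ok_history = {
  { role = "user", content = "list /tmp" },
  { role = "assistant", content = "", tool_calls = { { id = "call_1" } } },
  { role = "tool", tool_call_id = "call_1", content = "..." },
}
print(tool_turn_allowed(ok_history))                           -- true
print(tool_turn_allowed({ { role = "user", content = "hi" } })) -- false: orphan
```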
marfrit	0fde77fe35	safety: confirm_tool_call gate with auto-approve policy Phase 2 commit #2 per docs/PHASE2.md §12. Implements just the per-call confirm-gate surface; Phase 3 stubs (is_destructive, norris_step) stay unimplemented with their error() bodies. M.confirm_tool_call(name, args, cfg) checks cfg.mcp.auto_approve for: - exact match on "<alias>.<tool>" - "<alias>.*" glob covering a whole server Miss falls back to a [y/N] readline prompt. Empty or non-"y" answer rejects (matches the existing confirm_cmd UX from PHASE0 §10). Pretty-printing renders args as compact JSON, truncated at 80 chars with "..." suffix so one-line prompts stay readable. Smoke-test passes all eight cases per §12 verify-row #2: exact match / alias glob → auto-approve, no prompt miss + y / n / empty / nil-cfg → prompt shown, expected verdict empty args / long args → clean rendering, truncation works Note: PHASE0 §4 module-layout had a "lands in Phase 2" hint on the norris_step stub; the actual landing is Phase 3 per PHASE0 §11 row 3. Comment in safety.lua updated to clarify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:07:57 +00:00
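The auto-approve lookup (exact match or whole-server glob) might be sketched as below. `auto_approved` is a hypothetical name and the policy is assumed to be an array of strings; on a miss the real gate falls back to the [y/N] prompt, which is not modelled here.

```lua
-- Hypothetical matcher for the auto_approve policy described above.
-- `policy` is assumed to be an array of strings.
local function auto_approved(name, policy)
  local alias = name:match("^([^.]+)%.")   -- text before the first "."
  for _, entry in ipairs(policy or {}) do
    if entry == name then
      return true                          -- exact "<alias>.<tool>"
    end
    if alias and entry == alias .. ".*" then
      return true                          -- "<alias>.*" covers the server
    end
  end
  return false   -- miss: the caller would fall back to the [y/N] prompt
end

print(auto_approved("boltzmann.read_file", { "boltzmann.read_file" }))  -- true
print(auto_approved("boltzmann.list_dir",  { "boltzmann.*" }))          -- true
print(auto_approved("other.list_dir",      { "boltzmann.*" }))          -- false
```

Plain string comparison is used instead of Lua pattern matching so that characters such as `.` or `-` in aliases need no escaping.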
marfrit	6c194deea0	mcp: JSON-RPC client + ffi/curl status_code; PHASE0 §4 amended First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled: mcp.lua (new, 153 lines): - M.connect(url, opts) returns a Session. - Session:initialize() round-trips initialize + notifications/initialized + tools/list. Caches tools for session lifetime (lmcp announces capabilities.tools.listChanged = false; no refetch). - Session:list_tools() returns the cached tool list. - Session:call_tool(name, args) returns (result_table, kind) where kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per the §4 error split. Folded HTTP-level failure into transport_error. - Per-server Bearer auth via opts.auth_token or opts.auth_env env-var indirection. - Captures protocolVersion mismatch as a warning string rather than aborting (lmcp doesn't negotiate — N3 in review). ffi/curl.lua extension: - Add curl_easy_getinfo to ffi.cdef. - Pre-cast as getinfo_long; helper get_response_code() fetches CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFOTYPE_LONG | 2). - M.post now returns (body, status_code) on transport success; (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers reading only the first slot are unaffected. docs/PHASE0.md §4: - Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9. - Module-stability invariant clarified: rename prohibition is what matters; adding new files is additive. Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4: - initialize: ok (7 tools cached) - list_dir /tmp: ok (1.2KB content) - read_file /nonexistent: ok (boltzmann's baseline §3 quirk — isError:false even on failure; content is authoritative) - nope_tool: rpc_error (code=-32601) - wrong auth: transport_error (HTTP 401) - unreachable host: transport_error (DNS failure) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:06:39 +00:00
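The four-way `kind` split returned by Session:call_tool could be derived along these lines. This is a sketch, not the mcp.lua code: the function name, the argument order, and the `status >= 400` threshold are assumptions, and `decoded` stands for the JSON-decoded response body.

```lua
-- Hypothetical classifier. `status` is the HTTP status code (nil when the
-- transfer itself failed); `decoded` is the JSON-decoded body, if any.
local function classify(status, decoded)
  if status == nil or status >= 400 or type(decoded) ~= "table" then
    return "transport_error"   -- libcurl failure, HTTP error, or non-JSON body
  end
  if decoded.error ~= nil then
    return "rpc_error"         -- JSON-RPC error object, e.g. code -32601
  end
  if type(decoded.result) == "table" and decoded.result.isError then
    return "handler_error"     -- the tool ran and reported a failure
  end
  return "ok"
end

print(classify(200, { result = { content = {}, isError = false } }))  -- ok
print(classify(200, { error = { code = -32601 } }))                   -- rpc_error
print(classify(401, nil))                                             -- transport_error
print(classify(nil, nil))                                             -- transport_error
```

Note that a result with `isError = false` classifies as `ok` even when the content describes a failure, which matches the read_file quirk recorded in the smoke test.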
marfrit	f5daa6afc0	docs/PHASE2: re-review NITs — M.post shape, getinfo cdef, content flattening normative Three follow-up NITs from the post-fold-in review: (1) Disambiguate M.post return shape: (body, status_code) on transport success regardless of status; (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers reading only the first slot are unaffected. (2) Note that the M.post extension requires extending ffi.cdef to include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal 2097154, CURLINFOTYPE_LONG | 2) and a long[1] out-param shim. Implementation detail the commit #1 author will need. (3) Move the tool-result content-flattening rule from §12 risk note into §4 normative spec (forward-referenced both ways) — §4 is where a future reader looking for the tool-invocation contract will scan. No design changes; clarifications only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:02:35 +00:00
marfrit	d3570ccea4	docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs Independent review of the formulate+analyze+plan draft surfaced design gaps that would have shipped as silent bugs. Resolutions applied: BLOCKERs: B1 context.lua impact widened — Phase 1 :append asserts content and discards extra fields. Need (a) shape-per-role assert, (b) preserve tool_calls/tool_call_id on store, (c) emit from to_messages(). B2 ffi/curl.M.post extended to return (body, status_code). lmcp's 401 returns a non-JSON-RPC body that would have been mis-decoded. B3 §3 typo schema -> inputSchema. B4 pending_exec_output × tool-call sub-loop interaction specified. B5 §3/§12 broker dependency contradiction — broker takes opts.tools from caller; no layering inversion. CONCERNs: C1 M.chat return polymorphism dropped (no consumer). C2 tool_calls[].index absent fallback: default to 0. C3 Re-injection stores accumulated text, not hard-coded empty. C4 :mcp connect failure: no auto-retry, status-log once. C5/C7 JSON-RPC error AND argument-parse failure both synthesize a role:"tool" turn — keeps strict-template alternation legal exactly the way PHASE0 §6 demanded for exec output. C6 §9 confirms §4 amendment is additive (preserves §3 invariant). NITs: N3 protocolVersion fallback (lmcp doesn't negotiate). N4 Alternation assert in Context:append. N7 Model-routing bug filed as aish#23. N8 Day-one fallback test for use_tool_role=false in commit #3. Manifest status: Plan (review folded). Status line and Resolutions sections updated; commit-by-commit roadmap reflects revised specs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:00:07 +00:00
marfrit	447e430254	docs/PHASE2 §12: implementation plan — 7-commit roadmap Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua → repl.lua → config.lua. Same cadence as Phase 0/1. Risks called out explicitly: - Empty tools array → omit field entirely (some servers reject []) - isError:false on actual failure (baseline §3 finding) → pass content through regardless; let model read error text - JSON-RPC error from tools/call → aish status only, no tool turn appended, no model recovery - max_tool_depth=8 cap on tool-call sub-loop - Argument JSON streaming may yield malformed JSON → status warn + skip - Q18 fallback (use_tool_role=true default; prefix-injection plumbed but dead-coded; verify can flip) - Connect-at-startup is sequential (~30ms × N); fine for N≤3 Two items left open for review: Q18 default flip vs ship-true-flip-on-fail, and whether :mcp connect should re-fetch tools after the initial cache. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:37:27 +00:00
marfrit	c5116bf129	docs/PHASE2-baseline: pre-implementation measurements Phase 7 (verify) anchor. Captures: - MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms on LAN; LLM is the latency floor, not the transport). - 6 fixture responses saved to /tmp/aish-baseline/ covering initialize, notifications/initialized, tools/list, tools/call success, isError, and JSON-RPC unknown-tool error. - Baseline design finding: boltzmann's read_file returns isError:false even on failure (error text in content). aish should treat content as authoritative, isError as advisory; feed both to the model. PHASE2.md §4's "pass-through" stance already accommodates; no manifest amendment needed. - Streaming tool_calls delta shape verified against hossenfelder; matches PHASE2.md §5. - Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence ignoring the CMD: contract — once MCP tools exist the model gets a structured path that doesn't depend on prose-formatting compliance. - Module pre-state at Phase 1 head `5878f73`: LOC + capability snapshot per module so Phase 2 diff has a reference frame. - Two boltzmann-proxy blockers (SSE buffering, model-field routing) carried explicitly into Phase 7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:34:32 +00:00
marfrit	5878f7347b	docs/PHASE2: analyze — lmcp v0.5.4 probed, transport simplified Live-probed against lmcp v0.5.4 (boltzmann) + hossenfelder broker proxy: Transport simpler than spec: - lmcp only implements POST-per-RPC with Connection: close; no held-open SSE channel. Combined with capabilities.tools.listChanged=false, no client-side listener is needed in v1. Drops the planned M.get_sse addition to ffi/curl.lua — Phase 1's M.post covers MCP. Bearer auth is universal across the fleet — config schema grew auth_token (literal) and auth_env (env-var indirection) fields per server, mirroring PHASE0 §10's key_env convention. Streaming tool_calls delta shape verified — accumulator by `index`, function.arguments arrives as chunked JSON-string. Matches the formulate-phase assumption in §5. Resolutions: Q17 transport abstraction — POST-only, no SSE channel for lmcp. Q21 error mapping — result.isError (model-recoverable, feed back as tool turn) vs JSON-RPC error (unknown method/tool, transport-level). Q18 role:"tool" turn — accepted at protocol level (live-probed). Mistral-nemo template verification blocked by the hossenfelder model-field routing bug; full closure carried to Phase 7 verify. Open-end recorded in §11: the hossenfelder proxy routes every request to the loaded fast model regardless of model field, blocking Phase 2 testing against mistral-nemo specifically. Parallel to the SSE buffering issue at marfrit/aish#15; same root (boltzmann proxy code). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:51:03 +00:00
marfrit	ec6793c93c	docs/PHASE2: formulate — MCP client + tool-calling bridge Phase 2 formulate manifest. Three pillars per PHASE0.md §11 row 2: mcp.lua (JSON-RPC 2.0 over HTTP+SSE, target: lmcp), tool-calling bridge (OpenAI tools field <-> MCP tools/call), and the safety.lua authorization gate (per-call confirm + auto_approve policy). Resolves PHASE0.md §13 Q6–Q10: Q6 CMD: + tool-calls coexist; substrate §3 unchanged Q7 config-declared servers + runtime :mcp connect Q8 per-call confirm default, auto_approve policy in config Q9 hybrid system prompt: static frame + dynamic tools body field Q10 streaming-from-day-one on Phase 1 SSE; on_delta widens to (kind, payload) New questions tracked in §11 (Q17–Q22): transport abstraction, role:tool vs prefix injection (mistral-nemo template verification needed), large tool-result handling, parallel dispatch, error mapping, aish-as-MCP-server (parked). §4 module layout amended: mcp.lua slots between broker.lua and router.lua. The amendment is documented in this manifest; the actual §4 table edit lands when implementation starts (Phase 2 implement phase). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 09:23:53 +00:00
marfrit	f7c3c32aa2	.claude: project-shared permission allowlist for read-only MCP/Bash Adds .claude/settings.json — 10 read-only entries (mcp____read_file, mcp__hub-tools__remote_list_hosts, Bash(ping ), Bash(dig *)) auto-allowed in any aish session, reducing per-call permission prompts during routine file-reading and host probing. Generated via /fewer-permission-prompts. settings.local.json stays user-private (per-user ad-hoc grants); .gitignore now covers it so it doesn't accidentally land in commits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 08:08:26 +00:00
marfrit	7d62eb5659	review followups: pcall shield, :resume guard, shell quoting, nits CONCERNs from the Phase 1 review pass: ffi/curl.lua: - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or in the parse loop itself) is captured into cb_error and surfaced after curl_easy_perform rather than propagating across the FFI callback boundary (which LuaJIT documents as process-fatal). The EOS flush path gets the same shield. Errors return (nil, "callback: <msg>") from post_sse. history.lua: - sh_singlequote() escapes shell metacharacters; the mkdir -p and ls -1 shell-outs no longer double-quote (where $(...) and $VAR still expand) — single-quote with embedded-' escaping is the safe form. - M.load now returns (turns, meta) instead of (meta, turns). turns is ALWAYS a table on success, never nil-when-no-header; failure path is the unambiguous (nil, err). Callers can `if not turns then` without the previous ambiguity. repl.lua :resume updated to the new shape. repl.lua :resume: - Refuse to resume into a non-empty ctx — silent overwrite was the Q15 default, but the review surfaced the no-undo / no-warning failure mode. User must :reset (or :save then re-launch) to express intent. The current session's on-disk log is unaffected either way. NITs: - ffi/libc.lua READ_BUF: comment noting it's module-shared and Phase 1 has no reentrant readers; revisit when that changes. - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred from Phase 1 — no consumer here") rather than the previous dangling "(or here)". Regression suite verifies: - history.load new signature on success + failure paths - shell-quoted history.dir with $ doesn't trip - aish scripted run: ctx with 2 turns refuses :resume anchor with a clear status; user must :reset first Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:05:23 +00:00
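The single-quote form mentioned for history.lua is the standard POSIX idiom: close the quote, emit an escaped quote, reopen. A minimal sketch follows; it shares the name `sh_singlequote` with the commit, but the body is an illustration and not necessarily the code that landed.

```lua
-- Sketch of POSIX single-quote escaping; not necessarily the exact body of
-- history.lua's sh_singlequote().
local function sh_singlequote(s)
  -- Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
  -- The parentheses drop gsub's second return value (the match count).
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

print(sh_singlequote("/home/u/$(whoami)/it's"))
```

Inside single quotes the shell expands neither `$VAR` nor `$(...)`, which is the property the double-quoted form lacked.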
marfrit	1f1065157e	review BLOCKER: PTY input forwarding + raw mode toggle Phase 1 review caught a structural gap: executor.exec only drained the PTY master fd, never forwarded user keystrokes — vim/less/htop/nano would render and hang on input. PHASE1.md §5 specified bidirectional multiplex but only the read leg landed. tcgetattr/tcsetattr were also missing, so even with input forwarding the parent's line discipline would buffer until newline (breaking single-key UIs). ffi/libc: - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or (nil, err) when fd isn't a tty (scripted / piped-stdin runs) - M.restore_termios(fd, saved) - struct pollfd + M.poll (POLLIN constant) executor: - multiplex(sess): poll(stdin, master); reads master on any revents (POLLHUP fires when child closes its slave end, not POLLIN — the revents != 0 check catches both); forwards stdin keystrokes to master; loop exits when master read returns 0 (EOF / child gone) - stdin polling is only enabled when stdin_is_tty (set_raw succeeded); piped-stdin runs (tests / scripted) would otherwise drain queued aish commands into the child of the current cmd, swallowing them - raw mode is restored before returning so the user lands back at the aish prompt in canonical mode renderer + repl: - exec_output(out, code) split into exec_begin() (top rule, before spawn) + exec_end(code) (closing rule with exit, after wait). PTY multiplex streams the body live to stdout in between; the renderer never re-prints the body. PHASE1.md §3: - tcgetattr/tcsetattr changed from "optional" to "required for single-key UIs to work — done-criteria #2"; poll added to the libc row description. Verified: - non-interactive smoke (echo / false / exit 7 / ls /nonexistent / printf multi-line) — all exit codes correct, output streamed live, a\nb\nc\n preserved byte-for-byte - scripted-stdin run reaches all expected lines (no stdin draining into a non-interactive child) - aish prompt + framed exec block + exit-code line all render in correct order Live interactive verification (vim / less / htop in a real terminal) still needs a user-test pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 20:00:53 +00:00
marfrit	a75118b2ae	readline: bind() via rl_bind_keyseq; repl reserves \C-n no-op Phase 1 readline binding wiring per PHASE1.md §7. ffi/readline: M.bind(seq, lua_fn) -> bool Wraps lua_fn as a C callback (signature `int (int, int)` per readline's rl_command_func_t) and registers it via rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0). Trampolines are pinned in module-local state so they outlive the bind call — readline retains the function pointer for the process lifetime. Rebinding the same seq frees the previous trampoline. Bound handlers are pcall-wrapped so a Lua error doesn't crash readline's input loop. repl: Binds \C-n to a no-op that emits "[aish] Norris mode not yet implemented (Phase 3)" Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode) replaces the body with the actual toggle. Smoke covers bind / rebind-same-seq (exercises the :free path) / bind-different-seq with no errors. Live keyboard verification waits on user-test. Phase 1's 8(+1) inner loop is now functionally through `implement`; next inner phase is `verify` (review pass) followed by memory-update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:26:58 +00:00
marfrit	9d586870e8	repl: session persistence wiring — auto-log, :save, :resume, :sessions Phase 1 session log integration per PHASE1.md §6. On every M.run(), open a session file at <config.history.dir>/sessions/<utc-iso8601>.jsonl with a meta header (started, model, aish_version). If history.dir is unset or unwritable, status-log the disable and continue without persistence. ask_ai logs the merged user turn (after pending exec output is folded in) and the assistant turn (after streaming completes). run_shell does NOT log [exec output] — that becomes part of the next user turn when ctx.pending_exec_output is flushed. New meta commands: :sessions list session files; "*" marks the active one :save <name> rename current session log to <name>.jsonl (auto- appends .jsonl); reopens for continued append :resume <name> load <name>.jsonl into ctx (replaces current turns via ctx:reset + append loop). The current process's own session log is unaffected — Phase 1 chooses per-process logs over chained continuations. :quit and EOF (Ctrl-D) both close the session file via shutdown_session before exiting. HELP text updated (no longer "Phase 0:" header since meta set has grown). Q15 noted in PHASE1.md §10 (resume into non-empty context) is resolved by the ctx:reset() in :resume — silent overwrite for Phase 1, revisit if anyone cares. End-to-end live verified: chat -> auto-log; :save renames; :sessions listings; :resume + :history shows the round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:23:05 +00:00
marfrit	87316f8345	history: JSONL session log — open, append, load, list_sessions Phase 1 persistence per PHASE1.md §6. history.open(path, meta?) -> session \| (nil, err) parent dir auto-created; meta line written iff file is new/empty so reopening a session doesn't duplicate the header session:append(turn) JSON-encoded line, fh:flush after every write (no fsync — Q16 tracks the policy if it ever bites) session:close() history.load(path) -> meta, turns \| (nil, err) skips unparseable lines (e.g. partial trailing write from a crash); distinguishes the meta-header line from role/content turn lines history.list_sessions(dir) -> [basename, ...] sorted (ISO 8601 names lex-sort chronologically); no mtime / turn counts in Phase 1 — that's a Phase 4 :sessions UI concern Smoke: - open, append 3 turns, close, list_sessions sees 1 file - load returns meta (model="fast") and 3 turns in order - corrupt tail (partial JSON line appended) is silently skipped on load - reopen with different meta does NOT duplicate the header line Repl wiring (`:save`, `:resume`, `:sessions`, auto-write on quit) lands in the next commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:19:21 +00:00
marfrit	a722f576ac	repl + renderer: streaming assistant output (Phase 1) repl.ask_ai now drives broker.chat_stream and pumps each delta into renderer.assistant_delta(delta) as it arrives. renderer.assistant_flush is called when the stream ends to add a trailing newline if missing. The full reassembled response is then handed to executor.extract_cmd_lines for the CMD: confirm-and-execute path (unchanged from Phase 0). renderer.assistant() is kept for non-streaming callers (none in tree right now, but cheap to keep around). assistant_delta/flush share no state with assistant(); they use a module-local stream_buf that tracks the in-progress streamed block. Q12 deferred: incremental CMD: highlighting (cursor-positioning re- render on flush) is not implemented in Phase 1 — deltas emit raw. The §6 CMD: marker is still extractable on the reassembled string post- stream, which is what executor cares about. Renderer's bold+cyan treatment for CMD: lines stays available via M.assistant(). Broker error / SSE-framed api-error path still pops the user turn and restores ctx.pending_exec_output. Order: assistant_flush always runs (even on error) so the cursor lands on a fresh line before the broker- error status renders. Live verification: `Count one to ten` against hossenfelder fast streams deltas through to stdout incrementally; CMD: extraction works on the reassembled string; confirm gate intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:17:27 +00:00
marfrit	e46a5c385d	broker: chat_stream over post_sse; chat is now a buffering wrapper Phase 1 streaming consumer per PHASE1.md §3. broker.chat_stream(model_cfg, messages, on_delta) -> true \| (nil, err) broker.chat(model_cfg, messages) -> content \| (nil, err) (now a thin buffer over chat_stream) The HTTP shape unifies on stream:true. on_event from ffi/curl.post_sse decodes each event's JSON, extracts choices[1].delta.content, and calls on_delta(content) for non-empty string deltas. The `[DONE]` sentinel is filtered. SSE-framed error envelopes ({"error":{"message":...}} arriving as data:) surface as "api: ..." errors. build_request is factored out so chat_stream and (future) any non-streaming consumer share URL/body/header construction. Live verification against hossenfelder fast preset: - chat_stream("Count one to five..."): 9 incremental deltas streamed token-by-token, assembled to "1 2 3 4 5" - chat("Reply with exactly: pong"): "pong" returned via buffer Error envelope path is correct by inspection but not exercised live — hossenfelder passes through bogus model names rather than rejecting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:16:07 +00:00
marfrit	2e36381576	ffi/curl: SSE streaming via post_sse — incremental data: events Phase 1 streaming substrate per PHASE1.md §4. curl.post_sse(url, body, headers, on_event, timeout_ms) -> true \| (nil, errmsg) Reuses the Phase 0 WRITEFUNCTION hook. Each chunk delivery accumulates into a per-request buffer; the buffer is drained for complete events (\n\n-terminated). Each event's `data: ...` field(s) are joined per the SSE spec and passed to on_event(data_string) synchronously. `:` comment lines (keepalives) are filtered. The `[DONE]` sentinel is passed through to on_event as-is (broker.lua filters it — this module stays HTTP-layer only, no JSON / OpenAI shape knowledge). Two robustness items: - End-of-stream flush: the final event may lack \n\n if the server closes-on-EOF immediately after the last data: line (some llama.cpp builds, plain HTTP/1.0 close-on-EOF feeds). Post-perform, any remaining buffer is parsed as one last event. - FAILONERROR: a non-2xx response surfaces as a CURLcode error rather than silently feeding the error body into the SSE parser. Smoke: [1] canned events via nc listener: 3 events parsed in order [2] chunk-split mid-event ("Hel" + sleep + "lo..."): correctly reassembled across two WRITEFUNCTION deliveries [3] LIVE against hossenfelder.fritz.box:8082 fast preset with stream:true: response "pong" assembled from incremental deltas; 4 raw events (role + 1 content + finish_reason + [DONE]) Next: broker.lua chat_stream that decodes the OpenAI delta shape on top of this and exposes on_delta(content_string) for renderer streaming. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:14:54 +00:00
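The buffering behaviour described for post_sse above (accumulate chunks, drain \n\n-terminated events, join data: fields, drop `:` comment lines, flush the tail at end of stream) can be sketched in plain Lua without any curl binding. The names new_sse_parser, feed and finish are hypothetical, and the sketch assumes LF-only event framing as in the commit message; it is not the project's parser.

```lua
-- Hypothetical sketch of an incremental SSE event parser.
-- on_event(data_string) is called once per complete event.
local function new_sse_parser(on_event)
  local buf = ""

  -- Parse one event block (the text between blank lines).
  local function emit(block)
    local data = {}
    for line in (block .. "\n"):gmatch("([^\n]*)\n") do
      line = line:gsub("\r$", "")          -- tolerate a trailing CR
      if line:sub(1, 1) ~= ":" then        -- ":" lines are comments/keepalives
        local value = line:match("^data: ?(.*)$")
        if value then data[#data + 1] = value end
      end
    end
    -- Multiple data: fields in one event are joined with newlines.
    if #data > 0 then on_event(table.concat(data, "\n")) end
  end

  local parser = {}

  -- Called for every chunk the transport delivers; chunks may split events.
  function parser.feed(chunk)
    buf = buf .. chunk
    while true do
      local s, e = buf:find("\n\n", 1, true)
      if not s then break end
      emit(buf:sub(1, s - 1))
      buf = buf:sub(e + 1)
    end
  end

  -- End-of-stream flush: a final event may arrive without its blank line.
  function parser.finish()
    if buf ~= "" then
      emit(buf)
      buf = ""
    end
  end

  return parser
end
```

In this sketch the `[DONE]` sentinel is passed through untouched, matching the commit's note that filtering it is the broker layer's job.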
marfrit	ee4d7f86d6	executor: swap popen+sentinel for pty.spawn (Phase 1) Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md is rewritten to point at PHASE1.md §5 — the workaround is gone, not just renamed. User-visible behavioral changes: - Interactive commands (vim, less, htop, top) now work via $cmd / :exec / known-command shell paths because the child has a real PTY for line discipline. - Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill -> 128+N (bash convention), shell parse error -> sh's 2. - Broken-shell-syntax cmd now shows the actual sh diagnostic (e.g. "Syntax error: end of file unexpected") instead of Phase 0's "(no output — possible shell parse error)" guess. - Output normalization: PTY emits CR LF; executor collapses \r\n -> \n to keep the Phase 0 contract ("output uses \n separators"). Code path: pty.spawn(cmd) -> drain master_fd until EOF -> wait() returns ("exit", N) \| ("signal", N) \| ... -> exit_code mapped: exit -> N, signal -> 128+N, else -1 Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir per §3 + §7), `CMD: ` extraction unchanged. PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model. The illustrative sketch is left in place as historical context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:08:27 +00:00
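The exit-code mapping and newline normalization listed in the commit above are small enough to sketch directly. The function names below are hypothetical; the (kind, value) input shape follows the structured waitpid result described elsewhere in this log, and the rules are the ones stated in the message (exit -> N, signal -> 128+N, anything else -> -1, \r\n -> \n).

```lua
-- Hypothetical sketch: turn a structured wait result into a shell-style
-- exit code, using the bash convention for signal deaths.
local function map_exit_code(kind, value)
  if kind == "exit" then
    return value
  elseif kind == "signal" then
    return 128 + value
  end
  return -1  -- "other" (stopped/continued) or a failed wait
end

-- A PTY emits CR LF; collapse it so captured output uses "\n" separators.
local function normalize_newlines(s)
  return (s:gsub("\r\n", "\n"))
end

print(map_exit_code("signal", 9))
```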
marfrit	10d2fc5ac1	ffi/pty: forkpty-backed spawn + session handle Phase 1 PTY substrate per PHASE1.md §5. Replaces Phase 0's io.popen sentinel-echo path with a real PTY so interactive cmds (vim, less, htop) work and exit-status comes from waitpid instead of parsing a sentinel out of stdout. API: pty.spawn(cmd) -> session \| (nil, err) session:read(count) -> (data, n) ; n == 0 means EOF session:write(data) -> bytes session:close() ; closes master_fd; child gets SIGHUP session:wait(options) -> (kind, val) ; "exit"/"signal"/"other"/nil session:signal(sig) -> ok ; kill(pid, sig) Child branch execs `/bin/sh -c cmd`, preserving Phase 0's shell- interpretation semantics (quoting, redirection, pipes still work). The PTY makes vim/less/htop functional because the child gets a real tty for line discipline instead of a pipe. Loader uses the versioned-soname fallback idiom (util / util.so.1 / util.so.0) so a runtime-only host without libutil-dev works. Smoke covers: echo hello (exit 0), false (1), exit 7, bogus binary (sh's 127), multi-line printf, cat bidirectional (write ping -> read echo+cat output -> close master -> child exits via SIGHUP). Next: executor.lua swap from popen+sentinel to pty.spawn. That commit also retires the §7 amendment paragraph (no longer needed once popen is gone). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:03:19 +00:00
marfrit	113f87125a	ffi/libc: phase 1 syscalls — waitpid + raw fd I/O + kill Extends Phase 0's chdir/errno/strerror with the syscalls that ffi/pty needs to drive a forkpty'd child: waitpid (with WIFEXITED / WEXITSTATUS / WIFSIGNALED / WTERMSIG decoders), read, write, close, kill. Status-word macros are reproduced from glibc bits/waitstatus.h using the LuaJIT `bit` library. M.waitpid returns a structured (kind, value) rather than the raw status word — callers don't have to know the encoding: "exit", N — normal exit, N is exit code "signal", N — killed by signal N "other", raw — stopped/continued (Phase 1 doesn't trace those) nil, err — syscall failure M.read / M.write / M.close / M.kill mirror their syscall return shape with errno-string surfacing on failure. Read uses a shared 4 KiB buffer for the common case; larger reads allocate a fresh buffer. Smoke covers the chdir regression (still works), all four status decoders against known status words, pipe round-trip for read/write/ close, EOF -> ("", 0), invalid-fd close -> false, kill(self, 0) success, kill(bogus, 0) failure. waitpid is not exercised by the smoke (needs a real child); that arrives with ffi/pty. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:58:35 +00:00
marfrit	539408f480	phase1 formulate: scope, tech decisions, module changes, open questions Inner-loop Phase 1 (formulate) deliverable for the milestone Phase 1 of the aish project. Drafts docs/PHASE1.md to specify what lands on top of the Phase 0 substrate — no code changes, no §3 invariant amendments. Phase 1 milestone scope per PHASE0.md §11: 1. SSE streaming via libcurl FFI (existing WRITEFUNCTION hook) 2. PTY-backed exec via forkpty(3); replaces popen + retires the §7 sentinel exit-code workaround in favor of waitpid 3. Session persistence as append-only JSONL under <config.history.dir>/sessions/<utc>.jsonl 4. Readline custom bindings (rl_bind_keyseq); Phase 1 reserves \C-n as a no-op for Phase 3's Norris consumer Module growth (no new file names beyond the §4-stubs): ffi/curl -> M.post_sse(url, body, headers, on_event) ffi/pty -> M.spawn / read / write / close / wait ffi/libc -> waitpid + WEXITSTATUS + tcgetattr/tcsetattr ffi/readline -> M.bind(seq, fn) broker -> M.chat_stream; M.chat becomes a buffering wrapper executor -> PTY path; sentinel hack deleted repl -> :save, :resume <name>, :sessions; streaming render renderer -> assistant_delta + assistant_flush history -> open / append / load / list_sessions Open questions Q11–Q16 (six new) tracked in §10: - SSE shape uniformity across OpenRouter routes (Q11, Phase 7) - CMD: highlight-on-stream strategy (Q12, plan phase) - tty raw-mode recovery on Lua error (Q13, plan phase) - bind \C-n now or defer to Phase 3 (Q14, plan phase) - :resume into non-empty context (Q15, plan phase) - session-log fsync policy (Q16, default close-only; tracked) Next inner phase is "analyze": for each module change, identify dependencies + risks + per-commit ordering. Then baseline (capture Phase 0 behaviors we want to preserve), plan, review, implement, verify, memory-update. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:56:20 +00:00
marfrit	16490e6905	fix: buffer exec output for next user turn; alternation for strict templates User-test surfaced the bug: with `deep` (mistral-nemo-12b) active, running `list files` -> y on `CMD: ls` -> `Are there directory entries beginning with "lor"?` returned a Jinja exception: api: ... Error: Jinja Exception: After the optional system message, conversation roles must alternate user/assistant/user/assistant/... Cause: §6 specified "exec output injected into context uses role 'user' with a prefix tag '[exec output]'." This works for permissive templates (qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back user/user pair on strict templates that enforce the OpenAI alternation contract — `[exec output]` user turn followed by the user's actual follow-up question. Fix: context.lua: - new field `pending_exec_output` (initially nil) - new method `:append_exec_output(out)` buffers (concat on subsequent captures so multi-shell-then-ai still merges everything) - new method `:append_user(content)` flushes buffered exec output as a `[exec output]\n...\n\n` prefix and appends a user turn - `:reset()` also clears the buffer repl.lua: - run_shell calls ctx:append_exec_output(out) instead of ctx:append({role="user", content="[exec output]\n"..out}) - ask_ai calls ctx:append_user(text) instead of raw :append; saves prev_pending so a broker error can restore the buffer for retry PHASE0.md §6: - amended the role-injection paragraph to describe the buffer-and- prepend policy; the §3 invariants list is untouched (this was a §6 design detail, not a locked invariant) Verification: - context unit tests cover: alternation after the failing sequence, multi-shell merge, reset clears buffer, broker-error retry path - live reproduction against `deep` (mistral-nemo) of the exact user-reported sequence succeeds; model responds with a sensible `CMD: ls \| grep '^lor'` instead of a Jinja exception Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:41:21 +00:00
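The buffer-and-prepend policy in the fix above can be sketched as a tiny context object. The field and method names (pending_exec_output, append_exec_output, append_user, reset) are taken from the commit message; the bodies are an assumption about how they might fit together, not the contents of context.lua.

```lua
-- Hypothetical sketch: exec output is buffered and folded into the NEXT
-- user turn, so roles keep alternating user/assistant on strict templates.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({ turns = {}, pending_exec_output = nil }, Context)
end

-- Buffer captured output; repeated captures concatenate.
function Context:append_exec_output(out)
  self.pending_exec_output = (self.pending_exec_output or "") .. out
end

-- Flush any buffered output as a tagged prefix, then record one user turn.
function Context:append_user(content)
  if self.pending_exec_output then
    content = "[exec output]\n" .. self.pending_exec_output .. "\n\n" .. content
    self.pending_exec_output = nil
  end
  self.turns[#self.turns + 1] = { role = "user", content = content }
end

-- Dropping the turns also clears the buffer.
function Context:reset()
  self.turns = {}
  self.pending_exec_output = nil
end
```

With this shape, running two shell commands and then asking a question yields a single user turn rather than a user/user pair.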
marfrit	8870eb0451	config: route all presets through hossenfelder per issue #12 Resolves issue #12 by partial-accept of the recommendation. What landed: - Single broker URL: http://hossenfelder.fritz.box:8082 for all three presets (fast / deep / cloud). Server-side model-aware routing; no client-side cloud auth (proxy holds the OpenRouter bearer). - Models from hossenfelder's /v1/models inventory: fast -> qwen2.5-coder-1.5b-q4_k_m.gguf (boltzmann local) deep -> mistral-nemo-12b-instruct (boltzmann local) cloud -> anthropic/claude-haiku-4.5 (OpenRouter route) - `cloud` was already pointing at hossenfelder but with https://; flipped to http:// so it matches the proxy's actual scheme. What deferred: - Schema rename `models` -> `brokers` (and the 5-cloud-preset shape suggested in #12) — would touch repl.lua + broker.lua. Not blocking Phase 7. If multi-preset becomes useful in practice, file a separate issue for the rename then. Phase 7 verification (live broker test): - broker.chat(fast, [user="say pong"]) -> "CMD: echo pong" in ~3s - multi-turn arithmetic (7*8=56, *2=112) preserved across turns Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:06:08 +00:00
marfrit	a76ff664b3	phase0 amendment: §3/§7/§10 close review-surfaced manifest gaps Three additions to PHASE0.md, all surfaced by the Phase 5 review of the Phase 0 implementation. No invariant changes; manifest now matches implementation reality. §3 — FFI loader fallback paragraph. ffi.load("name") needs the unversioned `libname.so` symlink that comes with the -dev package. Phase 0 loaders try unversioned first then versioned sonames so runtime-only hosts (no -dev) work as-is. Documents the actual behavior in ffi/readline.lua and ffi/curl.lua. §7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1 follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`). Phase 1 PTY+waitpid replaces the hack and the sketch becomes accurate. Sketch left as-is (it's the right shape conceptually); caveat now explicit. §10 — cwd-relative package.path note. Phase 0 prepends `./?.lua; ./vendor/?.lua`, so aish must run from the repo root. Cwd-independent resolution is a later concern. Also clarifies that --config is strict (no fallback if the path is unopenable) — matches main.lua post the review-followup commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:44:20 +00:00
marfrit	abc993aa49	review followup: empty-input guards, ~/ symmetry, CMD: filter Addresses three concerns + one nit from the Phase 0 review pass. executor.lua: - M.exec guards empty / whitespace-only cmd up front, returns "(empty command)" / -1 instead of running the wrapper on nothing. - On sentinel-parse failure with empty output (typical of shell parse errors — the syntax error itself escapes to the popen parent's stderr because 2>&1 is inside the unparsable subshell), surface "(no output — possible shell parse error)" rather than a silent empty frame. - extract_cmd_lines now skips whitespace-only / empty bodies; a bare `CMD: ` line in assistant output no longer turns into an "execute ''? [y/N]" prompt. - "what" comments cleaned in maybe_chdir. router.lua: - path_like now matches `~` and `~/foo` so `~/scripts/build.sh` classifies as shell (was: ai). Restores symmetry with executor's maybe_chdir, which already expands `~` on `cd`. repl.lua: - :exec and :ask trim args and renderer.status a usage line on empty rather than running an empty cmd / sending an empty turn to broker. Regression: full prior smoke suite still passes — known_commands shell paths, all maybe_chdir branches, CMD: extraction with non-empty bodies, exec exit-code recovery, all router branches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:41:35 +00:00
marfrit	a18e530c03	main: --config/--help arg parsing, vendor on package.path, REPL start Phase 0 entry point per PHASE0.md §4, §10. Resolves the §10 config search: --config <path> (explicit; failure if not openable, no fallback) $AISH_CONFIG ~/.config/aish/config.lua ./config.lua The explicit form now hard-fails instead of silently falling through to the next candidate — caught in smoke (`--config /nonexistent` was loading ./config.lua). Pre-pends `./?.lua;./vendor/?.lua` to package.path so `require("dkjson")` finds vendor/dkjson.lua and project requires resolve from the repo root. Run from the repo root; cwd-independent resolution lands later. `--help` prints the usage block. Unrecognized arg exits 2 with a diagnostic on stderr. Phase 0 done-criteria (PHASE0.md §2): ✓ shell command execution with framed output ✓ :meta commands (full §5.2 set) ✓ in-memory conversation history with sliding-window eviction ✓ codebase layout matches §4 — every module name stable for Phase 1+ ⏳ live AI exchange — structurally wired; live test deferred per issue #12 (broker endpoint hostname not resolvable from noether) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:12:25 +00:00
marfrit	e0e69f839b	repl: readline loop, dispatch, all Phase 0 meta commands Phase 0 implementation per PHASE0.md §5, §9. Wires the lower-half modules into a single REPL: ffi/readline -> input + history router -> classify(line) -> meta/shell/ai executor -> run_shell with cd interception, frame output, capture broker -> ask_ai, then extract+confirm CMD: lines from response context -> turn list + eviction; status line on evict renderer -> assistant text + exec frame + status Prompt format `[aish:<model>]> ` per §9. Meta commands all wired (§5.2): :quit/:q, :clear, :reset, :model <name>, :models, :history, :exec <cmd>, :ask <text>, :help. Unknown meta names report via renderer.status rather than crashing. End-of-input (Ctrl-D on empty line) breaks the loop cleanly. Empty / whitespace-only lines are skipped silently before dispatch — router would otherwise classify them as ai with empty payload and pollute context. `CMD: ` extraction + confirm-and-execute is wired: when broker returns an assistant turn, the response is scanned for §6 CMD: lines; each is prompted via readline ("execute '...'? [y/N]") when config.shell .confirm_cmd is true (default), else auto-executed. On broker error, the user turn just appended is popped so the context isn't polluted with a turn that has no assistant response. Smoke covers :help, :models, shell exec via known_commands allowlist, and Ctrl-D break. Live broker exchange deferred per issue #12. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 15:17:40 +00:00
marfrit	f22a3b33c8	renderer: assistant text, exec output frame, status line Phase 0 minimal output formatting per PHASE0.md skeleton. M.assistant(text) — line-by-line; `CMD: ` lines bold+cyan M.exec_output(output, code) — top/bottom rules; exit code on closing rule (red on non-zero) M.status(line) — dim "[aish] ..." single-liner ANSI table is local to the module (no external dep). Trailing-sentinel pattern ((text.."\n"):gmatch("([^\n]*)\n")) preserves blank lines in assistant output rather than squashing them, at the cost of one extra trailing newline — acceptable for Phase 0. Real syntax-aware formatting (tree-sitter) lands in Phase 6. Smoke verifies escape codes are emitted (od -c shows \033[1m\033[36m around CMD: line) and the visual layout looks right. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:42:56 +00:00
marfrit	f9f8b0370c	broker: blocking POST /v1/chat/completions via ffi/curl + dkjson Phase 0 implementation per PHASE0.md §6. M.chat(model_cfg, messages) -> content_string \| (nil, errmsg) Builds the OpenAI-compat JSON body: { model, messages, stream: false, temperature: model_cfg.temperature ?? 0.2 } Sends Content-Type and (optionally) Authorization Bearer pulled from model_cfg.key_env's process environment. Default timeout 60s; overridable per-model via model_cfg.timeout_ms. Error surfaces split: "transport: ..." curl-side (TCP/TLS/timeout) "decode: ..." non-JSON response body "api: ..." OpenAI-style { error: { message } } envelope "broker.chat: no choices[1].message.content..." shape miss Tested against four canned mock responses (nc -lN listener feeding HTTP/1.0 + Connection: close so EOF terminates the body): happy path, api error envelope, raw-text non-JSON, empty choices[]. The on-wire request body verified as well: POST path, headers, model/messages/ temperature/stream JSON. Live test against a real llama.cpp/hossenfelder endpoint deferred per issue #12 (broker endpoint configuration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:10:00 +00:00
marfrit	91187d2302	router: classify(line, config) -> (kind, payload) Phase 0 implementation per PHASE0.md §5. Pure function. Three kinds: "meta" — line starts with ":", payload is the rest "shell" — line starts with "$" (override, $ stripped), OR first word is in config.shell.known_commands, OR first word is path-like (`./`, `../`, `/`) "ai" — everything else (including empty / whitespace-only; the repl loop skips empty payloads before dispatching) Path-like detection is deliberately conservative in Phase 0: anchored prefixes only, no quoted-path or shell-glob handling. Q4 in §13 tracks multi-command CMD: blocks; this router doesn't see those (it only classifies user input lines, not assistant output). Smoke covers all branches plus a nil-config fallthrough. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 12:05:25 +00:00
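The three-way classification described in the router commit above maps naturally onto a pure function. The sketch below follows the rules as stated in the message; it assumes config.shell.known_commands is a plain list of command names and that the payload for shell and ai lines is the line itself, neither of which the log confirms.

```lua
-- Hypothetical sketch of classify(line, config) -> (kind, payload).
local function classify(line, config)
  local first_char = line:sub(1, 1)
  if first_char == ":" then
    return "meta", line:sub(2)            -- payload is the rest of the line
  end
  if first_char == "$" then
    return "shell", line:sub(2)           -- explicit override, "$" stripped
  end

  local first_word = line:match("^(%S+)")
  if first_word then
    -- Assumed shape: a list of names; a nil config falls through to "ai".
    local known = config and config.shell and config.shell.known_commands or {}
    for _, name in ipairs(known) do
      if name == first_word then return "shell", line end
    end
    -- Conservative path-like check: anchored prefixes only.
    if first_word:sub(1, 2) == "./"
        or first_word:sub(1, 3) == "../"
        or first_word:sub(1, 1) == "/" then
      return "shell", line
    end
  end

  return "ai", line
end
```

Because the function touches no global state, every branch can be smoke-tested with plain string inputs.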
marfrit	5fb4023c55	executor: io.popen wrapper, cd interception, CMD: extraction Phase 0 implementation per PHASE0.md §6, §7. M.exec(cmd) -> (output, exit_code) M.maybe_chdir(cmd) -> nil \| true \| false, errmsg M.extract_cmd_lines(text)-> { "ls -la", "echo hi", ... } Two non-obvious bits: 1. LuaJIT 2.1's io.popen():close() follows the Lua 5.1 ABI and returns only `true` — no child exit status. The §7 manifest sketch assumes Lua 5.2's three-return form, which doesn't apply here. Recover the exit code by appending `; echo __AISH_EXIT_<tag>__$?` after the command and parsing the sentinel-prefixed integer back out. Phase 1 replaces this with waitpid via libc FFI when PTY support lands. 2. `cd` interception is a §3 invariant: must not delegate to popen (popen forks; a child cd evaporates). maybe_chdir parses the line, ~ expands, calls libc.chdir, returns success/failure separate from "not a cd" (nil) so the caller can distinguish. CMD: extraction is anchored at start-of-line per the §3 "exact prefix, single space" invariant — leading whitespace before CMD: does not match. Smoke covers: echo capture (code=0), failed ls (code!=0), `false` (code=1), multi-line output preserved, all maybe_chdir branches (non-cd / bare / explicit / ~ expansion / failure), CMD extraction including the leading-whitespace-rejection case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 12:03:19 +00:00
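The start-of-line anchoring for CMD: extraction described above can be illustrated with a short plain-Lua sketch. Only the name extract_cmd_lines and the "exact prefix, single space" rule come from the commit message; the body is an assumption, and the whitespace-only guard reflects the behaviour a later review-followup commit in this log describes.

```lua
-- Hypothetical sketch: collect the bodies of lines that start with "CMD: ".
local function extract_cmd_lines(text)
  local cmds = {}
  -- Appending "\n" lets the pattern see a final line with no newline.
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    -- "^" anchors at start of line, so indented "  CMD: ..." does not match.
    local body = line:match("^CMD: (.*)$")
    -- Skip empty / whitespace-only bodies so a bare "CMD: " is ignored.
    if body and body:match("%S") then
      cmds[#cmds + 1] = body
    end
  end
  return cmds
end

for _, cmd in ipairs(extract_cmd_lines("Try this:\nCMD: ls -la\n  CMD: ignored")) do
  print(cmd)
end
```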
marfrit	10848645af	context: in-memory turn list + max_turns sliding-window eviction Phase 0 implementation per PHASE0.md §6, §8. Context.new(opts) constructs with the §6 default system prompt (the `CMD: ` extraction contract is hard-coded in there per §3 — locked substrate, do not edit). opts overrides: system_prompt, max_turns (default 40), token_budget (default 4096; visibility only in Phase 0 per Q1, deferred to Phase 3 for accurate tokenization). API: ctx:append({role, content}) record a turn ctx:to_messages() [{system,...}, ...turns] for broker.chat ctx:enforce_budget() evict pairs (user+assistant) until #turns <= max_turns; returns count ctx:estimate_tokens() char/4 heuristic ctx:reset() drop all turns (system_prompt kept) System prompt is the §6 phrasing verbatim including the `CMD: ` clause — stored on the context, NOT in self.turns, so it is prepended freshly on every to_messages() call. Smoke covers basic ops, no-evict-at-max, evict-on-overflow, bulk eviction (14 turns -> 4), reset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:59:25 +00:00
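The pairwise sliding-window eviction described above can be sketched as a standalone function over a turn list. This is an illustration only: the log says enforce_budget evicts user+assistant pairs until #turns <= max_turns and returns a count, but not whether that count is turns or pairs, so the sketch assumes it counts evicted turns.

```lua
-- Hypothetical sketch: drop the oldest turns two at a time (one user turn
-- plus its assistant reply) until the list fits within max_turns.
local function enforce_budget(turns, max_turns)
  local evicted = 0
  while #turns > max_turns do
    table.remove(turns, 1)
    evicted = evicted + 1
    if #turns > 0 then
      table.remove(turns, 1)
      evicted = evicted + 1
    end
  end
  return evicted  -- assumption: number of turns removed
end

-- Example: build a list of alternating turns and trim it.
local turns = {}
for i = 1, 14 do
  turns[i] = { role = (i % 2 == 1) and "user" or "assistant", content = "t" .. i }
end
print(enforce_budget(turns, 4), #turns)
```

Evicting in pairs keeps the surviving list starting on a user turn, which matters for templates that enforce role alternation.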
marfrit	5fd7c7ac63	ffi/curl: blocking POST with header list and response capture Phase 0 binding per PHASE0.md §6. M.post(url, body, headers, timeout_ms) uses CURLOPT_{URL, POST, POSTFIELDS, HTTPHEADER, WRITEFUNCTION, NOSIGNAL, TIMEOUT_MS, USERAGENT} on a fresh easy handle, capturing the response into a Lua string via a closure-based WRITEFUNCTION callback. curl_easy_setopt is variadic; LuaJIT's variadic FFI dispatch needs ffi.new() per argument otherwise. Pre-cast to three concrete signatures (long / void* / const char*) bypasses that — cleaner and matches the lua-curl idiom. Robust loader: tries `curl`, `curl.so.4`, `curl-gnutls.so.4` so a runtime-only host (no libcurl-dev installed) just works. Same idiom as ffi/readline. Smoke against a local nc listener: request was correctly framed (POST path, Content-Type + X-Test headers, Content-Length matches JSON body length) and the canned response was captured into the returned Lua string. SSE streaming for Phase 1 reuses this same WRITEFUNCTION hook — chunks arrive incrementally, the closure consumes them as they come. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:54:36 +00:00
marfrit	c9116c9bbf	ffi/readline: blocking readline() + add_history(), nil on EOF Phase 0 binding per PHASE0.md §9. M.readline(prompt) returns the line as a Lua string (the C buffer is freed via libc free immediately after ffi.string copies it) or nil on EOF. M.add_history skips empty lines. The loader handles the case where libreadline-dev's unversioned `libreadline.so` symlink isn't installed: it falls through to `readline.so.8` (current Debian/Arch ALARM) and `.so.7` (older) before giving up. The fallback is exercised on noether-the-LXD, where only the runtime package is present. Smoke (stdin from heredoc, two lines + EOF): p1> hello world -> "hello world"; p2> second line -> "second line"; p3> -> nil (EOF). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:44:38 +00:00
marfrit	fd63dff65e	ffi/libc: implement chdir, errno, strerror Smallest Phase 0 module per CLAUDE.md §4 implementation order. M.chdir(path) returns (true) or (false, errmsg) — errmsg via strerror(__errno_location()[0]). Glibc errno is thread-local behind __errno_location() rather than a plain global, hence the indirect access. Verified against PHASE0.md §7 expectation: a libc.chdir() persists across subsequent io.popen() calls (popen's child inherits the parent's wd), which is the property executor.lua relies on for `cd` interception. Smoke: libc.chdir("/tmp"); io.popen("pwd"):read("*l") --> /tmp Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:35:17 +00:00
marfrit	2704edd57d	phase0 amendment: vendor dkjson 2.8 under vendor/ Captures the JSON-library decision noted as open in CLAUDE.md §6. dkjson is pure Lua (preserves §3's "no compiled extensions" invariant), single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson` package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated copy of the upstream at dkolf.de. Vendoring (rather than relying on a system lua-dkjson install) keeps aish self-contained per the §3 "no luarocks packages" invariant: any host with luajit can run the tree as-is. PHASE0.md §3 grows one row recording the choice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:30:16 +00:00
marfrit	7b5d58686e	docs: codify contribution flow (issues for features, PRs for review) Captures two carve-outs to aish's "non-PR-flow repo" default: - Feature requests and bugs go to git.reauktion.de/marfrit/aish/issues rather than direct-implement-in-band. Tag `architecture` for cross-phase concerns. Aligns with the fleet-wide bug-filing convention from the `his` cheatsheet; this row extends it to features for aish. - Review-required iteration opens a PR (authored as claude-<host>, marfrit reviews, self-approval forbidden). PR #1 was the precedent. Both are opt-in; direct-to-main remains the default for autonomous work that doesn't need a feedback loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:04:51 +00:00
marfrit	fcfc23eef2	phase0 review: tighten phase 2 row + add Q9, Q10, sharpen Q6 (#1)	2026-05-10 11:00:35 +00:00
claude-noether	e1d1931006	phase0 review: tighten phase 2 row + add Q9, Q10, sharpen Q6 Captures three findings from the review of `013c625` ("phase0 amendment: insert MCP phase 2"). Opening as a PR rather than direct-to-main: the non-PR-flow convention works fine for autonomous work, but feedback-required iteration needs a readable medium that isn't the Claude Code transcript. §11 phase 2 row: spell out two scope items the original row left implicit: the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:` contract is hard-coded into the prompt) and `safety.lua` extension to gate tool calls (per Q8). §13 Q6: explicit note that choosing "retire `CMD:`" requires a §3 invariant amendment in the same commit, which keeps the substrate-vs-phase boundary honest. Adds (§3 if retiring) to the impact column. §13 Q9 (new): MCP system-prompt augmentation locus: static block in broker.lua / per-request assembly from connected servers / hybrid. Real architectural call with token-cost tradeoff per option. §13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate, a phase-ordering question. Either Phase 2 lands on the blocking Phase 0 broker and refits when SSE arrives, or Phase 1 SSE moves before MCP so tool-call deltas stream from day one.	2026-05-10 06:06:14 +00:00
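
The context API described in commit 10848645af can be sketched roughly as below. This is a minimal illustration, not the repository's code: the method names come from the commit message, but the bodies, the placeholder system prompt, and the choice to return the number of evicted turns (rather than pairs) from enforce_budget() are all assumptions.

```lua
-- Hypothetical sketch of an in-memory turn list with sliding-window eviction.
-- Method names follow the commit message; everything else is assumed.
local Context = {}
Context.__index = Context

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    -- Placeholder text; the real §6 prompt is not reproduced here.
    system_prompt = opts.system_prompt or "placeholder system prompt",
    max_turns = opts.max_turns or 40,
    turns = {},
  }, Context)
end

-- Record one turn.
function Context:append(turn)
  self.turns[#self.turns + 1] = { role = turn.role, content = turn.content }
end

-- The system prompt lives outside self.turns and is prepended on every call,
-- so evicting turns can never drop it.
function Context:to_messages()
  local msgs = { { role = "system", content = self.system_prompt } }
  for _, t in ipairs(self.turns) do
    msgs[#msgs + 1] = t
  end
  return msgs
end

-- Evict the oldest turns two at a time (user + assistant) until the list
-- fits in max_turns. Returns the number of turns evicted (assumption).
function Context:enforce_budget()
  local evicted = 0
  while #self.turns > self.max_turns do
    for _ = 1, 2 do
      if #self.turns > 0 then
        table.remove(self.turns, 1)
        evicted = evicted + 1
      end
    end
  end
  return evicted
end

-- char/4 heuristic over the system prompt plus all turn contents.
function Context:estimate_tokens()
  local chars = #self.system_prompt
  for _, t in ipairs(self.turns) do
    chars = chars + #t.content
  end
  return math.floor(chars / 4)
end

-- Drop all turns; the system prompt is kept.
function Context:reset()
  self.turns = {}
end

-- Usage: 14 turns with max_turns = 4, mirroring the bulk-eviction smoke case.
local ctx = Context.new({ max_turns = 4 })
for i = 1, 7 do
  ctx:append({ role = "user", content = "q" .. i })
  ctx:append({ role = "assistant", content = "a" .. i })
end
print(ctx:enforce_budget(), #ctx.turns, #ctx:to_messages())
```

Evicting in pairs keeps the surviving window starting on a user turn, provided turns were appended in strict user/assistant alternation.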

54 Commits