marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	00869ba412	docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8 row (substrate amendment per CLAUDE.md §3 lands same commit).
Four pillars:
1. Per-endpoint /tokenize probe (cached). One round-trip on first call per (endpoint, model); capability cached for session. hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/tokenize — per real probe; the path is endpoint-local, not under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent char/4 fallback.
2. broker.token_count(model_cfg, text) — thin wrapper; tries probe, falls back to char/4 on miss. Always returns non-negative int; never errors. 2s tight timeout; failures cache as not-supported.
3. Context:estimate_tokens widened. Accepts optional tokenize_fn at Context.new; uses it when present, char/4 otherwise. repl.lua wires `tokenize_fn = function(text) return broker.token_count(active_cfg, text) end` when cfg.tokenize.use_endpoint = true. Per-turn _tokens cache to amortize across estimate calls.
4. :cost detail est-vs-actual annotation. When the heuristic disagrees with the actual prompt_tokens from broker usage by >10%, show `~est=N`. Silent otherwise. Display-only; no behavior change.
Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4 heuristic on Context:estimate_tokens. Originally targeted at Phase 3 but deferred forward each iteration; now lands.
Baseline already observed during formulate:
- /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
- Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
- Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose (508 vs 558 on a 2KB README sample). Material for context-budget eviction decisions.
Doc covers scope + done-when, tech decisions table, module changes, per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6 open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over cross-session cost persistence and hard rate-limit enforcement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:19:53 +00:00
marfrit	3bad07b2da	docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7 row (substrate amendment per CLAUDE.md §3, lands in the same commit).
Four pillars:
1. Usage capture in broker.chat_stream — extract `usage` from the final SSE chunk (OpenAI streaming spec with `stream_options: {include_usage: true}`). Surface via new on_delta("usage", payload) kind. broker.chat returns (text, usage) — backward-compat: existing callers ignore the second value.
2. Per-session accumulator on ctx — ctx.usage_totals[model][category] tables (categories: main / delegate / summarize / memory_summarize / probe / norris, tagged at the call site via opts.category). :reset preserves usage_totals (R8 parity with memory_items / project). Session JSONL gains an optional `usage` field on assistant turns for after-the-fact analysis.
3. :cost meta surface — :cost (summary), :cost detail (per-model + per-category breakdown), :cost reset (zero the meter). Pure-Lua read of ctx.usage_totals; no broker calls.
4. Optional warn thresholds — cfg.cost.warn_at_dollars / warn_at_tokens emit a one-shot status when crossed. Default off; useful with cloud presets configured.
Doc covers scope + done-when criteria, tech decisions table, module changes, per-pillar deep dive with code sketches, UX surface, out of scope, risks, 6 open questions to resolve in analyze.
Open at formulate:
Q-C1 — provider-without-usage handling (local llama.cpp probably)
Q-C2 — cross-session persistence (defer to phase 8)
Q-C3 — categories closed-set vs free-form
Q-C4 — does hossenfelder forward stream_options to all backends?
Q-C5 — warn fires on the call that crosses, or the next one?
Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?
Scope confirmed via AskUserQuestion: cost/usage observability (chosen over project-local config overlay and session search/tag).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:47:58 +00:00
marfrit	6c194deea0	mcp: JSON-RPC client + ffi/curl status_code; PHASE0 §4 amended
First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled:
mcp.lua (new, 153 lines):
- M.connect(url, opts) returns a Session.
- Session:initialize() round-trips initialize + notifications/initialized + tools/list. Caches tools for session lifetime (lmcp announces capabilities.tools.listChanged = false; no refetch).
- Session:list_tools() returns the cached tool list.
- Session:call_tool(name, args) returns (result_table, kind) where kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per the §4 error split. Folded HTTP-level failure into transport_error.
- Per-server Bearer auth via opts.auth_token or opts.auth_env env-var indirection.
- Captures protocolVersion mismatch as a warning string rather than aborting (lmcp doesn't negotiate — N3 in review).
ffi/curl.lua extension:
- Add curl_easy_getinfo to ffi.cdef.
- Pre-cast as getinfo_long; helper get_response_code() fetches CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFOTYPE_LONG | 2).
- M.post now returns (body, status_code) on transport success; (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers reading only the first slot are unaffected.
docs/PHASE0.md §4:
- Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9.
- Module-stability invariant clarified: rename prohibition is what matters; adding new files is additive.
Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4:
- initialize: ok (7 tools cached)
- list_dir /tmp: ok (1.2KB content)
- read_file /nonexistent: ok (boltzmann's baseline §3 quirk — isError:false even on failure; content is authoritative)
- nope_tool: rpc_error (code=-32601)
- wrong auth: transport_error (HTTP 401)
- unreachable host: transport_error (DNS failure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 13:06:39 +00:00
marfrit	ee4d7f86d6	executor: swap popen+sentinel for pty.spawn (Phase 1)
Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md is rewritten to point at PHASE1.md §5 — the workaround is gone, not just renamed.
User-visible behavioral changes:
- Interactive commands (vim, less, htop, top) now work via $cmd / :exec / known-command shell paths because the child has a real PTY for line discipline.
- Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill -> 128+N (bash convention), shell parse error -> sh's 2.
- Broken-shell-syntax cmd now shows the actual sh diagnostic (e.g. "Syntax error: end of file unexpected") instead of Phase 0's "(no output — possible shell parse error)" guess.
- Output normalization: PTY emits CR LF; executor collapses \r\n -> \n to keep the Phase 0 contract ("output uses \n separators").
Code path:
pty.spawn(cmd) -> drain master_fd until EOF -> wait() returns ("exit", N) | ("signal", N) | ... -> exit_code mapped: exit -> N, signal -> 128+N, else -1
Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir per §3 + §7), `CMD: ` extraction unchanged.
PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model. The illustrative sketch is left in place as historical context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:08:27 +00:00
marfrit	16490e6905	fix: buffer exec output for next user turn; alternation for strict templates
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active, running `list files` -> y on `CMD: ls` -> `Are there directory entries beginning with "lor"?` returned a Jinja exception:
api: ... Error: Jinja Exception: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
Cause: §6 specified "exec output injected into context uses role 'user' with a prefix tag '[exec output]'." This works for permissive templates (qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back user/user pair on strict templates that enforce the OpenAI alternation contract — `[exec output]` user turn followed by the user's actual follow-up question.
Fix:
context.lua:
- new field `pending_exec_output` (initially nil)
- new method `:append_exec_output(out)` buffers (concat on subsequent captures so multi-shell-then-ai still merges everything)
- new method `:append_user(content)` flushes buffered exec output as a `[exec output]\n...\n\n` prefix and appends a user turn
- `:reset()` also clears the buffer
repl.lua:
- run_shell calls ctx:append_exec_output(out) instead of ctx:append({role="user", content="[exec output]\n"..out})
- ask_ai calls ctx:append_user(text) instead of raw :append; saves prev_pending so a broker error can restore the buffer for retry
PHASE0.md §6:
- amended the role-injection paragraph to describe the buffer-and-prepend policy; the §3 invariants list is untouched (this was a §6 design detail, not a locked invariant)
Verification:
- context unit tests cover: alternation after the failing sequence, multi-shell merge, reset clears buffer, broker-error retry path
- live reproduction against `deep` (mistral-nemo) of the exact user-reported sequence succeeds; model responds with a sensible `CMD: ls | grep '^lor'` instead of a Jinja exception
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 18:41:21 +00:00
marfrit	a76ff664b3	phase0 amendment: §3/§7/§10 close review-surfaced manifest gaps
Three additions to PHASE0.md, all surfaced by the Phase 5 review of the Phase 0 implementation. No invariant changes; manifest now matches implementation reality.
§3 — FFI loader fallback paragraph. ffi.load("name") needs the unversioned `libname.so` symlink that comes with the -dev package. Phase 0 loaders try unversioned first then versioned sonames so runtime-only hosts (no -dev) work as-is. Documents the actual behavior in ffi/readline.lua and ffi/curl.lua.
§7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1 follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`). Phase 1 PTY+waitpid replaces the hack and the sketch becomes accurate. Sketch left as-is (it's the right shape conceptually); caveat now explicit.
§10 — cwd-relative package.path note. Phase 0 prepends `./?.lua;./vendor/?.lua`, so aish must run from the repo root. Cwd-independent resolution is a later concern. Also clarifies that --config is strict (no fallback if the path is unopenable) — matches main.lua post the review-followup commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 17:44:20 +00:00
marfrit	2704edd57d	phase0 amendment: vendor dkjson 2.8 under vendor/
Captures the JSON-library decision noted as open in CLAUDE.md §6. dkjson is pure Lua (preserves §3's "no compiled extensions" invariant), single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson` package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated copy of the upstream at dkolf.de.
Vendoring (rather than relying on a system lua-dkjson install) keeps aish self-contained per the §3 "no luarocks packages" invariant: any host with luajit can run the tree as-is.
PHASE0.md §3 grows one row recording the choice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 11:30:16 +00:00
claude-noether	e1d1931006	phase0 review: tighten phase 2 row + add Q9, Q10, sharpen Q6
Captures three findings from the review of `013c625` ("phase0 amendment: insert MCP phase 2"). Opening as a PR rather than direct-to-main: the non-PR-flow convention works fine for autonomous work, but feedback-required iteration needs a readable medium that isn't the Claude Code transcript.
§11 phase 2 row: spell out two scope items the original row left implicit — the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:` contract is hard-coded into the prompt) and `safety.lua` extension to gate tool calls (per Q8).
§13 Q6: explicit note that choosing "retire `CMD:`" requires a §3 invariant amendment in the same commit — keeps the substrate-vs-phase boundary honest. Adds (§3 if retiring) to the impact column.
§13 Q9 (new): MCP system-prompt augmentation locus — static block in broker.lua / per-request assembly from connected servers / hybrid. Real architectural call with token-cost tradeoff per option.
§13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate — phase-ordering question. Either Phase 2 lands on the blocking Phase 0 broker and refits when SSE arrives, or Phase 1 SSE moves before MCP so tool-call deltas stream from day one.	2026-05-10 06:06:14 +00:00
marfrit	ca8ff107c7	docs: fix Phase-N references stale after MCP renumber
Sweep four call-sites pointing at the wrong phase number:
- README.md:19 — Norris mode "Phase 2" → Phase 3 (renumbered by `013c625`)
- README.md:62 — safety.lua "Phase 2+" → Phase 3+ (same renumber)
- PHASE0.md:58 — safety.lua "(Phase 1)" → (Phase 3) (was wrong pre-013c625 too — referenced Phase 1 when Norris was actually Phase 2)
- PHASE0.md:214 — Norris-mode prompt example "(Phase 1)" → (Phase 3) (same pre-existing wrong reference)
Caught by review of `013c625`. No semantic change; mechanical phase-number sweep only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 05:43:58 +00:00
marfrit	013c6257f2	phase0 amendment: insert MCP phase 2, renumber subsequent phases
MCP/tool-calling lands as a distinct phase, before Norris mode so the autonomous planner has tools as substrate. lmcp speaks MCP standard JSON-RPC 2.0 over HTTP/SSE — fits the existing libcurl FFI plan; tool calls ride the OpenAI-compatible `tools` field on /v1/chat/completions, so the §6 broker contract is unchanged at the transport level.
§8: tokenization concern bumped Phase 2 → Phase 3 (still tracks Norris).
§11: Norris→3, memory→4, routing→5, tree-sitter→6.
§13: Q1/Q2/Q3/Q5 phase numbers tracked the renumber; added Q6 (CMD: vs tools coexistence), Q7 (server discovery), Q8 (tool-call auth gate).
No §3 invariant broken. No code touched — Phase 0 implementation per the locked manifest is still the next move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 05:37:58 +00:00
claude-noether	4310207738	Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b snappy/32k + cloud via OpenRouter through hossenfelder)
File names match docs/PHASE0.md §4 exactly. Module bodies fill in across later phases; the tree shape is locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:16:07 +00:00

11 Commits
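
The exit-code mapping described in commit ee4d7f86d6 (exit -> N, signal -> 128+N, else -1) can be sketched in Lua roughly as follows. This is an illustrative sketch only, not the code in the aish tree: the helper name `map_exit_code` is hypothetical, and only the mapping itself is taken from the commit message.

```lua
-- Hypothetical helper: the name is an assumption, not from the repository.
-- (kind, n) mirrors the ("exit", N) | ("signal", N) | ... shape that the
-- commit message says wait() returns.
local function map_exit_code(kind, n)
  if kind == "exit" then
    return n          -- normal termination: pass the status through
  elseif kind == "signal" then
    return 128 + n    -- bash convention for death by signal N
  end
  return -1           -- any other kind is reported as -1
end

print(map_exit_code("exit", 7))     -- 7
print(map_exit_code("signal", 9))   -- 137
print(map_exit_code("stopped", 0))  -- -1
```

Keeping the mapping in one small pure function would make it testable without spawning a PTY, which is one plausible reason to separate it from the drain loop.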