Phase 7 (verify) anchor. Captures:
- MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms
on LAN; LLM is the latency floor, not the transport).
- 6 fixture responses saved to /tmp/aish-baseline/ covering initialize,
notifications/initialized, tools/list, tools/call success, isError,
and JSON-RPC unknown-tool error.
- Baseline design finding: boltzmann's read_file returns isError:false
even on failure (error text in content). aish should treat content as
authoritative, isError as advisory; feed both to the model. PHASE2.md
§4's "pass-through" stance already accommodates; no manifest amendment
needed.
- Streaming tool_calls delta shape verified against hossenfelder; matches
PHASE2.md §5.
- Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence
ignoring the CMD: contract — once MCP tools exist the model gets a
structured path that doesn't depend on prose-formatting compliance.
- Module pre-state at Phase 1 head 5878f73: LOC + capability snapshot
per module so Phase 2 diff has a reference frame.
- Two boltzmann-proxy blockers (SSE buffering, model-field routing)
carried explicitly into Phase 7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live-probed against lmcp v0.5.4 (boltzmann) + hossenfelder broker proxy:
Transport simpler than spec:
- lmcp only implements POST-per-RPC with Connection: close; no held-open
SSE channel. Combined with capabilities.tools.listChanged=false, no
client-side listener is needed in v1. Drops the planned M.get_sse
addition to ffi/curl.lua — Phase 1's M.post covers MCP.
Bearer auth is universal across the fleet — config schema grew
auth_token (literal) and auth_env (env-var indirection) fields per
server, mirroring PHASE0 §10's key_env convention.
Streaming tool_calls delta shape verified — accumulator by `index`,
function.arguments arrives as chunked JSON-string. Matches the
formulate-phase assumption in §5.
Resolutions:
Q17 transport abstraction — POST-only, no SSE channel for lmcp.
Q21 error mapping — result.isError (model-recoverable, feed
back as tool turn) vs JSON-RPC error
(unknown method/tool, transport-level).
Q18 role:"tool" turn — accepted at protocol level (live-probed).
Mistral-nemo template verification
blocked by the hossenfelder model-field
routing bug; full closure carried to
Phase 7 verify.
Open-end recorded in §11: the hossenfelder proxy routes every request
to the loaded fast model regardless of model field, blocking Phase 2
testing against mistral-nemo specifically. Parallel to the SSE
buffering issue at marfrit/aish#15; same root (boltzmann proxy code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CONCERNs from the Phase 1 review pass:
ffi/curl.lua:
- SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or
in the parse loop itself) is captured into cb_error and surfaced
after curl_easy_perform rather than propagating across the FFI
callback boundary (which LuaJIT documents as process-fatal). The
EOS flush path gets the same shield. Errors return
(nil, "callback: <msg>") from post_sse.
history.lua:
- sh_singlequote() escapes shell metacharacters; the mkdir -p and
ls -1 shell-outs no longer double-quote (where $(...) and $VAR
still expand) — single-quote with embedded-' escaping is the
safe form.
- M.load now returns (turns, meta) instead of (meta, turns). turns
is ALWAYS a table on success, never nil-when-no-header; failure
path is the unambiguous (nil, err). Callers can `if not turns
then` without the previous ambiguity. repl.lua :resume updated
to the new shape.
repl.lua :resume:
- Refuse to resume into a non-empty ctx — silent overwrite was the
Q15 default, but the review surfaced the no-undo / no-warning
failure mode. User must :reset (or :save then re-launch) to
express intent. The current session's on-disk log is unaffected
either way.
NITs:
- ffi/libc.lua READ_BUF: comment noting it's module-shared and
Phase 1 has no reentrant readers; revisit when that changes.
- PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred
from Phase 1 — no consumer here") rather than the previous
dangling "(or here)".
Regression suite verifies:
- history.load new signature on success + failure paths
- shell-quoted history.dir with $ doesn't trip
- aish scripted run: ctx with 2 turns refuses :resume anchor with
a clear status; user must :reset first
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).
ffi/libc:
- struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
- M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
(nil, err) when fd isn't a tty (scripted / piped-stdin runs)
- M.restore_termios(fd, saved)
- struct pollfd + M.poll (POLLIN constant)
executor:
- multiplex(sess): poll(stdin, master); reads master on any revents
(POLLHUP fires when child closes its slave end, not POLLIN — the
revents != 0 check catches both); forwards stdin keystrokes to
master; loop exits when master read returns 0 (EOF / child gone)
- stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
piped-stdin runs (tests / scripted) would otherwise drain queued
aish commands into the child of the *current* cmd, swallowing them
- raw mode is restored before returning so the user lands back at the
aish prompt in canonical mode
renderer + repl:
- exec_output(out, code) split into exec_begin() (top rule, before
spawn) + exec_end(code) (closing rule with exit, after wait). PTY
multiplex streams the body live to stdout in between; the renderer
never re-prints the body.
PHASE1.md §3:
- tcgetattr/tcsetattr changed from "optional" to "required for
single-key UIs to work — done-criteria #2"; poll added to the libc
row description.
Verified:
- non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
printf multi-line) — all exit codes correct, output streamed live,
a\nb\nc\n preserved byte-for-byte
- scripted-stdin run reaches all expected lines (no stdin draining
into a non-interactive child)
- aish prompt + framed exec block + exit-code line all render in
correct order
Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with
forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md
is rewritten to point at PHASE1.md §5 — the workaround is gone, not
just renamed.
User-visible behavioral changes:
- Interactive commands (vim, less, htop, top) now work via $cmd /
:exec / known-command shell paths because the child has a real
PTY for line discipline.
- Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill
-> 128+N (bash convention), shell parse error -> sh's 2.
- Broken-shell-syntax cmd now shows the actual sh diagnostic
(e.g. "Syntax error: end of file unexpected") instead of Phase 0's
"(no output — possible shell parse error)" guess.
- Output normalization: PTY emits CR LF; executor collapses \r\n
-> \n to keep the Phase 0 contract ("output uses \n separators").
Code path:
pty.spawn(cmd) -> drain master_fd until EOF
-> wait() returns ("exit", N) | ("signal", N) | ...
-> exit_code mapped: exit -> N, signal -> 128+N, else -1
Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir
per §3 + §7), `CMD: ` extraction unchanged.
PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten
to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model.
The illustrative sketch is left in place as historical context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:
api: ... Error: Jinja Exception: After the optional system message,
conversation roles must alternate user/assistant/user/assistant/...
Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.
Fix:
context.lua:
- new field `pending_exec_output` (initially nil)
- new method `:append_exec_output(out)` buffers (concat on subsequent
captures so multi-shell-then-ai still merges everything)
- new method `:append_user(content)` flushes buffered exec output as
a `[exec output]\n...\n\n` prefix and appends a user turn
- `:reset()` also clears the buffer
repl.lua:
- run_shell calls ctx:append_exec_output(out) instead of
ctx:append({role="user", content="[exec output]\n"..out})
- ask_ai calls ctx:append_user(text) instead of raw :append; saves
prev_pending so a broker error can restore the buffer for retry
PHASE0.md §6:
- amended the role-injection paragraph to describe the buffer-and-
prepend policy; the §3 invariants list is untouched (this was a §6
design detail, not a locked invariant)
Verification:
- context unit tests cover: alternation after the failing sequence,
multi-shell merge, reset clears buffer, broker-error retry path
- live reproduction against `deep` (mistral-nemo) of the exact
user-reported sequence succeeds; model responds with a sensible
`CMD: ls | grep '^lor'` instead of a Jinja exception
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions to PHASE0.md, all surfaced by the Phase 5 review of
the Phase 0 implementation. No invariant changes; manifest now matches
implementation reality.
§3 — FFI loader fallback paragraph. ffi.load("name") needs the
unversioned `libname.so` symlink that comes with the -dev package.
Phase 0 loaders try unversioned first then versioned sonames so
runtime-only hosts (no -dev) work as-is. Documents the actual
behavior in ffi/readline.lua and ffi/curl.lua.
§7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been
showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1
follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers
the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`).
Phase 1 PTY+waitpid replaces the hack and the sketch becomes
accurate. Sketch left as-is (it's the right shape conceptually);
caveat now explicit.
§10 — cwd-relative package.path note. Phase 0 prepends `./?.lua;
./vendor/?.lua`, so aish must run from the repo root. Cwd-independent
resolution is a later concern. Also clarifies that --config is strict
(no fallback if the path is unopenable) — matches main.lua post the
review-followup commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the JSON-library decision noted as open in CLAUDE.md §6.
dkjson is pure Lua (preserves §3's "no compiled extensions" invariant),
single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson`
package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated
copy of the upstream at dkolf.de.
Vendoring (rather than relying on a system lua-dkjson install) keeps
aish self-contained per the §3 "no luarocks packages" invariant: any
host with luajit can run the tree as-is.
PHASE0.md §3 grows one row recording the choice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures three findings from the review of 013c625 ("phase0 amendment:
insert MCP phase 2"). Opening as a PR rather than direct-to-main: the
non-PR-flow convention works fine for autonomous work, but feedback-
required iteration needs a readable medium that isn't the Claude Code
transcript.
§11 phase 2 row: spell out two scope items the original row left implicit —
the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:`
contract is hard-coded into the prompt) and `safety.lua` extension to
gate tool calls (per Q8).
§13 Q6: explicit note that choosing "retire `CMD:`" requires a §3
invariant amendment in the same commit — keeps the substrate-vs-phase
boundary honest. Adds (§3 if retiring) to the impact column.
§13 Q9 (new): MCP system-prompt augmentation locus — static block in
broker.lua / per-request assembly from connected servers / hybrid.
Real architectural call with token-cost tradeoff per option.
§13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate —
phase-ordering question. Either Phase 2 lands on the blocking Phase 0
broker and refits when SSE arrives, or Phase 1 SSE moves before MCP
so tool-call deltas stream from day one.
MCP/tool-calling lands as a distinct phase, before Norris mode so the
autonomous planner has tools as substrate. lmcp speaks MCP standard
JSON-RPC 2.0 over HTTP/SSE — fits the existing libcurl FFI plan; tool
calls ride the OpenAI-compatible `tools` field on /v1/chat/completions,
so the §6 broker contract is unchanged at the transport level.
§8: tokenization concern bumped Phase 2 → Phase 3 (still tracks Norris).
§11: Norris→3, memory→4, routing→5, tree-sitter→6.
§13: Q1/Q2/Q3/Q5 phase numbers tracked the renumber; added Q6 (CMD: vs
tools coexistence), Q7 (server discovery), Q8 (tool-call auth gate).
No §3 invariant broken. No code touched — Phase 0 implementation per
the locked manifest is still the next move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
snappy/32k + cloud via OpenRouter through hossenfelder)
File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>