First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled:
mcp.lua (new, 153 lines):
- M.connect(url, opts) returns a Session.
- Session:initialize() round-trips initialize + notifications/initialized
+ tools/list. Caches tools for session lifetime (lmcp announces
capabilities.tools.listChanged = false; no refetch).
- Session:list_tools() returns the cached tool list.
- Session:call_tool(name, args) returns (result_table, kind) where
kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per
the §4 error split. Folded HTTP-level failure into transport_error.
- Per-server Bearer auth via opts.auth_token or opts.auth_env env-var
indirection.
- Captures protocolVersion mismatch as a warning string rather than
aborting (lmcp doesn't negotiate — N3 in review).
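The four-way split can be pictured as a pure classifier over what M.post hands back. This is a minimal sketch that assumes the body has already been JSON-decoded (nil when decoding failed) and that a non-2xx status, like a libcurl failure, folds into transport_error. `classify_call` is a hypothetical name, not the mcp.lua API.

```lua
-- Hypothetical classifier for Session:call_tool's second return value.
-- resp:   decoded JSON-RPC response table, or nil if the body did not decode
-- status: HTTP status code, or nil when libcurl itself failed
local function classify_call(resp, status)
  -- libcurl failure (DNS, refused, timeout) or any non-2xx HTTP status
  if status == nil or status < 200 or status >= 300 then
    return "transport_error"
  end
  -- assumption: a 2xx body that is not a JSON-RPC envelope is also transport
  if type(resp) ~= "table" then
    return "transport_error"
  end
  if resp.error ~= nil then
    return "rpc_error"          -- e.g. code -32601 for an unknown tool
  end
  if type(resp.result) == "table" and resp.result.isError == true then
    return "handler_error"      -- the tool ran and reported failure
  end
  return "ok"
end
```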
ffi/curl.lua extension:
- Add curl_easy_getinfo to ffi.cdef.
- Pre-cast as getinfo_long; helper get_response_code() fetches
CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFO_LONG | 2, i.e. 0x200000 | 2).
- M.post now returns (body, status_code) on transport success;
(nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers
reading only the first slot are unaffected.
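The constant and the backward-compatible return shape can be checked without libcurl. In this sketch `post_stub` is a hypothetical stand-in for M.post, and its body and error strings are made up; only the (body, status) / (nil, errmsg) shape comes from the change above.

```lua
-- curl.h encodes the value type in the upper bits of each CURLINFO id:
-- CURLINFO_LONG is 0x200000 and CURLINFO_RESPONSE_CODE = CURLINFO_LONG + 2.
-- Type bits and index bits never overlap, so + and | give the same value.
local CURLINFO_LONG = 0x200000
local CURLINFO_RESPONSE_CODE = CURLINFO_LONG + 2

-- Hypothetical stand-in for M.post with the new two-slot shape.
local function post_stub(reachable)
  if not reachable then
    return nil, "Could not resolve host"     -- libcurl failure: (nil, errmsg)
  end
  return '{"error":"unauthorized"}', 401     -- transport ok: (body, status)
end

-- Phase 1 caller: reads only the first slot, so nothing changes for it.
local body = post_stub(true)

-- Phase 2 caller: reads both slots and can branch on the status code.
local body2, status = post_stub(true)
local failed, errmsg = post_stub(false)
```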
docs/PHASE0.md §4:
- Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9.
- Module-stability invariant clarified: rename prohibition is what
matters; adding new files is additive.
Smoke test passes against boltzmann lmcp v0.5.4 (ok, rpc_error and transport_error observed; handler_error is unreachable because of the read_file quirk below):
- initialize: ok (7 tools cached)
- list_dir /tmp: ok (1.2KB content)
- read_file /nonexistent: ok (boltzmann's baseline §3 quirk —
isError:false even on failure; content is authoritative)
- nope_tool: rpc_error (code=-32601)
- wrong auth: transport_error (HTTP 401)
- unreachable host: transport_error (DNS failure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-up NITs from the post-fold-in review:
(1) Disambiguate M.post return shape: (body, status_code) on transport
success regardless of status; (nil, errmsg) on libcurl failure
stays unchanged. Phase 1 callers reading only the first slot are
unaffected.
(2) Note that the M.post extension requires extending ffi.cdef to
include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal
2097154, CURLINFO_LONG | 2) and a long[1] out-param shim.
Implementation detail the commit #1 author will need.
(3) Move the tool-result content-flattening rule from §12 risk note
into §4 normative spec (forward-referenced both ways) — §4 is
where a future reader looking for the tool-invocation contract
will scan.
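The rule itself lives in PHASE2.md §4 and is not restated here, so the following is only one plausible flattening. It assumes MCP's `content` array of `{type = "text", text = ...}` items; the placeholder for non-text items and the error prefix are hypothetical choices. `isError` is treated as advisory, in line with the baseline finding that content is authoritative.

```lua
-- Hypothetical flattening of an MCP tools/call result into the string that
-- becomes a role:"tool" turn. Text items are joined with "\n"; non-text
-- items get a short placeholder (an assumption, not necessarily §4).
local function flatten_tool_result(result)
  local parts = {}
  for _, item in ipairs(result.content or {}) do
    if item.type == "text" then
      parts[#parts + 1] = item.text or ""
    else
      parts[#parts + 1] = "[" .. tostring(item.type) .. " content omitted]"
    end
  end
  local text = table.concat(parts, "\n")
  if result.isError then
    -- advisory only: the model still sees the full content
    text = "[tool reported error]\n" .. text
  end
  return text
end
```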
No design changes; clarifications only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:
BLOCKERs:
B1 context.lua impact widened — Phase 1 :append asserts content and
discards extra fields. Need (a) shape-per-role assert, (b) preserve
tool_calls/tool_call_id on store, (c) emit from to_messages().
B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
401 returns a non-JSON-RPC body that would have been mis-decoded.
B3 §3 typo schema -> inputSchema.
B4 pending_exec_output × tool-call sub-loop interaction specified.
B5 §3/§12 broker dependency contradiction — broker takes opts.tools
from caller; no layering inversion.
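B1's three parts (a per-role shape assert, tool fields kept on store, tool fields emitted again) can be sketched as a small Context. This is a hypothetical illustration, not the context.lua code; the exact per-role rules are assumptions based on the OpenAI chat message shapes.

```lua
-- Hypothetical Context that keeps tool-call fields instead of dropping
-- everything except role/content.
local Context = {}
Context.__index = Context

function Context.new() return setmetatable({ turns = {} }, Context) end

-- (a) shape per role
local function check_shape(turn)
  local r = turn.role
  if r == "user" or r == "system" then
    assert(type(turn.content) == "string", r .. " turn needs string content")
  elseif r == "assistant" then
    assert(type(turn.content) == "string" or type(turn.tool_calls) == "table",
           "assistant turn needs content or tool_calls")
  elseif r == "tool" then
    assert(type(turn.tool_call_id) == "string", "tool turn needs tool_call_id")
    assert(type(turn.content) == "string", "tool turn needs string content")
  else
    error("unknown role: " .. tostring(r))
  end
end

-- (b) store the tool fields alongside role/content
function Context:append(turn)
  check_shape(turn)
  self.turns[#self.turns + 1] = {
    role = turn.role, content = turn.content,
    tool_calls = turn.tool_calls, tool_call_id = turn.tool_call_id,
  }
end

-- (c) emit them again; nil fields simply do not appear in the table
function Context:to_messages()
  local out = {}
  for i, t in ipairs(self.turns) do
    out[i] = { role = t.role, content = t.content,
               tool_calls = t.tool_calls, tool_call_id = t.tool_call_id }
  end
  return out
end
```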
CONCERNs:
C1 M.chat return polymorphism dropped (no consumer).
C2 tool_calls[].index absent fallback: default to 0.
C3 Re-injection stores accumulated text, not hard-coded empty.
C4 :mcp connect failure: no auto-retry, status-log once.
C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
role:"tool" turn — keeps strict-template alternation legal
exactly the way PHASE0 §6 demanded for exec output.
C6 §9 confirms §4 amendment is additive (preserves §3 invariant).
NITs:
N3 protocolVersion fallback (lmcp doesn't negotiate).
N4 Alternation assert in Context:append.
N7 Model-routing bug filed as aish#23.
N8 Day-one fallback test for use_tool_role=false in commit #3.
Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua
→ repl.lua → config.lua. Same cadence as Phase 0/1.
Risks called out explicitly:
- Empty tools array → omit field entirely (some servers reject [])
- isError:false on actual failure (baseline §3 finding) → pass content
through regardless; let model read error text
- JSON-RPC error from tools/call → aish status only, no tool turn
appended, no model recovery
- max_tool_depth=8 cap on tool-call sub-loop
- Argument JSON streaming may yield malformed JSON → status warn + skip
- Q18 fallback (use_tool_role=true default; prefix-injection plumbed
but dead-coded; verify can flip)
- Connect-at-startup is sequential (~30ms × N); fine for N≤3
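Two of these risks reduce to a few lines each. In the hypothetical sketch below, `build_body` and `run_tool_loop` are illustrative names, and `step` stands in for one model round-trip that reports whether the model asked for more tool calls.

```lua
-- Hypothetical request builder: an empty tools list is omitted entirely
-- rather than sent as [] (some servers reject an empty array).
local function build_body(model, messages, tools)
  local body = { model = model, messages = messages, stream = true }
  if tools and #tools > 0 then
    body.tools = tools
  end
  return body
end

-- Hypothetical tool-call sub-loop with the max_tool_depth cap.
-- step(depth) returns true if the model requested more tool calls,
-- false once it produced a final answer.
local MAX_TOOL_DEPTH = 8
local function run_tool_loop(step)
  for depth = 1, MAX_TOOL_DEPTH do
    if not step(depth) then
      return true, depth          -- finished within the cap
    end
  end
  return false, MAX_TOOL_DEPTH    -- cap hit; caller reports via status
end
```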
Two items left open for review: Q18 (flip the default up front, or ship
use_tool_role=true and flip only on failure), and whether :mcp connect
should re-fetch tools after the initial cache.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 7 (verify) anchor. Captures:
- MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms
on LAN; LLM is the latency floor, not the transport).
- 6 fixture responses saved to /tmp/aish-baseline/ covering initialize,
notifications/initialized, tools/list, tools/call success, isError,
and JSON-RPC unknown-tool error.
- Baseline design finding: boltzmann's read_file returns isError:false
even on failure (error text in content). aish should treat content as
authoritative, isError as advisory; feed both to the model. PHASE2.md
§4's "pass-through" stance already accommodates; no manifest amendment
needed.
- Streaming tool_calls delta shape verified against hossenfelder; matches
PHASE2.md §5.
- Pre-MCP aish behavior snapshot: the loaded model emits a markdown code
fence, ignoring the CMD: contract. Once MCP tools exist, the model gets
a structured path that doesn't depend on prose-formatting compliance.
- Module pre-state at Phase 1 head 5878f73: LOC + capability snapshot
per module so Phase 2 diff has a reference frame.
- Two boltzmann-proxy blockers (SSE buffering, model-field routing)
carried explicitly into Phase 7.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live-probed against lmcp v0.5.4 (boltzmann) + hossenfelder broker proxy:
Transport simpler than spec:
- lmcp only implements POST-per-RPC with Connection: close; no held-open
SSE channel. Combined with capabilities.tools.listChanged=false, no
client-side listener is needed in v1. Drops the planned M.get_sse
addition to ffi/curl.lua — Phase 1's M.post covers MCP.
Bearer auth is universal across the fleet — config schema grew
auth_token (literal) and auth_env (env-var indirection) fields per
server, mirroring PHASE0 §10's key_env convention.
Streaming tool_calls delta shape verified — accumulator by `index`,
function.arguments arrives as chunked JSON-string. Matches the
formulate-phase assumption in §5.
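An accumulator keyed by `index` fits in a few dozen lines. This is a hypothetical sketch, assuming each chunk's `choices[1].delta.tool_calls` is passed to `feed`; it also applies C2's rule that an absent `index` defaults to 0.

```lua
-- Hypothetical accumulator for streamed tool_calls deltas. id and name
-- usually arrive once; arguments arrive as string fragments that only
-- form valid JSON once concatenated.
local function new_accumulator()
  local calls = {}   -- keyed by the 0-based `index` from the wire
  local acc = {}

  function acc.feed(delta_tool_calls)
    for _, d in ipairs(delta_tool_calls or {}) do
      local idx = d.index or 0          -- absent index defaults to 0
      local c = calls[idx]
      if not c then
        c = { args = {} }
        calls[idx] = c
      end
      if d.id then c.id = d.id end
      local fn = d["function"]          -- `function` is a Lua keyword
      if fn then
        if fn.name then c.name = fn.name end
        if fn.arguments then c.args[#c.args + 1] = fn.arguments end
      end
    end
  end

  -- Calls in index order, with argument fragments joined into one string.
  function acc.finish()
    local idxs = {}
    for idx in pairs(calls) do idxs[#idxs + 1] = idx end
    table.sort(idxs)
    local out = {}
    for _, idx in ipairs(idxs) do
      local c = calls[idx]
      out[#out + 1] = { id = c.id, name = c.name, arguments = table.concat(c.args) }
    end
    return out
  end

  return acc
end
```

Decoding `arguments` is deliberately left to the caller, because the fragments are not valid JSON until the stream ends.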
Resolutions:
Q17 transport abstraction: POST-only, no SSE channel for lmcp.
Q21 error mapping: result.isError (model-recoverable, feed back as a
tool turn) vs JSON-RPC error (unknown method/tool, transport-level).
Q18 role:"tool" turn: accepted at protocol level (live-probed).
Mistral-nemo template verification is blocked by the hossenfelder
model-field routing bug; full closure carried to Phase 7 verify.
Open item recorded in §11: the hossenfelder proxy routes every request
to the loaded fast model regardless of model field, blocking Phase 2
testing against mistral-nemo specifically. Parallel to the SSE
buffering issue at marfrit/aish#15; same root (boltzmann proxy code).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds .claude/settings.json: 10 read-only entries (including mcp__*__read_file,
mcp__hub-tools__remote_list_hosts, Bash(ping *), Bash(dig *)) auto-allowed
in any aish session, reducing per-call permission prompts during routine
file-reading and host probing. Generated via /fewer-permission-prompts.
settings.local.json stays user-private (per-user ad-hoc grants); .gitignore
now covers it so it doesn't accidentally land in commits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CONCERNs from the Phase 1 review pass:
ffi/curl.lua:
- SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or
in the parse loop itself) is captured into cb_error and surfaced
after curl_easy_perform rather than propagating across the FFI
callback boundary (which LuaJIT documents as process-fatal). The
EOS flush path gets the same shield. Errors return
(nil, "callback: <msg>") from post_sse.
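The capture-then-surface pattern can be shown in plain Lua, with direct calls standing in for libcurl delivering chunks. This is a hypothetical sketch, not the ffi/curl.lua code. Returning a short count to abort the transfer is standard WRITEFUNCTION behavior, but whether aish aborts that way is an assumption.

```lua
-- Hypothetical shielded write callback: errors never escape the callback.
local function make_shielded(on_chunk)
  local state = { cb_error = nil }
  local function write_cb(chunk)
    if state.cb_error then return 0 end       -- already failed: keep aborting
    local ok, err = pcall(on_chunk, chunk)
    if not ok then
      state.cb_error = tostring(err)          -- captured, not re-thrown
      return 0                                -- short count aborts transfer
    end
    return #chunk                             -- "consumed everything"
  end
  return write_cb, state
end

-- After curl_easy_perform returns, surface the captured error.
local function finish(state)
  if state.cb_error then
    return nil, "callback: " .. state.cb_error
  end
  return true
end
```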
history.lua:
- sh_singlequote() wraps a value in single quotes, writing each embedded
' as '\''. The mkdir -p and ls -1 shell-outs previously used double
quotes, inside which $(...) and $VAR still expand; single quotes
suppress all expansion, which makes them the safe form.
- M.load now returns (turns, meta) instead of (meta, turns). turns
is ALWAYS a table on success, never nil-when-no-header; failure
path is the unambiguous (nil, err). Callers can `if not turns
then` without the previous ambiguity. repl.lua :resume updated
to the new shape.
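The standard single-quoting idiom is a one-liner. This sketch shows the technique and is not necessarily the exact history.lua body.

```lua
-- Inside '...' sh expands nothing, so only ' itself needs care: it is
-- written as '\'' (close quote, escaped quote, reopen quote).
local function sh_singlequote(s)
  -- parentheses keep only gsub's first return value (the new string)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Usage: a directory containing ' and $ is passed through literally.
local cmd = "mkdir -p " .. sh_singlequote("/tmp/it's $HOME")
```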
repl.lua :resume:
- Refuse to resume into a non-empty ctx — silent overwrite was the
Q15 default, but the review surfaced the no-undo / no-warning
failure mode. User must :reset (or :save then re-launch) to
express intent. The current session's on-disk log is unaffected
either way.
NITs:
- ffi/libc.lua READ_BUF: comment noting it's module-shared and
Phase 1 has no reentrant readers; revisit when that changes.
- PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred
from Phase 1 — no consumer here") rather than the previous
dangling "(or here)".
Regression suite verifies:
- history.load new signature on success + failure paths
- shell-quoted history.dir with $ doesn't trip
- aish scripted run: ctx with 2 turns refuses :resume anchor with
a clear status; user must :reset first
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).
ffi/libc:
- struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
- M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
(nil, err) when fd isn't a tty (scripted / piped-stdin runs)
- M.restore_termios(fd, saved)
- struct pollfd + M.poll (POLLIN constant)
executor:
- multiplex(sess): poll(stdin, master); reads master on any revents
(POLLHUP fires when child closes its slave end, not POLLIN — the
revents != 0 check catches both); forwards stdin keystrokes to
master; loop exits when master read returns 0 (EOF / child gone)
- stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
piped-stdin runs (tests / scripted) would otherwise drain queued
aish commands into the child of the *current* cmd, swallowing them
- raw mode is restored before returning so the user lands back at the
aish prompt in canonical mode
renderer + repl:
- exec_output(out, code) split into exec_begin() (top rule, before
spawn) + exec_end(code) (closing rule with exit, after wait). PTY
multiplex streams the body live to stdout in between; the renderer
never re-prints the body.
PHASE1.md §3:
- tcgetattr/tcsetattr changed from "optional" to "required for
single-key UIs to work — done-criteria #2"; poll added to the libc
row description.
Verified:
- non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
printf multi-line) — all exit codes correct, output streamed live,
a\nb\nc\n preserved byte-for-byte
- scripted-stdin run reaches all expected lines (no stdin draining
into a non-interactive child)
- aish prompt + framed exec block + exit-code line all render in
correct order
Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 readline binding wiring per PHASE1.md §7.
ffi/readline:
M.bind(seq, lua_fn) -> bool
Wraps lua_fn as a C callback (signature `int (int, int)` per
readline's rl_command_func_t) and registers it via
rl_bind_keyseq(seq, cb). Returns true on success (rl returns 0).
Trampolines are pinned in module-local state so they outlive the
bind call — readline retains the function pointer for the process
lifetime. Rebinding the same seq frees the previous trampoline.
Bound handlers are pcall-wrapped so a Lua error doesn't crash
readline's input loop.
repl:
Binds \C-n to a no-op that emits
"[aish] Norris mode not yet implemented (Phase 3)"
Verifies the mechanism end-to-end; Phase 3 (Norris autonomous mode)
replaces the body with the actual toggle.
Smoke covers bind / rebind-same-seq (exercises the :free path) /
bind-different-seq with no errors. Live keyboard verification waits
on user-test.
Phase 1's 8(+1) inner loop is now functionally through `implement`;
next inner phase is `verify` (review pass) followed by memory-update.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 session log integration per PHASE1.md §6.
On every M.run(), open a session file at
<config.history.dir>/sessions/<utc-iso8601>.jsonl
with a meta header (started, model, aish_version). If history.dir is
unset or unwritable, status-log the disable and continue without
persistence.
ask_ai logs the merged user turn (after pending exec output is folded
in) and the assistant turn (after streaming completes). run_shell does
NOT log [exec output] — that becomes part of the next user turn when
ctx.pending_exec_output is flushed.
New meta commands:
:sessions list session files; "*" marks the active one
:save <name> rename current session log to <name>.jsonl (auto-
appends .jsonl); reopens for continued append
:resume <name> load <name>.jsonl into ctx (replaces current turns
via ctx:reset + append loop). The current process's
own session log is unaffected — Phase 1 chooses
per-process logs over chained continuations.
:quit and EOF (Ctrl-D) both close the session file via shutdown_session
before exiting.
HELP text updated (no longer "Phase 0:" header since meta set has
grown). Q15 noted in PHASE1.md §10 (resume into non-empty context) is
resolved by the ctx:reset() in :resume — silent overwrite for Phase 1,
revisit if anyone cares.
End-to-end live verified: chat -> auto-log; :save renames; :sessions
listings; :resume + :history shows the round-trip.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 persistence per PHASE1.md §6.
history.open(path, meta?) -> session | (nil, err)
parent dir auto-created; meta line written iff file is new/empty so
reopening a session doesn't duplicate the header
session:append(turn)
JSON-encoded line, fh:flush after every write (no fsync — Q16
tracks the policy if it ever bites)
session:close()
history.load(path) -> meta, turns | (nil, err)
skips unparseable lines (e.g. partial trailing write from a crash);
distinguishes the meta-header line from role/content turn lines
history.list_sessions(dir) -> [basename, ...]
sorted (ISO 8601 names lex-sort chronologically); no mtime / turn
counts in Phase 1 — that's a Phase 4 :sessions UI concern
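The load loop's tolerance for torn lines can be sketched with an injected decoder (dkjson is the vendored one). This is hypothetical: in particular, it assumes a line with a `role` field is a turn and that the first other decodable line is the meta header.

```lua
-- Hypothetical load loop over a session file's lines. decode may raise or
-- return nil on bad input (dkjson returns nil); both cases are skipped.
local function load_lines(lines, decode)
  local meta, turns = nil, {}
  for _, line in ipairs(lines) do
    local ok, obj = pcall(decode, line)
    if ok and type(obj) == "table" then
      if obj.role then
        turns[#turns + 1] = obj
      elseif meta == nil then
        meta = obj
      end
    end
    -- anything else (e.g. a partial trailing write) is skipped silently
  end
  return meta, turns
end

-- ISO 8601 UTC names sort lexically in chronological order.
local function sort_sessions(names)
  table.sort(names)
  return names
end
```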
Smoke:
- open, append 3 turns, close, list_sessions sees 1 file
- load returns meta (model="fast") and 3 turns in order
- corrupt tail (partial JSON line appended) is silently skipped on load
- reopen with different meta does NOT duplicate the header line
Repl wiring (`:save`, `:resume`, `:sessions`, auto-write on quit) lands
in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
repl.ask_ai now drives broker.chat_stream and pumps each delta into
renderer.assistant_delta(delta) as it arrives. renderer.assistant_flush
is called when the stream ends to add a trailing newline if missing.
The full reassembled response is then handed to executor.extract_cmd_lines
for the CMD: confirm-and-execute path (unchanged from Phase 0).
renderer.assistant() is kept for non-streaming callers (none in tree
right now, but cheap to keep around). assistant_delta/flush share no
state with assistant(); they use a module-local stream_buf that tracks
the in-progress streamed block.
Q12 deferred: incremental CMD: highlighting (cursor-positioning re-
render on flush) is not implemented in Phase 1 — deltas emit raw. The
§6 CMD: marker is still extractable on the reassembled string post-
stream, which is what executor cares about. Renderer's bold+cyan
treatment for CMD: lines stays available via M.assistant().
Broker error / SSE-framed api-error path still pops the user turn and
restores ctx.pending_exec_output. Order: assistant_flush always runs
(even on error) so the cursor lands on a fresh line before the broker-
error status renders.
Live verification: `Count one to ten` against hossenfelder fast streams
deltas through to stdout incrementally; CMD: extraction works on the
reassembled string; confirm gate intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 streaming consumer per PHASE1.md §3.
broker.chat_stream(model_cfg, messages, on_delta) -> true | (nil, err)
broker.chat(model_cfg, messages) -> content | (nil, err); now a thin buffer over chat_stream
The HTTP shape unifies on stream:true. on_event from ffi/curl.post_sse
decodes each event's JSON, extracts choices[1].delta.content, and calls
on_delta(content) for non-empty string deltas. The `[DONE]` sentinel is
filtered. SSE-framed error envelopes ({"error":{"message":...}} arriving
as data:) surface as "api: ..." errors.
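The per-event logic described above can be sketched as a handler factory. It is hypothetical: `decode` stands in for the JSON decoder, the handler returns an error string rather than using whatever channel broker.lua really uses, and the "decode: " case for undecodable events is an assumption.

```lua
-- Hypothetical on_event handler: filters [DONE], surfaces SSE-framed error
-- envelopes as "api: ...", and forwards non-empty content deltas.
local function make_on_event(decode, on_delta)
  return function(data)
    if data == "[DONE]" then return nil end        -- end sentinel: ignore
    local obj = decode(data)
    if type(obj) ~= "table" then
      return "decode: " .. data                    -- assumption
    end
    if type(obj.error) == "table" then             -- {"error":{"message":...}}
      return "api: " .. tostring(obj.error.message)
    end
    local choice = obj.choices and obj.choices[1]
    local content = choice and choice.delta and choice.delta.content
    if type(content) == "string" and content ~= "" then
      on_delta(content)
    end
    return nil
  end
end
```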
build_request is factored out so chat_stream and (future) any
non-streaming consumer share URL/body/header construction.
Live verification against hossenfelder fast preset:
- chat_stream("Count one to five..."): 9 incremental deltas streamed
token-by-token, assembled to "1 2 3 4 5"
- chat("Reply with exactly: pong"): "pong" returned via buffer
Error envelope path is correct by inspection but not exercised live —
hossenfelder passes through bogus model names rather than rejecting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 streaming substrate per PHASE1.md §4.
curl.post_sse(url, body, headers, on_event, timeout_ms)
-> true | (nil, errmsg)
Reuses the Phase 0 WRITEFUNCTION hook. Each chunk delivery accumulates
into a per-request buffer; the buffer is drained for complete events
(\n\n-terminated). Each event's `data: ...` field(s) are joined per the
SSE spec and passed to on_event(data_string) synchronously. `:` comment
lines (keepalives) are filtered.
The `[DONE]` sentinel is passed through to on_event as-is (broker.lua
filters it — this module stays HTTP-layer only, no JSON / OpenAI shape
knowledge).
Two robustness items:
- End-of-stream flush: the final event may lack \n\n if the server
closes-on-EOF immediately after the last data: line (some llama.cpp
builds, plain HTTP/1.0 close-on-EOF feeds). Post-perform, any
remaining buffer is parsed as one last event.
- FAILONERROR: an HTTP status >= 400 surfaces as a CURLcode error rather
than silently feeding the error body into the SSE parser.
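The framing and the end-of-stream flush fit in one small parser. This sketch follows the description above but is not the ffi/curl.lua code: it handles only "\n" line endings and ignores SSE fields other than data:.

```lua
-- Minimal SSE framing. feed() is called once per WRITEFUNCTION delivery;
-- finish() is the post-perform flush for a final event lacking "\n\n".
local function new_sse_parser(on_event)
  local buf = ""

  -- One raw event block: join its data: lines with "\n" (per the SSE spec),
  -- drop ":" comment lines (keepalives), ignore event:/id:/retry:.
  local function dispatch(block)
    local data, seen = {}, false
    for line in (block .. "\n"):gmatch("([^\n]*)\n") do
      if line:sub(1, 1) ~= ":" then
        local value = line:match("^data:(.*)$")
        if value then
          if value:sub(1, 1) == " " then value = value:sub(2) end
          data[#data + 1] = value
          seen = true
        end
      end
    end
    if seen then on_event(table.concat(data, "\n")) end
  end

  local p = {}
  function p.feed(chunk)
    buf = buf .. chunk
    while true do
      local s, e = buf:find("\n\n", 1, true)
      if not s then break end                 -- incomplete event: wait
      dispatch(buf:sub(1, s - 1))
      buf = buf:sub(e + 1)
    end
  end
  function p.finish()
    if buf:match("%S") then dispatch(buf) end
    buf = ""
  end
  return p
end
```

A chunk that splits mid-event simply stays in `buf` until the next delivery, which is the reassembly that smoke case [2] below exercises.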
Smoke:
[1] canned events via nc listener: 3 events parsed in order
[2] chunk-split mid-event ("Hel" + sleep + "lo..."): correctly
reassembled across two WRITEFUNCTION deliveries
[3] LIVE against hossenfelder.fritz.box:8082 fast preset with
stream:true: response "pong" assembled from incremental deltas;
4 raw events (role + 1 content + finish_reason + [DONE])
Next: broker.lua chat_stream that decodes the OpenAI delta shape on
top of this and exposes on_delta(content_string) for renderer streaming.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with
forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md
is rewritten to point at PHASE1.md §5 — the workaround is gone, not
just renamed.
User-visible behavioral changes:
- Interactive commands (vim, less, htop, top) now work via $cmd /
:exec / known-command shell paths because the child has a real
PTY for line discipline.
- Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill
-> 128+N (bash convention), shell parse error -> sh's 2.
- Broken-shell-syntax cmd now shows the actual sh diagnostic
(e.g. "Syntax error: end of file unexpected") instead of Phase 0's
"(no output — possible shell parse error)" guess.
- Output normalization: PTY emits CR LF; executor collapses \r\n
-> \n to keep the Phase 0 contract ("output uses \n separators").
Code path:
pty.spawn(cmd) -> drain master_fd until EOF
-> wait() returns ("exit", N) | ("signal", N) | ...
-> exit_code mapped: exit -> N, signal -> 128+N, else -1
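The post-wait bookkeeping is two small pure functions. This is an illustration of the mapping and the CR LF fold described above, not the executor.lua source; `kind, val` is the shape session:wait() returns.

```lua
-- exit -> N, signal -> 128+N (bash convention), anything else -> -1
local function map_exit(kind, val)
  if kind == "exit" then return val end
  if kind == "signal" then return 128 + val end
  return -1                                  -- "other" or nil (wait failed)
end

-- The PTY's output processing turns "\n" into "\r\n"; fold it back so the
-- Phase 0 contract ("output uses \n separators") still holds.
local function normalize(out)
  return (out:gsub("\r\n", "\n"))
end
```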
Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir
per §3 + §7), `CMD: ` extraction unchanged.
PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten
to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model.
The illustrative sketch is left in place as historical context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 PTY substrate per PHASE1.md §5. Replaces Phase 0's io.popen
sentinel-echo path with a real PTY so interactive cmds (vim, less,
htop) work and exit-status comes from waitpid instead of parsing a
sentinel out of stdout.
API:
pty.spawn(cmd) -> session | (nil, err)
session:read(count) -> (data, n) ; n == 0 means EOF
session:write(data) -> bytes
session:close() ; closes master_fd; child gets SIGHUP
session:wait(options) -> (kind, val) ; "exit"/"signal"/"other"/nil
session:signal(sig) -> ok ; kill(pid, sig)
Child branch execs `/bin/sh -c cmd`, preserving Phase 0's shell-
interpretation semantics (quoting, redirection, pipes still work).
The PTY makes vim/less/htop functional because the child gets a real
tty for line discipline instead of a pipe.
Loader uses the versioned-soname fallback idiom (util / util.so.1 /
util.so.0) so a runtime-only host without libutil-dev works.
Smoke covers: echo hello (exit 0), false (1), exit 7, bogus binary
(sh's 127), multi-line printf, cat bidirectional (write ping -> read
echo+cat output -> close master -> child exits via SIGHUP).
Next: executor.lua swap from popen+sentinel to pty.spawn. That commit
also retires the §7 amendment paragraph (no longer needed once popen
is gone).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends Phase 0's chdir/errno/strerror with the syscalls that ffi/pty
needs to drive a forkpty'd child: waitpid (with WIFEXITED / WEXITSTATUS
/ WIFSIGNALED / WTERMSIG decoders), read, write, close, kill.
Status-word macros are reproduced from glibc bits/waitstatus.h using
the LuaJIT `bit` library. M.waitpid returns a structured (kind, value)
rather than the raw status word — callers don't have to know the
encoding:
"exit", N — normal exit, N is exit code
"signal", N — killed by signal N
"other", raw — stopped/continued (Phase 1 doesn't trace those)
nil, err — syscall failure
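The decoders follow glibc's bits/waitstatus.h: the low 7 bits hold the terminating signal (0 means normal exit), bits 8 to 15 hold the exit code, and a low byte of 0x7f means stopped. The sketch below uses plain arithmetic instead of the LuaJIT `bit` library so it runs on any Lua. It is an illustration, not the ffi/libc.lua code.

```lua
-- Decode a raw waitpid status word into (kind, value).
local function decode_status(s)
  local low7  = s % 128                      -- s & 0x7f      (WTERMSIG)
  local high8 = math.floor(s / 256) % 256    -- (s >> 8) & 0xff (WEXITSTATUS)
  if low7 == 0 then
    return "exit", high8                     -- WIFEXITED
  elseif low7 ~= 0x7f then
    return "signal", low7                    -- WIFSIGNALED; the 0x80 core
                                             -- flag does not affect low7
  end
  return "other", s                          -- stopped, or continued (0xffff)
end
```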
M.read / M.write / M.close / M.kill mirror their syscall return shape
with errno-string surfacing on failure. Read uses a shared 4 KiB
buffer for the common case; larger reads allocate a fresh buffer.
Smoke covers the chdir regression (still works), all four status
decoders against known status words, pipe round-trip for read/write/
close, EOF -> ("", 0), invalid-fd close -> false, kill(self, 0)
success, kill(bogus, 0) failure.
waitpid is not exercised by the smoke (needs a real child); that
arrives with ffi/pty.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:
api: ... Error: Jinja Exception: After the optional system message,
conversation roles must alternate user/assistant/user/assistant/...
Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.
Fix:
context.lua:
- new field `pending_exec_output` (initially nil)
- new method `:append_exec_output(out)` buffers (concat on subsequent
captures so multi-shell-then-ai still merges everything)
- new method `:append_user(content)` flushes buffered exec output as
a `[exec output]\n...\n\n` prefix and appends a user turn
- `:reset()` also clears the buffer
repl.lua:
- run_shell calls ctx:append_exec_output(out) instead of
ctx:append({role="user", content="[exec output]\n"..out})
- ask_ai calls ctx:append_user(text) instead of raw :append; saves
prev_pending so a broker error can restore the buffer for retry
PHASE0.md §6:
- amended the role-injection paragraph to describe the buffer-and-
prepend policy; the §3 invariants list is untouched (this was a §6
design detail, not a locked invariant)
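The buffer-and-prepend policy can be sketched as follows. It is a hypothetical reduction of context.lua that keeps only the fields and methods named above, with the `[exec output]\n...\n\n` prefix exactly as specified.

```lua
-- Exec output never becomes its own user turn; it is held until the next
-- real user turn, so strict templates keep seeing user/assistant alternation.
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({ turns = {}, pending_exec_output = nil }, Context)
end

function Context:append(turn) self.turns[#self.turns + 1] = turn end

function Context:append_exec_output(out)
  -- concatenate so several shell runs before the next question all survive
  self.pending_exec_output = (self.pending_exec_output or "") .. out
end

function Context:append_user(content)
  if self.pending_exec_output then
    content = "[exec output]\n" .. self.pending_exec_output .. "\n\n" .. content
    self.pending_exec_output = nil
  end
  self:append({ role = "user", content = content })
end

function Context:reset()
  self.turns, self.pending_exec_output = {}, nil
end
```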
Verification:
- context unit tests cover: alternation after the failing sequence,
multi-shell merge, reset clears buffer, broker-error retry path
- live reproduction against `deep` (mistral-nemo) of the exact
user-reported sequence succeeds; model responds with a sensible
`CMD: ls | grep '^lor'` instead of a Jinja exception
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves issue #12 by partial-accept of the recommendation.
What landed:
- Single broker URL: http://hossenfelder.fritz.box:8082 for all three
presets (fast / deep / cloud). Server-side model-aware routing; no
client-side cloud auth (proxy holds the OpenRouter bearer).
- Models from hossenfelder's /v1/models inventory:
fast -> qwen2.5-coder-1.5b-q4_k_m.gguf (boltzmann local)
deep -> mistral-nemo-12b-instruct (boltzmann local)
cloud -> anthropic/claude-haiku-4.5 (OpenRouter route)
- `cloud` was already pointing at hossenfelder but with https://; flipped
to http:// so it matches the proxy's actual scheme.
What deferred:
- Schema rename `models` -> `brokers` (and the 5-cloud-preset shape
suggested in #12) — would touch repl.lua + broker.lua. Not blocking
Phase 7. If multi-preset becomes useful in practice, file a separate
issue for the rename then.
Phase 7 verification (live broker test):
- broker.chat(fast, [user="say pong"]) -> "CMD: echo pong" in ~3s
- multi-turn arithmetic (7*8=56, *2=112) preserved across turns
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions to PHASE0.md, all surfaced by the Phase 5 review of
the Phase 0 implementation. No invariant changes; manifest now matches
implementation reality.
§3 — FFI loader fallback paragraph. ffi.load("name") needs the
unversioned `libname.so` symlink that comes with the -dev package.
Phase 0 loaders try unversioned first then versioned sonames so
runtime-only hosts (no -dev) work as-is. Documents the actual
behavior in ffi/readline.lua and ffi/curl.lua.
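The fallback idiom is "first name that loads wins". In this sketch the loader is a parameter so it runs without FFI; under LuaJIT it would be `ffi.load`, which prepends `lib` and, only for names without a dot, appends `.so`. The name list in the comment mirrors the libutil one from ffi/pty; `load_first` is a hypothetical helper name.

```lua
-- Try names in order, e.g. { "util", "util.so.1", "util.so.0" }, and return
-- the first library that loads plus the name that worked.
local function load_first(loader, names)
  local errs = {}
  for _, name in ipairs(names) do
    local ok, lib = pcall(loader, name)
    if ok then return lib, name end
    errs[#errs + 1] = name .. ": " .. tostring(lib)
  end
  return nil, table.concat(errs, "; ")
end
```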
§7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been
showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1
follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers
the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`).
Phase 1 PTY+waitpid replaces the hack and the sketch becomes
accurate. Sketch left as-is (it's the right shape conceptually);
caveat now explicit.
§10 — cwd-relative package.path note. Phase 0 prepends `./?.lua;
./vendor/?.lua`, so aish must run from the repo root. Cwd-independent
resolution is a later concern. Also clarifies that --config is strict
(no fallback if the path is unopenable) — matches main.lua post the
review-followup commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses three concerns + one nit from the Phase 0 review pass.
executor.lua:
- M.exec guards empty / whitespace-only cmd up front, returns
"(empty command)" / -1 instead of running the wrapper on nothing.
- On sentinel-parse failure with empty output (typical of shell
parse errors — the syntax error itself escapes to the popen
parent's stderr because 2>&1 is inside the unparsable subshell),
surface "(no output — possible shell parse error)" rather than
a silent empty frame.
- extract_cmd_lines now skips whitespace-only / empty bodies; a
bare `CMD: ` line in assistant output no longer turns into an
"execute ''? [y/N]" prompt.
- "what" comments cleaned in maybe_chdir.
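The empty-body guard amounts to one extra pattern test per CMD: line. This is an illustrative sketch that may differ in detail from the real executor.extract_cmd_lines.

```lua
-- Collect CMD: bodies, skipping bare "CMD: " and whitespace-only bodies so
-- they never turn into an "execute ''? [y/N]" prompt.
local function extract_cmd_lines(text)
  local cmds = {}
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    local body = line:match("^CMD: (.*)$")
    if body and body:match("%S") then
      cmds[#cmds + 1] = body
    end
  end
  return cmds
end
```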
router.lua:
- path_like now matches `~` and `~/foo` so `~/scripts/build.sh`
classifies as shell (was: ai). Restores symmetry with executor's
maybe_chdir, which already expands `~` on `cd`.
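The widened test covers the three original prefixes plus the two tilde forms. This is an illustrative sketch, not router.lua; `~user` forms are deliberately left out because the change names only `~` and `~/foo`.

```lua
-- A first word is path-like if it starts with ./ ../ or /, or is ~ or ~/...
local function path_like(word)
  return word:sub(1, 2) == "./"
      or word:sub(1, 3) == "../"
      or word:sub(1, 1) == "/"
      or word == "~"
      or word:sub(1, 2) == "~/"
end
```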
repl.lua:
- :exec and :ask trim args and renderer.status a usage line on
empty rather than running an empty cmd / sending an empty turn
to broker.
Regression: full prior smoke suite still passes — known_commands
shell paths, all maybe_chdir branches, CMD: extraction with non-empty
bodies, exec exit-code recovery, all router branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 entry point per PHASE0.md §4, §10.
Resolves the §10 config search:
--config <path> (explicit; failure if not openable, no fallback)
$AISH_CONFIG
~/.config/aish/config.lua
./config.lua
The explicit form now hard-fails instead of silently falling through to
the next candidate — caught in smoke (`--config /nonexistent` was loading
./config.lua).
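The search order, with the strict explicit form, can be sketched with injected environment and readability checks. This is hypothetical: `resolve_config` is an illustrative name, `env` stands in for os.getenv, and two behaviors are assumptions, namely that an unreadable $AISH_CONFIG falls through and that finding nothing returns an error.

```lua
-- explicit: the --config argument or nil; env: table of environment vars;
-- readable(path): true if the file can be opened (io.open probe in practice)
local function resolve_config(explicit, env, readable)
  if explicit then
    -- strict: no fallback to the other candidates
    if readable(explicit) then return explicit end
    return nil, "config not readable: " .. explicit
  end
  local candidates = {}
  if env.AISH_CONFIG then candidates[#candidates + 1] = env.AISH_CONFIG end
  if env.HOME then
    candidates[#candidates + 1] = env.HOME .. "/.config/aish/config.lua"
  end
  candidates[#candidates + 1] = "./config.lua"
  for _, p in ipairs(candidates) do
    if readable(p) then return p end
  end
  return nil, "no config found"
end
```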
Pre-pends `./?.lua;./vendor/?.lua` to package.path so `require("dkjson")`
finds vendor/dkjson.lua and project requires resolve from the repo root.
Run from the repo root; cwd-independent resolution lands later.
`--help` prints the usage block. Unrecognized arg exits 2 with a
diagnostic on stderr.
Phase 0 done-criteria (PHASE0.md §2):
✓ shell command execution with framed output
✓ :meta commands (full §5.2 set)
✓ in-memory conversation history with sliding-window eviction
✓ codebase layout matches §4 — every module name stable for Phase 1+
⏳ live AI exchange — structurally wired; live test deferred per
issue #12 (broker endpoint hostname not resolvable from noether)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 implementation per PHASE0.md §5, §9.
Wires the lower-half modules into a single REPL:
ffi/readline -> input + history
router -> classify(line) -> meta/shell/ai
executor -> run_shell with cd interception, frame output, capture
broker -> ask_ai, then extract+confirm CMD: lines from response
context -> turn list + eviction; status line on evict
renderer -> assistant text + exec frame + status
Prompt format `[aish:<model>]> ` per §9.
Meta commands all wired (§5.2): :quit/:q, :clear, :reset, :model <name>,
:models, :history, :exec <cmd>, :ask <text>, :help. Unknown meta names
report via renderer.status rather than crashing.
End-of-input (Ctrl-D on empty line) breaks the loop cleanly. Empty /
whitespace-only lines are skipped silently before dispatch — router
would otherwise classify them as ai with empty payload and pollute
context.
`CMD: ` extraction + confirm-and-execute is wired: when broker returns
an assistant turn, the response is scanned for §6 CMD: lines; each is
prompted via readline ("execute '...'? [y/N]") when
config.shell.confirm_cmd is true (default), else auto-executed.
On broker error, the user turn just appended is popped so the context
isn't polluted with a turn that has no assistant response.
Smoke covers :help, :models, shell exec via known_commands allowlist,
and Ctrl-D break. Live broker exchange deferred per issue #12.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 minimal output formatting per PHASE0.md skeleton.
M.assistant(text) — line-by-line; `CMD: ` lines bold+cyan
M.exec_output(output, code) — top/bottom rules; exit code on closing
rule (red on non-zero)
M.status(line) — dim "[aish] ..." single-liner
ANSI table is local to the module (no external dep). Trailing-sentinel
pattern (`(text.."\n"):gmatch("([^\n]*)\n")`) preserves blank lines
in assistant output rather than squashing them, at the cost of one
extra trailing newline — acceptable for Phase 0. Real syntax-aware
formatting (tree-sitter) lands in Phase 6.
Smoke verifies escape codes are emitted (`od -c` shows `\033[1m\033[36m`
around the CMD: line) and the visual layout looks right.
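A testable sketch of the trailing-sentinel line loop, using a hypothetical `render_assistant` that returns the rendered string instead of writing it. The bold+cyan codes match the smoke output; using `\033[0m` as the reset is an assumption:

```lua
-- Hypothetical pure variant of renderer.assistant: returns a string
-- instead of printing, so the output can be asserted on.
local ANSI = { bold = "\27[1m", cyan = "\27[36m", reset = "\27[0m" }

local function render_assistant(text)
  local out = {}
  -- Appending "\n" guarantees every line (blank ones and the last one
  -- included) is newline-terminated and therefore matched.
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    if line:sub(1, 5) == "CMD: " then
      out[#out + 1] = ANSI.bold .. ANSI.cyan .. line .. ANSI.reset
    else
      out[#out + 1] = line
    end
  end
  return table.concat(out, "\n") .. "\n"
end
```

`render_assistant("x\n")` yields `"x\n\n"`: when the model's text already ends in a newline, the sentinel produces the one extra trailing newline noted above.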
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 implementation per PHASE0.md §6.
M.chat(model_cfg, messages) -> content_string | (nil, errmsg)
Builds the OpenAI-compat JSON body:
{ model, messages, stream: false, temperature: model_cfg.temperature ?? 0.2 }
Sends Content-Type and (optionally) Authorization Bearer pulled from
model_cfg.key_env's process environment. Default timeout 60s; overridable
per-model via model_cfg.timeout_ms.
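The body and header assembly above, sketched as a pure function that returns a Lua table for dkjson to encode. `build_chat_request` is hypothetical, the `model` field name on model_cfg is an assumption, and `getenv` is injectable (defaulting to os.getenv) so the Bearer logic can be tested:

```lua
-- Sketch of the request assembly; returns (body_table, headers, timeout_ms).
local function build_chat_request(model_cfg, messages, getenv)
  getenv = getenv or os.getenv
  local body = {
    model = model_cfg.model,        -- assumed field name on model_cfg
    messages = messages,
    stream = false,
    -- Lua's `or` only falls through on nil/false, so an explicit
    -- temperature of 0 survives, matching the `??` semantics above.
    temperature = model_cfg.temperature or 0.2,
  }
  local headers = { "Content-Type: application/json" }
  -- key_env names an environment variable; the key itself never sits in config.
  local key = model_cfg.key_env and getenv(model_cfg.key_env)
  if key and key ~= "" then
    headers[#headers + 1] = "Authorization: Bearer " .. key
  end
  return body, headers, model_cfg.timeout_ms or 60000
end
```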
Error surfaces split:
"transport: ..." curl-side (TCP/TLS/timeout)
"decode: ..." non-JSON response body
"api: ..." OpenAI-style { error: { message } } envelope
"broker.chat: no choices[1].message.content..." shape miss
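A sketch of the four-way split, assuming `decode` is an adapter returning `(value)` or `(nil, errmsg)`; dkjson.decode itself returns `nil, pos, msg` on failure, so a thin wrapper would be needed. `interpret_response` is a hypothetical name:

```lua
-- Sketch: map a transport result onto content or a prefixed error string.
local function interpret_response(body, transport_err, decode)
  if not body then
    return nil, "transport: " .. tostring(transport_err)
  end
  local doc, derr = decode(body)
  if type(doc) ~= "table" then
    return nil, "decode: " .. tostring(derr or "response is not a JSON object")
  end
  -- OpenAI-style error envelope: { error = { message = ... } }
  if type(doc.error) == "table" then
    return nil, "api: " .. tostring(doc.error.message)
  end
  -- Walk choices[1].message.content defensively; any miss is a shape error.
  local c = doc.choices
  local content = type(c) == "table" and type(c[1]) == "table"
    and type(c[1].message) == "table" and c[1].message.content
  if type(content) ~= "string" then
    return nil, "broker.chat: no choices[1].message.content"
  end
  return content
end
```

Checking the envelope before the shape means an API error is never misreported as a missing `choices` field.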
Tested against four canned mock responses (nc -lN listener feeding
HTTP/1.0 + Connection: close so EOF terminates the body): happy path,
api error envelope, raw-text non-JSON, empty choices[]. The on-wire
request was verified as well: POST path, headers, and the
model/messages/temperature/stream JSON fields.
Live test against a real llama.cpp/hossenfelder endpoint deferred per
issue #12 (broker endpoint configuration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 implementation per PHASE0.md §5.
Pure function. Three kinds:
"meta" — line starts with ":", payload is the rest
"shell" — line starts with "$" (override, $ stripped), OR first word
is in config.shell.known_commands, OR first word is
path-like (`./`, `../`, `/`)
"ai" — everything else (including empty / whitespace-only; the
repl loop skips empty payloads before dispatching)
Path-like detection is deliberately conservative in Phase 0: anchored
prefixes only, no quoted-path or shell-glob handling. Q4 in §13 tracks
multi-command CMD: blocks; this router doesn't see those (it only
classifies user input lines, not assistant output).
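The classification rules fit in a few lines of Lua. A sketch, assuming `config.shell.known_commands` is a plain array of command names (the real config.lua shape may differ) and that the `$` override strips only the `$` itself; `classify` returns `(kind, payload)`:

```lua
-- Sketch of router.classify; a nil config falls through to "ai"
-- unless the line is meta, $-prefixed, or path-like.
local function classify(line, config)
  local c1 = line:sub(1, 1)
  if c1 == ":" then return "meta", line:sub(2) end
  if c1 == "$" then return "shell", line:sub(2) end

  local word = line:match("^%s*(%S+)")
  if word then
    local known = config and config.shell and config.shell.known_commands or {}
    for _, name in ipairs(known) do
      if name == word then return "shell", line end
    end
    -- Conservative path detection: anchored prefixes only, no globs/quotes.
    if word:sub(1, 2) == "./" or word:sub(1, 3) == "../" or word:sub(1, 1) == "/" then
      return "shell", line
    end
  end
  return "ai", line
end
```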
Smoke covers all branches plus a nil-config fallthrough.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 implementation per PHASE0.md §6, §7.
M.exec(cmd) -> (output, exit_code)
M.maybe_chdir(cmd) -> nil | true | false, errmsg
M.extract_cmd_lines(text) -> { "ls -la", "echo hi", ... }
Two non-obvious bits:
1. LuaJIT 2.1's io.popen():close() follows Lua 5.1 semantics and returns
only `true`, with no child exit status. The §7 manifest sketch assumes
Lua 5.2's three-return form, which doesn't apply here. Recover the
exit code by appending `; echo __AISH_EXIT_<tag>__$?` after the
command and parsing the sentinel-prefixed integer back out. Phase 1
replaces this with waitpid via libc FFI when PTY support lands.
2. `cd` interception is a §3 invariant: it must not be delegated to popen
(popen forks, so a cd in the child evaporates). maybe_chdir parses the
line, expands `~`, calls libc.chdir, and reports success/failure
separately from "not a cd" (nil) so the caller can tell them apart.
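Both workarounds sketched in plain Lua, under stated assumptions: the sentinel tag is a hypothetical digits-only string (from os.time()), only stdout is captured, and `chdir` is injected so the libc FFI call (returning `true` or `false, errmsg`) can be stubbed. These `exec` and `maybe_chdir` bodies are illustrations, not the executor.lua code:

```lua
-- 1. Exit-code recovery via an echoed sentinel.
local function exec(cmd, tag)
  tag = tag or tostring(os.time())          -- hypothetical tag; digits only
  local sentinel = "__AISH_EXIT_" .. tag .. "__"
  local p = assert(io.popen(cmd .. "; echo " .. sentinel .. "$?"))
  local raw = p:read("*a")
  p:close()                                 -- return value ignored (LuaJIT 2.1: just true)
  -- Greedy (.*) keeps everything before the sentinel, including output
  -- that did not end in a newline (e.g. `printf x`).
  local out, code = raw:match("^(.*)" .. sentinel .. "(%d+)%s*$")
  if not out then
    return raw, nil                         -- sentinel missing, e.g. the command ran `exit`
  end
  return out, tonumber(code)
end

-- 2. `cd` interception: returns nil (not a cd), true, or false, errmsg.
local function maybe_chdir(cmd, chdir, home)
  home = home or os.getenv("HOME") or "/"
  local arg
  if cmd:match("^%s*cd%s*$") then
    arg = "~"                               -- bare `cd` goes home
  else
    arg = cmd:match("^%s*cd%s+(.-)%s*$")
  end
  if not arg then return nil end            -- not a cd; caller popens it
  if arg == "~" then
    arg = home
  elseif arg:sub(1, 2) == "~/" then
    arg = home .. arg:sub(2)
  end
  return chdir(arg)                         -- libc.chdir in the real module
end
```

Note the sentinel's blind spot: a bare `exit 3` terminates the popen shell before the echo runs, so no exit code is recoverable; wrapping it as `sh -c 'exit 3'` does report 3.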
CMD: extraction is anchored at start-of-line per the §3 "exact prefix,
single space" invariant — leading whitespace before CMD: does not match.
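Anchored extraction is one pattern per line. A sketch (the name mirrors `extract_cmd_lines` above, but this body is illustrative):

```lua
-- Sketch: collect commands from lines that begin with exactly "CMD: ".
local function extract_cmd_lines(text)
  local cmds = {}
  -- Trailing "\n" sentinel so the final line is matched too.
  for line in (text .. "\n"):gmatch("([^\n]*)\n") do
    -- "^" anchors at line start: indented or "CMD:x" lines are rejected.
    local cmd = line:match("^CMD: (.+)$")
    if cmd then cmds[#cmds + 1] = cmd end
  end
  return cmds
end
```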
Smoke covers: echo capture (code=0), failed ls (code!=0), `false`
(code=1), multi-line output preserved, all maybe_chdir branches
(non-cd / bare / explicit / ~ expansion / failure), CMD extraction
including the leading-whitespace-rejection case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 implementation per PHASE0.md §6, §8.
Context.new(opts) constructs with the §6 default system prompt (the
`CMD: ` extraction contract is hard-coded in there per §3 — locked
substrate, do not edit). opts overrides: system_prompt, max_turns
(default 40), token_budget (default 4096; visibility only in Phase 0
per Q1, deferred to Phase 3 for accurate tokenization).
API:
ctx:append({role, content}) record a turn
ctx:to_messages() [{system,...}, ...turns] for broker.chat
ctx:enforce_budget() evict pairs (user+assistant) until
#turns <= max_turns; returns count
ctx:estimate_tokens() char/4 heuristic
ctx:reset() drop all turns (system_prompt kept)
System prompt is the §6 phrasing verbatim including the `CMD: ` clause
— stored on the context, NOT in self.turns, so it is prepended freshly
on every to_messages() call.
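A minimal sketch of the API above. The default prompt is a hypothetical placeholder (the real one is the §6 text verbatim), `enforce_budget` returning the number of evicted turns rather than pairs is an assumption, and `estimate_tokens` rounds the char/4 heuristic up:

```lua
local Context = {}
Context.__index = Context

-- Hypothetical placeholder; the real default is the §6 prompt verbatim.
local DEFAULT_PROMPT = "<§6 default system prompt goes here>"

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    system_prompt = opts.system_prompt or DEFAULT_PROMPT,
    max_turns = opts.max_turns or 40,
    token_budget = opts.token_budget or 4096,
    turns = {},
  }, Context)
end

function Context:append(turn) self.turns[#self.turns + 1] = turn end

function Context:to_messages()
  -- System prompt lives outside self.turns and is prepended on every call.
  local msgs = { { role = "system", content = self.system_prompt } }
  for _, t in ipairs(self.turns) do msgs[#msgs + 1] = t end
  return msgs
end

function Context:enforce_budget()
  -- Evict the oldest user+assistant pair until within max_turns.
  local evicted = 0
  while #self.turns > self.max_turns and #self.turns >= 2 do
    table.remove(self.turns, 1)
    table.remove(self.turns, 1)
    evicted = evicted + 2
  end
  return evicted
end

function Context:estimate_tokens()
  local chars = #self.system_prompt
  for _, t in ipairs(self.turns) do chars = chars + #t.content end
  return math.ceil(chars / 4)
end

function Context:reset() self.turns = {} end
```

Evicting two turns at a time keeps the list starting on a user turn, so the model never sees an orphaned assistant reply at the head of the history.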
Smoke covers basic ops, no-evict-at-max, evict-on-overflow, bulk
eviction (14 turns -> 4), reset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 binding per PHASE0.md §6. M.post(url, body, headers, timeout_ms)
uses CURLOPT_{URL, POST, POSTFIELDS, HTTPHEADER, WRITEFUNCTION, NOSIGNAL,
TIMEOUT_MS, USERAGENT} on a fresh easy handle, capturing the response
into a Lua string via a closure-based WRITEFUNCTION callback.
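curl_easy_setopt is variadic, and LuaJIT's FFI passes a plain Lua number
to a variadic parameter as a double, so each call would otherwise need
an explicitly typed cdata argument (e.g. ffi.new("long", n)). Pre-casting
to three concrete signatures (long / void* / const char*) bypasses that;
it's cleaner and matches the lua-curl idiom.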
Robust loader: tries `curl`, `curl.so.4`, `curl-gnutls.so.4` so a
runtime-only host (no libcurl-dev installed) just works. Same idiom
as ffi/readline.
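The fallback chain reduces to a pcall loop. A sketch, where `load_first` is a hypothetical helper and `loader` defaults to LuaJIT's `ffi.load` (injectable so the logic can be exercised without libcurl present):

```lua
-- Try each candidate library name in order; return the first that loads.
local function load_first(names, loader)
  loader = loader or require("ffi").load   -- LuaJIT only; injectable for tests
  local errs = {}
  for _, name in ipairs(names) do
    local ok, lib = pcall(loader, name)
    if ok then return lib, name end
    errs[#errs + 1] = name .. ": " .. tostring(lib)
  end
  -- Report every attempt so a missing runtime package is easy to diagnose.
  error("no usable library found:\n  " .. table.concat(errs, "\n  "))
end
```

Usage would be `local curl = load_first({ "curl", "curl.so.4", "curl-gnutls.so.4" })`; the same shape serves ffi/readline's version list.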
Smoke against a local nc listener: request was correctly framed
(POST path, Content-Type + X-Test headers, Content-Length matches
JSON body length) and the canned response was captured into the
returned Lua string.
SSE streaming in Phase 1 will reuse this same WRITEFUNCTION hook:
chunks arrive incrementally and the closure consumes them as they come.
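A sketch of what consuming chunks as they come could look like at the Lua level, assuming standard SSE framing (`data:` lines, events terminated by a blank line) and ignoring `event:`/`id:` fields. `make_sse_sink` is hypothetical; the returned function is what the WRITEFUNCTION closure would call with each received chunk as a Lua string:

```lua
-- Buffer partial lines across chunks; fire on_event once per complete event.
local function make_sse_sink(on_event)
  local buf, data = "", {}
  return function(chunk)
    buf = buf .. chunk
    while true do
      local nl = buf:find("\n", 1, true)
      if not nl then break end               -- incomplete line: wait for more
      local line = buf:sub(1, nl - 1):gsub("\r$", "")
      buf = buf:sub(nl + 1)
      if line == "" then                     -- blank line ends an event
        if #data > 0 then
          on_event(table.concat(data, "\n"))
          data = {}
        end
      elseif line:sub(1, 5) == "data:" then
        data[#data + 1] = (line:sub(6):gsub("^ ", ""))
      end                                    -- other fields ignored in this sketch
    end
    return #chunk  -- mirrors WRITEFUNCTION: report all bytes as handled
  end
end
```

Chunk boundaries from libcurl bear no relation to event boundaries, which is why the buffer carries partial lines between calls.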
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 0 binding per PHASE0.md §9. M.readline(prompt) returns the line
as a Lua string (the C buffer is freed via libc free immediately after
ffi.string copies it) or nil on EOF. M.add_history skips empty lines.
Loader handles the case where libreadline-dev's unversioned
`libreadline.so` symlink isn't installed — falls through to
`readline.so.8` (current Debian/Arch ALARM) and `.so.7` (older)
before giving up. This case occurs on noether (the LXD container), where
only the runtime package is present.
Smoke (stdin from heredoc, two lines + EOF):
p1> hello world -> "hello world"
p2> second line -> "second line"
p3> -> nil (EOF)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smallest Phase 0 module per CLAUDE.md §4 implementation order.
M.chdir(path) returns (true) or (false, errmsg) — errmsg via
strerror(__errno_location()[0]). Glibc errno is thread-local
behind __errno_location() rather than a plain global, hence the
indirect access.
Verified against PHASE0.md §7 expectation: a libc.chdir() persists
across subsequent io.popen() calls (popen's child inherits the
parent's wd), which is the property executor.lua relies on for `cd`
interception. Smoke:
libc.chdir("/tmp"); io.popen("pwd"):read("*l") --> /tmp
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the JSON-library decision noted as open in CLAUDE.md §6.
dkjson is pure Lua (preserves §3's "no compiled extensions" invariant),
single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson`
package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated
copy of the upstream at dkolf.de.
Vendoring (rather than relying on a system lua-dkjson install) keeps
aish self-contained per the §3 "no luarocks packages" invariant: any
host with luajit can run the tree as-is.
PHASE0.md §3 grows one row recording the choice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures two carve-outs to aish's "non-PR-flow repo" default:
- Feature requests and bugs go to git.reauktion.de/marfrit/aish/issues
rather than direct-implement-in-band. Tag `architecture` for cross-
phase concerns. Aligns with the fleet-wide bug-filing convention from
the `his` cheatsheet; this row extends it to features for aish.
- Review-required iteration opens a PR (authored as claude-<host>,
marfrit reviews, self-approval forbidden). PR #1 was the precedent.
Both are opt-in; direct-to-main remains the default for autonomous
work that doesn't need a feedback loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures three findings from the review of 013c625 ("phase0 amendment:
insert MCP phase 2"). Opening as a PR rather than direct-to-main: the
non-PR-flow convention works fine for autonomous work, but feedback-
required iteration needs a readable medium that isn't the Claude Code
transcript.
§11 phase 2 row: spell out two scope items the original row left implicit —
the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:`
contract is hard-coded into the prompt) and `safety.lua` extension to
gate tool calls (per Q8).
§13 Q6: explicit note that choosing "retire `CMD:`" requires a §3
invariant amendment in the same commit — keeps the substrate-vs-phase
boundary honest. Adds (§3 if retiring) to the impact column.
§13 Q9 (new): MCP system-prompt augmentation locus — static block in
broker.lua / per-request assembly from connected servers / hybrid.
Real architectural call with token-cost tradeoff per option.
§13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate —
phase-ordering question. Either Phase 2 lands on the blocking Phase 0
broker and refits when SSE arrives, or Phase 1 SSE moves before MCP
so tool-call deltas stream from day one.
MCP/tool-calling lands as a distinct phase, before Norris mode so the
autonomous planner has tools as substrate. lmcp speaks standard MCP
(JSON-RPC 2.0 over HTTP/SSE), which fits the existing libcurl FFI plan; tool
calls ride the OpenAI-compatible `tools` field on /v1/chat/completions,
so the §6 broker contract is unchanged at the transport level.
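A sketch of how an MCP tool descriptor (the `name`, `description`, `inputSchema` fields of a `tools/list` result) could map onto one element of the OpenAI-compatible `tools` array. `mcp_tool_to_openai` is hypothetical, and the empty-schema fallback uses dkjson's `__jsontype` metatable hint so it encodes as `{}` rather than `[]`:

```lua
-- Map one MCP tools/list entry onto an OpenAI-style `tools` element.
local function mcp_tool_to_openai(tool)
  local schema = tool.inputSchema or {
    type = "object",
    -- An empty Lua table is ambiguous in JSON; hint dkjson to emit {}.
    properties = setmetatable({}, { __jsontype = "object" }),
  }
  return {
    type = "function",
    -- `function` is a Lua keyword, so the key needs bracket syntax.
    ["function"] = {
      name = tool.name,
      description = tool.description or "",
      parameters = schema,  -- MCP inputSchema is already JSON Schema
    },
  }
end
```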
§8: tokenization concern bumped Phase 2 → Phase 3 (still tracks Norris).
§11: Norris→3, memory→4, routing→5, tree-sitter→6.
§13: Q1/Q2/Q3/Q5 phase numbers tracked the renumber; added Q6 (CMD: vs
tools coexistence), Q7 (server discovery), Q8 (tool-call auth gate).
No §3 invariant broken. No code touched — Phase 0 implementation per
the locked manifest is still the next move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README is now self-contained for a human reader landing on the repo cold:
project value-prop, status, quick-orientation reading order, directory
layout, build/runtime deps, run + config invocation, and a pointer to
CLAUDE.md for contribution norms.
CLAUDE.md is rewritten as the substrate a fresh Claude session needs to
pick up Phase 0→1 implementation without prior conversation context:
- Reading order (PHASE0.md → README → config.lua)
- Phase-loop discipline (8+1 with loopbacks)
- Eight invariants from PHASE0.md called out as non-negotiable without
manifest amendment
- Bottom-up implementation order for Phase 0 (libc → readline → curl →
context → executor → router → broker → renderer → repl → main)
- Testing approach without a test framework
- Open question on JSON library (dkjson recommended; needs §3 amendment)
- Ambiguity handling pattern (ask vs log-in-§13 vs stop-and-ask)
- Commit style + Co-Authored-By trailer template
- Model-class caveat: small Q4 coder models have output variance, so
validate before exec; the confirm_cmd defaults exist for this reason
- Push credential note for sessions without ssh-keys-on-Gitea
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
snappy/32k + cloud via OpenRouter through hossenfelder)
File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>