Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).
Four pillars:
1. Per-endpoint /tokenize probe (cached). One round-trip on first
call per (endpoint, model); capability cached for session.
hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
tokenize — per real probe; the path is endpoint-local, not
under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
char/4 fallback.
2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
falls back to char/4 on miss. Always returns non-negative int;
never errors. 2s tight timeout; failures cache as not-supported.
3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
Context.new; uses it when present, char/4 otherwise. repl.lua
wires `tokenize_fn = function(text) return broker.token_count(
active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
Per-turn _tokens cache to amortize across estimate calls.
4. :cost detail est-vs-actual annotation. When the heuristic
disagrees with the actual prompt_tokens from broker usage by
>10%, show `~est=N`. Silent otherwise. Display-only; no
behavior change.
Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.
Baseline already observed during formulate:
- /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
- Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
- Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
(508 vs 558 on a 2KB README sample). Material for context-
budget eviction decisions.
Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).
Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).
Four pillars:
1. Usage capture in broker.chat_stream — extract `usage` from the
final SSE chunk (OpenAI streaming spec with `stream_options:
{include_usage: true}`). Surface via new on_delta("usage",
payload) kind. broker.chat returns (text, usage) — backward-
compat: existing callers ignore the second value.
2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
tables (categories: main / delegate / summarize / memory_summarize
/ probe / norris, tagged at the call site via opts.category).
:reset preserves usage_totals (R8 parity with memory_items /
project). Session JSONL gains an optional `usage` field on
assistant turns for after-the-fact analysis.
3. :cost meta surface — :cost (summary), :cost detail (per-model +
per-category breakdown), :cost reset (zero the meter). Pure-Lua
read of ctx.usage_totals; no broker calls.
4. Optional warn thresholds — cfg.cost.warn_at_dollars /
warn_at_tokens emit a one-shot status when crossed. Default off;
useful with cloud presets configured.
Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.
Open at formulate:
Q-C1 — provider-without-usage handling (local llama.cpp probably)
Q-C2 — cross-session persistence (defer to phase 8)
Q-C3 — categories closed-set vs free-form
Q-C4 — does hossenfelder forward stream_options to all backends?
Q-C5 — warn fires on the call that crosses, or the next one?
Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?
Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled:
mcp.lua (new, 153 lines):
- M.connect(url, opts) returns a Session.
- Session:initialize() round-trips initialize + notifications/initialized
+ tools/list. Caches tools for session lifetime (lmcp announces
capabilities.tools.listChanged = false; no refetch).
- Session:list_tools() returns the cached tool list.
- Session:call_tool(name, args) returns (result_table, kind) where
kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per
the §4 error split. Folded HTTP-level failure into transport_error.
- Per-server Bearer auth via opts.auth_token or opts.auth_env env-var
indirection.
- Captures protocolVersion mismatch as a warning string rather than
aborting (lmcp doesn't negotiate — N3 in review).
ffi/curl.lua extension:
- Add curl_easy_getinfo to ffi.cdef.
- Pre-cast as getinfo_long; helper get_response_code() fetches
CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFOTYPE_LONG | 2).
- M.post now returns (body, status_code) on transport success;
(nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers
reading only the first slot are unaffected.
docs/PHASE0.md §4:
- Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9.
- Module-stability invariant clarified: rename prohibition is what
matters; adding new files is additive.
Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4:
- initialize: ok (7 tools cached)
- list_dir /tmp: ok (1.2KB content)
- read_file /nonexistent: ok (boltzmann's baseline §3 quirk —
isError:false even on failure; content is authoritative)
- nope_tool: rpc_error (code=-32601)
- wrong auth: transport_error (HTTP 401)
- unreachable host: transport_error (DNS failure)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with
forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md
is rewritten to point at PHASE1.md §5 — the workaround is gone, not
just renamed.
User-visible behavioral changes:
- Interactive commands (vim, less, htop, top) now work via $cmd /
:exec / known-command shell paths because the child has a real
PTY for line discipline.
- Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill
-> 128+N (bash convention), shell parse error -> sh's 2.
- Broken-shell-syntax cmd now shows the actual sh diagnostic
(e.g. "Syntax error: end of file unexpected") instead of Phase 0's
"(no output — possible shell parse error)" guess.
- Output normalization: PTY emits CR LF; executor collapses \r\n
-> \n to keep the Phase 0 contract ("output uses \n separators").
Code path:
pty.spawn(cmd) -> drain master_fd until EOF
-> wait() returns ("exit", N) | ("signal", N) | ...
-> exit_code mapped: exit -> N, signal -> 128+N, else -1
Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir
per §3 + §7), `CMD: ` extraction unchanged.
PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten
to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model.
The illustrative sketch is left in place as historical context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:
api: ... Error: Jinja Exception: After the optional system message,
conversation roles must alternate user/assistant/user/assistant/...
Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.
Fix:
context.lua:
- new field `pending_exec_output` (initially nil)
- new method `:append_exec_output(out)` buffers (concat on subsequent
captures so multi-shell-then-ai still merges everything)
- new method `:append_user(content)` flushes buffered exec output as
a `[exec output]\n...\n\n` prefix and appends a user turn
- `:reset()` also clears the buffer
repl.lua:
- run_shell calls ctx:append_exec_output(out) instead of
ctx:append({role="user", content="[exec output]\n"..out})
- ask_ai calls ctx:append_user(text) instead of raw :append; saves
prev_pending so a broker error can restore the buffer for retry
PHASE0.md §6:
- amended the role-injection paragraph to describe the buffer-and-
prepend policy; the §3 invariants list is untouched (this was a §6
design detail, not a locked invariant)
Verification:
- context unit tests cover: alternation after the failing sequence,
multi-shell merge, reset clears buffer, broker-error retry path
- live reproduction against `deep` (mistral-nemo) of the exact
user-reported sequence succeeds; model responds with a sensible
`CMD: ls | grep '^lor'` instead of a Jinja exception
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions to PHASE0.md, all surfaced by the Phase 5 review of
the Phase 0 implementation. No invariant changes; manifest now matches
implementation reality.
§3 — FFI loader fallback paragraph. ffi.load("name") needs the
unversioned `libname.so` symlink that comes with the -dev package.
Phase 0 loaders try unversioned first then versioned sonames so
runtime-only hosts (no -dev) work as-is. Documents the actual
behavior in ffi/readline.lua and ffi/curl.lua.
§7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been
showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1
follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers
the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`).
Phase 1 PTY+waitpid replaces the hack and the sketch becomes
accurate. Sketch left as-is (it's the right shape conceptually);
caveat now explicit.
§10 — cwd-relative package.path note. Phase 0 prepends `./?.lua;
./vendor/?.lua`, so aish must run from the repo root. Cwd-independent
resolution is a later concern. Also clarifies that --config is strict
(no fallback if the path is unopenable) — matches main.lua post the
review-followup commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the JSON-library decision noted as open in CLAUDE.md §6.
dkjson is pure Lua (preserves §3's "no compiled extensions" invariant),
single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson`
package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated
copy of the upstream at dkolf.de.
Vendoring (rather than relying on a system lua-dkjson install) keeps
aish self-contained per the §3 "no luarocks packages" invariant: any
host with luajit can run the tree as-is.
PHASE0.md §3 grows one row recording the choice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures three findings from the review of 013c625 ("phase0 amendment:
insert MCP phase 2"). Opening as a PR rather than direct-to-main: the
non-PR-flow convention works fine for autonomous work, but feedback-
required iteration needs a readable medium that isn't the Claude Code
transcript.
§11 phase 2 row: spell out two scope items the original row left implicit —
the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:`
contract is hard-coded into the prompt) and `safety.lua` extension to
gate tool calls (per Q8).
§13 Q6: explicit note that choosing "retire `CMD:`" requires a §3
invariant amendment in the same commit — keeps the substrate-vs-phase
boundary honest. Adds (§3 if retiring) to the impact column.
§13 Q9 (new): MCP system-prompt augmentation locus — static block in
broker.lua / per-request assembly from connected servers / hybrid.
Real architectural call with token-cost tradeoff per option.
§13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate —
phase-ordering question. Either Phase 2 lands on the blocking Phase 0
broker and refits when SSE arrives, or Phase 1 SSE moves before MCP
so tool-call deltas stream from day one.
MCP/tool-calling lands as a distinct phase, before Norris mode so the
autonomous planner has tools as substrate. lmcp speaks MCP standard
JSON-RPC 2.0 over HTTP/SSE — fits the existing libcurl FFI plan; tool
calls ride the OpenAI-compatible `tools` field on /v1/chat/completions,
so the §6 broker contract is unchanged at the transport level.
§8: tokenization concern bumped Phase 2 → Phase 3 (still tracks Norris).
§11: Norris→3, memory→4, routing→5, tree-sitter→6.
§13: Q1/Q2/Q3/Q5 phase numbers tracked the renumber; added Q6 (CMD: vs
tools coexistence), Q7 (server discovery), Q8 (tool-call auth gate).
No §3 invariant broken. No code touched — Phase 0 implementation per
the locked manifest is still the next move.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
snappy/32k + cloud via OpenRouter through hossenfelder)
File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>