From 539408f480964026e5f3950c03927890a828b889 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sun, 10 May 2026 18:56:20 +0000
Subject: [PATCH] phase1 formulate: scope, tech decisions, module changes, open questions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Inner-loop Phase 1 (formulate) deliverable for the milestone Phase 1 of
the aish project. Drafts docs/PHASE1.md to specify what lands on top of
the Phase 0 substrate — no code changes, no §3 invariant amendments.

Phase 1 milestone scope per PHASE0.md §11:
  1. SSE streaming via libcurl FFI (existing WRITEFUNCTION hook)
  2. PTY-backed exec via forkpty(3); replaces popen + retires the §7
     sentinel exit-code workaround in favor of waitpid
  3. Session persistence as append-only JSONL under /sessions/.jsonl
  4. Readline custom bindings (rl_bind_keyseq); Phase 1 reserves \C-n
     as a no-op for Phase 3's Norris consumer

Module growth (no new file names beyond the §4-stubs):
  ffi/curl     -> M.post_sse(url, body, headers, on_event)
  ffi/pty      -> M.spawn / read / write / close / wait
  ffi/libc     -> waitpid + WEXITSTATUS + tcgetattr/tcsetattr
  ffi/readline -> M.bind(seq, fn)
  broker       -> M.chat_stream; M.chat becomes a buffering wrapper
  executor     -> PTY path; sentinel hack deleted
  repl         -> :save, :resume , :sessions; streaming render
  renderer     -> assistant_delta + assistant_flush
  history      -> open / append / load / list_sessions

Open questions Q11–Q16 (six new) tracked in §10:
  - SSE shape uniformity across OpenRouter routes (Q11, Phase 7)
  - CMD: highlight-on-stream strategy (Q12, plan phase)
  - tty raw-mode recovery on Lua error (Q13, plan phase)
  - bind \C-n now or defer to Phase 3 (Q14, plan phase)
  - :resume into non-empty context (Q15, plan phase)
  - session-log fsync policy (Q16, default close-only; tracked)

Next inner phase is "analyze": for each module change, identify
dependencies + risks + per-commit ordering.
Then baseline (capture Phase 0 behaviors we want to preserve), plan,
review, implement, verify, memory-update.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/PHASE1.md | 271 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 271 insertions(+)
 create mode 100644 docs/PHASE1.md

diff --git a/docs/PHASE1.md b/docs/PHASE1.md
new file mode 100644
index 0000000..153dfbb
--- /dev/null
+++ b/docs/PHASE1.md
@@ -0,0 +1,271 @@
+# aish — Phase 1 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 1 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analysis)
+**Date:** 2026-05-10
+
+PHASE0.md is the locked substrate. This manifest specifies what Phase 1
+adds on top. Section numbers reference back to PHASE0.md when relevant.
+
+---
+
+## 1. Scope of Phase 1
+
+Four pillars per PHASE0.md §11:
+
+1. **SSE streaming** — assistant text arrives incrementally instead of as
+   a complete block at end of `curl_easy_perform`. Reuses the Phase 0
+   WRITEFUNCTION hook in `ffi/curl.lua`.
+2. **PTY-backed exec** via `forkpty` (libc FFI). Replaces Phase 0's
+   `io.popen` so interactive commands (`vim`, `less`, `htop`) work and so
+   the §7 sentinel-echo exit-code workaround can be retired in favor of
+   `waitpid`.
+3. **Session persistence** — each session writes an append-only JSONL log
+   under `/sessions/.jsonl`. Optional `:resume`
+   loads a prior session's turns into context.
+4. **Readline custom bindings** — wire the rebinding API on `ffi/readline.lua`
+   so subsequent phases can attach actions to key sequences. Phase 1 itself
+   binds nothing user-visible; Norris (Phase 3) is the first consumer.
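+
+Pillar 2 swaps the sentinel workaround for the status word returned by
+`waitpid`. `WEXITSTATUS` is a C macro, so it cannot be called through the
+FFI and has to be re-expressed in Lua. A rough sketch of what that could
+look like, assuming the Linux wait-status layout (the helper names here
+are illustrative and not taken from the codebase):
+
+```lua
+-- Assumed Linux/glibc wait-status layout (other platforms may differ):
+--   low 7 bits = terminating signal (0 means the child exited normally)
+--   bits 8..15 = exit code
+-- Stopped/continued states are ignored; waitpid(pid, status, 0) does not
+-- report them.
+local function wifexited(status)   return status % 128 == 0 end
+local function wexitstatus(status) return math.floor(status / 256) % 256 end
+local function wtermsig(status)    return status % 128 end
+
+-- Classify a raw status word as ("exit", code) or ("signal", signo).
+local function describe(status)
+  if wifexited(status) then
+    return "exit", wexitstatus(status)
+  end
+  return "signal", wtermsig(status)
+end
+
+print(describe(3 * 256))  -- child called exit(3)
+print(describe(2))        -- child was killed by signal 2 (SIGINT)
+```
+
+Plain arithmetic is used instead of the `bit` library so the helpers
+also run on stock Lua; checking `wifexited` first matters because the
+exit-code bits are meaningless for a signal-killed child.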
+
+**Phase 1 is done when:**
+
+- Assistant responses arrive token-by-token (visible streaming)
+- `vim` / `less` / `htop` work end-to-end via `$cmd` or `:exec cmd`
+- A session is written to `sessions/*.jsonl` and resumable across `luajit main.lua` invocations
+- The Phase 0 `executor.lua` sentinel hack is gone; PHASE0.md §7's sketch becomes accurate (waitpid surfaces the exit code)
+- `rl_bind_keyseq` is callable from Lua and known not to crash with a no-op handler bound to a reserved sequence
+
+---
+
+## 2. Technology Decisions (delta from Phase 0)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Streaming transport | SSE over the existing libcurl easy interface | OpenAI-compat servers (llama.cpp, hossenfelder) emit `text/event-stream` when the request body has `stream: true`. The Phase 0 WRITEFUNCTION callback already receives incremental chunks; the only change is the parsing strategy. |
+| Streaming concurrency | Single blocking `curl_easy_perform`; the WRITEFUNCTION calls a Lua `on_delta` callback synchronously | With the easy interface, libcurl invokes WRITEFUNCTION on the calling thread from inside `curl_easy_perform`, so the LuaJIT FFI callback runs the same way Phase 0's WRITEFUNCTION already did. No coroutines / no threads in Phase 1. |
+| PTY library | `forkpty(3)` from libutil (a separate library on glibc before 2.34; part of libc since) | Standard, single-call setup of master/slave pair + fork + dup2. Avoids hand-rolling the openpty/grantpt/unlockpt/ptsname dance. |
+| Exec uniformity | All shell exec goes through PTY (no `io.popen` fallback) | One code path. Non-interactive cmds (`ls`) work fine on a PTY too. Avoids the per-cmd "is this interactive?" classifier. |
+| Exit code recovery | `waitpid` + `WEXITSTATUS` from the PTY parent | The §7 sentinel-echo hack is retired. Same commit that lands PTY exec also amends PHASE0.md §7 to drop the LuaJIT-2.1 popen caveat. |
+| Session log format | Append-only JSONL (one turn per line) | Streaming-friendly; grep-able; robust to truncation; no parser dependency beyond the vendored dkjson. |
+| Session location | `/sessions/.jsonl` | Default `~/.local/share/aish/sessions/` per Phase 0 config. Per-session file → concurrent aish processes don't collide. |
+| Session save trigger | Auto-write on `:quit` AND explicit `:save` for mid-session checkpoint | Closes Q3 from PHASE0.md §13 with both. The auto path means sessions are kept by default; the explicit path exists for users who want a named checkpoint. |
+| Readline bindings API | Bind via `rl_bind_keyseq` (GNU readline) — `M.bind(seq, lua_fn)` wrapper | Phase 1 ships the wiring; bound sequences with no consuming phase yet are reserved with a logged-status no-op. Phase 3+ replace handlers. |
+
+---
+
+## 3. Module Changes
+
+No new module file names beyond the §4 stubs already present (`ffi/pty.lua`,
+`history.lua`). All changes are growth of existing files.
+
+| File | Phase 0 | Phase 1 |
+|---|---|---|
+| `ffi/curl.lua` | Blocking POST; response captured into a Lua string | Add `M.post_sse(url, body, headers, on_event)`. `on_event(delta)` is called per parsed SSE `data:` line. The Phase 0 `M.post` stays for non-streaming consumers. |
+| `ffi/pty.lua` | Stub | Implement: `M.spawn(argv) -> handle`; handle exposes `:read()`, `:write(data)`, `:close()`, `:wait() -> exit_code`. Uses `forkpty` + `waitpid`. |
+| `ffi/libc.lua` | `chdir`, `errno`, `strerror` | Add `waitpid`, `WEXITSTATUS` (macro materialized in Lua), `read`, `write`, `close`, `kill`, optional `tcgetattr`/`tcsetattr` for raw-mode toggle on the controlling tty. |
+| `ffi/readline.lua` | `readline`, `add_history` | Add `rl_bind_keyseq` binding; expose `M.bind(seq, fn)`. |
+| `broker.lua` | `M.chat(cfg, msgs)` blocking | Add `M.chat_stream(cfg, msgs, on_delta)`. `M.chat` becomes a thin wrapper that buffers deltas. |
+| `executor.lua` | `popen` + sentinel exit-code recovery + `cd` interception + `CMD:` extract | Replace popen path with `pty.spawn`. The sentinel hack is deleted. `cd` interception unchanged (still routes through `libc.chdir`). `CMD:` extract unchanged. |
+| `repl.lua` | Blocking ask_ai → renderer.assistant | `chat_stream` with renderer.assistant_delta per chunk; closing flush highlights any completed `CMD:` lines. New meta: `:save`, `:resume `, `:sessions`. |
+| `renderer.lua` | `assistant(text)` whole block | Add `assistant_delta(chunk)` and `assistant_flush()`. Streaming path emits raw chunks; flush re-highlights completed `CMD:` lines if needed. |
+| `history.lua` | Stub | Implement: `M.open(path) -> session`; `session:append(turn)`; `M.load(path) -> turns`; `M.list_sessions(dir) -> [{name, mtime, turns}]`. |
+| `config.lua` | history.dir set | Optional new fields: `session.autosave` (default true), `session.resume_on_start` (default false). |
+
+---
+
+## 4. SSE Streaming
+
+### Request shape (delta from PHASE0 §6)
+
+```
+POST /v1/chat/completions
+Content-Type: application/json
+
+{
+  "model": "...",
+  "messages": [...],
+  "stream": true,
+  "temperature": 0.2
+}
+```
+
+### Event format (per OpenAI / llama.cpp)
+
+```
+data: {"choices":[{"delta":{"content":"Hel"}}]}
+
+data: {"choices":[{"delta":{"content":"lo"}}]}
+
+data: [DONE]
+```
+
+Events are `\n\n`-terminated. The payload after the `data: ` prefix is
+either JSON or the literal `[DONE]` sentinel. SSE comments (lines
+starting with `:`) are ignored.
+
+### Parser (in `ffi/curl.lua` post_sse)
+
+1. WRITEFUNCTION accumulates into a buffer.
+2. After each callback delivery, scan for `\n\n` event terminators.
+3. For each complete event:
+   - Skip `:` comment lines.
+   - Strip the `data: ` prefix.
+   - If body is `[DONE]`, signal end.
+   - Else `dkjson.decode(body)`, extract `choices[1].delta.content`, call `on_event(content)`.
+4. Carry incomplete tail of buffer into next callback.
+
+UTF-8 codepoint splits at chunk boundaries are tolerated because we hold
+delivery in the buffer until a full event is assembled before decoding.
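+
+The four parser steps above can be sketched in plain Lua as follows.
+This is only an illustration of the framing, not the planned
+`post_sse` implementation: `new_sse_parser`, `stub_decode`, and the
+`on_done` callback are hypothetical names, and the JSON decoder is
+passed in so the sketch runs without dkjson (in aish it would
+presumably be `require("dkjson").decode`).
+
+```lua
+-- Returns a feed(chunk) function; a WRITEFUNCTION callback would call it
+-- once per delivered chunk.
+local function new_sse_parser(decode, on_event, on_done)
+  local buf = ""
+  return function(chunk)
+    buf = buf .. chunk
+    while true do
+      local stop = buf:find("\n\n", 1, true)
+      if not stop then break end        -- incomplete tail waits for more data
+      local event = buf:sub(1, stop - 1)
+      buf = buf:sub(stop + 2)
+      for line in event:gmatch("[^\n]+") do
+        -- Only `data:` lines matter; `:` comments and other fields are skipped.
+        if line:sub(1, 5) == "data:" then
+          local body = line:sub(6):gsub("^ ", "")
+          if body == "[DONE]" then
+            on_done()
+          else
+            local obj = decode(body)
+            local choice = type(obj) == "table" and obj.choices and obj.choices[1]
+            local content = choice and choice.delta and choice.delta.content
+            if type(content) == "string" then on_event(content) end
+          end
+        end
+      end
+    end
+  end
+end
+
+-- Stand-in decoder that only understands the delta shape shown above.
+local function stub_decode(s)
+  local text = s:match('"content":"(.-)"')
+  return { choices = { { delta = { content = text } } } }
+end
+
+local parts, done = {}, false
+local feed = new_sse_parser(stub_decode,
+  function(delta) parts[#parts + 1] = delta end,
+  function() done = true end)
+
+-- The first event is split across two deliveries, as libcurl may do.
+feed('data: {"choices":[{"delta":{"content":"Hel"}}]}\n')
+feed('\ndata: {"choices":[{"delta":{"content":"lo"}}]}\n\n: keep-alive\n\ndata: [DONE]\n\n')
+print(table.concat(parts), done)
+```
+
+Because nothing is decoded until a full `\n\n`-terminated event is in
+the buffer, a chunk boundary inside an event (or inside a multi-byte
+character) never reaches the decoder. The sketch only handles bare
+`\n` line endings, matching the event format shown above.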
+
+### Renderer streaming
+
+`renderer.assistant_delta(chunk)` writes raw characters to stdout (no
+ANSI markup yet — the `CMD:` highlight depends on seeing a complete
+line). `renderer.assistant_flush()` is called after the SSE stream ends:
+it scans the accumulated stdout buffer (kept in renderer-local state) for
+completed `CMD:` lines and emits ANSI sequences after-the-fact via cursor
+manipulation. Open question Q12 below.
+
+---
+
+## 5. PTY Execution Model
+
+```
+parent (aish)              child (cmd)
+─────────────              ───────────
+forkpty()                       │
+   │                            │
+   ├─ master fd ───────┐        │
+   │                   └────────┴── slave PTY (becomes child stdin/stdout/stderr)
+   │
+   ├─ select / read master fd → renderer.exec_output_delta(chunk)
+   ├─ write master fd ← user keystrokes (when interactive)
+   │
+   └─ waitpid() → exit_code = WEXITSTATUS(status)
+```
+
+For Phase 1's interactive cmds (vim/less/htop), aish flips its own
+controlling tty to raw mode (`tcgetattr` + `tcsetattr` ICANON/ECHO off)
+while the child is running, and restores on exit. Ctrl-C is forwarded to
+the child as `SIGINT` via `kill(pid, SIGINT)` rather than interrupting
+the aish parent.
+
+Non-interactive cmds (`ls`, `git status`) run on the same path; the
+output is read from the master fd and rendered exactly as Phase 0's
+exec_output frame did. The fact that the tty is a PTY rather than a pipe
+does not change the visible UX for these.
+
+Exit code: `waitpid(pid, &status, 0); WEXITSTATUS(status)` (meaningful
+only when `WIFEXITED(status)` holds). The §7
+sentinel-echo hack is gone. PHASE0.md §7's amendment ("LuaJIT 2.1
+popen-close caveat") becomes obsolete — same commit that lands the PTY
+work amends §7 again to drop the caveat.
+
+---
+
+## 6. Session Persistence
+
+### Format
+
+Each session is one JSONL file. One turn per line:
+
+```jsonl
+{"ts":"2026-05-10T19:00:01Z","role":"user","content":"list files"}
+{"ts":"2026-05-10T19:00:04Z","role":"assistant","content":"CMD: ls"}
+{"ts":"2026-05-10T19:00:05Z","role":"user","content":"[exec output]\n..."}
+```
+
+The first line is special: `{"meta":{"started":"...","model":"fast","aish_version":"phase1"}}`.
+
+### Lifecycle
+
+- On startup, `history.lua` opens `/sessions/.jsonl` for append.
+- Every `ctx:append_user(...)` and assistant turn triggers a `session:append(turn)`.
+- `:quit` flushes and closes the file (auto-save default).
+- `:save []` renames the current session file to `.jsonl` (or copies if user wants both auto + named).
+- `:resume ` reads a JSONL file, recreates a Context, swaps it in. Q15 below covers the warn/refuse semantics on a non-empty current context.
+- `:sessions` lists files in the dir with mtime + turn count.
+
+### Recovery semantics
+
+Append-only JSONL means a partial last line (process killed mid-write)
+is recoverable: `history.load` skips lines that fail to JSON-parse and
+emits a warning. No fsync after every line in Phase 1 (overhead); a
+crash may lose the most recent turn. Tracked as Q16 below.
+
+---
+
+## 7. Readline Custom Bindings
+
+Wire `rl_bind_keyseq` from libreadline:
+
+```c
+int rl_bind_keyseq(const char *keyseq, rl_command_func_t *function);
+```
+
+Lua wrapper:
+
+```lua
+local bound = {}  -- seq -> callback; holds a reference while the binding lives
+function M.bind(seq, fn)
+  -- ffi.cast a closure that calls fn() and returns 0
+  -- (assumes ffi.cdef has: typedef int rl_command_func_t(int, int);)
+  local fn_cast = ffi.cast("rl_command_func_t *", function() fn(); return 0 end)
+  bound[seq] = fn_cast
+  return rl.rl_bind_keyseq(seq, fn_cast)
+end
+```
+
+Phase 1 binds nothing user-visible. The reserved-key list is documented
+here so subsequent phases don't collide:
+
+| Sequence | Reserved for | Phase |
+|---|---|---|
+| `\C-n` | Norris autonomous mode toggle | 3 |
+| `\C-x\C-c` | Cancel running CMD: confirm prompt | 3 (or here) |
+
+Phase 1 binds `\C-n` to a no-op handler that emits a `[aish] Norris mode
+not yet implemented (Phase 3)` status, just to verify the wiring works.
+
+---
+
+## 8. Migration from Phase 0
+
+User-visible changes:
+- Assistant responses stream instead of arriving in a block.
+- All exec routes through PTY; `vim`/`less`/`htop` work.
+- A session log is written by default; `:reset` no longer loses the conversation forever (it's in the JSONL).
+
+Substrate (PHASE0.md §3) invariants are unchanged. The §6 broker
+contract grows (request body adds `stream: true`; response handling adds
+SSE) but the Phase 0 blocking shape stays callable. The §7 amendment
+about LuaJIT 2.1 popen-close gets retired in the same commit that lands
+PTY exec.
+
+---
+
+## 9. Out of Scope (Phase 1)
+
+Per PHASE0.md §11, these belong elsewhere:
+- Tool-calling / MCP (Phase 2)
+- Norris autonomous mode (Phase 3)
+- `memory.jsonl` summarization (Phase 4)
+- Multi-model routing / cloud fallback (Phase 5)
+- Tree-sitter syntax highlighting (Phase 6)
+
+Specifically out of Phase 1 scope despite proximity:
+- Any binding consumer beyond the no-op `\C-n` reserved key.
+- Streaming partial-tool-call deltas (Phase 2).
+- Session search / pruning beyond `:sessions` listing (Phase 4).
+
+---
+
+## 10. Open Questions
+
+| # | Question | Impact | Resolve by |
+|---|---|---|---|
+| Q11 | Hossenfelder-via-OpenRouter SSE: do all routed cloud models emit identical event shape, or do some flatten / re-frame? | broker.lua streaming parser robustness | Phase 7 (verify) |
+| Q12 | `CMD:` highlight on streaming output: highlight as the line completes (delayed render), or live-highlight starting at the `CMD: ` prefix detection? Cursor-positioning re-render trade-off. | renderer.lua | Phase 4 (plan) |
+| Q13 | TTY raw-mode restore on uncaught Lua error during PTY exec: SIGWINCH handler + on-exit hook, or accept that a crashed aish leaves a wrecked terminal? | executor + signal handling | Phase 4 (plan) |
+| Q14 | `\C-n` reserved binding: bind a no-op now (verifies wiring) or defer the entire binding API to Phase 3 (where Norris is the first real consumer)? | ffi/readline + repl scope | Phase 4 (plan) |
+| Q15 | `:resume ` into a non-empty current context: refuse with a warning, prompt-overwrite, or merge? | repl + history | Phase 4 (plan) |
+| Q16 | Session log fsync: per-line (safe, slow) or close-only (fast, lossy on crash)? Default Phase 1 = close-only; revisit if crash recovery becomes a real concern. | history.lua | Phase 1 default; tracked for Phase 4 if it bites |
+
+---
+
+*End of Phase 1 Manifest — aish*