aish/docs/PHASE1.md
marfrit 1f1065157e review BLOCKER: PTY input forwarding + raw mode toggle
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).

ffi/libc:
  - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
  - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
    (nil, err) when fd isn't a tty (scripted / piped-stdin runs)
  - M.restore_termios(fd, saved)
  - struct pollfd + M.poll (POLLIN constant)

executor:
  - multiplex(sess): poll(stdin, master); reads master on any revents
    (POLLHUP fires when child closes its slave end, not POLLIN — the
    revents != 0 check catches both); forwards stdin keystrokes to
    master; loop exits when master read returns 0 (EOF / child gone)
  - stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
    piped-stdin runs (tests / scripted) would otherwise drain queued
    aish commands into the child of the *current* cmd, swallowing them
  - raw mode is restored before returning so the user lands back at the
    aish prompt in canonical mode

renderer + repl:
  - exec_output(out, code) split into exec_begin() (top rule, before
    spawn) + exec_end(code) (closing rule with exit, after wait). PTY
    multiplex streams the body live to stdout in between; the renderer
    never re-prints the body.

PHASE1.md §3:
  - tcgetattr/tcsetattr changed from "optional" to "required for
    single-key UIs to work — done-criteria #2"; poll added to the libc
    row description.

Verified:
  - non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
    printf multi-line) — all exit codes correct, output streamed live,
    a\nb\nc\n preserved byte-for-byte
  - scripted-stdin run reaches all expected lines (no stdin draining
    into a non-interactive child)
  - aish prompt + framed exec block + exit-code line all render in
    correct order

Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:00:53 +00:00

# aish — Phase 1 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 1 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analysis)
**Date:** 2026-05-10
PHASE0.md is the locked substrate. This manifest specifies what Phase 1
adds on top. Section numbers reference back to PHASE0.md when relevant.
---
## 1. Scope of Phase 1
Four pillars per PHASE0.md §11:
1. **SSE streaming** — assistant text arrives incrementally instead of as
a complete block at end of `curl_easy_perform`. Reuses the Phase 0
WRITEFUNCTION hook in `ffi/curl.lua`.
2. **PTY-backed exec** via `forkpty` (libc FFI). Replaces Phase 0's
`io.popen` so interactive commands (`vim`, `less`, `htop`) work and so
the §7 sentinel-echo exit-code workaround can be retired in favor of
`waitpid`.
3. **Session persistence** — each session writes an append-only JSONL log
under `<config.history.dir>/sessions/<utc>.jsonl`. Optional `:resume`
loads a prior session's turns into context.
4. **Readline custom bindings** — wire the rebinding API on `ffi/readline.lua`
so subsequent phases can attach actions to key sequences. Phase 1 itself
binds nothing user-visible; Norris (Phase 3) is the first consumer.
**Phase 1 is done when:**
- Assistant responses arrive token-by-token (visible streaming)
- `vim` / `less` / `htop` work end-to-end via `$cmd` or `:exec cmd`
- A session is written to `sessions/*.jsonl` and resumable across `luajit main.lua` invocations
- The Phase 0 `executor.lua` sentinel hack is gone; PHASE0.md §7's sketch becomes accurate (waitpid surfaces the exit code)
- `rl_bind_keyseq` is callable from Lua and known not to crash with a no-op handler bound to a reserved sequence
---
## 2. Technology Decisions (delta from Phase 0)
| Decision | Choice | Rationale |
|---|---|---|
| Streaming transport | SSE over the existing libcurl easy interface | OpenAI-compat servers (llama.cpp, hossenfelder) emit `text/event-stream` when the request body has `stream: true`. The Phase 0 WRITEFUNCTION callback already receives incremental chunks; the only change is the parsing strategy. |
| Streaming concurrency | Single blocking `curl_easy_perform`; the WRITEFUNCTION calls a Lua `on_delta` callback synchronously | With the easy interface, libcurl invokes the WRITEFUNCTION on the calling thread from inside `curl_easy_perform`, so the LuaJIT FFI callback re-enters the same Lua state; Phase 0's WRITEFUNCTION already ran fine that way. No coroutines / no threads in Phase 1. |
| PTY library | `forkpty(3)` from libutil (a separate `-lutil` link on older glibc; part of libc itself since glibc 2.34) | Standard, single-call setup of master/slave pair + fork + dup2. Avoids hand-rolling the openpty/grantpt/unlockpt/ptsname dance. |
| Exec uniformity | All shell exec goes through PTY (no `io.popen` fallback) | One code path. Non-interactive cmds (`ls`) work fine on a PTY too. Avoids the per-cmd "is this interactive?" classifier. |
| Exit code recovery | `waitpid` + `WEXITSTATUS` in the PTY parent | The §7 sentinel-echo hack is retired. Same commit that lands PTY exec also amends PHASE0.md §7 to drop the LuaJIT-2.1 popen caveat. |
| Session log format | Append-only JSONL (one turn per line) | Streaming-friendly; grep-able; robust to truncation; no parser dependency beyond the vendored dkjson. |
| Session location | `<config.history.dir>/sessions/<UTC-iso8601>.jsonl` | Default `~/.local/share/aish/sessions/` per Phase 0 config. Per-session file → concurrent aish processes don't collide. |
| Session save trigger | Auto-write on `:quit` AND explicit `:save` for mid-session checkpoint | Closes Q3 from PHASE0.md §13 with both. Auto-write keeps every session by default; `:save` exists for users who want a named checkpoint. |
| Readline bindings API | Bind via `rl_bind_keyseq` (GNU readline) — `M.bind(seq, lua_fn)` wrapper | Phase 1 ships the wiring; bound sequences with no consuming phase yet are reserved with a logged-status no-op. Phase 3+ replace handlers. |
---
## 3. Module Changes
No new module file names beyond the §4 stubs already present (`ffi/pty.lua`,
`history.lua`). All changes are growth of existing files.
| File | Phase 0 | Phase 1 |
|---|---|---|
| `ffi/curl.lua` | Blocking POST; response captured into a Lua string | Add `M.post_sse(url, body, headers, on_event)`. `on_event(delta)` is called per parsed SSE `data:` line. The Phase 0 `M.post` stays for non-streaming consumers. |
| `ffi/pty.lua` | Stub | Implement: `M.spawn(argv) -> handle`; handle exposes `:read()`, `:write(data)`, `:close()`, `:wait() -> exit_code`. Uses `forkpty` + `waitpid`. |
| `ffi/libc.lua` | `chdir`, `errno`, `strerror` | Add `waitpid`, `WEXITSTATUS` (macro materialized in Lua), `read`, `write`, `close`, `kill`, `tcgetattr`/`tcsetattr` + `cfmakeraw` for raw-mode toggle on the controlling tty (required for single-key UIs to work — done-criteria #2), `poll` for stdin↔master multiplex in executor. |
| `ffi/readline.lua` | `readline`, `add_history` | Add `rl_bind_keyseq` binding; expose `M.bind(seq, fn)`. |
| `broker.lua` | `M.chat(cfg, msgs)` blocking | Add `M.chat_stream(cfg, msgs, on_delta)`. `M.chat` becomes a thin wrapper that buffers deltas. |
| `executor.lua` | `popen` + sentinel exit-code recovery + `cd` interception + `CMD:` extract | Replace popen path with `pty.spawn`. The sentinel hack is deleted. `cd` interception unchanged (still routes through `libc.chdir`). `CMD:` extract unchanged. |
| `repl.lua` | Blocking ask_ai → renderer.assistant | `chat_stream` with renderer.assistant_delta per chunk; closing flush highlights any completed `CMD:` lines. New meta: `:save`, `:resume <name>`, `:sessions`. |
| `renderer.lua` | `assistant(text)` whole block | Add `assistant_delta(chunk)` and `assistant_flush()`. Streaming path emits raw chunks; flush re-highlights completed `CMD:` lines if needed. |
| `history.lua` | Stub | Implement: `M.open(path) -> session`; `session:append(turn)`; `M.load(path) -> turns`; `M.list_sessions(dir) -> [{name, mtime, turns}]`. |
| `config.lua` | history.dir set | Optional new fields: `session.autosave` (default true), `session.resume_on_start` (default false). |
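
The `broker.lua` row says `M.chat` becomes a thin wrapper that buffers deltas. A minimal sketch of that shape follows, assuming `chat_stream` calls `on_delta` once per text chunk and returns when the stream ends. The `chat_stream` body here is a hypothetical stub that stands in for the real SSE-backed implementation so the wrapper can be run offline.

```lua
local M = {}

-- Hypothetical stand-in for the SSE-backed implementation: it feeds a fixed
-- list of chunks to on_delta so the wrapper can be exercised without a server.
function M.chat_stream(cfg, msgs, on_delta)
  for _, chunk in ipairs({ "Hel", "lo", ", world" }) do
    on_delta(chunk)
  end
end

-- Blocking variant: collect every delta, then return the joined text.
function M.chat(cfg, msgs)
  local parts = {}
  M.chat_stream(cfg, msgs, function(delta)
    parts[#parts + 1] = delta
  end)
  return table.concat(parts)
end

print(M.chat({}, {}))
```

Collecting into a table and joining once avoids repeated string concatenation on long responses, and non-streaming consumers keep the Phase 0 call shape.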
---
## 4. SSE Streaming
### Request shape (delta from PHASE0 §6)
```
POST /v1/chat/completions
Content-Type: application/json
{
"model": "...",
"messages": [...],
"stream": true,
"temperature": 0.2
}
```
### Event format (per OpenAI / llama.cpp)
```
data: {"choices":[{"delta":{"content":"Hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: [DONE]
```
Events are terminated by a blank line (`\n\n`). The payload after the
`data: ` prefix is either JSON or the literal `[DONE]` sentinel. SSE
comments (lines starting with `:`) are ignored.
### Parser (in `ffi/curl.lua` post_sse)
1. WRITEFUNCTION accumulates into a buffer.
2. After each callback delivery, scan for `\n\n` event terminators.
3. For each complete event:
- Skip `:` comment lines.
- Strip the `data: ` prefix.
- If body is `[DONE]`, signal end.
- Else `dkjson.decode(body)`, extract `choices[1].delta.content`, call `on_event(content)`.
4. Carry incomplete tail of buffer into next callback.
UTF-8 codepoint splits at chunk boundaries are tolerated because we hold
delivery in the buffer until a full event is assembled before decoding.
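
The buffering steps above can be sketched in plain Lua. This is an illustration under assumptions, not the `post_sse` implementation: `new_sse_parser`, `on_data` and `on_done` are hypothetical names, and JSON decoding is left to the caller (in aish that would be the vendored dkjson) so the sketch stays dependency-free.

```lua
-- Sketch of the buffering step only; JSON decoding is left to the caller.
local function new_sse_parser(on_data, on_done)
  local buf = ""
  -- The returned function is what the WRITEFUNCTION would call per raw chunk.
  return function(chunk)
    buf = buf .. chunk
    while true do
      local stop = buf:find("\n\n", 1, true)   -- plain find, no patterns
      if not stop then break end               -- incomplete tail stays in buf
      local event = buf:sub(1, stop - 1)
      buf = buf:sub(stop + 2)
      for line in event:gmatch("[^\n]+") do
        if line:sub(1, 1) ~= ":" then          -- skip SSE comment lines
          local body = line:match("^data: ?(.*)$")
          if body == "[DONE]" then
            on_done()
          elseif body then
            on_data(body)                      -- caller decodes the JSON here
          end
        end
      end
    end
  end
end

local got, done = {}, false
local feed = new_sse_parser(function(b) got[#got + 1] = b end,
                            function() done = true end)
-- One event split across two deliveries, then a comment and the sentinel.
feed('data: {"choices":[{"delta":{"con')
feed('tent":"Hel"}}]}\n\n: keep-alive\n\ndata: [DONE]\n\n')
print(#got, got[1], done)
```

Because nothing is handed to `on_data` until the terminating blank line has arrived, a chunk boundary in the middle of a JSON object or a multi-byte character never reaches the decoder.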
### Renderer streaming
`renderer.assistant_delta(chunk)` writes raw characters to stdout (no
ANSI markup yet — the `CMD:` highlight depends on seeing a complete
line). `renderer.assistant_flush()` is called after the SSE stream ends:
it scans the accumulated stdout buffer (kept in renderer-local state) for
completed `CMD:` lines and emits ANSI sequences after-the-fact via cursor
manipulation. Open question Q12 below.
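
One way to detect completed `CMD:` lines while chunks stream in is sketched below. It assumes nothing about how Q12 is resolved: `new_line_tracker` and `on_cmd_line` are hypothetical names, and the sketch only reports a line once its newline has arrived, leaving the re-highlight itself to the renderer.

```lua
-- Sketch: keep the current partial line and report each completed line
-- that starts with "CMD: ". What to do with it is up to the caller.
local function new_line_tracker(on_cmd_line)
  local partial = ""
  return function(chunk)
    partial = partial .. chunk
    while true do
      local nl = partial:find("\n", 1, true)
      if not nl then break end
      local line = partial:sub(1, nl - 1)
      partial = partial:sub(nl + 1)
      if line:sub(1, 5) == "CMD: " then on_cmd_line(line) end
    end
  end
end

local cmds = {}
local track = new_line_tracker(function(l) cmds[#cmds + 1] = l end)
track("Try this:\nCM")          -- the prefix itself is split across chunks
track("D: ls -la\nand then")    -- the trailing text has no newline yet
print(#cmds, cmds[1])
```

A final line that never receives a newline stays in the partial buffer, so a flush step would still need to check it when the stream ends.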
---
## 5. PTY Execution Model
```
parent (aish)                           child (cmd)
─────────────                           ───────────
forkpty()                                    │
   │                                         │
   ├─ master fd ───────┐                     │
   │                   └─────────────────────┴── slave PTY (becomes child stdin/stdout/stderr)
   ├─ poll / read master fd  → renderer.exec_output_delta(chunk)
   ├─ write master fd        ← user keystrokes (when interactive)
   └─ waitpid()              → exit_code = WEXITSTATUS(status)
```
For Phase 1's interactive cmds (vim/less/htop), aish switches its own
controlling tty to raw mode while the child is running (`tcgetattr` to
save the current settings, then `cfmakeraw` + `tcsetattr`, which turns
off ICANON/ECHO among others) and restores the saved settings on exit.
Ctrl-C sends `SIGINT` to the child via `kill(pid, SIGINT)` rather than
to the aish parent.
Non-interactive cmds (`ls`, `git status`) run on the same path; the
output is read from the master fd and rendered exactly as Phase 0's
exec_output frame did. The fact that the tty is a PTY rather than a pipe
does not change the visible UX for these.
Exit code: `waitpid(pid, &status, 0); WEXITSTATUS(status)`. The §7
sentinel-echo hack is gone. PHASE0.md §7's amendment ("LuaJIT 2.1
popen-close caveat") becomes obsolete — same commit that lands the PTY
work amends §7 again to drop the caveat.
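
Since `WEXITSTATUS` is a C macro, §3 has it "materialized in Lua". A minimal sketch of what that could look like, assuming the Linux/glibc wait-status layout; the helper names and the 128 + signal convention for signal deaths are illustrative assumptions, not something this manifest specifies.

```lua
-- Linux/glibc wait-status layout (assumption): the low 7 bits hold the
-- terminating signal, bits 8..15 hold the exit code.
local function wifexited(status)   return status % 128 == 0 end
local function wexitstatus(status) return math.floor(status / 256) % 256 end
local function wtermsig(status)    return status % 128 end

-- Shell-style convention (an assumption, not from the manifest):
-- report 128 + signal when the child was killed by a signal.
local function exit_code(status)
  if wifexited(status) then return wexitstatus(status) end
  return 128 + wtermsig(status)
end

print(exit_code(7 * 256))  -- status of a child that called exit(7)
print(exit_code(2))        -- status of a child killed by SIGINT (signal 2)
```

Checking `wifexited` first matters because `WEXITSTATUS` is only meaningful for a normal exit; the arithmetic form avoids depending on a bit library and behaves the same under LuaJIT and stock Lua.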
---
## 6. Session Persistence
### Format
Each session is one JSONL file. One turn per line:
```jsonl
{"ts":"2026-05-10T19:00:01Z","role":"user","content":"list files"}
{"ts":"2026-05-10T19:00:04Z","role":"assistant","content":"CMD: ls"}
{"ts":"2026-05-10T19:00:05Z","role":"user","content":"[exec output]\n..."}
```
The first line is special: `{"meta":{"started":"...","model":"fast","aish_version":"phase1"}}`.
### Lifecycle
- On startup, `history.lua` opens `<config.history.dir>/sessions/<utc-iso8601>.jsonl` for append.
- Every `ctx:append_user(...)` and assistant turn triggers a `session:append(turn)`.
- `:quit` flushes and closes the file (auto-save default).
- `:save [<name>]` renames the current session file to `<name>.jsonl` (or copies if user wants both auto + named).
- `:resume <name>` reads a JSONL file, recreates a Context, swaps it in. Q15 below covers the warn/refuse semantics on a non-empty current context.
- `:sessions` lists files in the dir with mtime + turn count.
### Recovery semantics
Append-only JSONL means a partial last line (process killed mid-write)
is recoverable: `history.load` skips lines that fail to JSON-parse and
emits a warning. Phase 1 does not fsync after every line (overhead), so
a crash may lose the most recent turn. Tracked as Q16 below.
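
The skip-and-warn behavior can be sketched as follows. The decoder is injected so the example runs without dkjson; `load_turns`, `toy_decode` and the `warn` callback are hypothetical names, and the toy decoder only checks for a balanced `{...}` object rather than parsing JSON.

```lua
-- `decode` is injected; in aish it would presumably be the vendored
-- dkjson decoder, which returns nil on malformed input.
local function load_turns(path, decode, warn)
  local turns, n = {}, 0
  for line in io.lines(path) do
    n = n + 1
    if line ~= "" then
      local ok, obj = pcall(decode, line)
      if ok and type(obj) == "table" then
        turns[#turns + 1] = obj
      else
        warn(("session line %d is not valid JSON, skipped"):format(n))
      end
    end
  end
  return turns
end

-- Toy decoder for the demo only: accepts a balanced {...} object.
local function toy_decode(s)
  if s:match("^%b{}$") then return { raw = s } end
  return nil
end

local path = os.tmpname()
local f = assert(io.open(path, "w"))
f:write('{"role":"user","content":"list files"}\n')
f:write('{"role":"assistant","conte')   -- truncated by a crash mid-write
f:close()

local warnings = {}
local turns = load_turns(path, toy_decode,
                         function(m) warnings[#warnings + 1] = m end)
os.remove(path)
print(#turns, #warnings)
```

Wrapping the decoder in `pcall` means a decoder that raises instead of returning nil is handled the same way, so one damaged line never aborts the whole load.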
---
## 7. Readline Custom Bindings
Wire `rl_bind_keyseq` from libreadline:
```c
int rl_bind_keyseq(const char *keyseq, rl_command_func_t *function);
```
Lua wrapper:
```lua
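-- `ffi` is require("ffi") and `rl` the loaded libreadline namespace, as elsewhere in this module.
function M.bind(seq, fn)
  -- rl_command_func_t is int f(int count, int key); returning 0 signals success
  local fn_cast = ffi.cast("int (*)(int, int)", function() fn(); return 0 end)
  return rl.rl_bind_keyseq(seq, fn_cast)
end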
```
Phase 1 binds nothing user-visible. The reserved-key list is documented
here so subsequent phases don't collide:
| Sequence | Reserved for | Phase |
|---|---|---|
| `\C-n` | Norris autonomous mode toggle | 3 |
| `\C-x\C-c` | Cancel running CMD: confirm prompt | 3 (or here) |
Phase 1 binds `\C-n` to a no-op handler that emits a `[aish] Norris mode
not yet implemented (Phase 3)` status, just to verify the wiring works.
---
## 8. Migration from Phase 0
User-visible changes:
- Assistant responses stream instead of arriving in a block.
- All exec routes through PTY; `vim`/`less`/`htop` work.
- A session log is written by default; `:reset` no longer loses the conversation forever (it's in the JSONL).
Substrate (PHASE0.md §3) invariants are unchanged. The §6 broker
contract grows (request body adds `stream: true`; response handling adds
SSE) but the Phase 0 blocking shape stays callable. The §7 amendment
about LuaJIT 2.1 popen-close gets retired in the same commit that lands
PTY exec.
---
## 9. Out of Scope (Phase 1)
Per PHASE0.md §11, these belong elsewhere:
- Tool-calling / MCP (Phase 2)
- Norris autonomous mode (Phase 3)
- `memory.jsonl` summarization (Phase 4)
- Multi-model routing / cloud fallback (Phase 5)
- Tree-sitter syntax highlighting (Phase 6)
Specifically out of Phase 1 scope despite proximity:
- Any binding consumer beyond the no-op `\C-n` reserved key.
- Streaming partial-tool-call deltas (Phase 2).
- Session search / pruning beyond `:sessions` listing (Phase 4).
---
## 10. Open Questions
| # | Question | Impact | Resolve by |
|---|---|---|---|
| Q11 | Hossenfelder-via-OpenRouter SSE: do all routed cloud models emit identical event shape, or do some flatten / re-frame? | broker.lua streaming parser robustness | Phase 7 (verify) |
| Q12 | `CMD:` highlight on streaming output: highlight as the line completes (delayed render), or live-highlight starting at the `CMD: ` prefix detection? Cursor-positioning re-render trade-off. | renderer.lua | Phase 4 (plan) |
| Q13 | TTY raw-mode restore on uncaught Lua error during PTY exec: signal handler (e.g. SIGINT/SIGTERM) + on-exit hook, or accept that a crashed aish leaves a wrecked terminal? | executor + signal handling | Phase 4 (plan) |
| Q14 | `\C-n` reserved binding: bind a no-op now (verifies wiring) or defer the entire binding API to Phase 3 (where Norris is the first real consumer)? | ffi/readline + repl scope | Phase 4 (plan) |
| Q15 | `:resume <name>` into a non-empty current context: refuse with a warning, prompt-overwrite, or merge? | repl + history | Phase 4 (plan) |
| Q16 | Session log fsync: per-line (safe, slow) or close-only (fast, lossy on crash)? Default Phase 1 = close-only; revisit if crash recovery becomes a real concern. | history.lua | Phase 1 default; tracked for Phase 4 if it bites |
---
*End of Phase 1 Manifest — aish*