# aish — Phase 0 Manifest

**Project:** aish — AI-augmented conversational shell
**Document:** Phase 0 Requirements, Architecture & Design Decisions
**Status:** Pre-implementation
**Date:** 2026-05-10

---

## 1. Project Vision

aish is a conversational shell that presents a unified REPL to the user, backed by one or more language models accessed through a llama.cpp broker. Its purpose is to support interactive command execution, code assistance, debugging, and re-engineering tasks from a single terminal interface. The model layer is transparent to the user but configurable and extensible.

aish is not a wrapper around an existing shell. It is a first-class interactive environment that composes shell execution, AI inference, context management, and session memory into a coherent workflow.

---

## 2. Scope of Phase 0

Phase 0 is the minimal working skeleton. It establishes the REPL loop, input dispatch, model communication, and basic meta-command handling. No streaming, no autonomous mode, no persistence, no PTY.

**Phase 0 is done when:**

- The user can type natural language and receive a model response
- The user can type a shell command and have it executed with output captured and displayed
- The user can type `:meta` commands to control aish itself
- The conversation history is maintained in memory for the session
- The codebase structure matches the target layout so later phases slot in without refactoring

---

## 3. Technology Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Implementation language | LuaJIT 2.x | Compact, embeddable; FFI eliminates the need for a C shim layer |
| FFI strategy | LuaJIT FFI only (no C extension modules) | Direct `ffi.cdef` access to libc, libcurl, readline; no build toolchain required |
| HTTP client | libcurl via FFI | Supports SSE streaming (Phase 1+); available on all target platforms |
| Terminal input | GNU readline via FFI | Full readline semantics, custom key bindings, and history, without depending on a Lua readline package |
| Model backend | llama.cpp OpenAI-compatible server (`/v1/chat/completions`) | Externally managed; aish does not own the process lifecycle |
| Shell execution | `io.popen` in Phase 0; `forkpty` via libc FFI from Phase 1 | `popen` is sufficient for non-interactive commands; a PTY is required for vim, htop, etc. |
| Session persistence | Deferred to Phase 1 | Phase 0 holds history in memory only |
| Config format | Lua table (plain `.lua` file sourced at startup) | No parser dependency; native types; easily extended |
| JSON encode/decode | dkjson 2.8 vendored under `vendor/dkjson.lua` | Pure Lua (preserves the §3 "no compiled extensions" invariant); single-file vendoring avoids `luarocks`; sourced from Debian's `lua-dkjson` package, originally from dkolf.de |

**FFI loader fallback.** `ffi.load("readline")` and `ffi.load("curl")` look for the unversioned `lib.so` symlink, which is only installed by the `-dev` package. Phase 0 loaders try the unversioned name first, then fall back to versioned sonames (`readline.so.8`, `readline.so.7`, `curl.so.4`, `curl-gnutls.so.4`), so a runtime-only host (Debian/ALARM without `lib-dev`) just works.

---

## 4. Target Directory Layout

```
aish/
├── main.lua          # Entry point: arg parsing, config load, REPL start
├── repl.lua          # Readline loop, input dispatch, prompt rendering
├── broker.lua        # llama.cpp HTTP client; Phase 0: blocking POST
├── mcp.lua           # MCP JSON-RPC 2.0 client (Phase 2; added 2026-05-12)
├── router.lua        # Task classifier: shell / AI / meta
├── executor.lua      # Command execution; Phase 0: io.popen
├── context.lua       # In-memory conversation history, token budget
├── history.lua       # Persistent session log + memory.jsonl (Phase 1)
├── safety.lua        # Destructive op heuristic, Chuck Norris gate (Phase 3)
├── renderer.lua      # Output formatting, ANSI sequences
├── config.lua        # Model registry, routing rules, user preferences
├── vendor/
│   └── dkjson.lua    # Vendored pure-Lua JSON encode/decode (§3)
└── ffi/
    ├── curl.lua      # libcurl easy interface binding
    ├── readline.lua  # GNU readline binding
    ├── pty.lua       # forkpty, openpty, waitpid (Phase 1)
    └── libc.lua      # Shared: errno, signal, write, read, misc
```

All modules are required explicitly from `main.lua`; there is no module autoloading. File names are stable across phases: later phases fill in bodies rather than renaming files. Adding new files is permitted and additive (e.g. `mcp.lua` was inserted at Phase 2 per docs/PHASE2.md §9); the rename prohibition is what keeps cross-phase wiring stable.

---

## 5. Input Dispatch Model

Every line of user input is classified by `router.lua` before any action is taken.

```
input
 ├── :command [args]    → meta handler    (repl.lua)
 ├── $ prefix or        → shell executor  (executor.lua)
 │   heuristic match
 └── everything else    → AI broker       (broker.lua)
```

### 5.1 Shell heuristic

An input line is treated as a shell command if it:

- Begins with `$` (explicit override; the prefix is stripped before exec)
- Has a first word that matches a command in the configurable allowlist (`shell.known_commands`, §10; e.g. `ls`, `cd`, `git`, `make`, `grep`, `cat`, `find`, `cp`, `mv`, `mkdir`)
- Has a path-like token as its first word (`./foo`, `/usr/bin/bar`)

Everything else goes to the model. The user can always force shell routing with `$` or `:exec`, and AI routing with `:ask`.
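For illustration, the §5 dispatch order and the §5.1 heuristic could be sketched in Lua as follows. This is a minimal sketch under stated assumptions, not the `router.lua` API: `classify`, `to_set`, the returned table shape, and the `"empty"` kind are hypothetical, and the allowlist is assumed to be built from `shell.known_commands` in the config. Meta commands are checked first, then the `$` override, then the allowlist and path-like tests.

```lua
-- Hypothetical sketch of a Phase 0 router; not the actual router.lua API.

-- Turn the config's known_commands list into a set for constant-time lookup.
local function to_set(list)
  local set = {}
  for _, name in ipairs(list) do
    set[name] = true
  end
  return set
end

-- A first word is "path-like" if it is absolute or relative: /x, ./x, ../x.
local function is_path_like(word)
  return word:sub(1, 1) == "/"
      or word:sub(1, 2) == "./"
      or word:sub(1, 3) == "../"
end

-- Classify one input line in dispatch order: meta, $ override, heuristic, AI.
-- Returns a table whose `kind` is "meta", "shell", "ai", or "empty".
local function classify(line, known)
  local text = line:match("^%s*(.-)%s*$")  -- trim surrounding whitespace
  if text == "" then
    return { kind = "empty" }  -- assumption: the REPL simply skips these
  end

  -- ":command [args]" goes to the meta handler in repl.lua.
  local command, args = text:match("^:(%S+)%s*(.*)$")
  if command then
    return { kind = "meta", command = command, args = args }
  end

  -- Explicit "$" override: strip the prefix and any following spaces.
  if text:sub(1, 1) == "$" then
    return { kind = "shell", text = (text:sub(2):gsub("^%s+", "")) }
  end

  -- Heuristic: allowlisted command or path-like first word.
  local first = text:match("^%S+")
  if known[first] or is_path_like(first) then
    return { kind = "shell", text = text }
  end

  return { kind = "ai", text = text }
end
```

Checking only the first word against a set keeps the test cheap, but it also means a sentence such as `find the bug in parser.lua` routes to the shell whenever `find` is allowlisted; that is the case the `:ask` override exists for.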
### 5.2 Meta commands (Phase 0 set)

| Command | Action |
|---|---|
| `:quit` / `:q` | Exit aish |
| `:clear` | Clear screen and reset display; keep history |
| `:reset` | Clear in-memory conversation history |
| `:model ` | Switch active model (must exist in config) |
| `:models` | List configured models and active selection |
| `:history` | Print conversation turns for current session |
| `:exec ` | Force shell execution regardless of heuristic |
| `:ask ` | Force AI query regardless of heuristic |
| `:help` | Print meta command reference |

---

## 6. Broker Interface (Phase 0)

Phase 0 uses a single blocking HTTP POST. libcurl FFI is wired, but SSE streaming is not yet consumed: the response is read to completion and returned as a string.

### Request shape

```
POST http:///v1/chat/completions
Content-Type: application/json

{
  "model": "",
  "messages": [ ],
  "stream": false,
  "temperature": 0.2
}
```

### Conversation history format

Each turn is stored in `context.lua` as:

```lua
local turn = { role = "user", content = "..." }  -- role: "system" | "user" | "assistant"
```

The system prompt is prepended on every request and is not stored as a history turn.

**Exec output injection.** Captured shell-exec output is **not** appended as its own user turn; that would produce back-to-back user turns, which strict chat templates (e.g. the Jinja template for `mistral-nemo-instruct`) reject with `roles must alternate user/assistant/...`. Instead, exec output is buffered on the context and prepended to the **next** user turn with an `[exec output]` tag. Output from multiple shell calls between AI turns is concatenated, and `:reset` clears the buffer. The user-visible behavior is unchanged; only the role alternation seen by the broker differs.

### System prompt (Phase 0 default)

```
You are aish, an AI-augmented shell assistant. You help the user execute shell commands, write and debug code, and re-engineer software.

When suggesting shell commands, output them on a line beginning with exactly "CMD: " so aish can identify and optionally execute them.

Be concise. Prefer concrete actions over explanations unless asked.
```

The `CMD:` prefix convention is the extraction contract between the model and `executor.lua`. Phase 0 shows each `CMD:` line with a confirmation prompt before executing it (controlled by `shell.confirm_cmd`, §10).

---

## 7. Execution Model (Phase 0)

```lua
-- executor.lua Phase 0 (illustrative; see note below)
local function exec(cmd)
  local handle = io.popen(cmd .. " 2>&1", "r")
  local output = handle:read("*a")
  -- Lua 5.2+ returns (ok, "exit"|"signal", code) here. LuaJIT 2.1 follows
  -- the Lua 5.1 ABI and returns only `true`, so `code` ends up nil.
  local ok, _, code = handle:close()
  return output, code
end
```

**Superseded by Phase 1.** The §7 sketch was never quite accurate on LuaJIT 2.1, which follows the Lua 5.1 ABI for `io.popen():close()` and returns only `true` (no exit status). The Phase 0 implementation worked around this with a sentinel-echo wrapper (`(cmd) 2>&1; echo __AISH_EXIT___$?`) and parsed the status back out of stdout. Phase 1 retired the workaround entirely: `executor.lua` now spawns the child via `forkpty` and recovers the exit status via `waitpid` and `WEXITSTATUS`. See docs/PHASE1.md §5 for the current PTY model.

Output is captured and:

1. Printed to the terminal
2. Buffered on the context (when `shell.capture_output` is enabled) and prepended, with an `[exec output]` tag, to the next user turn, as described in §6

`cd` is intercepted before `popen` and handled via `libc.chdir` (FFI) so the working-directory change persists across calls: `popen` forks a subprocess, and a `cd` inside it would otherwise be discarded when that subprocess exits.

---

## 8. Context Management (Phase 0)

In-memory only. No disk I/O in Phase 0.

```lua
-- context.lua Phase 0 shape
local Context = {
  system_prompt = "...",
  turns         = {},    -- ordered list of { role, content }
  max_turns     = 40,    -- sliding window; oldest non-system turns evicted
  token_budget  = 4096,  -- soft limit; rough char/4 estimate
}
```

Token budget enforcement is approximate in Phase 0 (character count / 4). Accurate tokenization is deferred: it was tracked for Phase 3 as Q1 and is now scheduled for Phase 8 (§11).

When `max_turns` is reached, the oldest two turns (one user and one assistant) are evicted without prompting, and the user is notified with a status line: `[context] oldest 2 turns evicted`.

---

## 9. readline Integration (Phase 0)

Minimal FFI binding wrapping three calls:

```lua
local ffi = require("ffi")

-- readline() returns a malloc'd line (NULL on EOF) that the caller releases
-- with libc free(); add_history() stores its own copy of the line.
ffi.cdef[[
  char *readline(const char *prompt);
  void add_history(const char *line);
  void free(void *ptr);
]]
```

- `add_history` is called for every non-empty input line
- Arrow keys and `Ctrl-R` reverse search work automatically via readline
- Custom key bindings (`Ctrl-N` for Norris mode etc.) are deferred to Phase 1

Prompt format:

```
[aish:fast]>
```

where `fast` is the active model name as defined in `config.lua`. In Norris mode (Phase 3) this becomes:

```
[aish:fast ⚡]>
```

---

## 10. Configuration (Phase 0)

`config.lua` is loaded at startup with `dofile`. It returns a plain Lua table.

```lua
-- config.lua example
return {
  default_model = "fast",

  models = {
    fast = {
      endpoint    = "http://localhost:8080",
      model       = "qwen2.5-coder-1.5b-q8",
      temperature = 0.2,
    },
    deep = {
      endpoint    = "http://localhost:8081",
      model       = "qwen2.5-coder-32b-q4",
      temperature = 0.1,
    },
    cloud = {
      endpoint    = "https://api.openai.com",
      model       = "gpt-4o",
      key_env     = "OPENAI_API_KEY",  -- read from environment at startup
      temperature = 0.2,
    },
  },

  shell = {
    known_commands = {
      "ls", "cat", "cd", "grep", "find", "cp", "mv", "rm",
      "mkdir", "rmdir", "git", "make", "cmake", "gcc", "clang",
      "python3", "luajit", "ssh", "scp", "curl", "wget",
    },
    capture_output = true,  -- inject exec output into context
    confirm_cmd    = true,  -- prompt before executing CMD: suggestions
  },

  context = {
    max_turns    = 40,
    token_budget = 4096,
  },

  history = {
    dir = os.getenv("HOME") .. "/.local/share/aish",
  },
}
```

Config path resolution order:

1. `--config ` CLI argument (explicit; aish fails if the file cannot be opened, with no fallback)
2. `$AISH_CONFIG` environment variable
3. `~/.config/aish/config.lua`
4. `./config.lua` (development fallback)

Phase 9 adds a project-local overlay step after the user config resolves: aish walks up from cwd looking for `.aish.lua` (stopping at `$HOME` or `/`), prompts for trust on first encounter, sha256-pins the trust record, and shallow-merges the project's top-level keys onto the user config. See `docs/PHASE9.md`.

**Cwd-relative module resolution.** Phase 0 prepends `./?.lua;./vendor/?.lua` to `package.path`, so `luajit main.lua` must be invoked with the repo root as cwd. Cwd-independent resolution (relative to the script's own directory) lands later, likely in Phase 1 alongside the install-path work, or whenever the first user reports trying `luajit ~/aish/main.lua` from somewhere else.

---

## 11. Planned Phase Sequence

| Phase | Key additions |
|---|---|
| **0** | Blocking REPL, `io.popen` exec, one active model at a time (manual `:model` switching), in-memory context, meta commands |
| **1** | SSE streaming via libcurl FFI, PTY via `forkpty` FFI, session persistence (`sessions/*.jsonl`), readline custom bindings |
| **2** | MCP client (`mcp.lua`): tool-calling via the OpenAI-compatible `tools` field on `/v1/chat/completions`; MCP JSON-RPC 2.0 over HTTP/SSE transport (target: lmcp); tool-result turns in context; per-server config + runtime `:mcp` meta commands; system prompt rewrite to declare the tools schema (replaces or augments §6's `CMD:` contract; see Q6); `safety.lua` extended to gate tool calls (see Q8) |
| **3** | Chuck Norris autonomous mode, destructive-op heuristic (static + model), HALT/confirm gate, planning loop (now able to use MCP tools as well as `CMD:` lines) |
| **4** | `memory.jsonl` summarization, startup context injection from memory, `:history` management, pruning |
| **5** | Multi-model routing by task type, cloud fallback, context summarization via fast model on eviction |
| **6** | Tree-sitter syntax highlighting hooks, diff-aware code injection, project-level context (file tree summary) |
| **7** | Cost/usage observability: broker captures `usage` + `cost`; per-session accumulator on ctx; `:cost` reporter; optional warn thresholds |
| **8** | Accurate tokenization: per-endpoint `/tokenize` probe (cached); `broker.token_count`; `Context:estimate_tokens` widened; `:cost detail` est-vs-actual annotation |
| **9** | Project-local config overlay (`.aish.lua` walk-up from cwd to `$HOME`, sha256-pinned trust prompt, shallow merge over user config); `:config show` meta |
| **10** | Cloud preplanner + local executor split for Norris (`cfg.norris.preplanner` emits TASK list once; `cfg.norris.executor` runs each step); `extract_task_lines`; `ctx.norris_tasks` anchor (survives eviction); cost category `"norris-preplan"` |

---

## 12. Out of Scope (All Phases)

- aish does not manage llama.cpp server lifecycle
- aish does not implement its own model inference
- aish is not a multiplexer (no tmux-style window management)
- aish does not provide a GUI or web interface
- aish does not sandbox executed commands at the OS level (no namespaces, no seccomp)

Security posture: aish trusts the local user. The destructive-op gate in Norris mode is a workflow safeguard, not a security boundary.

---

## 13. Open Questions (Tracked)

| # | Question | Impact | Target Phase |
|---|---|---|---|
| Q1 | Token counting: use the model's `/tokenize` endpoint or keep the char/4 heuristic? | Context eviction accuracy | Phase 3 |
| Q2 | Norris mode: should the planner emit a numbered step list and track progress, or re-plan after each step? | Loop structure in safety.lua | Phase 3 |
| Q3 | Summarization at session end: automatic on `:quit`, or explicit `:save`? | UX + history.lua API | Phase 4 |
| Q4 | Should `CMD:` extraction support multi-command blocks (here-doc style)? | executor.lua parser | Phase 1 |
| Q5 | Cloud model routing: explicit `:model cloud` only, or automatic fallback on local timeout? | router.lua policy | Phase 5 |
| Q6 | How do `CMD:` extraction (Phase 0) and MCP tool-calls (Phase 2) coexist: both, prefer tools, or retire `CMD:`? Note: choosing "retire `CMD:`" requires a §3 invariant amendment in the same commit, not just a Phase 2 internal call. | broker.lua + executor.lua + system prompt + (§3 if retiring) | Phase 2 |
| Q7 | MCP server discovery: declared in `config.lua` only, runtime `:mcp connect `, or both? | config.lua schema + repl.lua meta set | Phase 2 |
| Q8 | Tool-call authorization gate: per-call confirm (like `confirm_cmd`), per-tool policy in config, or trust-list by server? | safety.lua + mcp.lua + Norris-mode interaction | Phase 2 (informs Phase 3) |
| Q9 | MCP system-prompt augmentation locus: static block in `broker.lua`, assembled per-request from connected servers' tool schemas, or hybrid (static frame + dynamic tool list)? Per-request assembly costs tokens on every turn; static drifts from server reality; hybrid splits the cost. | broker.lua + mcp.lua + system prompt | Phase 2 |
| Q10 | Tool-call streaming vs the Phase 1 SSE substrate: does Phase 2 land tool calls on the still-blocking Phase 0 broker (and refit when SSE arrives in Phase 1), or require Phase 1 SSE to land first so tool-call deltas stream from day one? There is a phase-ordering implication either way. | broker.lua + mcp.lua + phase ordering | Phase 2 (informs Phase 1 ordering) |

---

*End of Phase 0 Manifest — aish*