# aish — Phase 0 Manifest

**Project:** aish, an AI-augmented conversational shell
**Document:** Phase 0 Requirements, Architecture & Design Decisions
**Status:** Pre-implementation
**Date:** 2026-05-10

---

## 1. Project Vision

aish is a conversational shell that presents a unified REPL to the user, backed by one or more language models accessed through a llama.cpp broker. Its purpose is to support interactive command execution, code assistance, debugging, and re-engineering tasks from a single terminal interface. The model layer is transparent to the user but configurable and extensible.

aish is not a wrapper around an existing shell. It is a first-class interactive environment that composes shell execution, AI inference, context management, and session memory into a coherent workflow.

---

## 2. Scope of Phase 0

Phase 0 is the minimal working skeleton. It establishes the REPL loop, input dispatch, model communication, and basic meta-command handling. No streaming, no autonomous mode, no persistence, no PTY.

**Phase 0 is done when:**

- The user can type natural language and receive a model response
- The user can type a shell command and have it executed with output captured and displayed
- The user can type `:meta` commands to control aish itself
- The conversation history is maintained in memory for the session
- The codebase structure matches the target layout so later phases slot in without refactoring

---

## 3. Technology Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Implementation language | LuaJIT 2.x | Compact, embeddable, FFI eliminates need for C shim layer |
| FFI strategy | LuaJIT FFI only (no C extension modules) | Direct `ffi.cdef` access to libc, libcurl, readline; no build toolchain required |
| HTTP client | libcurl via FFI | Supports SSE streaming (Phase 1+); available on all target platforms |
| Terminal input | GNU readline via FFI | Full readline semantics, custom key bindings, history, without dependency on a Lua readline package |
| Model backend | llama.cpp OpenAI-compatible server (`/v1/chat/completions`) | Externally managed; aish does not own the process lifecycle |
| Shell execution | `io.popen` in Phase 0; `forkpty` via libc FFI from Phase 1 | `popen` sufficient for non-interactive commands; PTY required for vim, htop, etc. |
| Session persistence | Deferred to Phase 1 | Phase 0 holds history in memory only |
| Config format | Lua table (plain `.lua` file sourced at startup) | No parser dependency; native types; easily extended |
| JSON encode/decode | dkjson 2.8 vendored under `vendor/dkjson.lua` | Pure Lua (preserves §3 "no compiled extensions" invariant); single-file vendor avoids `luarocks`; sourced from Debian's `lua-dkjson` package, originally from dkolf.de |

---

## 4. Target Directory Layout

```
aish/
├── main.lua          # Entry point: arg parsing, config load, REPL start
├── repl.lua          # Readline loop, input dispatch, prompt rendering
├── broker.lua        # llama.cpp HTTP client; Phase 0: blocking POST
├── router.lua        # Task classifier: shell / AI / meta
├── executor.lua      # Command execution; Phase 0: io.popen
├── context.lua       # In-memory conversation history, token budget
├── history.lua       # Persistent session log + memory.jsonl (Phase 1)
├── safety.lua        # Destructive op heuristic, Chuck Norris gate (Phase 3)
├── renderer.lua      # Output formatting, ANSI sequences
├── config.lua        # Model registry, routing rules, user preferences
└── ffi/
    ├── curl.lua      # libcurl easy interface binding
    ├── readline.lua  # GNU readline binding
    ├── pty.lua       # forkpty, openpty, waitpid (Phase 1)
    └── libc.lua      # Shared: errno, signal, write, read, misc
```

All modules are required explicitly from `main.lua`. No module autoloading. File names are stable across phases: later phases fill in module bodies rather than renaming files.

---

## 5. Input Dispatch Model

Every line of user input is classified by `router.lua` before any action is taken.

```
input
  ├── :command [args]   → meta handler (repl.lua)
  ├── $ prefix or       → shell executor (executor.lua)
  │   heuristic match
  └── everything else   → AI broker (broker.lua)
```

### 5.1 Shell heuristic

An input line is treated as a shell command if it:

- Begins with `$` (explicit override, prefix stripped before exec)
- Begins with a known command from a configurable allowlist (e.g. `ls`, `cd`, `git`, `make`, `grep`, `cat`, `find`, `cp`, `mv`, `mkdir`)
- Begins with a bare path-like token as its first word (`./foo`, `/usr/bin/bar`)

Everything else goes to the model. The user can always force shell routing with `$` or `:exec`, and AI routing with `:ask`.
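
The three rules above, plus the `:` meta prefix from the dispatch diagram, can be sketched as a single classifier. This is a minimal illustration rather than the actual `router.lua`: the function name `classify`, the route labels, and the hard-coded allowlist are assumptions, and a real version would presumably read `shell.known_commands` from the config.

```lua
-- Hypothetical sketch of the Phase 0 classifier; all names are illustrative.
local known_commands = {
  ls = true, cd = true, git = true, make = true, grep = true,
  cat = true, find = true, cp = true, mv = true, mkdir = true,
}

-- Returns a route ("meta", "shell", or "ai") and the payload to hand on.
local function classify(line)
  local trimmed = line:match("^%s*(.-)%s*$")
  if trimmed:sub(1, 1) == ":" then
    return "meta", trimmed:sub(2)
  end
  if trimmed:sub(1, 1) == "$" then
    -- Explicit override: strip the prefix and any whitespace after it.
    return "shell", (trimmed:sub(2):gsub("^%s+", ""))
  end
  local first = trimmed:match("^(%S+)")
  -- Allowlisted first word, or a path-like token such as ./foo or /usr/bin/bar.
  if first and (known_commands[first] or first:match("^%.?%.?/")) then
    return "shell", trimmed
  end
  return "ai", trimmed
end
```

Because only the first word is inspected, a sentence such as `find me the bug` would be routed to the shell in this sketch; that is the case the `:ask` override covers.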

### 5.2 Meta commands (Phase 0 set)

| Command | Action |
|---|---|
| `:quit` / `:q` | Exit aish |
| `:clear` | Clear screen and reset display; keep history |
| `:reset` | Clear in-memory conversation history |
| `:model <name>` | Switch active model (must exist in config) |
| `:models` | List configured models and active selection |
| `:history` | Print conversation turns for current session |
| `:exec <cmd>` | Force shell execution regardless of heuristic |
| `:ask <text>` | Force AI query regardless of heuristic |
| `:help` | Print meta command reference |
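
One way to wire a few of these commands is a handler table keyed by command name. The sketch below is an assumption about structure, not the actual `repl.lua`: `handlers`, `state`, and `dispatch_meta` are hypothetical names, and only three commands are shown. Unknown names return `nil` plus a message so the REPL can print it and continue.

```lua
-- Hypothetical meta-command dispatcher; handler bodies are placeholders.
local handlers = {}
local state = {
  active_model = "fast",
  models = { fast = true, deep = true },
  turns = {},
}

handlers.quit = function() return "quit" end
handlers.q = handlers.quit -- alias, as in the table above
handlers.reset = function()
  state.turns = {}
  return "history cleared"
end
handlers.model = function(args)
  if not state.models[args] then
    return nil, "unknown model: " .. args
  end
  state.active_model = args
  return "active model: " .. args
end

-- `payload` is the input line with the leading ":" already removed.
local function dispatch_meta(payload)
  local name, args = payload:match("^(%S+)%s*(.*)$")
  local handler = name and handlers[name]
  if not handler then
    return nil, "unknown meta command: " .. tostring(name) .. " (try :help)"
  end
  return handler(args)
end
```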

---

## 6. Broker Interface (Phase 0)

Phase 0 uses a single blocking HTTP POST. libcurl FFI is wired but SSE streaming is not yet consumed; the response is read to completion and returned as a string.

### Request shape

```
POST http://<endpoint>/v1/chat/completions
Content-Type: application/json

{
  "model": "<model_id>",
  "messages": [ <conversation history> ],
  "stream": false,
  "temperature": 0.2
}
```

### Conversation history format

Each turn is stored in `context.lua` as:

```lua
-- role is one of "system", "user", or "assistant"
{ role = "user", content = "..." }
```

The system prompt is prepended on every request and is not stored as a history turn. Exec output injected into context uses role `"user"` with a prefix tag `[exec output]`.
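
Assembling the `messages` array from that description might look like the sketch below. The helper names `add_exec_output` and `build_messages` are hypothetical, and putting a newline between the `[exec output]` tag and the captured text is an assumption; the document specifies only the tag and the role.

```lua
-- Hypothetical helpers; field names mirror the turn shape described above.
local function add_exec_output(turns, output)
  -- Exec output is stored as a user turn carrying the prefix tag.
  turns[#turns + 1] = { role = "user", content = "[exec output]\n" .. output }
end

-- Builds the `messages` array for one request: the system prompt first,
-- then the stored turns in order. The stored history is not modified.
local function build_messages(system_prompt, turns)
  local messages = { { role = "system", content = system_prompt } }
  for _, turn in ipairs(turns) do
    messages[#messages + 1] = { role = turn.role, content = turn.content }
  end
  return messages
end
```

The resulting table, together with `model`, `stream`, and `temperature`, could then be serialized with the vendored dkjson module's `encode` function before being handed to the HTTP layer.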

### System prompt (Phase 0 default)

```
You are aish, an AI-augmented shell assistant. You help the user execute shell
commands, write and debug code, and re-engineer software. When suggesting shell
commands, output them on a line beginning with exactly "CMD: " so aish can
identify and optionally execute them. Be concise. Prefer concrete actions over
explanations unless asked.
```

The `CMD:` prefix convention is the extraction contract between the model and `executor.lua`. Phase 0 presents CMD lines with a confirmation prompt before execution.
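
A minimal sketch of the extraction side of that contract follows, assuming a line-by-line scan of the model response. The function name `extract_commands` is hypothetical. Lines that are indented, or that lack the space after the colon, are deliberately ignored, because the prompt asks for lines beginning with exactly `CMD: `. Each returned command would still go through the confirmation prompt before reaching the executor.

```lua
-- Hypothetical extractor: collects commands from lines that begin with
-- exactly "CMD: ". Handles both \n and \r\n line endings.
local function extract_commands(response)
  local commands = {}
  for line in (response .. "\n"):gmatch("(.-)\r?\n") do
    local cmd = line:match("^CMD: (.+)$")
    if cmd then
      commands[#commands + 1] = cmd
    end
  end
  return commands
end
```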

---

## 7. Execution Model (Phase 0)

```lua
-- executor.lua Phase 0
local function exec(cmd)
  -- Merge stderr into stdout so both are captured.
  local handle, err = io.popen(cmd .. " 2>&1", "r")
  if not handle then
    return nil, err
  end
  local output = handle:read("*a")
  -- Stock LuaJIT returns only `true` from close(), so `code` may be nil; the
  -- exit status needs a LUAJIT_ENABLE_LUA52COMPAT build (or Lua 5.2+).
  local ok, _, code = handle:close()
  return output, code
end
```

Output is captured and:
1. Printed to the terminal
2. Injected into `context.lua` as an `[exec output]` user turn

`cd` is intercepted before `popen` and handled via `posix.chdir` (libc FFI) so the working directory change persists across calls. `popen` forks a subprocess, so a `cd` inside it would otherwise be discarded when that subprocess exits.
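
The interception step can be sketched as below. Everything here is an assumption about shape: `parse_cd` and `run` are hypothetical names, and `chdir` is passed in as a function so the sketch runs without FFI (the comment shows how it might be bound under LuaJIT). Quoting is not handled, and compound commands such as `cd a && ls` are left to the shell, where the directory change does not persist.

```lua
-- Hypothetical `cd` interception; `chdir` is injected by the caller.
-- Under LuaJIT it could be bound with:
--   local ffi = require("ffi")
--   ffi.cdef("int chdir(const char *path);")
--   local function chdir(path) return ffi.C.chdir(path) == 0 end

-- Returns the target directory if `cmd` is a plain `cd`, otherwise nil.
local function parse_cd(cmd, home)
  local rest = cmd:match("^%s*cd%s*$") and "" or cmd:match("^%s*cd%s+(.-)%s*$")
  if not rest then return nil end
  if rest:find("[;&|<>]") then return nil end -- compound command: not intercepted
  if rest == "" or rest == "~" then return home end
  if rest:sub(1, 2) == "~/" then return home .. rest:sub(2) end
  return rest
end

local function run(cmd, chdir, exec, home)
  local target = parse_cd(cmd, home)
  if target then
    return chdir(target) -- changes the working directory of aish itself
  end
  return exec(cmd)       -- everything else takes the popen path
end
```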

---

## 8. Context Management (Phase 0)

In-memory only. No disk I/O in Phase 0.

```lua
-- context.lua Phase 0 shape
Context = {
  system_prompt = "...",
  turns = {},           -- ordered list of {role, content}
  max_turns = 40,       -- sliding window; oldest non-system turns evicted
  token_budget = 4096,  -- soft limit; rough char/4 estimate
}
```

Token budget enforcement is approximate in Phase 0 (character count / 4). Accurate tokenization is a Phase 3 concern.

When `max_turns` is reached, the oldest two turns (normally one user and one assistant) are evicted automatically, and the user is notified with a status line: `[context] oldest 2 turns evicted`.
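
The sliding window and the rough token estimate could be sketched as follows. The names `estimate_tokens` and `push_turn` are hypothetical, and evicting once the window exceeds `max_turns` (rather than exactly on reaching it) is an assumption about where the boundary sits. The function returns the number of evicted turns so the caller can print the `[context]` status line.

```lua
-- Hypothetical context helpers; field names follow the Phase 0 shape above.
local function estimate_tokens(text)
  return math.ceil(#text / 4) -- rough char/4 estimate, not real tokenization
end

-- Appends a turn, then evicts the oldest pair while the window is over
-- max_turns. Returns how many turns were evicted.
local function push_turn(ctx, role, content)
  ctx.turns[#ctx.turns + 1] = { role = role, content = content }
  local evicted = 0
  while #ctx.turns > ctx.max_turns do
    table.remove(ctx.turns, 1)
    table.remove(ctx.turns, 1)
    evicted = evicted + 2
  end
  return evicted
end
```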

---

## 9. readline Integration (Phase 0)

Minimal FFI binding wrapping three calls:

```lua
ffi.cdef[[
char *readline(const char *prompt);
void add_history(const char *line);
void free(void *ptr);
]]
```

- `free` releases the buffer returned by `readline`, which is allocated with `malloc` and owned by the caller
- `add_history` called for every non-empty input line
- Arrow keys and `Ctrl-R` reverse search work automatically via readline
- Custom key bindings (`Ctrl-N` for Norris mode etc.) deferred to Phase 1

Prompt format:

```
[aish:fast]>
```

Where `fast` is the active model name as defined in `config.lua`. In Norris mode (Phase 3) this becomes:

```
[aish:fast ⚡]>
```

---

## 10. Configuration (Phase 0)

`config.lua` is loaded at startup with `dofile`. It returns a plain Lua table.

```lua
-- config.lua example
return {
  default_model = "fast",

  models = {
    fast = {
      endpoint = "http://localhost:8080",
      model = "qwen2.5-coder-1.5b-q8",
      temperature = 0.2,
    },
    deep = {
      endpoint = "http://localhost:8081",
      model = "qwen2.5-coder-32b-q4",
      temperature = 0.1,
    },
    cloud = {
      endpoint = "https://api.openai.com",
      model = "gpt-4o",
      key_env = "OPENAI_API_KEY", -- read from environment at startup
      temperature = 0.2,
    },
  },

  shell = {
    known_commands = {
      "ls", "cat", "cd", "grep", "find", "cp", "mv", "rm",
      "mkdir", "rmdir", "git", "make", "cmake", "gcc", "clang",
      "python3", "luajit", "ssh", "scp", "curl", "wget",
    },
    capture_output = true, -- inject exec output into context
    confirm_cmd = true,    -- prompt before executing CMD: suggestions
  },

  context = {
    max_turns = 40,
    token_budget = 4096,
  },

  history = {
    -- Falls back to the current directory if HOME is unset.
    dir = (os.getenv("HOME") or ".") .. "/.local/share/aish",
  },
}
```

Config path resolution order:
1. `--config <path>` CLI argument
2. `$AISH_CONFIG` environment variable
3. `~/.config/aish/config.lua`
4. `./config.lua` (development fallback)
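
The resolution order can be sketched as a small function. The names `resolve_config_path` and `file_exists` are hypothetical, and `getenv` and `exists` are passed in so the order can be exercised without touching the real environment; in `main.lua` they would presumably be `os.getenv` and `file_exists`. Returning an explicit path (steps 1 and 2) without checking that it exists is an assumption, made so that a mistyped path fails loudly at `dofile` instead of silently falling through to another file.

```lua
-- Hypothetical resolver for the four-step order listed above.
local function resolve_config_path(cli_path, getenv, exists)
  if cli_path then return cli_path end                      -- 1. --config <path>
  local env_path = getenv("AISH_CONFIG")
  if env_path and env_path ~= "" then return env_path end   -- 2. $AISH_CONFIG
  local home = getenv("HOME")
  if home then
    local user_path = home .. "/.config/aish/config.lua"
    if exists(user_path) then return user_path end          -- 3. per-user config
  end
  if exists("./config.lua") then return "./config.lua" end  -- 4. development fallback
  return nil, "no config file found"
end

-- A simple existence probe built on io.open.
local function file_exists(path)
  local f = io.open(path, "r")
  if f then
    f:close()
    return true
  end
  return false
end
```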

---

## 11. Planned Phase Sequence

| Phase | Key additions |
|---|---|
| **0** | Blocking REPL, `io.popen` exec, single model, in-memory context, meta commands |
| **1** | SSE streaming via libcurl FFI, PTY via `forkpty` FFI, session persistence (`sessions/*.jsonl`), readline custom bindings |
| **2** | MCP client (`mcp.lua`): tool-calling via OpenAI-compatible `tools` field on `/v1/chat/completions`; MCP JSON-RPC 2.0 over HTTP/SSE transport (target: lmcp); tool-result turns in context; per-server config + runtime `:mcp` meta commands; system prompt rewrite to declare the tools schema (replaces or augments §6's `CMD:` contract; see Q6); `safety.lua` extended to gate tool calls (see Q8) |
| **3** | Chuck Norris autonomous mode, destructive op heuristic (static + model), HALT/confirm gate, planning loop (now able to use MCP tools as well as `CMD:` lines) |
| **4** | `memory.jsonl` summarization, startup context injection from memory, `:history` management, pruning |
| **5** | Multi-model routing by task type, cloud fallback, context summarization via fast model on eviction |
| **6** | Tree-sitter syntax highlighting hooks, diff-aware code injection, project-level context (file tree summary) |

---

## 12. Out of Scope (All Phases)

- aish does not manage llama.cpp server lifecycle
- aish does not implement its own model inference
- aish is not a multiplexer (no tmux-style window management)
- aish does not provide a GUI or web interface
- aish does not sandbox executed commands at the OS level (no namespaces, no seccomp)

Security posture: aish trusts the local user. The destructive-op gate in Norris mode is a workflow safeguard, not a security boundary.

---

## 13. Open Questions (Tracked)

| # | Question | Impact | Target Phase |
|---|---|---|---|
| Q1 | Token counting: use model's `/tokenize` endpoint or keep char/4 heuristic? | Context eviction accuracy | Phase 3 |
| Q2 | Norris mode: should the planner emit a numbered step list and track progress, or re-plan after each step? | Loop structure in safety.lua | Phase 3 |
| Q3 | Summarization at session end: automatic on `:quit`, or explicit `:save`? | UX + history.lua API | Phase 4 |
| Q4 | Should `CMD:` extraction support multi-command blocks (here-doc style)? | executor.lua parser | Phase 1 |
| Q5 | Cloud model routing: explicit `:model cloud` only, or automatic fallback on local timeout? | router.lua policy | Phase 5 |
| Q6 | How do `CMD:` extraction (Phase 0) and MCP tool-calls (Phase 2) coexist: both, prefer tools, or retire `CMD:`? Note: choosing "retire `CMD:`" requires a §3 invariant amendment in the same commit, not just a Phase 2 internal call. | broker.lua + executor.lua + system prompt + (§3 if retiring) | Phase 2 |
| Q7 | MCP server discovery: declared in `config.lua` only, runtime `:mcp connect <url>`, or both? | config.lua schema + repl.lua meta set | Phase 2 |
| Q8 | Tool-call authorization gate: per-call confirm (like `confirm_cmd`), per-tool policy in config, or trust-list by server? | safety.lua + mcp.lua + Norris-mode interaction | Phase 2 (informs Phase 3) |
| Q9 | MCP system-prompt augmentation locus: static block in `broker.lua`, assembled per-request from connected servers' tool schemas, or hybrid (static frame + dynamic tool list)? Per-request assembly costs tokens on every turn; static drifts from server reality; hybrid splits the cost. | broker.lua + mcp.lua + system prompt | Phase 2 |
| Q10 | Tool-call streaming vs the Phase 1 SSE substrate: does Phase 2 land tool calls on the still-blocking Phase 0 broker (and refit when SSE arrives in Phase 1), or require Phase 1 SSE to land first so tool-call deltas stream from day one? Phase ordering implication either way. | broker.lua + mcp.lua + phase ordering | Phase 2 (informs Phase 1 ordering) |

---

*End of Phase 0 Manifest — aish*