marfrit 3bad07b2da docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).

Four pillars:

  1. Usage capture in broker.chat_stream — extract `usage` from the
     final SSE chunk (OpenAI streaming spec with `stream_options:
     {include_usage: true}`). Surface via new on_delta("usage",
     payload) kind. broker.chat returns (text, usage) — backward-
     compat: existing callers ignore the second value.

  2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
     tables (categories: main / delegate / summarize / memory_summarize
     / probe / norris, tagged at the call site via opts.category).
     :reset preserves usage_totals (R8 parity with memory_items /
     project). Session JSONL gains an optional `usage` field on
     assistant turns for after-the-fact analysis.

  3. :cost meta surface — :cost (summary), :cost detail (per-model +
     per-category breakdown), :cost reset (zero the meter). Pure-Lua
     read of ctx.usage_totals; no broker calls.

  4. Optional warn thresholds — cfg.cost.warn_at_dollars /
     warn_at_tokens emit a one-shot status when crossed. Default off;
     useful with cloud presets configured.

Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.

Open at formulate:
  Q-C1 — provider-without-usage handling (local llama.cpp probably)
  Q-C2 — cross-session persistence (defer to phase 8)
  Q-C3 — categories closed-set vs free-form
  Q-C4 — does hossenfelder forward stream_options to all backends?
  Q-C5 — warn fires on the call that crosses, or the next one?
  Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?

Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:47:58 +00:00


aish — Phase 0 Manifest

Project: aish, AI-augmented conversational shell
Document: Phase 0 Requirements, Architecture & Design Decisions
Status: Pre-implementation
Date: 2026-05-10


1. Project Vision

aish is a conversational shell that presents a unified REPL to the user, backed by one or more language models accessed through a llama.cpp broker. Its purpose is to support interactive command execution, code assistance, debugging, and re-engineering tasks from a single terminal interface. The model layer is transparent to the user but configurable and extensible.

aish is not a wrapper around an existing shell. It is a first-class interactive environment that composes shell execution, AI inference, context management, and session memory into a coherent workflow.


2. Scope of Phase 0

Phase 0 is the minimal working skeleton. It establishes the REPL loop, input dispatch, model communication, and basic meta-command handling. No streaming, no autonomous mode, no persistence, no PTY.

Phase 0 is done when:

  • The user can type natural language and receive a model response
  • The user can type a shell command and have it executed with output captured and displayed
  • The user can type :meta commands to control aish itself
  • The conversation history is maintained in memory for the session
  • The codebase structure matches the target layout so later phases slot in without refactoring

3. Technology Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Implementation language | LuaJIT 2.x | Compact, embeddable, FFI eliminates need for C shim layer |
| FFI strategy | LuaJIT FFI only (no C extension modules) | Direct ffi.cdef access to libc, libcurl, readline; no build toolchain required |
| HTTP client | libcurl via FFI | Supports SSE streaming (Phase 1+); available on all target platforms |
| Terminal input | GNU readline via FFI | Full readline semantics, custom key bindings, history, without dependency on a Lua readline package |
| Model backend | llama.cpp OpenAI-compatible server (/v1/chat/completions) | Externally managed; aish does not own the process lifecycle |
| Shell execution | io.popen in Phase 0; forkpty via libc FFI from Phase 1 | popen sufficient for non-interactive commands; PTY required for vim, htop, etc. |
| Session persistence | Deferred to Phase 1 | Phase 0 holds history in memory only |
| Config format | Lua table (plain .lua file sourced at startup) | No parser dependency; native types; easily extended |
| JSON encode/decode | dkjson 2.8 vendored under vendor/dkjson.lua | Pure Lua (preserves §3 "no compiled extensions" invariant); single-file vendor avoids luarocks; sourced from Debian's lua-dkjson package, originally from dkolf.de |

FFI loader fallback. ffi.load("readline") and ffi.load("curl") look for the unversioned lib<name>.so symlink, which is only installed by the -dev package. Phase 0 loaders try the unversioned name first then fall back to versioned sonames (readline.so.8, readline.so.7, curl.so.4, curl-gnutls.so.4) so a runtime-only host (Debian/ALARM without lib<name>-dev) just works.
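
The fallback order above can be sketched as follows. This is a minimal illustration, not the actual loader: `load_first` is a hypothetical helper name, and the loader function is passed in as a parameter so the same logic can be exercised without LuaJIT. Under LuaJIT the loader would be `ffi.load`, which prepends `lib` to bare names such as `readline.so.8`.

```lua
-- Hypothetical helper (not from the aish source): try each candidate
-- name in order and return the first library that loads.
local function load_first(loader, names)
    local errors = {}
    for _, name in ipairs(names) do
        local ok, lib = pcall(loader, name)
        if ok then
            return lib, name
        end
        -- Remember why this candidate failed for the final error message.
        errors[#errors + 1] = name .. ": " .. tostring(lib)
    end
    error("no candidate library could be loaded:\n  "
        .. table.concat(errors, "\n  "))
end

-- Under LuaJIT this would be called roughly as:
--   local ffi  = require("ffi")
--   local rl   = load_first(ffi.load, { "readline", "readline.so.8", "readline.so.7" })
--   local curl = load_first(ffi.load, { "curl", "curl.so.4", "curl-gnutls.so.4" })
```

Collecting the per-candidate errors means a host with neither the -dev symlink nor a known soname gets a message listing everything that was tried.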


4. Target Directory Layout

aish/
├── main.lua              # Entry point: arg parsing, config load, REPL start
├── repl.lua              # Readline loop, input dispatch, prompt rendering
├── broker.lua            # llama.cpp HTTP client; Phase 0: blocking POST
├── mcp.lua               # MCP JSON-RPC 2.0 client (Phase 2; added 2026-05-12)
├── router.lua            # Task classifier: shell / AI / meta
├── executor.lua          # Command execution; Phase 0: io.popen
├── context.lua           # In-memory conversation history, token budget
├── history.lua           # Persistent session log + memory.jsonl (Phase 1)
├── safety.lua            # Destructive op heuristic, Chuck Norris gate (Phase 3)
├── renderer.lua          # Output formatting, ANSI sequences
├── config.lua            # Model registry, routing rules, user preferences
└── ffi/
    ├── curl.lua          # libcurl easy interface binding
    ├── readline.lua      # GNU readline binding
    ├── pty.lua           # forkpty, openpty, waitpid (Phase 1)
    └── libc.lua          # Shared: errno, signal, write, read, misc

All modules are required explicitly from main.lua. No module autoloading. File names are stable across phases: later phases fill in bodies; they do not rename files. Adding new files is permitted and additive (e.g. mcp.lua was inserted at Phase 2 per docs/PHASE2.md §9); the rename prohibition is what keeps cross-phase wiring stable.


5. Input Dispatch Model

Every line of user input is classified by router.lua before any action is taken.

input
  ├── :command [args]    →  meta handler (repl.lua)
  ├── $ prefix or       →  shell executor (executor.lua)
  │   heuristic match
  └── everything else   →  AI broker (broker.lua)

5.1 Shell heuristic

An input line is treated as a shell command if it:

  • Begins with $ (explicit override, prefix stripped before exec)
  • Matches a known command prefix from a configurable allowlist (e.g. ls, cd, git, make, grep, cat, find, cp, mv, mkdir)
  • Contains a bare path-like token as first word (./foo, /usr/bin/bar)

Everything else goes to the model. The user can always force routing with $ or with :exec.
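
The dispatch rules in §5 and §5.1 could be expressed as a single classifier along these lines. This is a sketch under assumptions: the function name `classify`, its return shape (route, payload), and the exact patterns are illustrative and not taken from router.lua.

```lua
-- Hypothetical classifier; names and return shape are assumptions.
local function classify(line, known_commands)
    -- Meta: ":command [args]" is handled by the REPL itself.
    if line:match("^:") then
        return "meta", line
    end
    -- Explicit shell override: strip the "$" prefix before exec.
    local forced = line:match("^%$%s*(.*)$")
    if forced then
        return "shell", forced
    end
    local first = line:match("^(%S+)")
    if first then
        -- Bare path-like first token: ./foo or /usr/bin/bar
        if first:match("^%./") or first:match("^/") then
            return "shell", line
        end
        -- Known command prefix from the configurable allowlist.
        for _, cmd in ipairs(known_commands) do
            if first == cmd then
                return "shell", line
            end
        end
    end
    -- Everything else goes to the model.
    return "ai", line
end
```

An allowlist match on the first word will misroute natural-language input that happens to start with a command name (for example "find me the bug"), which is the case :ask covers.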

5.2 Meta commands (Phase 0 set)

| Command | Action |
|---|---|
| :quit / :q | Exit aish |
| :clear | Clear screen and reset display; keep history |
| :reset | Clear in-memory conversation history |
| :model <name> | Switch active model (must exist in config) |
| :models | List configured models and active selection |
| :history | Print conversation turns for current session |
| :exec <cmd> | Force shell execution regardless of heuristic |
| :ask <text> | Force AI query regardless of heuristic |
| :help | Print meta command reference |

6. Broker Interface (Phase 0)

Phase 0 uses a single blocking HTTP POST. libcurl FFI is wired but SSE streaming is not yet consumed — the response is read to completion and returned as a string.

Request shape

POST http://<endpoint>/v1/chat/completions
Content-Type: application/json

{
  "model": "<model_id>",
  "messages": [ <conversation history> ],
  "stream": false,
  "temperature": 0.2
}

Conversation history format

Each turn is stored in context.lua as:

{ role = "system" | "user" | "assistant", content = "..." }

The system prompt is prepended on every request and is not stored as a history turn.
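
As a sketch of how the request body might be assembled from the stored turns: the helper names `build_messages` and `build_request` are hypothetical, and the model config fields follow the config.lua example in §10. The resulting table would then be serialized with the vendored dkjson before the POST.

```lua
-- Hypothetical helpers: assemble the request table for one chat call.
local function build_messages(system_prompt, turns)
    -- The system prompt is prepended per request; `turns` is not modified.
    local messages = { { role = "system", content = system_prompt } }
    for _, turn in ipairs(turns) do
        messages[#messages + 1] = { role = turn.role, content = turn.content }
    end
    return messages
end

local function build_request(model_cfg, system_prompt, turns)
    return {
        model       = model_cfg.model,
        messages    = build_messages(system_prompt, turns),
        stream      = false,                         -- Phase 0: blocking POST
        temperature = model_cfg.temperature or 0.2,  -- default from the request shape
    }
end
```

Building a fresh array on every call keeps the system prompt out of the history, so :model switches or prompt changes take effect on the next request without rewriting stored turns.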

Exec output injection. Captured shell-exec output is not appended as its own user turn: that would produce back-to-back user turns, which strict chat templates (e.g. mistral-nemo-instruct's Jinja) reject with "roles must alternate user/assistant/...". Instead, exec output is buffered on the context and prepended to the next user turn with a [exec output] tag. Output from multiple shell calls between AI turns is concatenated. :reset clears the buffer. The user-visible behavior is unchanged; only the role alternation seen by the broker differs.
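
The buffering behavior can be illustrated with a small context object. This is a sketch only: the field name `exec_buffer`, the method names, and the exact text layout after the [exec output] tag are assumptions, not the layout used by context.lua.

```lua
-- Hypothetical context with an exec-output buffer.
local Context = {}
Context.__index = Context

local function new_context()
    return setmetatable({ turns = {}, exec_buffer = {} }, Context)
end

-- Called after each shell command: buffer instead of adding a turn.
function Context:add_exec_output(cmd, output)
    self.exec_buffer[#self.exec_buffer + 1] =
        "[exec output] $ " .. cmd .. "\n" .. output
end

-- Called for each user message: flush the buffer into this turn.
function Context:add_user_turn(text)
    if #self.exec_buffer > 0 then
        text = table.concat(self.exec_buffer, "\n") .. "\n" .. text
        self.exec_buffer = {}
    end
    self.turns[#self.turns + 1] = { role = "user", content = text }
end

-- :reset clears both the history and any pending exec output.
function Context:reset()
    self.turns = {}
    self.exec_buffer = {}
end
```

With this shape, two shell calls followed by one question yield a single user turn, so the role sequence sent to the broker stays user/assistant/user.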

System prompt (Phase 0 default)

You are aish, an AI-augmented shell assistant. You help the user execute shell
commands, write and debug code, and re-engineer software. When suggesting shell
commands, output them on a line beginning with exactly "CMD: " so aish can
identify and optionally execute them. Be concise. Prefer concrete actions over
explanations unless asked.

The CMD: prefix convention is the extraction contract between the model and executor.lua. Phase 0 presents CMD lines with a confirmation prompt before execution.
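
A minimal sketch of the extraction side of this contract, assuming a hypothetical `extract_cmds` helper that scans a complete model response. Only lines that begin with exactly "CMD: " are picked up; indented or mid-line occurrences are ignored.

```lua
-- Hypothetical extractor: return every command suggested via "CMD: ".
local function extract_cmds(response)
    local cmds = {}
    -- Append a newline so the final line is matched even without one;
    -- tolerate CRLF line endings.
    for line in (response .. "\n"):gmatch("(.-)\r?\n") do
        local cmd = line:match("^CMD: (.+)$")
        if cmd then
            cmds[#cmds + 1] = cmd
        end
    end
    return cmds
end
```

Each returned string would then go through the confirmation prompt before being handed to the executor.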


7. Execution Model (Phase 0)

-- executor.lua Phase 0 (illustrative — see note below)
local function exec(cmd)
    local handle = io.popen(cmd .. " 2>&1", "r")
    local output = handle:read("*a")
    local ok, _, code = handle:close()
    return output, code
end

Superseded by Phase 1. The §7 sketch was never quite accurate on LuaJIT 2.1 (which follows the Lua 5.1 ABI for io.popen():close() and returns only true — no exit status). The Phase 0 implementation worked around this with a sentinel-echo wrapper ((cmd) 2>&1; echo __AISH_EXIT_<tag>__$?) and parsed the status back out of stdout. Phase 1 retired the workaround entirely: executor.lua now spawns the child via forkpty and recovers exit status via waitpid(WEXITSTATUS). See docs/PHASE1.md §5 for the current PTY model.
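
For reference, the sentinel-echo workaround described in the note can be reconstructed roughly as below. This is a hypothetical reconstruction from the description, not the retired Phase 0 code; the helper names are illustrative, and the tag is assumed to be alphanumeric because it is interpolated into a Lua pattern.

```lua
-- Wrap the command so the shell prints its exit status after the output.
local function wrap(cmd, tag)
    return "(" .. cmd .. ") 2>&1; echo __AISH_EXIT_" .. tag .. "__$?"
end

-- Split captured stdout into (output, exit_code) using the sentinel.
local function split_status(raw, tag)
    local output, code =
        raw:match("^(.*)__AISH_EXIT_" .. tag .. "__(%d+)%s*$")
    if not output then
        return raw, nil   -- sentinel missing: exit status unknown
    end
    return output, tonumber(code)
end

-- Run a command via io.popen and recover its exit status from stdout.
local function exec(cmd, tag)
    local handle = assert(io.popen(wrap(cmd, tag), "r"))
    local raw = handle:read("*a")
    handle:close()
    return split_status(raw, tag)
end
```

The greedy `(.*)` anchors on the last sentinel in the stream, so a command that happens to print the marker text itself does not truncate the captured output.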

Output is captured and:

  1. Printed to the terminal
  2. Buffered on the context and prepended to the next user turn with a [exec output] tag (see §6, "Exec output injection")

cd is intercepted before popen and handled via libc.chdir (FFI) so the working directory change persists across calls — popen forks a subprocess and cd inside it would otherwise be discarded.
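
A sketch of the interception step, assuming a hypothetical `intercept_cd` helper. The `chdir` function is injected so the parsing logic stands alone; under LuaJIT it would wrap the POSIX `int chdir(const char *path)` call declared through ffi.cdef. The sketch deliberately ignores `cd -`, `~` expansion, quoting, and compound commands such as `cd foo && make`.

```lua
-- Hypothetical pre-exec hook: returns false if `cmd` is not a cd command,
-- otherwise true plus whatever the injected chdir returned.
local function intercept_cd(cmd, chdir, home)
    local rest = cmd:match("^cd%s*$") and ""
        or cmd:match("^cd%s+(.-)%s*$")
    if not rest then
        return false              -- not a cd command; fall through to exec
    end
    -- Bare "cd" goes to the home directory, as in a normal shell.
    local target = (rest == "") and home or rest
    local ok, err = chdir(target)
    return true, ok, err
end

-- Under LuaJIT the injected function might look like:
--   ffi.cdef[[ int chdir(const char *path); ]]
--   local function chdir(path) return ffi.C.chdir(path) == 0 end
```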


8. Context Management (Phase 0)

In-memory only. No disk I/O in Phase 0.

-- context.lua Phase 0 shape
Context = {
    system_prompt = "...",
    turns = {},           -- ordered list of {role, content}
    max_turns = 40,       -- sliding window; oldest non-system turns evicted
    token_budget = 4096,  -- soft limit; rough char/4 estimate
}

Token budget enforcement is approximate in Phase 0 (character count / 4). Accurate tokenization is a Phase 3 concern.

When max_turns is reached, the oldest two turns (one user + one assistant) are evicted automatically, without a confirmation prompt. The user is notified with a status line: [context] oldest 2 turns evicted.
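
The sliding window and the char/4 estimate might look like this in code. The function names are hypothetical, and the sketch assumes the window is checked after each append; the token estimate is only computed here, since the budget is described as a soft limit.

```lua
-- Rough token estimate: total characters across turns divided by 4.
local function estimate_tokens(turns)
    local chars = 0
    for _, turn in ipairs(turns) do
        chars = chars + #turn.content
    end
    return math.floor(chars / 4)
end

-- Append a turn; evict the oldest user/assistant pair when over the window.
-- Returns the status line to display, or nil if nothing was evicted.
local function add_turn(ctx, role, content)
    ctx.turns[#ctx.turns + 1] = { role = role, content = content }
    if #ctx.turns > ctx.max_turns then
        table.remove(ctx.turns, 1)   -- oldest user turn
        table.remove(ctx.turns, 1)   -- its assistant reply
        return "[context] oldest 2 turns evicted"
    end
    return nil
end
```

Evicting in pairs keeps the remaining history starting on a user turn, which matters for the chat templates that enforce role alternation.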


9. readline Integration (Phase 0)

Minimal FFI binding wrapping three calls:

ffi.cdef[[
    char *readline(const char *prompt);
    void  add_history(const char *line);
    void  free(void *ptr);
]]

  • add_history called for every non-empty input line
  • Arrow keys and Ctrl-R reverse search work automatically via readline
  • Custom key bindings (Ctrl-N for Norris mode etc.) deferred to Phase 1

Prompt format:

[aish:fast]>

Where fast is the active model name as defined in config.lua. In Norris mode (Phase 3) this becomes:

[aish:fast ⚡]>
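
Both prompt forms could come from one small renderer. This is a sketch: `render_prompt` is a hypothetical name, and the trailing space after `>` is an assumption about how the prompt is passed to readline.

```lua
-- Hypothetical prompt renderer for the two formats shown above.
local function render_prompt(model_name, norris)
    local marker = norris and " ⚡" or ""
    return "[aish:" .. model_name .. marker .. "]> "
end
```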

10. Configuration (Phase 0)

config.lua is loaded at startup with dofile. It returns a plain Lua table.

-- config.lua example
return {
    default_model = "fast",

    models = {
        fast = {
            endpoint = "http://localhost:8080",
            model    = "qwen2.5-coder-1.5b-q8",
            temperature = 0.2,
        },
        deep = {
            endpoint = "http://localhost:8081",
            model    = "qwen2.5-coder-32b-q4",
            temperature = 0.1,
        },
        cloud = {
            endpoint  = "https://api.openai.com",
            model     = "gpt-4o",
            key_env   = "OPENAI_API_KEY",   -- read from environment at startup
            temperature = 0.2,
        },
    },

    shell = {
        known_commands = {
            "ls", "cat", "cd", "grep", "find", "cp", "mv", "rm",
            "mkdir", "rmdir", "git", "make", "cmake", "gcc", "clang",
            "python3", "luajit", "ssh", "scp", "curl", "wget",
        },
        capture_output = true,    -- inject exec output into context
        confirm_cmd    = true,    -- prompt before executing CMD: suggestions
    },

    context = {
        max_turns   = 40,
        token_budget = 4096,
    },

    history = {
        dir = (os.getenv("HOME") or ".") .. "/.local/share/aish",  -- falls back to cwd if HOME is unset
    },
}

Config path resolution order:

  1. --config <path> CLI argument (explicit; failure if not openable, no fallback)
  2. $AISH_CONFIG environment variable
  3. ~/.config/aish/config.lua
  4. ./config.lua (development fallback)
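
The resolution order can be sketched as a single function. The names `resolve_config` and `file_readable` are hypothetical, and the environment lookup and readability check are passed in as parameters. One assumption is labeled in the code: only --config is stated to have no fallback, so an unreadable $AISH_CONFIG is treated here as falling through to the next candidate.

```lua
-- Hypothetical resolver: returns a path, or nil plus an error message.
local function resolve_config(cli_path, getenv, readable)
    -- 1. Explicit --config: fail hard if it cannot be opened.
    if cli_path then
        if not readable(cli_path) then
            return nil, "cannot open --config " .. cli_path
        end
        return cli_path
    end
    local candidates = {}
    -- 2. $AISH_CONFIG (assumption: falls through if unreadable)
    local env = getenv("AISH_CONFIG")
    if env and env ~= "" then
        candidates[#candidates + 1] = env
    end
    -- 3. ~/.config/aish/config.lua
    local home = getenv("HOME")
    if home then
        candidates[#candidates + 1] = home .. "/.config/aish/config.lua"
    end
    -- 4. ./config.lua (development fallback)
    candidates[#candidates + 1] = "./config.lua"
    for _, path in ipairs(candidates) do
        if readable(path) then
            return path
        end
    end
    return nil, "no config file found"
end

-- A readability check suitable for passing as `readable`.
local function file_readable(path)
    local f = io.open(path, "r")
    if f then
        f:close()
        return true
    end
    return false
end
```

At startup this would be called roughly as `resolve_config(cli_arg, os.getenv, file_readable)`, with the result handed to dofile.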

Cwd-relative module resolution. Phase 0 prepends ./?.lua;./vendor/?.lua to package.path, so luajit main.lua must be invoked with the repo root as cwd. Cwd-independent resolution (relative to the script's own directory) lands later — likely Phase 1 alongside the install path work, or whenever the first user reports trying luajit ~/aish/main.lua from somewhere else.


11. Planned Phase Sequence

| Phase | Key additions |
|---|---|
| 0 | Blocking REPL, io.popen exec, single model, in-memory context, meta commands |
| 1 | SSE streaming via libcurl FFI, PTY via forkpty FFI, session persistence (sessions/*.jsonl), readline custom bindings |
| 2 | MCP client (mcp.lua): tool-calling via OpenAI-compatible tools field on /v1/chat/completions; MCP JSON-RPC 2.0 over HTTP/SSE transport (target: lmcp); tool-result turns in context; per-server config + runtime :mcp meta commands; system prompt rewrite to declare the tools schema (replaces or augments §6's CMD: contract; see Q6); safety.lua extended to gate tool calls (see Q8) |
| 3 | Chuck Norris autonomous mode, destructive op heuristic (static + model), HALT/confirm gate, planning loop (now able to use MCP tools as well as CMD: lines) |
| 4 | memory.jsonl summarization, startup context injection from memory, :history management, pruning |
| 5 | Multi-model routing by task type, cloud fallback, context summarization via fast model on eviction |
| 6 | Tree-sitter syntax highlighting hooks, diff-aware code injection, project-level context (file tree summary) |
| 7 | Cost / usage observability: broker captures usage + cost; per-session accumulator on ctx; :cost reporter; optional warn thresholds |

12. Out of Scope (All Phases)

  • aish does not manage llama.cpp server lifecycle
  • aish does not implement its own model inference
  • aish is not a multiplexer (no tmux-style window management)
  • aish does not provide a GUI or web interface
  • aish does not sandbox executed commands at the OS level (no namespaces, no seccomp)

Security posture: aish trusts the local user. The destructive-op gate in Norris mode is a workflow safeguard, not a security boundary.


13. Open Questions (Tracked)

| # | Question | Impact | Target Phase |
|---|---|---|---|
| Q1 | Token counting: use model's /tokenize endpoint or keep char/4 heuristic? | Context eviction accuracy | Phase 3 |
| Q2 | Norris mode: should the planner emit a numbered step list and track progress, or re-plan after each step? | Loop structure in safety.lua | Phase 3 |
| Q3 | Summarization at session end: automatic on :quit, or explicit :save? | UX + history.lua API | Phase 4 |
| Q4 | Should CMD: extraction support multi-command blocks (here-doc style)? | executor.lua parser | Phase 1 |
| Q5 | Cloud model routing: explicit :model cloud only, or automatic fallback on local timeout? | router.lua policy | Phase 5 |
| Q6 | How do CMD: extraction (Phase 0) and MCP tool-calls (Phase 2) coexist: both, prefer tools, or retire CMD:? Note: choosing "retire CMD:" requires a §3 invariant amendment in the same commit, not just a Phase 2 internal call. | broker.lua + executor.lua + system prompt + (§3 if retiring) | Phase 2 |
| Q7 | MCP server discovery: declared in config.lua only, runtime :mcp connect <url>, or both? | config.lua schema + repl.lua meta set | Phase 2 |
| Q8 | Tool-call authorization gate: per-call confirm (like confirm_cmd), per-tool policy in config, or trust-list by server? | safety.lua + mcp.lua + Norris-mode interaction | Phase 2 (informs Phase 3) |
| Q9 | MCP system-prompt augmentation locus: static block in broker.lua, assembled per-request from connected servers' tool schemas, or hybrid (static frame + dynamic tool list)? Per-request assembly costs tokens on every turn; static drifts from server reality; hybrid splits the cost. | broker.lua + mcp.lua + system prompt | Phase 2 |
| Q10 | Tool-call streaming vs the Phase 1 SSE substrate: does Phase 2 land tool calls on the still-blocking Phase 0 broker (and refit when SSE arrives in Phase 1), or require Phase 1 SSE to land first so tool-call deltas stream from day one? Phase ordering implication either way. | broker.lua + mcp.lua + phase ordering | Phase 2 (informs Phase 1 ordering) |

End of Phase 0 Manifest — aish