marfrit 3bad07b2da docs/PHASE7: formulate — cost / usage observability

Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).

Four pillars:

  1. Usage capture in broker.chat_stream — extract `usage` from the
     final SSE chunk (OpenAI streaming spec with `stream_options:
     {include_usage: true}`). Surface via new on_delta("usage",
     payload) kind. broker.chat returns (text, usage) — backward-
     compat: existing callers ignore the second value.

  2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
     tables (categories: main / delegate / summarize / memory_summarize
     / probe / norris, tagged at the call site via opts.category).
     :reset preserves usage_totals (R8 parity with memory_items /
     project). Session JSONL gains an optional `usage` field on
     assistant turns for after-the-fact analysis.

  3. :cost meta surface — :cost (summary), :cost detail (per-model +
     per-category breakdown), :cost reset (zero the meter). Pure-Lua
     read of ctx.usage_totals; no broker calls.

  4. Optional warn thresholds — cfg.cost.warn_at_dollars /
     warn_at_tokens emit a one-shot status when crossed. Default off;
     useful with cloud presets configured.

Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.

Open at formulate:
  Q-C1 — provider-without-usage handling (local llama.cpp probably)
  Q-C2 — cross-session persistence (defer to phase 8)
  Q-C3 — categories closed-set vs free-form
  Q-C4 — does hossenfelder forward stream_options to all backends?
  Q-C5 — warn fires on the call that crosses, or the next one?
  Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?

Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 22:47:58 +00:00

16 KiB


aish — Phase 0 Manifest

Project: aish (AI-augmented conversational shell)
Document: Phase 0 Requirements, Architecture & Design Decisions
Status: Pre-implementation
Date: 2026-05-10

1. Project Vision

aish is a conversational shell that presents a unified REPL to the user, backed by one or more language models accessed through a llama.cpp broker. Its purpose is to support interactive command execution, code assistance, debugging, and re-engineering tasks from a single terminal interface. The model layer is transparent to the user but configurable and extensible.

aish is not a wrapper around an existing shell. It is a first-class interactive environment that composes shell execution, AI inference, context management, and session memory into a coherent workflow.

2. Scope of Phase 0

Phase 0 is the minimal working skeleton. It establishes the REPL loop, input dispatch, model communication, and basic meta-command handling. No streaming, no autonomous mode, no persistence, no PTY.

Phase 0 is done when:

The user can type natural language and receive a model response
The user can type a shell command and have it executed with output captured and displayed
The user can type :meta commands to control aish itself
The conversation history is maintained in memory for the session
The codebase structure matches the target layout so later phases slot in without refactoring

3. Technology Decisions

Decision	Choice	Rationale
Implementation language	LuaJIT 2.x	Compact, embeddable, FFI eliminates need for C shim layer
FFI strategy	LuaJIT FFI only (no C extension modules)	Direct `ffi.cdef` access to libc, libcurl, readline; no build toolchain required
HTTP client	libcurl via FFI	Supports SSE streaming (Phase 1+); available on all target platforms
Terminal input	GNU readline via FFI	Full readline semantics, custom key bindings, history, without dependency on a Lua readline package
Model backend	llama.cpp OpenAI-compatible server (`/v1/chat/completions`)	Externally managed; aish does not own the process lifecycle
Shell execution	`io.popen` in Phase 0; `forkpty` via libc FFI from Phase 1	`popen` sufficient for non-interactive commands; PTY required for vim, htop, etc.
Session persistence	Deferred to Phase 1	Phase 0 holds history in memory only
Config format	Lua table (plain `.lua` file sourced at startup)	No parser dependency; native types; easily extended
JSON encode/decode	dkjson 2.8 vendored under `vendor/dkjson.lua`	Pure Lua (preserves §3 "no compiled extensions" invariant); single-file vendor avoids `luarocks`; sourced from Debian's `lua-dkjson` package, originally from dkolf.de

FFI loader fallback. ffi.load("readline") and ffi.load("curl") look for the unversioned lib<name>.so symlink, which is only installed by the -dev package. Phase 0 loaders try the unversioned name first then fall back to versioned sonames (readline.so.8, readline.so.7, curl.so.4, curl-gnutls.so.4) so a runtime-only host (Debian/ALARM without lib<name>-dev) just works.
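The fallback order described above could be sketched roughly as follows. This is a minimal illustration, not the actual Phase 0 loader: `load_first` and its injectable `loader` parameter are hypothetical names, and the candidate lists are simply the names from the paragraph above. LuaJIT's `ffi.load` prepends `lib` to names that contain no slash, so `readline.so.8` resolves to `libreadline.so.8`.

```lua
-- Hypothetical helper: try each candidate name in order and return the
-- first library that loads. `loader` is injectable so the logic can be
-- exercised without the libraries installed; it defaults to ffi.load.
local function load_first(candidates, loader)
    if not loader then
        local ok, ffi = pcall(require, "ffi")
        if not ok then
            return nil, "LuaJIT FFI not available"
        end
        loader = ffi.load
    end
    local errors = {}
    for _, name in ipairs(candidates) do
        -- ffi.load raises on failure, so each attempt runs under pcall.
        local ok, lib = pcall(loader, name)
        if ok then
            return lib, name
        end
        errors[#errors + 1] = name .. ": " .. tostring(lib)
    end
    return nil, "no candidate loaded:\n  " .. table.concat(errors, "\n  ")
end

-- Unversioned name first, then the versioned sonames listed above.
local READLINE_NAMES = { "readline", "readline.so.8", "readline.so.7" }
local CURL_NAMES     = { "curl", "curl.so.4", "curl-gnutls.so.4" }
```

Under LuaJIT, `local rl, used = load_first(READLINE_NAMES)` would return the library handle plus the name that succeeded, which is handy for a startup diagnostic line.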

4. Target Directory Layout

aish/
├── main.lua              # Entry point: arg parsing, config load, REPL start
├── repl.lua              # Readline loop, input dispatch, prompt rendering
├── broker.lua            # llama.cpp HTTP client; Phase 0: blocking POST
├── mcp.lua               # MCP JSON-RPC 2.0 client (Phase 2; added 2026-05-12)
├── router.lua            # Task classifier: shell / AI / meta
├── executor.lua          # Command execution; Phase 0: io.popen
├── context.lua           # In-memory conversation history, token budget
├── history.lua           # Persistent session log + memory.jsonl (Phase 1)
├── safety.lua            # Destructive op heuristic, Chuck Norris gate (Phase 3)
├── renderer.lua          # Output formatting, ANSI sequences
├── config.lua            # Model registry, routing rules, user preferences
└── ffi/
    ├── curl.lua          # libcurl easy interface binding
    ├── readline.lua      # GNU readline binding
    ├── pty.lua           # forkpty, openpty, waitpid (Phase 1)
    └── libc.lua          # Shared: errno, signal, write, read, misc

All modules are required explicitly from main.lua. No module autoloading. File names are stable across phases — later phases fill in bodies, not rename files. Adding new files is permitted and additive (e.g. mcp.lua was inserted at Phase 2 per docs/PHASE2.md §9); the rename prohibition is what keeps cross-phase wiring stable.

5. Input Dispatch Model

Every line of user input is classified by router.lua before any action is taken.

input
  ├── :command [args]    →  meta handler (repl.lua)
  ├── $ prefix or       →  shell executor (executor.lua)
  │   heuristic match
  └── everything else   →  AI broker (broker.lua)

5.1 Shell heuristic

An input line is treated as a shell command if it:

Begins with $ (explicit override, prefix stripped before exec)
Matches a known command prefix from a configurable allowlist (e.g. ls, cd, git, make, grep, cat, find, cp, mv, mkdir)
Contains a bare path-like token as first word (./foo, /usr/bin/bar)

Everything else goes to the model. The user can always force routing with $ or with :exec.
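As a rough sketch of the classification order in §5 and §5.1, assuming a hypothetical `classify` function (the real router.lua API is not specified in this document):

```lua
-- Hypothetical sketch of router classification: meta, then shell, then AI.
local function make_set(list)
    local set = {}
    for _, name in ipairs(list) do set[name] = true end
    return set
end

local function classify(line, known_commands)
    local known = make_set(known_commands)
    -- ":command [args]" goes to the meta handler.
    local meta, meta_args = line:match("^:(%S+)%s*(.*)$")
    if meta then
        return "meta", meta, meta_args
    end
    -- "$" is the explicit override; the prefix is stripped before exec.
    local forced = line:match("^%$%s*(.+)$")
    if forced then
        return "shell", forced
    end
    local first = line:match("^%s*(%S+)")
    if first then
        -- Known command prefix from the configurable allowlist.
        if known[first] then
            return "shell", line
        end
        -- Bare path-like first word: ./foo or /usr/bin/bar.
        if first:match("^%./") or first:match("^/") then
            return "shell", line
        end
    end
    -- Everything else goes to the model.
    return "ai", line
end
```

In this sketch a sentence that happens to start with an allowlisted word (for example "find the bug in main.lua") is routed to the shell, which is the case the :ask override exists for.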

5.2 Meta commands (Phase 0 set)

Command	Action
`:quit` / `:q`	Exit aish
`:clear`	Clear screen and reset display; keep history
`:reset`	Clear in-memory conversation history
`:model <name>`	Switch active model (must exist in config)
`:models`	List configured models and active selection
`:history`	Print conversation turns for current session
`:exec <cmd>`	Force shell execution regardless of heuristic
`:ask <text>`	Force AI query regardless of heuristic
`:help`	Print meta command reference

6. Broker Interface (Phase 0)

Phase 0 uses a single blocking HTTP POST. libcurl FFI is wired but SSE streaming is not yet consumed — the response is read to completion and returned as a string.

Request shape

POST http://<endpoint>/v1/chat/completions
Content-Type: application/json

{
  "model": "<model_id>",
  "messages": [ <conversation history> ],
  "stream": false,
  "temperature": 0.2
}

Conversation history format

Each turn is stored in context.lua as:

{ role = "system" | "user" | "assistant", content = "..." }

The system prompt is prepended on every request and is not stored as a history turn.
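A minimal sketch of how the request body might be assembled from the stored turns, with the system prompt prepended per request rather than stored. `build_request` is a hypothetical name; the field names follow the request shape above, and the resulting `body` table would still need to be JSON-encoded (for example with the vendored dkjson) before the POST.

```lua
-- Hypothetical helper: build the URL and body table for one request.
-- The stored history (`turns`) is copied, never mutated, so the system
-- prompt does not end up as a history turn.
local function build_request(model_cfg, system_prompt, turns)
    local messages = { { role = "system", content = system_prompt } }
    for _, turn in ipairs(turns) do
        messages[#messages + 1] = { role = turn.role, content = turn.content }
    end
    return {
        url = model_cfg.endpoint .. "/v1/chat/completions",
        body = {
            model = model_cfg.model,
            messages = messages,
            stream = false, -- Phase 0: blocking POST, no SSE
            temperature = model_cfg.temperature or 0.2,
        },
    }
end
```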

Exec output injection. Captured shell-exec output is not appended as its own user turn, because that produces two consecutive user turns, which strict chat templates (e.g. mistral-nemo-instruct's Jinja) reject with the error "roles must alternate user/assistant/...". Instead, exec output is buffered on the context and prepended to the next user turn with an [exec output] tag. Output from multiple shell calls between AI turns is concatenated. :reset clears the buffer. The user-visible behavior is unchanged; only the role alternation seen by the broker differs.
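The buffering behavior can be illustrated with a small sketch. The method names (`add_exec_output`, `add_user`, `reset`) and the exact layout of the tagged text are assumptions made for illustration; only the buffer-then-prepend behavior comes from the paragraph above.

```lua
-- Hypothetical context object that buffers exec output between AI turns.
local Context = {}
Context.__index = Context

local function new_context()
    return setmetatable({ turns = {}, pending_exec = {} }, Context)
end

-- Buffer captured output instead of creating a user turn for it.
function Context:add_exec_output(output)
    self.pending_exec[#self.pending_exec + 1] = "[exec output]\n" .. output
end

-- Prepend any buffered output to the next user turn, then clear the buffer.
function Context:add_user(text)
    local content = text
    if #self.pending_exec > 0 then
        content = table.concat(self.pending_exec, "\n") .. "\n" .. text
        self.pending_exec = {}
    end
    self.turns[#self.turns + 1] = { role = "user", content = content }
end

function Context:add_assistant(text)
    self.turns[#self.turns + 1] = { role = "assistant", content = text }
end

-- :reset clears both the history and the pending exec buffer.
function Context:reset()
    self.turns = {}
    self.pending_exec = {}
end
```

With this shape, two shell calls followed by one question yield a single user turn, so the roles seen by the broker keep alternating.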

System prompt (Phase 0 default)

You are aish, an AI-augmented shell assistant. You help the user execute shell
commands, write and debug code, and re-engineer software. When suggesting shell
commands, output them on a line beginning with exactly "CMD: " so aish can
identify and optionally execute them. Be concise. Prefer concrete actions over
explanations unless asked.

The CMD: prefix convention is the extraction contract between the model and executor.lua. Phase 0 presents CMD lines with a confirmation prompt before execution.
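A minimal sketch of the extraction side of this contract, assuming a hypothetical `extract_cmds` helper. It only collects lines that begin with exactly "CMD: "; the confirmation prompt and the hand-off to the executor are left out.

```lua
-- Hypothetical helper: return the commands suggested in a model response.
-- Only lines beginning with exactly "CMD: " count; indented or mid-line
-- occurrences are ignored.
local function extract_cmds(response)
    local cmds = {}
    -- Append a newline so the final line is also visited; tolerate CRLF.
    for line in (response .. "\n"):gmatch("(.-)\r?\n") do
        local cmd = line:match("^CMD: (.+)$")
        if cmd then
            cmds[#cmds + 1] = cmd
        end
    end
    return cmds
end
```

For a response such as "Try this:\nCMD: ls -la", the helper returns a one-element list containing "ls -la".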

7. Execution Model (Phase 0)

-- executor.lua Phase 0 (illustrative — see note below)
local function exec(cmd)
    local handle = io.popen(cmd .. " 2>&1", "r")
    local output = handle:read("*a")
    local ok, _, code = handle:close()
    return output, code
end

Superseded by Phase 1. The §7 sketch was never accurate on LuaJIT 2.1, which follows Lua 5.1 behavior for io.popen():close() and returns only true, with no exit status. The Phase 0 implementation worked around this with a sentinel-echo wrapper ((cmd) 2>&1; echo __AISH_EXIT_<tag>__$?) and parsed the status back out of stdout. Phase 1 retired the workaround entirely: executor.lua now spawns the child via forkpty and recovers the exit status via waitpid and WEXITSTATUS. See docs/PHASE1.md §5 for the current PTY model.
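For reference, the sentinel-echo workaround can be sketched as below. This is a reconstruction under assumptions, not the retired Phase 0 code: the helper names and the random tag format are hypothetical; only the wrapper string itself comes from the note above.

```lua
-- Wrap a command so its exit status is echoed after its output.
local function wrap_with_sentinel(cmd, tag)
    return "(" .. cmd .. ") 2>&1; echo __AISH_EXIT_" .. tag .. "__$?"
end

-- Split raw popen output into (command output, exit status).
-- The last occurrence of the marker is used, searched as plain text.
local function split_status(raw, tag)
    local marker = "__AISH_EXIT_" .. tag .. "__"
    local last_start, last_stop
    local init = 1
    while true do
        local s, e = raw:find(marker, init, true)
        if not s then break end
        last_start, last_stop = s, e
        init = e + 1
    end
    if not last_start then
        return raw, nil -- marker missing: status unknown
    end
    local code = tonumber(raw:sub(last_stop + 1):match("^(%d+)"))
    return raw:sub(1, last_start - 1), code
end

-- Run a command through io.popen and recover its exit status.
local function exec(cmd)
    -- A per-call tag makes accidental collisions with real output unlikely.
    local tag = string.format("%08x", math.random(0, 0x7fffffff))
    local handle, err = io.popen(wrap_with_sentinel(cmd, tag), "r")
    if not handle then
        return nil, nil, err
    end
    local raw = handle:read("*a")
    handle:close()
    return split_status(raw, tag)
end
```

The status travels in-band on stdout, so a command that prints the exact marker could still confuse the parser; recovering the status out-of-band via waitpid, as Phase 1 does, avoids that.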

Output is captured and:

Printed to the terminal
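Injected into context.lua under an [exec output] tag: buffered and prepended to the next user turn rather than stored as its own turn (see §6, Exec output injection)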

cd is intercepted before popen and handled via libc.chdir (FFI) so the working directory change persists across calls — popen forks a subprocess and cd inside it would otherwise be discarded.
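A rough sketch of the cd interception, assuming hypothetical helpers `parse_cd` and `intercept_cd`. The parsing here is deliberately naive (no quoting, no `cd -`), and the `chdir` function is injectable so the logic can run without LuaJIT; the default path uses the libc `chdir` via FFI.

```lua
local has_ffi, ffi = pcall(require, "ffi")
if has_ffi then
    ffi.cdef("int chdir(const char *path);")
end

-- Default chdir: the libc call via LuaJIT FFI.
local function ffi_chdir(path)
    if not has_ffi then
        return false, "LuaJIT FFI not available"
    end
    if ffi.C.chdir(path) ~= 0 then
        return false, "chdir failed (errno " .. ffi.errno() .. ")"
    end
    return true
end

-- Return the target directory if `line` is a cd command, else nil.
local function parse_cd(line, home)
    if line:match("^%s*cd%s*$") then
        return home -- bare "cd" goes home
    end
    local target = line:match("^%s*cd%s+(%S.-)%s*$")
    if not target then
        return nil
    end
    if target == "~" then
        return home
    end
    if target:sub(1, 2) == "~/" then
        return home .. target:sub(2)
    end
    return target
end

-- Returns handled, ok, err. When handled is false the line is not a cd
-- and should go to the normal executor path.
local function intercept_cd(line, home, chdir)
    local target = parse_cd(line, home)
    if not target then
        return false
    end
    local ok, err = (chdir or ffi_chdir)(target)
    return true, ok, err
end
```

Because the directory change happens in the aish process itself, later commands inherit the new working directory.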

8. Context Management (Phase 0)

In-memory only. No disk I/O in Phase 0.

-- context.lua Phase 0 shape
Context = {
    system_prompt = "...",
    turns = {},           -- ordered list of {role, content}
    max_turns = 40,       -- sliding window; oldest non-system turns evicted
    token_budget = 4096,  -- soft limit; rough char/4 estimate
}

Token budget enforcement is approximate in Phase 0 (character count / 4). Accurate tokenization is a Phase 3 concern.

When max_turns is reached, the oldest two turns (one user + one assistant) are evicted automatically, without a confirmation prompt. The user is notified with a status line: [context] oldest 2 turns evicted.
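The sliding window and the char/4 estimate could look roughly like this. The function names are hypothetical, and the sketch assumes eviction triggers once the turn count exceeds max_turns; the document does not say whether "reached" means equal to or greater than the limit.

```lua
-- Rough token estimate: character count divided by four, rounded up.
local function estimate_tokens(text)
    return math.ceil(#text / 4)
end

-- Estimated size of the whole request context (system prompt plus turns).
local function context_tokens(ctx)
    local total = estimate_tokens(ctx.system_prompt)
    for _, turn in ipairs(ctx.turns) do
        total = total + estimate_tokens(turn.content)
    end
    return total
end

-- Append a turn; evict the two oldest turns once the window overflows.
-- Returns a status line when an eviction happened, otherwise nil.
local function add_turn(ctx, role, content)
    ctx.turns[#ctx.turns + 1] = { role = role, content = content }
    if #ctx.turns <= ctx.max_turns then
        return nil
    end
    local evicted = 0
    while evicted < 2 and #ctx.turns > 0 do
        table.remove(ctx.turns, 1)
        evicted = evicted + 1
    end
    return string.format("[context] oldest %d turns evicted", evicted)
end
```

A caller would print the returned status line when it is not nil and could compare `context_tokens(ctx)` against `ctx.token_budget` to treat the budget as the soft limit described above.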

9. readline Integration (Phase 0)

Minimal FFI binding wrapping three calls:

ffi.cdef[[
    char *readline(const char *prompt);
    void  add_history(const char *line);
    void  free(void *ptr);
]]

add_history is called for every non-empty input line
Arrow keys and Ctrl-R reverse search work automatically via readline
Custom key bindings (Ctrl-N for Norris mode etc.) deferred to Phase 1

Prompt format:

[aish:fast]>

Where fast is the active model name as defined in config.lua. In Norris mode (Phase 3) this becomes:

[aish:fast ⚡]>

10. Configuration (Phase 0)

config.lua is loaded at startup with dofile. It returns a plain Lua table.

-- config.lua example
return {
    default_model = "fast",

    models = {
        fast = {
            endpoint = "http://localhost:8080",
            model    = "qwen2.5-coder-1.5b-q8",
            temperature = 0.2,
        },
        deep = {
            endpoint = "http://localhost:8081",
            model    = "qwen2.5-coder-32b-q4",
            temperature = 0.1,
        },
        cloud = {
            endpoint  = "https://api.openai.com",
            model     = "gpt-4o",
            key_env   = "OPENAI_API_KEY",   -- read from environment at startup
            temperature = 0.2,
        },
    },

    shell = {
        known_commands = {
            "ls", "cat", "cd", "grep", "find", "cp", "mv", "rm",
            "mkdir", "rmdir", "git", "make", "cmake", "gcc", "clang",
            "python3", "luajit", "ssh", "scp", "curl", "wget",
        },
        capture_output = true,    -- inject exec output into context
        confirm_cmd    = true,    -- prompt before executing CMD: suggestions
    },

    context = {
        max_turns   = 40,
        token_budget = 4096,
    },

    history = {
        dir = (os.getenv("HOME") or ".") .. "/.local/share/aish",  -- falls back to cwd if HOME is unset
    },
}

Config path resolution order:

--config <path> CLI argument (explicit; failure if not openable, no fallback)
$AISH_CONFIG environment variable
~/.config/aish/config.lua
./config.lua (development fallback)
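The resolution order above might be implemented along these lines. `resolve_config` is a hypothetical name, and the `env` and `readable` parameters are injected only to keep the sketch testable. It assumes that an unreadable $AISH_CONFIG falls through to the next candidate, since only the CLI argument is documented as having no fallback.

```lua
-- Default readability check: try to open the file for reading.
local function file_readable(path)
    local f = io.open(path, "r")
    if f then
        f:close()
        return true
    end
    return false
end

-- Return the config path to load, or nil plus an error message.
local function resolve_config(cli_path, env, readable)
    env = env or os.getenv
    readable = readable or file_readable
    -- 1. --config <path>: explicit, so failure is an error with no fallback.
    if cli_path then
        if readable(cli_path) then
            return cli_path
        end
        return nil, "cannot open --config path: " .. cli_path
    end
    local candidates = {}
    -- 2. $AISH_CONFIG
    local from_env = env("AISH_CONFIG")
    if from_env and from_env ~= "" then
        candidates[#candidates + 1] = from_env
    end
    -- 3. ~/.config/aish/config.lua
    local home = env("HOME")
    if home then
        candidates[#candidates + 1] = home .. "/.config/aish/config.lua"
    end
    -- 4. ./config.lua (development fallback)
    candidates[#candidates + 1] = "./config.lua"
    for _, path in ipairs(candidates) do
        if readable(path) then
            return path
        end
    end
    return nil, "no config file found"
end
```

The returned path would then be handed to `dofile`, as described at the top of this section.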

Cwd-relative module resolution. Phase 0 prepends ./?.lua;./vendor/?.lua to package.path, so luajit main.lua must be invoked with the repo root as cwd. Cwd-independent resolution (relative to the script's own directory) lands later — likely Phase 1 alongside the install path work, or whenever the first user reports trying luajit ~/aish/main.lua from somewhere else.

11. Planned Phase Sequence

Phase	Key additions
0	Blocking REPL, `io.popen` exec, single model, in-memory context, meta commands
1	SSE streaming via libcurl FFI, PTY via `forkpty` FFI, session persistence (`sessions/*.jsonl`), readline custom bindings
2	MCP client (`mcp.lua`): tool-calling via OpenAI-compatible `tools` field on `/v1/chat/completions`; MCP JSON-RPC 2.0 over HTTP/SSE transport (target: lmcp); tool-result turns in context; per-server config + runtime `:mcp` meta commands; system prompt rewrite to declare the tools schema (replaces or augments §6's `CMD:` contract — see Q6); `safety.lua` extended to gate tool calls (see Q8)
3	Chuck Norris autonomous mode, destructive op heuristic (static + model), HALT/confirm gate, planning loop (now able to use MCP tools as well as `CMD:` lines)
4	`memory.jsonl` summarization, startup context injection from memory, `:history` management, pruning
5	Multi-model routing by task type, cloud fallback, context summarization via fast model on eviction
6	Tree-sitter syntax highlighting hooks, diff-aware code injection, project-level context (file tree summary)
7	Cost / usage observability: broker captures `usage` + `cost`; per-session accumulator on ctx; `:cost` reporter; optional warn thresholds

12. Out of Scope (All Phases)

aish does not manage llama.cpp server lifecycle
aish does not implement its own model inference
aish is not a multiplexer (no tmux-style window management)
aish does not provide a GUI or web interface
aish does not sandbox executed commands at the OS level (no namespaces, no seccomp)

Security posture: aish trusts the local user. The destructive-op gate in Norris mode is a workflow safeguard, not a security boundary.

13. Open Questions (Tracked)

#	Question	Impact	Target Phase
Q1	Token counting: use model's `/tokenize` endpoint or keep char/4 heuristic?	Context eviction accuracy	Phase 3
Q2	Norris mode: should the planner emit a numbered step list and track progress, or re-plan after each step?	Loop structure in safety.lua	Phase 3
Q3	Summarization at session end: automatic on `:quit`, or explicit `:save`?	UX + history.lua API	Phase 4
Q4	Should `CMD:` extraction support multi-command blocks (here-doc style)?	executor.lua parser	Phase 1
Q5	Cloud model routing: explicit `:model cloud` only, or automatic fallback on local timeout?	router.lua policy	Phase 5
Q6	How do `CMD:` extraction (Phase 0) and MCP tool-calls (Phase 2) coexist — both, prefer tools, retire `CMD:`? Note: choosing "retire `CMD:`" requires a §3 invariant amendment in the same commit, not just a Phase 2 internal call.	broker.lua + executor.lua + system prompt + (§3 if retiring)	Phase 2
Q7	MCP server discovery: declared in `config.lua` only, runtime `:mcp connect <url>`, or both?	config.lua schema + repl.lua meta set	Phase 2
Q8	Tool-call authorization gate: per-call confirm (like `confirm_cmd`), per-tool policy in config, or trust-list by server?	safety.lua + mcp.lua + Norris-mode interaction	Phase 2 (informs Phase 3)
Q9	MCP system-prompt augmentation locus: static block in `broker.lua`, assembled per-request from connected servers' tool schemas, or hybrid (static frame + dynamic tool list)? Per-request assembly costs tokens on every turn; static drifts from server reality; hybrid splits the cost.	broker.lua + mcp.lua + system prompt	Phase 2
Q10	Tool-call streaming vs the Phase 1 SSE substrate: does Phase 2 land tool calls on the still-blocking Phase 0 broker (and refit when SSE arrives in Phase 1), or require Phase 1 SSE to land first so tool-call deltas stream from day one? Phase ordering implication either way.	broker.lua + mcp.lua + phase ordering	Phase 2 (informs Phase 1 ordering)

End of Phase 0 Manifest — aish
