# aish — Phase 6 Manifest

**Project:** aish — AI-augmented conversational shell
**Document:** Phase 6 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; current tree at `f596743` probed)
**Date:** 2026-05-16

**Analyze findings (2026-05-16):**

A1. **renderer.lua surface clean** — `assistant_delta(chunk)` already concatenates into a `stream_buf` then `emit()`s the chunk; `assistant_flush()` finalizes with a trailing newline if missing. The fence-aware highlight filter slots in between chunk receipt and `emit` without restructuring; no callers besides `repl.lua` touch `stream_buf` so the filter state can live alongside it.

A2. **executor surface clean** — `executor.exec(cmd)` already forkpty-spawns, captures + live-streams output, returns `(out, code)`. Phase 6's `:diff` and `_scan_project_tree` reuse this path verbatim; no new IO model. `git`-rooted commands inherit cwd from the parent (which `libc.chdir` already mutates), so a `:diff` after `cd` reads the right repo.

A3. **context composition order locked** — current `to_messages` builds `sys_content = base + [background] + [earlier summary] + NORRIS suffix`. Phase 6 inserts `[project]` between `[background]` and `[earlier summary]`. Same Norris-suppression guard already in place (`if not self.norris_active`).

A4. **Q-H1 RESOLVED: tmpfile roundtrip** for `tree-sitter highlight` write+read. Avoids ARGMAX risk on large code blocks (vs `printf BODY | tree-sitter ...`) and shell-escape complexity. Two file handles, deterministic cleanup via `os.remove`. Sketch:

```lua
local tmp = os.tmpname()
local w = io.popen(("tree-sitter highlight --lang %s > %s")
  :format(lang, tmp), "w")
w:write(body); local ok, _, code = w:close()
local f = io.open(tmp, "rb"); local out = f:read("*a"); f:close()
os.remove(tmp)
-- pass-through on failure; close() reports the exit code only on
-- Lua 5.2-compatible builds, so empty output also counts as failure
if not ok or (code and code ~= 0) or out == "" then return body end
return out
```

A5. **Q-D1 RESOLVED: no confirm gate on `:diff`.** `git diff` is read-only; matches `:history`, `:sessions`, `:safety check` — none of which gate. Permission DSL (#9) only applies to AI-suggested `CMD:` lines, not user-issued metas.

A6. **Q-D2 RESOLVED: tiered resolution for `@`.** The mention parser tries the token as a file path first; if it doesn't resolve AND the token contains `..`, retry as a diff range. This keeps `@../sibling.txt` (path) working AND allows `@origin/main..feature` (ref range — resolves via second attempt since no such file exists). No grammar prefix needed.

A7. **Q-H2 RESOLVED: highlighting is assistant-output only in v1.** `expand_mentions` content lands in the user-turn payload — visible on the terminal via readline echo, not via `assistant_delta`. Filing "highlight @path-expanded code in echo" as v2 polish. Reason: intercepting readline echo for ANSI injection is non-trivial and orthogonal to the stream filter.

A8. **Q-T1 RESOLVED: project tree captured at scan time, not auto-refreshed on cd.** `cd /other-project` leaves the existing `ctx.project` stale; `:tree refresh` is the manual verb to update. Auto-refresh on cd intercept is a v2 polish (the cd interceptor in `executor.maybe_chdir` is a clean hook for it).

A9. **Q-T2 RESOLVED: rely on `.gitignore` via `git ls-files`** in repos; fall back to `find` with simple excludes outside. Custom include/exclude glob lists deferred to v2. Reason: most users live inside git repos; `.gitignore` already encodes their notion of "noise". Out-of-repo users get the simple fallback and can scope via `:tree `.

A10. **`expand_mentions` punct-peel does NOT strip `/`** — so `@HEAD~1..HEAD,` peels the `,` and the underlying token `HEAD~1..HEAD` has no slash; the path-then-diff retry from A6 catches it. No new peel logic needed.

A11. **Auto-injection ordering for `[project]`** — if both `cfg.memory.inject_max_chars` and `cfg.project.auto_tree` fire at startup, the order is: memory load → tree scan → first ask_ai. The composition in `to_messages` places `[background]` (memory) before `[project]` so the model reads memory facts before file tree. Documented in §3.

A12. **Norris interaction** — `[project]` block follows the established [background]/[earlier summary] suppression rule under `ctx.norris_active`. Planner stays on its goal anchor; the tree can be re-introduced via the goal text if needed. Matches R-C1/R-C4.

PHASE0 is the locked substrate; PHASE1-5 are layered on top. This manifest specifies what Phase 6 adds — **tree-sitter syntax highlighting hooks**, **diff-aware code injection**, and **project-level context (file-tree summary)**.

---

## 1. Scope of Phase 6

Three pillars per PHASE0.md §11 row 6:

1. **Tree-sitter syntax highlighting hooks** — when an external `tree-sitter` CLI is detected at startup, assistant code-fence content is filtered through it for ANSI-colorized display. Plain prose streams unchanged. When the CLI is absent, the filter is the identity function (zero overhead, zero hard dependency). Toggleable at runtime with `:highlight on|off`. Default off until the user opts in (don't surprise existing users with a display change).
2. **Diff-aware code injection** — surface git diffs as first-class context. Two entry points:
   - Meta verb: `:diff [args]` runs `git diff ` from cwd, appends output to context as exec-output. `:diff staged`, `:diff HEAD~3`, `:diff main..feature` all delegate to git's argument grammar.
   - @-mention extension: `@HEAD..feature` (a ref-range expression anywhere a `@path` would go) expands inline as a fenced `diff` block, mirroring how `@README.md` already works.
3. **Project-level context (file-tree summary)** — `git ls-files`-based tree summary of the cwd, injected as a `[project]` block in the system prompt. Two entry points:
   - Meta verb: `:tree [depth]` injects on demand; `:tree refresh` re-scans.
   - Auto-inject at startup when `cfg.project.auto_tree = true` — gated like memory injection so existing configs don't change behavior.
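The `[project]` placement from pillar 3, together with the ordering in A3/A11 and the Norris suppression in A12, can be sketched as follows. This is a minimal illustration, not the real `context.lua`: the function name `compose_system` and the fields `base`, `background`, `summary`, and `norris_suffix` are hypothetical, and appending the suffix only while Norris is active is an assumption.

```lua
-- Sketch of the Phase 6 system-prompt ordering. Field names on `ctx`
-- (base, background, project, summary, norris_suffix) are illustrative
-- assumptions, not the real context.lua layout.
local function compose_system(ctx)
  local parts = { ctx.base }
  local function add(tag, text)
    if text and text ~= "" then
      parts[#parts + 1] = ("[%s]\n%s"):format(tag, text)
    end
  end
  if not ctx.norris_active then
    add("background", ctx.background)       -- memory facts first
    add("project", ctx.project)             -- new in Phase 6
    add("earlier summary", ctx.summary)
  elseif ctx.norris_suffix then
    parts[#parts + 1] = ctx.norris_suffix   -- assumed: suffix only under Norris
  end
  return table.concat(parts, "\n\n")
end

local normal = compose_system({
  base = "You are aish.", background = "user prefers lua",
  project = "README.md\nrepl.lua", summary = "earlier: fixed a bug",
})
local norris = compose_system({
  base = "You are aish.", project = "README.md", norris_active = true,
  norris_suffix = "GOAL: ship it",
})
```

In `normal` the three blocks appear in the order background, project, earlier summary; in `norris` the project block is dropped even though `project` is set.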
**Phase 6 is done when:**

- With `tree-sitter` CLI installed and `:highlight on`, the assistant reply `` ```py\nprint("hi")\n``` `` shows up with ANSI colors. Without the CLI, `:highlight on` is a no-op + emits a status warning.
- `:diff` from a dirty git repo shows the working-tree diff in the exec-output frame; the model sees it on the next ask_ai turn.
- `@HEAD~1..HEAD` in a prompt expands inline to a fenced diff block.
- `:tree` injects a `[project] :` block visible in `ctx:to_messages()` (via the system prompt assembly).
- With `cfg.project.auto_tree = true`, the project block appears on every broker call (subject to `max_chars` cap).
- Existing configs without `cfg.project` and with `:highlight off` (default) behave exactly like Phase 5 (Phase 5 regression coverage).

---

## 2. Technology Decisions (delta from Phase 5)

| Decision | Choice | Rationale |
|---|---|---|
| Highlight backend | External `tree-sitter` CLI (`tree-sitter highlight --lang X`) | Honors PHASE0 §3: no compiled extensions, no luarocks. Detected once at startup; absence → identity filter. Opt-in via `:highlight on` so install-state changes don't break users. |
| Highlight buffering | Accumulate inside fenced code blocks, emit on closing fence; pass-through outside fences | Streaming UX preserved for prose. Code blocks get colorized atomically, accepting a per-block latency (~ block streaming time). Per-chunk highlighting would split a token across `tree-sitter` invocations and corrupt the output. |
| Lang detection | First-line fence info-string (` ```py`, ` ```python`, ` ```lua`) → normalized via small map (py→python, js→javascript, etc.) | The lang tag mirrors the one we already emit in `expand_mentions` (#7). No tag → identity (no highlight). |
| Diff backend | Shell out to `git diff ` via `executor.exec` | Honors substrate (no libgit2 FFI). The existing exec frame handles capture + stream. `git` is universally present where aish makes sense. |
| Diff failure | Bail with status `[aish] :diff failed (not a git repo / bad ref)`; do NOT inject empty output | Avoids polluting context with stale or empty diffs. |
| Tree backend | `git ls-files --cached --others --exclude-standard` when cwd is a git repo, else `find . -type f -not -path './.*'` | Free `.gitignore` honor in repos; sensible default outside. Both are widely portable (the `find` form relies on `-not`, which GNU and BSD `find` accept; strict POSIX spells it `!`). |
| Tree summary form | Sorted relative paths, grouped by directory at depth ≤ `cfg.project.tree_depth` (default 3), truncated by char count `cfg.project.tree_max_chars` (default 4096) | One block, deterministic order, cheap to compute. Matches the [background] memory block convention (Phase 4) so the system prompt's compositional shape stays familiar. |
| Tree injection point | `context.lua`: new `compose_project(...)` adds a `[project]\n` block to the system content, between [background] and [earlier summary] | Same suppression rule as [background]/[earlier summary]: NOT injected during Norris (R-C1 / R-C4 — planner stays on its anchor). |
| Tree refresh policy | One scan at startup if auto; `:tree refresh` to re-scan on demand | Scanning on every ask_ai is wasteful for slow filesystems. Manual refresh is sufficient for v1. |
| @-mention diff syntax | `@..` (two-dot `..` separator) only — recognized via the existing trailing-punct peel logic | Avoids ambiguity with literal paths. `@HEAD` alone is NOT a diff trigger (would collide with files literally named HEAD). |

---

## 3. Module Changes

| File | State after Phase 5 | Phase 6 changes |
|---|---|---|
| `renderer.lua` | `assistant_delta(text)` writes chunks; `assistant_flush()` finalizes | Add fence-aware filter inside the assistant stream. State machine: outside-fence (pass-through) / inside-fence (buffer, emit on close). On close, pipe buffer through `tree-sitter highlight --lang ` (if highlight enabled), emit result. Toggle exposed as `renderer.set_highlight(bool)`. |
| `executor.lua` | `extract_cmd_lines`, `extract_cmd_bg_lines`, `extract_delegate_lines` | No changes. Diff and tree use the existing `exec` path. |
| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` string field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary] (A11: memory facts read before file tree). Suppressed under Norris (A12, parity with R-C1/R-C4). |
| `repl.lua` | meta dispatch + main loop + #13 secrets wiring | New helpers: `_detect_treesitter()` (run once at startup), `_run_git_diff(args)`, `_scan_project_tree(dir, opts)`. New meta: `:highlight`, `:diff`, `:tree`. Extend `expand_mentions` to recognize `..` token shape. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/etc. | Add commented-out `project = { auto_tree = false, tree_depth = 3, tree_max_chars = 4096 }` block. |

No new module files in v1. Three new helpers in `repl.lua` keep the file growing but consolidate the Phase 6 surface. If the highlighter filter grows past ~80 LOC, lift it into `highlight.lua` as a follow-up.

---

## 4. Pillar 1 — Tree-sitter highlighting

### Detection (startup, once)

```lua
local function _detect_treesitter()
  local pipe = io.popen("command -v tree-sitter 2>/dev/null && tree-sitter --version 2>/dev/null")
  if not pipe then return false end
  local first = pipe:read("*l")  -- path printed by `command -v`; nil if absent
  pipe:close()                   -- always close, even when nothing was read
  return first ~= nil
end
```

If not present, `renderer.set_highlight(true)` emits a status warning and leaves the filter as a no-op. Don't error; the user can install tree-sitter and re-toggle.

### Stream filter

The filter wraps `renderer.assistant_delta`. State machine:

```
state = "outside" | "inside"
buf   = ""    -- only used in "inside"
lang  = nil   -- captured at fence open

push(chunk):
  if state == "outside":
    look for ```\n in chunk
    if found:
      emit chunk up to fence-open
      state = "inside"; lang = parsed; buf = ""
      chunk = chunk after fence-open   -- fall through to "inside"
    else:
      emit chunk as-is
      return
  -- state == "inside"
  buf = buf .. chunk
  look for \n``` in buf
  if found:
    fence_body = buf up to closing
    rest       = buf after closing
    emit highlighted(fence_body, lang)
    emit closing fence verbatim
    state = "outside"; buf = ""
    push(rest)   -- rest is prose, or may open another fence
  else:
    -- still buffering; nothing emitted this push
```

Edge cases: chunk boundary lands inside the fence marker itself (e.g., one chunk ends with ` `` ` and the next supplies the remaining backtick and `\n`). Inside a fence the state machine looks at the cumulative `buf`, so a split closing marker is recovered correctly. Outside a fence the same guarantee requires holding back a trailing partial marker (one or two backticks, or a fence line with no newline yet) instead of emitting it immediately.
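The state machine above can be sketched in runnable Lua as follows. This is an illustration under assumptions rather than the shipped filter: `new_fence_filter`, `emit`, and `highlight` are hypothetical names, the opening fence is not required to sit at the start of a line, and any line-initial run of three backticks closes the block. It holds back trailing backticks while outside a fence so that a marker split across chunks is still recognized.

```lua
-- Sketch of a fence-aware stream filter. `emit` and `highlight` are
-- caller-supplied stand-ins (assumptions), not aish's real functions.
local FENCE = ("`"):rep(3)  -- built indirectly so this listing nests in markdown

local function new_fence_filter(emit, highlight)
  local state, buf, lang = "outside", "", nil

  local function out(s)
    if s ~= "" then emit(s) end
  end

  -- Consume what can be decided from buf; true means "call me again".
  local function step()
    if state == "outside" then
      local s = buf:find(FENCE, 1, true)
      if not s then
        -- Hold back up to two trailing backticks: they may start a fence.
        local keep = #buf:match("`*$")
        out(buf:sub(1, #buf - keep))
        buf = buf:sub(#buf - keep + 1)
        return false
      end
      local nl = buf:find("\n", s + 3, true)
      if not nl then
        -- Opening line still incomplete: flush prose, keep the marker.
        out(buf:sub(1, s - 1))
        buf = buf:sub(s)
        return false
      end
      local info = buf:sub(s + 3, nl - 1)
      out(buf:sub(1, nl))  -- prose plus the opening fence line, verbatim
      buf = buf:sub(nl + 1)
      if not info:find("`", 1, true) then
        state, lang = "inside", info:match("^%s*(%S+)")
      end
      return true
    end

    -- Inside a fence: buffer until a closing marker at a line start.
    local c
    if buf:sub(1, 3) == FENCE then
      c = 1
    else
      local n = buf:find("\n" .. FENCE, 1, true)
      if n then c = n + 1 end
    end
    if not c then return false end
    out(highlight(buf:sub(1, c - 1), lang))
    out(FENCE)
    buf = buf:sub(c + 3)
    state, lang = "outside", nil
    return true
  end

  return {
    push = function(chunk)
      buf = buf .. chunk
      while step() do end
    end,
    -- End of turn: whatever is still held goes out unhighlighted.
    flush = function()
      out(buf)
      buf, state, lang = "", "outside", nil
    end,
  }
end

-- Usage: both fence markers are split across chunk boundaries.
local parts = {}
local f = new_fence_filter(
  function(s) parts[#parts + 1] = s end,
  function(body, lang) return "<" .. tostring(lang) .. ">" .. body end)
f.push("Hi\n``"); f.push("`py\nprint(1)\n`"); f.push("``\nbye"); f.flush()
```

Prose is emitted as soon as it cannot be part of a marker, so streaming latency outside fences stays at most two characters behind the input; only fenced bodies wait for their closing marker.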
`highlighted(body, lang)` — resolved per A4 (tmpfile roundtrip):

```lua
local function highlighted(body, lang)
  if not highlight_enabled or not LANG_MAP[lang] then return body end
  local tmp = os.tmpname()
  local w = io.popen(("tree-sitter highlight --lang %s > %s")
    :format(LANG_MAP[lang], tmp), "w")
  if not w then os.remove(tmp); return body end
  w:write(body)
  local ok, _, code = w:close()
  local f = io.open(tmp, "rb")
  local out = f and f:read("*a") or ""
  if f then f:close() end
  os.remove(tmp)
  -- Pass through on highlighter failure. close() reports the exit code only
  -- on Lua 5.2-compatible builds, so empty output also counts as failure.
  if not ok or (code and code ~= 0) or out == "" then return body end
  return out
end
```

The two-handle design avoids the argument-size risk of shelling `printf '%s' BODY | tree-sitter ...` (Linux caps a single argument at 128 KiB via `MAX_ARG_STRLEN`, and a code block can be larger) and sidesteps shell-escape edge cases (body may contain arbitrary bytes). Cost is a few extra syscalls per code block for the tmp file create/remove cycle — negligible vs the highlighter invocation itself.

### Lang map (v1)

```lua
local LANG_MAP = {
  py = "python", python = "python",
  lua = "lua",
  js = "javascript", javascript = "javascript",
  ts = "typescript",
  sh = "bash", bash = "bash",
  c = "c", h = "c",
  cpp = "cpp", cc = "cpp",
  rs = "rust",
  go = "go",
  java = "java",
  rb = "ruby",
  md = "markdown",
  json = "json",
  diff = "diff",  -- referenced in §8 (inline diff highlighting)
}
```

Reuses the same map as `expand_mentions`. Factor into a shared helper once both reference it (small `_lang_of_ext()` in repl.lua).

### Toggle

`:highlight` (no arg) → flip. `:highlight on|off` → set explicit. `:highlight status` → report enabled + whether tree-sitter is present.

Default: off (don't change existing-user UX).

---

## 5. Pillar 2 — Diff-aware code injection

### Meta: `:diff [args]`

- `:diff` → `git diff` (working tree vs index)
- `:diff staged` → `git diff --cached`
- `:diff HEAD` → `git diff HEAD`
- `:diff main..feature` → `git diff main..feature`
- `:diff ` → passed verbatim to `git diff `

Implementation:

```lua
meta.diff = function(args)
  args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "")
  -- `staged` is the one alias; everything else goes to git verbatim.
  local git_args = (args == "staged") and "--cached" or args
  local out, code = executor.exec("git diff " .. git_args)
  if code ~= 0 then
    renderer.status(("diff failed (exit %d)"):format(code))
    return
  end
  if not out:find("%S") then   -- empty or whitespace-only
    renderer.status("(no diff)")
    return
  end
  ctx:append_exec_output(("[diff %s]\n%s"):format(
    args == "" and "(working tree)" or args, out))
end
```

The `[diff ...]\n` framing matches the `[bg:N exited]` / `[delegate X]` conventions established in Phase 5 / #6 / #8.

### @-mention: `@..` — tiered resolution (A6)

Extends `expand_mentions` (#7) by adding a SECOND resolution attempt when the first (path lookup) fails AND the token contains `..`:

```lua
-- Existing path-attempt block ends with content = _read_truncated(path)
-- which returns nil if no such file. Add the diff retry there:
if not content and path:find("..", 1, true) then
  local r1, r2 = path:match("^(.-)%.%.(.+)$")
  if r1 and r2 and r1 ~= "" and r2 ~= "" then
    local pipe = io.popen(("git diff %s..%s 2>/dev/null")
      :format(shq(r1), shq(r2)))
    local diff = pipe and pipe:read("*a") or ""
    if pipe then pipe:close() end
    -- Non-empty stdout is the success signal: a bad ref prints only to
    -- stderr, and the exit code is not portable across Lua builds.
    if diff:match("%S") then
      content = diff
      -- Note: language tag becomes "diff" regardless of path lang
      lang_override = "diff"
    end
  end
end
```

Output replaces the token with:

````
```diff path=..

```
````

Tiered resolution semantics:

- `@README.md` → file lookup succeeds → file expansion
- `@../sibling.txt` → file lookup succeeds → file expansion
- `@HEAD~1..HEAD` → file lookup fails, `..` present, ref-range succeeds → diff
- `@origin/main..feature` → file lookup fails (no such file), `..` present, ref-range succeeds → diff. The token has `/` in `r1` but `git diff` accepts it as a ref; no `/`-based heuristic needed (resolves Q-D2).
- `@nonexistent-file..but-also-not-a-ref` → both fail; literal token preserved with the existing `[aish] @X: not found` status path.

---

## 6. Pillar 3 — Project file-tree

### Meta: `:tree [depth]`

- `:tree` → scan + inject with default depth and char cap
- `:tree ` → override depth for this scan
- `:tree refresh` → re-scan with cached opts
- `:tree off` → clear `ctx.project`

### Scan logic

```lua
local function _scan_project_tree(dir, opts)
  opts = opts or {}
  local max_chars = opts.max_chars or 4096
  local depth     = opts.depth or 3

  -- Prefer git ls-files for .gitignore honor; fall back to find.
  -- os.execute returns 0 on Lua 5.1/LuaJIT and true on 5.2-compatible builds.
  local rc = os.execute("cd " .. shq(dir) ..
    " && git rev-parse --git-dir >/dev/null 2>&1")
  local in_git = (rc == 0 or rc == true)
  local listcmd
  if in_git then
    listcmd = ("cd %s && git ls-files --cached --others --exclude-standard"):format(shq(dir))
  else
    -- Run from inside dir so paths come back relative, like git ls-files.
    listcmd = ("cd %s && find . -maxdepth %d -type f -not -path '*/\\.*' 2>/dev/null"):format(shq(dir), depth + 1)
  end

  local pipe = io.popen(listcmd)
  if not pipe then return nil, "scan failed" end
  local files = {}
  for line in pipe:lines() do
    line = line:gsub("^%./", "")  -- strip find's leading "./"
    -- Depth filter: count `/` separators
    local _, slashes = line:gsub("/", "")
    if slashes < depth then files[#files + 1] = line end
  end
  pipe:close()
  table.sort(files)

  -- Build a tree-ish summary, truncate by char count.
  local body = table.concat(files, "\n")
  local truncated = false
  if #body > max_chars then
    body = body:sub(1, max_chars) .. "\n... (truncated)"
    truncated = true
  end
  return body, { file_count = #files, truncated = truncated }
end
```

### Injection

`ctx.project = "..."` (string), composed into the system prompt between [background] and [earlier conversation summary]:

```
[project] 142 files (truncated at 4096B):
README.md
broker.lua
config.lua
context.lua
...
```

Suppressed under Norris (R-C1 / R-C4 — planner stays focused; the project context can be re-introduced via the Norris goal text if needed).

### Auto-inject

`cfg.project.auto_tree = true` runs the scan once at startup and sets `ctx.project`. Default false (existing configs unchanged).

---

## 7. UX Surface Summary

| Meta | Behavior |
|---|---|
| `:highlight [on/off/status]` | Toggle tree-sitter highlighter (no-op when CLI absent) |
| `:diff [args]` | `git diff `, append output to context as `[diff ...]` |
| `:tree [N/refresh/off]` | Scan/refresh/clear project file-tree block |

| @-mention | Behavior |
|---|---|
| `@path` | Existing (#7) file expansion |
| `@..` | New: inline `git diff ..` expansion |

| Config | Default | Effect |
|---|---|---|
| `cfg.project.auto_tree` | `false` | Inject project tree at startup |
| `cfg.project.tree_depth` | `3` | Depth filter for the scan |
| `cfg.project.tree_max_chars` | `4096` | Truncation cap for the injected block |
| (no config flag for `:highlight`) | — | Runtime toggle only; no persistence in v1 |

---

## 8. Out of Scope (Phase 6)

- **Pure-Lua syntax highlighter** — defer to a future phase if tree-sitter CLI absence becomes a practical pain point. v1 says "install tree-sitter or accept plain text".
- **bat/glow/chroma integration** — only `tree-sitter` is wired. Other highlighters can be added behind the same `:highlight` toggle later (config field `cfg.highlight.backend = "tree-sitter"|"bat"|...`).
- **Smart diff context selection** — no AI-driven "which diff to show". User explicitly says `:diff ` or `@..`.
- **File-tree LRU / smart summarization** — v1 is a flat truncated list. Hierarchical roll-up ("docs/ — 8 files") is a v2 polish.
- **Watching for file changes** — no fs-notify reload. Re-scan via `:tree refresh`.
- **Diff history** — `:diff` doesn't track its previous invocations. Each invocation is independent.
- **Inline diff highlighting** — the `diff` lang is in `LANG_MAP` so `tree-sitter highlight --lang diff` works, but we don't ship custom ANSI for added/removed lines — tree-sitter's own theme covers it.
- **Highlighter on @-mention echo** (v2 polish per A7) — `:highlight` applies to assistant output only. Highlighting user-pasted code as it's echoed by readline would need a separate hook in the readline display path; out of scope here.
- **Auto-refresh project tree on `cd`** (v2 polish per A8) — the cd interceptor in `executor.maybe_chdir` is a clean place to call `_scan_project_tree(libc.getcwd(), ...)` on every successful cd. Skipped in v1 because the scan can be slow on large trees; manual refresh via `:tree refresh` is the v1 verb.
- **Custom include/exclude globs for project tree** (v2 polish per A9) — `cfg.project = { include = {...}, exclude = {...} }` would extend beyond `.gitignore`. v1 ships with `.gitignore`-only honor (via `git ls-files --exclude-standard`) plus the `find` fallback for non-repo cwds.

---

## 9. Risks

| Risk | Mitigation |
|---|---|
| `tree-sitter` CLI not on fleet → most users get no highlighting | It's opt-in; default off; status warning on toggle when absent. |
| Highlighter latency on long code blocks (whole-block buffering) | Accepted trade-off vs corrupting output. If painful in practice, add a per-block size cap above which we pass-through unhighlighted. |
| `git diff` on huge changesets blows context budget | Diff output reuses `enforce_budget` eviction (it's just exec output). User can `:diff ` to scope. v2 could add a `--max-bytes` truncation. |
| `git ls-files` in a non-git cwd → falls back to `find`, may pick up node_modules / target / etc. | Document in config example; v2 could honor `.aishignore` or similar. |
| @`..` collides with paths like `@../sibling.txt` | A6: tiered resolution — try as path first; only fall through to diff retry when path lookup fails AND token contains `..`. `@../sibling.txt` hits the path branch and never reaches the diff retry. |
| Project tree injection adds tokens to every broker call | Char cap + opt-in `auto_tree = false` default. Suppressed under Norris. |
| `:highlight on` mid-stream produces inconsistent rendering for the in-flight turn | Toggle takes effect from the NEXT assistant turn. Document this. |

---

## 10. Open Questions (Phase 6)

All six formulate-time Qs were resolved in analyze (A4–A9). None remain open as blockers for implementation.

| # | Question | Resolution |
|---|---|---|
| Q-H1 | popen3 for `tree-sitter highlight` | A4: tmpfile roundtrip — `io.popen("w")` writes body with stdout redirected to a tmp file, then `io.open` reads the file. Avoids ARGMAX + shell-escape complexity. |
| Q-D1 | Confirm gate on `:diff`? | A5: no. `git diff` is read-only; matches `:history` / `:sessions` / `:safety check` (none gate). Permission DSL (#9) applies only to AI-suggested `CMD:` lines. |
| Q-D2 | `@..` with refs containing `/` | A6: tiered resolution — file lookup first, then if it fails AND `..` is present, retry as ref-range. `@origin/main..feature` naturally falls through to the retry; no grammar prefix needed. |
| Q-T1 | `cfg.project.auto_tree` update on cd | A8: no auto-refresh in v1. `:tree refresh` is the manual verb; cd-intercept hook is documented as v2 polish in §8. |
| Q-T2 | Custom include/exclude globs | A9: rely on `.gitignore` via `git ls-files` in repos; `find` fallback outside. Custom globs deferred to v2. |
| Q-H2 | Highlighting on @-mention echo | A7: assistant-output only in v1. Echo via readline is a different code path; deferred to v2 (see §8). |

---

## 11. Phase 6 → Phase 7+ Out-of-band

The §11 "Planned Phase Sequence" table in PHASE0.md does not list phases beyond 6. After Phase 6 lands, candidate next iterations (non-binding, for the formulate of Phase 7 to confirm):

- **Phase 7**: secret-redaction wiring into `safety.lua` (#52 follow-up filed during Phase 5/13 close); session-multiplex / tmux parity surfaces (out of scope per §12 — explicitly rejected); or other backlog as it accumulates on Gitea.

Phase 6 itself is self-contained — none of its three pillars introduce substrate dependencies on phases not yet planned.