# aish: Phase 6 Manifest

**Project:** aish, an AI-augmented conversational shell
**Document:** Phase 6 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-16

PHASE0 is the locked substrate; PHASE1-5 are layered on top. This manifest specifies what Phase 6 adds: **tree-sitter syntax highlighting hooks**, **diff-aware code injection**, and **project-level context (file-tree summary)**.

---

## 1. Scope of Phase 6

Three pillars per PHASE0.md §11 row 6:

1. **Tree-sitter syntax highlighting hooks**: when an external `tree-sitter` CLI is detected at startup, assistant code-fence content is filtered through it for ANSI-colorized display. Plain prose streams unchanged. When the CLI is absent, the filter is the identity function (zero overhead, zero hard dependency). Toggleable at runtime with `:highlight on|off`. Default off until the user opts in (don't surprise existing users with a display change).

2. **Diff-aware code injection**: surface git diffs as first-class context. Two entry points:
   - Meta verb: `:diff [args]` runs `git diff <args>` from cwd, appends output to context as exec-output. `:diff staged`, `:diff HEAD~3`, `:diff main..feature` all delegate to git's argument grammar.
   - @-mention extension: `@HEAD..feature` (a ref-range expression anywhere a `@path` would go) expands inline as a fenced `diff` block, mirroring how `@README.md` already works.

3. **Project-level context (file-tree summary)**: `git ls-files`-based tree summary of the cwd, injected as a `[project]` block in the system prompt. Two entry points:
   - Meta verb: `:tree [depth]` injects on demand; `:tree refresh` re-scans.
   - Auto-inject at startup when `cfg.project.auto_tree = true`, gated like memory injection so existing configs don't change behavior.

**Phase 6 is done when:**

- With `tree-sitter` CLI installed and `:highlight on`, the assistant reply `` ```py\nprint("hi")\n``` `` shows up with ANSI colors.
  Without the CLI, `:highlight on` is a no-op and emits a status warning.
- `:diff` from a dirty git repo shows the working-tree diff in the exec-output frame; the model sees it on the next ask_ai turn.
- `@HEAD~1..HEAD` in a prompt expands inline to a fenced diff block.
- `:tree` injects a `[project]` block visible in `ctx:to_messages()` (via the system prompt assembly).
- With `cfg.project.auto_tree = true`, the project block appears on every broker call (subject to `max_chars` cap).
- Existing configs without `cfg.project` and with `:highlight off` (default) behave exactly like Phase 5 (Phase 5 regression coverage).

---

## 2. Technology Decisions (delta from Phase 5)

| Decision | Choice | Rationale |
|---|---|---|
| Highlight backend | External `tree-sitter` CLI (`tree-sitter highlight --lang X`) | Honors PHASE0 §3: no compiled extensions, no luarocks. Detected once at startup; absence → identity filter. Opt-in via `:highlight on` so install-state changes don't break users. |
| Highlight buffering | Accumulate inside fenced code blocks, emit on closing fence; pass-through outside fences | Streaming UX preserved for prose. Code blocks get colorized atomically, accepting a per-block latency (~ block streaming time). Per-chunk highlighting would split a token across `tree-sitter` invocations and corrupt the output. |
| Lang detection | First-line fence info-string (` ```py`, ` ```python`, ` ```lua`) → normalized via small map (py→python, js→javascript, etc.) | The lang tag mirrors the one we already emit in `expand_mentions` (#7). No tag → identity (no highlight). |
| Diff backend | Shell out to `git diff <args>` via `executor.exec` | Honors substrate (no libgit2 FFI). The existing exec frame handles capture + stream. `git` is universally present where aish makes sense. |
| Diff failure | Bail with status `[aish] :diff failed (not a git repo / bad ref)`; do NOT inject empty output | Avoids polluting context with stale or empty diffs. |
| Tree backend | `git ls-files --cached --others --exclude-standard` when cwd is a git repo, else `find . -type f -not -path './.*'` | Free `.gitignore` honor in repos; sensible default outside. Both are POSIX-portable. |
| Tree summary form | Sorted relative paths, grouped by directory at depth ≤ `cfg.project.tree_depth` (default 3), truncated by char count `cfg.project.tree_max_chars` (default 4096) | One block, deterministic order, cheap to compute. Matches the [background] memory block convention (Phase 4) so the system prompt's compositional shape stays familiar. |
| Tree injection point | `context.lua`: new `compose_project(...)` adds a `[project]\n` block to the system content, between [background] and [earlier summary] | Same suppression rule as [background]/[earlier summary]: NOT injected during Norris (R-C1 / R-C4: planner stays on its anchor). |
| Tree refresh policy | One scan at startup if auto; `:tree refresh` to re-scan on demand | Scanning on every ask_ai is wasteful for slow filesystems. Manual refresh is sufficient for v1. |
| @-mention diff syntax | `@<ref1>..<ref2>` (two-dot `..` separator) only, recognized via the existing trailing-punct peel logic | Avoids ambiguity with literal paths. `@HEAD` alone is NOT a diff trigger (would collide with files literally named HEAD). |

---

## 3. Module Changes

| File | State after Phase 5 | Phase 6 changes |
|---|---|---|
| `renderer.lua` | `assistant_delta(text)` writes chunks; `assistant_flush()` finalizes | Add fence-aware filter inside the assistant stream. State machine: outside-fence (pass-through) / inside-fence (buffer, emit on close). On close, pipe buffer through `tree-sitter highlight --lang <lang>` (if highlight enabled), emit result. Toggle exposed as `renderer.set_highlight(bool)`. |
| `executor.lua` | `extract_cmd_lines`, `extract_cmd_bg_lines`, `extract_delegate_lines` | No changes. Diff and tree use the existing `exec` path. |
| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary]. Suppressed under Norris. |
| `repl.lua` | meta dispatch + main loop + #13 secrets wiring | New helpers: `_detect_treesitter()` (run once at startup), `_run_git_diff(args)`, `_scan_project_tree(dir, opts)`. New meta: `:highlight`, `:diff`, `:tree`. Extend `expand_mentions` to recognize the `<ref1>..<ref2>` token shape. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/etc. | Add commented-out `project = { auto_tree = false, tree_depth = 3, tree_max_chars = 4096 }` block. |

No new module files in v1.
Three new helpers in `repl.lua` make that file larger, but they keep the Phase 6 surface in one place. If the highlighter filter grows past ~80 LOC, lift it into `highlight.lua` as a follow-up.

---

## 4. Pillar 1: Tree-sitter highlighting

### Detection (startup, once)

```lua
local function _detect_treesitter()
  local pipe = io.popen("command -v tree-sitter 2>/dev/null && tree-sitter --version 2>/dev/null")
  if not pipe then return false end
  -- The first line is the resolved path when the CLI is on PATH; nil otherwise.
  local line = pipe:read("*l")
  pipe:close()  -- always close, even when nothing was read
  return line ~= nil and line ~= ""
end
```

If the CLI is not present, `renderer.set_highlight(true)` emits a status warning and leaves the filter as a no-op. Don't error; the user can install tree-sitter and re-toggle.

### Stream filter

The filter wraps `renderer.assistant_delta`. State machine:

```
state = "outside" | "inside"
buf   = ""    -- only used in "inside"
lang  = nil   -- captured at fence open

push(chunk):
  if state == "outside":
    look for ```<lang>\n in chunk
    if found:
      emit chunk up to fence-open
      state = "inside"; lang = parsed; buf = ""
      chunk = chunk after fence-open   -- falls through to "inside" below
    else:
      emit chunk as-is
  if state == "inside":
    buf = buf .. chunk
    look for \n``` in buf
    if found:
      fence_body = buf up to closing
      rest       = buf after closing
      emit highlighted(fence_body, lang)
      emit closing fence verbatim
      state = "outside"; buf = ""
      push(rest)   -- rest may open another fence
    else:
      -- still buffering; nothing emitted this push
```

Edge cases: a chunk boundary can land inside the fence marker itself (e.g., one chunk ends with ` `` ` and the next starts with `` ` `` followed by `\n`). Inside a fence the state machine scans the cumulative `buf`, so a closing marker split across chunks is recovered correctly. The outside state emits chunks as-is, so to get the same guarantee for an opening marker it has to hold back a trailing run of backticks until the next push.

`highlighted(body, lang)`:

```
if not highlight_enabled or not lang_map[lang]: return body
pipe = io.popen("tree-sitter highlight --lang " .. lang_map[lang], "w")
pipe:write(body); pipe:close()
-- HOWEVER: io.popen("w") doesn't read back stdout. We need both:
-- write body to stdin AND capture stdout. Easiest: temp file or
-- /tmp pipe trick. tbd in analyze.
```

**Open Q-H1** (analyze): how to popen for write+read simultaneously without forkpty.
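
One way to approach Q-H1 is a shell-pipe wrapper: quote the body for the shell, feed it through `printf '%s'`, and read the filter's stdout back through a single read-mode `io.popen`. The sketch below is illustrative only. `run_filter` is a hypothetical name, `shq` is an assumed implementation of the quoting helper (the manifest uses that name elsewhere but does not define it), and the approach presumes the filter command reads stdin, which still needs confirming for `tree-sitter highlight`.

```lua
-- Assumed implementation of the shell-quoting helper: wrap in single quotes
-- and rewrite each embedded single quote as '\'' (POSIX sh quoting).
local function shq(s)
  return "'" .. tostring(s):gsub("'", "'\\''") .. "'"
end

-- Hypothetical helper: send `input` to `cmd` on stdin and return its stdout.
-- Falls back to returning `input` unchanged when the pipe cannot be opened
-- or the command produces no output (for example, when it is not installed).
local function run_filter(cmd, input)
  local line = ("printf '%%s' %s | %s 2>/dev/null"):format(shq(input), cmd)
  local pipe = io.popen(line, "r")
  if not pipe then return input end
  local out = pipe:read("*a")
  pipe:close()
  if not out or out == "" then return input end
  return out
end
```

Because the body travels as a command-line argument, a very large code block can exceed the system's argument-length limit; a temp-file roundtrip avoids that at the cost of extra file I/O.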
Candidates: temp file roundtrip, or `popen` with shell piping (`printf '%s' BODY | tree-sitter highlight | cat`). The shell pipe is cleanest if we shell-escape body.

### Lang map (v1)

```lua
local LANG_MAP = {
  py = "python", python = "python",
  lua = "lua",
  js = "javascript", javascript = "javascript",
  ts = "typescript",
  sh = "bash", bash = "bash",
  c = "c", h = "c",
  cpp = "cpp", cc = "cpp",
  rs = "rust",
  go = "go",
  java = "java",
  rb = "ruby",
  md = "markdown",
  json = "json",
  diff = "diff",  -- §8 relies on a diff entry being present
}
```

Reuses the same map as `expand_mentions`. Factor into a shared helper once both reference it (small `_lang_of_ext()` in repl.lua).

### Toggle

- `:highlight` (no arg) → flip.
- `:highlight on|off` → set explicitly.
- `:highlight status` → report enabled + whether tree-sitter is present.

Default: off (don't change existing-user UX).

---

## 5. Pillar 2: Diff-aware code injection

### Meta: `:diff [args]`

- `:diff` → `git diff` (working tree vs index)
- `:diff staged` → `git diff --cached`
- `:diff HEAD` → `git diff HEAD`
- `:diff main..feature` → `git diff main..feature`
- `:diff <args>` → passed verbatim to `git diff <args>`

Implementation:

```lua
meta.diff = function(args)
  args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "")
  local label = args == "" and "(working tree)" or args
  -- `staged` is the one aish-level alias; everything else goes to git verbatim.
  if args == "staged" then args = "--cached" end
  local out, code = executor.exec("git diff " .. args)
  if code ~= 0 then
    renderer.status(("diff failed (exit %d)"):format(code))
    return
  end
  -- Whitespace-only output counts as "no diff"; never inject an empty block.
  if not out:find("%S") then
    renderer.status("(no diff)")
    return
  end
  ctx:append_exec_output(("[diff %s]\n%s"):format(label, out))
end
```

The `[diff ...]\n` framing matches the `[bg:N exited]` / `[delegate X]` conventions established in Phase 5 / #6 / #8.

### @-mention: `@<ref1>..<ref2>`

Extends `expand_mentions` (#7).
After the existing path-resolution attempt fails, try interpreting the token as a git diff-range:

```lua
local r1, r2 = path:match("^(.-)%.%.(.+)$")
if r1 and r2 and r1 ~= "" and r2 ~= "" then
  -- candidate diff range; try `git diff <r1>..<r2>`
  -- shq (the shell-quoting helper also used by _scan_project_tree) replaces
  -- %q, which applies Lua quoting rules rather than shell ones.
  local pipe = io.popen(("git diff %s 2>/dev/null"):format(shq(r1 .. ".." .. r2)))
  ...
end
```

Output replaces the token with:

````
```diff
<git diff output>
```
````

Same fence-with-lang shape as the `@path` expansion.

**Risk:** false-positive on legitimate paths containing `..` like `@../sibling.txt`. Mitigation: only interpret as diff-range when the token contains NO `/` (paths have `/`, ref-ranges don't). Refs with `/` like `origin/main..feature` ARE common; for those, the user can fall back to `:diff origin/main..feature`.

---

## 6. Pillar 3: Project file-tree

### Meta: `:tree [depth]`

- `:tree` → scan + inject with default depth and char cap
- `:tree <N>` → override depth for this scan
- `:tree refresh` → re-scan with cached opts
- `:tree off` → clear `ctx.project`

### Scan logic

```lua
local function _scan_project_tree(dir, opts)
  opts = opts or {}
  local max_chars = opts.max_chars or 4096
  local depth = opts.depth or 3

  -- Prefer git ls-files for .gitignore honor; fall back to find.
  -- os.execute reports success as 0 on Lua 5.1/LuaJIT and as true on 5.2+.
  local rc = os.execute("cd " .. shq(dir) ..
    " && git rev-parse --git-dir >/dev/null 2>&1")
  local in_git = (rc == 0 or rc == true)
  local listcmd
  if in_git then
    listcmd = ("cd %s && git ls-files --cached --others --exclude-standard")
      :format(shq(dir))
  else
    -- cd first so find prints paths relative to dir, as git ls-files does.
    listcmd = ("cd %s && find . -maxdepth %d -type f -not -path '*/\\.*' 2>/dev/null")
      :format(shq(dir), depth + 1)
  end

  local pipe = io.popen(listcmd)
  if not pipe then return nil, "scan failed" end
  local files = {}
  for line in pipe:lines() do
    line = line:gsub("^%./", "")  -- strip find's "./" prefix
    -- Depth filter: count `/` separators
    local _, slashes = line:gsub("/", "")
    if slashes < depth then files[#files + 1] = line end
  end
  pipe:close()
  table.sort(files)

  -- Build a tree-ish summary, truncate by char count.
  local body = table.concat(files, "\n")
  local truncated = false
  if #body > max_chars then
    body = body:sub(1, max_chars) .. "\n... (truncated)"
    truncated = true
  end
  return body, { file_count = #files, truncated = truncated }
end
```

### Injection

`ctx.project = "..."` (string), composed into the system prompt between [background] and [earlier conversation summary]:

```
[project] 142 files (truncated at 4096B):
README.md
broker.lua
config.lua
context.lua
...
```

Suppressed under Norris (R-C1 / R-C4: planner stays focused; the project context can be re-introduced via the Norris goal text if needed).

### Auto-inject

`cfg.project.auto_tree = true` runs the scan once at startup and sets `ctx.project`. Default false (existing configs unchanged).

---

## 7. UX Surface Summary

| Meta | Behavior |
|---|---|
| `:highlight [on/off/status]` | Toggle tree-sitter highlighter (no-op when CLI absent) |
| `:diff [args]` | `git diff <args>`, append output to context as `[diff ...]` |
| `:tree [N/refresh/off]` | Scan/refresh/clear project file-tree block |

| @-mention | Behavior |
|---|---|
| `@path` | Existing (#7) file expansion |
| `@<ref1>..<ref2>` | New: inline `git diff <ref1>..<ref2>` expansion |

| Config | Default | Effect |
|---|---|---|
| `cfg.project.auto_tree` | `false` | Inject project tree at startup |
| `cfg.project.tree_depth` | `3` | Depth filter for the scan |
| `cfg.project.tree_max_chars` | `4096` | Truncation cap for the injected block |
| (no config flag for `:highlight`) | n/a | Runtime toggle only; no persistence in v1 |

---

## 8. Out of Scope (Phase 6)

- **Pure-Lua syntax highlighter**: defer to a future phase if tree-sitter CLI absence becomes a practical pain point. v1 says "install tree-sitter or accept plain text".
- **bat/glow/chroma integration**: only `tree-sitter` is wired. Other highlighters can be added behind the same `:highlight` toggle later (config field `cfg.highlight.backend = "tree-sitter"|"bat"|...`).
- **Smart diff context selection**: no AI-driven "which diff to show". User explicitly says `:diff <args>` or `@<ref1>..<ref2>`.
- **File-tree LRU / smart summarization**: v1 is a flat truncated list.
  Hierarchical roll-up ("docs/ (8 files)") is a v2 polish.
- **Watching for file changes**: no fs-notify reload. Re-scan via `:tree refresh`.
- **Diff history**: `:diff` doesn't track its previous invocations. Each invocation is independent.
- **Inline diff highlighting**: the `diff` lang is in `LANG_MAP` so `tree-sitter highlight --lang diff` works, but we don't ship custom ANSI for added/removed lines; tree-sitter's own theme covers it.

---

## 9. Risks

| Risk | Mitigation |
|---|---|
| `tree-sitter` CLI not installed across the fleet → most users get no highlighting | It's opt-in; default off; status warning on toggle when absent. |
| Highlighter latency on long code blocks (whole-block buffering) | Accepted trade-off vs corrupting output. If painful in practice, add a per-block size cap above which we pass-through unhighlighted. |
| `git diff` on huge changesets blows context budget | Diff output reuses `enforce_budget` eviction (it's just exec output). User can `:diff <args>` to scope. v2 could add a `--max-bytes` truncation. |
| Non-git cwd → scan falls back to `find`, may pick up node_modules / target / etc. | Document in config example; v2 could honor `.aishignore` or similar. |
| `@<ref1>..<ref2>` collides with paths like `@../sibling.txt` | Require NO `/` in the token for diff interpretation. Paths with `..` segments use `:diff` explicitly. |
| Project tree injection adds tokens to every broker call | Char cap; opt-in (`auto_tree = false` by default). Suppressed under Norris. |
| `:highlight on` mid-stream produces inconsistent rendering for the in-flight turn | Toggle takes effect from the NEXT assistant turn. Document this. |

---

## 10. Open Questions (Phase 6)

| # | Question | Impact | Resolution target |
|---|---|---|---|
| Q-H1 | How to popen `tree-sitter highlight` with simultaneous stdin write + stdout capture (Lua/LuaJIT lacks popen3). Candidates: temp-file roundtrip, shell-pipe wrapper `printf '%s' BODY \| tree-sitter ...` with shell-escape, or use `io.popen("w")` + a second `io.open(output_file)` after the process completes. | Highlighter correctness | Analyze |
| Q-D1 | Should `:diff` honor a per-call confirm gate (it shells out and reads git history; safe but noisy)? | UX | Analyze |
| Q-D2 | Should `@<ref1>..<ref2>` accept refs with `/` (`origin/main..feature`)? Doing so means we can't use the no-`/` heuristic to disambiguate from paths. Alternative: require explicit prefix like `@diff:origin/main..feature`. | @-mention grammar | Analyze |
| Q-T1 | When `cfg.project.auto_tree = true`, should the project block update on `cd` (since the cwd changed)? Or stay fixed at startup-cwd? | UX expectation | Analyze |
| Q-T2 | Should `cfg.project` accept a list of include/exclude glob patterns, or just rely on git's .gitignore? | Configurability | Analyze |
| Q-H2 | Should highlighting also apply to user-pasted code (expand_mentions @path), not just assistant output? | Symmetry | Analyze |

---

## 11. Phase 6 → Phase 7+ Out-of-band

The §11 "Planned Phase Sequence" table in PHASE0.md does not list phases beyond 6. After Phase 6 lands, candidate next iterations (non-binding, for the formulate of Phase 7 to confirm):

- **Phase 7**: secret-redaction wiring into `safety.lua` (#52 follow-up filed during Phase 5/13 close); session-multiplex / tmux parity surfaces (out of scope per §12, explicitly rejected); or other backlog as it accumulates on Gitea.

Phase 6 itself is self-contained: none of its three pillars introduce substrate dependencies on phases not yet planned.
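
The fence-aware filter from §4 is the largest new piece of logic in this phase, so a runnable sketch may help the analyze step. It is a sketch under assumptions, not the planned implementation: `new_fence_filter`, `emit`, and `highlight` are hypothetical names (the two callbacks stand in for the renderer write and the tree-sitter call), and the outside state holds back a trailing run of backticks so that an opening fence split across chunks is still detected, a behavior the §4 pseudocode does not spell out.

```lua
local FENCE = ("`"):rep(3)  -- three backticks, built at runtime

-- emit(text): writes text to the display.
-- highlight(body, lang): returns the (possibly colorized) code body; lang is
-- the raw info string and is "" when the fence carries no tag.
local function new_fence_filter(emit, highlight)
  local state, buf, lang = "outside", "", nil

  local function drain()
    while true do
      if state == "outside" then
        -- Opening fence: three backticks, optional info string, newline.
        local s, e, info = buf:find(FENCE .. "([%w_+-]*)\n")
        if not s then
          -- No complete fence yet. Emit everything except a trailing run of
          -- backticks (plus info characters) that may be a partial fence.
          local hold = buf:find("`+[%w_+-]*$")
          local upto = hold and hold - 1 or #buf
          if upto > 0 then emit(buf:sub(1, upto)) end
          buf = buf:sub(upto + 1)
          return
        end
        emit(buf:sub(1, e))  -- prose plus the opening fence line, verbatim
        state, lang, buf = "inside", info, buf:sub(e + 1)
      else
        -- Closing fence: three backticks at the start of a line. Prefixing a
        -- newline lets the same search match a fence at the very start of buf.
        local p = ("\n" .. buf):find("\n" .. FENCE, 1, true)
        if not p then return end  -- still buffering; nothing emitted
        emit(highlight(buf:sub(1, p - 1), lang))
        emit(FENCE)
        state, lang, buf = "outside", nil, buf:sub(p + 3)
      end
    end
  end

  return {
    push = function(chunk)
      buf = buf .. chunk
      drain()
    end,
    -- End of turn: release whatever is still buffered, unhighlighted.
    flush = function()
      if buf ~= "" then emit(buf) end
      state, buf, lang = "outside", "", nil
    end,
  }
end
```

Feeding the filter chunks that split both fence markers (for example a chunk ending in two backticks followed by one starting with the third) produces the same output as feeding the whole reply at once, which is the property the chunk-boundary edge case in §4 asks for.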