From f596743834532619ecff7cf3f6e487bb73b524ec Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 16 May 2026 21:47:00 +0000
Subject: [PATCH] =?UTF-8?q?docs/PHASE6:=20formulate=20=E2=80=94=20tree-sit?=
 =?UTF-8?q?ter=20highlight=20+=20diff=20+=20project=20tree?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6:

1. Tree-sitter syntax highlighting hooks
   External `tree-sitter` CLI when present, no-op otherwise.
   Honors PHASE0 §3 (no compiled extensions). Toggleable at
   runtime; off by default so existing UX is unchanged.

2. Diff-aware code injection
   :diff [args] meta + @.. @-mention extension. Shells out to
   `git diff`; output flows through the existing exec-output
   context channel.

3. Project-level file-tree context
   :tree meta + optional cfg.project.auto_tree startup inject.
   git ls-files in a repo, find fallback otherwise. Composed
   into the system prompt as a new [project] block between
   [background] and [earlier summary]. Suppressed under
   Norris (R-C1 / R-C4 parity).

Module changes: renderer.lua (fence-aware highlight filter),
context.lua (compose_project), repl.lua (3 new metas, 3 new
helpers, expand_mentions extension). No new module files in v1.

Doc covers: scope + done-when criteria, tech decisions table,
module changes table, per-pillar deep dive with example code,
UX surface summary, out-of-scope list, risks, and 6 open
questions to resolve in analyze (Q-H1/Q-H2 highlighter,
Q-D1/Q-D2 diff, Q-T1/Q-T2 tree).

Scope confirmed via AskUserQuestion: all three subsurfaces in
scope; tree-sitter approach is external CLI w/ no-op fallback.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/PHASE6.md | 416 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 416 insertions(+)
 create mode 100644 docs/PHASE6.md

diff --git a/docs/PHASE6.md b/docs/PHASE6.md
new file mode 100644
index 0000000..39a1de8
--- /dev/null
+++ b/docs/PHASE6.md
@@ -0,0 +1,416 @@
+# aish — Phase 6 Manifest
+
+**Project:** aish — AI-augmented conversational shell
+**Document:** Phase 6 Requirements, Architecture & Design Decisions
+**Status:** Formulate (pre-analyze)
+**Date:** 2026-05-16
+
+PHASE0 is the locked substrate; PHASE1-5 are layered on top. This manifest
+specifies what Phase 6 adds — **tree-sitter syntax highlighting hooks**,
+**diff-aware code injection**, and **project-level context (file-tree
+summary)**.
+
+---
+
+## 1. Scope of Phase 6
+
+Three pillars per PHASE0.md §11 row 6:
+
+1. **Tree-sitter syntax highlighting hooks** — when an external
+   `tree-sitter` CLI is detected at startup, assistant code-fence
+   content is filtered through it for ANSI-colorized display. Plain
+   prose streams unchanged. When the CLI is absent, the filter is the
+   identity function (zero overhead, zero hard dependency). Toggleable
+   at runtime with `:highlight on|off`. Default off until the user
+   opts in (don't surprise existing users with a display change).
+
+2. **Diff-aware code injection** — surface git diffs as first-class
+   context. Two entry points:
+
+   - Meta verb: `:diff [args]` runs `git diff` with those arguments
+     from cwd and appends the output to context as exec-output.
+     `:diff staged` maps to `git diff --cached`; `:diff HEAD~3` and
+     `:diff main..feature` delegate to git's argument grammar.
+   - @-mention extension: `@HEAD..feature` (a ref-range expression
+     anywhere a `@path` would go) expands inline as a fenced `diff`
+     block, mirroring how `@README.md` already works.
+
+3. **Project-level context (file-tree summary)** — `git ls-files`-based
+   tree summary of the cwd, injected as a `[project]` block in the
+   system prompt.
+   Two entry points:
+
+   - Meta verb: `:tree [depth]` injects on demand; `:tree refresh`
+     re-scans.
+   - Auto-inject at startup when `cfg.project.auto_tree = true` —
+     gated like memory injection so existing configs don't change
+     behavior.
+
+**Phase 6 is done when:**
+
+- With `tree-sitter` CLI installed and `:highlight on`, the assistant
+  reply ```py\nprint("hi")\n``` shows up with ANSI colors. Without
+  the CLI, `:highlight on` is a no-op and emits a status warning.
+- `:diff` from a dirty git repo shows the working-tree diff in the
+  exec-output frame; the model sees it on the next ask_ai turn.
+- `@HEAD~1..HEAD` in a prompt expands inline to a fenced diff block.
+- `:tree` injects a `[project] :` block visible in
+  `ctx:to_messages()` (via the system prompt assembly).
+- With `cfg.project.auto_tree = true`, the project block appears on
+  every broker call (subject to the `cfg.project.tree_max_chars` cap).
+- Existing configs without `cfg.project` and with `:highlight off`
+  (default) behave exactly like Phase 5 (Phase 5 regression coverage).
+
+---
+
+## 2. Technology Decisions (delta from Phase 5)
+
+| Decision | Choice | Rationale |
+|---|---|---|
+| Highlight backend | External `tree-sitter` CLI (`tree-sitter highlight --lang X`) | Honors PHASE0 §3: no compiled extensions, no luarocks. Detected once at startup; absence → identity filter. Opt-in via `:highlight on` so install-state changes don't break users. |
+| Highlight buffering | Accumulate inside fenced code blocks, emit on closing fence; pass-through outside fences | Streaming UX preserved for prose. Code blocks get colorized atomically, accepting a per-block latency (~ block streaming time). Per-chunk highlighting would split a token across `tree-sitter` invocations and corrupt the output. |
+| Lang detection | First-line fence info-string (` ```py`, ` ```python`, ` ```lua`) → normalized via small map (py→python, js→javascript, etc.) | The lang tag mirrors the one we already emit in `expand_mentions` (#7). No tag → identity (no highlight). |
+| Diff backend | Shell out to `git diff` with the user's args via `executor.exec` | Honors substrate (no libgit2 FFI). The existing exec frame handles capture + stream. `git` is universally present where aish makes sense. |
+| Diff failure | Bail with status `[aish] :diff failed (not a git repo / bad ref)`; do NOT inject empty output | Avoids polluting context with stale or empty diffs. |
+| Tree backend | `git ls-files --cached --others --exclude-standard` when cwd is a git repo, else `find . -type f -not -path '*/.*'` | Free `.gitignore` honor in repos; sensible default outside. Both run on GNU and BSD userlands (`find -not` is a widespread extension, not strict POSIX). |
+| Tree summary form | Sorted relative paths, limited to depth ≤ `cfg.project.tree_depth` (default 3), truncated by char count `cfg.project.tree_max_chars` (default 4096) | One block, deterministic order, cheap to compute. Matches the [background] memory block convention (Phase 4) so the system prompt's compositional shape stays familiar. |
+| Tree injection point | `context.lua`: new `compose_project(...)` adds a `[project]\n` block to the system content, between [background] and [earlier summary] | Same suppression rule as [background]/[earlier summary]: NOT injected during Norris (R-C1 / R-C4 — planner stays on its anchor). |
+| Tree refresh policy | One scan at startup if auto; `:tree refresh` to re-scan on demand | Scanning on every ask_ai is wasteful for slow filesystems. Manual refresh is sufficient for v1. |
+| @-mention diff syntax | `@..` (two-dot `..` separator) only — recognized via the existing trailing-punct peel logic | Avoids ambiguity with literal paths. `@HEAD` alone is NOT a diff trigger (would collide with files literally named HEAD). |
+
+---
+
+## 3. Module Changes
+
+| File | State after Phase 5 | Phase 6 changes |
+|---|---|---|
+| `renderer.lua` | `assistant_delta(text)` writes chunks; `assistant_flush()` finalizes | Add fence-aware filter inside the assistant stream. State machine: outside-fence (pass-through) / inside-fence (buffer, emit on close). On close, pipe buffer through `tree-sitter highlight --lang X` (if highlight enabled), emit result. Toggle exposed as `renderer.set_highlight(bool)`. |
+| `executor.lua` | `extract_cmd_lines`, `extract_cmd_bg_lines`, `extract_delegate_lines` | No changes. Diff and tree use the existing `exec` path. |
+| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary]. Suppressed under Norris. |
+| `repl.lua` | meta dispatch + main loop + #13 secrets wiring | New helpers: `_detect_treesitter()` (run once at startup), `_run_git_diff(args)`, `_scan_project_tree(dir, opts)`. New meta: `:highlight`, `:diff`, `:tree`. Extend `expand_mentions` to recognize `..` token shape. |
+| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/etc. | Add commented-out `project = { auto_tree = false, tree_depth = 3, tree_max_chars = 4096 }` block. |
+
+No new module files in v1. Three new helpers in `repl.lua` keep the
+file growing but consolidate the Phase 6 surface. If the highlighter
+filter grows past ~80 LOC, lift it into `highlight.lua` as a follow-up.
+
+---
+
+## 4. Pillar 1 — Tree-sitter highlighting
+
+### Detection (startup, once)
+
+```lua
+local function _detect_treesitter()
+  local pipe = io.popen("command -v tree-sitter 2>/dev/null && tree-sitter --version 2>/dev/null")
+  if not pipe then return false end
+  -- Any output line means the CLI was found; always close the pipe.
+  local first = pipe:read("*l")
+  pipe:close()
+  return first ~= nil
+end
+```
+
+If not present, `renderer.set_highlight(true)` emits a status warning
+and leaves the filter as a no-op. Don't error; the user can install
+tree-sitter and re-toggle.
+
+### Stream filter
+
+The filter wraps `renderer.assistant_delta`. State machine:
+
+```
+state = "outside" | "inside"
+buf   = ""      -- only used in "inside"
+lang  = nil     -- captured at fence open
+
+push(chunk):
+  if state == "outside":
+    look for fence-open (``` + info string + \n) in chunk
+    if found:
+      emit chunk up to fence-open
+      state = "inside"; lang = parsed; buf = ""
+      chunk = chunk after fence-open   -- falls through to "inside"
+    else:
+      emit chunk as-is
+
+  if state == "inside":
+    buf = buf .. chunk
+    look for \n``` in buf
+    if found:
+      fence_body = buf up to closing
+      rest = buf after closing
+      emit highlighted(fence_body, lang)
+      emit closing fence verbatim
+      state = "outside"; buf = ""
+      push(rest)   -- recurse: rest may open another fence
+    else:
+      -- still buffering; nothing emitted this push
+```
+
+Edge cases: chunk boundary lands inside the fence marker itself
+(e.g., chunk ends with ` `` `, next starts with `\n`). The inside
+state scans the cumulative `buf`, so a split closing marker is
+recovered correctly; the outside state has to hold back a trailing
+partial marker the same way, since it only sees the current chunk.
+
+`highlighted(body, lang)`:
+
+```
+if not highlight_enabled or not lang_map[lang]:
+  return body
+pipe = io.popen("tree-sitter highlight --lang " .. lang_map[lang], "w")
+pipe:write(body); pipe:close()
+-- HOWEVER: io.popen("w") doesn't read back stdout. We need both:
+-- write body to stdin AND capture stdout.
+-- Easiest: temp file or
+-- /tmp pipe trick. tbd in analyze.
+```
+
+**Open Q-H1** (analyze): how to popen for write+read simultaneously
+without forkpty. Candidates: temp file roundtrip, `popen` with shell
+piping `printf '%s' BODY | tree-sitter highlight | cat`. The shell
+pipe is cleanest if we shell-escape body.
+
+### Lang map (v1)
+
+```lua
+local LANG_MAP = {
+  py = "python", python = "python",
+  lua = "lua",
+  js = "javascript", javascript = "javascript", ts = "typescript",
+  sh = "bash", bash = "bash",
+  c = "c", h = "c", cpp = "cpp", cc = "cpp",
+  rs = "rust", go = "go", java = "java", rb = "ruby",
+  md = "markdown", json = "json",
+  diff = "diff",  -- §8 relies on this entry for fenced diff blocks
+}
+```
+
+Reuses the same map as `expand_mentions`. Factor into a shared
+helper once both reference it (small `_lang_of_ext()` in repl.lua).
+
+### Toggle
+
+`:highlight` (no arg) → flip. `:highlight on|off` → set explicit.
+`:highlight status` → report enabled + whether tree-sitter is present.
+Default: off (don't change existing-user UX).
+
+---
+
+## 5. Pillar 2 — Diff-aware code injection
+
+### Meta: `:diff [args]`
+
+- `:diff` → `git diff` (working tree vs index)
+- `:diff staged` → `git diff --cached`
+- `:diff HEAD` → `git diff HEAD`
+- `:diff main..feature` → `git diff main..feature`
+- any other `:diff` argument string → passed verbatim to `git diff`
+
+Implementation:
+
+```lua
+meta.diff = function(args)
+  args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "")
+  local label = args == "" and "(working tree)" or args
+  -- `:diff staged` is shorthand for `git diff --cached` (see list above).
+  if args == "staged" then args = "--cached" end
+  local cmd = "git diff " .. args
+  local out, code = executor.exec(cmd)
+  if code ~= 0 then
+    renderer.status(("diff failed (exit %d)"):format(code))
+    return
+  end
+  if out == "" or out:gsub("%s", "") == "" then
+    renderer.status("(no diff)")
+    return
+  end
+  ctx:append_exec_output(("[diff %s]\n%s"):format(label, out))
+end
+```
+
+The `[diff ...]\n` framing matches the `[bg:N exited]` /
+`[delegate X]` conventions established in Phase 5 / #6 / #8.
+
+### @-mention: `@..`
+
+Extends `expand_mentions` (#7).
+After the existing path-resolution
+attempt fails, try interpreting the token as a git diff-range:
+
+```lua
+local r1, r2 = path:match("^(.-)%.%.(.+)$")
+if r1 and r2 and r1 ~= "" and r2 ~= "" then
+  -- candidate diff range; try `git diff r1..r2`.
+  -- shq() is the shell-quoting helper also used by the scan logic;
+  -- %q is Lua quoting and does not stop shell expansion.
+  local pipe = io.popen(("git diff %s 2>/dev/null")
+    :format(shq(r1 .. ".." .. r2)))
+  ...
+end
+```
+
+Output replaces the token with:
+
+````
+```diff
+
+```
+````
+
+Same fence-with-lang shape as the `@path` expansion.
+
+**Risk:** false-positive on legitimate paths containing `..` like
+`@../sibling.txt`. Mitigation: only interpret as diff-range when
+the token contains NO `/` (paths usually contain `/`; simple
+ref-ranges like `HEAD~1..HEAD` don't). Refs with `/` like
+`origin/main..feature` ARE common — for those, the user can fall
+back to `:diff origin/main..feature`.
+
+---
+
+## 6. Pillar 3 — Project file-tree
+
+### Meta: `:tree [depth]`
+
+- `:tree` → scan + inject with default depth and char cap
+- `:tree N` → override depth for this scan
+- `:tree refresh` → re-scan with cached opts
+- `:tree off` → clear `ctx.project`
+
+### Scan logic
+
+```lua
+local function _scan_project_tree(dir, opts)
+  opts = opts or {}
+  local max_chars = opts.max_chars or 4096
+  local depth = opts.depth or 3
+
+  -- Prefer git ls-files for .gitignore honor; fall back to find.
+  -- os.execute returns 0 on Lua 5.1/LuaJIT and true on Lua 5.2+.
+  local rc = os.execute("cd " .. shq(dir) .. " && git rev-parse --git-dir >/dev/null 2>&1")
+  local in_git = (rc == 0 or rc == true)
+  local listcmd
+  if in_git then
+    listcmd = ("cd %s && git ls-files --cached --others --exclude-standard"):format(shq(dir))
+  else
+    -- cd first so find prints paths relative to dir, like git ls-files.
+    listcmd = ("cd %s && find . -maxdepth %d -type f -not -path '*/\\.*' 2>/dev/null"):format(shq(dir), depth + 1)
+  end
+
+  local pipe = io.popen(listcmd)
+  if not pipe then return nil, "scan failed" end
+
+  local files = {}
+  for line in pipe:lines() do
+    local rel = line:gsub("^%./", "")  -- strip find's "./" prefix
+    -- Depth filter: count `/` separators
+    local _, slashes = rel:gsub("/", "")
+    if slashes < depth then files[#files + 1] = rel end
+  end
+  pipe:close()
+
+  table.sort(files)
+
+  -- Build a tree-ish summary, truncate by char count.
+  local body = table.concat(files, "\n")
+  local truncated = false
+  if #body > max_chars then
+    body = body:sub(1, max_chars) .. "\n... (truncated)"
+    truncated = true
+  end
+  return body, { file_count = #files, truncated = truncated }
+end
+```
+
+### Injection
+
+`ctx.project = "..."` (string), composed into the system prompt
+between [background] and [earlier conversation summary]:
+
+```
+[project] 142 files (truncated at 4096B):
+README.md
+broker.lua
+config.lua
+context.lua
+...
+```
+
+Suppressed under Norris (R-C1 / R-C4 — planner stays focused; the
+project context can be re-introduced via the Norris goal text if
+needed).
+
+### Auto-inject
+
+`cfg.project.auto_tree = true` runs the scan once at startup and
+sets `ctx.project`. Default false (existing configs unchanged).
+
+---
+
+## 7. UX Surface Summary
+
+| Meta | Behavior |
+|---|---|
+| `:highlight [on/off/status]` | Toggle tree-sitter highlighter (no-op when CLI absent) |
+| `:diff [args]` | `git diff `, append output to context as `[diff ...]` |
+| `:tree [N/refresh/off]` | Scan/refresh/clear project file-tree block |
+
+| @-mention | Behavior |
+|---|---|
+| `@path` | Existing (#7) file expansion |
+| `@..` | New: inline `git diff ..` expansion |
+
+| Config | Default | Effect |
+|---|---|---|
+| `cfg.project.auto_tree` | `false` | Inject project tree at startup |
+| `cfg.project.tree_depth` | `3` | Depth filter for the scan |
+| `cfg.project.tree_max_chars` | `4096` | Truncation cap for the injected block |
+| (no config flag for `:highlight`) | — | Runtime toggle only; no persistence in v1 |
+
+---
+
+## 8. Out of Scope (Phase 6)
+
+- **Pure-Lua syntax highlighter** — defer to a future phase if
+  tree-sitter CLI absence becomes a practical pain point. v1 says
+  "install tree-sitter or accept plain text".
+- **bat/glow/chroma integration** — only `tree-sitter` is wired.
+  Other highlighters can be added behind the same `:highlight` toggle
+  later (config field `cfg.highlight.backend = "tree-sitter"|"bat"|...`).
+- **Smart diff context selection** — no AI-driven "which diff to show".
+  User explicitly says `:diff ` or `@..`.
+- **File-tree LRU / smart summarization** — v1 is a flat truncated list.
+  Hierarchical roll-up ("docs/ — 8 files") is a v2 polish.
+- **Watching for file changes** — no fs-notify reload. Re-scan via
+  `:tree refresh`.
+- **Diff history** — `:diff` doesn't track its previous invocations.
+  Each invocation is independent.
+- **Inline diff highlighting** — the `diff` lang is in `LANG_MAP` so
+  `tree-sitter highlight --lang diff` works, but we don't ship custom
+  ANSI for added/removed lines — tree-sitter's own theme covers it.
+
+---
+
+## 9. Risks
+
+| Risk | Mitigation |
+|---|---|
+| `tree-sitter` CLI not on fleet → most users get no highlighting | It's opt-in; default off; status warning on toggle when absent. |
+| Highlighter latency on long code blocks (whole-block buffering) | Accepted trade-off vs corrupting output. If painful in practice, add a per-block size cap above which we pass-through unhighlighted. |
+| `git diff` on huge changesets blows context budget | Diff output reuses `enforce_budget` eviction (it's just exec output). User can pass a narrower argument to `:diff` to scope. v2 could add a `--max-bytes` truncation. |
+| `git ls-files` in a non-git cwd → falls back to `find`, may pick up node_modules / target / etc. | Document in config example; v2 could honor `.aishignore` or similar. |
+| @`..` collides with paths like `@../sibling.txt` | Mitigation: require NO `/` in the token for diff interpretation. Ref-ranges containing `/` use `:diff` explicitly. |
+| Project tree injection adds tokens to every broker call | Char cap + opt-in `auto_tree = false` default. Suppressed under Norris. |
+| `:highlight on` mid-stream produces inconsistent rendering for the in-flight turn | Toggle takes effect from the NEXT assistant turn. Document this. |
+
+---
+
+## 10. Open Questions (Phase 6)
+
+| # | Question | Impact | Resolution target |
+|---|---|---|---|
+| Q-H1 | How to popen `tree-sitter highlight` with simultaneous stdin write + stdout capture (Lua/LuaJIT lacks popen3). Candidates: temp-file roundtrip, shell-pipe wrapper `printf '%s' BODY \| tree-sitter ...` with shell-escape, or use `io.popen("w")` + a second `io.open(output_file)` after the process completes. | Highlighter correctness | Analyze |
+| Q-D1 | Should `:diff` honor a per-call confirm gate (it shells out and reads git history; safe but noisy)? | UX | Analyze |
+| Q-D2 | Should `@..` accept refs with `/` (`origin/main..feature`)? Doing so means we can't use the no-`/` heuristic to disambiguate from paths. Alternative: require explicit prefix like `@diff:origin/main..feature`. | @-mention grammar | Analyze |
+| Q-T1 | When `cfg.project.auto_tree = true`, should the project block update on `cd` (since the cwd changed)? Or stay fixed at startup-cwd? | UX expectation | Analyze |
+| Q-T2 | Should `cfg.project` accept a list of include/exclude glob patterns, or just rely on git's .gitignore? | Configurability | Analyze |
+| Q-H2 | Should highlighting also apply to user-pasted code (expand_mentions @path), not just assistant output? | Symmetry | Analyze |
+
+---
+
+## 11. Phase 6 → Phase 7+ Out-of-band
+
+The §11 "Planned Phase Sequence" table in PHASE0.md does not list
+phases beyond 6. After Phase 6 lands, candidate next iterations
+(non-binding, for the formulate of Phase 7 to confirm):
+
+- **Phase 7**: secret-redaction wiring into `safety.lua` (#52
+  follow-up filed during Phase 5/13 close); session-multiplex / tmux
+  parity surfaces (out of scope per §12 — explicitly rejected);
+  or other backlog as it accumulates on Gitea.
+
+Phase 6 itself is self-contained — none of its three pillars introduce
+substrate dependencies on phases not yet planned.
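
Review note on §4 (stream filter): the pseudocode state machine could be sketched in plain Lua roughly as follows. This is a minimal sketch, not the planned `renderer.lua` code: `new_fence_filter`, `emit` and `highlight` are hypothetical names, and it assumes any three-backtick run followed by a newline opens a fence (no line-start check). It holds back partial markers in both states, so a fence marker split across chunks is still recognized.

```lua
-- Sketch only: new_fence_filter, emit and highlight are illustrative names.
local FENCE = ("`"):rep(3)  -- three backticks, built at runtime

local function new_fence_filter(emit, highlight)
  local state, buf, lang = "outside", "", nil

  local function drain()
    while true do
      if state == "outside" then
        -- Opening fence: marker, info string, newline.
        local s, e, info = buf:find(FENCE .. "([^\n]*)\n")
        if s then
          emit(buf:sub(1, e))  -- prose plus the fence-open line, verbatim
          lang, buf, state = info, buf:sub(e + 1), "inside"
        else
          -- Hold back a complete marker still waiting for its newline,
          -- or up to two trailing backticks that may start one.
          local hold = buf:find(FENCE, 1, true)
            or (#buf - #buf:match("`*$") + 1)
          if hold > 1 then emit(buf:sub(1, hold - 1)) end
          buf = buf:sub(hold)
          return
        end
      else
        -- Closing fence: newline + marker, or the marker at once (empty block).
        local s = (buf:sub(1, 3) == FENCE) and 0
          or buf:find("\n" .. FENCE, 1, true)
        if not s then return end  -- still buffering
        emit(highlight(buf:sub(1, s), lang))
        emit(FENCE)
        buf, state, lang = buf:sub(s + 4), "outside", nil
      end
    end
  end

  return {
    push = function(chunk) buf = buf .. chunk; drain() end,
    -- End of turn: emit whatever is left (e.g. an unterminated fence) as-is.
    flush = function()
      if buf ~= "" then emit(buf) end
      state, buf, lang = "outside", "", nil
    end,
  }
end
```

A caller would create one filter per assistant turn, pass each streamed chunk to `push`, and call `flush` from the equivalent of `assistant_flush()`. With an identity `highlight` the concatenated output equals the concatenated input, which gives a cheap regression check for the default-off path.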
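
Review note on Q-H1 (§10): the temp-file roundtrip candidate might look like the sketch below. It is only an illustration under stated assumptions: `filter_through` and this local `shq` are hypothetical helpers, the body is fed on stdin via shell redirection, and whether `tree-sitter highlight` actually accepts stdin input is not confirmed here and would need checking in analyze. The demo therefore uses ordinary filters (`cat`, `tr`) as stand-ins.

```lua
-- Sketch only: hypothetical helpers, not the planned repl.lua/renderer.lua API.

-- POSIX single-quote escaping for one shell argument.
local function shq(s)
  return "'" .. (tostring(s):gsub("'", "'\\''")) .. "'"
end

-- Write body to a temp file, run cmd with it on stdin, capture stdout.
-- Falls back to the unmodified body when anything goes wrong.
local function filter_through(cmd, body)
  local tmp = os.tmpname()
  local f = io.open(tmp, "w")
  if not f then return body end
  f:write(body)
  f:close()

  local pipe = io.popen(cmd .. " < " .. shq(tmp) .. " 2>/dev/null")
  local out = pipe and pipe:read("*a")
  if pipe then pipe:close() end
  os.remove(tmp)

  if not out or out == "" then return body end
  return out
end

-- Stand-in filter; a real call would pass the highlighter command instead.
io.write(filter_through("tr a-z A-Z", "print('hi')\n"))
```

The fallback-to-body behavior mirrors the "identity filter" rule from §2: a missing or failing highlighter degrades to plain text instead of dropping the code block.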