docs/PHASE6: formulate — tree-sitter highlight + diff + project tree

Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6:

  1. Tree-sitter syntax highlighting hooks
     External `tree-sitter` CLI when present, no-op otherwise.
     Honors PHASE0 §3 (no compiled extensions). Toggleable
     at runtime; off by default so existing UX is unchanged.

  2. Diff-aware code injection
     :diff [args] meta + @<ref1>..<ref2> @-mention extension.
     Shells out to `git diff`; output flows through the existing
     exec-output context channel.

  3. Project-level file-tree context
     :tree meta + optional cfg.project.auto_tree startup inject.
     git ls-files in a repo, find fallback otherwise. Composed
     into the system prompt as a new [project] block between
     [background] and [earlier summary]. Suppressed under Norris
     (R-C1 / R-C4 parity).

Module changes: renderer.lua (fence-aware highlight filter), context.lua
(compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions
extension). No new module files in v1.

Doc covers: scope + done-when criteria, tech decisions table, module
changes table, per-pillar deep dive with example code, UX surface
summary, out-of-scope list, risks, and 6 open questions to resolve
in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree).

Scope confirmed via AskUserQuestion: all three subsurfaces in scope;
tree-sitter approach is external CLI w/ no-op fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:47:00 +00:00
parent d852acadc2
commit f596743834
+416
@@ -0,0 +1,416 @@
# aish — Phase 6 Manifest
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 6 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Date:** 2026-05-16
PHASE0 is the locked substrate; PHASE1-5 are layered on top. This manifest
specifies what Phase 6 adds — **tree-sitter syntax highlighting hooks**,
**diff-aware code injection**, and **project-level context (file-tree
summary)**.
---
## 1. Scope of Phase 6
Three pillars per PHASE0.md §11 row 6:
1. **Tree-sitter syntax highlighting hooks** — when an external
`tree-sitter` CLI is detected at startup, assistant code-fence
content is filtered through it for ANSI-colorized display. Plain
prose streams unchanged. When the CLI is absent, the filter is the
identity function (zero overhead, zero hard dependency). Toggleable
at runtime with `:highlight on|off`. Default off until the user
opts in (don't surprise existing users with a display change).
2. **Diff-aware code injection** — surface git diffs as first-class
context. Two entry points:
- Meta verb: `:diff [args]` runs `git diff <args>` from cwd, appends
output to context as exec-output. `:diff staged`, `:diff HEAD~3`,
`:diff main..feature` all delegate to git's argument grammar.
- @-mention extension: `@HEAD..feature` (a ref-range expression
anywhere a `@path` would go) expands inline as a fenced `diff`
block, mirroring how `@README.md` already works.
3. **Project-level context (file-tree summary)**: `git ls-files`-based
tree summary of the cwd, injected as a `[project]` block in the
system prompt. Two entry points:
- Meta verb: `:tree [depth]` injects on demand; `:tree refresh`
re-scans.
- Auto-inject at startup when `cfg.project.auto_tree = true`,
gated like memory injection so existing configs don't change
behavior.
**Phase 6 is done when:**
- With `tree-sitter` CLI installed and `:highlight on`, the assistant
reply ```py\nprint("hi")\n``` shows up with ANSI colors. Without
the CLI, `:highlight on` is a no-op + emits a status warning.
- `:diff` from a dirty git repo shows the working-tree diff in the
exec-output frame; the model sees it on the next ask_ai turn.
- `@HEAD~1..HEAD` in a prompt expands inline to a fenced diff block.
- `:tree` injects a `[project] <N files>:` block visible in
`ctx:to_messages()` (via the system prompt assembly).
- With `cfg.project.auto_tree = true`, the project block appears on
every broker call (subject to `max_chars` cap).
- Existing configs without `cfg.project` and with `:highlight off`
(default) behave exactly like Phase 5 (Phase 5 regression coverage).
---
## 2. Technology Decisions (delta from Phase 5)
| Decision | Choice | Rationale |
|---|---|---|
| Highlight backend | External `tree-sitter` CLI (`tree-sitter highlight --lang X`) | Honors PHASE0 §3: no compiled extensions, no luarocks. Detected once at startup; absence → identity filter. Opt-in via `:highlight on` so install-state changes don't break users. |
| Highlight buffering | Accumulate inside fenced code blocks, emit on closing fence; pass-through outside fences | Streaming UX preserved for prose. Code blocks get colorized atomically, accepting a per-block latency (~ block streaming time). Per-chunk highlighting would split a token across `tree-sitter` invocations and corrupt the output. |
| Lang detection | First-line fence info-string (` ```py`, ` ```python`, ` ```lua`) → normalized via small map (py→python, js→javascript, etc.) | The lang tag mirrors the one we already emit in `expand_mentions` (#7). No tag → identity (no highlight). |
| Diff backend | Shell out to `git diff <args>` via `executor.exec` | Honors substrate (no libgit2 FFI). The existing exec frame handles capture + stream. `git` is universally present where aish makes sense. |
| Diff failure | Bail with status `[aish] :diff failed (not a git repo / bad ref)`; do NOT inject empty output | Avoids polluting context with stale or empty diffs. |
| Tree backend | `git ls-files --cached --others --exclude-standard` when cwd is a git repo, else `find . -type f -not -path '*/.*'` | Free `.gitignore` honor in repos; sensible default outside. `-not` is accepted by GNU and BSD `find` but is not specified by POSIX (`!` is the portable spelling). |
| Tree summary form | Sorted relative paths, grouped by directory at depth ≤ `cfg.project.tree_depth` (default 3), truncated by char count `cfg.project.tree_max_chars` (default 4096) | One block, deterministic order, cheap to compute. Matches the [background] memory block convention (Phase 4) so the system prompt's compositional shape stays familiar. |
| Tree injection point | `context.lua`: new `compose_project(...)` adds a `[project] <header>\n<body>` block to the system content, between [background] and [earlier summary] | Same suppression rule as [background]/[earlier summary]: NOT injected during Norris (R-C1 / R-C4 — planner stays on its anchor). |
| Tree refresh policy | One scan at startup if auto; `:tree refresh` to re-scan on demand | Scanning on every ask_ai is wasteful for slow filesystems. Manual refresh is sufficient for v1. |
| @-mention diff syntax | `@<ref>..<ref>` only (two refs joined by `..`), recognized via the existing trailing-punct peel logic | Avoids ambiguity with literal paths. `@HEAD` alone is NOT a diff trigger (would collide with files literally named HEAD). |
---
## 3. Module Changes
| File | State after Phase 5 | Phase 6 changes |
|---|---|---|
| `renderer.lua` | `assistant_delta(text)` writes chunks; `assistant_flush()` finalizes | Add fence-aware filter inside the assistant stream. State machine: outside-fence (pass-through) / inside-fence (buffer, emit on close). On close, pipe buffer through `tree-sitter highlight --lang <X>` (if highlight enabled), emit result. Toggle exposed as `renderer.set_highlight(bool)`. |
| `executor.lua` | `extract_cmd_lines`, `extract_cmd_bg_lines`, `extract_delegate_lines` | No changes. Diff and tree use the existing `exec` path. |
| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary]. Suppressed under Norris. |
| `repl.lua` | meta dispatch + main loop + #13 secrets wiring | New helpers: `_detect_treesitter()` (run once at startup), `_run_git_diff(args)`, `_scan_project_tree(dir, opts)`. New meta: `:highlight`, `:diff`, `:tree`. Extend `expand_mentions` to recognize `<ref>..<ref>` token shape. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/etc. | Add commented-out `project = { auto_tree = false, tree_depth = 3, tree_max_chars = 4096 }` block. |
No new module files in v1. The three new helpers make `repl.lua`
longer, but they keep the Phase 6 surface in one place. If the highlighter
filter grows past ~80 LOC, lift it into `highlight.lua` as a follow-up.
---
## 4. Pillar 1 — Tree-sitter highlighting
### Detection (startup, once)
```lua
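local function _detect_treesitter()
  -- The first output line is the path printed by `command -v`;
  -- no output means the CLI is not on PATH.
  local pipe = io.popen("command -v tree-sitter 2>/dev/null && tree-sitter --version 2>/dev/null")
  if not pipe then return false end
  local first = pipe:read("*l")
  pipe:close()  -- always close, even when nothing was read
  return first ~= nil and first ~= ""
end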
```
If not present, `renderer.set_highlight(true)` emits a status warning
and leaves the filter as a no-op. Don't error; the user can install
tree-sitter and re-toggle.
### Stream filter
The filter wraps `renderer.assistant_delta`. State machine:
```
state = "outside" | "inside"
buf = ""    -- only used in "inside"
lang = nil  -- captured at fence open
push(chunk):
  if state == "outside":
    look for ```<lang>\n in chunk
    if found:
      emit chunk up to fence-open
      state = "inside"; lang = parsed; buf = ""
      chunk = chunk after fence-open  -- appended to buf below, exactly once
    else:
      emit chunk as-is
  if state == "inside":
    buf = buf .. chunk
    look for \n``` in buf
    if found:
      fence_body = buf up to closing
      rest = buf after closing
      emit highlighted(fence_body, lang)
      emit closing fence verbatim
      state = "outside"; buf = ""
      push(rest)  -- recurse with state="outside"
    else:
      -- still buffering; nothing emitted this push
```
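Edge case: a chunk boundary can land inside the fence marker itself
(e.g., one chunk ends with two backticks and the next begins with the
third). Inside a fence the scan runs over the cumulative `buf`, so a
split closing marker is recovered. A split opening marker needs the
same care: the outside state scans only the current chunk, so it must
hold back a trailing partial marker rather than emit it.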
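
The state machine above can be sketched in Lua as follows. This is a minimal sketch, not the renderer.lua implementation: `new_fence_filter`, `emit`, and `highlight` are hypothetical names injected for illustration, and the sketch treats any run of three backticks as a fence marker (it does not check that the marker starts a line). It buffers in both states so that a marker split across chunks is still recognized.

```lua
-- Sketch only: `emit` and `highlight` are injected stand-ins, not the
-- real renderer.lua API.
local function new_fence_filter(emit, highlight)
  local state, buf, lang = "outside", "", nil
  local function out(s) if s ~= "" then emit(s) end end
  local filter = {}

  function filter.push(chunk)
    buf = buf .. chunk
    while true do
      if state == "outside" then
        local _, e, info = buf:find("```([^\n]*)\n")
        if e then
          out(buf:sub(1, e))                   -- prose + fence-open line
          lang = info:match("^%s*(%S+)")       -- nil when there is no tag
          buf, state = buf:sub(e + 1), "inside"
        else
          -- Hold back an unfinished fence-open line, or a trailing run
          -- of one or two backticks that may still grow into a marker.
          local p = buf:find("```", 1, true)
          local keep = p or (#buf - #(buf:match("`*$")) + 1)
          out(buf:sub(1, keep - 1))
          buf = buf:sub(keep)
          return
        end
      else
        local s, e = buf:find("\n```", 1, true)
        if buf:sub(1, 3) == "```" then s, e = 0, 3 end  -- empty block
        if not s then return end               -- still buffering
        out(highlight(buf:sub(1, s), lang))
        out("```")                             -- closing fence verbatim
        buf, state, lang = buf:sub(e + 1), "outside", nil
      end
    end
  end

  -- End of stream: emit whatever is still held, unhighlighted.
  function filter.flush()
    out(buf)
    buf, state, lang = "", "outside", nil
  end

  return filter
end
```

Calling `flush()` from `assistant_flush()` would keep an unterminated fence from being swallowed; that wiring is an assumption, not something this manifest specifies.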
`highlighted(body, lang)`:
```
if not highlight_enabled or not lang_map[lang]:
  return body
pipe = io.popen("tree-sitter highlight --lang " .. lang_map[lang], "w")
pipe:write(body); pipe:close()
-- HOWEVER: io.popen("w") doesn't read back stdout. We need both:
-- write body to stdin AND capture stdout. Easiest: temp file or
-- /tmp pipe trick. tbd in analyze.
```
**Open Q-H1** (analyze): how to popen for write+read simultaneously
without forkpty. Candidates: temp file roundtrip, `popen` with shell
piping `printf '%s' BODY | tree-sitter highlight | cat`. The shell
pipe is cleanest if we shell-escape body.
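
To make the temp-file candidate for Q-H1 concrete, here is a hedged sketch. `run_filter` and `highlighted` are hypothetical names, and `shq` is written out only so the example is self-contained (the real helper may differ). Whether `tree-sitter highlight` accepts `--lang` and reads stdin is taken from this document and not verified here; if the CLI needs a file path, the temp file can be passed as an argument instead.

```lua
-- POSIX single-quote escaping: wrap in '...' and rewrite each ' as '\''.
local function shq(s)
  return "'" .. tostring(s):gsub("'", "'\\''") .. "'"
end

-- Run `cmd` with `body` on stdin via a temp file; return stdout or nil.
local function run_filter(cmd, body)
  local tmp = os.tmpname()
  local f = io.open(tmp, "wb")
  if not f then return nil end
  f:write(body)
  f:close()
  local pipe = io.popen(cmd .. " < " .. shq(tmp) .. " 2>/dev/null", "r")
  local out = pipe and pipe:read("*a")
  if pipe then pipe:close() end
  os.remove(tmp)
  if out == nil or out == "" then return nil end
  return out
end

-- Assumption: flag name and stdin support come from this manifest.
-- Any failure falls back to the unhighlighted body.
local function highlighted(body, lang, enabled)
  if not enabled or not lang then return body end
  return run_filter("tree-sitter highlight --lang " .. shq(lang), body) or body
end
```

The temp file avoids shell-escaping the whole body, at the cost of one file write per code block; the `printf` pipeline trades that the other way.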
### Lang map (v1)
```lua
local LANG_MAP = {
  py = "python", python = "python",
  lua = "lua",
  js = "javascript", javascript = "javascript",
  ts = "typescript", typescript = "typescript",
  sh = "bash", bash = "bash",
  c = "c", h = "c", cpp = "cpp", cc = "cpp",
  rs = "rust", go = "go", java = "java", rb = "ruby",
  md = "markdown", json = "json",
  diff = "diff",  -- §8 relies on this entry
}
```
Reuses the same map as `expand_mentions`. Factor into a shared
helper once both reference it (small `_lang_of_ext()` in repl.lua).
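
A possible shape for the shared helper, as a sketch. The map here is abbreviated for the example, and `_lang_of_info` is a hypothetical companion to `_lang_of_ext` for fence info-strings; neither signature is fixed by this manifest.

```lua
-- Abbreviated copy of the map, for illustration only.
local LANG_MAP = {
  py = "python", python = "python",
  js = "javascript", sh = "bash", lua = "lua",
}

-- Fence info-string ("py", " Python title=x") -> canonical name or nil.
local function _lang_of_info(info)
  local tag = (info or ""):match("^%s*([%w_+#-]+)")
  if not tag then return nil end
  return LANG_MAP[tag:lower()]
end

-- File path ("src/main.lua") -> canonical name or nil.
local function _lang_of_ext(path)
  local ext = (path or ""):match("%.([%w_+#-]+)$")
  if not ext then return nil end
  return LANG_MAP[ext:lower()]
end
```

Returning nil for unknown tags keeps the "no tag, no highlight" rule from the decisions table in one place.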
### Toggle
`:highlight` (no arg) → flip. `:highlight on|off` → set explicit.
`:highlight status` → report enabled + whether tree-sitter is present.
Default: off (don't change existing-user UX).
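
The toggle rules can be written as a small pure function, sketched below. `apply_highlight_arg` is a hypothetical name, the status strings are placeholders, and keeping the flag off when the CLI is absent is one reading of "no-op + status warning", not a confirmed decision.

```lua
-- Returns the new enabled flag plus a status line (or nil).
local function apply_highlight_arg(arg, enabled, cli_present)
  arg = (arg or ""):match("^%s*(.-)%s*$")
  if arg == "status" then
    return enabled, ("highlight %s, tree-sitter %s"):format(
      enabled and "on" or "off", cli_present and "present" or "absent")
  end
  local want
  if arg == "" then want = not enabled        -- bare :highlight flips
  elseif arg == "on" then want = true
  elseif arg == "off" then want = false
  else return enabled, "usage: :highlight [on|off|status]" end
  if want and not cli_present then
    -- Assumed reading: warn and stay off so a later re-toggle works.
    return false, "tree-sitter CLI not found; highlighting stays off"
  end
  return want, nil
end
```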
---
## 5. Pillar 2 — Diff-aware code injection
### Meta: `:diff [args]`
- `:diff` → `git diff` (working tree vs index)
- `:diff staged` → `git diff --cached`
- `:diff HEAD` → `git diff HEAD`
- `:diff main..feature` → `git diff main..feature`
- `:diff <anything else>` → passed verbatim to `git diff <anything>`
Implementation:
```lua
meta.diff = function(args)
  args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "")
  -- `:diff staged` is sugar for `git diff --cached`; anything else is
  -- passed to git verbatim.
  local git_args = (args == "staged") and "--cached" or args
  local out, code = executor.exec("git diff " .. git_args)
  if code ~= 0 then
    renderer.status(("diff failed (exit %d)"):format(code))
    return
  end
  -- Empty or whitespace-only output is never injected.
  if not out:find("%S") then
    renderer.status("(no diff)")
    return
  end
  ctx:append_exec_output(("[diff %s]\n%s"):format(
    args == "" and "(working tree)" or args, out))
end
```
The `[diff ...]\n<output>` framing matches the `[bg:N exited]` /
`[delegate X]` conventions established in Phase 5 / #6 / #8.
### @-mention: `@<ref1>..<ref2>`
Extends `expand_mentions` (#7). After the existing path-resolution
attempt fails, try interpreting the token as a git diff-range:
```lua
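-- Tokens containing "/" are never treated as ref ranges (see Risk below).
local r1, r2 = path:match("^([^/]-)%.%.([^/]+)$")
if r1 and r2 and r1 ~= "" and r2 ~= "" then
  -- candidate diff range; try `git diff <r1>..<r2>`
  -- shq() shell-quotes each ref. Lua's %q is not shell-safe: `$` and
  -- backticks still expand inside double quotes.
  local pipe = io.popen(("git diff %s..%s 2>/dev/null")
    :format(shq(r1), shq(r2)))
  ...
end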
```
Output replaces the token with:
````
```diff
<content>
```
````
Same fence-with-lang shape as the `@path` expansion.
**Risk:** false-positive on legitimate paths containing `..` like
`@../sibling.txt`. Mitigation: only interpret as diff-range when
the token contains NO `/` (paths have `/`, ref-ranges don't). Refs
with `/` like `origin/main..feature` ARE common — for those, the
user can fall back to `:diff origin/main..feature`.
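
The disambiguation rule can be isolated as a pure classifier, sketched here. `classify_mention` and the injected `file_exists` predicate are hypothetical names; the ordering (existing file first, then the no-`/` check) follows the text above.

```lua
-- Returns "path", or "diff" plus the two refs.
local function classify_mention(token, file_exists)
  if file_exists(token) then return "path" end            -- real file wins
  if token:find("/", 1, true) then return "path" end      -- paths have "/"
  local r1, r2 = token:match("^(.-)%.%.(.+)$")
  if r1 and r1 ~= "" then return "diff", r1, r2 end
  return "path"
end
```

Keeping this free of I/O makes the grammar question in Q-D2 easy to test before any `git diff` call is involved.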
---
## 6. Pillar 3 — Project file-tree
### Meta: `:tree [depth]`
- `:tree` → scan + inject with default depth and char cap
- `:tree <N>` → override depth for this scan
- `:tree refresh` → re-scan with cached opts
- `:tree off` → clear `ctx.project`
### Scan logic
```lua
local function _scan_project_tree(dir, opts)
  opts = opts or {}
  local max_chars = opts.max_chars or 4096
  local depth = opts.depth or 3
  -- Prefer git ls-files for .gitignore honor; fall back to find.
  -- os.execute returns 0 on Lua 5.1/LuaJIT and true on Lua 5.2+.
  local rc = os.execute("cd " .. shq(dir) .. " && git rev-parse --git-dir >/dev/null 2>&1")
  local in_git = (rc == 0 or rc == true)
  local listcmd
  if in_git then
    listcmd = ("cd %s && git ls-files --cached --others --exclude-standard"):format(shq(dir))
  else
    -- Run find from inside dir so paths are relative, like git ls-files.
    listcmd = ("cd %s && find . -maxdepth %d -type f -not -path '*/\\.*' 2>/dev/null"):format(shq(dir), depth)
  end
  local pipe = io.popen(listcmd)
  if not pipe then return nil, "scan failed" end
  local files = {}
  for line in pipe:lines() do
    line = line:gsub("^%./", "")  -- strip find's "./" prefix
    -- Depth filter: a path with N "/" separators sits at depth N + 1.
    local _, slashes = line:gsub("/", "")
    if slashes < depth then files[#files + 1] = line end
  end
  pipe:close()
  table.sort(files)
  -- Flat sorted list, truncated by char count.
  local body = table.concat(files, "\n")
  local truncated = false
  if #body > max_chars then
    body = body:sub(1, max_chars) .. "\n... (truncated)"
    truncated = true
  end
  return body, { file_count = #files, truncated = truncated }
end
```
### Injection
`ctx.project = "..."` (string), composed into the system prompt
between [background] and [earlier summary]:
```
[project] 142 files (truncated at 4096B):
README.md
broker.lua
config.lua
context.lua
...
```
Suppressed under Norris (R-C1 / R-C4 — planner stays focused; the
project context can be re-introduced via the Norris goal text if
needed).
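
A sketch of how the block ordering and the Norris suppression could fit together. The signatures are assumptions: the manifest names `compose_project` but does not fix its arguments, and `compose_system` with its `p` fields is hypothetical.

```lua
-- Build the "[project] ..." block from a scan result, or nil if empty.
local function compose_project(body, meta)
  if not body or body == "" then return nil end
  local header = ("[project] %d files"):format(meta.file_count)
  if meta.truncated then
    header = header .. (" (truncated at %dB)"):format(meta.max_chars)
  end
  return header .. ":\n" .. body
end

-- Assemble the system prompt. Under Norris only base + suffix survive.
local function compose_system(p)
  local blocks = { p.base }
  if p.norris then
    blocks[#blocks + 1] = p.norris_suffix
  else
    for _, key in ipairs({ "background", "project", "summary" }) do
      if p[key] then blocks[#blocks + 1] = p[key] end
    end
  end
  return table.concat(blocks, "\n\n")
end
```

Driving the order from one list keeps [project] between [background] and [earlier summary] without separate branches for each block.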
### Auto-inject
`cfg.project.auto_tree = true` runs the scan once at startup and
sets `ctx.project`. Default false (existing configs unchanged).
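
The startup gate might look like the sketch below. `maybe_auto_tree` and `ctx.project_meta` are hypothetical; `scan` stands in for `_scan_project_tree` and is injected so the example runs without a filesystem.

```lua
-- Returns true when a project block was set, false otherwise.
local function maybe_auto_tree(cfg, ctx, scan, cwd)
  local p = cfg.project
  if not (p and p.auto_tree) then return false end   -- opt-in only
  local body, meta = scan(cwd, {
    depth = p.tree_depth,
    max_chars = p.tree_max_chars,
  })
  if not body then return false end   -- scan failed; leave ctx untouched
  ctx.project, ctx.project_meta = body, meta
  return true
end
```

A config with no `project` table takes the first early return, which is what keeps existing configs on Phase 5 behavior.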
---
## 7. UX Surface Summary
| Meta | Behavior |
|---|---|
| `:highlight [on/off/status]` | Toggle tree-sitter highlighter (no-op when CLI absent) |
| `:diff [args]` | `git diff <args>`, append output to context as `[diff ...]` |
| `:tree [N/refresh/off]` | Scan/refresh/clear project file-tree block |

| @-mention | Behavior |
|---|---|
| `@path` | Existing (#7) file expansion |
| `@<ref1>..<ref2>` | New: inline `git diff <r1>..<r2>` expansion |

| Config | Default | Effect |
|---|---|---|
| `cfg.project.auto_tree` | `false` | Inject project tree at startup |
| `cfg.project.tree_depth` | `3` | Depth filter for the scan |
| `cfg.project.tree_max_chars` | `4096` | Truncation cap for the injected block |
| (no config flag for `:highlight`) | — | Runtime toggle only; no persistence in v1 |
---
## 8. Out of Scope (Phase 6)
- **Pure-Lua syntax highlighter** — defer to a future phase if
tree-sitter CLI absence becomes a practical pain point. v1 says
"install tree-sitter or accept plain text".
- **bat/glow/chroma integration** — only `tree-sitter` is wired.
Other highlighters can be added behind the same `:highlight` toggle
later (config field `cfg.highlight.backend = "tree-sitter"|"bat"|...`).
- **Smart diff context selection** — no AI-driven "which diff to show".
User explicitly says `:diff <range>` or `@<r1>..<r2>`.
- **File-tree LRU / smart summarization** — v1 is a flat truncated list.
Hierarchical roll-up ("docs/ — 8 files") is a v2 polish.
- **Watching for file changes** — no fs-notify reload. Re-scan via
`:tree refresh`.
- **Diff history** — `:diff` doesn't track its previous invocations.
Each invocation is independent.
- **Inline diff highlighting** — the `diff` lang is in `LANG_MAP` so
`tree-sitter highlight --lang diff` works, but we don't ship custom
ANSI for added/removed lines — tree-sitter's own theme covers it.
---
## 9. Risks
| Risk | Mitigation |
|---|---|
| `tree-sitter` CLI not on fleet → most users get no highlighting | It's opt-in; default off; status warning on toggle when absent. |
| Highlighter latency on long code blocks (whole-block buffering) | Accepted trade-off vs corrupting output. If painful in practice, add a per-block size cap above which we pass-through unhighlighted. |
| `git diff` on huge changesets blows context budget | Diff output reuses `enforce_budget` eviction (it's just exec output). User can `:diff <subdir>` to scope. v2 could add a `--max-bytes` truncation. |
| `git ls-files` in a non-git cwd → falls back to `find`, may pick up node_modules / target / etc. | Document in config example; v2 could honor `.aishignore` or similar. |
| `@<ref1>..<ref2>` collides with paths like `@../sibling.txt` | Require NO `/` in the token for diff interpretation. Refs that contain `/` (e.g. `origin/main..feature`) use `:diff` explicitly. |
| Project tree injection adds tokens to every broker call | Char cap + opt-in `auto_tree = false` default. Suppressed under Norris. |
| `:highlight on` mid-stream produces inconsistent rendering for the in-flight turn | Toggle takes effect from the NEXT assistant turn. Document this. |
---
## 10. Open Questions (Phase 6)
| # | Question | Impact | Resolution target |
|---|---|---|---|
| Q-H1 | How to popen `tree-sitter highlight` with simultaneous stdin write + stdout capture (Lua/LuaJIT lacks popen3). Candidates: temp-file roundtrip, shell-pipe wrapper `printf '%s' BODY \| tree-sitter ...` with shell-escape, or use `io.popen("w")` + a second `io.open(output_file)` after the process completes. | Highlighter correctness | Analyze |
| Q-D1 | Should `:diff` honor a per-call confirm gate (it shells out and reads git history; safe but noisy)? | UX | Analyze |
| Q-D2 | Should `@<r1>..<r2>` accept refs with `/` (`origin/main..feature`)? Doing so means we can't use the no-`/` heuristic to disambiguate from paths. Alternative: require explicit prefix like `@diff:origin/main..feature`. | @-mention grammar | Analyze |
| Q-T1 | When `cfg.project.auto_tree = true`, should the project block update on `cd` (since the cwd changed)? Or stay fixed at startup-cwd? | UX expectation | Analyze |
| Q-T2 | Should `cfg.project` accept a list of include/exclude glob patterns, or just rely on git's .gitignore? | Configurability | Analyze |
| Q-H2 | Should highlighting also apply to user-pasted code (expand_mentions @path), not just assistant output? | Symmetry | Analyze |
---
## 11. Phase 6 → Phase 7+ Out-of-band
The §11 "Planned Phase Sequence" table in PHASE0.md does not list
phases beyond 6. After Phase 6 lands, candidate next iterations
(non-binding, for the formulate of Phase 7 to confirm):
- **Phase 7**: secret-redaction wiring into `safety.lua` (#52
follow-up filed during Phase 5/13 close), or other backlog as it
accumulates on Gitea. Session-multiplex / tmux parity surfaces are
not candidates (out of scope per §12, explicitly rejected).
Phase 6 itself is self-contained — none of its three pillars introduce
substrate dependencies on phases not yet planned.