From 4407029296fb72d0be320b7865212d89b66c94cc Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 16 May 2026 22:01:40 +0000 Subject: [PATCH] =?UTF-8?q?docs/PHASE6:=20plan=20=E2=80=94=20fold=20B1/B3/?= =?UTF-8?q?B4=20+=20add=20=C2=A712=20commit=20roadmap?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Status header: Analyze -> Plan. Baseline findings folded into the design sections: §1 (highlighter pillar) gains B4: tree-sitter absent on every probed host; :highlight on emits install-hint when missing. §4 (highlighter sketch) revised per B3: io.popen():close() doesn't expose exit codes in LuaJIT. Route via executor.exec("cat tmp | tree-sitter ...") which uses pty.spawn+waitpid and returns code reliably. Tmpfile design retained (avoids ARGMAX + shell-escape). §5 (:diff impl + @.. retry) revised per B1: every git invocation must use `--no-pager -c color.ui=never` to suppress the color/keypad/line-clear escapes forkpty triggers. Factored recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff and the @-mention diff retry. New §12 Implementation Plan — 6 commits, bottom-up: 1. context.lua: ctx.project + compose_project + composition order 2. repl.lua: _scan_project_tree helper + :tree meta 3. repl.lua: :diff meta + _git_clean_cmd helper (B1) 4. repl.lua: expand_mentions tiered resolution (@.. per A6) 5. renderer.lua + repl.lua: tree-sitter detect + fence filter + :highlight meta (B3-revised tmpfile dispatch) 6. config.lua project example + status -> Implement Per-commit risk index + smoke criteria. Highlighter (commit 5) is the largest experimental surface — placed last so the rest of Phase 6 ships even if highlighter slips. Order is independent enough that swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each commit individually green. Things deliberately not split: _shq reuse, lang map duplication for v1, streaming-rehydration order (rehydrate -> highlight -> emit inherits naturally from existing chunk pipeline). Two items open at plan time, resolve at implement: _scan_project_tree dir-arg vs hardcoded getcwd; :highlight status probing tree-sitter --print-langs. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/PHASE6.md | 206 +++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 183 insertions(+), 23 deletions(-) diff --git a/docs/PHASE6.md b/docs/PHASE6.md index bcc090b..8c97f99 100644 --- a/docs/PHASE6.md +++ b/docs/PHASE6.md @@ -2,7 +2,7 @@ **Project:** aish — AI-augmented conversational shell **Document:** Phase 6 Requirements, Architecture & Design Decisions -**Status:** Analyze (formulate complete; current tree at `f596743` probed) +**Status:** Plan (formulate + analyze + baseline complete; tree at `9f50206`) **Date:** 2026-05-16 **Analyze findings (2026-05-16):** @@ -109,6 +109,10 @@ Three pillars per PHASE0.md §11 row 6: identity function (zero overhead, zero hard dependency). Toggleable at runtime with `:highlight on|off`. Default off until the user opts in (don't surprise existing users with a display change). + Per B4: tree-sitter is **absent on every fleet host probed**; + `:highlight on` when the CLI is missing emits a status that names + the install hint (`apt install tree-sitter` / `cargo install + tree-sitter-cli`) rather than silently falling back to identity. 2. **Diff-aware code injection** — surface git diffs as first-class context. Two entry points: @@ -233,29 +237,36 @@ Edge cases: chunk boundary lands inside the fence marker itself machine looks at the cumulative `buf`, so partial markers are recovered correctly. -`highlighted(body, lang)` — resolved per A4 (tmpfile roundtrip): +`highlighted(body, lang)` — **B3-revised** (supersedes A4): ```lua if not highlight_enabled or not lang_map[lang] then return body end local tmp = os.tmpname() -local w = io.popen(("tree-sitter highlight --lang %s > %s") - :format(lang_map[lang], tmp), "w") -w:write(body) -local _, _, code = w:close() -local f = io.open(tmp, "rb") -local out = f and f:read("*a") or body -if f then f:close() end +local f = io.open(tmp, "wb") +if not f then return body end +f:write(body); f:close() +-- B3: LuaJIT io.popen():close() returns (true, nil, nil) regardless of +-- exit code. Route via executor.exec which uses pty.spawn+waitpid and +-- returns (out, exit_code) reliably. +local out, code = executor.exec( + ("cat %s | tree-sitter highlight --lang %s") + :format(_shq(tmp), lang_map[lang])) os.remove(tmp) if code ~= 0 then return body end -- pass-through on highlighter failure return out ``` -The two-handle design avoids the ARGMAX risk of shelling -`printf '%s' BODY | tree-sitter ...` (Linux ARGMAX is ~128KB but -LuaJIT strings can be larger) and sidesteps shell-escape edge cases -(body may contain arbitrary bytes). Cost is one syscall per code block -for the tmp file create/remove cycle — negligible vs the highlighter -invocation itself. +Why this shape (and not the formulate-time A4 sketch): + +- **A4 assumed Lua 5.2+ popen-close-with-code.** LuaJIT (5.1 contract) + doesn't expose the exit status via `io.popen(...):close()`. Baseline + B3 caught this; the only reliable exit-code path in our substrate + is `executor.exec` (pty.spawn + waitpid). +- The tmpfile stays — avoids ARGMAX on `printf '%s' BODY |` and + sidesteps shell-escape edge cases on arbitrary code-block bytes. +- Cost is one syscall round (tmpfile create/remove) plus one pty + spawn per code block — negligible vs the highlighter latency. +- `_shq` is the existing shell-quote helper from #3 (pre/post hooks). ### Lang map (v1) @@ -292,12 +303,16 @@ Default: off (don't change existing-user UX). - `:diff main..feature` → `git diff main..feature` - `:diff ` → passed verbatim to `git diff ` -Implementation: +Implementation — **B1-revised** (must disable pager + color): ```lua meta.diff = function(args) args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "") - local cmd = "git diff " .. args + -- B1: forkpty makes git think it's interactive, enabling color + -- ANSI + DEC keypad/line-clear escapes that pollute the injected + -- context block. --no-pager kills the keypad sequences; --color= + -- never kills the color codes. Both are required. + local cmd = "git --no-pager -c color.ui=never diff " .. args local out, code = executor.exec(cmd) if code ~= 0 then renderer.status(("diff failed (exit %d)"):format(code)) @@ -315,6 +330,11 @@ end The `[diff ...]\n` framing matches the `[bg:N exited]` / `[delegate X]` conventions established in Phase 5 / #6 / #8. +The same `--no-pager -c color.ui=never` prefix applies to the +`@..` resolution path in the next section, and to any +future git verbs we add (`:log`, `:show`, etc.). Factor into a +helper `_git_clean_cmd(subcmd)` if multiple call sites accumulate. + ### @-mention: `@..` — tiered resolution (A6) Extends `expand_mentions` (#7) by adding a SECOND resolution attempt @@ -327,12 +347,15 @@ when the first (path lookup) fails AND the token contains `..`: if not content and path:find("..", 1, true) then local r1, r2 = path:match("^(.-)%.%.(.+)$") if r1 and r2 and r1 ~= "" and r2 ~= "" then - local pipe = io.popen(("git diff %s..%s 2>/dev/null") - :format(shq(r1), shq(r2))) - local diff = pipe and pipe:read("*a") or "" - local _, _, code = pipe and pipe:close() - if code == 0 and diff:match("%S") then - content = diff + -- B1: --no-pager + color=never (same as the :diff meta path). + -- B3: io.popen close() doesn't expose exit codes — use the + -- file-redirect trick OR executor.exec. Here we want a quick + -- best-effort and the cost of an extra forkpty is acceptable. + local out, code = executor.exec( + ("git --no-pager -c color.ui=never diff %s..%s 2>/dev/null") + :format(shq(r1), shq(r2))) + if code == 0 and out:match("%S") then + content = out -- Note: language tag becomes "diff" regardless of path lang lang_override = "diff" end @@ -539,3 +562,140 @@ phases beyond 6. After Phase 6 lands, candidate next iterations Phase 6 itself is self-contained — none of its three pillars introduce substrate dependencies on phases not yet planned. + +--- + +## 12. Implementation Plan (commit-by-commit) + +Bottom-up ordering: foundations first (context.lua field + composer), +then the diff and tree surfaces that have no display-layer risk, then +the highlighter (largest experimental surface — last so the rest of +Phase 6 ships even if highlighter slips). Each commit leaves the tree +green (existing tests pass + smoke ok) and adds a discrete capability. + +### Order + +1. **`context.lua` — `[project]` block plumbing.** Add `self.project` + (string, nil-allowed) on `Context.new`. Add `compose_project(text)` + helper mirroring `compose_background` / `compose_summary`. In + `to_messages`: insert between `compose_background` and + `compose_summary` so the read order is memory → project tree → + earlier-summary → NORRIS. Suppressed under `self.norris_active` + (parity with R-C1 / R-C4). No behavior change yet — nothing sets + `ctx.project`. Smoke: `:to_messages()` still empty when project nil. + +2. **`repl.lua` — `_scan_project_tree` helper + `:tree` meta.** + - `_scan_project_tree(dir, opts)` per §6: `git ls-files --cached + --others --exclude-standard` in a repo, `find . -maxdepth N + -type f -not -path '*/\.*'` outside. Returns `(body, info)` + where `info = { file_count, truncated }`. + - `:tree [N|refresh|off]` meta: scans cwd, sets `ctx.project`, + emits status with file count + truncation note. + - `cfg.project.auto_tree` startup hook: if true, run `_scan` once + and set `ctx.project` (before the main loop opens). Default + false (existing configs unchanged). + - Update HELP with `:tree` lines. + - Smoke: in the aish repo, `:tree` injects a ~32-file block; + `:to_messages()` shows the `[project]` block in the system prompt. + +3. **`repl.lua` — `:diff` meta + `_git_clean_cmd` helper (B1).** + - `_git_clean_cmd(subcmd_and_args)` returns the `git --no-pager + -c color.ui=never ` prefix. Used by `:diff` + and the `@..` path in commit #4. + - `:diff [args]` meta per §5 (B1-revised): runs the clean git + command via `executor.exec`, appends `[diff ]\n` + to context as exec_output. Empty / non-repo / bad-ref paths + emit status and skip. + - Update HELP with `:diff` line. + - Smoke: `:diff` from a dirty aish checkout injects the working + tree diff; `:diff staged` works; `:diff junkref` emits status + and skips. + +4. **`repl.lua` — `expand_mentions` tiered resolution (A6).** + Extend the existing path-resolution loop with the diff-retry + branch from §5: if `_read_truncated` returns nil AND the token + contains `..`, parse as `..` and try `_git_clean_cmd( + "diff ..")`. On success, replace with a fenced `diff` + block. Preserves existing peel-on-trailing-punct logic. Smoke: + `@HEAD~1..HEAD` expands inline; `@origin/main..feature` works + when the ref exists; `@../sibling.txt` still resolves as file. + +5. **`renderer.lua` + `repl.lua` — tree-sitter highlighter.** + This commit is the largest single change in Phase 6. Substeps: + + a. `_detect_treesitter()` in repl.lua: one-shot popen of + `command -v tree-sitter && tree-sitter --version`. Stash + result on a local. + + b. `renderer.lua` — fence-aware state machine wrapping + `assistant_delta`. Exports `renderer.set_highlight(enabled, + detected)` so repl.lua wires the toggle + cli-availability + flags. State: `outside` (pass-through) / `inside` (buffer until + closing fence). On close: call `highlighted(body, lang)` and + emit. Algorithm per §4; bytes-of-cumulative-buf scan as B2 + requires for fragment-across-boundary fences. + + c. `highlighted(body, lang)` per §4 (B3-revised): write body to + `os.tmpname()`, invoke via `executor.exec("cat tmp | + tree-sitter highlight --lang X")`, capture out + exit code, + cleanup tmp, pass-through on failure. + + d. `:highlight [on|off|status]` meta in repl.lua. `:highlight on` + when CLI absent → status with install hint (B4); `:highlight + status` always reports current toggle + CLI availability. + + e. HELP update; PHASE6 status → Implement. + +6. **`config.lua` + docs/PHASE6 status bump.** + - Add commented-out `project = { auto_tree = false, tree_depth = 3, + tree_max_chars = 4096 }` block in config.lua (parity with the + Phase 1-5 example blocks). + - PHASE6.md status header → **Implement** (matches Phase 5 + cadence — manifest tracks implementation state). + +### Risk index per commit + +| Commit | Risk | Mitigation | +|---|---|---| +| 1 (compose_project) | Composition-order regression breaks Phase 4/5 callers | Order test: empty memory + empty project = identical sys_content to pre-Phase-6 baseline | +| 2 (:tree) | `find` fallback picks up node_modules / target / build / etc. | Document in status warning; users in non-repo cwds scope via `:tree ` | +| 3 (:diff) | B1 — color/keypad codes leak if a future caller forgets the helper | All call sites must go through `_git_clean_cmd`; lint by grep before commit | +| 4 (@..) | False positive on `@../sibling.txt` when no such file exists | A6's tiered resolution: only retry as diff when file lookup fails. `@../sibling.txt` resolves as path; if the path is missing, diff retry runs and naturally fails — same outcome as before | +| 5 (highlighter) | Fence detector misclassifies inline ` ` ``` ` ` triple-backtick in prose | State machine triggers on `^```` at the start of a line OR following a newline only; mid-line backticks don't open a fence. Document in §4. | +| 5 (highlighter) | tmpfile race / leak on crash | `os.remove(tmp)` in normal exit path; OS cleans `/tmp/lua_*` files on reboot. Single-user trust per PHASE0 §12. | +| 6 (config bump) | none — pure docs / commented config | + +### Tests + smoke per commit + +Each commit must: +- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31) +- Load cleanly: `luajit -e 'package.path="./?.lua;./vendor/?.lua;"..package.path; require("repl"); print("ok")'` +- Pass a feature-specific smoke (described per row above) + +No new test framework dependency. Per-feature unit tests can live as +inline `luajit -e '...'` blocks in commit messages or as a dedicated +`test_phase6.lua` if the surface area justifies it (decide at impl-time). + +### Things deliberately NOT split into a separate commit + +- `_shq` (shell-quote helper) — already exists in repl.lua from #3. + Reuse in commit 5 (highlighter); no new helper. +- Lang map — small enough to copy locally in commit 5 (~15 lines); + the existing `_lang_of(path)` in `expand_mentions` uses a similar + but smaller map. Factor only if a third caller appears. +- Streaming-rehydration interaction with the highlighter — `secrets_session` + rehydrate runs BEFORE the highlight filter in the chunk pipeline. + Order: `chunk → rehydrator:push → highlight_filter → emit`. The + highlighter operates on plain text only; rehydrated placeholders + resolve to real values which the highlighter sees as code. No + special wiring needed. + +### Open at plan-time (resolve at implement) + +- Whether `_scan_project_tree` should honor a per-call `opts.dir` + override (so a future feature like "scan ``" lands cheaply) + vs hardcoding `libc.getcwd()`. Default to taking `dir` as arg; + the `:tree` meta passes `libc.getcwd()` explicitly. +- Whether `:highlight status` should also probe `tree-sitter --print-langs` + to show which langs are actually available. Nice-to-have; defer + unless install paths produce variable lang sets in practice.