docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap

Status header: Analyze -> Plan.

Baseline findings folded into the design sections:

  §1 (highlighter pillar) gains B4: tree-sitter absent on every
  probed host; :highlight on emits install-hint when missing.

  §4 (highlighter sketch) revised per B3: io.popen():close() doesn't
  expose exit codes in LuaJIT. Route via executor.exec("cat tmp |
  tree-sitter ...") which uses pty.spawn+waitpid and returns code
  reliably. Tmpfile design retained (avoids ARGMAX + shell-escape).

  §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git
  invocation must use `--no-pager -c color.ui=never` to suppress
  the color/keypad/line-clear escapes forkpty triggers. Factored
  recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff
  and the @-mention diff retry.

New §12 Implementation Plan — 6 commits, bottom-up:

  1. context.lua: ctx.project + compose_project + composition order
  2. repl.lua: _scan_project_tree helper + :tree meta
  3. repl.lua: :diff meta + _git_clean_cmd helper (B1)
  4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6)
  5. renderer.lua + repl.lua: tree-sitter detect + fence filter +
     :highlight meta (B3-revised tmpfile dispatch)
  6. config.lua project example + status -> Implement

Per-commit risk index + smoke criteria. Highlighter (commit 5) is
the largest experimental surface — placed last so the rest of Phase 6
ships even if highlighter slips. Order is independent enough that
swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each
commit individually green.

Things deliberately not split: _shq reuse, lang map duplication for
v1, streaming-rehydration order (rehydrate -> highlight -> emit
inherits naturally from existing chunk pipeline).

Two items open at plan time, resolve at implement: _scan_project_tree
dir-arg vs hardcoded getcwd; :highlight status probing
tree-sitter --print-langs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 22:01:40 +00:00
parent 9f50206ca6
commit 4407029296
+183 -23
View File
@@ -2,7 +2,7 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 6 Requirements, Architecture & Design Decisions
**Status:** Analyze (formulate complete; current tree at `f596743` probed)
**Status:** Plan (formulate + analyze + baseline complete; tree at `9f50206`)
**Date:** 2026-05-16
**Analyze findings (2026-05-16):**
@@ -109,6 +109,10 @@ Three pillars per PHASE0.md §11 row 6:
identity function (zero overhead, zero hard dependency). Toggleable
at runtime with `:highlight on|off`. Default off until the user
opts in (don't surprise existing users with a display change).
Per B4: tree-sitter is **absent on every fleet host probed**;
`:highlight on` when the CLI is missing emits a status that names
the install hint (`apt install tree-sitter` / `cargo install
tree-sitter-cli`) rather than silently falling back to identity.
2. **Diff-aware code injection** — surface git diffs as first-class
context. Two entry points:
@@ -233,29 +237,36 @@ Edge cases: chunk boundary lands inside the fence marker itself
machine looks at the cumulative `buf`, so partial markers are
recovered correctly.
`highlighted(body, lang)`resolved per A4 (tmpfile roundtrip):
`highlighted(body, lang)`**B3-revised** (supersedes A4):
```lua
if not highlight_enabled or not lang_map[lang] then return body end
local tmp = os.tmpname()
local w = io.popen(("tree-sitter highlight --lang %s > %s")
:format(lang_map[lang], tmp), "w")
w:write(body)
local _, _, code = w:close()
local f = io.open(tmp, "rb")
local out = f and f:read("*a") or body
if f then f:close() end
local f = io.open(tmp, "wb")
if not f then return body end
f:write(body); f:close()
-- B3: LuaJIT io.popen():close() returns (true, nil, nil) regardless of
-- exit code. Route via executor.exec which uses pty.spawn+waitpid and
-- returns (out, exit_code) reliably.
local out, code = executor.exec(
("cat %s | tree-sitter highlight --lang %s")
:format(_shq(tmp), lang_map[lang]))
os.remove(tmp)
if code ~= 0 then return body end -- pass-through on highlighter failure
return out
```
The two-handle design avoids the ARGMAX risk of shelling
`printf '%s' BODY | tree-sitter ...` (Linux ARGMAX is ~128KB but
LuaJIT strings can be larger) and sidesteps shell-escape edge cases
(body may contain arbitrary bytes). Cost is one syscall per code block
for the tmp file create/remove cycle — negligible vs the highlighter
invocation itself.
Why this shape (and not the formulate-time A4 sketch):
- **A4 assumed Lua 5.2+ popen-close-with-code.** LuaJIT (5.1 contract)
doesn't expose the exit status via `io.popen(...):close()`. Baseline
B3 caught this; the only reliable exit-code path in our substrate
is `executor.exec` (pty.spawn + waitpid).
- The tmpfile stays — avoids ARGMAX on `printf '%s' BODY |` and
sidesteps shell-escape edge cases on arbitrary code-block bytes.
- Cost is one syscall round (tmpfile create/remove) plus one pty
spawn per code block — negligible vs the highlighter latency.
- `_shq` is the existing shell-quote helper from #3 (pre/post hooks).
### Lang map (v1)
@@ -292,12 +303,16 @@ Default: off (don't change existing-user UX).
- `:diff main..feature``git diff main..feature`
- `:diff <anything else>` → passed verbatim to `git diff <anything>`
Implementation:
Implementation**B1-revised** (must disable pager + color):
```lua
meta.diff = function(args)
args = (args or ""):gsub("^%s+", ""):gsub("%s+$", "")
local cmd = "git diff " .. args
-- B1: forkpty makes git think it's interactive, enabling color
-- ANSI + DEC keypad/line-clear escapes that pollute the injected
-- context block. --no-pager kills the keypad sequences; --color=
-- never kills the color codes. Both are required.
local cmd = "git --no-pager -c color.ui=never diff " .. args
local out, code = executor.exec(cmd)
if code ~= 0 then
renderer.status(("diff failed (exit %d)"):format(code))
@@ -315,6 +330,11 @@ end
The `[diff ...]\n<output>` framing matches the `[bg:N exited]` /
`[delegate X]` conventions established in Phase 5 / #6 / #8.
The same `--no-pager -c color.ui=never` prefix applies to the
`@<r1>..<r2>` resolution path in the next section, and to any
future git verbs we add (`:log`, `:show`, etc.). Factor into a
helper `_git_clean_cmd(subcmd)` if multiple call sites accumulate.
### @-mention: `@<ref1>..<ref2>` — tiered resolution (A6)
Extends `expand_mentions` (#7) by adding a SECOND resolution attempt
@@ -327,12 +347,15 @@ when the first (path lookup) fails AND the token contains `..`:
if not content and path:find("..", 1, true) then
local r1, r2 = path:match("^(.-)%.%.(.+)$")
if r1 and r2 and r1 ~= "" and r2 ~= "" then
local pipe = io.popen(("git diff %s..%s 2>/dev/null")
:format(shq(r1), shq(r2)))
local diff = pipe and pipe:read("*a") or ""
local _, _, code = pipe and pipe:close()
if code == 0 and diff:match("%S") then
content = diff
-- B1: --no-pager + color=never (same as the :diff meta path).
-- B3: io.popen close() doesn't expose exit codes — use the
-- file-redirect trick OR executor.exec. Here we want a quick
-- best-effort and the cost of an extra forkpty is acceptable.
local out, code = executor.exec(
("git --no-pager -c color.ui=never diff %s..%s 2>/dev/null")
:format(shq(r1), shq(r2)))
if code == 0 and out:match("%S") then
content = out
-- Note: language tag becomes "diff" regardless of path lang
lang_override = "diff"
end
@@ -539,3 +562,140 @@ phases beyond 6. After Phase 6 lands, candidate next iterations
Phase 6 itself is self-contained — none of its three pillars introduce
substrate dependencies on phases not yet planned.
---
## 12. Implementation Plan (commit-by-commit)
Bottom-up ordering: foundations first (context.lua field + composer),
then the diff and tree surfaces that have no display-layer risk, then
the highlighter (largest experimental surface — last so the rest of
Phase 6 ships even if highlighter slips). Each commit leaves the tree
green (existing tests pass + smoke ok) and adds a discrete capability.
### Order
1. **`context.lua` — `[project]` block plumbing.** Add `self.project`
(string, nil-allowed) on `Context.new`. Add `compose_project(text)`
helper mirroring `compose_background` / `compose_summary`. In
`to_messages`: insert between `compose_background` and
`compose_summary` so the read order is memory → project tree →
earlier-summary → NORRIS. Suppressed under `self.norris_active`
(parity with R-C1 / R-C4). No behavior change yet — nothing sets
`ctx.project`. Smoke: `:to_messages()` still empty when project nil.
2. **`repl.lua` — `_scan_project_tree` helper + `:tree` meta.**
- `_scan_project_tree(dir, opts)` per §6: `git ls-files --cached
--others --exclude-standard` in a repo, `find . -maxdepth N
-type f -not -path '*/\.*'` outside. Returns `(body, info)`
where `info = { file_count, truncated }`.
- `:tree [N|refresh|off]` meta: scans cwd, sets `ctx.project`,
emits status with file count + truncation note.
- `cfg.project.auto_tree` startup hook: if true, run `_scan` once
and set `ctx.project` (before the main loop opens). Default
false (existing configs unchanged).
- Update HELP with `:tree` lines.
- Smoke: in the aish repo, `:tree` injects a ~32-file block;
`:to_messages()` shows the `[project]` block in the system prompt.
3. **`repl.lua` — `:diff` meta + `_git_clean_cmd` helper (B1).**
- `_git_clean_cmd(subcmd_and_args)` returns the `git --no-pager
-c color.ui=never <subcmd_and_args>` prefix. Used by `:diff`
and the `@<r1>..<r2>` path in commit #4.
- `:diff [args]` meta per §5 (B1-revised): runs the clean git
command via `executor.exec`, appends `[diff <args>]\n<out>`
to context as exec_output. Empty / non-repo / bad-ref paths
emit status and skip.
- Update HELP with `:diff` line.
- Smoke: `:diff` from a dirty aish checkout injects the working
tree diff; `:diff staged` works; `:diff junkref` emits status
and skips.
4. **`repl.lua` — `expand_mentions` tiered resolution (A6).**
Extend the existing path-resolution loop with the diff-retry
branch from §5: if `_read_truncated` returns nil AND the token
contains `..`, parse as `<r1>..<r2>` and try `_git_clean_cmd(
"diff <r1>..<r2>")`. On success, replace with a fenced `diff`
block. Preserves existing peel-on-trailing-punct logic. Smoke:
`@HEAD~1..HEAD` expands inline; `@origin/main..feature` works
when the ref exists; `@../sibling.txt` still resolves as file.
5. **`renderer.lua` + `repl.lua` — tree-sitter highlighter.**
This commit is the largest single change in Phase 6. Substeps:
a. `_detect_treesitter()` in repl.lua: one-shot popen of
`command -v tree-sitter && tree-sitter --version`. Stash
result on a local.
b. `renderer.lua` — fence-aware state machine wrapping
`assistant_delta`. Exports `renderer.set_highlight(enabled,
detected)` so repl.lua wires the toggle + cli-availability
flags. State: `outside` (pass-through) / `inside` (buffer until
closing fence). On close: call `highlighted(body, lang)` and
emit. Algorithm per §4; bytes-of-cumulative-buf scan as B2
requires for fragment-across-boundary fences.
c. `highlighted(body, lang)` per §4 (B3-revised): write body to
`os.tmpname()`, invoke via `executor.exec("cat tmp |
tree-sitter highlight --lang X")`, capture out + exit code,
cleanup tmp, pass-through on failure.
d. `:highlight [on|off|status]` meta in repl.lua. `:highlight on`
when CLI absent → status with install hint (B4); `:highlight
status` always reports current toggle + CLI availability.
e. HELP update; PHASE6 status → Implement.
6. **`config.lua` + docs/PHASE6 status bump.**
- Add commented-out `project = { auto_tree = false, tree_depth = 3,
tree_max_chars = 4096 }` block in config.lua (parity with the
Phase 1-5 example blocks).
- PHASE6.md status header → **Implement** (matches Phase 5
cadence — manifest tracks implementation state).
### Risk index per commit
| Commit | Risk | Mitigation |
|---|---|---|
| 1 (compose_project) | Composition-order regression breaks Phase 4/5 callers | Order test: empty memory + empty project = identical sys_content to pre-Phase-6 baseline |
| 2 (:tree) | `find` fallback picks up node_modules / target / build / etc. | Document in status warning; users in non-repo cwds scope via `:tree <depth>` |
| 3 (:diff) | B1 — color/keypad codes leak if a future caller forgets the helper | All call sites must go through `_git_clean_cmd`; lint by grep before commit |
| 4 (@<r1>..<r2>) | False positive on `@../sibling.txt` when no such file exists | A6's tiered resolution: only retry as diff when file lookup fails. `@../sibling.txt` resolves as path; if the path is missing, diff retry runs and naturally fails — same outcome as before |
| 5 (highlighter) | Fence detector misclassifies inline ` ` ``` ` ` triple-backtick in prose | State machine triggers on `^```` at the start of a line OR following a newline only; mid-line backticks don't open a fence. Document in §4. |
| 5 (highlighter) | tmpfile race / leak on crash | `os.remove(tmp)` in normal exit path; OS cleans `/tmp/lua_*` files on reboot. Single-user trust per PHASE0 §12. |
| 6 (config bump) | none — pure docs / commented config |
### Tests + smoke per commit
Each commit must:
- Pass `luajit test_safety.lua` (87/87) and `luajit test_router_model.lua` (31/31)
- Load cleanly: `luajit -e 'package.path="./?.lua;./vendor/?.lua;"..package.path; require("repl"); print("ok")'`
- Pass a feature-specific smoke (described per row above)
No new test framework dependency. Per-feature unit tests can live as
inline `luajit -e '...'` blocks in commit messages or as a dedicated
`test_phase6.lua` if the surface area justifies it (decide at impl-time).
### Things deliberately NOT split into a separate commit
- `_shq` (shell-quote helper) — already exists in repl.lua from #3.
Reuse in commit 5 (highlighter); no new helper.
- Lang map — small enough to copy locally in commit 5 (~15 lines);
the existing `_lang_of(path)` in `expand_mentions` uses a similar
but smaller map. Factor only if a third caller appears.
- Streaming-rehydration interaction with the highlighter — `secrets_session`
rehydrate runs BEFORE the highlight filter in the chunk pipeline.
Order: `chunk → rehydrator:push → highlight_filter → emit`. The
highlighter operates on plain text only; rehydrated placeholders
resolve to real values which the highlighter sees as code. No
special wiring needed.
### Open at plan-time (resolve at implement)
- Whether `_scan_project_tree` should honor a per-call `opts.dir`
override (so a future feature like "scan `<other-dir>`" lands cheaply)
vs hardcoding `libc.getcwd()`. Default to taking `dir` as arg;
the `:tree` meta passes `libc.getcwd()` explicitly.
- Whether `:highlight status` should also probe `tree-sitter --print-langs`
to show which langs are actually available. Nice-to-have; defer
unless install paths produce variable lang sets in practice.