docs/PHASE6: analyze — substrate probes + Q resolutions in-place

Analyze pass against tree at f596743. All 6 formulate-time questions
resolved without structural changes; pillar shapes intact.

A1. renderer.lua surface clean — assistant_delta/flush accumulate via
    stream_buf; fence-aware filter slots in between chunk receipt and
    emit without touching anything else.

A2. executor.exec via pty.spawn already handles git diff / find;
    cwd-aware (inherits from libc.chdir). No new IO model.

A3. context composition order locked: base + [background] + [earlier
    summary] + NORRIS. [project] inserts between [background] and
    [earlier summary]; Norris-suppression guard inherited.

A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3
    (io.popen("w") + redirect stdout to tmp file; io.open reads back).
    Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per
    code block.

A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only;
    matches :history / :sessions / :safety check.

A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first,
    then ref-range retry when path fails AND token contains "..".
    @origin/main..feature works naturally; @../sibling.txt unaffected.

A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1.
    @-mention echo via readline is a different code path; deferred
    to v2 (added to §8 out-of-scope).

A8. Q-T1 RESOLVED: project tree captured at scan time, not auto-
    refreshed on cd. v1 verb is :tree refresh; cd-intercept auto-
    refresh deferred to v2.

A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard`
    in repos; find fallback outside. Custom globs deferred to v2.

A10. expand_mentions punct-peel doesn't strip "/", so HEAD~1..HEAD,
     peels comma cleanly and the diff retry catches the cleaned token.

A11. Auto-injection ordering: memory load → tree scan → first ask_ai.
     Composition reads memory facts before file tree.

A12. [project] Norris-suppressed (parity with R-C1/R-C4).

§3 module-changes table: context.lua row updated (project string +
compose_project + ordering note + Norris suppression). §4 highlighter
code sample replaced with the tmpfile-roundtrip resolved form. §5
@-mention section rewritten as tiered-resolution with worked examples.
§8 out-of-scope gained three v2-polish items (echo highlight, cd-
intercept auto-refresh, custom globs) so they're tracked. §10 Open
Questions table now shows all 6 Qs with their resolutions inline.
§9 Risks row for @-mention collision updated to point at A6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-16 21:53:58 +00:00
parent f596743834
commit ad52fe4538
+166 -41
View File
@@ -2,9 +2,95 @@
**Project:** aish — AI-augmented conversational shell
**Document:** Phase 6 Requirements, Architecture & Design Decisions
**Status:** Formulate (pre-analyze)
**Status:** Analyze (formulate complete; current tree at `f596743` probed)
**Date:** 2026-05-16
**Analyze findings (2026-05-16):**
A1. **renderer.lua surface clean**`assistant_delta(chunk)` already
concatenates into a `stream_buf` then `emit()`s the chunk;
`assistant_flush()` finalizes with a trailing newline if missing.
The fence-aware highlight filter slots in between chunk receipt and
`emit` without restructuring; no callers besides `repl.lua` touch
`stream_buf` so the filter state can live alongside it.
A2. **executor surface clean**`executor.exec(cmd)` already
forkpty-spawns, captures + live-streams output, returns `(out, code)`.
Phase 6's `:diff` and `_scan_project_tree` reuse this path verbatim;
no new IO model. `git`-rooted commands inherit cwd from the parent
(which `libc.chdir` already mutates), so a `:diff` after `cd` reads
the right repo.
A3. **context composition order locked** — current `to_messages` builds
`sys_content = base + [background] + [earlier summary] + NORRIS suffix`.
Phase 6 inserts `[project]` between `[background]` and `[earlier
summary]`. Same Norris-suppression guard already in place
(`if not self.norris_active`).
A4. **Q-H1 RESOLVED: tmpfile roundtrip** for `tree-sitter highlight`
write+read. Avoids ARGMAX risk on large code blocks (vs `printf
BODY | tree-sitter ...`) and shell-escape complexity. Two file
handles, deterministic cleanup via `os.remove`. Sketch:
```lua
local tmp = os.tmpname()
local w = io.popen(("tree-sitter highlight --lang %s > %s")
:format(lang, tmp), "w")
w:write(body); local _, _, code = w:close()
local f = io.open(tmp, "rb"); local out = f:read("*a"); f:close()
os.remove(tmp)
if code ~= 0 then return body end -- pass-through on failure
return out
```
A5. **Q-D1 RESOLVED: no confirm gate on `:diff`.** `git diff` is
read-only; matches `:history`, `:sessions`, `:safety check` —
none of which gate. Permission DSL (#9) only applies to AI-suggested
`CMD:` lines, not user-issued metas.
A6. **Q-D2 RESOLVED: tiered resolution for `@<token>`.** The mention
parser tries `<token>` as a file path first; if it doesn't resolve
AND the token contains `..`, retry as a diff range. This keeps
`@../sibling.txt` (path) working AND allows `@origin/main..feature`
(ref range — resolves via second attempt since no such file exists).
No grammar prefix needed.
A7. **Q-H2 RESOLVED: highlighting is assistant-output only in v1.**
`expand_mentions` content lands in the user-turn payload — visible
on the terminal via readline echo, not via `assistant_delta`. Filing
"highlight @path-expanded code in echo" as v2 polish. Reason:
intercepting readline echo for ANSI injection is non-trivial and
orthogonal to the stream filter.
A8. **Q-T1 RESOLVED: project tree captured at scan time, not auto-
refreshed on cd.** `cd /other-project` leaves the existing
`ctx.project` stale; `:tree refresh` is the manual verb to update.
Auto-refresh on cd intercept is a v2 polish (the cd interceptor in
`executor.maybe_chdir` is a clean hook for it).
A9. **Q-T2 RESOLVED: rely on `.gitignore` via `git ls-files`** in repos;
fall back to `find` with simple excludes outside. Custom
include/exclude glob lists deferred to v2. Reason: most users live
inside git repos; `.gitignore` already encodes their notion of
"noise". Out-of-repo users get the simple fallback and can scope
via `:tree <depth>`.
A10. **`expand_mentions` punct-peel does NOT strip `/`** — so
`@HEAD~1..HEAD,` peels the `,` and the underlying token `HEAD~1..HEAD`
has no slash; the path-then-diff retry from A6 catches it. No new
peel logic needed.
A11. **Auto-injection ordering for `[project]`** — if both `cfg.memory.
inject_max_chars` and `cfg.project.auto_tree` fire at startup, the
order is: memory load → tree scan → first ask_ai. The composition
in `to_messages` places `[background]` (memory) before `[project]`
so the model reads memory facts before file tree. Documented in §3.
A12. **Norris interaction** — `[project]` block follows the established
[background]/[earlier summary] suppression rule under
`ctx.norris_active`. Planner stays on its goal anchor; the tree
can be re-introduced via the goal text if needed. Matches R-C1/R-C4.
PHASE0 is the locked substrate; PHASE1-5 are layered on top. This manifest
specifies what Phase 6 adds — **tree-sitter syntax highlighting hooks**,
**diff-aware code injection**, and **project-level context (file-tree
@@ -84,7 +170,7 @@ Three pillars per PHASE0.md §11 row 6:
|---|---|---|
| `renderer.lua` | `assistant_delta(text)` writes chunks; `assistant_flush()` finalizes | Add fence-aware filter inside the assistant stream. State machine: outside-fence (pass-through) / inside-fence (buffer, emit on close). On close, pipe buffer through `tree-sitter highlight --lang <X>` (if highlight enabled), emit result. Toggle exposed as `renderer.set_highlight(bool)`. |
| `executor.lua` | `extract_cmd_lines`, `extract_cmd_bg_lines`, `extract_delegate_lines` | No changes. Diff and tree use the existing `exec` path. |
| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary]. Suppressed under Norris. |
| `context.lua` | system prompt = base + [background] + [earlier summary] + NORRIS suffix | Add `self.project = "..."` string field + `compose_project(self.project)` helper. Injection between [background] and [earlier summary] (A11: memory facts read before file tree). Suppressed under Norris (A12, parity with R-C1/R-C4). |
| `repl.lua` | meta dispatch + main loop + #13 secrets wiring | New helpers: `_detect_treesitter()` (run once at startup), `_run_git_diff(args)`, `_scan_project_tree(dir, opts)`. New meta: `:highlight`, `:diff`, `:tree`. Extend `expand_mentions` to recognize `<ref>..<ref>` token shape. |
| `config.lua` | example blocks for mcp/safety/memory/routing/secrets/etc. | Add commented-out `project = { auto_tree = false, tree_depth = 3, tree_max_chars = 4096 }` block. |
@@ -147,22 +233,29 @@ Edge cases: chunk boundary lands inside the fence marker itself
machine looks at the cumulative `buf`, so partial markers are
recovered correctly.
`highlighted(body, lang)`:
`highlighted(body, lang)` — resolved per A4 (tmpfile roundtrip):
```
if not highlight_enabled or not lang_map[lang]:
return body
pipe = io.popen("tree-sitter highlight --lang " .. lang_map[lang], "w")
pipe:write(body); pipe:close()
-- HOWEVER: io.popen("w") doesn't read back stdout. We need both:
-- write body to stdin AND capture stdout. Easiest: temp file or
-- /tmp pipe trick. tbd in analyze.
```lua
if not highlight_enabled or not lang_map[lang] then return body end
local tmp = os.tmpname()
local w = io.popen(("tree-sitter highlight --lang %s > %s")
:format(lang_map[lang], tmp), "w")
w:write(body)
local _, _, code = w:close()
local f = io.open(tmp, "rb")
local out = f and f:read("*a") or body
if f then f:close() end
os.remove(tmp)
if code ~= 0 then return body end -- pass-through on highlighter failure
return out
```
**Open Q-H1** (analyze): how to popen for write+read simultaneously
without forkpty. Candidates: temp file roundtrip, `popen` with shell
piping `printf '%s' BODY | tree-sitter highlight | cat`. The shell
pipe is cleanest if we shell-escape body.
The two-handle design avoids the ARGMAX risk of shelling
`printf '%s' BODY | tree-sitter ...` (Linux ARGMAX is ~128KB but
LuaJIT strings can be larger) and sidesteps shell-escape edge cases
(body may contain arbitrary bytes). Cost is one syscall per code block
for the tmp file create/remove cycle — negligible vs the highlighter
invocation itself.
### Lang map (v1)
@@ -222,36 +315,48 @@ end
The `[diff ...]\n<output>` framing matches the `[bg:N exited]` /
`[delegate X]` conventions established in Phase 5 / #6 / #8.
### @-mention: `@<ref1>..<ref2>`
### @-mention: `@<ref1>..<ref2>` — tiered resolution (A6)
Extends `expand_mentions` (#7). After the existing path-resolution
attempt fails, try interpreting the token as a git diff-range:
Extends `expand_mentions` (#7) by adding a SECOND resolution attempt
when the first (path lookup) fails AND the token contains `..`:
```lua
local r1, r2 = path:match("^(.-)%.%.(.+)$")
if r1 and r2 and r1 ~= "" and r2 ~= "" then
-- candidate diff range; try `git diff <r1>..<r2>`
local pipe = io.popen(("git diff %q..%q 2>/dev/null")
:format(r1, r2))
...
-- Existing path-attempt block ends with content = _read_truncated(path)
-- which returns nil if no such file. Add the diff retry there:
if not content and path:find("..", 1, true) then
local r1, r2 = path:match("^(.-)%.%.(.+)$")
if r1 and r2 and r1 ~= "" and r2 ~= "" then
local pipe = io.popen(("git diff %s..%s 2>/dev/null")
:format(shq(r1), shq(r2)))
local diff = pipe and pipe:read("*a") or ""
local _, _, code = pipe and pipe:close()
if code == 0 and diff:match("%S") then
content = diff
-- Note: language tag becomes "diff" regardless of path lang
lang_override = "diff"
end
end
end
```
Output replaces the token with:
````
```diff
```diff path=<r1>..<r2>
<content>
```
````
Same fence-with-lang shape as the `@path` expansion.
**Risk:** false-positive on legitimate paths containing `..` like
`@../sibling.txt`. Mitigation: only interpret as diff-range when
the token contains NO `/` (paths have `/`, ref-ranges don't). Refs
with `/` like `origin/main..feature` ARE common — for those, the
user can fall back to `:diff origin/main..feature`.
Tiered resolution semantics:
- `@README.md` → file lookup succeeds → file expansion
- `@../sibling.txt` → file lookup succeeds → file expansion
- `@HEAD~1..HEAD` → file lookup fails, `..` present, ref-range succeeds → diff
- `@origin/main..feature` → file lookup fails (no such file), `..` present,
ref-range succeeds → diff. The token has `/` in `r1` but `git diff` accepts
it as a ref; no `/`-based heuristic needed (resolves Q-D2).
- `@nonexistent-file..but-also-not-a-ref` → both fail; literal token
preserved with the existing `[aish] @X: not found` status path.
---
@@ -372,6 +477,23 @@ sets `ctx.project`. Default false (existing configs unchanged).
`tree-sitter highlight --lang diff` works, but we don't ship custom
ANSI for added/removed lines — tree-sitter's own theme covers it.
- **Highlighter on @-mention echo** (v2 polish per A7) — `:highlight`
applies to assistant output only. Highlighting user-pasted code as
it's echoed by readline would need a separate hook in the readline
display path; out of scope here.
- **Auto-refresh project tree on `cd`** (v2 polish per A8) — the cd
interceptor in `executor.maybe_chdir` is a clean place to call
`_scan_project_tree(libc.getcwd(), ...)` on every successful cd.
Skipped in v1 because the scan can be slow on large trees; manual
refresh via `:tree refresh` is the v1 verb.
- **Custom include/exclude globs for project tree** (v2 polish per A9) —
`cfg.project = { include = {...}, exclude = {...} }` would extend
beyond `.gitignore`. v1 ships with `.gitignore`-only honor (via
`git ls-files --exclude-standard`) plus the `find` fallback for
non-repo cwds.
---
## 9. Risks
@@ -382,7 +504,7 @@ sets `ctx.project`. Default false (existing configs unchanged).
| Highlighter latency on long code blocks (whole-block buffering) | Accepted trade-off vs corrupting output. If painful in practice, add a per-block size cap above which we pass-through unhighlighted. |
| `git diff` on huge changesets blows context budget | Diff output reuses `enforce_budget` eviction (it's just exec output). User can `:diff <subdir>` to scope. v2 could add a `--max-bytes` truncation. |
| `git ls-files` in a non-git cwd → falls back to `find`, may pick up node_modules / target / etc. | Document in config example; v2 could honor `.aishignore` or similar. |
| @`<ref1>..<ref2>` collides with paths like `@../sibling.txt` | Mitigation: require NO `/` in the token for diff interpretation. Paths with `..` segments use `:diff` explicitly. |
| @`<ref1>..<ref2>` collides with paths like `@../sibling.txt` | A6: tiered resolution — try as path first; only fall through to diff retry when path lookup fails AND token contains `..`. `@../sibling.txt` hits the path branch and never reaches the diff retry. |
| Project tree injection adds tokens to every broker call | Char cap + opt-in `auto_tree = false` default. Suppressed under Norris. |
| `:highlight on` mid-stream produces inconsistent rendering for the in-flight turn | Toggle takes effect from the NEXT assistant turn. Document this. |
@@ -390,14 +512,17 @@ sets `ctx.project`. Default false (existing configs unchanged).
## 10. Open Questions (Phase 6)
| # | Question | Impact | Resolution target |
|---|---|---|---|
| Q-H1 | How to popen `tree-sitter highlight` with simultaneous stdin write + stdout capture (Lua/LuaJIT lacks popen3). Candidates: temp-file roundtrip, shell-pipe wrapper `printf '%s' BODY \| tree-sitter ...` with shell-escape, or use `io.popen("w")` + a second `io.open(output_file)` after the process completes. | Highlighter correctness | Analyze |
| Q-D1 | Should `:diff` honor a per-call confirm gate (it shells out and reads git history; safe but noisy)? | UX | Analyze |
| Q-D2 | Should `@<r1>..<r2>` accept refs with `/` (`origin/main..feature`)? Doing so means we can't use the no-`/` heuristic to disambiguate from paths. Alternative: require explicit prefix like `@diff:origin/main..feature`. | @-mention grammar | Analyze |
| Q-T1 | When `cfg.project.auto_tree = true`, should the project block update on `cd` (since the cwd changed)? Or stay fixed at startup-cwd? | UX expectation | Analyze |
| Q-T2 | Should `cfg.project` accept a list of include/exclude glob patterns, or just rely on git's .gitignore? | Configurability | Analyze |
| Q-H2 | Should highlighting also apply to user-pasted code (expand_mentions @path), not just assistant output? | Symmetry | Analyze |
All six formulate-time Qs were resolved in analyze (A4A9). None remain
open as blockers for implementation.
| # | Question | Resolution |
|---|---|---|
| Q-H1 | popen3 for `tree-sitter highlight` | A4: tmpfile roundtrip — `io.popen("w")` writes body with stdout redirected to a tmp file, then `io.open` reads the file. Avoids ARGMAX + shell-escape complexity. |
| Q-D1 | Confirm gate on `:diff`? | A5: no. `git diff` is read-only; matches `:history` / `:sessions` / `:safety check` (none gate). Permission DSL (#9) applies only to AI-suggested `CMD:` lines. |
| Q-D2 | `@<r1>..<r2>` with refs containing `/` | A6: tiered resolution — file lookup first, then if it fails AND `..` is present, retry as ref-range. `@origin/main..feature` naturally falls through to the retry; no grammar prefix needed. |
| Q-T1 | `cfg.project.auto_tree` update on cd | A8: no auto-refresh in v1. `:tree refresh` is the manual verb; cd-intercept hook is documented as v2 polish in §8. |
| Q-T2 | Custom include/exclude globs | A9: rely on `.gitignore` via `git ls-files` in repos; `find` fallback outside. Custom globs deferred to v2. |
| Q-H2 | Highlighting on @-mention echo | A7: assistant-output only in v1. Echo via readline is a different code path; deferred to v2 (see §8). |
---