Six findings from probing the world before tree-sitter / diff / project
tree implementation lands:
B1. `git` subcommands through executor.exec emit ANSI color + DEC
keypad/line-clear escapes by default (forkpty makes stdout a TTY, so
git turns on color and the pager). `:diff` impl MUST use
`git --no-pager <subcommand> --color=never <args>`; `--color` is a
subcommand option, not a top-level one. Same flags apply to any
future git verbs.
B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks
(median 4 chars, max 13) AND splits code fences across boundaries
(`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers
big chunks (median 26 chars), fences intact. The §4 fence-aware
filter accumulator design covers both — confirmed necessary by
local-model behavior.
B3. **LuaJIT io.popen():close() does NOT return exit codes** — Lua
5.1 contract, not 5.2+. Breaks the A4 highlighter resolution.
Revised: route via `executor.exec("cat tmp | tree-sitter ...")`
which uses pty.spawn + waitpid and returns (out, code) reliably.
B4. tree-sitter CLI absent on both probed hosts (noether, higgs).
Highlighter is opt-in by design; absent-CLI path should emit a
clear install hint, not silently no-op.
B5. Project-tree envelope: aish 32 files / 449 chars; similar local
repos 15-25 files; scan time ~1-5ms. The 4096-char default cap
accommodates ~290 typical paths. Large repos handled via
tree_depth or cap tuning per existing §9 risk row.
B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for
the B3-revised tmpfile-roundtrip pattern.
No structural changes to formulate/analyze. B1, B3, B4 will fold into
PHASE6.md §5 / §4 / §1 respectively during plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 Baseline — pre-implementation measurements
Date: 2026-05-16
Tree probed: ad52fe4 (Phase 5 + #2/#3/#4/#5/#6/#7/#8/#9/#10/#11/#13/#14/#23/#32/#33/#51/#52 follow-up).
Hosts probed: noether (primary), higgs (Pi5).
Broker probed: hossenfelder.fritz.box:8082 (local qwen-coder-7b-snappy-8k, cloud anthropic/claude-haiku-4.5).
This is the Phase 7 (verify) anchor for Phase 6. Captures the world just before tree-sitter / diff / project-tree implementation lands.
B1. git output through executor.exec carries ANSI + terminal control
executor.exec uses pty.spawn (forkpty). When git's stdout is a
PTY, git enables both color output AND interactive pager defaults
(DEC keypad mode \27[?1h= ... \27[?1l>, line-clear \27[K).
Observation:
> executor.exec("git diff --stat HEAD~1..HEAD")
exit=0 len=173
\27[?1h= docs/PHASE6.md | 207 \27[32m++...\27[m\27[31m--...\27[m\27[m
1 file changed, 166 insertions(+), 41 deletions(-)\27[m
\27[K\27[?1l>
With --no-pager: keypad sequences gone, color stays:
> executor.exec("git --no-pager diff --stat HEAD~1..HEAD")
exit=0 len=148
docs/PHASE6.md | 207 \27[32m++...\27[m\27[31m--...\27[m
1 file changed, 166 insertions(+), 41 deletions(-)
With --no-pager --color=never: clean.
> executor.exec("git --no-pager diff --color=never --stat HEAD~1..HEAD")
exit=0 len=132 clean=true
docs/PHASE6.md | 207 +++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 166 insertions(+), 41 deletions(-)
Implication for §5 (:diff meta): the implementation MUST use
both --no-pager and --color=never. If either flag is omitted, the
injected context block carries escape codes that confuse the model
AND inflate token counts.
The same flags apply to any future git log / git show / git blame
verbs that might land beyond Phase 6.
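A minimal sketch of how the flag placement could be centralized so that every git verb gets the same treatment. `git_cmd` and `has_escapes` are hypothetical names, not existing aish helpers. The sketch assumes the caller has already shell-quoted the arguments and that the subcommand accepts --color (diff, log, and show do).

```lua
-- Hypothetical helper: build a git command line that is safe to run under a PTY.
-- --no-pager is a top-level git option and goes before the subcommand;
-- --color=never is a subcommand option and goes after it.
local function git_cmd(subcommand, args)
  local parts = { "git", "--no-pager", subcommand, "--color=never" }
  for _, a in ipairs(args or {}) do
    parts[#parts + 1] = a
  end
  return table.concat(parts, " ")
end

-- Hypothetical check: true if the text still contains an ESC byte (0x1b).
local function has_escapes(text)
  return text:find("\27", 1, true) ~= nil
end

print(git_cmd("diff", { "--stat", "HEAD~1..HEAD" }))
-- git --no-pager diff --color=never --stat HEAD~1..HEAD
```

Running `has_escapes` over the captured output gives a cheap regression check that no ANSI or keypad sequences reach the injected context block.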
B2. SSE chunk size envelope (relevant to fence-aware highlighter)
renderer.assistant_delta receives whatever chunks the broker streams.
Measured against two model classes:
Local llama.cpp (qwen-coder-7b-snappy-8k)
prompt: "reply with a python code block that prints hello world,
then a brief explanation"
max_tokens: 150
chunks: 97
total: 423 chars
sizes: min=1, max=13, median=4
fences: fence at char 58 -> chunk 14 ('```')
fence at char 91 -> chunk 23 ('``') <-- split fence
The local model splits fences across chunks ('``' arrives, the
final ` is in the next chunk). The fence-aware filter MUST handle
fragment-across-boundary correctly.
Cloud (anthropic/claude-haiku-4.5 via OpenRouter)
prompt: "write a 5-line python hello world example wrapped in a code fence"
max_tokens: 150
chunks: 3
total: 60 chars
sizes: 7 / 27 / 26
fences: fence at char 0 -> chunk 0 ('```python\n# Hello World in')
fence at char 57 -> chunk 2 ('\nprint("Hello, World!")\n```')
Cloud delivers BIG chunks (median ~26 chars); fences typically arrive intact within a single chunk.
Implication for §4 (highlight stream filter): the state machine
must accumulate enough buf to detect a fence opening or closing
even when only '``' arrives in a chunk. The §4 design already
specifies "look at the cumulative buf, so partial markers are
recovered correctly" — confirmed necessary by local-model behavior.
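The split-fence case can be illustrated with a small accumulator. This is a simplified sketch, not the §4 implementation: `new_fence_scanner` is a hypothetical name, and the sketch treats any run of three backticks as a fence, ignoring line position and language tags.

```lua
-- Hypothetical fence-aware accumulator: toggles in_fence on every "```"
-- and holds back trailing backticks that may be the start of a fence.
local function new_fence_scanner()
  local self = { buf = "", in_fence = false, fences = 0 }

  -- Feed one streamed chunk; returns the text that is safe to emit now.
  function self.feed(chunk)
    self.buf = self.buf .. chunk
    local out = {}
    while true do
      local s, e = self.buf:find("```", 1, true)
      if not s then break end
      out[#out + 1] = self.buf:sub(1, e)
      self.buf = self.buf:sub(e + 1)
      self.in_fence = not self.in_fence
      self.fences = self.fences + 1
    end
    -- No "```" is left in buf here, so a trailing backtick run is at most
    -- two characters long. Keep it until the next chunk arrives.
    local tail = self.buf:match("`+$") or ""
    out[#out + 1] = self.buf:sub(1, #self.buf - #tail)
    self.buf = tail
    return table.concat(out)
  end

  -- Call at end of stream to release anything still held back.
  function self.flush()
    local rest = self.buf
    self.buf = ""
    return rest
  end

  return self
end

local scanner = new_fence_scanner()
io.write(scanner.feed("x = 1\n``"))   -- emits "x = 1\n", holds back "``"
io.write(scanner.feed("`\nmore"))     -- emits "```\nmore", fence now open
io.write(scanner.flush(), "\n")
```

Holding back at most two characters keeps the added latency negligible even at the 4-char median chunk size measured for the local model.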
B3. LuaJIT io.popen():close() does NOT expose exit codes
This is a divergence from Lua 5.2+ behavior assumed by the §4 (A4) highlighter resolution:
> luajit -e "for _, cmd in ipairs({'true','false','exit 7'}) do
local p = io.popen(cmd); local ok, err, code = p:close()
print(cmd, ok, err, code) end"
true true nil nil
false true nil nil
exit 7 true nil nil
io.popen():close() returns (true, nil, nil) regardless of child
exit status. The exit code is silently discarded.
Revised Q-H1 resolution (supersedes A4): the highlighter must
detect tree-sitter failure via a different channel. Cleanest path:
write the body to a tmpfile, then invoke the highlighter via
executor.exec("cat tmpfile | tree-sitter highlight --lang X").
executor.exec uses its own forkpty + waitpid path and DOES return
(out, exit_code) reliably.
Updated sketch:
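local function highlighted(body, lang)
  -- Pass the body through untouched when highlighting is off or the
  -- fence language has no tree-sitter mapping.
  if not highlight_enabled or not lang_map[lang] then return body end
  local tmp = os.tmpname()
  local f = io.open(tmp, "wb")
  if not f then
    os.remove(tmp)  -- os.tmpname may already have created the file
    return body
  end
  f:write(body); f:close()
  -- executor.exec returns (out, code), unlike io.popen():close() on LuaJIT.
  local out, code = executor.exec(
    ("cat %s | tree-sitter highlight --lang %s")
      :format(_shq(tmp), lang_map[lang]))
  os.remove(tmp)
  -- Any failure falls back to the unhighlighted body.
  if code ~= 0 then return body end
  return out
end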
Cost: tmp file write + read + remove + one executor.exec roundtrip per code block. Acceptable; tree-sitter highlighter latency dominates.
This finding will fold into PHASE6.md §4 during the analyze revision (or as a baseline-time amendment).
B4. tree-sitter CLI presence on the fleet
noether (local primary): ABSENT (which tree-sitter -> not found)
higgs (Pi5 / Debian 13): ABSENT (which tree-sitter -> not found)
Implication for §1 (scope): the design's "external CLI when present, no-op otherwise" decision is the right call — on the fleet as-tested, ZERO hosts ship tree-sitter by default. Users who want highlighting will need to opt in explicitly (apt / cargo / manual install).
Documentation should mention this clearly in PHASE6 implementation
notes + the config example. :highlight on against a host without
the CLI should emit a clear "tree-sitter CLI not found; install with
e.g. apt install tree-sitter or cargo install tree-sitter-cli"
status, not silently no-op.
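One way the absent-CLI check could look, sketched without relying on exit codes (see B3). `find_cli` and `highlight_status` are hypothetical names. The sketch assumes a POSIX shell behind io.popen so that `command -v` is available, and the hint text is the one quoted above.

```lua
-- Hypothetical helper: locate a CLI on PATH by reading the output of
-- `command -v`, since LuaJIT's io.popen():close() does not report exit codes.
local function find_cli(name)
  -- Only allow plain command names so nothing unquoted reaches the shell.
  if not name:match("^[%w._+-]+$") then return nil end
  local p = io.popen("command -v " .. name .. " 2>/dev/null")
  if not p then return nil end
  local path = p:read("*l")
  p:close()
  if path and path ~= "" then return path end
  return nil
end

-- Hypothetical status line for `:highlight on`.
local function highlight_status()
  if find_cli("tree-sitter") then
    return "highlight: on"
  end
  return "tree-sitter CLI not found; install with e.g. "
      .. "apt install tree-sitter or cargo install tree-sitter-cli"
end

print(highlight_status())
```

An empty result from `command -v` doubles as the "absent" signal, so the check works on both probed hosts without the exit code.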
B5. Project-tree envelope (git ls-files performance)
> time git -C /home/mfritsche/src/aish ls-files --cached --others --exclude-standard >/dev/null
real 0.002s
files: 32, total: 449 chars, avg/file: 14
Sampling other repos on noether (~/src/* with .git/):
| Repo | Files | Time |
|---|---|---|
| aish | 32 | 2 ms |
| ampere-fourier | 15 | 5 ms |
| ampere-kernel-decoders | 23 | 1 ms |
| cfw | 25 | (similar) |
Implication for §6 (:tree scan):
- Scan latency on typical local repos is negligible (<10ms).
- The 4096-char default tree_max_chars cap accommodates ~290 paths at the observed avg of 14 chars/path, which is fine for most aish-target workflows.
- Repos with thousands of files (kernel, nix-pkgs, etc.) WILL exceed the cap; users can lower tree_depth or raise the cap. The §9 risk row already covers this; no design change needed.
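The cap arithmetic and a possible truncation strategy, as a sketch. `cap_tree` is a hypothetical name, and joining paths with newlines is an assumption about the injected format rather than something the §6 design specifies.

```lua
-- Hypothetical sketch: join paths with newlines until the character cap
-- is reached, and report how many paths were left out.
local function cap_tree(paths, max_chars)
  local out, used = {}, 0
  for i, path in ipairs(paths) do
    local cost = #path + 1            -- +1 for the newline separator
    if used + cost > max_chars then
      return table.concat(out, "\n"), #paths - (i - 1)
    end
    out[#out + 1] = path
    used = used + cost
  end
  return table.concat(out, "\n"), 0
end

-- Capacity at the observed average of 14 chars/path.
print(math.floor(4096 / 14))   --> 292 (paths only)
print(math.floor(4096 / 15))   --> 273 (one separator per path)

local text, omitted = cap_tree({ "a.lua", "bb.lua", "ccc.lua" }, 13)
print(text, omitted)           -- first two paths fit, one omitted
```

Returning the omitted count lets the caller append a "+N more" marker so the model knows the listing is truncated.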
B6. os.tmpname() behavior
> luajit -e "for i = 1, 3 do print(os.tmpname()) end"
/tmp/lua_qAGTFV
/tmp/lua_RhpXLK
/tmp/lua_F9WtYx
LuaJIT's os.tmpname returns POSIX-style /tmp/lua_XXXXXX paths.
Adequate for B3's tmpfile-roundtrip pattern. On Linux, LuaJIT
implements os.tmpname with mkstemp(3): the file is created
atomically (O_EXCL, mode 0600) and closed again, and only the name
is returned; the caller is responsible for io.open and cleanup.
Note: B3's pattern does f:write(body); f:close() between the name
and use. io.open reopens the path by name without O_EXCL, so the
mkstemp guarantee covers creation of the file, not the reopen.
Acceptable for a local-only tmpfile holding short-lived code-block
content; not a security concern (we trust the local user per
PHASE0 §12).
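The roundtrip can be wrapped so that cleanup happens on every path, including errors. This is a minimal sketch; `with_tmpfile` is a hypothetical name, not an existing aish helper.

```lua
-- Hypothetical helper: run fn(path) against a temp file holding body,
-- always removing the file afterwards.
local function with_tmpfile(body, fn)
  local tmp = os.tmpname()
  local f = io.open(tmp, "wb")
  if not f then
    os.remove(tmp)                 -- the name may already exist on disk
    return nil, "cannot open " .. tmp
  end
  f:write(body)
  f:close()
  local ok, result = pcall(fn, tmp)
  os.remove(tmp)                   -- cleanup even if fn raised an error
  if not ok then return nil, result end
  return result
end

-- Usage: read the content back to confirm the roundtrip.
local echoed = with_tmpfile("print('hi')\n", function(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")
  f:close()
  return data
end)
print(echoed == "print('hi')\n")   --> true
```

In the B3 pattern, `fn` would be the place where the highlighter command is run against the file.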
Summary
| Finding | Affects | Resolution |
|---|---|---|
| B1 git ANSI/pager leakage | §5 :diff impl | Add --no-pager --color=never to every git invocation |
| B2 SSE chunk envelope | §4 fence filter | Existing accumulator design is correct; local-model split-fence case confirmed necessary |
| B3 io.popen no exit code | §4 (A4) highlighter | Revise: route via executor.exec("cat tmp | tree-sitter ...") for reliable exit code |
| B4 no tree-sitter on fleet | §1 / docs | Highlighter is opt-in; absent-CLI emits install-hint status |
| B5 tree scan envelope | §6 :tree | No change; defaults fit observed repo sizes |
| B6 os.tmpname semantics | §4 highlighter | Confirmed adequate for tmpfile-roundtrip |
No structural changes to the formulate/analyze design. B1, B3, and B4 surface as implementation-time amendments to PHASE6.md sections §5, §4, and §1 respectively. Will fold these into the manifest during plan.