Commit Graph

46 Commits

Author SHA1 Message Date
marfrit 467e573d24 docs/PHASE8: review fold-in — 2 BLOCKERs + 4 CONCERNs + 4 NITs
Sonnet-reviewed per reviews-use-sonnet memory directive.

BLOCKERs (RESOLVED in-place):

R1. §5 estimate_tokens pseudocode missing per-turn cache pattern.
    Prose described it; code block called tokenize_fn unconditionally.
    Implementer following code verbatim would hit the O(N round-
    trips per call) perf gap the prose flagged. Code block now
    shows explicit `if t._tokens then ... else t._tokens = ... end`.

R2. enforce_budget loop can spin forever when system_prompt alone
    exceeds token_budget (e.g. 5KB project block + budget=4096 +
    zero turns -> turns can't shrink further but OR-condition stays
    true). Fix: AND `#self.turns > 0` guard on the loop. §13 commit
    3 row shows the explicit Lua-syntax condition.
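
A minimal Lua sketch of both fixes, written against a stand-in `Context`. It shows the explicit per-turn cache branch from R1 and the `#self.turns > 0` guard from R2. Several details are assumptions, and the real context.lua and its summarize-on-evict callback differ:
- the field names
- the placeholder `max_turns` default
- the `math.ceil` rounding in the char/4 fallback
- the single-turn `on_evict` hook

```lua
-- Stand-in Context: field names, defaults and rounding are assumptions.
local Context = {}
Context.__index = Context

-- char/4 heuristic, used when no tokenize_fn is configured
local function char4(text)
  return math.ceil(#text / 4)
end

function Context.new(opts)
  opts = opts or {}
  return setmetatable({
    system_prompt = opts.system_prompt or "",
    turns = {},
    max_turns = opts.max_turns or 40,        -- placeholder default
    token_budget = opts.token_budget or 4096,
    tokenize_fn = opts.tokenize_fn,          -- nil means char/4
    on_evict = opts.on_evict,                -- hypothetical per-turn hook
  }, Context)
end

function Context:append(role, content)
  -- turn content is immutable after append, so a cached count never goes stale
  self.turns[#self.turns + 1] = { role = role, content = content }
end

function Context:estimate_tokens()
  local count = self.tokenize_fn or char4
  local total = count(self.system_prompt)
  for _, t in ipairs(self.turns) do
    if t._tokens then
      total = total + t._tokens              -- cache hit: no tokenizer call
    else
      t._tokens = count(t.content)           -- first sight: count once, remember
      total = total + t._tokens
    end
  end
  return total
end

function Context:enforce_budget()
  -- The `#self.turns > 0` guard ends the loop once nothing is left to evict,
  -- even when the system prompt alone is over token_budget.
  while #self.turns > 0
    and (#self.turns > self.max_turns
         or self:estimate_tokens() > self.token_budget) do
    local evicted = table.remove(self.turns, 1)   -- oldest first
    if self.on_evict then self.on_evict(evicted) end
  end
end
```

In this sketch, a second `estimate_tokens` call with an endpoint tokenizer re-counts only the system prompt. That is what keeps calling it on every `enforce_budget` iteration cheap.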

CONCERNs (FOLDED):

R3. :cost detail per-slot ~est=N annotation was semantically
    undefined — accumulator sum (cumulative across calls + evicted
    turns) vs current-snapshot estimate are incommensurable. §6
    reworked: ONE trailing summary line "[estimated session ctx:
    N tokens; token_budget=M (X% used)]" instead of per-slot
    annotations. §13 commit 4 aligned.

R4. tokenize_fn closure MUST reference active_cfg as upvalue (NOT
    capture by value). Subtle but easy to miss — §13 commit 4 now
    spells out the correct vs wrong patterns explicitly.

R5. 2s tokenize timeout can spuriously cache-as-unsupported when
    llama.cpp is busy with a concurrent completion (single-threaded
    inference; /tokenize queues behind). Documented in §9; v1
    ships 2s, revisit during verify if it bites.

R6. Per-endpoint cache key conflated two same-endpoint/different-
    model presets (B1: /tokenize ignores the model field). Cache
    key simplified to endpoint-only. One probe per endpoint per
    session; if a future broker honors the model field, revisit.
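
A hedged sketch of R4 and R6 together. Several pieces are assumptions:
- `probe_fn` is a hypothetical injected stand-in for the `<endpoint>/tokenize` POST. The real `broker.token_count` takes only `(model_cfg, text)`.
- The endpoints and model names are made up.
- A `:model` switch is assumed to rebind `active_cfg` rather than mutate the table in place.

```lua
-- Capability cache keyed by endpoint only (R6): endpoint -> true | false.
local broker = { _tokenize_cap = {} }

local function char4(text) return math.ceil(#text / 4) end

function broker.token_count(model_cfg, text, probe_fn)
  local ep = model_cfg.endpoint                 -- model field ignored (B1)
  if broker._tokenize_cap[ep] == false then
    return char4(text)                          -- known unsupported: no request
  end
  local ok, n = pcall(probe_fn, ep, text)
  if ok and type(n) == "number" and n >= 0 then
    broker._tokenize_cap[ep] = true
    return n
  end
  broker._tokenize_cap[ep] = false              -- 404 / timeout / parse fail
  return char4(text)
end

-- Hypothetical probe: cloud endpoints 404, local ones count 1 token per char.
local function fake_probe(ep, text)
  if ep:find("cloud", 1, true) then return nil, "404" end
  return #text
end

local active_cfg = { endpoint = "http://local:8082", model = "qwen" }

-- WRONG: snapshots the table at wiring time; later rebinds are invisible.
local snapshot = active_cfg
local tokenize_wrong = function(text)
  return broker.token_count(snapshot, text, fake_probe)
end

-- RIGHT: the closure reads the active_cfg upvalue on every call.
local tokenize_fn = function(text)
  return broker.token_count(active_cfg, text, fake_probe)
end

-- Simulated :model switch (rebinds the variable).
active_cfg = { endpoint = "https://cloud.example", model = "haiku" }
```

Because the key is the endpoint alone, two presets that share an endpoint also share one cache entry.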

NITs (APPLIED):

N1. §13 commit 3 `OR`/`AND` -> Lua-syntax `or`/`and`.
N2. §10 Q-T5 Resolution-target cell filled in (was blank after B1).
N3. §6 / §8 / §13 commit 4 now describe a CONSISTENT approach
    (trailing summary line; per-slot annotation dropped).
N4. Status header tree-hash updated to current (aa64ad3 -> stays
    fresh through review fold-in; commit 5 will refresh again
    at "Implement" status).

PHASE8.md now 622 lines (was 454 after plan). +168/-61. Ready for
implementation phase 6 of the inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:28:27 +00:00
marfrit aa64ad3eec docs/PHASE8: plan — §13 commit roadmap (5 commits)
Status: Analyze -> Plan. All open Qs resolved (Q-T5 via baseline B1).

5-commit roadmap, bottom-up:

  1. broker.lua — M.token_count helper + per-endpoint capability
     cache. <endpoint>/tokenize probe with 2s timeout; cache true/false
     per (endpoint, model) for the session. char/4 fallback on any
     non-200 / parse-fail / transport err. M.tokenize_supported
     introspection helper.

  2. context.lua — Context.new accepts opts.tokenize_fn; estimate_
     tokens widens to use it when set, with per-turn `_tokens` cache.
     char/4 path unchanged when tokenize_fn nil.

  3. context.lua — enforce_budget consults token_budget too (pillar
     5 from A1). Loop condition: turns>max_turns OR estimate_tokens
     >token_budget. Existing summarize-on-evict callback unchanged.

  4. repl.lua — wire tokenize_fn when cfg.tokenize.use_endpoint=true.
     Closure captures active_cfg upval (A5 — follows :model switches
     naturally). :cost detail extension: trailing line showing
     estimated session ctx tokens for comparison with the per-slot
     prompt_tokens sums in the accumulator.

  5. config.lua commented `tokenize = { use_endpoint = true }`
     example + PHASE8.md status -> Implement.

Per-commit risk index covers: probe latency cap (2s, one-shot),
per-turn cache correctness (immutable post-append), enforce_budget
performance (O(N) per call after cache fill), and the intentional
behavior change of token_budget actually being enforced (sessions
fitting under char/4 may evict earlier under accurate counts —
documented in §9).

Two items open at plan, resolve at implement:
  - exact :cost detail layout for estimated session ctx row
  - whether to add a :tokenize debug meta (defer unless useful in verify)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:24:41 +00:00
marfrit 79bd40db79 docs/PHASE8-baseline: live /tokenize probes
Four findings, all align with formulate/analyze:

B1. /tokenize IGNORES the `model` request field — returns the
    tokenization of whichever model is currently loaded on the
    proxy backend, NOT the requested model. Acceptable: a real BPE
    count is still much better than char/4, and the gap between
    Qwen/Llama tokenizers is small. Cloud (OpenRouter) 404s
    regardless, so cloud falls back to char/4 via the capability
    cache.

B2. Latency 23-34ms per call, FLAT across input sizes 50-5000 chars.
    Network round-trip dominates. Per-turn _tokens cache amortizes
    to O(1); worst case 40 cached turns × ~30ms = 1.2s one-time
    cost on first enforce_budget call. Acceptable.

B3. Response shape confirmed: `{"tokens":[N1,N2,...]}` (token IDs;
    we use #response.tokens for count, discard the IDs). JSON not
    SSE; ffi.curl.M.post is the right call.

B4. Cloud /tokenize 404s as expected. Capability cache marks it
    unsupported on first probe; char/4 fallback silent thereafter.
    No design change.

Q-T5 RESOLVED per B1. All open questions now resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:22:05 +00:00
marfrit 1a136d81b7 docs/PHASE8: analyze — adds pillar 5 (enforce_budget honors token_budget)
Status: Formulate -> Analyze. 12 findings (A1-A12); 5/6 open Qs
resolved in-place (Q-T5 deferred to baseline).

MAJOR FINDING:

A1. enforce_budget ONLY checks max_turns, NOT token_budget — even
    with accurate tokenization, eviction decisions are unaffected.
    The new estimate_tokens() would just feed the prompt template
    display. Pillar 5 added: enforce_budget evicts when EITHER
    max_turns OR token_budget is exceeded. This is the real
    motivation for accurate tokenization.

Other findings:

A2.  ffi.curl.M.post signature confirmed (body, status) / (nil, err).
A3.  Single caller of estimate_tokens today; enforce_budget becomes
     the second (more frequent) caller — per-turn _tokens cache
     becomes important.
A4.  Q-T1: cache lives on turn dict; dies with turns on :reset.
A5.  Q-T2: closure captures active_cfg upval; follows :model switch
     naturally.
A6.  Q-T3: opt-out skips the probe entirely (no wiring).
A7.  Q-T6: tools-schema tokens deferred to follow-up (fixed per
     session; under-count bounded).
A8.  _tokens cache invalidation: only :reset; turn content is
     immutable after append.
A9.  Probe latency ~50ms/call locally; per-turn cache amortizes to
     O(1) after first count.
A10. estimate_tokens called OUTSIDE streaming callback; no race.
A11. role:"tool" turns tokenize identically; per-turn cache works.
A12. include_usage (Phase 7) and tokenize (Phase 8) are orthogonal —
     different endpoints, different code paths.

§1 expanded to 5 pillars (pillar 5 = enforce_budget extension).
§3 context.lua row updated to reference the enforce_budget change
+ per-turn _tokens cache. §9 risk row added: accurate counts mean
the default token_budget=4096 is finally ENFORCED — sessions that
spilled silently under char/4 may now evict earlier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:21:24 +00:00
marfrit 00869ba412 docs/PHASE8: formulate — accurate tokenization (resolves Q1)
Phase 8 formulate manifest + PHASE0 §11 amendment to add the Phase 8
row (substrate amendment per CLAUDE.md §3 lands same commit).

Four pillars:

  1. Per-endpoint /tokenize probe (cached). One round-trip on first
     call per (endpoint, model); capability cached for session.
     hossenfelder + llama.cpp expose <endpoint>/tokenize (NOT /v1/
     tokenize — per real probe; the path is endpoint-local, not
     under the OpenAI /v1 prefix). Cloud (OpenRouter) 404s — silent
     char/4 fallback.

  2. broker.token_count(model_cfg, text) — thin wrapper; tries probe,
     falls back to char/4 on miss. Always returns non-negative int;
     never errors. 2s tight timeout; failures cache as not-supported.

  3. Context:estimate_tokens widened. Accepts optional tokenize_fn at
     Context.new; uses it when present, char/4 otherwise. repl.lua
     wires `tokenize_fn = function(text) return broker.token_count(
     active_cfg, text) end` when cfg.tokenize.use_endpoint = true.
     Per-turn _tokens cache to amortize across estimate calls.

  4. :cost detail est-vs-actual annotation. When the heuristic
     disagrees with the actual prompt_tokens from broker usage by
     >10%, show `~est=N`. Silent otherwise. Display-only; no
     behavior change.

Resolves Q1 (PHASE0 §13, originally Phase 3) — replace char/4
heuristic on Context:estimate_tokens. Originally targeted at Phase 3
but deferred forward each iteration; now lands.

Baseline already observed during formulate:
  - /v1/tokenize -> 404 on hossenfelder; /tokenize -> works
  - Body shape: {content: "..."} returns {tokens: [N1, N2, ...]}
  - Accuracy gap: char/4 UNDERESTIMATES by ~10% on real code/prose
    (508 vs 558 on a 2KB README sample). Material for context-
    budget eviction decisions.

Doc covers scope + done-when, tech decisions table, module changes,
per-pillar deep dives, UX surface, out of scope, 6 risk rows, 6
open questions (Q-T4/T5 baseline-bound, others analyze-bound).

Scope confirmed via AskUserQuestion: tokenization (chosen over
cross-session cost persistence and hard rate-limit enforcement).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:19:53 +00:00
marfrit 1f34b6dce8 config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE7.md). N5: PHASE0 §11 amendment landed in commit 3bad07b
(formulate); not re-applied here.

config.lua:
  - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block
    with parity to the Phase 1-6 example blocks.
  - Notes warn flags are independent (R4) and per-turn usage flows
    to session/*.jsonl for after-the-fact analysis.

docs/PHASE7.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits inline for traceability:
      7364963  broker: usage capture + opts widening
      7b4a9be  context: accumulator helpers
      8adebd5  repl: _record_usage + opts.category at 5 sites
      b30212a  safety + repl: opts.category for Norris + probe
      0d6ff93  repl: :cost meta surface
      this     config example + status bump

Phase 7 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:55 +00:00
marfrit d4c20f09df docs/PHASE7: review fold-in — 3 BLOCKERs + 6 CONCERNs + 5 NITs
Sonnet-reviewed (per the reviews-use-sonnet feedback memory).

BLOCKERs (RESOLVED in-place):

R1. M.chat would silently return (text, nil) for ALL non-streaming
    callers — 4 of 5 categories (summarize/delegate/memory_summarize/
    probe) flow through broker.chat, NOT chat_stream. §4 now shows
    the explicit M.chat update that captures kind=="usage" alongside
    "text" and returns (text, usage).

R2. call_broker fallback retry would credit usage to the wrong model
    name. Fix: broker emits payload.model = model_cfg.model (which IS
    the fallback's name when called with fb_cfg — chat_stream's
    upvar). Wrapper keys by payload.model, NOT outer model_name. §4
    + §13 commit 3 reflect.

R3. build_request has TWO internal callers inside broker.lua itself,
    not just the public surface. Plan §13 commit 1 risk row now
    spells this out explicitly so the implementer doesn't read "every
    caller already passes opts" as "external-only".
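
A minimal sketch of the R1 shape, using a stub `chat_stream` in place of the real one. The stub's argument order and payload fields are assumptions. Per R2, the usage payload carries `model_cfg.model`.

```lua
local M = {}

-- Stub stream: emits text deltas, then one usage payload naming the model
-- actually called (the fallback's name when retried with fb_cfg).
function M.chat_stream(model_cfg, msgs, on_delta, opts)
  on_delta("text", "Hel")
  on_delta("text", "lo")
  on_delta("usage", { model = model_cfg.model, prompt_tokens = 12, completion_tokens = 2 })
end

-- Non-streaming wrapper: collect both kinds and return (text, usage).
function M.chat(model_cfg, msgs, opts)
  local parts, usage = {}, nil
  M.chat_stream(model_cfg, msgs, function(kind, payload)
    if kind == "text" then
      parts[#parts + 1] = payload
    elseif kind == "usage" then
      usage = payload            -- previously dropped, so callers got (text, nil)
    end
  end, opts)
  return table.concat(parts), usage
end
```

Callers that keep only the first return value are unaffected, since Lua discards extra returns.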

CONCERNs (FOLDED):

R4. Single cost_warn_fired flag covers two thresholds — first-to-fire
    suppresses the other. Split into ctx.cost_warn_state = { dollars
    = false, tokens = false }; :cost reset clears both. §7 + §13.

R5. Warn-check centralization — single _record_usage helper in
    repl.lua wraps ctx:add_usage AND does threshold check. safety.lua
    routes via helpers.on_usage / opts.on_usage callbacks. context.lua
    stays decoupled from renderer.

R6. Preserve nil-vs-0 cost distinction. Accumulator slot gains
    `is_local = true` (sticky) when ANY recorded usage had cost==nil.
    `:cost detail` annotation comes from is_local flag, not a
    fragile cost==0 heuristic.

R7. :cost detail sort needs 3-level deterministic key:
    (cost desc, model asc, category asc) — table.sort is unstable.

R8. call_broker fallback passes opts.include_usage unchanged.
    Documented as known assumption (B1 confirms both backends
    accept; future-broken fallback can pass include_usage=false).

R9. :resume does NOT restore historical usage_totals. Per-turn usage
    IS in session JSONL for scripting; cross-session aggregation is
    Q-C2 deferred. Documented in §8.

R10. $%.4f loses sub-cent precision (cloud cost 0.000028 -> $0.0000).
     Widened to $%.6f in §6 + §7 warn message format.
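
A short sketch of the R7 comparator and the R10 format. The row fields, column widths and the local-line annotation text are hypothetical.

```lua
-- Total order for :cost detail rows: cost desc, model asc, category asc.
-- table.sort is not stable, so every tie is broken explicitly.
local function row_less(a, b)
  if a.cost ~= b.cost then return a.cost > b.cost end
  if a.model ~= b.model then return a.model < b.model end
  return a.category < b.category
end

-- $%.6f keeps sub-cent cloud costs visible; is_local drives the annotation.
local function format_row(r)
  local note = r.is_local and "  (local, no cost reported)" or ""
  return string.format("%-8s %-10s $%.6f%s", r.model, r.category, r.cost, note)
end
```

With `$%.6f`, 0.000028 prints as `$0.000028`. Values below half a micro-dollar still round to `$0.000000`.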

NITs (APPLIED):

N1. §4 pseudocode comment notes `if doc.usage` branch is independent
    of choice branch (handles both B2 emission shapes).
N2. §2 stale "B7" reference corrected to B3.
N3. §13 commit 3 row gains explicit dependency note on commit 1's R1.
N4. §13 commit 4 spells out llm_probe -> llm_second_opinion ->
    M.is_destructive signature chain widening.
N5. §3 + §13 commit 6 — PHASE0 §11 amendment already in tree
    (3bad07b); commit 6 must NOT re-apply.

PHASE7.md now 803 lines (was 528 after plan). +275/-57. Ready for
implementation phase pending user gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:55:59 +00:00
marfrit 0f14dc1727 docs/PHASE7: plan — §13 commit roadmap
Status: Analyze -> Plan.

Q-C4 was the last open question pending baseline; now resolved per
B1 (stream_options accepted by both backends; required for local).

§13 Implementation Plan added — 6 commits, bottom-up:

  1. broker.lua: usage extraction from final SSE chunk; build_request
     signature widening to (model_cfg, msgs, stream, opts); on_delta
     ("usage", payload); chat returns (text, usage); opts.category
     passthrough.

  2. context.lua: usage_totals + cost_warn_fired fields; add_usage /
     total_cost / total_tokens helpers; :reset preserves both.

  3. repl.lua: wire opts.category at 5 non-Norris call sites (main,
     delegate x2, summarize, memory_summarize); on_delta("usage")
     branch routes to ctx:add_usage.

  4. safety.lua: wire opts.category for Norris main broker + is_
     destructive LLM probe; helpers.on_usage callback convention
     (no new module dep — matches #52's scrub_msgs pattern).

  5. repl.lua: :cost meta surface + warn-threshold check + HELP.

  6. config.lua: commented cost example block + PHASE7.md status
     bump to Implement.

Per-commit risk index covers signature-change blast radius, missed
call-site lint, and warn-flag one-shot semantics. Lua's multi-
return semantics keep broker.chat backwards-compat automatic.

Two items left open at plan, resolve at implement:
  - is_destructive opts.on_usage vs cfg.helpers threading
  - per-turn verbose mode (deferred; v1 = :cost on demand only)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:50:39 +00:00
marfrit 2244a3f1ee docs/PHASE7-baseline: live broker probes for usage shape
Real probes of hossenfelder.fritz.box:8082 against both backends.
Five findings, all align with the formulate/analyze design — no
structural changes.

B1. `stream_options.include_usage = true` is safely accepted by
    both backends. REQUIRED for local llama.cpp to emit usage;
    no-op for cloud (which emits anyway). Default-true is correct.

B2. Two emission patterns observed:
    - Cloud (Bedrock): usage rides the FINAL delta chunk with
      non-empty `choices` carrying finish_reason.
    - Local: usage rides a SEPARATE chunk with `choices: []`
      preceding `[DONE]`.
    Both shapes are handled by the same `if doc.usage then ...`
    check; the existing on_event choices-branch short-circuits
    safely when choices is empty.

B3. `cost` field is dollar-denominated (number) and cloud-only.
    Local returns `timings` instead (perf, not cost). Accumulator
    captures `usage.cost` as-is; nil treated as 0. :cost detail
    annotates local lines so $0 isn't misread.

B4. `doc.model` in the usage event reflects the upstream-API-version
    (e.g., Bedrock rewrites `anthropic/claude-haiku-4.5` to
    `anthropic/claude-4.5-haiku-20251001`). Accumulator keys by
    caller-intended `model_cfg.model`, NOT `doc.model`, for stable
    cross-call comparison.

B5. Usage event is always the LAST data event before `[DONE]`.
    Emission of `on_delta("usage", ...)` happens after curl.post_sse
    returns — one call per stream, after all text + tool_calls.
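
A hedged sketch of why one `if doc.usage` check covers both B2 shapes. It assumes each SSE `data:` line has already been JSON-decoded into a Lua table `doc`. The `state` table and the `run_stream` driver are illustrative, not broker.lua's actual code.

```lua
local function on_event(doc, state, on_delta)
  local choice = doc.choices and doc.choices[1]
  if choice then                          -- skipped when choices is []
    local delta = choice.delta or {}
    if delta.content then on_delta("text", delta.content) end
    if choice.finish_reason then state.finish_reason = choice.finish_reason end
  end
  -- Independent of the choices branch: catches usage on the final delta
  -- chunk (cloud) and on a separate `choices: []` chunk (local).
  if doc.usage then state.usage = doc.usage end
end

local function run_stream(events, on_delta)
  local state = {}
  for _, doc in ipairs(events) do on_event(doc, state, on_delta) end
  -- B5: emit usage once, after the stream ends and all text has gone out.
  if state.usage then on_delta("usage", state.usage) end
  return state
end
```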

Q-C4 RESOLVED: hossenfelder forwards `stream_options.include_usage`
to all backends correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:53 +00:00
marfrit f0bccdec48 docs/PHASE7: analyze — probe broker surface + resolve Qs in-place
Status: Formulate -> Analyze (tree at 3bad07b probed).

11 findings (A1-A11), 5/6 open Qs resolved (Q-C4 deferred to baseline):

A1.  broker.chat_stream surface clean — usage capture via closure-local
     + on_delta("usage") emission after curl.post_sse returns.
A2.  7 caller sites for opts.category threading (probe / norris /
     summarize / main / delegate x2 / memory_summarize).
A3.  build_request signature widens to (model_cfg, msgs, stream, opts)
     to absorb tools / max_tokens / include_usage / stream_options
     without further positional growth.
A4.  Q-C3 RESOLVED: free-form categories (caller decides); matches
     Phase 6 helpers/skills convention.
A5.  Q-C5 RESOLVED: warn fires on the call that crossed (no NEXT-call
     delay).
A6.  Q-C6 RESOLVED: :reset does NOT clear cost_warn_fired; only
     :cost reset clears.
A7.  Norris call-graph rewires (commit 955bd82) — secrets streaming
     rehydrator wraps only "text" kind; new "usage" kind passes
     through unchanged. No new entanglement.
A8.  ctx.usage_totals survives :reset (R8 parity with memory_items,
     project).
A9.  Session JSONL inherits the new field automatically (dkjson
     opaque encoding).
A10. Q-C1 PARTIAL: defensive silent skip when provider omits usage.
     Real probe required for local model — baseline action.
A11. Q-C4 deferred to baseline (real broker probe).

§2 build_request row updated to mention the A3 refactor.
§11 Open Qs table now shows all 6 with resolutions; only Q-C4
remains as a baseline-time probe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:49:03 +00:00
marfrit 3bad07b2da docs/PHASE7: formulate — cost / usage observability
Phase 7 formulate manifest + PHASE0 §11 amendment to add the Phase 7
row (substrate amendment per CLAUDE.md §3, lands in the same commit).

Four pillars:

  1. Usage capture in broker.chat_stream — extract `usage` from the
     final SSE chunk (OpenAI streaming spec with `stream_options:
     {include_usage: true}`). Surface via new on_delta("usage",
     payload) kind. broker.chat returns (text, usage) — backward-
     compat: existing callers ignore the second value.

  2. Per-session accumulator on ctx — ctx.usage_totals[model][category]
     tables (categories: main / delegate / summarize / memory_summarize
     / probe / norris, tagged at the call site via opts.category).
     :reset preserves usage_totals (R8 parity with memory_items /
     project). Session JSONL gains an optional `usage` field on
     assistant turns for after-the-fact analysis.

  3. :cost meta surface — :cost (summary), :cost detail (per-model +
     per-category breakdown), :cost reset (zero the meter). Pure-Lua
     read of ctx.usage_totals; no broker calls.

  4. Optional warn thresholds — cfg.cost.warn_at_dollars /
     warn_at_tokens emit a one-shot status when crossed. Default off;
     useful with cloud presets configured.
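
A hedged sketch of pillars 2 and 4 as later revised in the Phase 7 review fold-in above: split warn flags (R4), a single recording entry point (R5) and a sticky `is_local` (R6). Slot fields other than `is_local`, the helper names and the warning text are assumptions.

```lua
local function new_ctx()
  return { usage_totals = {}, cost_warn_state = { dollars = false, tokens = false } }
end

-- ctx.usage_totals[model][category] slot update.
local function add_usage(ctx, model, category, usage)
  local by_cat = ctx.usage_totals[model] or {}
  ctx.usage_totals[model] = by_cat
  local slot = by_cat[category]
  if not slot then
    slot = { calls = 0, prompt_tokens = 0, completion_tokens = 0, cost = 0 }
    by_cat[category] = slot
  end
  slot.calls = slot.calls + 1
  slot.prompt_tokens = slot.prompt_tokens + (usage.prompt_tokens or 0)
  slot.completion_tokens = slot.completion_tokens + (usage.completion_tokens or 0)
  if usage.cost == nil then
    slot.is_local = true               -- sticky: a later priced call won't clear it
  else
    slot.cost = slot.cost + usage.cost
  end
end

local function totals(ctx)
  local dollars, tokens = 0, 0
  for _, by_cat in pairs(ctx.usage_totals) do
    for _, s in pairs(by_cat) do
      dollars = dollars + s.cost
      tokens = tokens + s.prompt_tokens + s.completion_tokens
    end
  end
  return dollars, tokens
end

-- Single entry point: accumulate, then check each threshold on its own,
-- firing on the call that crosses it.
local function record_usage(ctx, cfg, model, category, usage, warn)
  add_usage(ctx, model, category, usage)
  local c, st = cfg.cost or {}, ctx.cost_warn_state
  local dollars, tokens = totals(ctx)
  if c.warn_at_dollars and not st.dollars and dollars >= c.warn_at_dollars then
    st.dollars = true
    warn(string.format("session cost $%.6f crossed $%.6f", dollars, c.warn_at_dollars))
  end
  if c.warn_at_tokens and not st.tokens and tokens >= c.warn_at_tokens then
    st.tokens = true
    warn(string.format("session tokens %d crossed %d", tokens, c.warn_at_tokens))
  end
end
```

In this shape, `:cost reset` only has to zero `usage_totals` and set both flags back to false.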

Doc covers scope + done-when criteria, tech decisions table, module
changes, per-pillar deep dive with code sketches, UX surface, out of
scope, risks, 6 open questions to resolve in analyze.

Open at formulate:
  Q-C1 — provider-without-usage handling (local llama.cpp probably)
  Q-C2 — cross-session persistence (defer to phase 8)
  Q-C3 — categories closed-set vs free-form
  Q-C4 — does hossenfelder forward stream_options to all backends?
  Q-C5 — warn fires on the call that crosses, or the next one?
  Q-C6 — :reset clears cost_warn_fired too, or only :cost reset?

Scope confirmed via AskUserQuestion: cost/usage observability
(chosen over project-local config overlay and session search/tag).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:47:58 +00:00
marfrit ac58b19da2 config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE6.md per the review fold-in).

config.lua:
  - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }`
    block with the same shape as the Phase 1-5 example blocks.
  - Note that :diff / :tree / :highlight all work without config; the
    `project` block ONLY controls the startup auto-inject.
  - Note about :highlight v1 having no config flag (runtime-only),
    cross-references the in-REPL install hint.

docs/PHASE6.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits in the header for traceability:
      c4fc7fd  context: compose_project plumbing
      d1dce83  _scan_project_tree + :tree + auto_tree hook
      4d5f93a  :diff + _git_clean_cmd (B1 helper)
      0d63f01  expand_mentions @<r1>..<r2> tiered resolution
      11d0e59  tree-sitter highlighter (renderer fence filter +
               highlighted dispatch + :highlight meta)
      this    config example + status bump

Phase 6 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests against the live broker on each pillar
plus filing of issues for any defects, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:27:58 +00:00
marfrit 261b230be8 docs/PHASE6: review fold-in — 2 BLOCKERs resolved, 7 CONCERNs, 6 NITs
Independent agent review of PHASE6 (manifest + baseline + plan at
4407029). Status header: Plan -> Plan + review fold-in.

BLOCKERs (RESOLVED in-place):

R1. §4 fence detector's `outside`-state dropped the leading `'``'`
    chunk of a split fence — contradicted B2's local-model
    split-fence requirement (4-char median chunk size). Algorithm
    rewritten: outside-state now holds a tail (up to 10 chars) when
    the chunk's suffix could be a fence prefix; flushes on next push.
    Same accumulator pattern as the secrets streaming rehydrator.

R2. `highlighted()` file placement was ambiguous (§3 vs §12). Lives
    in repl.lua (where _shq and executor are accessible);
    renderer.lua exposes set_highlight(enabled, detected, highlight_fn)
    and calls back. Keeps renderer.lua free of the executor require.
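
A minimal sketch of the revised `outside`-state rule. It holds the text after the last newline when that text could still become a fence opener, caps the hold at 10 chars, and flushes it with the next push. The helper names are hypothetical. It also assumes each buffer starts at start-of-line, which the real filter would have to track.

```lua
local MAX_HOLD = 10   -- the "up to 10 chars" tail cap from R1

-- True when `tail` could still turn into a fence opener once more chunks
-- arrive: 1-3 backticks, or ``` plus a partial info string such as "```lu".
local function could_be_fence_prefix(tail)
  if #tail == 0 or #tail > MAX_HOLD then return false end
  if tail:match("^`+$") then return #tail <= 3 end
  return tail:match("^```[%w_+-]*$") ~= nil
end

-- Split `buf` into (emit_now, hold). Only the text after the last newline is
-- a candidate, so mid-line backticks are never held (N1 SOL anchor).
local function split_for_emit(buf)
  local last_nl = buf:match(".*()\n")          -- position of the final newline
  local line_start = last_nl and last_nl + 1 or 1
  local tail = buf:sub(line_start)
  if could_be_fence_prefix(tail) then
    return buf:sub(1, line_start - 1), tail
  end
  return buf, ""
end
```

Feeding B2's split fence (`intro\n``` followed by `` `lua\n...``) through `split_for_emit(held .. chunk)` emits `intro\n` first and holds the two backticks. The next push then sees the whole `` ```lua `` opener at once.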

CONCERNs (FOLDED):

R3. PTY raw-mode toggle on every code-block render — smoke-test for
    cursor flicker / SIGWINCH races before locking in. Risk row 5.
R4. tree-sitter highlight --lang X grammar is UNVERIFIED — upstream
    CLI canonically takes a path with extension. Implement-time
    check required; fallback path documented (extension-based
    tmpfile + path arg). Added to risk row 5 + open-at-plan.
R5. :tree off semantics clarified — one-shot clear of ctx.project
    + ctx._project_opts; no "disabled" flag.
R6. cwd-coupling difference between :diff (call-time) and :tree
    (scan-time) now documented in §5.
R7. :tree refresh opts caching specified — caches ctx._project_opts;
    `:tree refresh` reuses last explicit opts.
R8. :reset preserves ctx.project (parity with memory_items per
    Phase 4). §12 commit 1 smoke updated.
R9. Status-bump duplication between §12 commits 5e and 6 resolved
    — commit 6 owns the bump.

NITs (APPLIED):

N1. §4 algorithm pseudocode now includes SOL/post-newline anchor
    (mid-line backticks in prose don't open a fence).
N2. _detect_treesitter() gained a comment explaining the popen
    pattern doesn't gate on exit code (B3).
N3. :diff staged shorthand dropped — meta is a thin pass-through
    to git's own grammar.
N4. _scan_project_tree switched from `cd && git ...` to
    `git -C <dir> ...` — no subshell, more idiomatic.
N5. Open-at-plan dir-arg bullet dropped (already decided in §6);
    replaced with R3 + R4 implement-time verification items.
N6. §11 wording on #52 left as-is (cosmetic only).

PHASE6.md now 896 lines (was 701 after plan). +264/-69. Ready for
implementation phase 6 of the inner loop pending user gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:06:19 +00:00
marfrit 4407029296 docs/PHASE6: plan — fold B1/B3/B4 + add §12 commit roadmap
Status header: Analyze -> Plan.

Baseline findings folded into the design sections:

  §1 (highlighter pillar) gains B4: tree-sitter absent on every
  probed host; :highlight on emits install-hint when missing.

  §4 (highlighter sketch) revised per B3: io.popen():close() doesn't
  expose exit codes in LuaJIT. Route via executor.exec("cat tmp |
  tree-sitter ...") which uses pty.spawn+waitpid and returns code
  reliably. Tmpfile design retained (avoids ARGMAX + shell-escape).

  §5 (:diff impl + @<r1>..<r2> retry) revised per B1: every git
  invocation must use `--no-pager -c color.ui=never` to suppress
  the color/keypad/line-clear escapes forkpty triggers. Factored
  recommendation: helper `_git_clean_cmd(subcmd)` shared by :diff
  and the @-mention diff retry.

New §12 Implementation Plan — 6 commits, bottom-up:

  1. context.lua: ctx.project + compose_project + composition order
  2. repl.lua: _scan_project_tree helper + :tree meta
  3. repl.lua: :diff meta + _git_clean_cmd helper (B1)
  4. repl.lua: expand_mentions tiered resolution (@<r1>..<r2> per A6)
  5. renderer.lua + repl.lua: tree-sitter detect + fence filter +
     :highlight meta (B3-revised tmpfile dispatch)
  6. config.lua project example + status -> Implement

Per-commit risk index + smoke criteria. Highlighter (commit 5) is
the largest experimental surface — placed last so the rest of Phase 6
ships even if highlighter slips. Order is independent enough that
swapping 3<->4 or 5<->6 doesn't break anything; bottom-up keeps each
commit individually green.

Things deliberately not split: _shq reuse, lang map duplication for
v1, streaming-rehydration order (rehydrate -> highlight -> emit
inherits naturally from existing chunk pipeline).

Two items open at plan time, resolve at implement: _scan_project_tree
dir-arg vs hardcoded getcwd; :highlight status probing
tree-sitter --print-langs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:01:40 +00:00
marfrit 9f50206ca6 docs/PHASE6-baseline: substrate probes ahead of implementation
Six findings from probing the world before tree-sitter / diff / project
tree implementation lands:

B1. `git` subcommands through executor.exec emit ANSI color + DEC
    keypad/line-clear escapes by default (forkpty enables interactive
    mode). `:diff` impl MUST use `git --no-pager --color=never <args>`.
    Same flags apply to any future git verbs.

B2. SSE chunk size envelope: local llama.cpp delivers tiny chunks
    (median 4 chars, max 13) AND splits code fences across boundaries
    (`'``'` then `'`'`). Cloud (Anthropic via OpenRouter) delivers
    big chunks (median 26 chars), fences intact. The §4 fence-aware
    filter accumulator design covers both — confirmed necessary by
    local-model behavior.

B3. **LuaJIT io.popen():close() does NOT return exit codes** — Lua
    5.1 contract, not 5.2+. Breaks the A4 highlighter resolution.
    Revised: route via `executor.exec("cat tmp | tree-sitter ...")`
    which uses pty.spawn + waitpid and returns (out, code) reliably.

B4. tree-sitter CLI absent on both probed hosts (noether, higgs).
    Highlighter is opt-in by design; absent-CLI path should emit a
    clear install hint, not silently no-op.

B5. Project-tree envelope: aish 32 files / 449 chars; similar local
    repos 15-25 files; scan time ~1-5ms. The 4096-char default cap
    accommodates ~290 typical paths. Large repos handled via
    tree_depth or cap tuning per existing §9 risk row.

B6. os.tmpname returns POSIX /tmp/lua_XXXXXX paths; acceptable for
    the B3-revised tmpfile-roundtrip pattern.

No structural changes to formulate/analyze. B1, B3, B4 will fold into
PHASE6.md §4 / §5 / §1 during plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:58:56 +00:00
marfrit ad52fe4538 docs/PHASE6: analyze — substrate probes + Q resolutions in-place
Analyze pass against tree at f596743. All 6 formulate-time questions
resolved without structural changes; pillar shapes intact.

A1. renderer.lua surface clean — assistant_delta/flush accumulate via
    stream_buf; fence-aware filter slots in between chunk receipt and
    emit without touching anything else.

A2. executor.exec via pty.spawn already handles git diff / find;
    cwd-aware (inherits from libc.chdir). No new IO model.

A3. context composition order locked: base + [background] + [earlier
    summary] + NORRIS. [project] inserts between [background] and
    [earlier summary]; Norris-suppression guard inherited.

A4. Q-H1 RESOLVED: tmpfile roundtrip for tree-sitter popen3
    (io.popen("w") + redirect stdout to tmp file; io.open reads back).
    Avoids ARGMAX + shell-escape complexity. Cost ~one syscall per
    code block.

A5. Q-D1 RESOLVED: no confirm gate on :diff. git diff is read-only;
    matches :history / :sessions / :safety check.

A6. Q-D2 RESOLVED: tiered @<token> resolution — file lookup first,
    then ref-range retry when path fails AND token contains "..".
    @origin/main..feature works naturally; @../sibling.txt unaffected.

A7. Q-H2 RESOLVED: highlighter is assistant-output only in v1.
    @-mention echo via readline is a different code path; deferred
    to v2 (added to §8 out-of-scope).

A8. Q-T1 RESOLVED: project tree captured at scan time, not auto-
    refreshed on cd. v1 verb is :tree refresh; cd-intercept auto-
    refresh deferred to v2.

A9. Q-T2 RESOLVED: .gitignore via `git ls-files --exclude-standard`
    in repos; find fallback outside. Custom globs deferred to v2.

A10. expand_mentions punct-peel strips trailing punctuation but not "/";
     for `HEAD~1..HEAD,` it peels the comma cleanly and the diff retry
     catches the cleaned token.

A11. Auto-injection ordering: memory load → tree scan → first ask_ai.
     Composition reads memory facts before file tree.

A12. [project] Norris-suppressed (parity with R-C1/R-C4).
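
A hedged sketch of A6's tiered resolution, combined with the B1 git flags from the plan commit. `file_exists` is injected. `_shq` here is a stand-in for the REPL's helper, and its real implementation may differ. The returned kinds are illustrative.

```lua
-- Single-quote shell escaping (stand-in for the REPL's _shq).
local function _shq(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Every git call suppresses the pager and color escapes forkpty triggers.
local function _git_clean_cmd(subcmd)
  return "git --no-pager -c color.ui=never " .. subcmd
end

local function resolve_mention(token, file_exists)
  if file_exists(token) then
    return "file", token                         -- tier 1: a real path wins
  end
  if token:find("..", 1, true) then              -- tier 2: retry as a ref range
    return "diff", _git_clean_cmd("diff " .. _shq(token))
  end
  return "unresolved", token
end
```

Because the path lookup runs first, `@../sibling.txt` resolves as a file whenever it exists. Only a failed lookup on a token containing `..` falls through to git.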

§3 module-changes table: context.lua row updated (project string +
compose_project + ordering note + Norris suppression). §4 highlighter
code sample replaced with the tmpfile-roundtrip resolved form. §5
@-mention section rewritten as tiered-resolution with worked examples.
§8 out-of-scope gained three v2-polish items (echo highlight, cd-
intercept auto-refresh, custom globs) so they're tracked. §10 Open
Questions table now shows all 6 Qs with their resolutions inline.
§9 Risks row for @-mention collision updated to point at A6.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:53:58 +00:00
marfrit f596743834 docs/PHASE6: formulate — tree-sitter highlight + diff + project tree
Phase 6 formulate manifest. Three pillars per PHASE0 §11 row 6:

  1. Tree-sitter syntax highlighting hooks
     External `tree-sitter` CLI when present, no-op otherwise.
     Honors PHASE0 §3 (no compiled extensions). Toggleable
     at runtime; off by default so existing UX is unchanged.

  2. Diff-aware code injection
     :diff [args] meta + @<ref1>..<ref2> @-mention extension.
     Shells out to `git diff`; output flows through the existing
     exec-output context channel.

  3. Project-level file-tree context
     :tree meta + optional cfg.project.auto_tree startup inject.
     git ls-files in a repo, find fallback otherwise. Composed
     into the system prompt as a new [project] block between
     [background] and [earlier summary]. Suppressed under Norris
     (R-C1 / R-C4 parity).

Module changes: renderer.lua (fence-aware highlight filter), context.lua
(compose_project), repl.lua (3 new metas, 3 new helpers, expand_mentions
extension). No new module files in v1.

Doc covers: scope + done-when criteria, tech decisions table, module
changes table, per-pillar deep dive with example code, UX surface
summary, out-of-scope list, risks, and 6 open questions to resolve
in analyze (Q-H1/Q-H2 highlighter, Q-D1/Q-D2 diff, Q-T1/Q-T2 tree).

Scope confirmed via AskUserQuestion: all three subsurfaces in scope;
tree-sitter approach is external CLI w/ no-op fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:47:00 +00:00
marfrit 2e389c1475 docs/PHASE5: review fold-in — callback signature, Norris suppression, cost defaults
Independent review found 1 BLOCKER + 5 CONCERNs + 4 NITs. Resolutions:

B1 BLOCKER: summary callback signature was inconsistent across §3 and §6.
  Canonical now: summarize_fn(prior_summary, evicted_turns) -> string|nil
  dispatching on the two args:
    (nil, [turns])  — first-time summarize
    (str, [turns])  — additive (extend prior summary with new evictions)
    (str, nil)      — compress (re-summarize the prior summary itself)

C1: re-summarize trigger now uses the (str, nil) compress signal
  rather than degenerate (str, {}).
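A sketch of the canonical two-argument dispatch. The injected `chat` function and the prompt wording are hypothetical. Only the (prior, evicted) mode split follows B1/C1. Compress is signalled by a nil second argument, never by an empty table.

```lua
-- Mode dispatch for summarize_fn(prior_summary, evicted_turns).
local function summary_mode(prior, evicted)
  if prior == nil and evicted ~= nil then return "first" end
  if prior ~= nil and evicted ~= nil then return "additive" end
  if prior ~= nil and evicted == nil then return "compress" end
  return nil                               -- (nil, nil): nothing to do
end

-- `chat(prompt) -> string|nil` is a hypothetical broker wrapper.
local function make_summarize_fn(chat)
  return function(prior, evicted)
    local mode = summary_mode(prior, evicted)
    if mode == nil then return nil end
    local lines = {}
    for _, t in ipairs(evicted or {}) do
      lines[#lines + 1] = t.role .. ": " .. t.content
    end
    local prompt
    if mode == "first" then
      prompt = "Summarize:\n" .. table.concat(lines, "\n")
    elseif mode == "additive" then
      prompt = "Extend this summary:\n" .. prior .. "\nwith:\n" .. table.concat(lines, "\n")
    else
      prompt = "Compress this summary:\n" .. prior
    end
    return chat(prompt)
  end
end
```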

C2: routing decision is taken once on entry to ask_ai. The chosen
  active_cfg is used for every tool-sub-loop iteration. Original
  active_cfg restored after ask_ai returns.

C3: AUTO-routing does NOT fire inside the Norris loop. Model fixed
  at :norris launch time; planner stays on it for every iteration.
  Q39 resolved. Per-iteration fallback still gated by
  cfg.routing.fallback — retries the failing call against cloud
  without permanently switching the planner.

C4: Summary block suppressed in Norris (mirrors Phase 4 R-C1 for
  the [background] block). Both are "earlier context" the planner
  generally doesn't need.

C5: Fallback pattern coverage expanded — added HTTP 408 (Q41
  resolved) and "Operation timed out" (libcurl version variant).
  Dropped "HTTP response code said error" from A2; FAILONERROR
  was removed in the Phase 2 amend commit f26cbd9.

NITs folded:
  N1 :route check <text> always runs heuristic; suffix
     "(routing currently disabled)" when cfg.routing.auto = false
  N2 reasoning → nil by default (not → "cloud"); user explicitly
     opts in to map reasoning to a paid model. Same cost-safety
     rationale as confirm_cmd default true.
  N3 "Retry only when no deltas have arrived" promoted to §5
     normative rule (was in §11 risk row).
  N4 cfg.routing.cloud_fallback renamed cfg.routing.fallback to
     align with the :fallback meta verb.

Reviewer verdict: commit #1 (router.classify_model) is implement-
ready; B1/C1 resolution required before commit #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:15:39 +00:00
marfrit 555fdd7717 docs/PHASE5: analyze — surface clean, summary lives on ctx.summary not turns
A1. router.lua surface clean; classify_model is a natural sibling of
    classify. No structural refactor.

A2. broker error message shapes confirmed: all transport errors carry
    "transport: " prefix; "api: " for SSE-framed semantic errors;
    "broker: " for config bugs. Fallback matcher must strip the prefix
    before testing — list of eligible patterns tightened in §5.
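A sketch of the prefix-strip-then-match step. The pattern list is illustrative and the actual §5 list may differ. libcurl's wording for DNS and timeout failures varies by version, which is the reason for C5's extra timeout variant. Treating "api: " errors as eligible only when they hit a listed pattern is an assumption. "broker: " (config bugs) never falls back.

```lua
-- Illustrative Lua patterns, matched against the message after the
-- "<kind>: " prefix is removed.
local ELIGIBLE = {
  "^HTTP 5%d%d",          -- upstream 5xx ("HTTP <code>: <body>" shape)
  "^HTTP 408",            -- request timeout
  "model_not_found",
  "Connection refused",
  "resolve host",         -- DNS failure
  "Operation timed out",
  "Timeout was reached",
}

local function fallback_eligible(err)
  local kind, msg = err:match("^(%a+): (.*)$")
  -- Config bugs ("broker: ") would only be hidden by a retry elsewhere.
  if kind ~= "transport" and kind ~= "api" then return false end
  for _, pat in ipairs(ELIGIBLE) do
    if msg:find(pat) then return true end
  end
  return false
end
```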

A3. Q38 RESOLVED — summary doesn't go in ctx.turns (would create
    system/system back-to-back, same gotcha as PHASE0 §6 user/user).
    Instead lives on ctx.summary (string) and composes into the
    system message between [background] and NORRIS suffix. No new
    role:"system" turn; no alternation risk. §3 + §6 reflect.

Module-changes table updated to specify ctx.summary string field +
the to_messages composition order. Storage shape diagram in §6
rewritten.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:12:50 +00:00
marfrit 4453b93ab5 docs/PHASE5: formulate — multi-model routing + cloud fallback + summarize-on-evict
Phase 5 formulate manifest. Three pillars per PHASE0 §11 row 5:
heuristic-based per-request model routing, single-hop cloud fallback
on local transport failure, and fast-model summarization at sliding-
window eviction time.

Resolutions baked in via §2:
  - Routing trigger: per-request in repl.ask_ai, gated by
    cfg.routing.auto (default off)
  - Classification: pure-Lua heuristics (length, keywords, code-fence
    detection, exception markers) — no LLM probe in v1
  - Classes: code → deep, reasoning → cloud, default → keep active
  - Fallback trigger: string-match on err for HTTP 5xx /
    model_not_found / "Connection refused" / DNS / timeout
  - Fallback: one retry against cfg.routing.fallback_model (default
    "cloud" if configured); status line on every retry
  - Summarize: enforce_budget invokes summarize_fn callback wired
    by repl.lua to broker.chat with the fast model
  - Summary turn: single rolling _summary at turns[1], appended to
    on each eviction, re-summarized when it exceeds max_summary_chars

Open questions (Q37-Q42) in §10:
  Q37 routing for :ask explicit ask
  Q38 summary turn vs system-role alternation
  Q39 fallback under Norris (proposal: single-request only)
  Q40 summary re-summarize fidelity loss (lossy by design)
  Q41 HTTP 408 pattern eligibility (default yes)
  Q42 routing inside tool-call sub-loop (proposal: fix at entry)

5-commit roadmap in §11. No new module files; mostly repl.lua and
router.lua growth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:11:26 +00:00
marfrit ffead3986c docs/PHASE4: review fold-in — flock for race, Norris suppression, summarizer self-amp
Independent review found 1 BLOCKER + 3 CONCERNs + 4 NITs.

R-B1 (BLOCKER): TOCTOU race on memory.jsonl — two aish processes
  scanning the same file compute identical next_ids. Resolution:
  flock(LOCK_EX | LOCK_NB) on the fd in M.open_memory, held until
  close. Bundled into commit #1 (per reviewer: cannot defer because
  adding flock retroactively means reopening the handle). Requires
  ffi/libc.lua extension: flock cdef + LOCK_EX/LOCK_NB/LOCK_UN
  constants + M.flock wrapper.

R-C1 (CONCERN, closes Q33): [background] block suppressed when
  ctx.norris_active. Avoids ~16K of redundant tokens per 8-step
  Norris run. Norris already anchors via its goal in the NORRIS
  suffix; memory items rarely change step-to-step planning.

R-C2 (CONCERN): summarizer self-amplification — running :memory
  summarize twice in one session would feed the prior summarize
  call's assistant turn into the next input. Resolution: operate
  on the session log file (history.load(session_path)) instead
  of ctx:to_messages(), and tag prior summarize turns with
  meta="summarize" so they're filterable.

R-C3 (CONCERN, cosmetic): §5 diagram clarified that
  DEFAULT_SYSTEM_PROMPT already carries the Phase 2 MCP block
  statically — not a separate dynamic block in v1.

NITs N1-N4 folded inline:
  N1 forget no-op for unknown id surfaces a status
  N2 path note: memory.jsonl is sibling of sessions/, no collision
  N3 item-id invariants: id >= 1; meta header has no id; tombstones
     with non-matching targets are no-ops
  N4 :memory inject semantics explicit (replace ctx.memory_items
     from a fresh load + LRU-by-ts truncation)

§3 module-changes table grew a new ffi/libc.lua row.
§12 commit #1 description tightened — flock work bundled inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:50:43 +00:00
marfrit 2146b909f8 docs/PHASE4: analyze — surface confirmed, counter strategy locked
A1. history.lua surface lines up cleanly for the memory additions —
    no structural refactor; pure additive functions mirroring the
    session pattern.

A2. Counter persistence: scan at open, cache next_id in handle.
    O(n) load (n bounded by curation, ~hundreds), no sidecar file.
    Persisted ids let forget-tombstones target items even across
    restarts.
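A sketch of the load-time scan. It assumes records are already JSON-decoded. It also assumes tombstones look like `{ ts = ..., forget = <id> }`, which may not match the real on-disk shape. next_id comes from the highest id ever seen, including forgotten ones. A tombstone therefore can never end up pointing at a reused id.

```lua
-- Records are assumed pre-decoded; the tombstone field `forget` is hypothetical.
local function resolve_memory(records)
  local forgotten, items, next_id = {}, {}, 1
  for _, r in ipairs(records) do
    if r.forget then
      forgotten[r.forget] = true           -- set-based: order-independent
    elseif r.id then
      items[#items + 1] = r                -- meta header has no id
    end
    if r.id and r.id >= next_id then next_id = r.id + 1 end
  end
  local live = {}
  for _, it in ipairs(items) do
    if not forgotten[it.id] then live[#live + 1] = it end
  end
  return live, next_id                     -- cache next_id in the handle
end
```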

A3. System-prompt suffix order locked: DEFAULT (carrying Phase 2 MCP
    block baked in) → Phase 4 [background] → Phase 3 NORRIS. Token
    cost measured: default ~174 toks, +NORRIS ~364 toks, +NORRIS+2KB
    background ~865 toks. Well within typical context budgets.

No manifest amendments needed — §3/§5 already match. Findings recorded
inline as Phase 7 anchors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:47:01 +00:00
marfrit bea717534c docs/PHASE4: formulate — memory.jsonl + startup injection + :memory meta
Phase 4 formulate manifest. Three pillars per PHASE0 §11 row 4:
memory.jsonl persistent cross-session store, startup context injection
into the system prompt, and the :memory management surface +
opt-in :memory summarize for candidate extraction.

Resolutions baked in via §2:
  - Storage:  append-only JSONL at <history.dir>/memory.jsonl
  - Format:   {id, ts, kind, content, tags?, source?}
  - Kinds:    fact / pref / context (lightly typed v1)
  - Forget:   tombstone append, resolve at load (set-based)
  - Cadence:  manual :memory summarize only in v1; auto-trigger Q-listed
  - Inject:   dynamic [background] block on system prompt, capped at
              2000 chars by default; LRU-by-ts selection if over-budget
  - Order:    DEFAULT → MCP block → [background] → NORRIS suffix
              (Norris last so it dominates when active)
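A sketch of LRU-by-ts selection under the 2000-char cap. The +1 per item for a newline separator is an assumption. The loop stops at the first item that does not fit rather than skipping it, so an older short item cannot displace recency ordering.

```lua
-- Keep the most recent items (by ts) that fit in `cap` characters.
local function select_background(items, cap)
  local sorted = {}
  for i, it in ipairs(items) do sorted[i] = it end
  table.sort(sorted, function(a, b) return a.ts > b.ts end)
  local picked, used = {}, 0
  for _, it in ipairs(sorted) do
    local cost = #it.content + 1           -- assumed newline separator
    if used + cost > cap then break end
    picked[#picked + 1] = it
    used = used + cost
  end
  return picked
end
```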

New module surfaces:
  history.lua  M.open_memory / memory:add / memory:forget / M.load_memory
  context.lua  ctx.memory_items + [background] composer
  repl.lua     :remember, :memory add/list/forget/clear/inject/summarize
  config.lua   commented-out memory = {...} example

Open questions (Q31-Q36) tracked in §11:
  Q31 auto-summarize trigger (manual v1; auto-on-quit candidate)
  Q32 in-place edit vs forget+re-add
  Q33 Norris-mode interaction (proposal: both blocks stay)
  Q34 split prefs into a dedicated prompt section?
  Q35 redaction of sensitive content during summarize
  Q36 duplicate detection on :memory add

5-commit roadmap in §12 (history → context → repl → summarize → config).
No new module files. No substrate amendments to PHASE0 — entirely
additive on top of Phase 1's history.lua pattern and Phase 3's
dynamic-suffix pattern in context.lua.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 04:25:57 +00:00
marfrit 125f800513 docs/PHASE3: re-review NIT fold-in — pipe-to-sh EOL, ci= note, §12 sync
Re-review surfaced one new BLOCKER + two CONCERNs + four NITs. Folded:

N1 BLOCKER: `|%s*sh%f[%s]` missed `curl x | sh` (end-of-string canonical
   wrapper-bypass — Lua's `%f[%s]` requires transition INTO whitespace,
   which doesn't happen at EOL). Replaced with two patterns each for
   sh and bash: `|%s*sh%s` (followed by whitespace/args) and
   `|%s*sh%s*$` (end-of-string). Same for bash. Verified against 18
   wrapper-bypass test cases — all canonical idioms now HALT.
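The four replacement patterns, exercised in isolation. The other wrapper patterns from the safety table are omitted, and `pipes_to_shell` is a hypothetical helper name. "|" is not a magic character in Lua patterns, so it matches literally.

```lua
-- Pipe-to-shell patterns from N1 (other wrapper patterns omitted).
local PIPE_TO_SHELL = {
  "|%s*sh%s",      -- | sh -s ...  (followed by whitespace/args)
  "|%s*sh%s*$",    -- | sh         (end of string)
  "|%s*bash%s",
  "|%s*bash%s*$",
}

local function pipes_to_shell(cmd)
  for _, pat in ipairs(PIPE_TO_SHELL) do
    if cmd:find(pat) then return true end
  end
  return false
end

-- The old frontier form misses the EOL case: at end of string %f sees
-- "\0", which is not in %s, so there is no transition into whitespace.
assert(not ("curl x | sh"):find("|%s*sh%f[%s]"))
```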

N2 CONCERN: `ci=true` rule flag had no implementation note. Added one
   sentence to §5 explaining the matcher lowercases the input string
   when ci is set.
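A sketch of that note. It assumes a rule shape of `{ pat = <Lua pattern>, ci = <bool> }`. Because only the input is lowercased, a ci rule's pattern has to be written in lowercase.

```lua
-- Rule shape is assumed; ci lowercases the subject, not the pattern.
local function rule_matches(rule, cmd)
  local subject = rule.ci and cmd:lower() or cmd
  return subject:find(rule.pat) ~= nil
end
```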

N3 CONCERN: §12 commit #5 description was stale — still said
   "extends interactive CMD: extraction to consult is_destructive"
   which contradicts the R-B3 resolution (Norris-only). Rewrote
   commit #5 description to match R-B3, and bundled the
   ffi/readline.lua `_bound[seq]:free()` removal into commit #5's
   scope with explicit "Phase 1 amendment" callout. Same for the
   §12 risk note that still referenced the dropped behavior change.

Other NITs (N4 skip threshold, N5 approved-turn mention, N6 :model
swap interaction, N7 commit-attribution wording) are cosmetic and
will fold in-flight during implement if material.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:45:25 +00:00
marfrit 91ddcb005d docs/PHASE3: review fold-in — security-layer BLOCKERs resolved
Independent review surfaced 3 BLOCKERs + 6 CONCERNs + 7 NITs against
the analyze-tier draft. Resolutions applied:

BLOCKERs:
  B1 Shell-wrapper bypass — static patterns leaked on bash -c, sh -c,
     eval, pipe-to-shell, python -c, xargs|rm. Added 9 wrapper
     patterns to §5. Norris HALTs on any wrapper invocation; user
     reads the inner before proceed. The patterns are the
     conservative floor against the wrapper bypass class.
  B2 LLM second-opinion was self-policing — same model class
     generating actions then judging them. Switched probe model
     from `fast` to `deep` (qwen3-30b). Added re-roll inversion:
     if first probe says NO, ask "is this SAFE?". Disagreement
     between two probes → HALT. Cheap independent-class insurance.
  B3 `is_destructive` would have run on interactive CMD: extraction
     — a PHASE0 §6/§10 substrate amendment in disguise. Resolved
     Q24: heuristic runs ONLY when norris_active == true. No
     substrate change; interactive `confirm_cmd` semantics unchanged.
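A sketch of B2's re-roll inversion. `ask` is an injected probe, for example a max_tokens-capped call to the `deep` preset, that returns "YES" or "NO". The prompt text is hypothetical. Treating any unparseable answer as a HALT is an assumption in the same conservative spirit.

```lua
-- `ask(prompt) -> "YES"|"NO"|other` is hypothetical.
local function second_opinion(ask, cmd)
  local destructive = ask("Is this command destructive? " .. cmd)
  if destructive ~= "NO" then
    return "halt"              -- YES, or anything unparseable
  end
  -- Re-roll inversion: ask the opposite question.
  local safe = ask("Is this command SAFE? " .. cmd)
  if safe == "YES" then
    return "clear"             -- both probes agree
  end
  return "halt"                -- disagreement between probes
end
```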

CONCERNs:
  C1 Skip-budget: consecutive_user_skips counter; 3+ similar skips
     escalate to abort/force-proceed prompt.
  C2 Algorithm-vs-Q25-resolution contradiction: §4 reordered to
     dispatch ALL pending actions before checking GOAL: complete.
  C3 Norris-goal eviction: goal embedded directly in the dynamic
     system-prompt suffix; survives sliding-window eviction.
  C4 Readline use-after-free window: M.bind no longer frees old
     callbacks; pin for process lifetime (bounded memory cost).
  C5 GOAL: complete matcher: line-level scan, exact match after
     trim — substrate-aligned with CMD: rigor.
  C6 §4 step 4 tightened: auto_approve does NOT bypass destructive
     heuristic; tool_call without auto_approve still HALTs even
     when destructive-clear (Norris conservative).
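A sketch of the C5 matcher. It scans line by line, trims each line, and requires an exact match. A mention of the sentinel inside prose does not end the loop. The function name is hypothetical.

```lua
-- Line-level scan; exact match after trimming surrounding whitespace.
local function goal_complete(text)
  for line in (text .. "\n"):gmatch("(.-)\r?\n") do
    if line:match("^%s*(.-)%s*$") == "GOAL: complete" then
      return true
    end
  end
  return false
end
```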

NITs deferred or rolled into pattern table:
  - chown root-path pattern tightened (NIT 2 in-line)
  - Test corpus expansion noted in §12 commit #1 risk
  - Other NITs are wording-level

Status: Plan (review folded). Ready for commit #1 (safety static
patterns) once another review pass clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:42:58 +00:00
marfrit cf4d79dd9d docs/PHASE3: analyze + baseline — \C-n mechanics, LLM latency, module pre-state
Analyze findings folded into the manifest:

  A1. \C-n binding can't toggle mid-prompt without rl_insert_text /
      rl_redisplay. Solution: bind those (one cdef + 2 wrappers in
      ffi/readline.lua) so \C-n inserts ":norris " at the cursor; user
      types goal + Enter. Routes through existing meta dispatch.

  A2. broker has no max_tokens passthrough. Add opts.max_tokens for
      the LLM second-opinion path (terminates at ~2 tokens; verified
      proxy honors it).

  A3. Phase 2 tool-sub-loop pattern IS the planner shape. safety.norris_step
      is the per-iteration extraction; driver loop in repl.lua.

Module-changes table (§3) updated with the rl_insert_text and
max_tokens rows.

Baseline doc (PHASE3-baseline.md, 80 lines) captures:
  - LLM second-opinion latency: 425-1162ms per probe, all 5 test
    cases correct. Worst-case 16-step Norris = ~20s overhead; with
    static-pattern fast-path + session cache, ~5s realistic.
  - Module pre-state at commit f26cbd9 (Phase 2 tip): LOC + state
    per file before Phase 3 edits.
  - Six static-pattern Lua-match sanity checks (all correct).
  - Carries: aish#15 (still open), aish#14, aish#32/#33.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 22:37:58 +00:00
marfrit b58a842e49 docs/PHASE3: formulate — Norris autonomous mode + destructive-op gate
Phase 3 formulate manifest. Three pillars per PHASE0.md §11 row 3:
Chuck Norris autonomous mode (planning loop), destructive-op heuristic
(static patterns + LLM second-opinion), and HALT/confirm protocol.

Resolutions baked in via §2:
  Q2  iterative re-plan after each action (not top-down tree)
  Action sources    CMD: lines AND MCP tool_calls — Phase 2 contract honored
  HALT trigger      static-pattern hit OR LLM-second-opinion flag
  HALT shape        3-way: proceed / skip / abort
  Auto-approve under Norris  honors Phase 2 auto_approve policy
                             EXCEPT destructive-op heuristic always wins
  LLM second-opinion model   the `fast` preset (cheapest)
  Norris prompt suffix       appended to system prompt while active;
                             "GOAL: complete" sentinel for done

Key extensions:
  - safety.is_destructive: ~20 static shell-idiom patterns + LLM probe;
    runs on interactive CMD: extraction too (§9 — replaces bare
    confirm_cmd for known-destructive cases). Q24 worth challenging
    at analyze.
  - safety.norris_step: single-iteration of the planner. Driver loop
    in repl.lua. \C-n toggle (real binding, replaces Phase 1
    placeholder); :norris <goal> explicit launch.
  - renderer.norris_begin/step/halt/end: visual parity with exec
    and tool_call frames. Prompt becomes [aish:fast ]> per
    PHASE0.md §9.
  - context.to_messages dynamically appends NORRIS MODE suffix
    when norris_active.

New open questions (Q23–Q30) tracked in §11:
  Q23 LLM second-opinion latency budget (caching mitigation)
  Q24 interactive CMD: also subject to is_destructive? (proposal: yes)
  Q25 GOAL: complete + pending actions in same response — dispatch first
  Q26 context preservation on abort/done/budget — all preserve
  Q27 :norris continue (resume after abort) — deferred to v2
  Q28 side-effect MCP tools not in *__shell/*__write_file patterns
  Q29 goal-implies-authorization for destructive ops — no, always confirm
  Q30 :norris no-arg vs \C-n share goal-prompt path — yes, trivial

Module-layout (PHASE0 §4) untouched — all changes are growth of
existing files. 6 commits expected at implement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:45:03 +00:00
marfrit f26cbd9a3a phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics
Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.
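The build and split above, as a standalone sketch. The helper names are hypothetical. The non-greedy `(.-)` splits at the leftmost "__", so underscores in the tool name survive. An alias containing "__" would be split wrongly, which is why that constraint exists.

```lua
local function tool_name(alias, name)
  return alias .. "__" .. name
end

local function split_tool_name(full)
  -- (.-) is non-greedy: the split happens at the leftmost "__".
  return full:match("^(.-)__(.+)$")
end
```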

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just the first return slot stay correct.
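A sketch of the post-perform step with the FFI plumbing stubbed out. It assumes `status_code` came from curl_easy_getinfo, `raw_body` is the raw accumulator, and `flush_sse` is a hypothetical end-of-stream flush callback.

```lua
-- Post-perform check; libcurl calls omitted.
local function finish_post_sse(status_code, raw_body, flush_sse)
  if status_code >= 400 then
    -- Error bodies are often plain JSON, not SSE: report, don't parse.
    return nil, string.format("HTTP %d: %s", status_code, raw_body:sub(1, 400))
  end
  flush_sse()                  -- end-of-stream flush on the success path only
  return true
end
```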

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
marfrit 6c194deea0 mcp: JSON-RPC client + ffi/curl status_code; PHASE0 §4 amended
First commit of Phase 2 per docs/PHASE2.md §12. Three changes bundled:

mcp.lua (new, 153 lines):
  - M.connect(url, opts) returns a Session.
  - Session:initialize() round-trips initialize + notifications/initialized
    + tools/list. Caches tools for session lifetime (lmcp announces
    capabilities.tools.listChanged = false; no refetch).
  - Session:list_tools() returns the cached tool list.
  - Session:call_tool(name, args) returns (result_table, kind) where
    kind ∈ {"ok", "handler_error", "rpc_error", "transport_error"} per
    the §4 error split. Folded HTTP-level failure into transport_error.
  - Per-server Bearer auth via opts.auth_token or opts.auth_env env-var
    indirection.
  - Captures protocolVersion mismatch as a warning string rather than
    aborting (lmcp doesn't negotiate — N3 in review).

ffi/curl.lua extension:
  - Add curl_easy_getinfo to ffi.cdef.
  - Pre-cast as getinfo_long; helper get_response_code() fetches
    CURLINFO_RESPONSE_CODE (decimal 2097154 = CURLINFO_LONG | 2).
  - M.post now returns (body, status_code) on transport success;
    (nil, errmsg) on libcurl failure stays unchanged. Phase 1 callers
    reading only the first slot are unaffected.
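Where 2097154 comes from. curl.h tags each CURLINFO id with its value type in the high bits: CURLINFO_LONG is 0x200000, and CURLINFO_RESPONSE_CODE is CURLINFO_LONG + 2. The bits are disjoint, so "+" and "|" agree.

```lua
-- With LuaJIT FFI the C prototype is variadic:
--   CURLcode curl_easy_getinfo(CURL *curl, CURLINFO info, ...);
-- and the long out-param is a long[1] buffer.
local CURLINFO_LONG = 0x200000
local CURLINFO_RESPONSE_CODE = CURLINFO_LONG + 2
print(CURLINFO_RESPONSE_CODE)  -- 2097154
```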

docs/PHASE0.md §4:
  - Insert `mcp.lua` between broker.lua and router.lua per PHASE2.md §9.
  - Module-stability invariant clarified: rename prohibition is what
    matters; adding new files is additive.

Smoke-test passes for all four kinds against boltzmann lmcp v0.5.4:
  - initialize: ok (7 tools cached)
  - list_dir /tmp: ok (1.2KB content)
  - read_file /nonexistent: ok (boltzmann's baseline §3 quirk —
    isError:false even on failure; content is authoritative)
  - nope_tool: rpc_error (code=-32601)
  - wrong auth: transport_error (HTTP 401)
  - unreachable host: transport_error (DNS failure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:06:39 +00:00
marfrit f5daa6afc0 docs/PHASE2: re-review NITs — M.post shape, getinfo cdef, content flattening normative
Three follow-up NITs from the post-fold-in review:

  (1) Disambiguate M.post return shape: (body, status_code) on transport
      success regardless of status; (nil, errmsg) on libcurl failure
      stays unchanged. Phase 1 callers reading only the first slot are
      unaffected.

  (2) Note that the M.post extension requires extending ffi.cdef to
      include curl_easy_getinfo + CURLINFO_RESPONSE_CODE (decimal
      2097154, CURLINFO_LONG | 2) and a long[1] out-param shim.
      Implementation detail the commit #1 author will need.

  (3) Move the tool-result content-flattening rule from §12 risk note
      into §4 normative spec (forward-referenced both ways) — §4 is
      where a future reader looking for the tool-invocation contract
      will scan.

No design changes; clarifications only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:02:35 +00:00
marfrit d3570ccea4 docs/PHASE2: review fold-in — 5 BLOCKERs + 7 CONCERNs + key NITs
Independent review of the formulate+analyze+plan draft surfaced design
gaps that would have shipped as silent bugs. Resolutions applied:

BLOCKERs:
  B1 context.lua impact widened — Phase 1 :append asserts content and
     discards extra fields. Need (a) shape-per-role assert, (b) preserve
     tool_calls/tool_call_id on store, (c) emit from to_messages().
  B2 ffi/curl.M.post extended to return (body, status_code). lmcp's
     401 returns a non-JSON-RPC body that would have been mis-decoded.
  B3 §3 typo schema -> inputSchema.
  B4 pending_exec_output × tool-call sub-loop interaction specified.
  B5 §3/§12 broker dependency contradiction — broker takes opts.tools
     from caller; no layering inversion.

CONCERNs:
  C1 M.chat return polymorphism dropped (no consumer).
  C2 tool_calls[].index absent fallback: default to 0.
  C3 Re-injection stores accumulated text, not hard-coded empty.
  C4 :mcp connect failure: no auto-retry, status-log once.
  C5/C7 JSON-RPC error AND argument-parse failure both synthesize a
     role:"tool" turn — keeps strict-template alternation legal
     exactly the way PHASE0 §6 demanded for exec output.
  C6 §9 confirms §4 amendment is additive (preserves §3 invariant).

NITs:
  N3 protocolVersion fallback (lmcp doesn't negotiate).
  N4 Alternation assert in Context:append.
  N7 Model-routing bug filed as aish#23.
  N8 Day-one fallback test for use_tool_role=false in commit #3.

Manifest status: Plan (review folded). Status line and Resolutions
sections updated; commit-by-commit roadmap reflects revised specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 13:00:07 +00:00
marfrit 447e430254 docs/PHASE2 §12: implementation plan — 7-commit roadmap
Bottom-up: mcp.lua → safety.lua → context.lua → renderer.lua → broker.lua
→ repl.lua → config.lua. Same cadence as Phase 0/1.

Risks called out explicitly:
- Empty tools array → omit field entirely (some servers reject [])
- isError:false on actual failure (baseline §3 finding) → pass content
  through regardless; let model read error text
- JSON-RPC error from tools/call → aish status only, no tool turn
  appended, no model recovery
- max_tool_depth=8 cap on tool-call sub-loop
- Argument JSON streaming may yield malformed JSON → status warn + skip
- Q18 fallback (use_tool_role=true default; prefix-injection plumbed
  but dead-coded; verify can flip)
- Connect-at-startup is sequential (~30ms × N); fine for N≤3

Two items left open for review: Q18 default flip vs ship-true-flip-on-fail,
and whether :mcp connect should re-fetch tools after the initial cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:37:27 +00:00
marfrit c5116bf129 docs/PHASE2-baseline: pre-implementation measurements
Phase 7 (verify) anchor. Captures:

- MCP RPC round-trip timings against boltzmann lmcp v0.5.4 (all sub-100ms
  on LAN; LLM is the latency floor, not the transport).
- 6 fixture responses saved to /tmp/aish-baseline/ covering initialize,
  notifications/initialized, tools/list, tools/call success, isError,
  and JSON-RPC unknown-tool error.
- Baseline design finding: boltzmann's read_file returns isError:false
  even on failure (error text in content). aish should treat content as
  authoritative, isError as advisory; feed both to the model. PHASE2.md
  §4's "pass-through" stance already accommodates; no manifest amendment
  needed.
- Streaming tool_calls delta shape verified against hossenfelder; matches
  PHASE2.md §5.
- Pre-MCP aish behavior snapshot: loaded model emits markdown code-fence
  ignoring the CMD: contract — once MCP tools exist the model gets a
  structured path that doesn't depend on prose-formatting compliance.
- Module pre-state at Phase 1 head 5878f73: LOC + capability snapshot
  per module so Phase 2 diff has a reference frame.
- Two boltzmann-proxy blockers (SSE buffering, model-field routing)
  carried explicitly into Phase 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 12:34:32 +00:00
marfrit 5878f7347b docs/PHASE2: analyze — lmcp v0.5.4 probed, transport simplified
Live-probed against lmcp v0.5.4 (boltzmann) + hossenfelder broker proxy:

Transport simpler than spec:
- lmcp only implements POST-per-RPC with Connection: close; no held-open
  SSE channel. Combined with capabilities.tools.listChanged=false, no
  client-side listener is needed in v1. Drops the planned M.get_sse
  addition to ffi/curl.lua — Phase 1's M.post covers MCP.

Bearer auth is universal across the fleet — config schema grew
auth_token (literal) and auth_env (env-var indirection) fields per
server, mirroring PHASE0 §10's key_env convention.

Streaming tool_calls delta shape verified — accumulator by `index`,
function.arguments arrives as chunked JSON-string. Matches the
formulate-phase assumption in §5.
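A sketch of the accumulator. Deltas are keyed by `index`, and a missing index defaults to 0 as in C2 of the PHASE2 review fold-in. Argument fragments are concatenated and decoded only after the stream ends. The slot layout is an assumption, and so is concatenating `name` fragments (in practice the name usually arrives whole in the first chunk).

```lua
-- Accumulates streamed tool_calls deltas into complete calls.
local function accumulate_tool_calls(acc, deltas)
  for _, d in ipairs(deltas) do
    local i = d.index or 0
    local slot = acc[i]
    if not slot then
      slot = { name = "", arguments = "" }
      acc[i] = slot
    end
    if d.id then slot.id = d.id end
    local fn = d["function"]         -- "function" is a Lua keyword
    if fn then
      if fn.name then slot.name = slot.name .. fn.name end
      -- Fragments of one JSON string: decode only once complete.
      if fn.arguments then slot.arguments = slot.arguments .. fn.arguments end
    end
  end
  return acc
end
```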

Resolutions:
  Q17 transport abstraction — POST-only, no SSE channel for lmcp.
  Q21 error mapping       — result.isError (model-recoverable, feed
                             back as tool turn) vs JSON-RPC error
                             (unknown method/tool, transport-level).
  Q18 role:"tool" turn    — accepted at protocol level (live-probed).
                             Mistral-nemo template verification
                             blocked by the hossenfelder model-field
                             routing bug; full closure carried to
                             Phase 7 verify.

Open-end recorded in §11: the hossenfelder proxy routes every request
to the loaded fast model regardless of model field, blocking Phase 2
testing against mistral-nemo specifically. Parallel to the SSE
buffering issue at marfrit/aish#15; same root (boltzmann proxy code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:51:03 +00:00
marfrit ec6793c93c docs/PHASE2: formulate — MCP client + tool-calling bridge
Phase 2 formulate manifest. Three pillars per PHASE0.md §11 row 2:
mcp.lua (JSON-RPC 2.0 over HTTP+SSE, target: lmcp), tool-calling bridge
(OpenAI tools field <-> MCP tools/call), and the safety.lua
authorization gate (per-call confirm + auto_approve policy).

Resolves PHASE0.md §13 Q6–Q10:
  Q6  CMD: + tool-calls coexist; substrate §3 unchanged
  Q7  config-declared servers + runtime :mcp connect
  Q8  per-call confirm default, auto_approve policy in config
  Q9  hybrid system prompt: static frame + dynamic tools body field
  Q10 streaming-from-day-one on Phase 1 SSE; on_delta widens to (kind, payload)

New questions tracked in §11 (Q17–Q22): transport abstraction, role:tool
vs prefix injection (mistral-nemo template verification needed), large
tool-result handling, parallel dispatch, error mapping, aish-as-MCP-server
(parked).

§4 module layout amended: mcp.lua slots between broker.lua and router.lua.
The amendment is documented in this manifest; the actual §4 table edit
lands when implementation starts (Phase 2 implement phase).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 09:23:53 +00:00
marfrit 7d62eb5659 review followups: pcall shield, :resume guard, shell quoting, nits
CONCERNs from the Phase 1 review pass:

ffi/curl.lua:
  - SSE write_cb body is now pcall-wrapped. A Lua error in on_event (or
    in the parse loop itself) is captured into cb_error and surfaced
    after curl_easy_perform rather than propagating across the FFI
    callback boundary (which LuaJIT documents as process-fatal). The
    EOS flush path gets the same shield. Errors return
    (nil, "callback: <msg>") from post_sse.
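
The shield pattern can be sketched in pure Lua. This is a minimal sketch, not the ffi/curl.lua code. It assumes a hypothetical `make_write_cb(on_event)` factory, and it abstracts the real FFI write callback (ptr, size, nmemb, userdata) into a plain string chunk. The callback never raises. It stores the first error and returns a short write count, which libcurl treats as a write error and uses to abort the transfer.

```lua
-- Hypothetical sketch: no Lua error may escape the (FFI) callback boundary.
-- The first error is stashed; returning fewer bytes than received makes
-- libcurl abort the transfer with CURLE_WRITE_ERROR.
local function make_write_cb(on_event)
  local state = { cb_error = nil }

  local function write_cb(chunk)
    if state.cb_error then
      return 0 -- already failed: keep telling libcurl to stop
    end
    local ok, err = pcall(on_event, chunk)
    if not ok then
      state.cb_error = tostring(err)
      return 0 -- short write
    end
    return #chunk -- full write: continue
  end

  return write_cb, state
end

-- After curl_easy_perform returns, surface the captured error in the
-- (nil, "callback: <msg>") shape described above.
local function finish(state)
  if state.cb_error then
    return nil, "callback: " .. state.cb_error
  end
  return true
end
```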

history.lua:
  - sh_singlequote() wraps arguments in single quotes and escapes any
    embedded ', so shell metacharacters stay inert. The mkdir -p and
    ls -1 shell-outs previously double-quoted, and inside double quotes
    $(...) and $VAR still expand; single-quoting is the safe form.
  - M.load now returns (turns, meta) instead of (meta, turns). turns
    is ALWAYS a table on success, never nil-when-no-header; failure
    path is the unambiguous (nil, err). Callers can `if not turns
    then` without the previous ambiguity. repl.lua :resume updated
    to the new shape.
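
Single-quoting with embedded-' escaping usually follows the POSIX idiom of closing the quote, emitting an escaped quote, and reopening. The sketch below assumes aish's sh_singlequote() uses that idiom; it is not copied from history.lua.

```lua
-- Sketch of POSIX single-quoting: wrap in '...' and turn each embedded '
-- into '\'' (close quote, backslash-escaped quote, reopen). Inside single
-- quotes the shell performs no expansion, so $VAR and $(...) stay literal.
local function sh_singlequote(s)
  return "'" .. (s:gsub("'", "'\\''")) .. "'"
end

-- Usage (hypothetical dir value):
--   os.execute("mkdir -p " .. sh_singlequote(dir))
```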

repl.lua :resume:
  - Refuse to resume into a non-empty ctx — silent overwrite was the
    Q15 default, but the review surfaced the no-undo / no-warning
    failure mode. User must :reset (or :save then re-launch) to
    express intent. The current session's on-disk log is unaffected
    either way.

NITs:
  - ffi/libc.lua READ_BUF: comment noting it's module-shared and
    Phase 1 has no reentrant readers; revisit when that changes.
  - PHASE1.md §7: \C-x\C-c reservation pinned to Phase 3 ("deferred
    from Phase 1 — no consumer here") rather than the previous
    dangling "(or here)".

Regression suite verifies:
  - history.load new signature on success + failure paths
  - shell-quoted history.dir with $ doesn't trip
  - aish scripted run: ctx with 2 turns refuses :resume anchor with
    a clear status; user must :reset first

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:05:23 +00:00
marfrit 1f1065157e review BLOCKER: PTY input forwarding + raw mode toggle
Phase 1 review caught a structural gap: executor.exec only drained the
PTY master fd, never forwarded user keystrokes — vim/less/htop/nano
would render and hang on input. PHASE1.md §5 specified bidirectional
multiplex but only the read leg landed. tcgetattr/tcsetattr were also
missing, so even with input forwarding the parent's line discipline
would buffer until newline (breaking single-key UIs).

ffi/libc:
  - struct termios opaque buffer + tcgetattr/tcsetattr + cfmakeraw
  - M.set_raw(fd) saves termios + applies cfmakeraw; returns saved or
    (nil, err) when fd isn't a tty (scripted / piped-stdin runs)
  - M.restore_termios(fd, saved)
  - struct pollfd + M.poll (POLLIN constant)

executor:
  - multiplex(sess): poll(stdin, master); reads master on any revents
    (POLLHUP fires when child closes its slave end, not POLLIN — the
    revents != 0 check catches both); forwards stdin keystrokes to
    master; loop exits when master read returns 0 (EOF / child gone)
  - stdin polling is only enabled when stdin_is_tty (set_raw succeeded);
    piped-stdin runs (tests / scripted) would otherwise drain queued
    aish commands into the child of the *current* cmd, swallowing them
  - raw mode is restored before returning so the user lands back at the
    aish prompt in canonical mode

renderer + repl:
  - exec_output(out, code) split into exec_begin() (top rule, before
    spawn) + exec_end(code) (closing rule with exit, after wait). PTY
    multiplex streams the body live to stdout in between; the renderer
    never re-prints the body.

PHASE1.md §3:
  - tcgetattr/tcsetattr changed from "optional" to "required for
    single-key UIs to work — done-criteria #2"; poll added to the libc
    row description.

Verified:
  - non-interactive smoke (echo / false / exit 7 / ls /nonexistent /
    printf multi-line) — all exit codes correct, output streamed live,
    a\nb\nc\n preserved byte-for-byte
  - scripted-stdin run reaches all expected lines (no stdin draining
    into a non-interactive child)
  - aish prompt + framed exec block + exit-code line all render in
    correct order

Live interactive verification (vim / less / htop in a real terminal)
still needs a user-test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 20:00:53 +00:00
marfrit ee4d7f86d6 executor: swap popen+sentinel for pty.spawn (Phase 1)
Replaces the Phase 0 io.popen + sentinel-echo exit-code recovery with
forkpty + waitpid via ffi/pty. The §7 amendment paragraph on PHASE0.md
is rewritten to point at PHASE1.md §5 — the workaround is gone, not
just renamed.

User-visible behavioral changes:
  - Interactive commands (vim, less, htop, top) now work via $cmd /
    :exec / known-command shell paths because the child has a real
    PTY for line discipline.
  - Exit codes are accurate: `false` -> 1, `exit 7` -> 7, signal kill
    -> 128+N (bash convention), shell parse error -> sh's 2.
  - Broken-shell-syntax cmd now shows the actual sh diagnostic
    (e.g. "Syntax error: end of file unexpected") instead of Phase 0's
    "(no output — possible shell parse error)" guess.
  - Output normalization: PTY emits CR LF; executor collapses \r\n
    -> \n to keep the Phase 0 contract ("output uses \n separators").

Code path:
  pty.spawn(cmd) -> drain master_fd until EOF
                 -> wait() returns ("exit", N) | ("signal", N) | ...
                 -> exit_code mapped: exit -> N, signal -> 128+N, else -1
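
The mapping and the CR LF normalization can be sketched as two small helpers. The names `exit_code` and `normalize_newlines` are hypothetical. The sketch assumes wait() yields a kind string plus a number, as the code path above shows.

```lua
-- Map a wait() result to a shell-style exit code.
local function exit_code(kind, n)
  if kind == "exit" then
    return n
  elseif kind == "signal" then
    return 128 + n -- bash convention, e.g. SIGKILL (9) -> 137
  end
  return -1 -- stopped / unknown
end

-- With default termios (OPOST|ONLCR) the PTY turns "\n" into "\r\n";
-- collapse it back to keep the Phase 0 "\n separators" contract.
local function normalize_newlines(out)
  return (out:gsub("\r\n", "\n"))
end
```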

Phase 0 invariants intact: `cd` interception unchanged (still libc.chdir
per §3 + §7), `CMD: ` extraction unchanged.

PHASE0.md §7: the "LuaJIT 2.1 popen-close caveat" paragraph is rewritten
to "Superseded by Phase 1" — points at PHASE1.md §5 for the live model.
The illustrative sketch is left in place as historical context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 19:08:27 +00:00
marfrit 539408f480 phase1 formulate: scope, tech decisions, module changes, open questions
Inner-loop Phase 1 (formulate) deliverable for the milestone Phase 1 of
the aish project. Drafts docs/PHASE1.md to specify what lands on top of
the Phase 0 substrate — no code changes, no §3 invariant amendments.

Phase 1 milestone scope per PHASE0.md §11:
  1. SSE streaming via libcurl FFI (existing WRITEFUNCTION hook)
  2. PTY-backed exec via forkpty(3); replaces popen + retires the §7
     sentinel exit-code workaround in favor of waitpid
  3. Session persistence as append-only JSONL under
     <config.history.dir>/sessions/<utc>.jsonl
  4. Readline custom bindings (rl_bind_keyseq); Phase 1 reserves \C-n
     as a no-op for Phase 3's Norris consumer

Module growth (no new file names beyond the §4-stubs):
  ffi/curl     -> M.post_sse(url, body, headers, on_event)
  ffi/pty      -> M.spawn / read / write / close / wait
  ffi/libc     -> waitpid + WEXITSTATUS + tcgetattr/tcsetattr
  ffi/readline -> M.bind(seq, fn)
  broker       -> M.chat_stream; M.chat becomes a buffering wrapper
  executor     -> PTY path; sentinel hack deleted
  repl         -> :save, :resume <name>, :sessions; streaming render
  renderer     -> assistant_delta + assistant_flush
  history      -> open / append / load / list_sessions

Open questions Q11–Q16 (six new) tracked in §10:
  - SSE shape uniformity across OpenRouter routes (Q11, Phase 7)
  - CMD: highlight-on-stream strategy (Q12, plan phase)
  - tty raw-mode recovery on Lua error (Q13, plan phase)
  - bind \C-n now or defer to Phase 3 (Q14, plan phase)
  - :resume into non-empty context (Q15, plan phase)
  - session-log fsync policy (Q16, default close-only; tracked)

Next inner phase is "analyze": for each module change, identify
dependencies + risks + per-commit ordering. Then baseline (capture
Phase 0 behaviors we want to preserve), plan, review, implement, verify,
memory-update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:56:20 +00:00
marfrit 16490e6905 fix: buffer exec output for next user turn; alternation for strict templates
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:

    api: ... Error: Jinja Exception: After the optional system message,
    conversation roles must alternate user/assistant/user/assistant/...

Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.

Fix:

context.lua:
  - new field `pending_exec_output` (initially nil)
  - new method `:append_exec_output(out)` buffers (concat on subsequent
    captures so multi-shell-then-ai still merges everything)
  - new method `:append_user(content)` flushes buffered exec output as
    a `[exec output]\n...\n\n` prefix and appends a user turn
  - `:reset()` also clears the buffer

repl.lua:
  - run_shell calls ctx:append_exec_output(out) instead of
    ctx:append({role="user", content="[exec output]\n"..out})
  - ask_ai calls ctx:append_user(text) instead of raw :append; saves
    prev_pending so a broker error can restore the buffer for retry
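
A minimal sketch of the buffer-and-prepend policy follows. The method and field names match the list above. The `turns` table layout and plain concatenation of repeated captures (no separator) are assumptions.

```lua
local Context = {}
Context.__index = Context

function Context.new()
  return setmetatable({ turns = {}, pending_exec_output = nil }, Context)
end

function Context:append(turn)
  self.turns[#self.turns + 1] = turn
end

-- Buffer captured shell output instead of emitting a user turn; several
-- captures before the next question are concatenated.
function Context:append_exec_output(out)
  self.pending_exec_output = (self.pending_exec_output or "") .. out
end

-- Flush buffered output as a prefix so exactly one user turn is added and
-- user/assistant alternation holds for strict chat templates.
function Context:append_user(content)
  local pending = self.pending_exec_output
  if pending then
    content = "[exec output]\n" .. pending .. "\n\n" .. content
    self.pending_exec_output = nil
  end
  self:append({ role = "user", content = content })
end

function Context:reset()
  self.turns = {}
  self.pending_exec_output = nil
end
```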

PHASE0.md §6:
  - amended the role-injection paragraph to describe the buffer-and-
    prepend policy; the §3 invariants list is untouched (this was a §6
    design detail, not a locked invariant)

Verification:
  - context unit tests cover: alternation after the failing sequence,
    multi-shell merge, reset clears buffer, broker-error retry path
  - live reproduction against `deep` (mistral-nemo) of the exact
    user-reported sequence succeeds; model responds with a sensible
    `CMD: ls | grep '^lor'` instead of a Jinja exception

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:41:21 +00:00
marfrit a76ff664b3 phase0 amendment: §3/§7/§10 close review-surfaced manifest gaps
Three additions to PHASE0.md, all surfaced by the Phase 5 review of
the Phase 0 implementation. No invariant changes; manifest now matches
implementation reality.

§3 — FFI loader fallback paragraph. ffi.load("name") needs the
unversioned `libname.so` symlink that comes with the -dev package.
Phase 0 loaders try unversioned first then versioned sonames so
runtime-only hosts (no -dev) work as-is. Documents the actual
behavior in ffi/readline.lua and ffi/curl.lua.
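
The "unversioned first, then versioned" fallback reduces to trying candidates under pcall. In this sketch, `load_first` is hypothetical and `load_fn` stands in for LuaJIT's `ffi.load`, so the logic runs without FFI. The soname list in the comment is illustrative, not the one in ffi/readline.lua.

```lua
-- Try each candidate name in order; return the first library that loads
-- plus the name used, or nil and a joined error summary.
local function load_first(load_fn, names)
  local errs = {}
  for _, name in ipairs(names) do
    local ok, lib = pcall(load_fn, name)
    if ok then
      return lib, name
    end
    errs[#errs + 1] = name .. ": " .. tostring(lib)
  end
  return nil, table.concat(errs, "; ")
end

-- Hypothetical use under LuaJIT:
--   local ffi = require("ffi")
--   local rl = assert(load_first(ffi.load,
--     { "readline", "libreadline.so.8" }))
```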

§7 — LuaJIT 2.1 popen-close caveat paragraph. The §7 sketch had been
showing Lua 5.2's three-return io.popen():close() shape; LuaJIT 2.1
follows the Lua 5.1 ABI and returns just `true`. Phase 0 recovers
the exit status with a sentinel echo (`echo __AISH_EXIT_<tag>__$?`).
Phase 1 PTY+waitpid replaces the hack and the sketch becomes
accurate. Sketch left as-is (it's the right shape conceptually);
caveat now explicit.
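
For historical context, the sentinel workaround can be sketched as two helpers. Both helper names are hypothetical, the tag is assumed to be free of Lua pattern magic characters, and this is not the Phase 0 executor code. One side appends an echo of $? on its own line; the other splits the sentinel back off the captured output. If the sentinel is missing, as with a shell parse error, no exit code is recovered.

```lua
local function wrap_with_sentinel(cmd, tag)
  -- newline (not ';') so a trailing comment in cmd can't swallow the echo
  return cmd .. "\necho __AISH_EXIT_" .. tag .. "__$?"
end

-- Returns (body, code); code is nil when the sentinel line is absent.
local function split_sentinel(raw, tag)
  local s, _, code = raw:find("__AISH_EXIT_" .. tag .. "__(%d+)\n?$")
  if not s then
    return raw, nil
  end
  return raw:sub(1, s - 1), tonumber(code)
end
```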

§10 — cwd-relative package.path note. Phase 0 prepends `./?.lua;
./vendor/?.lua`, so aish must run from the repo root. Cwd-independent
resolution is a later concern. Also clarifies that --config is strict
(no fallback if the path cannot be opened), matching main.lua as of the
review-followup commit.
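
The prepend amounts to one line at the top of main.lua. The line below is a sketch of that shape, not the verbatim source. Because both entries are relative to the current working directory, `require "dkjson"` finds vendor/dkjson.lua only when aish is launched from the repo root.

```lua
-- Repo-relative module lookup goes first; system paths remain as fallback.
package.path = "./?.lua;./vendor/?.lua;" .. package.path
```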

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:44:20 +00:00
marfrit 2704edd57d phase0 amendment: vendor dkjson 2.8 under vendor/
Captures the JSON-library decision noted as open in CLAUDE.md §6.
dkjson is pure Lua (preserves §3's "no compiled extensions" invariant),
single file, redistributable (MIT/X11). Sourced from Debian's `lua-dkjson`
package (/usr/share/lua/5.1/dkjson.lua, version 2.8) — Debian's curated
copy of the upstream at dkolf.de.

Vendoring (rather than relying on a system lua-dkjson install) keeps
aish self-contained per the §3 "no luarocks packages" invariant: any
host with luajit can run the tree as-is.

PHASE0.md §3 grows one row recording the choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 11:30:16 +00:00
claude-noether e1d1931006 phase0 review: tighten phase 2 row + add Q9, Q10, sharpen Q6
Captures three findings from the review of 013c625 ("phase0 amendment:
insert MCP phase 2"). Opening as a PR rather than direct-to-main: the
non-PR-flow convention works fine for autonomous work, but feedback-
required iteration needs a readable medium that isn't the Claude Code
transcript.

§11 phase 2 row: spell out two scope items the original row left implicit —
the system-prompt rewrite to declare the tools schema (Phase 0's `CMD:`
contract is hard-coded into the prompt) and `safety.lua` extension to
gate tool calls (per Q8).

§13 Q6: explicit note that choosing "retire `CMD:`" requires a §3
invariant amendment in the same commit — keeps the substrate-vs-phase
boundary honest. Adds (§3 if retiring) to the impact column.

§13 Q9 (new): MCP system-prompt augmentation locus — static block in
broker.lua / per-request assembly from connected servers / hybrid.
Real architectural call with token-cost tradeoff per option.

§13 Q10 (new): tool-call streaming vs the Phase 1 SSE substrate —
phase-ordering question. Either Phase 2 lands on the blocking Phase 0
broker and refits when SSE arrives, or Phase 1 SSE moves before MCP
so tool-call deltas stream from day one.
2026-05-10 06:06:14 +00:00
marfrit ca8ff107c7 docs: fix Phase-N references stale after MCP renumber
Sweep four call-sites pointing at the wrong phase number:

- README.md:19 — Norris mode "Phase 2" → Phase 3 (renumbered by 013c625)
- README.md:62 — safety.lua "Phase 2+" → Phase 3+ (same renumber)
- PHASE0.md:58 — safety.lua "(Phase 1)" → (Phase 3) (was wrong pre-013c625
  too — referenced Phase 1 when Norris was actually Phase 2)
- PHASE0.md:214 — Norris-mode prompt example "(Phase 1)" → (Phase 3)
  (same pre-existing wrong reference)

Caught by review of 013c625. No semantic change; mechanical phase-number
sweep only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 05:43:58 +00:00
marfrit 013c6257f2 phase0 amendment: insert MCP phase 2, renumber subsequent phases
MCP/tool-calling lands as a distinct phase, before Norris mode so the
autonomous planner has tools as substrate. lmcp speaks MCP standard
JSON-RPC 2.0 over HTTP/SSE — fits the existing libcurl FFI plan; tool
calls ride the OpenAI-compatible `tools` field on /v1/chat/completions,
so the §6 broker contract is unchanged at the transport level.

§8: tokenization concern bumped Phase 2 → Phase 3 (still tracks Norris).
§11: Norris→3, memory→4, routing→5, tree-sitter→6.
§13: Q1/Q2/Q3/Q5 phase numbers tracked the renumber; added Q6 (CMD: vs
tools coexistence), Q7 (server discovery), Q8 (tool-call auth gate).

No §3 invariant broken. No code touched — Phase 0 implementation per
the locked manifest is still the next move.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 05:37:58 +00:00
claude-noether 4310207738 Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
  with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
  snappy/32k + cloud via OpenRouter through hossenfelder)

File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:16:07 +00:00