850251702172b8089d489553d3bcdf29f4abe0d5
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8502517021 |
context: tokenize_fn + per-turn _tokens cache (Phase 8 commit #2)
Foundation for accurate Context:estimate_tokens. When the optional tokenize_fn is wired (Phase 8 commit #4 wires it from repl.lua), estimate_tokens uses it with per-turn caching for O(1) amortized cost. char/4 path unchanged when tokenize_fn nil. Changes: - Context.new accepts opts.tokenize_fn -> stored as self.tokenize_fn. - Context:estimate_tokens: if tokenize_fn nil -> existing char/4 (no behavior change). if tokenize_fn set -> - tokenize self.system_prompt every call (dynamic per compose_background/project/summary; can't cache). - for each turn: if t._tokens nil -> compute + cache; else use cached. Turn content immutable after append (we never mutate stored turns) so cache never goes stale. - :reset wipes self.turns which takes the _tokens cache with them; new turns start with t._tokens == nil and lazy-set on first count. 8/8 unit cases verified: - char/4 path unchanged when no tokenize_fn - tokenize_fn called 1+ N times on first estimate (sys + N turns) - subsequent estimates fire only 1 tokenize call (sys; turns cached) - new turn fires +1 tokenize call on next estimate - :reset + fresh turn fires fresh tokenize call (cache died with turn) No callers wire tokenize_fn yet — Phase 8 commit #4 lands the repl.lua wiring (after commit #3 adds the enforce_budget extension that's the real beneficiary of accuracy). Regression: test_safety 87/87, test_router_model 31/31. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7b4a9becc2 |
context: cost/usage accumulator (Phase 7 commit #2)
Adds the per-conversation accumulator that broker.lua's
on_delta("usage", ...) payload feeds into. No callers yet —
commit #3 wires the broker callback to ctx:add_usage in repl.lua,
commit #4 in safety.lua.
Changes:
- Context.new: new fields `usage_totals = {}` and
`cost_warn_state = { dollars = false, tokens = false }`. R4:
two independent flags so warn_at_dollars firing doesn't
suppress warn_at_tokens (or vice versa).
- Context:add_usage(model_name, category, usage):
Increments usage_totals[model_name][category] slot. R6: when
usage.cost is nil (local llama.cpp per B3), sets a sticky
`is_local = true` flag on the slot AND does NOT add to cost
(preserves the local-vs-cloud-zero distinction for :cost detail
annotation). When usage.cost is a number (cloud), accumulates.
- Context:total_cost() / total_tokens() — pure-Lua summation
across all slots; total_tokens returns (prompt, completion).
- Context:reset_usage() — explicit :cost reset path; zeros
usage_totals AND clears both flags atomically.
- Context:reset() — R8 parity: does NOT clear usage_totals OR
cost_warn_state. Matches the Phase 4 memory_items / Phase 6
project rule ("ambient context survives a user-driven
conversation reset").
Smoke verified (20/20 unit cases):
- Empty zeros; cloud cost accumulation; local nil-cost preserves
is_local=true sticky; calls counter; cost summation across
multiple cloud calls; is_local sticky after a later nil-cost
call on a cloud slot; separate slots per (model, category);
:reset preserves; :reset_usage zeros both totals and flags.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c4fc7fde01 |
context: [project] block plumbing (Phase 6 commit #1)
Foundation for Phase 6 — adds the field + composer + composition order with no callers yet. Nothing sets ctx.project; the meta hookup and startup auto-inject land in commit #2. Changes: - Context.new gains `project` (string, nil) and `_project_opts` (cached scan opts for `:tree refresh`; R7). - compose_project(text) helper mirrors compose_background / compose_summary. Returns "" for nil/empty; otherwise emits "\n\n[project]\n" + text. - to_messages inserts compose_project BETWEEN compose_background and compose_summary so the model reads memory facts -> project tree -> earlier conversation -> NORRIS suffix. - Same Norris-suppression guard as the other two dynamic blocks (R-C1 / R-C4 parity; planner stays on goal anchor). - Context:reset preserves ctx.project (R8 — matches the Phase 4 memory_items rule; startup-injected facts survive a user-driven context reset). Smoke verified (14/14 inline cases): - project nil -> no [project] block in sys_content - project set -> block present with contents - ordering: [background] < [project] < [earlier conversation summary] - norris_active suppresses all three; NORRIS suffix still appears - :reset clears turns/pending_exec_output/summary; preserves memory_items AND project Regression: test_safety 87/87, test_router_model 31/31, repl loads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
03497b5eea |
context: summarize-on-evict callback + summary block (Phase 5 commit #2)
Phase 5 commit #2 per docs/PHASE5.md §3 / §6. Context.new opts additions: - summarize_fn(prior_summary, evicted_turns) -> string|nil callback per R-B1 canonical signature: (nil, [turns]) → first-time summarize (str, [turns]) → additive: extend prior summary (str, nil) → compress: re-summarize the prior nil return → silent eviction (Phase 0 behavior preserved) - max_summary_chars (default 2000) — when ctx.summary grows past this, the callback is invoked AGAIN with the compress signal so the summary stays bounded across long sessions Context.summary (string|nil) is the rolling summary state. Composed into the SYSTEM MESSAGE (not as a turns[] entry — A3 resolution avoids system/system back-to-back). compose_summary() emits: [earlier conversation summary] <ctx.summary> between [background] and the NORRIS suffix. Both [background] and [earlier summary] are SUPPRESSED when ctx.norris_active (R-C4 — mirrors R-C1 from Phase 4; planner stays focused on its goal). enforce_budget() rewrite: - Collects the evicted pair before removing. - Calls summarize_fn(self.summary, pair) under pcall — wraps any callback error so a broken summarizer can't crash the REPL. - Updates self.summary if callback returned non-empty string. - If new summary exceeds max_summary_chars, invokes compress pass (callback with evicted=nil). - Removes pair from turns (same final state as Phase 0). Context:reset() clears the summary alongside turns + pending_exec_output. Smoke-tested with a mock summarizer over a 10-turn context with max_turns=4 and max_summary_chars=80: - 6 turns evicted to bring count down to 4 - Callback fired 4 times (3 additive + 1 compress when summary crossed 80 chars) - to_messages includes [earlier conversation summary] block - Under norris_active=true, summary suppressed (block absent) - :reset clears ctx.summary Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c1a5c736ec |
context: [background] memory injection block (Phase 4 commit #2)
Phase 4 commit #2 per docs/PHASE4.md §5/§12. ctx.memory_items (array of {kind, content, ...}) loaded by repl.lua at startup from history.load_memory(). When non-empty AND ctx not in Norris mode, to_messages() appends a [background] block to the system prompt: [background] (memory.jsonl; manage via :memory) - (fact) User prefers terse responses - (context) Project: aish (LuaJIT REPL) Suppression under Norris (R-C1): when ctx.norris_active is true the [background] block is omitted. Norris already anchors via its NORRIS suffix carrying the goal; a 2KB background block per planning iteration would add ~16K tokens of redundant input over an 8-step run. Suffix composition order is now: 1. DEFAULT_SYSTEM_PROMPT (Phase 0 + Phase 2 MCP, statically embedded) 2. [background] block — when memory_items non-empty AND NOT norris_active 3. NORRIS MODE block — when norris_active repl.lua wiring (memory_items population at startup, :memory meta cmds, :remember shortcut, :memory inject for live refresh) lands in commit #3. Verified composition order with 4 cases: default-only → 697 chars, no background, no norris memory_items only → 824 chars, background YES, no norris memory + norris → 1451 chars, background NO, norris YES (suppressed) norris only → 1451 chars, background NO, norris YES Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
a404b2a152 |
repl: Norris driver + \C-n + :norris/:safety meta (Phase 3 commit #5)
Phase 3 commit #5 per docs/PHASE3.md §12. Wires safety.norris_step (commit #4) into the REPL with the user-facing surface. ffi/readline.lua extensions (A1 + R-C4): - rl_insert_text + rl_redisplay added to ffi.cdef block; M.insert_text and M.redisplay wrappers exposed. - M.bind: removed `:free()` on previous callback. Now keeps every bound callback pinned for process lifetime in `_pinned` list (alongside `_bound[seq]` for current lookup). Avoids the use-after-free window between unbind and rebind that R-C4 flagged. Memory cost is bounded — one closure per key sequence binding. context.lua Norris suffix (R-C3 / §8): - to_messages() composes a dynamic NORRIS MODE block onto the system prompt when ctx.norris_active is set. The block carries ctx.norris_goal so eviction of the user's "[norris] goal:" turn doesn't lose the anchor. Returns to plain system prompt when Norris exits. repl.lua Norris driver: - prompt() now shows ⚡ marker when ctx.norris_active per PHASE0.md §9. - \C-n bound to a real handler — inserts ":norris " at the cursor (replaces Phase 1 status placeholder). - run_norris(goal) function: sets norris_active + norris_goal, appends a "[norris] <goal>" user turn, renders the banner, then loops calling safety.norris_step with an injected helpers table until a terminal status returns. Renders the closing banner. - norris_halt(): the [N] proceed/skip/abort prompt called by safety.norris_step via helpers.halt. Empty input → abort (safe). - dispatch_tool(): factored from the Phase 2 ask_ai code so safety.norris_step can call it. - norris_exec(): factored exec path for autonomous mode (skips the interactive run_shell cd-status renderer). - :norris <goal> meta — launches autonomous mode - :norris off meta — drops Norris flag (rare; usually 'abort') - :safety patterns meta — lists active is_destructive rules - :safety check <cmd> meta — probes a hypothetical command End-to-end mock-driven test: Submitted ":norris find files in /tmp" → banner → step 1 emits tool_call (auto_approved per policy) → dispatched → frame rendered → step 2 emits "GOAL: complete" → sub-loop exits → DONE banner. 2 broker invocations, no stalls. config.lua safety example block lands in commit #6. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7e9cfff04d |
repl: tool-call sub-loop + :mcp meta + system-prompt augmentation
Phase 2 commit #6 per docs/PHASE2.md §12. End-to-end wiring of the MCP tool-call flow on top of broker/safety/context/renderer/mcp. repl.lua additions: - mcp_sessions table populated from config.mcp.servers at startup. connect_mcp() helper does initialize + caches tools/list. Failures status-logged once; absent from mcp_sessions until manual reconnect (C4 — no auto-retry). - tools_schema() flattens connected sessions' tools into the OpenAI {type:"function", function:{name,description,parameters}} shape with "<alias>.<name>" namespacing. - flatten_content() concatenates content[type="text"] blocks; one-shot status warning when non-text blocks (image/resource) are dropped (§4 normative spec, v1 only handles text). - dispatch_tool_call(name, args_table) splits alias.tool, looks up session, calls. Returns (content_string, is_error). Errors of every flavor (missing alias, no session, rpc_error, transport_error) yield a synthesized "[aish] ..." string so callers always have a body for the role:"tool" turn — alternation preserved per C5/C7. - ask_ai rewritten as a sub-loop that re-issues the broker request until the model returns pure text or max_tool_depth (default 8) is hit. Each iteration: stream response → if tool_calls present, confirm-gate each → dispatch → append role:"tool" turn → continue. Argument-JSON parse failure produces a synthesized tool turn (C7). Decline at confirm produces "[aish] tool call declined by user" tool turn (alternation guarantee). - :mcp meta with sub-commands: list / tools / tool <a.n> / connect <url> [alias] / disconnect <alias>. HELP block extended. context.lua: DEFAULT_SYSTEM_PROMPT grows by ~4 lines per PHASE2.md §8 (hybrid prompt: static frame about MCP + dynamic tools list in the request body). Block is always present even when no MCP servers configured — ~60 tokens for clarity that 'CMD:' remains the fallback. CMD: extraction unchanged — runs on the FINAL pure-text response only (not on intermediate iterations of the tool sub-loop). Substrate §3 invariant preserved. End-to-end verified two ways: (1) Direct broker probe: aish's tools_schema fed through broker.chat_stream against hossenfelder → qwen-1.5b emits one tool_call payload with correct id + name="boltzmann.list_dir" + args='{"path":"/tmp"}'. Accumulator stitched the JSON-string across fragmented deltas. (2) Mocked-broker sub-loop test: ask_ai feeds 'list /tmp', mock emits text + tool_call, sub-loop dispatches against LIVE boltzmann lmcp (auto_approve via policy), 80+ files rendered inside the tool_call frame, broker re-invoked with the extended context, mock returns pure text, sub-loop terminates. Total broker invocations: 2. Known: the loaded fast model (qwen-1.5b) tends to emit "CMD: ..." suggestions even when an MCP tool is the better path; the small model's system-prompt compliance is weak. Larger models and the analyze-time direct probe confirm the tools_schema and tool_calls flow is wire-correct — Phase 7 verify will exercise this against qwen3-30b or cloud models when available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
7c221a8aae |
context: tool turns + tool_calls on assistant; use_tool_role fallback
Phase 2 commit #3 per docs/PHASE2.md §12. Three concrete edits per §3 context.lua row (the BLOCKER-fold-in from review): (a) Loosen Context:append shape-per-role: assistant may carry empty content if tool_calls is non-empty; role:"tool" requires tool_call_id + content. (b) Preserve tool_calls / tool_call_id on store (Phase 1 :append built {role, content} only and silently dropped extras). (c) Extend to_messages() with two emission modes selected by use_tool_role: true (default) — OpenAI-standard role:"tool" + assistant turns with tool_calls (wrapped as {id, type:"function", function:{name, arguments}}). false (fallback) — collapse assistant-with-tool_calls + its following role:"tool" turns into a single assistant text turn with synthesized "[tool: name]\n<args>\n[result]\n <content>" body; merge consecutive assistant turns so the trailing post-tool-result text doesn't yield asst/asst back-to-back (same strict-template gotcha PHASE0.md §6 warned about for user/user). Alternation assert added (N4): role:"tool" turns must trace back through zero-or-more prior tool turns to an assistant-with-tool_calls. Catches sub-loop bugs at append time. Orphan tool turns rejected. pending_exec_output behavior unchanged per §3 row: buffer persists across tool-call sub-loops, flushes on next genuine user turn (B4). Smoke-tested §12 verify-row #3: (i) default mode round-trip — 5 OpenAI-shape messages, tool_calls + tool_call_id preserved. (ii) fallback mode round-trip — collapsed into 3 messages (system/user/assistant), tool_calls + role:"tool" not emitted. (iii) multi-call: 2 tool_calls in one assistant turn followed by 2 tool replies, both modes render correctly. (iv) orphan tool turn after user — assertion fires. (v) B4: pending_exec_output survives a tool sub-loop, flushes on next :append_user. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
16490e6905 |
fix: buffer exec output for next user turn; alternation for strict templates
User-test surfaced the bug: with `deep` (mistral-nemo-12b) active,
running `list files` -> y on `CMD: ls` -> `Are there directory entries
beginning with "lor"?` returned a Jinja exception:
api: ... Error: Jinja Exception: After the optional system message,
conversation roles must alternate user/assistant/user/assistant/...
Cause: §6 specified "exec output injected into context uses role 'user'
with a prefix tag '[exec output]'." This works for permissive templates
(qwen2.5-coder-1.5b, the `fast` preset) but produces a back-to-back
user/user pair on strict templates that enforce the OpenAI alternation
contract — `[exec output]` user turn followed by the user's actual
follow-up question.
Fix:
context.lua:
- new field `pending_exec_output` (initially nil)
- new method `:append_exec_output(out)` buffers (concat on subsequent
captures so multi-shell-then-ai still merges everything)
- new method `:append_user(content)` flushes buffered exec output as
a `[exec output]\n...\n\n` prefix and appends a user turn
- `:reset()` also clears the buffer
repl.lua:
- run_shell calls ctx:append_exec_output(out) instead of
ctx:append({role="user", content="[exec output]\n"..out})
- ask_ai calls ctx:append_user(text) instead of raw :append; saves
prev_pending so a broker error can restore the buffer for retry
PHASE0.md §6:
- amended the role-injection paragraph to describe the buffer-and-
prepend policy; the §3 invariants list is untouched (this was a §6
design detail, not a locked invariant)
Verification:
- context unit tests cover: alternation after the failing sequence,
multi-shell merge, reset clears buffer, broker-error retry path
- live reproduction against `deep` (mistral-nemo) of the exact
user-reported sequence succeeds; model responds with a sensible
`CMD: ls | grep '^lor'` instead of a Jinja exception
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
10848645af |
context: in-memory turn list + max_turns sliding-window eviction
Phase 0 implementation per PHASE0.md §6, §8.
Context.new(opts) constructs with the §6 default system prompt (the
`CMD: ` extraction contract is hard-coded in there per §3 — locked
substrate, do not edit). opts overrides: system_prompt, max_turns
(default 40), token_budget (default 4096; visibility only in Phase 0
per Q1, deferred to Phase 3 for accurate tokenization).
API:
ctx:append({role, content}) record a turn
ctx:to_messages() [{system,...}, ...turns] for broker.chat
ctx:enforce_budget() evict pairs (user+assistant) until
#turns <= max_turns; returns count
ctx:estimate_tokens() char/4 heuristic
ctx:reset() drop all turns (system_prompt kept)
System prompt is the §6 phrasing verbatim including the `CMD: ` clause
— stored on the context, NOT in self.turns, so it is prepended freshly
on every to_messages() call.
Smoke covers basic ops, no-evict-at-max, evict-on-overflow, bulk
eviction (14 turns -> 4), reset.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4310207738 |
Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions) - docs/PHASE0.md — full Phase 0 manifest (locked substrate) - 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented with module-scoped responsibilities matching the manifest - config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b snappy/32k + cloud via OpenRouter through hossenfelder) File names match docs/PHASE0.md §4 exactly. Module bodies fill in across later phases; the tree shape is locked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |