aish/context.lua
marfrit a3c1813465 context: proactive periodic summarization (closes #101)
Closes #101 (FR-A from the 2026-05-17 German strategy analysis,
small-model improvement strategy 5: "history summarization via
local").

Phase 5 summarize-on-evict only fires at budget pressure — exactly
when the local model is already suffering. Small models benefit
from tight context from turn 1, not "after eviction". This commit
adds CADENCE-triggered summarization that fires every N appends
regardless of budget, folding turns older than `summarize_keep_recent`
into ctx.summary via the existing Phase 5 summarize_fn closure.
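
A model-free stand-in makes the shared callback contract concrete.
This is only a sketch: `toy_summarize` and its truncation rules are
hypothetical, and the real closure built by make_summarize_fn
presumably calls a model instead. The three call shapes follow the
contract documented in context.lua.

```lua
-- Toy stand-in for summarize_fn (hypothetical; no model call).
-- Call shapes, as documented in context.lua:
--   (nil, turns) -> first-time summary
--   (str, turns) -> extend the prior summary with new turns
--   (str, nil)   -> compress the prior summary
-- Returning nil tells the caller to skip the fold.
local function toy_summarize(prior, turns)
    if turns == nil then
        if not prior or prior == "" then return nil end
        -- "Compress": keep only the trailing 200 characters.
        return prior:sub(-200)
    end
    local lines = {}
    if prior and prior ~= "" then lines[#lines + 1] = prior end
    for _, t in ipairs(turns) do
        -- One line per turn: role plus the first 40 characters.
        lines[#lines + 1] =
            ("%s: %s"):format(t.role, (t.content or ""):sub(1, 40))
    end
    if #lines == 0 then return nil end
    return table.concat(lines, "\n")
end

local first = toy_summarize(nil, { { role = "user", content = "list files" } })
local extended = toy_summarize(first,
    { { role = "assistant", content = "CMD: ls" } })
print(extended)
```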

context.lua additions:

- New ctx fields: summarize_every_n_turns, summarize_keep_recent
  (default 4), _turns_since_summarize (counter).
- Context:append bumps the counter on every store.
- Context:enforce_cadence — the new entry point. Returns the
  number of turns folded (0 on no-op). Guards:
    * disabled (cfg unset OR summarize_fn unset) -> 0
    * not yet due (_turns_since_summarize < N) -> 0
    * Norris-active (Phase 5 R-C4 parity — planner stays on goal) -> 0
    * #turns <= keep_recent (nothing to fold) -> 0
    * summarize_fn returns nil/empty -> 0 (defer to enforce_budget later)
  Orphan-tool guard: when the fold slice would end on an
  assistant-with-tool_calls, peel back the right edge until the
  next live turn isn't role=tool. Strict chat templates reject
  tool-without-assistant-anchor (#87 already encountered this).
- If ctx.summary grows past max_summary_chars after the fold,
  compress in a second pass (same shape as enforce_budget's
  Phase 5 logic).
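
The orphan-tool guard, as worded above, can be illustrated in
isolation. A minimal sketch, assuming turns are plain tables with a
`role` field; `safe_fold_count` is a hypothetical helper, not a
function in context.lua.

```lua
-- Hypothetical helper: how many of the oldest turns can be folded
-- while keeping at least `keep` recent turns and never leaving a
-- role="tool" turn at the head of the live window.
local function safe_fold_count(turns, keep)
    local fold = #turns - keep
    if fold <= 0 then return 0 end
    -- Peel back the right edge while the first live turn would be a
    -- tool reply; its assistant anchor then stays live with it.
    while fold > 0 and turns[fold + 1] and turns[fold + 1].role == "tool" do
        fold = fold - 1
    end
    return fold
end

local turns = {
    { role = "user" },
    { role = "assistant", tool_calls = { { id = "a" }, { id = "b" } } },
    { role = "tool", tool_call_id = "a" },
    { role = "tool", tool_call_id = "b" },
    { role = "assistant" },
    { role = "user" },
}
-- keep = 3 would cut between the two tool replies, so the fold
-- shrinks to the single user turn before the assistant anchor.
print(safe_fold_count(turns, 3), safe_fold_count(turns, 2))
```

With keep = 3 the live window temporarily holds five turns instead
of three, which is the trade-off the guard accepts.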

repl.lua wiring:

- ctx_opts continues to copy all config.context keys; the new
  summarize_every_n_turns / summarize_keep_recent fields flow
  through automatically.
- make_summarize_fn is now wired when EITHER summarize_on_evict
  OR summarize_every_n_turns is set (same closure, different
  trigger — Phase 5's #51 #issue eviction path uses it on budget;
  #101 uses it on cadence).
- New status_cadence_fold helper: "[aish] proactively summarized N
  older turns".
- ask_ai's existing enforce_budget call site now first fires
  enforce_cadence, then enforce_budget. Cadence comes first so
  the token estimate enforce_budget sees is the tighter post-fold
  one — no spurious eviction of turns we just summarized.
- Norris path NOT wired: enforce_cadence is a no-op there via the
  norris_active guard (consistent with Phase 5 R-C4).
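
The wiring above might look roughly like the sketch below. Everything
here is an assumption made for illustration: `needs_summarizer`,
`prepare_context`, the shape of the config table, and the fake
context used to exercise the call order. repl.lua itself is not part
of this file.

```lua
-- Hypothetical: decide whether the shared summarize closure is needed.
local function needs_summarizer(context_cfg)
    return not not (context_cfg.summarize_on_evict
        or context_cfg.summarize_every_n_turns)
end

-- Hypothetical: cadence first, then budget, so the budget check sees
-- the already-folded turn list.
local function prepare_context(ctx, status)
    local folded = ctx:enforce_cadence()
    if folded > 0 then
        status(("[aish] proactively summarized %d older turns"):format(folded))
    end
    local evicted = ctx:enforce_budget()
    return folded, evicted
end

-- Fake context that records call order, only to exercise the sketch.
local calls = {}
local fake_ctx = {
    enforce_cadence = function() calls[#calls + 1] = "cadence"; return 2 end,
    enforce_budget = function() calls[#calls + 1] = "budget"; return 0 end,
}
local messages = {}
local folded, evicted = prepare_context(fake_ctx, function(line)
    messages[#messages + 1] = line
end)
print(table.concat(calls, ","), messages[1])
```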

18 inline unit cases for enforce_cadence:
  - cfg disabled / no summarize_fn / below cadence -> 0
  - cadence met -> exact fold count (N - keep)
  - summary contains folded contents; first/last live turn IDs match
  - cadence counter resets; second fold fires after another N appends
  - Norris-active -> suppressed
  - orphan-tool guard: peels back when last folded = asst+tool_calls
  - summary compression triggers when over max_summary_chars

E2E verified on hossenfelder:8082, summarize_every_n_turns=4 /
summarize_keep_recent=2:
  5 user turns -> 2 cadence fires:
    [aish] proactively summarized 2 older turns
    [aish] proactively summarized 4 older turns
  :cost detail shows main=5 calls, summarize=2 calls (matches fires).
  Estimated ctx token count: 180 (vs ~1000 unsummarized).
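
The fold sizes of 2 and 4 can be reproduced with a small counter
model. It assumes that enforce_cadence runs once per ask before the
user turn is appended, and that each ask appends one user and one
assistant turn. The exact call point is not stated here, so treat
this as one ordering consistent with the reported numbers rather
than a description of ask_ai.

```lua
-- Minimal model of the cadence counter (not the real Context class).
-- live  = turns currently in the window
-- since = appends since the last successful fold
local function simulate(asks, every_n, keep)
    local live, since, fires = 0, 0, {}
    for _ = 1, asks do
        -- Assumed call point: cadence check before the user turn lands.
        if since >= every_n and live > keep then
            fires[#fires + 1] = live - keep
            live, since = keep, 0
        end
        -- One user turn plus one assistant turn per ask.
        live, since = live + 2, since + 2
    end
    return fires
end

local fires = simulate(5, 4, 2)
print(table.concat(fires, ", "))  --> 2, 4
```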

Flag-off path: no status, identical to pre-#101 behavior.

Regression: 87/87 safety, 31/31 router_model, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 09:20:56 +00:00


-- context.lua — in-memory conversation history + token budget.
-- Phase 0: ordered turn list, sliding-window eviction by max_turns.
-- Tokenization is char/4 heuristic in Phase 0; accurate count is Phase 3 (Q1).
-- Phase 2 (added 2026-05-12): support for `role:"tool"` turns and assistant
-- turns carrying `tool_calls = [...]`, plus a `use_tool_role` rendering
-- toggle for the strict-chat-template fallback path (Q18).
-- See docs/PHASE0.md §6, §8 and docs/PHASE2.md §3 / §5.
local M = {}
-- The §6 default system prompt. The `CMD: ` (exact prefix, single space)
-- contract is locked per §3 invariants — do not edit without amending PHASE0.
-- Phase 2 appends ~4 lines about MCP tools per PHASE2.md §8 (hybrid:
-- static frame here + dynamic tools list in the request body). The block
-- is always present even when no MCP servers are configured — the cost
-- is ~60 tokens and the model just sees instructions that don't apply.
local DEFAULT_SYSTEM_PROMPT = [[
You are aish, an AI-augmented shell assistant. You help the user execute shell
commands, write and debug code, and re-engineer software. When suggesting shell
commands, output them on a line beginning with exactly "CMD: " so aish can
identify and optionally execute them. Be concise. Prefer concrete actions over
explanations unless asked.
You may have access to MCP tools — they appear in this request's `tools` field.
Call a tool by emitting a tool_call; the result will be supplied in the next
turn. Use tools for structured operations (file reads, queries, etc.) and
`CMD:` lines for local shell commands. Prefer tools when available; fall back
to `CMD:` for anything not exposed as a tool.]]
local Context = {}
Context.__index = Context
function M.new(opts)
    opts = opts or {}
    return setmetatable({
        system_prompt = opts.system_prompt or DEFAULT_SYSTEM_PROMPT,
        turns = {},
        pending_exec_output = nil, -- buffered until next user turn (§6)
        max_turns = opts.max_turns or 40,
        token_budget = opts.token_budget or 4096,
        -- Phase 2: tool-role rendering toggle. true = emit OpenAI-standard
        -- role:"tool" messages from to_messages(); false = collapse
        -- assistant+tool_calls and tool turns into a single assistant text
        -- turn for chat templates that reject the role:"tool" shape.
        -- Default true per PHASE2.md §12 "Q18 default"; flip from caller.
        use_tool_role = (opts.use_tool_role == nil) and true
            or opts.use_tool_role,
        -- Phase 5: summarize-on-evict. When set, enforce_budget calls
        -- summarize_fn(prior_summary, evicted_turns) -> string | nil
        -- and updates ctx.summary instead of silently dropping turns.
        -- Callback contract per PHASE5.md R-B1:
        --   (nil, [turns]) → first-time summarize
        --   (str, [turns]) → additive: extend prior summary with new turns
        --   (str, nil)     → compress: re-summarize the prior summary
        --   Returns nil    → fall back to silent eviction (Phase 0 behavior).
        summarize_fn = opts.summarize_fn,
        summary = nil, -- rolling summary string
        max_summary_chars = opts.max_summary_chars or 2000,
        -- #101: proactive periodic summarization (cadence-triggered,
        -- in addition to Phase 5's eviction-triggered path). When
        -- summarize_every_n_turns is set AND summarize_fn is wired,
        -- enforce_cadence() folds turns older than the last
        -- summarize_keep_recent into ctx.summary every N appends.
        -- Goal: keep the wire prompt tight from the start so small
        -- local models aren't fed near-budget context until eviction
        -- forces a fold. nil = disabled (existing behavior).
        summarize_every_n_turns = opts.summarize_every_n_turns,
        summarize_keep_recent = opts.summarize_keep_recent or 4,
        _turns_since_summarize = 0,
        -- Phase 6 (#issue Phase 6 §6): project file-tree block, set by
        -- repl.lua via :tree meta or the cfg.project.auto_tree startup
        -- hook. nil = no block injected. Cached scan opts (depth /
        -- max_chars overrides) live on _project_opts for :tree refresh.
        project = nil,
        _project_opts = nil,
        -- Phase 7 (docs/PHASE7.md): cost/usage accumulator. Keyed as
        -- usage_totals[model_name][category] -> { prompt, completion,
        -- calls, cost, is_local }. is_local (R6) is a sticky flag
        -- set when ANY recorded usage for the slot had cost==nil
        -- (preserves local-vs-cloud-zero distinction for :cost detail
        -- annotation). cost_warn_state (R4) carries per-threshold
        -- one-shot flags so warn_at_dollars firing doesn't suppress
        -- warn_at_tokens. Both survive :reset (R8 parity).
        usage_totals = {},
        cost_warn_state = { dollars = false, tokens = false },
        -- Phase 8 (docs/PHASE8.md): optional tokenize callback. When
        -- set, Context:estimate_tokens uses it (with a per-turn cache
        -- on turn._tokens for amortization). nil = char/4 fallback
        -- (Phase 0 §8; existing behavior, no change).
        tokenize_fn = opts.tokenize_fn,
    }, Context)
end
-- Append a turn. Phase 2 widens what's valid:
--   role="user"       content (string) required
--   role="system"     content (string) required (callers shouldn't add system
--                     turns directly; system prompt is stored separately and
--                     prepended at to_messages time per §6)
--   role="assistant"  content may be empty IF tool_calls is non-empty;
--                     otherwise content required
--   role="tool"       tool_call_id required + content required; the nearest
--                     preceding non-tool turn must be an assistant turn with
--                     non-empty tool_calls (debug assertion catches sub-loop
--                     bugs early per PHASE2.md §3 row + N4 in review)
function Context:append(turn)
    assert(type(turn) == "table" and turn.role,
        "context:append requires { role = ... }")
    local stored = { role = turn.role, content = turn.content or "" }
    if turn.role == "assistant" and turn.tool_calls and #turn.tool_calls > 0 then
        stored.tool_calls = turn.tool_calls
    elseif turn.role == "tool" then
        assert(turn.tool_call_id, "context:append role=tool requires tool_call_id")
        assert(turn.content, "context:append role=tool requires content")
        -- A tool turn may follow either an assistant-with-tool_calls (the
        -- first reply in the sub-loop) or another tool turn (subsequent
        -- replies when the assistant emitted multiple parallel tool_calls).
        -- Walk back through tool turns until we hit a non-tool; that turn
        -- must be an assistant with non-empty tool_calls.
        local j = #self.turns
        while j > 0 and self.turns[j].role == "tool" do j = j - 1 end
        local anchor = self.turns[j]
        assert(anchor and anchor.role == "assistant"
            and anchor.tool_calls and #anchor.tool_calls > 0,
            "context:append role=tool must follow assistant with tool_calls "
            .. "(possibly via prior tool turns in the same sub-loop)")
        stored.tool_call_id = turn.tool_call_id
    else
        assert(turn.content, "context:append requires content for role=" .. turn.role)
    end
    self.turns[#self.turns + 1] = stored
    -- #101: bump cadence counter so enforce_cadence knows when to fire.
    self._turns_since_summarize = (self._turns_since_summarize or 0) + 1
end
-- Buffer captured shell-exec output. Per §6 (post user-test fix), exec output
-- is NOT appended as its own user turn — strict chat templates (e.g. mistral-
-- nemo's Jinja) reject the resulting user/user back-to-back. Instead it is
-- held until the next user turn arrives, then prepended via :append_user.
function Context:append_exec_output(out)
    if not out or out == "" then return end
    local block = "[exec output]\n" .. out
    if self.pending_exec_output then
        self.pending_exec_output = self.pending_exec_output .. "\n" .. block
    else
        self.pending_exec_output = block
    end
end
-- Append a user turn, flushing any pending exec output as a prefix. Use this
-- (rather than raw :append) for any turn whose role is "user".
function Context:append_user(content)
    if self.pending_exec_output then
        content = self.pending_exec_output .. "\n\n" .. content
        self.pending_exec_output = nil
    end
    self:append({ role = "user", content = content })
end
-- Compact JSON-ish rendering used by the fallback (use_tool_role=false) path
-- to convert a tool_calls + tool-result pair into inline text. Not OpenAI-
-- standard — only used when a strict chat template rejects role:"tool".
local function inline_tool_call(call, result_content)
    return ("[tool: %s]\n%s\n[result]\n%s")
        :format(call.name or "?",
            tostring(call.arguments or ""),
            tostring(result_content or ""))
end
-- Render the messages array for broker.chat (system prompt prepended; turns
-- in order). Phase 2 adds two emission modes:
--
-- use_tool_role = true (default): pass through OpenAI-standard
-- {role:"assistant", content, tool_calls} and {role:"tool", tool_call_id,
-- content} turns unchanged.
--
-- use_tool_role = false (fallback, Q18): collapse each
-- assistant-with-tool_calls + its following role:"tool" turn(s) into a
-- single assistant text turn carrying the synthesized "[tool: name]\n
-- <args>\n[result]\n<content>" body. The role:"tool" turns and the
-- tool_calls field are NOT emitted. Same logical alternation seen by the
-- model (user → assistant → user → assistant), no strict-template breakage.
--
-- The system prompt is NOT stored in self.turns per §6.
-- Phase 4: [background] block composer. Memory items from memory.jsonl
-- are stored on self.memory_items (loaded by repl.lua at startup) and
-- rendered as a dim-styled suffix on the system prompt. Suppressed when
-- norris_active to avoid stacking large background contexts in
-- per-iteration broker calls (R-C1 review fold-in). Cap honored via
-- inject_max_chars argument from the caller (already truncated by repl).
local function compose_background(items)
    if not items or #items == 0 then return "" end
    local lines = { "", "", "[background] (memory.jsonl; manage via :memory)" }
    for _, it in ipairs(items) do
        -- The outer parentheses drop gsub's second return value (the
        -- match count) so it is not passed to format as a stray argument.
        local content = ((it.content or ""):gsub("\n", " "))
        lines[#lines + 1] = ("- (%s) %s"):format(it.kind or "?", content)
    end
    return table.concat(lines, "\n")
end
-- Phase 5 R-C4: summary block composer. Mirrors the [background]
-- pattern; suppressed under Norris (callers already guard, but the
-- function returns "" for empty input regardless).
local function compose_summary(summary_text)
    if not summary_text or summary_text == "" then return "" end
    return "\n\n[earlier conversation summary]\n" .. summary_text
end
-- Phase 6: project file-tree composer. Inserted between [background]
-- and [earlier summary] so the reading order is memory facts →
-- project tree → earlier conversation → NORRIS suffix. Same Norris-
-- suppression rule (callers gate via self.norris_active).
local function compose_project(project_text)
    if not project_text or project_text == "" then return "" end
    return "\n\n[project]\n" .. project_text
end
-- Phase 3: NORRIS MODE suffix appended to the system prompt when
-- self.norris_active. Carries self.norris_goal so eviction of the
-- user's "[norris] goal: ..." turn doesn't lose the anchor.
local NORRIS_SUFFIX_TEMPLATE = [[
[NORRIS MODE] You are operating autonomously toward the following goal:
%s
Plan and execute step by step using CMD: lines (for shell) or tool_calls
(when MCP tools are available). After each action, you will see its
result in the next turn. Re-plan based on what you observe.
When the goal is achieved, emit a single line:
GOAL: complete
on its own line, optionally followed by a brief summary.
If the goal is unreachable or you need user input, emit:
GOAL: blocked
with a one-line reason.
Avoid destructive operations unless the goal explicitly requires them.
The user will be prompted to confirm destructive actions; expect their
verdict in the next turn as a synthesized "[aish] ... skipped by user"
message if they declined.]]
-- Phase 10 / #89: optional task-hint block appended AFTER the NORRIS
-- suffix when the cloud preplanner emitted a TASK list at :norris
-- launch. self.norris_tasks shape: { current = 1, list = {...} }.
-- Returns "" when no tasks (preplan disabled OR preplan failed OR
-- list exhausted) — keeps the NORRIS suffix backward-compatible.
local function compose_norris_task_hint(self)
    if not (self.norris_tasks and self.norris_tasks.list) then return "" end
    local k = self.norris_tasks.current
    local n = #self.norris_tasks.list
    local task = self.norris_tasks.list[k]
    if not task then return "" end -- exhausted → no hint
    return string.format("\n\nCurrent step %d/%d:\n %s", k, n, task)
end
-- #87: route-aware context compression. Keeps the LAST keep_turns
-- turns (default 2); any turn whose content exceeds max_chars is cut
-- down to its trailing max_chars characters. Drops tool turns at the
-- slice head (they'd be orphaned without their
-- assistant-with-tool_calls anchor; strict chat templates reject the
-- resulting tool-without-anchor shape). Returns a new list of
-- turn-shaped tables; self.turns is NEVER mutated.
local function _compress_turns(turns, keep_turns, max_chars)
    local n = #turns
    -- Index of the first turn we keep.
    local start = math.max(1, n - (keep_turns or 2) + 1)
    -- Drop orphan tool turns at the head.
    while start <= n and turns[start].role == "tool" do
        start = start + 1
    end
    local out = {}
    for i = start, n do
        local t = turns[i]
        local c = t.content or ""
        if max_chars and #c > max_chars then
            -- Copy the turn so the stored one keeps its full content.
            out[#out + 1] = {
                role = t.role,
                content = c:sub(-max_chars),
                tool_calls = t.tool_calls,
                tool_call_id = t.tool_call_id,
            }
        else
            out[#out + 1] = t -- ref the existing turn; no copy needed
        end
    end
    return out
end
function Context:to_messages(opts)
    -- Phase 10 (#86): per-call system_prompt_override. Replaces the
    -- BASE system_prompt for THIS render only (state unchanged); the
    -- dynamic blocks ([background], [project], [earlier summary],
    -- NORRIS suffix) still compose on top. Used by ask_ai's routing
    -- path when cfg.routing.system_prompts[class] is set; gives
    -- small local models tighter instructions while preserving
    -- ambient memory/project context.
    local sys_content = (opts and opts.system_prompt_override)
        or self.system_prompt
    -- Phase 4 [background] memory block + Phase 6 [project] file-tree
    -- block + Phase 5 [earlier summary] block. All suppressed during
    -- Norris (R-C1 / R-C4: avoid redundant tokens per planning
    -- iteration; planner stays focused on its goal anchor).
    if not self.norris_active then
        sys_content = sys_content .. compose_background(self.memory_items)
        sys_content = sys_content .. compose_project(self.project)
        sys_content = sys_content .. compose_summary(self.summary)
    end
    -- Phase 3 NORRIS MODE suffix. Last block so its instructions dominate.
    if self.norris_active and self.norris_goal then
        sys_content = sys_content
            .. string.format(NORRIS_SUFFIX_TEMPLATE, self.norris_goal)
            .. compose_norris_task_hint(self)
    end
    local msgs = { { role = "system", content = sys_content } }
    -- #87: route-aware compression. When opts.compress is set, swap
    -- the turn iterable for a truncated copy. self.turns unchanged
    -- (this is a per-render transformation; persistence + display
    -- via :history see the full context).
    local turns = self.turns
    if opts and opts.compress then
        turns = _compress_turns(self.turns,
            opts.compress.keep_turns or 2,
            opts.compress.max_turn_chars or 800)
    end
    if self.use_tool_role then
        for _, t in ipairs(turns) do
            local m = { role = t.role, content = t.content }
            if t.role == "assistant" and t.tool_calls then
                -- OpenAI shape wraps each call as
                -- {id, type:"function", function:{name, arguments}}.
                local oai = {}
                for i, c in ipairs(t.tool_calls) do
                    oai[i] = {
                        id = c.id,
                        type = "function",
                        ["function"] = { name = c.name,
                                         arguments = c.arguments or "" },
                    }
                end
                m.tool_calls = oai
            elseif t.role == "tool" then
                m.tool_call_id = t.tool_call_id
            end
            msgs[#msgs + 1] = m
        end
        return msgs
    end
    -- Fallback path: walk turns, collapse asst-with-tool_calls + following
    -- tool turns into a single asst text turn. Merge consecutive assistant
    -- turns afterward so the trailing post-tool-result assistant text
    -- doesn't produce asst/asst back-to-back (which strict templates would
    -- also reject; same gotcha PHASE0.md §6 warned about for user/user).
    local function push_or_merge_assistant(content)
        local last = msgs[#msgs]
        if last and last.role == "assistant" then
            last.content = last.content .. "\n" .. content
        else
            msgs[#msgs + 1] = { role = "assistant", content = content }
        end
    end
    -- #87: same compressed `turns` view used by the fallback path.
    local i = 1
    while i <= #turns do
        local t = turns[i]
        if t.role == "assistant" and t.tool_calls then
            local parts = {}
            if t.content and t.content ~= "" then
                parts[#parts + 1] = t.content
            end
            -- Index the consecutive tool turns that follow by
            -- tool_call_id, so a missing or reordered result cannot
            -- shift the pairing or swallow a later non-tool turn.
            local results = {}
            local j = i + 1
            while turns[j] and turns[j].role == "tool" do
                results[turns[j].tool_call_id] = turns[j].content
                j = j + 1
            end
            for _, call in ipairs(t.tool_calls) do
                parts[#parts + 1] = inline_tool_call(call, results[call.id] or "")
            end
            push_or_merge_assistant(table.concat(parts, "\n"))
            i = j -- skip only the tool turns actually present
        elseif t.role == "tool" then
            -- Orphan tool turn (no preceding asst-tool_calls captured it).
            -- Shouldn't happen given the :append assertion, but defensively
            -- drop it rather than emit a malformed message.
            i = i + 1
        elseif t.role == "assistant" then
            push_or_merge_assistant(t.content or "")
            i = i + 1
        else
            msgs[#msgs + 1] = { role = t.role, content = t.content }
            i = i + 1
        end
    end
    return msgs
end
-- #101: proactive periodic summarization. Fires every
-- summarize_every_n_turns appends, folding turns older than the last
-- summarize_keep_recent into ctx.summary via summarize_fn. Returns
-- the number of turns folded (0 if disabled / not yet due / nothing
-- to fold / Norris-mode / callback failed).
--
-- Norris suppression (Phase 5 R-C4 parity): the planner stays
-- focused on its goal anchor — folding history mid-loop would
-- change its perceived progress.
--
-- Orphan-tool guard: never split an assistant-with-tool_calls turn
-- from its role=tool replies. When the fold boundary would land
-- between them (or between two parallel tool replies), peel back
-- until the first live turn is not role=tool and the last folded
-- turn is not such an assistant. The unfolded tail then becomes part
-- of the live window (temporarily larger than summarize_keep_recent,
-- but chat-template-legal).
function Context:enforce_cadence()
    if self.norris_active then return 0 end
    if not self.summarize_fn then return 0 end
    if not self.summarize_every_n_turns then return 0 end
    if (self._turns_since_summarize or 0) < self.summarize_every_n_turns then
        return 0
    end
    local keep = self.summarize_keep_recent or 4
    local n = #self.turns
    if n <= keep then return 0 end
    local fold_count = n - keep
    -- Orphan-tool guard: peel back from the right edge of the fold
    -- slice while the boundary would separate tool replies from
    -- their assistant anchor.
    while fold_count > 0 do
        local last = self.turns[fold_count]
        local next_live = self.turns[fold_count + 1]
        local splits_tool_group =
            (next_live ~= nil and next_live.role == "tool")
            or (last.role == "assistant"
                and last.tool_calls and #last.tool_calls > 0)
        if not splits_tool_group then break end
        fold_count = fold_count - 1
    end
    if fold_count == 0 then return 0 end
    local pair = {}
    for i = 1, fold_count do pair[i] = self.turns[i] end
    local ok, new_summary = pcall(self.summarize_fn, self.summary, pair)
    if not ok or type(new_summary) ~= "string" or new_summary == "" then
        return 0 -- failure: leave turns; eviction will handle them later
    end
    self.summary = new_summary
    if #self.summary > self.max_summary_chars then
        local ok2, compressed = pcall(self.summarize_fn, self.summary, nil)
        if ok2 and type(compressed) == "string" and compressed ~= "" then
            self.summary = compressed
        end
    end
    for _ = 1, fold_count do table.remove(self.turns, 1) end
    self._turns_since_summarize = 0
    return fold_count
end
-- Evict the oldest pair (user + assistant) while we exceed max_turns
-- OR token_budget (Phase 8 pillar 5). Returns total turns evicted.
-- Caller is responsible for rendering the §8 status line.
--
-- R2 guard: when system_prompt alone exceeds token_budget, the OR
-- condition stays true even when turns are empty — would spin
-- forever calling table.remove on a 0-length list. The `and
-- #self.turns > 0` clause ensures we exit when there's nothing
-- left to evict. Over-budget system_prompts (large [project]
-- blocks, etc.) are then on the user to shrink via :tree off /
-- :memory clear / etc.
function Context:enforce_budget()
    local evicted = 0
    while (#self.turns > self.max_turns
            or self:estimate_tokens() > self.token_budget)
        and #self.turns > 0 do
        -- Collect evicted slice (pair: user + assistant)
        local pair = {}
        pair[#pair + 1] = self.turns[1]
        if #self.turns >= 2 then pair[#pair + 1] = self.turns[2] end
        -- Phase 5: ask the summarize callback (if wired) to absorb this
        -- slice into the rolling summary. Callback contract per R-B1:
        -- summarize_fn(prior_summary, evicted_turns) -> string | nil
        -- nil return → silent eviction (Phase 0 behavior).
        if self.summarize_fn then
            local ok, new_summary = pcall(self.summarize_fn, self.summary, pair)
            if ok and type(new_summary) == "string" and new_summary ~= "" then
                self.summary = new_summary
                -- R-C1: if grown past cap, compress in a second pass.
                if #self.summary > self.max_summary_chars then
                    local ok2, compressed = pcall(self.summarize_fn,
                        self.summary, nil)
                    if ok2 and type(compressed) == "string"
                        and compressed ~= "" then
                        self.summary = compressed
                    end
                end
            end
        end
        -- Remove the same pair from turns (matches Phase 0 visible
        -- effect); the second removal is skipped only when a single
        -- turn was left.
        table.remove(self.turns, 1)
        evicted = evicted + 1
        if #self.turns > 0 then
            table.remove(self.turns, 1)
            evicted = evicted + 1
        end
    end
    return evicted
end
-- Phase 0 §8: char/4 heuristic. Phase 8 (Q1 resolved): when
-- self.tokenize_fn is set, use it for accuracy. Per-turn _tokens
-- cache amortizes after the first count.
--
-- Only the base self.system_prompt is counted here; the dynamic
-- [background] / [project] / [earlier summary] blocks composed in
-- to_messages are not part of the estimate. The prompt count is not
-- cached, so there is one tokenize round-trip for it per call when
-- tokenize_fn is active.
--
-- Turn content is immutable after append (see Context:append; we
-- never mutate stored turns). The cache on t._tokens is therefore
-- safe to live forever on the turn; it dies with the turn on :reset.
function Context:estimate_tokens()
    if self.tokenize_fn then
        local n = self.tokenize_fn(self.system_prompt)
        for _, t in ipairs(self.turns) do
            if t._tokens == nil then
                t._tokens = self.tokenize_fn(t.content)
            end
            n = n + t._tokens
        end
        return n
    end
    -- char/4 fallback (Phase 0 behavior, unchanged when tokenize_fn nil)
    local n = #self.system_prompt
    for _, t in ipairs(self.turns) do
        n = n + #t.content
    end
    return math.floor(n / 4)
end
-- Phase 7: cost/usage accumulator helpers.
--
-- Context:add_usage(model_name, category, usage)
-- Increment the (model, category) slot. usage is the payload from
-- broker.lua's on_delta("usage", ...): { prompt_tokens, completion_
-- tokens, total_tokens, cost (nil for local per R6), model, category }.
-- We use the model_name + category args (not the payload fields)
-- because the caller may want to normalize (e.g., key by req_cfg
-- alias rather than model_cfg.model).
function Context:add_usage(model_name, category, usage)
    model_name = model_name or "?"
    category = category or "main"
    self.usage_totals = self.usage_totals or {}
    local m = self.usage_totals[model_name] or {}
    local c = m[category] or {
        prompt = 0, completion = 0, calls = 0, cost = 0,
        -- R6: sticky flag; set once any nil-cost usage lands here.
        is_local = false,
    }
    c.prompt = c.prompt + (usage.prompt_tokens or 0)
    c.completion = c.completion + (usage.completion_tokens or 0)
    c.calls = c.calls + 1
    if usage.cost == nil then
        c.is_local = true -- preserves local-vs-cloud-zero per R6
    else
        c.cost = c.cost + usage.cost
    end
    m[category] = c
    self.usage_totals[model_name] = m
end
function Context:total_cost()
    local total = 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do total = total + (c.cost or 0) end
    end
    return total
end
-- Returns (prompt_tokens, completion_tokens) summed across all slots.
function Context:total_tokens()
    local p, comp = 0, 0
    for _, m in pairs(self.usage_totals or {}) do
        for _, c in pairs(m) do
            p = p + (c.prompt or 0)
            comp = comp + (c.completion or 0)
        end
    end
    return p, comp
end
-- :cost reset path — zero accumulator AND clear per-threshold one-shot flags.
function Context:reset_usage()
    self.usage_totals = {}
    self.cost_warn_state = { dollars = false, tokens = false }
end
function Context:reset()
    self.turns = {}
    self.pending_exec_output = nil
    self.summary = nil
    -- #101: restart the cadence counter along with the turn list so a
    -- stale count cannot trigger an early fold after the reset.
    self._turns_since_summarize = 0
    -- Phase 10 R6: clear norris_tasks defensively. :reset is
    -- unreachable mid-Norris (no readline prompt while the planner
    -- runs), but if a Norris session crashed leaving the field stale,
    -- :reset gives the user a clean recovery path.
    self.norris_tasks = nil
    -- R8 parity: usage_totals + cost_warn_state preserved (matches
    -- memory_items + project: "ambient context survives a
    -- user-driven conversation reset"). Use :reset_usage to zero the
    -- cost meter explicitly.
end
return M