Context compression on route-to-local (truncate older turns + shell-output tails) #87

Closed
opened 2026-05-16 23:49:58 +00:00 by claude-noether · 0 comments
Collaborator

Motivation

Small local models have a smaller EFFECTIVE context window than their advertised one: quality degrades well before the token cap. The most expensive parts of the typical aish context are (a) old turns the model doesn't need and (b) shell-exec output blocks, where everything before the last 30 lines is mostly noise.

The proposal is routing-aware context compression: when a request is about to fire against a LOCAL model preset, run a compress_for_local(turns) filter that keeps the last N turns and tails long tool/exec content. Cloud routes get the full context unchanged.

Proposal

In context.lua (or alongside the broker call site in repl.lua):

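local function compress_for_local(turns)
    -- Keep only the last 2 turns.
    local n = #turns
    local start = math.max(1, n - 1)
    local compressed = {}
    for i = start, n do
        local t = turns[i]
        -- Only string content can be tailed; nil or non-string content
        -- passes through untouched instead of raising on :sub().
        if type(t.content) == "string" and #t.content > 800 then
            -- Shallow-copy so the stored turn is not mutated and any other
            -- fields on the turn survive; only content is replaced.
            local copy = {}
            for k, v in pairs(t) do copy[k] = v end
            copy.content = t.content:sub(-800) -- last 800 characters
            t = copy
        end
        compressed[#compressed + 1] = t
    end
    return compressed
end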
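The motivation talks about the last 30 lines of shell output, while the sketch above cuts by characters. A line-based tail might look like the following. This is a minimal sketch, not part of the proposal: `tail_lines` is a hypothetical helper name, and the 30-line budget would be passed in by the caller.

```lua
-- Hypothetical helper: keep only the last `max_lines` lines of `text`.
-- A trailing newline counts as an empty final line.
local function tail_lines(text, max_lines)
    local lines = {}
    -- Appending "\n" lets the pattern match the final line as well.
    for line in (text .. "\n"):gmatch("(.-)\n") do
        lines[#lines + 1] = line
    end
    if #lines <= max_lines then
        return text -- already short enough, return unchanged
    end
    local first = #lines - max_lines + 1
    return table.concat(lines, "\n", first, #lines)
end
```

Inside compress_for_local, a call such as `tail_lines(t.content, 30)` could replace the character-based `:sub(-800)` for shell-exec turns, assuming those turns can be told apart from ordinary chat turns.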

Wiring: when the routed model belongs to the cfg.routing.local_compress_classes set (or is marked model_cfg.local_compress = true), the messages array sent to broker.chat_stream is compressed first. Cloud models bypass the compressor.
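A rough sketch of that route check, under stated assumptions: the `route` table with `class` and `model_cfg` fields is hypothetical, the class set is assumed to be a table keyed by class name with `true` values, and the compressor is passed in as a function so the snippet stands alone. The real call site in repl.lua may look quite different.

```lua
-- Sketch only: `route.class` and `route.model_cfg` are assumed field names.
local function should_compress(cfg, route)
    local model_cfg = route.model_cfg or {}
    if model_cfg.local_compress == true then
        return true -- model explicitly marked for compression
    end
    -- Assumed shape: { some_class_name = true, ... }
    local classes = cfg.routing and cfg.routing.local_compress_classes or {}
    return classes[route.class] == true
end

-- Returns the messages to hand to the broker for this one request.
local function messages_for_route(cfg, route, turns, compress)
    if should_compress(cfg, route) then
        return compress(turns)
    end
    return turns -- cloud routes: full context, unchanged
end
```

The result of `messages_for_route` would then be what gets passed to broker.chat_stream; the stored session turns are left alone.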

Constraints

  • Norris autonomous mode probably wants the full context (planner reasoning needs history) — opt-out via helpers.skip_compress.
  • Phase 5's summarize_on_evict already does something similar but over the WHOLE session; this is per-request compression for one specific broker call.
  • Tool turns (role:"tool" from MCP) lose information when tail-truncated. Document this as a known trade-off; users can disable compression per class.
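The two opt-outs above could be checked in one place before the compressor runs. This is a sketch under assumptions: only `helpers.skip_compress` comes from this issue, while `compression_allowed` and the `class_overrides` table (class name mapped to `false` to disable) are hypothetical names for illustration.

```lua
-- Sketch only: the shape of `helpers` and `class_overrides` is assumed.
local function compression_allowed(helpers, class_overrides, class)
    if helpers and helpers.skip_compress then
        return false -- e.g. Norris autonomous mode wants the full history
    end
    if class_overrides and class_overrides[class] == false then
        return false -- user disabled compression for this class
    end
    return true
end
```

Keeping the opt-out logic separate from the compressor means compress_for_local itself stays a pure function of the turns it is given.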

Estimate

~2 hours. Touches context.lua (the compressor) + repl.lua (route-aware call site).

Source

Architecture analysis (2026-05-16). Listed as "medium-high" gain.

claude-noether added the feature request label 2026-05-16 23:49:58 +00:00