Per-routing-class system_prompts (constrain small-model output) #86

Closed
opened 2026-05-16 23:49:58 +00:00 by claude-noether · 0 comments
Collaborator

Motivation

A 7B local model (qwen-coder-7b-snappy-8k etc.) has a much narrower probability distribution than cloud models. It follows precise structured instructions reliably; it drifts on natural-language tasks. Today aish routes via class (code / reasoning / default) but the SAME system prompt goes to all routes — the small model gets no help compensating for its weaker output-format adherence.

The pattern that fixes this: per-class system prompt overlays that EXPLICITLY constrain the small model.

Proposal

Extend cfg.routing with a system_prompts map keyed by class:

routing = {
    auto = true,
    classes = {
        code      = "fast",
        default   = "fast",
        reasoning = "cloud",
    },
    system_prompts = {
        code = [[You are a code assistant. Rules:
1. Output ONLY the requested code or command.
2. No prose explanation unless explicitly asked.
3. If uncertain, output the most common correct answer.
4. Wrap shell commands in CMD: prefix.
5. Max response: 200 tokens.]],

        default = [[You are a shell assistant.
Output shell commands as: CMD: <command>
Output answers as single short sentences.
Do not ask clarifying questions.]],

        -- reasoning routes to cloud; no override needed
    },
}

When router.classify_model selects a class with a system_prompts[class] entry, broker.lua substitutes/extends the system prompt for THAT request. The active model's regular system_prompt stays the default fallback.

Where it lands

  • router.lua already exposes classify_model(text, cfg) → (model, class). Return the class so the caller knows which overlay applies.
  • broker.lua reads opts.system_prompt_override (set by repl.lua based on cfg.routing.system_prompts[class]) and replaces the system message before send.
  • repl.lua wires it at the call_broker invocation in ask_ai.
  • Norris path (safety.norris_step) can opt out (planner has its own system prompt; mixing is confusing).

Estimate

~1 hour (small surface; mostly config schema + 5-line wiring change). User report estimates the ROI as "high" — reduces small-model output drift substantially.

Source

Architecture analysis (2026-05-16) summarizing why small local models underperform on aish-style tasks and which leverage points compensate.

## Motivation A 7B local model (qwen-coder-7b-snappy-8k etc.) has a much narrower probability distribution than cloud models. It follows precise structured instructions reliably; it drifts on natural-language tasks. Today aish routes via class (`code` / `reasoning` / `default`) but the SAME system prompt goes to all routes — the small model gets no help compensating for its weaker output-format adherence. The pattern that fixes this: per-class system prompt overlays that EXPLICITLY constrain the small model. ## Proposal Extend `cfg.routing` with a `system_prompts` map keyed by class: ```lua routing = { auto = true, classes = { code = "fast", default = "fast", reasoning = "cloud", }, system_prompts = { code = [[You are a code assistant. Rules: 1. Output ONLY the requested code or command. 2. No prose explanation unless explicitly asked. 3. If uncertain, output the most common correct answer. 4. Wrap shell commands in CMD: prefix. 5. Max response: 200 tokens.]], default = [[You are a shell assistant. Output shell commands as: CMD: <command> Output answers as single short sentences. Do not ask clarifying questions.]], -- reasoning routes to cloud; no override needed }, } ``` When `router.classify_model` selects a class with a `system_prompts[class]` entry, broker.lua substitutes/extends the system prompt for THAT request. The active model's regular system_prompt stays the default fallback. ## Where it lands - `router.lua` already exposes `classify_model(text, cfg)` → (model, class). Return the class so the caller knows which overlay applies. - `broker.lua` reads `opts.system_prompt_override` (set by repl.lua based on `cfg.routing.system_prompts[class]`) and replaces the system message before send. - `repl.lua` wires it at the call_broker invocation in ask_ai. - Norris path (`safety.norris_step`) can opt out (planner has its own system prompt; mixing is confusing). ## Estimate ~1 hour (small surface; mostly config schema + 5-line wiring change). User report estimates the ROI as "high" — reduces small-model output drift substantially. ## Source Architecture analysis (2026-05-16) summarizing why small local models underperform on aish-style tasks and which leverage points compensate.
claude-noether added the feature request label 2026-05-16 23:49:58 +00:00
Sign in to join this conversation.