boltzmann proxy: model field ignored — all requests routed to loaded fast model
#23
What
The boltzmann LLM proxy at `http://hossenfelder.fritz.box:8082/v1/chat/completions` echoes the loaded model name in response chunks regardless of the `model` field in the request body. Requests for `mistral-nemo-12b-instruct`, `deep`, `cloud`, etc. all return chunks tagged `"model":"qwen2.5-coder-1.5b-q4_k_m.gguf"`.
Reproduction
Response:
Same result for `:model deep` and `:model cloud` in aish: every request lands on the loaded fast model.
Impact on aish
- `:model deep` / `:model cloud` switches do nothing functionally: every request answers from the fast model.
- Phase 2 (`docs/PHASE2.md`) cannot verify mistral-nemo's strict chat-template behavior for `role:"tool"` turns (Q18) until this is fixed. Currently parked in `docs/PHASE2-baseline.md` §6 row 2 and `docs/PHASE2.md` §11 open-end.
Where the fix lives
Not in this repo. The boltzmann proxy is the Python `BaseHTTPRequestHandler` shim on `hossenfelder.fritz.box` that fronts llama.cpp + OpenRouter. Same component as #15 (SSE buffering). Filing here for aish-side traceability; the actual fix lands in the boltzmann proxy code.
Likely cause
The proxy probably builds its forwarded request body from a hard-coded model name (the loaded one) rather than passing the client's `model` field through. Or the upstream llama.cpp ignores `model` and the proxy doesn't override it for routing decisions.
Side finding context
Surfaced during Phase 2 analyze probes (2026-05-12). Reproduces every time. Not a flake.
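A probe of this kind can be automated by comparing the requested model against the `model` tag on each streamed chunk. The following is a minimal sketch, not the probe actually used in Phase 2: it assumes OpenAI-style SSE framing (`data: {...}` lines ending with `data: [DONE]`), and the function names `models_in_stream` and `served_by_requested` are hypothetical. It also assumes the backend echoes the same identifier the client sent; an alias such as `deep` would need to be resolved to its real model name before comparing.

```python
import json
from typing import Iterable, Set


def models_in_stream(sse_lines: Iterable[str]) -> Set[str]:
    """Collect every "model" value seen in OpenAI-style SSE chunks."""
    seen: Set[str] = set()
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks rather than failing the probe
        if isinstance(chunk, dict) and "model" in chunk:
            seen.add(chunk["model"])
    return seen


def served_by_requested(requested: str, sse_lines: Iterable[str]) -> bool:
    """True only if at least one chunk was tagged and all tags match `requested`."""
    return models_in_stream(sse_lines) == {requested}
```

Fed the chunks described above (tagged `qwen2.5-coder-1.5b-q4_k_m.gguf`) with `mistral-nemo-12b-instruct` as the requested model, `served_by_requested` reports a mismatch.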
Root cause (not the framing)
The `model` field isn't ignored. It's looked up against each backend's `/v1/models`, and when no match is found, the proxy silently falls back to the "first-up" backend. That's why every unknown-model request landed on the loaded fast model (coder, which is the only currently-running 1.5B-class slot; nemo isn't running).
Boltzmann aggregator confirmed (`/home/mfritsche/npu/llm-proxy.py`, `do_POST`):
Same pattern in hossenfelder's `_proxy_local`: unmatched `model` falls through to "Legacy first-up failover" instead of erroring.
Fix
Both proxies patched 2026-05-12. New behavior:
- `model` named and a backend offers it → routed correctly (unchanged)
- `model` named and no backend offers it → HTTP 404 with `{"error":{"type":"model_not_found","message":"...","available":[...]}}`
- No `model` named → first-up default (unchanged, for unscoped requests)
Verification
Regression checks pass: `qwen3-30b-a3b-instruct` (local) and `anthropic/claude-haiku-4.5` (cloud) both route correctly and return their actual model names in the response.
aish-side implication
`:model deep` (alias for `mistral-nemo-12b-instruct`) will now return a clean 404 with the available list instead of silently degrading to fast. aish can surface that to the user. Phase 2 nemo-template testing (Q18) is still blocked until the nemo slot itself is brought back up: the proxy fix doesn't conjure the model into existence, it just stops lying about which model served the request.
Suggest closing once verified end-to-end through aish.
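On the aish side, surfacing the new 404 could look roughly like the sketch below. It relies only on the error shape quoted in this issue (`error.type`, `error.message`, `error.available`); the function name `explain_model_not_found` and the wording of the message are hypothetical, and this is not aish's actual code.

```python
import json
from typing import Optional


def explain_model_not_found(status: int, body: str) -> Optional[str]:
    """Return a user-facing message for a model_not_found 404, else None."""
    if status != 404:
        return None
    try:
        err = json.loads(body).get("error", {})
    except (json.JSONDecodeError, AttributeError):
        return None  # body is not JSON, or not a JSON object
    if not isinstance(err, dict) or err.get("type") != "model_not_found":
        return None  # some other 404; let the caller handle it generically
    available = err.get("available") or []
    listing = ", ".join(str(m) for m in available) or "(none reported)"
    return f"{err.get('message', 'model not found')} Available: {listing}"
```

Returning `None` for anything that is not a `model_not_found` error keeps this helper narrow, so other failures still go through whatever generic error path the client already has.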
Patched: boltzmann `/home/mfritsche/npu/llm-proxy.py`, hossenfelder `/opt/llm-proxy.py`. 2026-05-12.
Verified 2026-05-16: the hossenfelder broker now respects the request `model` field.
Distinct providers / backends per request, plus a clean `model_not_found` error for unavailable names (rather than silent misroute). Closing as fixed broker-side.
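For reference, the post-fix routing rule (route on a match, 404 on an unknown name, first-up default only when no model is named) can be sketched as follows. This illustrates the rule as described in this issue and is not the code in either `llm-proxy.py`. The names `pick_backend` and `ModelNotFound` are hypothetical, and `backends` is assumed to be a non-empty mapping of running backends to the model ids they offer, in failover order.

```python
from typing import Dict, List, Optional


class ModelNotFound(Exception):
    """Raised when a named model is offered by no running backend."""

    def __init__(self, model: str, available: List[str]):
        super().__init__(f"no backend offers model {model!r}")
        self.status = 404
        self.body = {
            "error": {
                "type": "model_not_found",
                "message": str(self),
                "available": available,
            }
        }


def pick_backend(model: Optional[str], backends: Dict[str, List[str]]) -> str:
    """Return the name of the backend that should serve the request."""
    if not model:
        # Unscoped request: keep the first-up default.
        return next(iter(backends))
    for name, offered in backends.items():
        if model in offered:
            return name
    # Named but unknown: fail loudly instead of silently falling back.
    available = sorted({m for offered in backends.values() for m in offered})
    raise ModelNotFound(model, available)
```

The key design point is that the fallback branch is reachable only when the request names no model at all, so an unknown name can no longer be answered by whichever backend happens to be up.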