boltzmann proxy: model field ignored — all requests routed to loaded fast model #23

Closed
opened 2026-05-12 12:55:48 +00:00 by claude-noether · 2 comments
Collaborator

What

The boltzmann LLM proxy at http://hossenfelder.fritz.box:8082/v1/chat/completions echoes the loaded model name in response chunks regardless of the model field in the request body. Requests for mistral-nemo-12b-instruct, deep, cloud, etc. all return chunks tagged "model":"qwen2.5-coder-1.5b-q4_k_m.gguf".

Reproduction

curl -sS -X POST -H 'Content-Type: application/json' \
  -d '{"model":"mistral-nemo-12b-instruct","messages":[{"role":"user","content":"hi"}],"stream":false}' \
  http://hossenfelder.fritz.box:8082/v1/chat/completions | python3 -m json.tool

Response:

{
    "choices": [...],
    "model": "qwen2.5-coder-1.5b-q4_k_m.gguf",   <-- not what was requested
    ...
}

Same result for :model deep and :model cloud in aish — every request lands on the loaded fast model.
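The same check can be scripted so it covers several model names in one run. The following is a minimal sketch, not part of aish or the proxy: the helper names (`build_payload`, `is_misrouted`, `probe`) are made up for illustration, and it assumes the proxy returns an OpenAI-style JSON body with a top-level `model` field, as in the response above.

```python
import json
import urllib.request

# Endpoint taken from the report above.
URL = "http://hossenfelder.fritz.box:8082/v1/chat/completions"


def build_payload(model):
    """Build the same non-streaming request body as the curl reproduction."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": "hi"}],
        "stream": False,
    }


def is_misrouted(requested, response_body):
    """Return True when the response names a different model than requested.

    Exact string equality is an assumption of this sketch; a backend may
    report a more specific name than the one requested.
    """
    served = response_body.get("model")
    return served is not None and served != requested


def probe(model, url=URL, timeout=30.0):
    """POST one chat completion request and return the decoded JSON body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `is_misrouted(name, probe(name))` for each name of interest flags candidates for a closer look. A mismatch is a hint rather than proof, since it only compares the two strings.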

Impact on aish

  • :model deep / :model cloud switches do nothing functionally — every request answers from the fast model.
  • Phase 2 (MCP, tracked in docs/PHASE2.md) cannot verify mistral-nemo's strict chat-template behavior for role:"tool" turns (Q18) until this is fixed. Currently parked in docs/PHASE2-baseline.md §6 row 2 and docs/PHASE2.md §11 open-end.

Where the fix lives

Not in this repo — the boltzmann proxy is the Python BaseHTTPRequestHandler shim on hossenfelder.fritz.box that fronts llama.cpp + OpenRouter. Same component as #15 (SSE buffering). Filing here for aish-side traceability; the actual fix lands in the boltzmann proxy code.

Likely cause

The proxy probably builds its forwarded request body from a hard-coded model name (the loaded one) rather than passing the client's model field through. Alternatively, the upstream llama.cpp ignores model and the proxy never uses the field to choose a backend.

Side finding context

Surfaced during Phase 2 analyze probes (2026-05-12). Reproduces every time. Not a flake.

claude-noether added the bug label 2026-05-12 12:55:48 +00:00
Author
Collaborator

Root cause (not the framing)

The model field isn't ignored: it's looked up against each backend's /v1/models, and when no match is found, the proxy silently falls back to the "first-up" backend. That's why every unknown-model request landed on the loaded fast model (coder, which is the only currently running 1.5B-class slot; nemo isn't running).

Boltzmann aggregator confirmed (/home/mfritsche/npu/llm-proxy.py, do_POST):

if model:
    target = find_backend_for_model(model)
# ...
if target is None:
    for b in BACKENDS:
        if fetch_models(b) is not None:
            target = b  # ← silent fallback
            break

Same pattern in hossenfelder's _proxy_local: unmatched model falls through to "Legacy first-up failover" instead of erroring.

Fix

Both proxies patched 2026-05-12. New behavior:

  • model named and a backend offers it → routed correctly (unchanged)
  • model named and no backend offers it → HTTP 404 with {"error":{"type":"model_not_found","message":"...","available":[...]}}
  • No model named → first-up default (unchanged, for unscoped requests)
  • All backends down → 502 (unchanged)
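The four rules above can be written as one routing decision. This is a sketch under stated assumptions, not the patched code: `route` and the `backends` mapping are hypothetical stand-ins for `find_backend_for_model` / `fetch_models`, the 502 body is invented for illustration, and checking "all backends down" before the model lookup is a guess at precedence.

```python
from typing import Dict, List, Optional, Tuple


def route(model: Optional[str],
          backends: Dict[str, Optional[List[str]]]) -> Tuple[int, object]:
    """Decide where a request goes.

    `backends` maps a backend name to the model IDs it offers, or to None
    when that backend is down. Returns (status, payload): on success the
    payload is the chosen backend name, otherwise an error body.
    """
    # Keep only backends that answered their model listing.
    up = {name: models for name, models in backends.items() if models is not None}

    if not up:
        # Hypothetical body; the thread only states that the status is 502.
        return 502, {"error": {"type": "no_backend", "message": "all backends down"}}

    if not model:
        # Unscoped request: first-up default, as before the fix.
        return 200, next(iter(up))

    for name, models in up.items():
        if model in models:
            return 200, name

    # Named model that nobody offers: fail loudly instead of falling back.
    available = sorted({m for models in up.values() for m in models})
    return 404, {
        "error": {
            "type": "model_not_found",
            "message": f"model '{model}' not available on any local backend",
            "available": available,
        }
    }
```

The important design point is the last branch: a named but unknown model is treated as a client error, so the first-up default only applies when the request did not ask for anything specific.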

Verification

$ curl -sS -X POST -H 'Content-Type: application/json' \
    -d '{"model":"mistral-nemo-12b-instruct","messages":[{"role":"user","content":"hi"}],"max_tokens":5}' \
    http://hossenfelder.fritz.box:8082/v1/chat/completions
HTTP/1.1 404
{"error":{"message":"model 'mistral-nemo-12b-instruct' not available on any local backend",
          "type":"model_not_found",
          "available":["qwen2.5-coder-1.5b-q4_k_m.gguf","qwen3-30b-a3b-instruct"]}}

Regression checks pass: qwen3-30b-a3b-instruct (local) and anthropic/claude-haiku-4.5 (cloud) both route correctly and return their actual model names in the response.

aish-side implication

:model deep (alias for mistral-nemo-12b-instruct) will now return a clean 404 with the available list instead of silently degrading to fast. aish can surface that to the user. Phase 2 nemo-template testing (Q18) still blocked until the nemo slot itself is brought back up — the proxy fix doesn't conjure the model into existence, just stops lying about which model served the request.
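One way aish could surface the new error is to turn the 404 body into a one-line message. This is a rough sketch, assuming the error shape shown under Verification; `describe_proxy_error` is a hypothetical helper, not an existing aish function.

```python
import json


def describe_proxy_error(status, body):
    """Turn a proxy error response into a short user-facing message.

    `status` is the HTTP status code and `body` the raw response text.
    Anything that is not a model_not_found error gets a generic message.
    """
    try:
        err = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        err = {}
    if not isinstance(err, dict):
        err = {}

    if status == 404 and err.get("type") == "model_not_found":
        available = ", ".join(err.get("available", [])) or "none reported"
        message = err.get("message", "model not found")
        return f"{message}. Available: {available}"

    return f"proxy returned HTTP {status}"
```

Showing the `available` list lets the user pick a working model right away instead of guessing why the alias failed.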

Suggest closing once verified end-to-end through aish.


Patched: boltzmann /home/mfritsche/npu/llm-proxy.py, hossenfelder /opt/llm-proxy.py. 2026-05-12.

Author
Collaborator

Verified 2026-05-16: the hossenfelder broker now respects the request model field.

$ curl -d '{"model":"anthropic/claude-haiku-4.5","messages":[{"role":"user","content":"reply with the literal word: anthropic"}],"max_tokens":10,"stream":false}' http://hossenfelder.fritz.box:8082/v1/chat/completions

{"model":"anthropic/claude-4.5-haiku-20251001","provider":"Amazon Bedrock", ... "content":"anthropic" ...}

$ curl -d '{"model":"deepseek/deepseek-v3.2", ...}' ...
{"model":"deepseek/deepseek-v3.2-20251201","provider":"Baidu", ... "content":"deepseek" ...}

$ curl -d '{"model":"mistral-nemo-12b-instruct", ...}' ...
{"error": {"message": "model 'mistral-nemo-12b-instruct' not available on any local backend", "type": "model_not_found", "available": [...]}}

Distinct providers / backends per request, plus a clean model_not_found error for unavailable names (rather than silent misroute). Closing as fixed broker-side.
