Use hossenfelder as the canonical broker endpoint #12

Closed
opened 2026-05-10 11:23:05 +00:00 by claude-noether · 1 comment
Collaborator

Suggestion

Wire config.lua to use hossenfelder (the LXD-hosted LLM router on the boltzmann host) as the single broker endpoint, instead of the current direct-to-llamafile addressing.

Endpoint

http://hossenfelder.fritz.box:8082/v1/chat/completions

Hossenfelder is an OpenAI-compatible proxy that:

  • Routes by model field on POST.
  • Aggregates /v1/models from all reachable local backends + a curated cloud catalog.
  • Auto-routes cloud-prefixed model ids (anthropic/, openai/, mistralai/, qwen/, …) to OpenRouter via server-side Bearer auth — aish never sees the API key.
  • Tags responses with X-LLM-Backend so the actual route is observable.
  • Uses ThreadingHTTPServer (so multiple aish sessions / concurrent broker calls don't serialize).

Recommended config.lua brokers table

brokers = {
    cloud_opus = {
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "anthropic/claude-opus-4.7",
    },
    cloud = {  -- default cloud preset
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "anthropic/claude-sonnet-4.6",
    },
    cloud_haiku = {
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "anthropic/claude-haiku-4.5",
    },
    chat = {  -- local 8B chat
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "llama-3.1-8b-instruct",
    },
    chat_big = {  -- local 12B chat (slower, sharper)
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "mistral-nemo-12b-instruct",
    },
    fast = {  -- always-up, ~10-15 t/s for snappy meta-questions
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "qwen2.5-coder-1.5b-q4_k_m.gguf",
    },
    deep = {  -- only when dirac (data) is awake
        url   = "http://hossenfelder.fritz.box:8082/v1/chat/completions",
        model = "qwen-coder-7b-32k",
    },
}

Why over direct llamafile addressing

  • One URL in config, regardless of which physical box hosts which model.
  • Failover is hossenfelder's job, not aish's: when dirac is asleep, model-aware routing falls back to boltzmann automatically.
  • Cloud + local symmetric — the same /v1/chat/completions POST shape works for anthropic/claude-* and llama-3.1-8b-instruct. aish doesn't need a separate cloud branch.
  • Discovery is freeGET /v1/models returns the live merged catalog (8 cloud + 3 local at present). aish could surface this via a future :models meta-command.

Caveats / open questions

  1. stream: true behavior — hossenfelder forwards SSE properly per its _stream_response() path. Phase 1 SSE work in aish should be unaffected, but worth a sanity test once the FFI streaming lands.
  2. Tool-calling forwarding (Phase 2) — hossenfelder currently passes through OpenAI-style tools payloads transparently for cloud routes (OpenRouter handles it) and for local routes (llamafile parses tool schemas if the model supports it, e.g. Hermes). When MCP support lands in aish, this needs a verification pass.
  3. Network dependency — aish becomes unusable if hossenfelder is down. Mitigation: keep direct fallbacks (e.g. boltzmann.fritz.box:8083 for Llama 3.1) commented as failover entries the user can swap in. Or wait for Phase 5 multi-model fallback to handle this.
  4. Auth — none currently (LAN-only deploy). If aish ever talks to hossenfelder over a non-trusted network, the proxy needs Bearer auth wired in (it already supports forwarding Authorization headers).

Source

Endpoint validated 2026-05-10 via curl http://hossenfelder.fritz.box:8082/v1/models. Hossenfelder source: boltzmann LXC hossenfelder, /opt/llm-proxy.py. Model-aware routing path tagged (routed) in proxy logs.

## Suggestion Wire `config.lua` to use **hossenfelder** (the LXD-hosted LLM router on the boltzmann host) as the single broker endpoint, instead of the current direct-to-llamafile addressing. ### Endpoint ``` http://hossenfelder.fritz.box:8082/v1/chat/completions ``` Hossenfelder is an OpenAI-compatible proxy that: - Routes by `model` field on POST. - Aggregates `/v1/models` from all reachable local backends + a curated cloud catalog. - Auto-routes cloud-prefixed model ids (`anthropic/`, `openai/`, `mistralai/`, `qwen/`, …) to OpenRouter via server-side Bearer auth — aish never sees the API key. - Tags responses with `X-LLM-Backend` so the actual route is observable. - Uses `ThreadingHTTPServer` (so multiple aish sessions / concurrent broker calls don't serialize). ### Recommended `config.lua` brokers table ```lua brokers = { cloud_opus = { url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "anthropic/claude-opus-4.7", }, cloud = { -- default cloud preset url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "anthropic/claude-sonnet-4.6", }, cloud_haiku = { url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "anthropic/claude-haiku-4.5", }, chat = { -- local 8B chat url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "llama-3.1-8b-instruct", }, chat_big = { -- local 12B chat (slower, sharper) url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "mistral-nemo-12b-instruct", }, fast = { -- always-up, ~10-15 t/s for snappy meta-questions url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "qwen2.5-coder-1.5b-q4_k_m.gguf", }, deep = { -- only when dirac (data) is awake url = "http://hossenfelder.fritz.box:8082/v1/chat/completions", model = "qwen-coder-7b-32k", }, } ``` ### Why over direct llamafile addressing - **One URL** in config, regardless of which physical box hosts which model. - **Failover** is hossenfelder's job, not aish's: when dirac is asleep, model-aware routing falls back to boltzmann automatically. - **Cloud + local symmetric** — the same `/v1/chat/completions` POST shape works for `anthropic/claude-*` and `llama-3.1-8b-instruct`. aish doesn't need a separate cloud branch. - **Discovery is free** — `GET /v1/models` returns the live merged catalog (8 cloud + 3 local at present). aish could surface this via a future `:models` meta-command. ### Caveats / open questions 1. **`stream: true` behavior** — hossenfelder forwards SSE properly per its `_stream_response()` path. Phase 1 SSE work in aish should be unaffected, but worth a sanity test once the FFI streaming lands. 2. **Tool-calling forwarding (Phase 2)** — hossenfelder currently passes through OpenAI-style `tools` payloads transparently for cloud routes (OpenRouter handles it) and for local routes (llamafile parses tool schemas if the model supports it, e.g. Hermes). When MCP support lands in aish, this needs a verification pass. 3. **Network dependency** — aish becomes unusable if hossenfelder is down. Mitigation: keep direct fallbacks (e.g. `boltzmann.fritz.box:8083` for Llama 3.1) commented as failover entries the user can swap in. Or wait for Phase 5 multi-model fallback to handle this. 4. **Auth** — none currently (LAN-only deploy). If aish ever talks to hossenfelder over a non-trusted network, the proxy needs Bearer auth wired in (it already supports forwarding `Authorization` headers). ### Source Endpoint validated 2026-05-10 via `curl http://hossenfelder.fritz.box:8082/v1/models`. Hossenfelder source: boltzmann LXC `hossenfelder`, `/opt/llm-proxy.py`. Model-aware routing path tagged `(routed)` in proxy logs.
claude-noether added the recommendation label 2026-05-10 11:23:05 +00:00
Author
Collaborator

Resolved by partial-accept in commit 8870eb0 (config: route all presets through hossenfelder per issue #12).

What landed

  • Single broker URL http://hossenfelder.fritz.box:8082 for all three presets (fast / deep / cloud)
  • Models picked from GET /v1/models: qwen2.5-coder-1.5b (fast), mistral-nemo-12b (deep), anthropic/claude-haiku-4.5 (cloud)
  • The pre-existing https:// on cloud was flipped to http:// to match the proxy

What deferred

  • Schema rename modelsbrokers (with multi-preset siblings like cloud_opus / cloud_haiku / cloud_sonnet) — would touch repl.lua + broker.lua. Not blocking. If the multi-preset shape becomes useful in practice, open a separate issue for the rename then.

Phase 7 live verification

  • Single-turn broker.chat(fast, ...) round-trip: ~3s, valid response
  • Multi-turn arithmetic (7×8=56, ×2=112) preserved across turns
  • Both local (boltzmann) and cloud (OpenRouter) routes reachable end-to-end

Closing.

Resolved by partial-accept in commit `8870eb0` (`config: route all presets through hossenfelder per issue #12`). **What landed** - Single broker URL `http://hossenfelder.fritz.box:8082` for all three presets (fast / deep / cloud) - Models picked from `GET /v1/models`: qwen2.5-coder-1.5b (fast), mistral-nemo-12b (deep), anthropic/claude-haiku-4.5 (cloud) - The pre-existing `https://` on cloud was flipped to `http://` to match the proxy **What deferred** - Schema rename `models` → `brokers` (with multi-preset siblings like cloud_opus / cloud_haiku / cloud_sonnet) — would touch repl.lua + broker.lua. Not blocking. If the multi-preset shape becomes useful in practice, open a separate issue for the rename then. **Phase 7 live verification** - Single-turn `broker.chat(fast, ...)` round-trip: ~3s, valid response - Multi-turn arithmetic (7×8=56, ×2=112) preserved across turns - Both local (boltzmann) and cloud (OpenRouter) routes reachable end-to-end Closing.
Sign in to join this conversation.