GBNF grammar-sampling passthrough for llama.cpp routes (enforce CMD: / YES/NO output) #88

Closed
opened 2026-05-16 23:49:59 +00:00 by claude-noether · 0 comments
Collaborator

Motivation

llama.cpp's chat-completions endpoint accepts a grammar field (GBNF) that constrains the sampler to emit only conforming tokens. For small models this removes format drift from the output: CMD: <command> is enforced at the token level rather than through fragile prompt discipline.

OpenAI / OpenRouter routes do not support this field. Cloud servers either ignore it or reject the request, so the broker must handle both cases gracefully.

Proposal

broker.lua build_request widens to accept opts.grammar:

if opts.grammar then req.grammar = opts.grammar end
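
A minimal sketch of how the widened build_request could look. The function signature, the model_cfg.model field, and the temperature option are assumptions made for illustration; only the opts.grammar passthrough comes from the proposal.

```lua
-- Hypothetical sketch: the real build_request in broker.lua may take
-- different arguments and set more fields.
local function build_request(model_cfg, messages, opts)
    opts = opts or {}
    local req = {
        model = model_cfg.model,          -- assumed field name
        messages = messages,
        temperature = opts.temperature,   -- assumed option, nil when unset
    }
    -- Passthrough from the proposal: attach only a non-empty GBNF string.
    if type(opts.grammar) == "string" and opts.grammar ~= "" then
        req.grammar = opts.grammar
    end
    return req
end
```

Checking for a non-empty string, rather than any truthy value, keeps an accidental `true` or `""` from reaching the server.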

Per-class grammars in cfg.routing:

routing = {
    grammars = {
        code = [[root ::= "CMD: " [^\n]+ "\n"]],
        default = [[root ::= ("CMD: " [^\n]+ "\n") | [^\n]+ "\n"]],
        probe = [[root ::= ("YES" | "NO") "\n"]],
    },
}

When ask_ai (or a safety probe) hits a class with a grammar entry, opts.grammar is set to that GBNF string. The broker only passes the string through; llama.cpp's sampler applies the constraint, so it stays transparent to callers.
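
The lookup described above could be sketched as follows. The helper name grammar_for_class is hypothetical, and the sketch assumes that a class without an entry gets no grammar at all (no implicit fallback to the default entry).

```lua
-- Example config, reduced to two of the classes from the proposal.
local cfg = {
    routing = {
        grammars = {
            code = [[root ::= "CMD: " [^\n]+ "\n"]],
            probe = [[root ::= ("YES" | "NO") "\n"]],
        },
    },
}

-- Hypothetical helper: resolve the GBNF string for a request class.
-- Returns nil when routing, grammars, or the class entry is missing.
local function grammar_for_class(cfg, class)
    local routing = cfg.routing or {}
    local grammars = routing.grammars or {}
    return grammars[class]
end

-- What ask_ai might do before calling build_request.
local opts = {}
opts.grammar = grammar_for_class(cfg, "probe")
```

Lua long strings (`[[...]]`) do not process escapes, so `\n` reaches the server as the two characters GBNF expects.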

Risks

  • Cloud routes that reject unknown fields would return a 4xx. Either model_cfg carries a supports_grammar = true|false opt-in, or the grammar is set only when the class matches routing.local_grammar_classes. v1 = opt-in per class, to be safe.
  • A malformed grammar produces errors at request time; the broker should surface them clearly.
  • Grammars are model-tokenizer-specific in some edge cases (rare for GBNF; mostly portable).
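
The first two risks could be mitigated with a small gate in front of the passthrough. This is a sketch under assumptions: select_grammar is a hypothetical helper, supports_grammar is the flag suggested above, and the check is only a cheap sanity test for a root rule (the start symbol GBNF requires), not a real parser.

```lua
-- Hypothetical gate: attach a grammar only when the route opted in and the
-- string at least defines a "root" rule. Returns the grammar, or nil plus an
-- optional error message the broker can surface.
local function select_grammar(model_cfg, grammar)
    if not grammar then return nil end
    if not model_cfg.supports_grammar then
        return nil  -- cloud route or not opted in: send the request without it
    end
    -- Look for "root ::=" at the start of the string or at the start of a line.
    local has_root = grammar:find("^%s*root%s*::=")
        or grammar:find("\n%s*root%s*::=")
    if not has_root then
        return nil, "grammar has no 'root' rule"
    end
    return grammar
end
```

A grammar that passes this check can still be rejected by the server, so request-time errors still need to be reported.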

Estimate

~2 hours. Mostly the broker.lua passthrough plus the config schema. safety.is_destructive could also opt in: with routing.grammars.probe restricting output to YES or NO, the LLM probe becomes much more reliable on small models.
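
On the consuming side, a probe reply constrained to YES or NO becomes trivial to parse. The sketch below assumes a hypothetical parse_probe_reply helper; how safety.is_destructive maps the answer to a decision is not specified here.

```lua
-- Hypothetical parser for a probe reply constrained by ("YES" | "NO") "\n".
-- Returns true/false for a conforming reply, or nil plus a reason otherwise,
-- so callers can fail closed when the grammar was not applied.
local function parse_probe_reply(text)
    if type(text) ~= "string" then
        return nil, "no probe reply"
    end
    -- Accept an all-uppercase word with an optional trailing newline.
    local answer = text:match("^(%u+)\n?$")
    if answer == "YES" then return true end
    if answer == "NO" then return false end
    return nil, "unexpected probe reply: " .. text
end
```

Returning nil for anything else keeps an unconstrained cloud reply such as "Yes, because ..." from being mistaken for a clean answer.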

Reference

llama.cpp GBNF docs: https://github.com/ggml-org/llama.cpp/tree/master/grammars

Source

Architecture analysis (2026-05-16). Listed as "high" gain for CMD:-output reliability.

claude-noether added the feature request label 2026-05-16 23:49:59 +00:00