GBNF grammar-sampling passthrough for llama.cpp routes (enforce CMD: / YES/NO output) #88
Motivation
llama.cpp's chat-completions endpoint accepts a `grammar` field (GBNF) that constrains the sampler to emit only conforming tokens. For small models this eliminates format drift entirely: `CMD: <command>` is enforced at the token level rather than through fragile prompt discipline.
OpenAI / OpenRouter routes don't support this field; the parameter is harmless on cloud (servers either ignore it or reject it, so rejection must be handled gracefully).
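The "handle gracefully" case could look roughly like the sketch below. It assumes a hypothetical `send(body)` transport function that returns an HTTP status and a response, and it assumes that a route rejecting the field answers with status 400; neither is confirmed by the broker code.

```lua
-- Shallow-copy a request body, dropping the grammar field.
local function without_grammar(body)
  local copy = {}
  for k, v in pairs(body) do
    if k ~= "grammar" then copy[k] = v end
  end
  return copy
end

-- send is a hypothetical transport: send(body) -> status, response.
-- Assumption: a route that rejects the unknown field replies with HTTP 400.
local function send_with_grammar_fallback(send, body)
  local status, response = send(body)
  if status == 400 and body.grammar ~= nil then
    -- Retry once without the grammar; the output is then unconstrained.
    return send(without_grammar(body))
  end
  return status, response
end
```

Because the retry is unconstrained, callers that depend on the enforced format (such as a safety probe) would still need to validate the reply themselves.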
Proposal
`build_request` in `broker.lua` widens to accept `opts.grammar`, with per-class grammars configured in `cfg.routing`.
When `ask_ai` (or a safety probe) hits a class with a grammar entry, `opts.grammar` is set to that GBNF string. The grammar then transparently constrains the sampler on the broker side.
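A minimal sketch of the passthrough, under stated assumptions: the real `build_request` signature and the full `cfg.routing` schema are not shown in this issue, so the function below is a simplified stand-in, the `command` class name and the `local-small` model name are made up for illustration, and the issue's `"YES|NO"` shorthand is written out as a complete GBNF rule.

```lua
-- Hypothetical config; only routing.grammars.probe is named in this issue,
-- and the "command" class name is made up for illustration.
local cfg = {
  routing = {
    grammars = {
      -- One line: the literal "CMD: " followed by non-newline characters.
      command = 'root ::= "CMD: " [^\\n]+',
      -- Exactly YES or NO.
      probe = 'root ::= "YES" | "NO"',
    },
  },
}

-- Look up the GBNF string for a request class, or nil if none is configured.
local function grammar_for(class)
  local grammars = cfg.routing.grammars or {}
  return grammars[class]
end

-- Simplified stand-in for build_request: copies opts.grammar into the
-- request body only when it is set, so other routes see no extra field.
local function build_request(messages, opts)
  opts = opts or {}
  local body = { model = opts.model, messages = messages }
  if opts.grammar then
    body.grammar = opts.grammar
  end
  return body
end

-- Usage: a probe-class request picks up the YES/NO grammar.
local req = build_request(
  { { role = "user", content = "Is `rm -rf /` destructive?" } },
  { model = "local-small", grammar = grammar_for("probe") }
)
```

Classes without a grammar entry return `nil` from `grammar_for`, so their request bodies are unchanged.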
Risks
`supports_grammar = true|false` opt-in, OR the grammar is set only when the class matches `routing.local_grammar_classes`. v1 = opt-in per class to be safe.
Estimate
~2 hours. Mostly `broker.lua` passthrough + config schema. `safety.is_destructive` could also opt in (the LLM probe with `routing.grammars.probe = "YES|NO"` becomes much more reliable on small models).
Reference
llama.cpp GBNF docs.
Source
Architecture analysis (2026-05-16). Listed as "high" gain for CMD:-output reliability.