marfrit/aish - aish - marfrit's space

Author	SHA1	Message	Date
marfrit	74e4bffb37	broker + repl + safety: GBNF grammar-sampling passthrough (closes #88)
llama.cpp constrains the sampler to emit ONLY tokens matching a GBNF grammar. For small models this kills format drift at the token level: `CMD: <cmd>` is enforced by the sampler rather than hoped for via prompt discipline.
Probe finding (this commit's pre-implementation): cloud (Anthropic via Bedrock) silently IGNORES the `grammar` field and returns normally via standard sampling. Default passthrough is safe for all routes; no per-model opt-in/opt-out needed in v1.
Changes:
- broker.lua build_request: `if opts.grammar then req.grammar = opts.grammar end`. Malformed grammar surfaces at request time via the existing transport-error path.
- repl.lua ask_ai: `grammar_override = config.routing.grammars[req_class]` (same gating shape as #86's system_prompts override). Passed via opts.grammar in the call_broker invocation.
- safety.lua is_destructive threads cfg.safety.probe_grammar through opts.grammar so llm_probe constrains the YES/NO output. Skips the regex-match dance entirely when the model can't drift. Caller-provided opts.grammar takes precedence over cfg.
- config.lua gains two commented examples:
  * routing.grammars per class
  * safety.probe_grammar for the destructive probe
6 unit cases verified (stubbed curl.post_sse / broker.chat):
- default: no grammar in body
- opts.grammar -> body contains grammar JSON-encoded
- safety probe_grammar reaches llm_probe via opts
- no probe_grammar configured -> opts.grammar nil
- caller opts.grammar takes precedence over cfg.safety.probe_grammar
E2E against live local broker:
- `routing.grammars.default = "root ::= \\"ACK\\""` configured; prompted "tell me a long story about a fox" -> model output EXACTLY "ACK" (sampler forced; would normally produce paragraphs). Grammar passthrough end-to-end confirmed.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 07:00:36 +00:00
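The passthrough and precedence rule from commit 74e4bffb37 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the `if opts.grammar then ... end` line is quoted from the log. The request fields, the `probe_opts` helper, the model name, and the YES/NO grammar string are assumptions made for the example.

```lua
-- Hypothetical stand-in for broker.lua's build_request. Only the grammar
-- line follows the commit message; the other fields are illustrative.
local function build_request(model_cfg, msgs, stream, opts)
  opts = opts or {}
  local req = { model = model_cfg.model, messages = msgs, stream = stream }
  -- Passthrough: the field is added only when the caller supplied a grammar.
  if opts.grammar then req.grammar = opts.grammar end
  return req
end

-- Hypothetical helper showing the stated precedence for the safety probe:
-- a caller-provided opts.grammar wins over cfg.safety.probe_grammar.
local function probe_opts(cfg, opts)
  opts = opts or {}
  local safety = (cfg and cfg.safety) or {}
  return { grammar = opts.grammar or safety.probe_grammar }
end

-- Assumed example grammar; the log does not show the real probe grammar.
local cfg = { safety = { probe_grammar = 'root ::= "YES" | "NO"' } }
local model_cfg = { model = "example-model" } -- placeholder name
local msgs = { { role = "user", content = "rm -rf /tmp/x" } }

local plain = build_request(model_cfg, msgs, true, nil)
local probed = build_request(model_cfg, msgs, false, probe_opts(cfg, nil))
local forced = build_request(model_cfg, msgs, false,
  probe_opts(cfg, { grammar = 'root ::= "ACK"' }))

print(plain.grammar)  -- nil: no grammar field in the body by default
print(probed.grammar) -- root ::= "YES" | "NO"
print(forced.grammar) -- root ::= "ACK"
```

Because the field is simply absent when no grammar is configured, routes that ignore `grammar` see the same request body as before.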
marfrit	7ef2a6ed5c	broker: token_count + endpoint capability cache (Phase 8 commit #1)
Foundation for Phase 8: accurate tokenization via <endpoint>/tokenize where supported, char/4 fallback otherwise.
Changes:
- `M.token_count(model_cfg, text)`:
  * Empty text -> 0.
  * No endpoint -> char/4 immediately.
  * Capability cache says false -> char/4.
  * Otherwise -> POST `<endpoint>/tokenize` with `{content, model}`, 2s timeout. On 200 + parseable `{tokens=[...]}`: cache true, return #tokens. Anything else (non-200 / parse-fail / transport err / timeout): cache false, char/4.
- `_tokenize_capable` cache keyed by ENDPOINT ONLY per R6. B1 confirmed /tokenize ignores the model field, so same-endpoint presets share one cache entry. If a future broker honors the model field, revisit.
- `M.tokenize_supported(model_cfg)`: returns nil/true/false for the cached state (introspection for tests + future :tokenize meta).
- `M._reset_tokenize_cache()`: test hook so the session-local cache doesn't leak between test runs sharing a LuaJIT VM.
Live verified against hossenfelder + a deliberately-broken endpoint:
- "hello world" -> 2 tokens (matches manual curl probe)
- 901-char text -> 201 real tokens vs 225 char/4 (24-token gap; real is LOWER here, the opposite direction from the README probe where it was higher, which confirms the heuristic is inaccurate in both directions)
- Pre-probe: tokenize_supported() returns nil
- Post-probe: tokenize_supported() returns true (local) / false (broken)
- Broken endpoint second call: still char/4, no re-probe
- Empty / nil text edge cases handled
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 23:29:17 +00:00
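The decision flow of `token_count` in commit 7ef2a6ed5c can be sketched as below. This is a hedged illustration only. The HTTP call is replaced by an injected `probe` function (hypothetical), `model_cfg.endpoint` is an assumed field name, and the char/4 fallback rounds down, an assumption chosen because it matches the 901 -> 225 figure in the log.

```lua
local M = {}
local _tokenize_capable = {} -- endpoint -> true/false; absent = not probed yet

-- char/4 heuristic; flooring is an assumption (901 chars -> 225).
local function char4(text)
  return math.floor(#text / 4)
end

-- probe(endpoint, text, model) is a hypothetical stand-in for the POST to
-- <endpoint>/tokenize: it returns an array of tokens, or nil on any failure.
function M.token_count(model_cfg, text, probe)
  if text == nil or text == "" then return 0 end
  local endpoint = model_cfg and model_cfg.endpoint
  if not endpoint then return char4(text) end
  if _tokenize_capable[endpoint] == false then return char4(text) end
  local tokens = probe(endpoint, text, model_cfg.model)
  if type(tokens) == "table" then
    _tokenize_capable[endpoint] = true
    return #tokens
  end
  _tokenize_capable[endpoint] = false -- remember the failure, never re-probe
  return char4(text)
end

function M.tokenize_supported(model_cfg)
  return _tokenize_capable[model_cfg.endpoint]
end

function M._reset_tokenize_cache()
  _tokenize_capable = {}
end

-- Demo with fake probes (no network involved).
local calls = 0
local function broken_probe() calls = calls + 1; return nil end
local function ok_probe(_, text)
  local toks = {}
  for word in text:gmatch("%S+") do toks[#toks + 1] = word end
  return toks
end

local good = { endpoint = "http://good.example", model = "a" }
local bad = { endpoint = "http://bad.example", model = "b" }
local long_text = string.rep("x", 901)

print(M.tokenize_supported(good))                       -- nil before any probe
print(M.token_count(good, "hello world", ok_probe))     -- 2
print(M.tokenize_supported(good))                       -- true
print(M.token_count(bad, long_text, broken_probe))      -- 225 (fallback)
print(M.token_count(bad, long_text, broken_probe))      -- 225, no re-probe
print(calls)                                            -- 1
```

Keying the cache by endpoint alone means two presets that share an endpoint also share the probe result, which is the behavior the commit describes.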
marfrit	7364963b00	broker: usage capture + opts widening (Phase 7 commit #1)
Foundation for Phase 7. broker.chat_stream now emits a third on_delta kind ("usage") after the stream completes successfully; broker.chat returns (text, usage). Backward-compatible: existing callers that ignore the new kind / second value continue working via Lua's drop-extra-returns semantics.
Changes:
- build_request widens (A3 + R3): `(model_cfg, msgs, stream, opts)`. opts.tools / opts.max_tokens / opts.include_usage / opts.category all live inside opts now. Both internal call sites updated.
- opts.include_usage defaults to true for streaming requests; sets `stream_options: { include_usage: true }` in the request body. B1: required for local llama.cpp to emit usage; cloud honors as a no-op (emits anyway).
- on_event captures `doc.usage` into a closure-local `final_usage`. N1: the check is INDEPENDENT of the choice/delta branches. Local emits usage on choices=[] chunks (choice nil) while cloud emits with non-empty choices + finish_reason. Both shapes funnel here.
- After curl.post_sse returns successfully (NOT on transport/api errors), if final_usage is set, emit on_delta("usage", {prompt_tokens, completion_tokens, total_tokens, cost, model, category}). cost is nil for local (R6 preserves the nil vs 0 distinction the accumulator needs). model is model_cfg.model, caller-stable per B4 + R2, so call_broker's fallback retry attributes usage to the fallback's model name without wrapper-side tracking.
- M.chat (R1, BLOCKER fix): on_delta now also captures kind=="usage" alongside "text"; M.chat returns (text, usage). Without this fix 4 of 5 non-streaming categories (summarize / delegate / memory_summarize / probe) would silently report zero usage.
Smoke verified against live hossenfelder:8082:
- CLOUD chat -> (text, usage); cost=2.9e-05, model=anthropic/...
- LOCAL chat -> (text, usage); cost=NIL (correct per R6), model=qwen-coder-7b-snappy-8k
- CLOUD stream -> on_delta("usage", {...}) with category="test" echoed; model name caller-stable.
Regression: test_safety 87/87, test_router_model 31/31, repl loads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-16 22:57:14 +00:00
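A rough sketch of the usage capture in commit 7364963b00 follows. It illustrates the shape of the logic under assumptions, and is not the broker's implementation. The transport and JSON decoding are replaced by a list of already-decoded chunks (`docs`, hypothetical), the token counts are made-up values, and reading `cost` from `doc.usage.cost` is an assumption.

```lua
-- Simplified stand-in for chat_stream: iterates pre-decoded SSE chunks
-- instead of calling curl.post_sse.
local function chat_stream(model_cfg, docs, on_delta, opts)
  opts = opts or {}
  local final_usage
  for _, doc in ipairs(docs) do
    -- Usage check is independent of the choices branch, so both the
    -- "choices = {}" shape and the "choices + finish_reason" shape land here.
    if doc.usage then final_usage = doc.usage end
    local choice = doc.choices and doc.choices[1]
    local content = choice and choice.delta and choice.delta.content
    if type(content) == "string" and content ~= "" then
      on_delta("text", content)
    end
  end
  -- Emitted once, only after the stream finished without error.
  if final_usage then
    on_delta("usage", {
      prompt_tokens = final_usage.prompt_tokens,
      completion_tokens = final_usage.completion_tokens,
      total_tokens = final_usage.total_tokens,
      cost = final_usage.cost, -- stays nil (not 0) when absent
      model = model_cfg.model,
      category = opts.category,
    })
  end
  return true
end

-- Buffering wrapper: collects text and keeps the usage payload.
local function chat(model_cfg, docs, opts)
  local parts, usage = {}, nil
  local ok, err = chat_stream(model_cfg, docs, function(kind, payload)
    if kind == "text" then
      parts[#parts + 1] = payload
    elseif kind == "usage" then
      usage = payload
    end
  end, opts)
  if not ok then return nil, err end
  return table.concat(parts), usage
end

local local_docs = { -- local-style: usage arrives on a chunk with no choices
  { choices = { { delta = { content = "po" } } } },
  { choices = { { delta = { content = "ng" } } } },
  { choices = {}, usage = { prompt_tokens = 12, completion_tokens = 2, total_tokens = 14 } },
}
local cloud_docs = { -- cloud-style: usage rides on the finish_reason chunk
  { choices = { { delta = { content = "pong" } } } },
  { choices = { { delta = {}, finish_reason = "stop" } },
    usage = { prompt_tokens = 12, completion_tokens = 2, total_tokens = 14, cost = 2.9e-05 } },
}

local legacy = chat({ model = "local-model" }, local_docs) -- second value dropped
local text, usage = chat({ model = "local-model" }, local_docs, { category = "probe" })
local _, cloud_usage = chat({ model = "cloud-model" }, cloud_docs)
print(legacy, text, usage.total_tokens, usage.cost, cloud_usage.cost)
```

The `legacy` call shows why the change is backward-compatible: a caller that binds only one result never sees the usage table.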
marfrit	2abd5da3a6	safety: LLM second-opinion + session cache (Phase 3 commit #2)
Phase 3 commit #2 per docs/PHASE3.md §12. Adds the LLM-probe gate on top of commit #1's static patterns. Together they form is_destructive.
broker.lua extension:
- opts.max_tokens (A2): passed through to the request body. Phase 3 probes cap at 4 tokens for YES/NO replies.
- opts.timeout_ms: overrides model_cfg.timeout_ms per-call. Probe uses 15000ms cap regardless of the model's normal timeout (the user's deep model has 1800000ms for long generations; the probe must stay snappy).
- M.chat now accepts an opts table (same shape as chat_stream's). Backwards compatible: existing callers passing (cfg, msgs) unaffected.
safety.lua additions:
- llm_probe(cfg, system, cmd): single broker.chat call returning "YES"/"NO"/"YES_FAILSAFE"/"YES_UNPARSEABLE", with fail-safe defaults.
- llm_second_opinion(cmd, cfg): two-probe protocol per R-B2.
  * Probe 1: "Is this destructive?" YES → flag.
  * Probe 2 (only if probe 1 said NO): "Is this safe?" inverted question. NO → flag (disagreement = HALT).
  * Both NO → safe.
- Session-scoped cache _llm_cache keyed by normalized command (lowercased + whitespace-collapsed). Mitigates Q23 latency for repeated commands within a Norris run.
- Model-selection precedence: cfg.safety.llm_model (explicit) → cfg.models.deep (independent local class) → cfg.models[default]. Fail-safe YES if none configured.
- is_destructive(cmd, cfg): runs static patterns first (always), then LLM if cfg present + not explicitly opted-out. cfg=nil yields static-only mode (handy for tests).
End-to-end verified against hossenfelder using qwen-coder-7b-32k as the deep probe (qwen3-30b-a3b-instruct in repo's config.lua isn't currently loaded on the local backend):
  cat /etc/hostname → hit=false (LLM: NO, NO inverted = safe)
  rm /tmp/x.log → hit=true (LLM flagged; static missed because no -r/-f flags)
  cp /etc/passwd /tmp/passwd.bak → hit=false (safe copy)
  cache: second probe on same cmd → 0s wall time
  static-only (cfg=nil): rm -rf /tmp/x → static hit, no LLM call
  opt-out (llm_second_opinion=false): cp x y → hit=false, no probe
Test corpus (test_safety.lua, 87 cases) still all pass; cfg=nil preserves the static-only behavior.
Note: production config.lua currently has `deep = qwen3-30b-a3b-instruct` which isn't loaded on the proxy backend right now; Norris users will hit the fail-safe (everything flagged destructive) until either the deep model is brought up OR cfg.safety.llm_model = "cloud" is set to route the probe through anthropic/claude-haiku-4.5. Update the config or model deployment for production use (covered by Phase 3 verify test case).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 23:36:06 +00:00
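One way to read the two-probe protocol and session cache from commit 2abd5da3a6 is sketched below. Several parts are assumptions: the model call is replaced by an injected `ask(system, cmd)` function, the prompt wording is invented, the single static pattern is illustrative only, and the second probe is interpreted on its raw answer (anything other than a plain "YES" to "is this safe?" flags the command).

```lua
local _llm_cache = {} -- normalized command -> boolean verdict

-- Lowercase and collapse whitespace runs, as the log describes.
local function normalize(cmd)
  return (cmd:lower():gsub("%s+", " "))
end

-- Illustrative static check only; the real pattern set is not in the log.
local function static_hit(cmd)
  return cmd:find("rm%s+%-[rf]+") ~= nil
end

-- ask(system, cmd) is a hypothetical probe returning "YES", "NO", or a
-- fail-safe variant such as "YES_FAILSAFE".
local function llm_second_opinion(cmd, ask)
  local key = normalize(cmd)
  if _llm_cache[key] ~= nil then return _llm_cache[key] end
  local destructive
  if ask("Is this command destructive? Answer YES or NO.", cmd) ~= "NO" then
    destructive = true  -- any YES flavor flags the command
  elseif ask("Is this command safe? Answer YES or NO.", cmd) ~= "YES" then
    destructive = true  -- the two probes disagree: halt
  else
    destructive = false
  end
  _llm_cache[key] = destructive
  return destructive
end

local function is_destructive(cmd, cfg, ask)
  if static_hit(cmd) then return true end -- static patterns always run first
  if cfg == nil then return false end     -- static-only mode
  if cfg.safety and cfg.safety.llm_second_opinion == false then
    return false                          -- explicit opt-out
  end
  return llm_second_opinion(cmd, ask)
end

-- Scripted fake model that counts how often it is consulted.
local calls = 0
local function ask(system, cmd)
  calls = calls + 1
  local risky = cmd:find("^rm ") ~= nil
  if system:find("destructive") then return risky and "YES" or "NO" end
  return risky and "NO" or "YES"
end

print(is_destructive("cat /etc/hostname", {}, ask))  -- false (two probes)
print(is_destructive("rm /tmp/x.log", {}, ask))      -- true  (one probe)
print(is_destructive("rm    /tmp/x.log", {}, ask))   -- true  (cache hit)
print(is_destructive("rm -rf /tmp/x", nil, ask))     -- true  (static, no probe)
print(is_destructive("cp x y",
  { safety = { llm_second_opinion = false } }, ask)) -- false (opt-out)
print(calls)                                         -- 3
```

The cache stores verdicts for both outcomes, so a repeated safe command costs no additional probes either.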
marfrit	efdc7281c7	broker: opts.tools passthrough + streaming tool_call accumulator
Phase 2 commit #5 per docs/PHASE2.md §12. Streaming broker grows tool-call support without taking a dependency on mcp.lua (caller supplies the tools array; B5 from review).
chat_stream signature widens to (cfg, msgs, on_delta, opts):
  opts.tools - optional array, passed to the request body as the OpenAI-shape tools field. OMITTED entirely when nil or empty (#tools == 0), because some servers reject "tools": [].
on_delta callback shape widens to (kind, payload):
  kind = "text", payload = string (Phase 1 path; unchanged semantics, signature changes from (delta) to ("text", delta))
  kind = "tool_call", payload = {id, name, arguments}, emitted ONCE per call on finish_reason "tool_calls" after the streaming accumulator pulls fragmented JSON-string arguments together.
Accumulator behavior:
- Keyed by delta.tool_calls[i].index.
- If index is absent on a delta (some llama.cpp builds omit it on single-call streams; C2 in review), default to 0 with a one-shot stderr debug status per stream.
- id and name captured from the opening delta of each slot.
- function.arguments concatenated across all deltas as the raw JSON-string; caller (repl.lua / future Phase 2 commit #6) does dkjson.decode.
- On finish_reason "tool_calls" the accumulator emits all collected calls in index order and resets.
M.chat external contract unchanged (C1): wrapper now uses the new (kind, payload) shape internally but exposes the same text-string return. No caller of M.chat passes opts.tools so tool_call kinds are silently dropped.
repl.lua minimal companion edit: ask_ai's chat_stream callback updated to the new shape. Text path unchanged; tool_call kinds are no-op placeholders until commit #6 lands the sub-loop. Keeps Phase 1 streaming functional between #5 and #6.
Smoke-tested against hossenfelder/8082 (post-#23 fix):
- text-only: ok=true, kind="text" deltas received
- with opts.tools: model emitted one tool_call, accumulator collected id + name=get_weather + args={"city":"Paris"} correctly across fragmented deltas
- opts.tools={}: server accepted (field omitted as required)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 14:20:32 +00:00
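The accumulator behavior listed in commit efdc7281c7 could look roughly like this. It is a sketch under assumptions: `new_accumulator` and `feed` are hypothetical names, the input is already-decoded `choices[1]` tables, and the call id in the demo is invented. The `get_weather` name and `{"city":"Paris"}` arguments are taken from the smoke test in the log.

```lua
-- Hypothetical accumulator: collects fragmented tool_call deltas per index
-- and emits each call once when finish_reason == "tool_calls".
local function new_accumulator(on_delta)
  local slots, max_index, warned = {}, -1, false
  local acc = {}

  function acc.feed(choice)
    local delta = choice.delta or {}
    for _, tc in ipairs(delta.tool_calls or {}) do
      local idx = tc.index
      if idx == nil then
        idx = 0 -- some servers omit index on single-call streams
        if not warned then
          io.stderr:write("tool_call delta without index; assuming 0\n")
          warned = true -- one-shot notice per stream
        end
      end
      local fn = tc["function"] -- "function" is a reserved word in Lua
      local slot = slots[idx]
      if not slot then
        -- id and name come from the opening delta of the slot
        slot = { id = tc.id, name = fn and fn.name, arguments = "" }
        slots[idx] = slot
        if idx > max_index then max_index = idx end
      end
      if fn and fn.arguments then
        -- arguments stay a raw JSON string; decoding is left to the caller
        slot.arguments = slot.arguments .. fn.arguments
      end
    end
    if choice.finish_reason == "tool_calls" then
      for i = 0, max_index do
        if slots[i] then on_delta("tool_call", slots[i]) end
      end
      slots, max_index = {}, -1 -- reset for the next round
    end
  end

  return acc
end

-- Demo: one call whose arguments arrive in two fragments.
local emitted = {}
local acc = new_accumulator(function(kind, payload)
  emitted[#emitted + 1] = { kind = kind, payload = payload }
end)

acc.feed({ delta = { tool_calls = { { index = 0, id = "call_1",
  ["function"] = { name = "get_weather", arguments = "" } } } } })
acc.feed({ delta = { tool_calls = { { index = 0,
  ["function"] = { arguments = '{"city":' } } } } })
acc.feed({ delta = { tool_calls = { { index = 0,
  ["function"] = { arguments = '"Paris"}' } } } } })
acc.feed({ delta = {}, finish_reason = "tool_calls" })

print(#emitted, emitted[1].kind, emitted[1].payload.name)
print(emitted[1].payload.arguments) -- {"city":"Paris"}
```

Emitting only on `finish_reason` means the caller never sees a half-assembled arguments string.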
marfrit	e46a5c385d	broker: chat_stream over post_sse; chat is now a buffering wrapper
Phase 1 streaming consumer per PHASE1.md §3.
  broker.chat_stream(model_cfg, messages, on_delta) -> true | (nil, err)
  broker.chat(model_cfg, messages) -> content | (nil, err) (now a thin buffer over chat_stream)
The HTTP shape unifies on stream:true. on_event from ffi/curl.post_sse decodes each event's JSON, extracts choices[1].delta.content, and calls on_delta(content) for non-empty string deltas. The `[DONE]` sentinel is filtered. SSE-framed error envelopes ({"error":{"message":...}} arriving as data:) surface as "api: ..." errors.
build_request is factored out so chat_stream and (future) any non-streaming consumer share URL/body/header construction.
Live verification against hossenfelder fast preset:
- chat_stream("Count one to five..."): 9 incremental deltas streamed token-by-token, assembled to "1 2 3 4 5"
- chat("Reply with exactly: pong"): "pong" returned via buffer
Error envelope path is correct by inspection but not exercised live, because hossenfelder passes through bogus model names rather than rejecting them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 19:16:07 +00:00
marfrit	f9f8b0370c	broker: blocking POST /v1/chat/completions via ffi/curl + dkjson
Phase 0 implementation per PHASE0.md §6.
  M.chat(model_cfg, messages) -> content_string | (nil, errmsg)
Builds the OpenAI-compat JSON body: { model, messages, stream: false, temperature: model_cfg.temperature ?? 0.2 }
Sends Content-Type and (optionally) Authorization Bearer pulled from model_cfg.key_env's process environment. Default timeout 60s; overridable per-model via model_cfg.timeout_ms.
Error surfaces split:
  "transport: ..." curl-side (TCP/TLS/timeout)
  "decode: ..." non-JSON response body
  "api: ..." OpenAI-style { error: { message } } envelope
  "broker.chat: no choices[1].message.content..." shape miss
Tested against four canned mock responses (nc -lN listener feeding HTTP/1.0 + Connection: close so EOF terminates the body): happy path, api error envelope, raw-text non-JSON, empty choices[]. The on-wire request body was verified as well: POST path, headers, model/messages/temperature/stream JSON.
Live test against a real llama.cpp/hossenfelder endpoint deferred per issue #12 (broker endpoint configuration).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-10 14:10:00 +00:00
claude-noether	4310207738	Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md: full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b snappy/32k + cloud via OpenRouter through hossenfelder)
File names match docs/PHASE0.md §4 exactly. Module bodies fill in across later phases; the tree shape is locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-09 23:16:07 +00:00

8 Commits