test-case: streaming visibility — tokens arrive incrementally #15
Steps
cd ~/src/aish && luajit main.lua
At the [aish:fast]> prompt, paste:
Expected
[aish:fast]> prompt appears immediately after the final number.
What this exercises
Phase 1 §1 done-criteria #1 (streaming).
broker.chat_stream → renderer.assistant_delta → live stdout.
First invocation, fast model:
aish: loaded config from ./config.lua
[aish:fast]> :models
[aish] models (active: fast):
deep mistral-nemo-12b-instruct @ http://hossenfelder.fritz.box:8082
cloud anthropic/claude-haiku-4.5 @ http://hossenfelder.fritz.box:8082
[aish:fast]> Count from one to fifty, space-separated, no other words.
CMD: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
execute '1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50'? [y/N] n
The deep model presented its thinking too, but all at once.
Triage: this is an upstream proxy bug, not an aish bug.
Wire-level evidence
When the same request is run against the proxy directly with timestamped stdin, every SSE event arrives in a single batch:
All events are stamped t=8.551s on the receiving side, but the created timestamps inside the JSON span 1778535796..1778535804, so the upstream llama.cpp did generate incrementally over ~8 s; the proxy held all of it before flushing.
Response headers
Content-Length on a text/event-stream body is the smoking gun: Python's BaseHTTPRequestHandler is collecting the entire upstream stream into a buffer, then sending the response with a fixed Content-Length. SSE is supposed to be chunked transfer-encoding or close-on-EOF with progressive wfile.write() + flush().
aish-side verification
ffi/curl.post_sse parses SSE events as they arrive via WRITEFUNCTION; confirmed correct (if there were a client-side buffer, the fast model would have arrived in one batch too over the same proxy round-trip).
broker.chat_stream decodes per-event and invokes on_delta immediately.
renderer.assistant_delta calls io.write() + io.flush() per delta.
Nothing on the aish side accumulates.
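The wire-level check can be repeated without aish in the loop. The following is a minimal sketch, not the script used for the evidence above: it stamps each SSE `data:` line with its arrival time and compares the arrival spread against the `created` spread inside the payloads. The function names, the `[DONE]` sentinel, and the 0.5 s tolerance are assumptions for illustration; only the `created` field comes from the observed responses.

```python
import json
import time
from typing import Callable, Iterable, List, Tuple


def stamp_sse_events(
    lines: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[float, int]]:
    """Return (seconds since start, `created` value) for each SSE data event."""
    start = clock()
    stamped = []
    for raw in lines:
        line = raw.decode("utf-8", errors="replace").strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream sentinel
            break
        event = json.loads(payload)
        stamped.append((clock() - start, int(event["created"])))
    return stamped


def looks_batched(stamped: List[Tuple[float, int]], tolerance_s: float = 0.5) -> bool:
    """True when generation spanned seconds but arrival was near-instant."""
    if len(stamped) < 2:
        return False
    arrival_span = stamped[-1][0] - stamped[0][0]
    created_span = stamped[-1][1] - stamped[0][1]
    return created_span >= 1 and arrival_span < tolerance_s
```

Any line iterator over the response body can be passed as `lines`, for example the response object returned by `urllib.request.urlopen`. A buffering proxy shows up as a large `created` span with an arrival span near zero.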
Side finding (separate)
The response chunks come back with "model":"qwen2.5-coder-1.5b-q4_k_m.gguf" even when the request specified mistral-nemo-12b-instruct. Either the proxy is routing all traffic to the loaded model regardless of the model field, or only one model was loaded at probe time. Worth a separate issue against the boltzmann proxy.
Resolution
Fix lives in the boltzmann proxy on hossenfelder (Python BaseHTTPRequestHandler SSE handler: switch to chunked transfer-encoding, drop Content-Length, flush after every data:\n\n write). Not actionable in this repo.
Proposing this issue be re-tagged from test-case to bug and either (a) re-scoped as a tracking issue for the proxy fix landing, or (b) closed here and re-filed against the boltzmann repo. Your call.
Now seeing streamed responses.
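For reference, the proxy-side change named in the resolution (chunked transfer-encoding, no Content-Length, flush after every event) could look roughly like the sketch below. It is not the boltzmann proxy's actual code: `StreamingSSEHandler` and `upstream_events` are hypothetical names, and the upstream stream is faked with a generator.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Iterator


def upstream_events() -> Iterator[bytes]:
    """Hypothetical stand-in for events read incrementally from llama.cpp."""
    for n in range(3):
        yield b'data: {"n": %d}\n\n' % n
        time.sleep(0.05)
    yield b"data: [DONE]\n\n"


class StreamingSSEHandler(BaseHTTPRequestHandler):
    # Chunked transfer-encoding only exists in HTTP/1.1; the class default is 1.0.
    protocol_version = "HTTP/1.1"

    def do_POST(self) -> None:
        # Drain the request body so a keep-alive connection stays in sync.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)

        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("Transfer-Encoding", "chunked")  # no Content-Length
        self.end_headers()

        # Forward each event as its own chunk the moment it is available.
        for event in upstream_events():
            self._write_chunk(event)
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the body
        self.wfile.flush()

    def _write_chunk(self, data: bytes) -> None:
        # Chunk framing: hex length, CRLF, payload, CRLF.
        self.wfile.write(b"%X\r\n%s\r\n" % (len(data), data))
        self.wfile.flush()

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


if __name__ == "__main__":
    # Port 8082 matches the proxy address shown in the :models output.
    ThreadingHTTPServer(("127.0.0.1", 8082), StreamingSSEHandler).serve_forever()
```

The key difference from the buffering behaviour is that nothing is accumulated between reading an upstream event and writing it to the client, so the client's per-event handling sees tokens as they are generated.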