Concurrent handler dispatch (follow-up to #16) #20

Closed
opened 2026-05-17 16:54:30 +00:00 by claude-noether · 3 comments
Collaborator

Follow-up to #16. The Streamable HTTP rewrite delivered I/O concurrency (persistent SSE, sessions, multiplexed slow clients) but NOT handler concurrency. A synchronous tool handler (e.g. shell running sleep 3) blocks the single-threaded Lua event loop for its full duration; concurrent fast POSTs wait behind it.

Phase 7 of #16 measured this:

fast (ping): 4.3s   (target ~0s)
slow (sleep 3): 4.5s

Why now is the right time

Without this, the otherwise-good transport rewrite has a sharp edge: clients may open multiple connections expecting parallel work but get strict serialisation. A user with a long web_search running can't even ping the server.

Options (all substantial)

  1. Worker subprocesses. Each tools/call forks a child; main loop awaits SIGCHLD via signalfd / pipe. Heavy on Linux, awkward on Windows. Best isolation.
  2. Coroutine-yielding handlers. Wrap each handler in a coroutine; every blocking call (io.popen, the run() polling helper in server.lua, every curl shell-out) becomes a yield point. Invasive — every existing tool handler in server.lua needs auditing.
  3. Non-blocking run() helper. Replace the file-based polling in server.lua:74-138 with a state machine that yields between sentinel checks. Doesn't require touching individual handlers, but os.execute is still synchronous.

Recommendation

Option 3 + option 2 hybrid for tools that already use run(). Specifically:

  • Refactor run() to return a coroutine-yieldable poller
  • Wrap each tools/call dispatch in coroutine.wrap from the event loop side
  • Tools that don't use run() (pure-Lua handlers) continue to work synchronously — they're fast anyway
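
A minimal sketch of the shape recommended above, assuming nothing about the real server.lua internals: `make_poller` and `dispatch` are hypothetical names, and the sentinel check is simulated by a counter rather than a file test. It only illustrates how a poller that yields between checks lets the event loop regain control on every step.

```lua
-- Hypothetical sketch: make_poller and dispatch are illustrative names,
-- not functions from server.lua.

-- Returns a function that yields until is_done() reports true,
-- instead of sleeping in a blocking loop.
local function make_poller(is_done)
  return function()
    local yields = 0
    while not is_done() do
      yields = yields + 1
      coroutine.yield("pending")  -- hand control back to the event loop
    end
    return yields
  end
end

-- Event-loop side: wrap one tools/call handler in coroutine.wrap.
local function dispatch(handler)
  return coroutine.wrap(handler)
end

-- Simulated command whose sentinel shows up on the third check.
local checks = 0
local poll = make_poller(function()
  checks = checks + 1
  return checks >= 3
end)

local step = dispatch(function()
  local yields = poll()
  return "done after " .. yields .. " yields"
end)

-- Each call to step() stands in for one pass of the event loop.
local results = {}
results[1] = step()
results[2] = step()
results[3] = step()
```

Between `results[1]` and `results[3]` the caller is free to service other connections, which is the property the hybrid is after. A pure-Lua handler that never yields would simply return its result on the first `step()` call.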

Estimated 1-2 days. Should land BEFORE issues #9 (sampling) and #10 (roots) so server-initiated requests can flow during a long tool call without the event loop being frozen.

Scope

  • Refactor server.lua run() helper to a coroutine API
  • Event-loop integration in lmcp:run()
  • Audit pure-Lua handlers for any incidental long-running paths
  • Verify the original T2 from #16 Phase 7 (fast POST during slow shell)

Priority

High. Blocks the practical value of #9 and #10. Also a UX paper cut for any user running shell sleep N (a test scenario, but a representative one).

Owner

Make this v1.1.0

Author
Collaborator

Acked: v1.1.0. The recommendation from the issue body still stands: the option 3+2 hybrid (refactor the server.lua run() helper into a coroutine-yieldable poller; tools that already use run() get concurrency for free; pure-Lua handlers remain synchronous). The usefulness of #11 (progress/cancellation) is blocked on this one. Awaiting v1.1.0 milestone assignment from a repo-write account.

marfrit added this to the v1.1.0 milestone 2026-05-17 18:52:40 +00:00
Author
Collaborator

Implemented (commit 2ac502e, on master).

How it works:

  • Tool handlers run inside Lua coroutines on the HTTP transport. The select()-based event loop, in place since v1.0.0-rc1, already multiplexes I/O; this change extends the same cooperative scheduling to tool execution.
  • server.lua:sleep_ms checks coroutine.running(); inside an lmcp coroutine it yields { wake_at = gettime() + ms/1000 }. On the main thread it falls back to today's blocking sleep.
  • server.lua:run() (the shell-out polling helper every non-pure-Lua tool goes through) now yields automatically via sleep_ms. Zero handler source-code changes.
  • _dispatch_post wraps handle_request in coroutine.create. Synchronous completion takes the inline-response path; yields park the coroutine in self._pending_handlers and the conn enters dispatching_async.
  • _scheduler_tick (new) services pending coroutines whose wake_at has passed; _finalise_dispatch (extracted helper) builds the deferred response with Accept-awareness preserved.
  • select() timeout tightens to the earliest pending wake_at.
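
The mechanism described in the list above can be sketched roughly as follows. This is not the code from commit 2ac502e: `start`, `scheduler_tick`, `pending`, and the fake clock are assumptions made so the example runs without LuaSocket (the real code is described as using `gettime()`), and response building is reduced to a callback.

```lua
-- Hypothetical sketch. A fake clock replaces socket.gettime() so the
-- example is deterministic and needs no external library.
local now = 0
local function gettime() return now end

-- Inside a coroutine: yield a wake-up request. On the main thread:
-- fall back to a blocking sleep (simulated here by advancing the clock).
local function sleep_ms(ms)
  local co, is_main = coroutine.running()
  if co and not is_main then
    coroutine.yield({ wake_at = gettime() + ms / 1000 })
  else
    now = now + ms / 1000
  end
end

local pending = {}  -- parked handlers: { co, wake_at, on_done }

-- Run a handler in a coroutine; finish inline or park it.
local function start(handler, on_done)
  local co = coroutine.create(handler)
  local ok, res = coroutine.resume(co)
  if coroutine.status(co) == "dead" then
    on_done(ok, res)  -- synchronous completion: inline-response path
  else
    pending[#pending + 1] = { co = co, wake_at = res.wake_at, on_done = on_done }
  end
end

-- Resume every parked handler that is due; return the earliest remaining
-- wake_at (nil if none) so the caller can derive its select() timeout.
local function scheduler_tick()
  for i = #pending, 1, -1 do
    local p = pending[i]
    if p.wake_at <= gettime() then
      local ok, res = coroutine.resume(p.co)
      if coroutine.status(p.co) == "dead" then
        table.remove(pending, i)
        p.on_done(ok, res)
      else
        p.wake_at = res.wake_at
      end
    end
  end
  local earliest
  for _, p in ipairs(pending) do
    if not earliest or p.wake_at < earliest then earliest = p.wake_at end
  end
  return earliest
end

-- A slow handler is parked while a fast one completes immediately.
local log = {}
local function record(_, res) log[#log + 1] = res end
start(function() sleep_ms(3000); return "slow done" end, record)
start(function() return "pong" end, record)

local next_wake = scheduler_tick()  -- nothing due yet at time 0
now = 3                             -- simulated time passes
local after = scheduler_tick()      -- slow handler is resumed and finishes
```

In this sketch the fast handler's result is recorded before the slow one's, even though the slow handler started first, which mirrors the ping-during-sleep behaviour measured below.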

Measurement (Phase 7):

BEFORE: fast ping during slow shell sleep 3 = 4.28s
AFTER:  fast ping during slow shell sleep 3 = 0.01s    (~400× faster)
3 parallel slow shells: 3.77s total wall (was ~9s serialised)

Full regression suite passes: ping (#19), tools/list pagination+annotations (#12/#14), fetch structuredContent (#13), initialize Mcp-Session-Id (#16), stdio (#15), SSE-on-POST response shape.

Phase 5 review fixes folded in:

  1. socket.gettime() is wall-clock; documented in _scheduler_tick. Acceptable on a chrony-slewed fleet.
  2. run() uses gettime() deltas rather than an accumulated interval counter, so its timing matches wall-clock time under scheduler delays.
  3. _finalise_dispatch re-reads conn.headers["accept"] so SSE-on-POST shape survives parking.
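
To illustrate the second fix, here is a small simulation of why clock deltas track wall-clock time better than an accumulated counter once resumes can be late. It assumes the counter was used for a timeout check, which the thread does not spell out; `simulate`, the fake clock, and the numbers are all hypothetical.

```lua
-- Hypothetical simulation with a fake clock; not code from server.lua.
local now = 0
local function gettime() return now end

-- Each sleep nominally lasts `interval`, but the scheduler resumes the
-- caller `delay` seconds late. Returns the real time elapsed when the
-- loop decides that `timeout` has been reached.
local function simulate(use_deltas, timeout, interval, delay)
  now = 0
  local started, counted = gettime(), 0
  while true do
    local elapsed = use_deltas and (gettime() - started) or counted
    if elapsed >= timeout then
      return gettime() - started
    end
    now = now + interval + delay   -- time that actually passed
    counted = counted + interval   -- time the counter believes passed
  end
end

-- 1 s timeout, 250 ms polling interval, 250 ms of scheduler delay per resume.
local by_counter = simulate(false, 1.0, 0.25, 0.25)
local by_deltas = simulate(true, 1.0, 0.25, 0.25)
```

Under these assumed numbers the counter-based loop only gives up after 2 s of real time, while the delta-based loop stops at the intended 1 s.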

Known limits (filed in memory project_handler_coroutines):

  • stdio transport is intentionally serialised (single-client per process; no concurrent stdio use case exists).
  • Cancellation (#11) is now tractable — scheduler can flip a flag between resumes. Implementation pending.
  • Server-initiated request await (sampling/roots from inside a handler) still requires a future yield-on-pending helper.
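
As a rough illustration of the cancellation point above, a scheduler could consult a per-handler flag before each resume. This is only a sketch of the idea: the implementation is still pending in #11, and `run_with_cancel`, the `cancelled` field, and the trigger used here are hypothetical.

```lua
-- Hypothetical sketch of flag-based cancellation between resumes.
-- Drives one handler to completion unless its flag is set first.
local function run_with_cancel(handler, cancel_after)
  local entry = { co = coroutine.create(handler), cancelled = false }
  local resumes = 0
  while coroutine.status(entry.co) ~= "dead" do
    if entry.cancelled then
      return "cancelled", resumes  -- the coroutine is simply never resumed again
    end
    coroutine.resume(entry.co)
    resumes = resumes + 1
    -- Stand-in for a cancellation request arriving from the client.
    if resumes == cancel_after then entry.cancelled = true end
  end
  return "completed", resumes
end

-- A handler that yields five times before finishing.
local function slow_handler()
  for _ = 1, 5 do coroutine.yield() end
  return "ok"
end

local status_a, resumes_a = run_with_cancel(slow_handler, 2)
local status_b, resumes_b = run_with_cancel(slow_handler, 10)
```

A real implementation would also need to clean up whatever the handler left behind, such as a child process or temporary files, which this sketch does not attempt.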

Unblocks the practical usefulness of #11. v1.1.0 release tag waits for #11 + #18 to also land.

Reference: marfrit/lmcp#20