v1.1.0/#11: progress + cancellation notifications

ctx augmentation:
- ctx.progress(p, total?, message?) emits notifications/progress on
  the session's notify_q. No-op when the original request omitted
  _meta.progressToken (per spec: only emit when client opted in).
  Type-checks numeric args; passes progressToken through unchanged
  (spec allows number OR string keys).
- ctx.cancelled() returns true once the client has sent a
  notifications/cancelled for this request's id.
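
A minimal sketch of the ctx augmentation described above, under stated
assumptions: make_ctx, the array-style notify_q, and the cancelled_ids
argument are hypothetical names for illustration and may not match
lmcp's internals.

```lua
-- Hypothetical sketch: make_ctx and its arguments are illustrative only.
local function make_ctx(request, notify_q, cancelled_ids)
  local meta = request.params and request.params._meta
  -- progressToken may be a number or a string; it is passed through as-is.
  local token = meta and meta.progressToken
  local rid_str = tostring(request.id)
  local ctx = {}

  function ctx.progress(p, total, message)
    -- No token means the client did not opt in: do nothing.
    if token == nil then return false end
    assert(type(p) == "number", "progress must be a number")
    assert(total == nil or type(total) == "number", "total must be a number")
    notify_q[#notify_q + 1] = {
      jsonrpc = "2.0",
      method = "notifications/progress",
      params = {
        progressToken = token,
        progress = p,
        total = total,
        message = message,
      },
    }
    return true
  end

  function ctx.cancelled()
    -- True once the cancel branch has flagged this request id.
    return cancelled_ids[rid_str] == true
  end

  return ctx
end
```

The boolean return from progress() is an addition of this sketch so the
no-op path is observable in a test.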

handle_request:
- New side-effect in the id==nil branch: notifications/cancelled
  scans the module-level _ctx_by_co for an in-flight ctx whose
  request_id matches; flips self._cancelled_ids[rid_str] only when
  found. Unknown rids drop silently (no map growth).
- Pre-handler short-circuit: if cancel arrived before dispatch
  reached tools/call, skip the handler entirely.
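
The two handle_request changes above could look roughly like this. It
is a sketch only: handle_cancel, should_skip_handler, and the shape of
the server table are assumptions, not the code in this commit.

```lua
-- Hypothetical sketch; function names and the `server` shape are assumed.
local ctx_by_co = setmetatable({}, { __mode = "k" })  -- weak keys: coroutines

-- Called for a notifications/cancelled message (no id, so no response).
local function handle_cancel(server, msg)
  local rid = msg.params and msg.params.requestId
  if rid == nil then return false end
  local rid_str = tostring(rid)
  for _, ctx in pairs(ctx_by_co) do
    if tostring(ctx.request_id) == rid_str then
      server._cancelled_ids[rid_str] = true  -- flag only in-flight requests
      return true
    end
  end
  return false  -- unknown id: dropped silently, map does not grow
end

-- Pre-handler check: skip the tool handler if cancel already arrived.
local function should_skip_handler(server, request_id)
  return server._cancelled_ids[tostring(request_id)] == true
end
```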

Cross-module ctx lookup:
- Module-level weak _ctx_by_co table in lmcp.lua keyed by
  coroutine. lmcp.current_ctx() returns the ctx of the running
  coroutine. server.lua's run() lazy-requires lmcp and uses it
  to opt into auto-cancellation without depending on lmcp internals.
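
For illustration, a weak-keyed coroutine-to-ctx table can be sketched as
follows. The dispatch helper is hypothetical; only the names _ctx_by_co
and current_ctx come from this commit.

```lua
-- Weak keys let a finished coroutine be collected even if cleanup is missed.
local _ctx_by_co = setmetatable({}, { __mode = "k" })

local function current_ctx()
  -- Lua 5.1/LuaJIT: nil on the main thread. Lua 5.4: (co, ismain).
  local co = coroutine.running()
  if co == nil then return nil end
  return _ctx_by_co[co]  -- nil for the main coroutine or unknown coroutines
end

-- Hypothetical dispatch helper: registers ctx for the handler's coroutine.
local function dispatch(ctx, handler)
  local co = coroutine.create(handler)
  _ctx_by_co[co] = ctx
  local ok, result = coroutine.resume(co)
  _ctx_by_co[co] = nil  -- a real dispatcher would do this only once co is dead
  return ok, result
end
```

Any code running inside the handler's coroutine, such as run(), can then
call current_ctx() without the ctx being threaded through its arguments.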

server.lua:run():
- After each sleep_ms cycle, check ctx.cancelled(); exit poll loop
  with cancelled=true if set.
- Poll interval capped at 500ms when a ctx is present so worst-case
  cancel latency stays ≤500ms (vs. 2s default growth).
- Returns "(cancelled)" sentinel; handler propagates normally.
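
The effect of the cap can be checked by replaying the interval rule on
its own. next_interval and schedule are hypothetical helpers that
mirror the growth and cap lines in poll_loop; they are not part of
server.lua.

```lua
-- Mirrors the growth rule (x1.5 while below 2000ms) and the 500ms cap.
local function next_interval(interval, cancellable)
  if interval < 2000 then interval = math.floor(interval * 1.5) end
  if cancellable and interval > 500 then interval = 500 end
  return interval
end

-- Returns the first n sleep intervals (ms) starting from `start`.
local function schedule(start, n, cancellable)
  local out, interval = {}, start
  for i = 1, n do
    out[i] = interval
    interval = next_interval(interval, cancellable)
  end
  return out
end

print(table.concat(schedule(50, 9, true), " "))
-- 50 75 112 168 252 378 500 500 500
print(table.concat(schedule(50, 12, false), " "))
-- 50 75 112 168 252 378 567 850 1275 1912 2868 2868
```

With this rule the uncapped interval settles at 2868ms, because the last
growth step starts from 1912ms, which is still below the 2000ms
threshold.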

_finalise_dispatch:
- Single cleanup site for both _cancelled_ids and _ctx_by_co (per
  Phase 5 review).
- When was_cancelled: emit JSON-RPC -32800 "Request cancelled"
  (deviation from Phase 4 plan; documented).
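
A sketch of what a single cleanup site might look like, assuming a
finalise_dispatch signature that is invented here for illustration.
Only the -32800 code and the "Request cancelled" message come from this
commit.

```lua
-- Hypothetical signature; the real _finalise_dispatch may differ.
local function finalise_dispatch(server, ctx_by_co, co, request_id, result)
  local rid_str = tostring(request_id)
  local was_cancelled = server._cancelled_ids[rid_str] == true

  -- Both maps are cleaned here and nowhere else.
  server._cancelled_ids[rid_str] = nil
  ctx_by_co[co] = nil

  if was_cancelled then
    return {
      jsonrpc = "2.0",
      id = request_id,
      error = { code = -32800, message = "Request cancelled" },
    }
  end
  return { jsonrpc = "2.0", id = request_id, result = result }
end
```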

Phase 4 deviation explained: the plan was a silent TCP close (per
spec, "SHOULD NOT respond"). Empirically, os.execute's fork+exec
lets the spawned shell inherit the parent's TCP socket FD, so
sock:close() doesn't actually deliver FIN until the subshell exits
(i.e. the long-running command completes anyway). Verified that
luasocket close() works on bare sockets (curl exits with RST in
511ms). The fix would be FD_CLOEXEC on accepted sockets, which
luasocket doesn't expose; it needs a C shim or luaposix. Deferred.
Captured in memory project_fd_inheritance_in_run.

Practical UX with the deviation: client receives a structured
-32800 error within ~420ms of POSTing the cancel notification.

Measurements (Phase 7):
  cancel timing (3 runs, sleep 10 with cancel at 0.4s):
    run 1: t=0.42s code=-32800
    run 2: t=0.42s code=-32800
    run 3: t=0.42s code=-32800
  progress: 3/3 events arrived on SSE; spec-shaped payload
  concurrent fast+slow (#20 regression): unchanged (fast 0.01s)
  all previously-closed issues regression-test green

Zero handler source-code changes. Existing tools (shell, fetch,
web_search, hub remote_*) get cancellation for free via run().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 19:29:00 +00:00
parent 2ac502e50f
commit 55ead8041f
2 changed files with 181 additions and 14 deletions
@@ -44,6 +44,22 @@ local function gettime()
    return _socket.gettime()
end
-- Lazy access to the lmcp module for cross-module ctx lookup (issue #11).
-- server.lua doesn't statically require lmcp (it's an example/runtime
-- server, not the library), but lmcp must already be loaded when we run.
-- Defensive: if the lookup fails for any reason, current_ctx returns nil
-- and run() falls back to non-cancellable behaviour.
local _lmcp_mod = nil
local function current_ctx()
    if _lmcp_mod == false then return nil end
    if _lmcp_mod == nil then
        local ok, mod = pcall(require, "lmcp")
        _lmcp_mod = ok and mod or false
        if _lmcp_mod == false then return nil end
    end
    return _lmcp_mod.current_ctx and _lmcp_mod.current_ctx() or nil
end
-- in_coroutine() — true if we're running inside an lmcp dispatch
-- coroutine (issue #20). Handles both Lua 5.4 (coroutine.running →
-- (co, isMain)) and LuaJIT 5.1 (coroutine.running → nil on main).
@@ -111,13 +127,26 @@ local function run(cmd, timeout_sec)
-- may delay our resume by more than `interval`, so an accumulator
-- diverges from real elapsed. gettime() comparison stays honest in
-- both busy-poll and yield-resume modes.
--
-- Auto-cancellation (issue #11): if a ctx is available on the
-- running coroutine AND it has been cancelled, exit the polling
-- loop early. The interval is capped at 500ms when a ctx is
-- present so worst-case cancel latency is ~0.5s, not ~2s.
local started = gettime()
local cancelled = false
local function poll_loop()
    local interval = WINDOWS and 100 or 50 -- ms
    while gettime() - started < timeout_sec do
        if file_exists(done_file) then return true end
        local ctx = current_ctx()
        if ctx and ctx.cancelled and ctx.cancelled() then
            cancelled = true
            return false
        end
        sleep_ms(interval)
        if interval < 2000 then interval = math.floor(interval * 1.5) end
        -- When cancellable, cap so we can respond to cancel quickly.
        if ctx and interval > 500 then interval = 500 end
    end
    return false
end
@@ -140,6 +169,7 @@ local function run(cmd, timeout_sec)
remove_silent(done_file)
if not completed then
    if cancelled then return "(cancelled)" end
    return output or ("Error: command timed out after " .. timeout_sec .. "s")
end
return output and output ~= "" and output or "(no output)"
@@ -158,6 +188,7 @@ local function run(cmd, timeout_sec)
remove_silent(done_file)
if not completed then
    if cancelled then return "(cancelled)" end
    return output or ("Error: command timed out after " .. timeout_sec .. "s")
end
return output and output ~= "" and output or "(no output)"