Add notifications/progress and notifications/cancelled #11

Closed
opened 2026-05-17 15:56:08 +00:00 by claude-noether · 4 comments
Collaborator

Add Progress notifications and Cancellation. Both serve the same long-running-tool use case: progress is keyed by the _meta.progressToken the client puts on the request, and cancellation by the request's id.

Goal

shell, shell_bg, fetch, web_search, and any future tool can take seconds to minutes. Today the client gets nothing until the response lands, and it has no way to ask the server to stop. The MCP spec already defines both mechanisms, so no protocol design is needed.

Methods to add

| Method | Direction | Notes |
|---|---|---|
| notifications/progress | server → client | { progressToken, progress, total?, message? }. The token is whatever the client sent in _meta.progressToken on the original request. |
| notifications/cancelled | bidirectional | { requestId, reason? }. On receipt, the server stops the work and does not send a response. The server may also send it itself, e.g. after an internal timeout on a request it issued. |
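For reference, the two notifications look roughly like this on the wire. The sketch is in Python rather than lmcp's Lua purely for illustration, and the helper names progress_notification and cancelled_notification are hypothetical. Optional fields are omitted when unset, and neither message carries an id, because JSON-RPC notifications never get a response.

```python
def progress_notification(progress_token, progress, total=None, message=None):
    """Build a notifications/progress message; total and message are optional."""
    # The token is echoed back unchanged; the issue allows a number or a string.
    if isinstance(progress_token, bool) or not isinstance(progress_token, (int, float, str)):
        raise TypeError("progressToken must be a number or a string")
    params = {"progressToken": progress_token, "progress": progress}
    if total is not None:
        params["total"] = total
    if message is not None:
        params["message"] = message
    # No "id" key: this is a notification, not a request.
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


def cancelled_notification(request_id, reason=None):
    """Build a notifications/cancelled message that points at an earlier request's id."""
    params = {"requestId": request_id}
    if reason is not None:
        params["reason"] = reason
    return {"jsonrpc": "2.0", "method": "notifications/cancelled", "params": params}
```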

API for lmcp

```lua
server:tool("long_thing", "…", schema, function(args, ctx)
    -- ctx.progress(0.0, "starting")
    -- ctx.progress(0.5, "halfway")
    if ctx.cancelled() then return "(cancelled)" end
    -- …
    return "done"
end)
```

ctx is a new second argument to tool handlers; currently they take only args. The change is backwards compatible because Lua silently discards extra arguments, so old handlers simply never see it.

Wiring

  • Read params._meta.progressToken (a number or string) when handling tools/call. Stash it on the per-request ctx.
  • ctx.progress(progress, message?) sends notifications/progress over the open response channel.
  • Track in-flight request IDs in a server-local map; on notifications/cancelled, set a flag the corresponding ctx.cancelled() reads.
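A minimal sketch of that wiring, in Python for illustration only. RequestCtx, Dispatcher, begin_tools_call, end, on_cancelled and the send callback are hypothetical names, not lmcp's API. The point is the shape: the token is stashed per request, progress is dropped when the client supplied no token, and a cancel only flips the flag of a request that is still in flight.

```python
import threading


class RequestCtx:
    """Hypothetical per-request context exposing progress() and cancelled()."""

    def __init__(self, request_id, progress_token, send):
        self.request_id = request_id
        self.progress_token = progress_token  # None when _meta.progressToken was absent
        self._send = send  # writes a JSON-RPC message to the open response channel
        self._cancelled = threading.Event()  # set by the transport, read by the handler

    def progress(self, progress, message=None):
        if self.progress_token is None:
            return  # the client did not ask for progress, so send nothing
        params = {"progressToken": self.progress_token, "progress": progress}
        if message is not None:
            params["message"] = message
        self._send({"jsonrpc": "2.0", "method": "notifications/progress", "params": params})

    def cancel(self):
        self._cancelled.set()

    def cancelled(self):
        return self._cancelled.is_set()


class Dispatcher:
    """Hypothetical server-side bookkeeping for in-flight tools/call requests."""

    def __init__(self, send):
        self._send = send
        self._in_flight = {}  # request id -> RequestCtx

    def begin_tools_call(self, request):
        meta = request.get("params", {}).get("_meta", {})
        ctx = RequestCtx(request["id"], meta.get("progressToken"), self._send)
        self._in_flight[request["id"]] = ctx
        return ctx

    def end(self, request_id):
        self._in_flight.pop(request_id, None)  # forget the request once it has responded

    def on_cancelled(self, params):
        ctx = self._in_flight.get(params.get("requestId"))
        if ctx is not None:  # unknown or already-finished ids are ignored
            ctx.cancel()
```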

Capabilities

No new capability flag: both notifications are base-protocol utilities and need no capability negotiation.

Scope (v1)

  • Token bookkeeping per request.
  • ctx.progress and ctx.cancelled.
  • Stop the busy-poll in run() (server.lua:122) when ctx.cancelled() flips. Surface "cancelled" in the tool response rather than the full output.

Depends on

  • Streamable HTTP, done properly, so that notifications/progress can actually reach the client mid-call. Until then, progress() is a no-op (or logs to stderr).
  • Cancellation can be partially supported even without it: a subsequent notifications/cancelled POST carrying the original request's id can flip a flag that the in-flight handler polls.

Priority

Medium. Cancellation is the higher-value half: shell tools that run away today have no escape valve. Progress is a nice-to-have.

Author
Collaborator

Finding from implementation triage: cancellation has no deliverable value in pure-sync single-threaded Lua dispatch.

lmcp processes one request at a time: a tool handler runs to completion before the next stdin line or HTTP request is read. A notifications/cancelled line can therefore only be read after the target request has already finished, so ctx.cancelled() would never flip from false to true mid-call. The structural prerequisite is true async dispatch (a polling thread that reads new requests while a handler is running), which is part of issue #16 (Streamable HTTP), or else migrating handlers to coroutines with yield points (invasive).
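The ordering argument can be made concrete with a toy model. This Python sketch uses generators as a stand-in for Lua coroutines; long_handler, run_sync and run_cooperative are hypothetical. Fed the same two messages (a tools/call followed by a cancel for it), the sequential loop only reads the cancel after the handler has returned, while the interleaved loop sees it at the handler's next yield point.

```python
from collections import deque


def long_handler(ctx, steps):
    """Hypothetical tool handler: `steps` units of work with a yield point after each."""
    for _ in range(steps):
        if ctx["cancelled"]:
            return "(cancelled)"
        yield  # cooperative yield point (the analog of coroutine.yield in Lua)
    return "done"


def _drive_to_completion(gen):
    """Run a generator-based handler without interleaving anything else."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


def _on_cancelled(ctxs, msg):
    ctx = ctxs.get(msg["requestId"])
    if ctx is not None:
        ctx["cancelled"] = True


def run_sync(messages):
    """Current model: read one message, run its handler to completion, then read the next."""
    ctxs, results = {}, {}
    for msg in messages:
        if msg["method"] == "tools/call":
            ctxs[msg["id"]] = {"cancelled": False}
            results[msg["id"]] = _drive_to_completion(long_handler(ctxs[msg["id"]], msg["steps"]))
        elif msg["method"] == "notifications/cancelled":
            _on_cancelled(ctxs, msg)  # the target already returned; nobody reads the flag
    return results


def run_cooperative(messages):
    """Interleaved model: read one message, then advance every in-flight handler one step."""
    pending, running = deque(messages), {}
    ctxs, results = {}, {}
    while pending or running:
        if pending:
            msg = pending.popleft()
            if msg["method"] == "tools/call":
                ctxs[msg["id"]] = {"cancelled": False}
                running[msg["id"]] = long_handler(ctxs[msg["id"]], msg["steps"])
            elif msg["method"] == "notifications/cancelled":
                _on_cancelled(ctxs, msg)
        for rid, gen in list(running.items()):
            try:
                next(gen)
            except StopIteration as stop:
                results[rid] = stop.value
                del running[rid]
    return results
```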

Recommendation: keep this open as a v2 follow-up to #16. The progress half (server → client notifications/progress) is also gated on the same bidirectional transport. Once #16 lands with the event loop, both halves of this issue become tractable in one pass.

No code change in this session.

Owner

Make this v1.1.0

Author
Collaborator

Acked: v1.1.0. Will land with #20 (the structural prerequisite) — implementation stub already in place (server-initiated request helper, cancellation-token plumbing skeleton). Awaiting the v1.1.0 milestone assignment from a repo-write account.

marfrit added this to the v1.1.0 milestone 2026-05-17 18:52:40 +00:00
Author
Collaborator

Implemented (commit 55ead80, on master). Builds on #16 (Streamable HTTP) and #20 (concurrent handler dispatch).

ctx augmentation:

  • ctx.progress(p, total?, message?) — emits notifications/progress on the session's notify_q. No-op when the original request omitted _meta.progressToken (per spec). Type-checks numeric args; passes progressToken through unchanged (spec allows number OR string).
  • ctx.cancelled() — returns true once a notifications/cancelled for this request's id has arrived.

Notification side-effect:

  • notifications/cancelled scans the module-level _ctx_by_co for an in-flight ctx with matching request_id and flips self._cancelled_ids[rid_str] only if found. Unknown rids drop silently (no map growth — per Phase 5 review #2).
  • Pre-handler short-circuit: if cancel arrived before dispatch reached tools/call, the handler is skipped entirely.

Cross-module ctx lookup:

  • Module-level weak _ctx_by_co table in lmcp.lua keyed by coroutine. lmcp.current_ctx() returns the ctx of the running coroutine. server.lua:run() lazy-requires lmcp and uses it to opt into auto-cancellation without depending on lmcp internals (no _current_server singleton — Phase 5 review #1).

server.lua:run():

  • After each sleep_ms cycle, checks ctx.cancelled(); exits poll loop with cancelled=true if set.
  • Poll interval capped at 500ms when a ctx is present so worst-case cancel latency stays ≤500ms.
  • Returns "(cancelled)" sentinel; handler propagates normally.

Measurements:

```
cancel timing (3 runs, sleep 10 with cancel at 0.4s):
  run 1: t=0.42s code=-32800
  run 2: t=0.42s code=-32800
  run 3: t=0.42s code=-32800
progress: 3/3 events arrived on SSE; spec-shaped payload
concurrent fast+slow (#20 regression): unchanged (fast 0.01s)
all previously-closed issues: regression green
```

Phase 4 deviation, documented:

The plan was a silent TCP close, following the spec's guidance that the receiver SHOULD NOT respond to a cancelled request. Empirically, os.execute in server.lua:run() fork+execs a shell subprocess that inherits the parent's TCP socket FD. sock:close() only drops the parent's reference; the kernel keeps the connection open while any descriptor still refers to it, so it stays open until the spawned shell exits (i.e. until the long-running command completes anyway, defeating the purpose).

Verified that luasocket's close() does work in isolation (curl exits with RST in 511ms on a bare-socket test); the issue is exclusively fork inheritance.

Trade-off taken: emit a JSON-RPC -32800 "Request cancelled" error (the code LSP defines as RequestCancelled, conventionally used for cancellation). The spec wording is "SHOULD NOT respond" (not MUST). The client receives a structured response in ~420ms, correlated to its request by id, which is better UX than waiting for the underlying shell to complete.
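The response sent instead of silence is an ordinary JSON-RPC error object carrying the cancelled request's id, which is what lets the client correlate it. A Python sketch with a hypothetical helper name; -32800 falls in the range JSON-RPC 2.0 reserves but does not assign, and is the value LSP uses for RequestCancelled.

```python
REQUEST_CANCELLED = -32800  # LSP's RequestCancelled; JSON-RPC 2.0 itself does not assign it


def cancelled_error_response(request_id, message="Request cancelled"):
    """Error response for a cancelled request, correlated to it by the request's id."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": REQUEST_CANCELLED, "message": message},
    }
```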

Proper fix deferred: set FD_CLOEXEC on accepted sockets. luasocket doesn't expose it; needs a C shim or luaposix. Tracked as potential v1.2+ follow-up.
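At the system-call level the deferred fix is one fcntl round-trip per accepted descriptor. A Python sketch of it, with hypothetical helper names; a Lua C shim or luaposix would make the same F_GETFD/F_SETFD calls. Python already creates sockets non-inheritable by default (PEP 446), so this is mainly a model of what lmcp would need. On Linux, accept4() with SOCK_CLOEXEC sets the flag atomically, closing the window in which another thread could fork between accept and fcntl.

```python
import fcntl
import socket


def set_cloexec(fd: int) -> None:
    """Set FD_CLOEXEC so the descriptor is closed automatically in exec'd children."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def has_cloexec(fd: int) -> bool:
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def accept_cloexec(listener: socket.socket):
    """Accept a connection and mark its descriptor close-on-exec before any fork+exec."""
    conn, addr = listener.accept()
    set_cloexec(conn.fileno())
    return conn, addr
```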

Memory: project_fd_inheritance_in_run.md captures the trap so the same surprise doesn't bite future "silent close" features.

Reference: marfrit/lmcp#11