Add sampling capability (server-initiated sampling/createMessage) #9

Closed
opened 2026-05-17 15:56:08 +00:00 by claude-noether · 1 comment
Collaborator

Add the Sampling capability — the server can ask the client's LLM to generate text on its behalf (server-initiated sampling/createMessage request).

Goal

Some tools want to reason mid-execution without requiring the human to drive each step. A web_search tool could ask the client LLM to pick the best 3 of 25 raw results before returning. An analyze_log tool could ask for a one-line summary. With sampling, the server makes the request and waits for the client's response.

Methods to add

| Method | Direction | Notes |
|---|---|---|
| sampling/createMessage | server → client | Params: { messages: [{ role, content: { type, text } }], modelPreferences?, systemPrompt?, includeContext?, temperature?, maxTokens, stopSequences? } → Result: { role, content, model, stopReason? } |
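To make the request and result shapes concrete, here is a rough sketch as plain Lua tables. The field names come from the table above; the helper name build_create_message_params, the example-model string, the endTurn stop reason, and the sample text are illustrative assumptions, not part of lmcp.

```lua
-- Hypothetical helper: builds sampling/createMessage params from a prompt.
-- Field names follow the method table; the helper itself is an assumption.
local function build_create_message_params(prompt, opts)
    opts = opts or {}
    return {
        messages = {
            { role = "user", content = { type = "text", text = prompt } },
        },
        maxTokens = opts.maxTokens or 512,  -- required alongside messages (no "?" in the table)
        systemPrompt = opts.systemPrompt,   -- optional; nil fields are simply absent
        temperature = opts.temperature,
        stopSequences = opts.stopSequences,
    }
end

-- Illustrative shape of what a client might send back (all values are made up).
local example_result = {
    role = "assistant",
    content = { type = "text", text = "Disk filled up at 03:12; writes failed afterwards." },
    model = "example-model",
    stopReason = "endTurn",
}

local params = build_create_message_params("Summarize this log in one line.", { temperature = 0.2 })
print(params.messages[1].content.text, params.maxTokens, example_result.content.text)
```

Optional fields left as nil never appear in the table, so a JSON encoder would omit them rather than send null.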

API for lmcp

local function sample(ctx, prompt, max_tokens)
    return server:sample({
        messages = {{ role = "user", content = { type = "text", text = prompt } }},
        maxTokens = max_tokens or 512,
    }, ctx)  -- ctx carries the request id we'll correlate on
end

ctx is needed because the request being served must remain open while the server awaits the client's response — only that connection can route the answer back.

Capabilities (advertised by client, server checks)

client.capabilities.sampling = {}

If absent, server:sample returns an error immediately rather than hanging.
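A minimal sketch of that fail-fast check, assuming the client's capabilities arrive as a plain Lua table. The function names are hypothetical, and the error string is borrowed from the implementation comment on this issue; the actual code in lmcp.lua may differ.

```lua
-- Returns true when the client's initialize capabilities include `sampling`.
-- An empty table ({}) counts as advertised; only a missing key fails.
local function client_supports_sampling(client_caps)
    return type(client_caps) == "table" and client_caps.sampling ~= nil
end

-- Hypothetical stand-in for server:sample, reduced to the capability gate.
local function sample_gate(client_caps)
    if not client_supports_sampling(client_caps) then
        -- Fail immediately instead of sending a request that can never be answered.
        return false, "client did not advertise sampling capability"
    end
    return true
end

print(sample_gate({ sampling = {} }))
print(sample_gate({}))
```

Returning false plus a message follows the usual Lua convention for recoverable errors, so a tool handler can degrade gracefully instead of raising.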

Scope (v1)

  • Text content only.
  • One pending sampling request at a time per tool invocation (no nesting).
  • modelPreferences passed through verbatim — server doesn't validate.
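One way the first two scope rules could be enforced, as a sketch only. The invocation table, the function names, and the error text are all assumptions made for illustration.

```lua
-- Hypothetical per-invocation state: at most one sampling request in flight.
local function new_invocation()
    return { sampling_pending = false }
end

local function begin_sampling(invocation)
    if invocation.sampling_pending then
        return false, "a sampling request is already pending for this invocation"
    end
    invocation.sampling_pending = true
    return true
end

local function end_sampling(invocation)
    invocation.sampling_pending = false
end

-- Text-only check for v1: every message must carry content of type "text".
local function is_text_only(messages)
    for _, msg in ipairs(messages) do
        if type(msg.content) ~= "table" or msg.content.type ~= "text" then
            return false
        end
    end
    return true
end

local invocation = new_invocation()
print(begin_sampling(invocation))  -- first request is accepted
print(begin_sampling(invocation))  -- a nested request is rejected
end_sampling(invocation)
```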

Depends on

  • Streamable HTTP done properly. Without persistent server-initiated SSE, there is no channel for the server to send a request to the client. This issue is a no-op until that one lands.

Priority

Medium. High-leverage when it works (unlocks agentic tools), but heavily transport-gated. Pair with the Streamable HTTP work.

Author
Collaborator

Implemented. Builds on the bidirectional transport from #16.

Added in lmcp.lua:

  • self._client_caps captured from initialize request params
  • server:sample(session_id, opts, on_response): thin wrapper over server_request that checks the client advertised the sampling capability and validates opts.messages and opts.maxTokens
  • Tool handler ctx now exposes session_id so handlers can call self:sample(ctx.session_id, ...)
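The validation step might look roughly like the following. This is a guess at reasonable checks, not the code in lmcp.lua; validate_sample_opts and its error strings are hypothetical.

```lua
-- Hypothetical validator for the opts table passed to server:sample.
local function validate_sample_opts(opts)
    if type(opts) ~= "table" then
        return false, "opts must be a table"
    end
    if type(opts.messages) ~= "table" or #opts.messages == 0 then
        return false, "opts.messages must be a non-empty array"
    end
    if type(opts.maxTokens) ~= "number" or opts.maxTokens <= 0 then
        return false, "opts.maxTokens must be a positive number"
    end
    return true
end

print(validate_sample_opts({
    messages = { { role = "user", content = { type = "text", text = "hi" } } },
    maxTokens = 128,
}))
print(validate_sample_opts({ messages = {} }))
```

Validating before the request goes out keeps malformed calls from occupying a pending id that the client would only answer with an error.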

Verified end-to-end (no real client LLM needed):

  1. Init with capabilities:{sampling:{}} → session created
  2. Open SSE GET on that session
  3. Call a tool that fires server:sample(...) — server emits a sampling/createMessage JSON-RPC request on the SSE stream with id srv-N
  4. Simulate client posting back {jsonrpc, id:"srv-N", result:{role, content, model, stopReason}} → 202 Accepted
  5. Server-side on_response callback fires with parsed result
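Steps 3 to 5 hinge on matching the client's POST to the pending request by id. A minimal sketch of that correlation follows, assuming ids of the form srv-N as described above. The Tracker type and its methods are hypothetical and do not reflect the actual server_request internals.

```lua
-- Hypothetical request tracker; names and structure are illustrative only.
local Tracker = {}
Tracker.__index = Tracker

local function new_tracker()
    return setmetatable({ next_id = 1, pending = {} }, Tracker)
end

-- Register a callback and return the JSON-RPC request (as a Lua table)
-- that would be serialized onto the SSE stream.
function Tracker:request(method, params, on_response)
    local id = "srv-" .. self.next_id
    self.next_id = self.next_id + 1
    self.pending[id] = on_response
    return { jsonrpc = "2.0", id = id, method = method, params = params }
end

-- Handle a message POSTed by the client.
-- Returns true if it matched a pending request, false otherwise.
function Tracker:resolve(message)
    local callback = self.pending[message.id]
    if not callback then
        return false
    end
    self.pending[message.id] = nil  -- each id resolves at most once
    callback(message.result, message.error)
    return true
end

local tracker = new_tracker()
local received
local req = tracker:request("sampling/createMessage", { maxTokens = 64 }, function(result)
    received = result
end)
local matched = tracker:resolve({
    jsonrpc = "2.0",
    id = req.id,
    result = { role = "assistant", content = { type = "text", text = "ok" }, model = "example-model" },
})
print(req.id, matched, received.content.text)
```

Clearing the pending entry before invoking the callback means a duplicate or replayed response for the same id is ignored.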

Capability gate verified: when a tool calls sample(...) on a session whose init didn't claim sampling, the call returns false, "client did not advertise sampling capability".

Honest limit (same as #11): tool handlers cannot await the sampling response in the current single-threaded event loop. The handler must dispatch the request and return immediately; the callback fires later, out-of-band. Real await patterns are blocked on #20 (concurrent handler dispatch). Until then, sampling is only useful for fire-and-forget patterns.
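A sketch of what a fire-and-forget handler could look like under this limit. The stub server below only mimics the server:sample(session_id, opts, on_response) signature described above; stub_server, deliver, analyze_log's argument list, and the summaries table are assumptions for illustration.

```lua
-- Stub server (hypothetical) that queues callbacks instead of talking to a client.
local stub_server = { queued = {} }

function stub_server:sample(session_id, opts, on_response)
    table.insert(self.queued, on_response)
    return true
end

-- Simulate the client's reply arriving after the handler has already returned.
function stub_server:deliver(result)
    local on_response = table.remove(self.queued, 1)
    if on_response then
        on_response(result)
    end
end

local summaries = {}

-- Tool handler: dispatches the sampling request and returns without waiting.
local function analyze_log(server, ctx, log_text)
    local ok, err = server:sample(ctx.session_id, {
        messages = { { role = "user", content = { type = "text", text = "Summarize: " .. log_text } } },
        maxTokens = 64,
    }, function(result)
        -- Runs later, out-of-band; the tool result has already been sent.
        summaries[ctx.session_id] = result.content.text
    end)
    if not ok then
        return { status = "sampling unavailable", reason = err }
    end
    return { status = "summary requested" }
end

local reply = analyze_log(stub_server, { session_id = "s1" }, "disk full at 03:12")
local before = summaries["s1"]  -- still nil: no response has arrived yet
stub_server:deliver({ role = "assistant", content = { type = "text", text = "Disk filled up." } })
print(reply.status, before, summaries["s1"])
```

The tool result cannot contain the sampled text in this pattern, so the callback has to store or forward it somewhere the caller can find later.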

Reference: marfrit/lmcp#9