The json.lua empty-table → [] gotcha that bit `ping` in v1.0.0-rc1
(project_json_empty_table_gotcha memory) has bitten again, this time on
tool inputSchemas with `properties = {}`. Symptom: spec-strict MCP
clients (Zod et al.) reject tools/list with:
expected: record, code: invalid_type,
path: [tools, N, inputSchema, properties],
message: "Invalid input: expected record, received array"
Fix: in `lmcp:tool()`, normalise the registered inputSchema —
when `properties` is an empty Lua table, drop the key entirely.
JSON Schema permits omitting `properties` on `type: "object"`
(means "any object, no constraints" — exactly what a no-arg tool
wants).
Clone-before-mutate so the caller's table isn't trampled (matters
when a server author shares one schema across multiple
registrations).
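A minimal sketch of the normalisation described above, written in Python
rather than Lua for brevity. The function name `normalise_input_schema`
and the shallow copy are assumptions for illustration, not the actual
`lmcp:tool()` code. Python dicts serialise as `{}`, so the original []
symptom does not reproduce here; the sketch only shows the
drop-empty-properties and clone-before-mutate logic.

```python
import json


def normalise_input_schema(schema=None):
    """Return a copy of `schema` that is safe to emit in tools/list.

    Hypothetical helper: mirrors the behaviour described in the commit,
    not the real Lua implementation.
    """
    if schema is None:
        # No schema registered: default to "any object, no constraints".
        return {"type": "object"}
    # Copy first so the caller's (possibly shared) table is never mutated.
    # A shallow copy is enough because only a top-level key is removed.
    result = dict(schema)
    # An empty `properties` is the value that was emitted as []; omit it.
    if result.get("properties") == {}:
        del result["properties"]
    return result


shared = {"type": "object", "properties": {}}
print(json.dumps(normalise_input_schema(shared)))  # {"type": "object"}
print(json.dumps(shared))  # the caller's schema still has "properties"
```

Returning a fresh dict instead of mutating in place is what keeps a
schema shared across several registrations intact.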
Smoke tested locally with 3 tools (empty, default-nil, populated):
- `properties = {}` → emitted as `{"type":"object"}`
- nil schema → same default, same output
- populated properties → emitted intact with full shape
Discovered against hertz-tools live (lxc_list, network_status had
`properties = {}` — hertz hotfixed by hand before this commit;
this protects every future tool author from the same trap).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discovered building v1.1.0 that the MSI can be produced entirely on
Linux — no Windows VM, no manual WiX install, no GUI babysitting:
apt install wixl unzip gcc-mingw-w64-x86-64 binutils-mingw-w64-x86-64 \
mingw-w64-x86-64-dev curl
The new build-msi.sh script:
1. Runs sync.sh to refresh pkg/{lmcp,server,json}.lua from root.
2. Downloads Lua 5.4.2 Win64 binaries from LuaBinaries (Tools +
Library zips — interpreter + headers + import lib).
3. Cross-compiles LuaSocket 3.1.0 via x86_64-w64-mingw32-gcc
(produces socket-3.0.0.dll + mime-1.0.3.dll for Win64).
4. Stages pkg/lua/{lua.exe, lua54.dll, socket/, mime/, *.lua} per
the WiX manifest layout.
5. Invokes wixl on the lmcp.wxs manifest (with sed for the
Windows backslash path separators → forward slashes).
Output: lmcp-<version>.msi. Version is read from lmcp.wxs
Version="…", so bump that before each release.
Cold build: ~30s. Warm cache: ~5s. The artifact contains all 17
files the WiX manifest expects, and its ProductVersion matches lmcp.wxs.
README updated to point at build-msi.sh as the recommended path;
the Windows-side candle/light recipe kept as an alternative.
Reproducibility note (deferred): the MSI is not yet bit-reproducible
across builds — file mtimes in the Lua binaries' zip propagate to
the cab inside the MSI. The debian/lmcp/build-deb.sh in marfrit-
packages uses SOURCE_DATE_EPOCH to fix this; same pattern would
apply here. Out of scope for the first cut.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
windows/ was previously an untracked working tree with manually-
copied .lua files that drifted ~6 months out of date (missed every
feature added since April 2026). #18 introduces Option 1 from the
issue body: build-time sync.
New tracked files:
- windows/sync.sh — copies root {lmcp,server,json}.lua to pkg/.
Idempotent; run before WiX. Catches missing source files; logs
each sync.
- windows/README.md — workflow doc + tracked-vs-generated map.
- windows/lmcp.wxs — MSI manifest (Version bumped 0.1.0 → 1.1.0).
- windows/pkg/{install_service,start}.bat — Windows service
installer + launcher (now tracked; they were already in pkg/).
New .gitignore at repo root:
- windows/pkg/{lmcp,server,json}.lua — regenerated by sync.sh
- windows/pkg/lua/ — bundled Lua + LuaSocket runtime (downloaded
separately, not in git)
- editor noise (*.swp, *.swo, .DS_Store)
Verification (Phase 7):
$ ./windows/sync.sh
synced lmcp.lua
synced server.lua
synced json.lua
$ diff lmcp.lua windows/pkg/lmcp.lua → empty
$ git ls-files -o --exclude-standard windows/
windows/README.md
windows/lmcp.wxs
windows/pkg/install_service.bat
windows/pkg/start.bat
windows/sync.sh
$ git check-ignore windows/pkg/{lmcp,server,json}.lua → all 3 ignored
This fixes the "missed every feature since April" failure mode:
running sync.sh before each MSI build now guarantees pkg/ matches
master. Forgetting to run it fails loudly (the MSI ships the
last sync's snapshot, easy to spot in QA) rather than silently (the
manifest points at fresh files that mismatch root).
Closes v1.1.0 milestone with #11, #20.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ctx augmentation:
- ctx.progress(p, total?, message?) emits notifications/progress on
the session's notify_q. No-op when the original request omitted
_meta.progressToken (per spec: only emit when client opted in).
Type-checks numeric args; passes progressToken through unchanged
(the spec allows the token to be a number OR a string).
- ctx.cancelled() returns true once the client has sent a
notifications/cancelled for this request's id.
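A rough Python sketch of the ctx.progress contract described above.
`make_progress`, the list used as a stand-in for notify_q, and the
True/False return value are all assumptions for illustration; the real
code lives in lmcp.lua and may differ.

```python
def make_progress(request_params, notify_q):
    """Build a progress(p, total=None, message=None) callable for one request.

    Hypothetical helper: `notify_q` is a plain list standing in for the
    session's notification queue.
    """
    token = (request_params.get("_meta") or {}).get("progressToken")

    def _check_number(name, value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")

    def progress(p, total=None, message=None):
        if token is None:
            # Client did not opt in: emit nothing.
            return False
        _check_number("progress", p)
        params = {"progressToken": token, "progress": p}
        if total is not None:
            _check_number("total", total)
            params["total"] = total
        if message is not None:
            params["message"] = message
        notify_q.append({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": params,
        })
        return True

    return progress


queue = []
progress = make_progress({"_meta": {"progressToken": "tok-1"}}, queue)
progress(1, 3, "step one")
print(queue[0]["params"])
```

The token is forwarded exactly as received, so a numeric token stays a
number and a string token stays a string.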
handle_request:
- New side-effect in the id==nil branch: notifications/cancelled
scans the module-level _ctx_by_co for an in-flight ctx whose
request_id matches; flips self._cancelled_ids[rid_str] only when
found. Unknown rids drop silently (no map growth).
- Pre-handler short-circuit: if cancel arrived before dispatch
reached tools/call, skip the handler entirely.
Cross-module ctx lookup:
- Module-level weak _ctx_by_co table in lmcp.lua keyed by
coroutine. lmcp.current_ctx() returns the ctx of the running
coroutine. server.lua's run() lazy-requires lmcp and uses it
to opt into auto-cancellation without depending on lmcp internals.
server.lua:run():
- After each sleep_ms cycle, check ctx.cancelled(); exit poll loop
with cancelled=true if set.
- Poll interval capped at 500ms when a ctx is present so worst-case
cancel latency stays ≤500ms (vs. 2s default growth).
- Returns "(cancelled)" sentinel; handler propagates normally.
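The capped poll loop can be sketched roughly as follows (Python, not
the actual server.lua code). Only the 500ms cap, the 2s ceiling and the
"(cancelled)" sentinel come from this commit; the 50ms starting interval,
the doubling growth factor, the "(done)" return value and all names are
assumptions.

```python
def poll(is_done, sleep, cancelled=None,
         first=0.05, growth=2.0, ceiling=2.0, ctx_cap=0.5):
    """Poll until is_done() is true, checking for cancellation after each sleep.

    `cancelled` is None when no ctx is present; otherwise it is a callable
    standing in for ctx.cancelled().
    """
    interval = first
    while not is_done():
        if cancelled is not None:
            # With a ctx present, never sleep longer than the cap so that
            # worst-case cancel latency stays bounded.
            interval = min(interval, ctx_cap)
        sleep(interval)
        if cancelled is not None and cancelled():
            return "(cancelled)"
        interval = min(interval * growth, ceiling)
    return "(done)"


# Fake sleep that records intervals instead of blocking.
sleeps = []
result = poll(
    is_done=lambda: False,
    sleep=sleeps.append,
    cancelled=lambda: len(sleeps) >= 6,
)
print(result, sleeps)
```

Without a ctx the same loop keeps growing its interval up to the
ceiling, which is the behaviour callers outside a request already had.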
_finalise_dispatch:
- Single cleanup site for both _cancelled_ids and _ctx_by_co (per
Phase 5 review).
- When was_cancelled: emit JSON-RPC -32800 "Request cancelled"
(deviation from Phase 4 plan; documented).
Phase 4 deviation explained: plan was silent TCP close (per spec
"SHOULD NOT respond"). Empirically: os.execute's fork+exec
inherits the parent's TCP socket FD into the spawned shell, so
sock:close() doesn't actually deliver FIN until the subshell exits
(i.e. the long-running command completes anyway). Verified
luasocket close() works on bare sockets (curl exits with RST in
511ms). The fix would be FD_CLOEXEC on accepted sockets, which
luasocket doesn't expose — needs a C shim or luaposix. Deferred.
Captured in memory project_fd_inheritance_in_run.
Practical UX with the deviation: client receives a structured
-32800 error within ~420ms of POSTing the cancel notification.
Measurements (Phase 7):
cancel timing (3 runs, sleep 10 with cancel at 0.4s):
run 1: t=0.42s code=-32800
run 2: t=0.42s code=-32800
run 3: t=0.42s code=-32800
progress: 3/3 events arrived on SSE; spec-shaped payload
concurrent fast+slow (#20 regression): unchanged (fast 0.01s)
all previously-closed issues regression-test green
Zero handler source-code changes. Existing tools (shell, fetch,
web_search, hub remote_*) get cancellation for free via run().
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the synchronous tools/call path with a coroutine-wrapped
dispatch. The select()-based event loop from v1.0.0-rc1 already
multiplexes I/O; this change extends the same single-thread
cooperative scheduling to tool handler execution.
How:
- server.lua:sleep_ms detects coroutine context and yields with
{ wake_at = gettime() + ms/1000 } instead of blocking. Falls back
to today's busy-blocking sleep when on the main thread (stdio
dispatch, init code).
- server.lua:run() now uses gettime() deltas for timeout accounting
(Phase 5 review fix — the prior interval-accumulator diverged
from wall-clock when scheduler delayed resumes).
- lmcp.lua wraps the handle_request call inside _dispatch_post in a
coroutine. Synchronous completion (no yield) takes the inline-
response path; if the handler yields, the coroutine parks in
self._pending_handlers and the conn enters dispatching_async.
- New _scheduler_tick services pending coroutines whose wake_at has
passed; on completion calls the shared _finalise_dispatch helper
to build the deferred HTTP response (Accept-aware: SSE or JSON).
- select() timeout tightens to the next pending wake_at so short
yields don't pay the full 100ms tick.
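The scheduling idea above, sketched with Python generators in place of
Lua coroutines. The `Scheduler` class, its method names and the fake
clock are illustrative assumptions; only the wake_at yield, the pending
list, the scheduler tick and the tightened select() timeout come from
this commit.

```python
import heapq


class Scheduler:
    """Toy single-threaded scheduler with a fake clock (no real I/O)."""

    def __init__(self):
        self.now = 0.0        # stands in for socket.gettime()
        self.pending = []     # heap of (wake_at, seq, name, generator)
        self.finished = []    # (name, result, completion time)
        self._seq = 0         # tie-breaker so generators are never compared

    def dispatch(self, name, handler):
        # Start the handler; it either finishes inline or parks itself.
        self._resume(name, handler(self))

    def _resume(self, name, gen):
        try:
            request = next(gen)
        except StopIteration as stop:
            # Handler completed: this is where a response would be built.
            self.finished.append((name, stop.value, self.now))
            return
        self._seq += 1
        heapq.heappush(self.pending, (request["wake_at"], self._seq, name, gen))

    def tick(self):
        # Resume every parked handler whose wake_at has passed.
        while self.pending and self.pending[0][0] <= self.now:
            _, _, name, gen = heapq.heappop(self.pending)
            self._resume(name, gen)

    def next_timeout(self, default=0.1):
        # Tighten the select() timeout to the earliest pending wake_at.
        if not self.pending:
            return default
        return max(0.0, min(default, self.pending[0][0] - self.now))


def slow_shell(sched):
    # Stands in for sleep_ms yielding inside run().
    yield {"wake_at": sched.now + 3.0}
    return "slow done"


def fast_ping(sched):
    return "pong"
    yield  # unreachable; only makes this function a generator


sched = Scheduler()
sched.dispatch("slow", slow_shell)
sched.dispatch("fast", fast_ping)
while sched.pending:
    sched.now += sched.next_timeout()  # pretend select() slept this long
    sched.tick()
print(sched.finished)
```

The fast handler finishes at t=0 even though the slow one was dispatched
first, which is the effect the measurement below reports.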
Measurement (Phase 7):
before: fast ping during slow shell sleep 3 = 4.28s
after: fast ping during slow shell sleep 3 = 0.01s (~400×)
3 parallel slow shells: 3.77s total wall (was ~9s).
Zero handler source-code changes. Every existing tool that goes
through run() (shell, shell_bg, fetch, web_search, list_dir,
search_files, systeminfo, hub remote_*) gets concurrency for free.
Pure-Lua handlers (ping, read_file, write_file, edit_file) continue
to complete inline. stdio transport stays serialised by design
(single-client per stdio process).
Known limits documented in memory project_handler_coroutines:
- socket.gettime() is wall-clock, not monotonic; large NTP steps may
bunch resumes. Acceptable on a chrony-slewed fleet.
- Cancellation (#11) is now tractable since the scheduler can flip a
flag between resumes — implementation pending.
- Server-initiated request await (sampling/roots from inside a
handler) still requires a future yield-on-pending helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to lmcp-hub.service. Gives a copy-and-edit starting point for
per-host lmcp instances (foo-tools style). Handles the Arch-vs-Debian
/usr/bin/lua vs /usr/bin/lua5.4 split via a comment pointing users to
override ExecStart.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five changes addressing the recurring "hub wedges on offline backends" class
of failures (2 incidents in 24h, root-caused to single-threaded Lua + an
uninterruptible os.execute ssh call):
1. Hard wall-clock cap on ssh fallback via GNU `timeout --kill-after=2 30`.
ConnectTimeout alone only bounds TCP connect; a half-dead sshd (auth
stall, remote `bash -s` hang) used to freeze the whole event loop
indefinitely. Configurable via LMCP_HUB_SSH_HARD_TIMEOUT. Also adds
ServerAliveInterval=5 and ServerAliveCountMax=2 so an established-but-dead
tunnel dies.
2. Parallel lmcp probes for remote_list_hosts. Shells out a single bash
fan-out of curl -m 3 calls, bounded by PROBE_BUDGET. Wall clock for a
full 12-backend probe went from ~28 s (sum of per-host ssh connect
timeouts) to ~3 s.
3. Probe is lmcp-only: ssh is no longer used as a health check. The hub
exists to absorb lots of offline hosts, so an expensive ssh per probe
was the exact wrong tradeoff. Actual remote_* tool calls still fall
through to ssh fallback when lmcp is down.
4. Sticky DOWN cache with exponential backoff: 60 → 120 → 240 → 480 →
900 s. Prevents a sleeping fleet from burning probe budget on every
health check. UP hosts still use 30 s TTL. Tunable via
LMCP_HUB_PROBE_TTL_{UP,DOWN_MIN,DOWN_MAX}.
5. Per-request logging to stderr (tool, host, via, elapsed). Previously
invisible; now captured in the journal for the next hang's RCA.
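The DOWN backoff schedule from item 4 can be expressed as a doubling
capped at the maximum. This is a Python sketch, not the hub.lua code;
the function name and the failure-count parameter are assumptions, and
the defaults mirror the 60 s and 900 s values quoted above.

```python
def down_ttl(consecutive_failures, ttl_min=60, ttl_max=900):
    """Seconds to keep a host marked DOWN before probing it again."""
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    # Double on every further failure, then clamp to the maximum.
    return min(ttl_min * 2 ** (consecutive_failures - 1), ttl_max)


print([down_ttl(n) for n in range(1, 7)])  # [60, 120, 240, 480, 900, 900]
```

The fifth step would be 960 s uncapped, which is why the sequence ends
at 900 rather than continuing the doubling.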
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
server.lua gains a shell_bg tool that launches a detached command via
setsid + nohup + stdio-redirect + &, returns immediately with PID and
log path. Linux-only for MVP (Windows Start-Process equivalent TBD).
hub.lua gains remote_shell_bg, forwarding to backend shell_bg. lmcp-only,
no ssh fallback — fallback for fire-and-forget is semantically murky.
Addresses the 'how do I launch a daemon over lmcp without the sentinel-
file wrapper blocking forever' question. Existing remote_shell keeps
its current synchronous-with-timeout behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BSD find on macOS silently emits nothing when the starting path is
itself a symlink (no trailing slash, no -L). On riemann with Homebrew,
/usr/local/share/lua is a symlink to /usr/local/Cellar/luarocks/.../share/lua
which tripped this — search_files returned empty for clearly-matching
patterns. GNU find on Linux follows the starting arg by default, so the
bug was invisible on every other host.
Add -L explicitly. Both BSD and GNU find accept it, both detect cycles,
and behavior becomes consistent.
Fixes marfrit-tracker task #16 (opened 2026-04-18 while stress-testing
riemann-tools MCP).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
One lmcp server on a central host (typically hertz) that proxies
remote_* tools to every backend in a registry, with a clean SSH
fallback for hosts whose lmcp is temporarily down or not installed.
Tools: remote_list_hosts, remote_{shell,read_file,write_file,edit_file,
list_dir,search_files}. Each takes a `host` argument naming the target
in /opt/herding/etc/hub-backends.conf (or $LMCP_HUB_BACKENDS).
Lazy 30s health cache; `remote_list_hosts force=true` bypasses it.
Bearer auth on inbound (standard lmcp opts.conf / LMCP_TOKEN machinery);
backend Bearer tokens kept in the registry and forwarded per-call.
SSH fallback uses `ssh host 'bash -s' < local_script` — stdin-piped
script body is the canonical shell-escape-free technique. Covers
shell/read_file/write_file/list_dir/search_files. edit_file is lmcp-only
because the literal-match + uniqueness check is nontrivial to replicate
safely in shell.
Ships an example systemd unit and a commented backends.conf template
in examples/. No migration required for existing lmcp deployments —
hub.lua is additive alongside the existing server.lua.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original draft assumed `brew install lua luasocket` works. It doesn't:
luasocket isn't a brew formula (install via luarocks), and default `lua`
is 5.5 while the rest of the fleet is on 5.4. Fix tested on riemann
(Intel Mac, macOS 14.8.3):
- Pin to lua@5.4 (keg-only brew formula) — matches fleet library paths.
- Install luasocket via luarocks into the user-local rocks tree.
- Source brew shellenv ourselves so non-login bash shells can find brew.
- Bake LUA_PATH / LUA_CPATH into the LaunchAgent plist so the service
resolves `require 'socket'` from ~/.luarocks/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lmcp.lua: if opts.auth_token and opts.conf are both unset, fall back to
the LMCP_TOKEN environment variable. Empty string treated as unset.
This is the primitive launchd/systemd drop-ins need — no conf file
bookkeeping on hosts that don't already use one.
scripts/lmcp-install-macos.sh: macOS installer via Homebrew. Drops the
Lua library files into $(brew --prefix)/share/lua/5.4/, mints (or
reuses) a Bearer token stored at $(brew --prefix)/etc/lmcp/token,
installs a ~/Library/LaunchAgents/ plist with LMCP_TOKEN baked in,
launchctl-loads it, and smoke-tests. Prints the Claude Code ~/.claude.json
snippet at the end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Literal string replacement with uniqueness check. Fails if old_string
is not found or matches multiple times (unless replace_all=true).
Matches the Claude Code harness Edit tool so sibling lmcp clients get
the same behaviour they already expect for in-place patches.
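A small Python sketch of the semantics described above, operating on a
string rather than a file. The function name, the error messages and
the empty-old_string check are assumptions for illustration; the Lua
tool's exact errors may differ.

```python
def edit_text(text, old_string, new_string, replace_all=False):
    """Replace old_string literally; require a unique match by default."""
    if old_string == "":
        # Assumption: an empty needle is rejected rather than matched.
        raise ValueError("old_string must not be empty")
    count = text.count(old_string)  # literal, non-overlapping matches
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        raise ValueError(f"old_string matches {count} times; "
                         "add context or pass replace_all=True")
    return text.replace(old_string, new_string)


print(edit_text("port = 8080\n", "8080", "9090"))
print(edit_text("a.b a.b", "a.b", "x", replace_all=True))
```

Because the match is literal, pattern characters such as `.` or `%` in
old_string carry no special meaning.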
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add MAX_BODY_SIZE (64KB) check before reading body — prevents pre-auth
OOM on internet-facing deployments
- Add JSON nesting depth limit (64 levels) — prevents C stack overflow
that bypasses pcall and crashes the process
- Timing-safe token comparison via XOR accumulate — prevents timing
oracle on Bearer token
- Auth token from LMCP_TOKEN env var (highest priority) — avoids storing
token in a file readable by the read_file tool
- Silent handling of unknown JSON-RPC notifications (spec compliance)
- Exact path matching on /mcp endpoint (was prefix-based)
- Remove dead json.array() function
Findings from architecture review + security audit.
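For reference, the XOR-accumulate comparison can be sketched like this
in Python (the function name is hypothetical and this is not the
lmcp.lua code). Python gives no hard timing guarantees, so in real
Python code the standard library's hmac.compare_digest would be the
usual choice; the sketch only shows the shape of the technique.

```python
def constant_time_equal(supplied: bytes, expected: bytes) -> bool:
    """Compare two byte strings without exiting at the first mismatch."""
    # Fold a length mismatch into the accumulator instead of returning early.
    diff = len(supplied) ^ len(expected)
    # Always walk the full expected token so the loop count does not
    # depend on where the first differing byte is.
    for i in range(len(expected)):
        s = supplied[i] if i < len(supplied) else 0
        diff |= s ^ expected[i]
    return diff == 0


print(constant_time_equal(b"s3cret-token", b"s3cret-token"))  # True
print(constant_time_equal(b"s3cret-tokeN", b"s3cret-token"))  # False
```

A plain `==` may return as soon as one byte differs, which is the
timing oracle this change is meant to close.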
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>