v0.5.3 - lmcp - marfrit's space

marfrit/lmcp

v0.5.3 17af91a99b

v0.5.3: hub hardening — hard ssh timeout, parallel probes, sticky DOWN cache

marfrit released this 2026-04-21 11:58:44 +00:00 | 9 commits to master since this release
Five changes addressing the recurring "hub wedges on offline backends" class
of failures (2 incidents in 24h, root-caused to single-threaded Lua + an
uninterruptible os.execute ssh call):
1. Hard wall-clock cap on ssh fallback via GNU timeout --kill-after=2 30.
  ConnectTimeout alone only bounds the TCP connect; a half-dead sshd (auth
  stall, remote bash -s hang) used to freeze the whole event loop
  indefinitely. Configurable via LMCP_HUB_SSH_HARD_TIMEOUT. Also adds
  ServerAliveInterval=5 and ServerAliveCountMax=2 so an
  established-but-dead tunnel dies.
2. Parallel lmcp probes for remote_list_hosts. Shells out a single bash
  fan-out of curl -m 3 calls, bounded by PROBE_BUDGET. Wall clock for a
  full 12-backend probe went from ~28 s (sum of per-host ssh connect
  timeouts) to ~3 s.
3. Probe is lmcp-only: ssh is no longer used as a health check. The hub
  exists to absorb lots of offline hosts, so an expensive ssh per probe
  was exactly the wrong tradeoff. Actual remote_* tool calls still fall
  through to the ssh fallback when lmcp is down.
4. Sticky DOWN cache with exponential backoff: 60 → 120 → 240 → 480 →
  900 s. Prevents a sleeping fleet from burning probe budget on every
  health check. UP hosts still use 30 s TTL. Tunable via
  LMCP_HUB_PROBE_TTL_{UP,DOWN_MIN,DOWN_MAX}.
5. Per-request logging to stderr (tool, host, via, elapsed). These details
  were invisible before and are now captured in the journal for the next
  hang's RCA.
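
The hard cap from item 1 can be sketched as a small bash wrapper. This is an illustration rather than the hub's actual code: the function name run_capped is hypothetical, and treating LMCP_HUB_SSH_HARD_TIMEOUT as a number of seconds with a default of 30 is an assumption based on the `timeout --kill-after=2 30` invocation quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from lmcp): run any command under a hard
# wall-clock cap using GNU timeout.
run_capped() {
  # Assumption: the variable holds seconds and defaults to 30.
  local limit="${LMCP_HUB_SSH_HARD_TIMEOUT:-30}"
  # Send TERM after $limit seconds, then KILL 2 s later if TERM is ignored.
  timeout --kill-after=2 "$limit" "$@"
}

# A capped ssh call might then look like this (host, script name and the
# ConnectTimeout value are placeholders):
#   run_capped ssh -o ConnectTimeout=5 -o ServerAliveInterval=5 \
#     -o ServerAliveCountMax=2 backend.example 'bash -s' < script.sh

# Demo with a stand-in for a hung command: the cap fires after 1 s and
# GNU timeout reports exit status 124.
LMCP_HUB_SSH_HARD_TIMEOUT=1 run_capped sleep 5 || echo "exit status: $?"
```

The caller can treat status 124 as "backend hung" instead of waiting on the child indefinitely.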
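
The fan-out in item 2 might look roughly like the following, assuming each probe is an independent command that can run as a background job. The probe_one function is a hypothetical stand-in that sleeps for one second; a real probe would presumably be a `curl -m 3` call against each backend, whose port and path are not given in these notes. The PROBE_BUDGET bound is not modeled here.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a single probe: takes about 1 s and fails for
# hosts whose name starts with "down-".
probe_one() {
  local host="$1"
  sleep 1
  case "$host" in
    down-*) return 1 ;;
    *)      return 0 ;;
  esac
}

# Launch every probe as a background job, then wait for all of them.
# Total wall clock is roughly the slowest probe, not the sum of all probes.
probe_all() {
  local host
  for host in "$@"; do
    (
      if probe_one "$host"; then
        echo "$host UP"
      else
        echo "$host DOWN"
      fi
    ) &
  done
  wait
}

# Results arrive in completion order, so sort them for stable output.
probe_all up-a down-b up-c | sort
```

With serial probes the wall clock grows with the number of offline hosts; with the fan-out it is bounded by the per-probe timeout.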
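
The backoff schedule in item 4 is consistent with doubling the previous DOWN TTL and clamping it to a maximum. A minimal sketch under that assumption follows; the function name next_down_ttl is hypothetical, and the defaults of 60 and 900 are inferred from the sequence above rather than read from the hub's code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the previous DOWN TTL in seconds (0 for a host
# that has just gone DOWN), print the next TTL.
next_down_ttl() {
  local prev="${1:-0}"
  # Assumed defaults, inferred from the 60 ... 900 s sequence in the notes.
  local min="${LMCP_HUB_PROBE_TTL_DOWN_MIN:-60}"
  local max="${LMCP_HUB_PROBE_TTL_DOWN_MAX:-900}"
  local next
  if [ "$prev" -le 0 ]; then
    next="$min"            # first failure starts at the minimum
  else
    next=$(( prev * 2 ))   # each further failure doubles the TTL
  fi
  if [ "$next" -gt "$max" ]; then
    next="$max"            # clamp to the maximum
  fi
  echo "$next"
}

# Walk the schedule for six consecutive failed probes.
ttl=0
for _ in 1 2 3 4 5 6; do
  ttl=$(next_down_ttl "$ttl")
  printf '%s ' "$ttl"
done
echo
# prints: 60 120 240 480 900 900
```

Doubling 480 gives 960, which the clamp reduces to 900; that matches the last step of the listed sequence.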
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>