Add fetch tool — HTTP GET with bounded output and optional HTML→plain rendering
#3
Goal
First-class web-fetch primitive in lmcp so MCP clients (notably aish) can pull URLs without going through the generic `shell` tool. Cleaner schema for the model to introspect, safer (no shell-injection surface), and the byte cap / render mode are part of the contract rather than ad-hoc pipe stages.

Motivation
Web research is currently possible via `shell` + `curl … | pandoc -f html -t plain | head -c 50000`. Three problems with that:

- A `fetch` tool with `url` + `render` + `max_bytes` parameters is much easier for the model to pick correctly, and surfaces in `:mcp tools` as a discrete capability.
- The byte cap depends on remembering `head -c N`. Easy to forget; OOM-grade pages then poison context.
- `pandoc` / `lynx` may or may not be installed on the lmcp box. A `fetch` handler can pick whatever is available (or shell out internally) and present one consistent interface to the client.

API sketch
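As a rough illustration of the call shape, the following is a minimal sketch of how the three parameters named above (`url`, `render`, `max_bytes`) might be validated before any request is made. The function name `normalize_fetch_args`, the `"raw"` default render mode, and the 50000-byte default (borrowed from the `head -c 50000` workaround) are assumptions made for illustration, not taken from server.lua.

```lua
-- Hypothetical argument normalizer for a `fetch` tool call.
-- Defaults below are assumptions, not the values used in server.lua.
local DEFAULT_MAX_BYTES = 50000            -- assumed; mirrors `head -c 50000`
local RENDER_MODES = { raw = true, plain = true }

local function normalize_fetch_args(args)
  if type(args) ~= "table" or type(args.url) ~= "string" then
    return nil, "url is required and must be a string"
  end

  -- Only absolute http:// and https:// URLs are accepted.
  local scheme = args.url:match("^(%a[%w+.%-]*)://")
  if not scheme then
    return nil, "url must be absolute"
  end
  scheme = scheme:lower()
  if scheme ~= "http" and scheme ~= "https" then
    return nil, "unsupported scheme: " .. scheme
  end

  local render = args.render or "raw"
  if not RENDER_MODES[render] then
    return nil, "render must be 'raw' or 'plain'"
  end

  local max_bytes = args.max_bytes or DEFAULT_MAX_BYTES
  if type(max_bytes) ~= "number" or max_bytes < 1 or max_bytes % 1 ~= 0 then
    return nil, "max_bytes must be a positive integer"
  end

  return { url = args.url, render = render, max_bytes = max_bytes }
end
```

Returning `nil` plus a message keeps bad input out of the curl command line entirely, which is where the "no shell-injection surface" claim would have to be enforced.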
Follow redirects by default (up to 5). Reject `file://`, `gopher://`, and other non-http(s) schemes. No cookie persistence.

Implementation notes
- …`shell` tool), so the handler can be `curl -sSL --max-time T -A UA -w '\n__HTTP_STATUS__=%{http_code}\n__CONTENT_TYPE__=%{content_type}\n' …` and post-process the body. No new C dependency.
- `render="plain"`: try `pandoc -f html -t plain` first, then `lynx -stdin -dump -nolist`, then `w3m -dump -T text/html`, then raw. Cache which renderer worked across calls (per-process).
- Apply `max_bytes` before rendering: `curl --max-filesize` plus a Lua-side defensive trim.
- …`json.lua`), not just a string, so the model can see the status code without parsing.

Priority
Medium. The `shell`-based workaround exists, but every aish session that wants web research re-derives the same `curl | pandoc | head` incantation, and the model picks the wrong cap roughly half the time. Worth ~half a day to land cleanly.

Out of scope (defer to a follow-up)
- …`auth` param; left out of v1 to keep the surface small.

Implemented in server.lua.
`fetch` tool: HTTP GET/HEAD via curl with `--max-filesize` (mid-stream cap), `--max-time`, structured `{ok, status, content_type, bytes_read, body, truncated, renderer, error?}` return, renderer chain `pandoc → lynx → w3m → pure-Lua strip`, RFC-3986 URL whitelist. Verified live across 12 cases incl. truncation, transport failures, 404, HEAD, malformed URLs.

Also patched json.lua to combine UTF-16 surrogate pairs (issue #4 context; the same change cycle benefits both tools).
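To make the structured return concrete, here is a minimal sketch of how curl output carrying the `__HTTP_STATUS__` / `__CONTENT_TYPE__` trailer from the `-w` format in the implementation notes could be split into fields and defensively trimmed. The function name `parse_curl_output`, the rule that `ok` means a 2xx status, and reporting `bytes_read` as the post-trim length are assumptions for illustration; server.lua may define them differently. The `renderer` field is left to a later rendering step.

```lua
-- Hypothetical post-processor for curl output of the form:
--   <body>\n__HTTP_STATUS__=<code>\n__CONTENT_TYPE__=<type>\n
local function parse_curl_output(out, max_bytes)
  -- Greedy (.*) plus the trailing $ anchors on the LAST trailer, so
  -- marker-like text inside the body itself is treated as body.
  local body, status, ctype =
    out:match("^(.*)\n__HTTP_STATUS__=(%d+)\n__CONTENT_TYPE__=([^\n]*)\n$")
  if not body then
    return { ok = false, error = "missing status trailer" }
  end

  -- Lua-side defensive trim, in addition to curl's own size cap.
  local truncated = #body > max_bytes
  if truncated then
    body = body:sub(1, max_bytes)
  end

  local code = tonumber(status)
  return {
    ok = code >= 200 and code < 300,   -- assumed meaning of `ok`
    status = code,
    content_type = ctype,
    bytes_read = #body,                -- assumed: length after trimming
    body = body,
    truncated = truncated,
  }
end
```

Because the status code arrives as its own field, a 404 page comes back as `ok = false` with `status = 404` and its body intact, so the caller never has to parse the body to learn what happened.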
Memory: project_runtime_lua.md captures the LuaJIT vs Lua 5.4 os.execute portability rule the renderer probe relies on.
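For readers without access to that note, the portability issue is roughly this: in Lua 5.1 and default LuaJIT builds `os.execute` returns a numeric status, while in Lua 5.2 and later it returns `true` (or `nil`), a termination kind, and a code. A minimal sketch of a probe that tolerates both, assuming a POSIX shell and using the hypothetical helper names `command_succeeded` and `pick_renderer`, might look like the following. It caches which renderer is installed, a simplification of "which renderer worked", and the `"lua-strip"` label for the pure-Lua fallback is likewise an assumption.

```lua
-- Returns true when `cmd` exits with status 0, on either Lua dialect.
local function command_succeeded(cmd)
  local first = os.execute(cmd)
  if type(first) == "number" then
    return first == 0        -- Lua 5.1 / default LuaJIT: numeric status
  end
  return first == true       -- Lua 5.2+: true on success, nil otherwise
end

local RENDERERS = { "pandoc", "lynx", "w3m" }   -- order from the issue
local cached_renderer   -- nil = not probed yet, false = none installed

-- `exists` is injectable so the probe can be exercised without the tools.
local function pick_renderer(exists)
  exists = exists or function(name)
    -- Names come from the fixed list above, never from user input.
    return command_succeeded("command -v " .. name .. " >/dev/null 2>&1")
  end
  if cached_renderer == nil then
    cached_renderer = false
    for _, name in ipairs(RENDERERS) do
      if exists(name) then
        cached_renderer = name
        break
      end
    end
  end
  return cached_renderer or "lua-strip"   -- assumed label for the fallback
end
```

Caching in a module-level local gives the per-process behavior described in the implementation notes: the shell is consulted at most once per renderer for the lifetime of the server.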