Add web_search tool — pluggable backend (SearXNG / DDG-html / Tavily / Brave)
#4
Goal
First-class web-search primitive in lmcp returning [{title, url, snippet}]. Pairs with the proposed fetch tool (issue #3): search yields URLs, fetch reads them.
Motivation
aish (and any LLM-driven MCP client) needs a search step before it can use fetch for anything that isn't a URL the user already typed. Today there is no path inside aish; the user has to leave the shell, search, and paste a URL back. A structured web_search tool inside lmcp closes that loop.
Unlike fetch, search is not implementable purely with curl + post-processing; it requires a search backend. So this is a real new capability rather than a shell-wrapping convenience.
API sketch
Results are ranked by the upstream backend; lmcp does no re-ranking.
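To make the common result shape concrete, here is a minimal Python sketch of per-backend normalization (the real implementation lives in Lua; this is only an illustration). The function name `normalize`, the `limit` parameter, and the backend keys are hypothetical. The upstream field names (`content` for SearXNG and Tavily, `web.results[].description` for Brave) are assumptions about those payloads and should be checked against each backend's documentation.

```python
from typing import Any

# Where each backend keeps its result list and its snippet text.
# These field names are assumptions about the upstream JSON, not confirmed by the issue.
def _items_and_snippet_key(backend: str, payload: dict[str, Any]):
    if backend in ("searxng", "tavily"):
        return payload.get("results", []), "content"
    if backend == "brave":
        return payload.get("web", {}).get("results", []), "description"
    raise ValueError(f"unknown backend: {backend}")


def normalize(backend: str, payload: dict[str, Any], limit: int = 10) -> list[dict[str, str]]:
    """Map a backend payload onto [{title, url, snippet}], keeping upstream order."""
    items, snippet_key = _items_and_snippet_key(backend, payload)
    results = []
    for item in items:
        url = item.get("url")
        if not url:
            continue  # a result without a URL is useless to a later fetch step
        results.append({
            "title": item.get("title") or "",
            "url": url,
            "snippet": item.get(snippet_key) or "",
        })
    # No re-ranking: just truncate the upstream ordering.
    return results[:limit]
```

The list is only filtered and truncated, never reordered, which matches the "no re-ranking" rule above.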
Backend options (configurable at server start)
- SearXNG (<instance>/search?format=json&q=…): free, no API key, no rate ceiling beyond the instance's own. mfritsche could host one at e.g. searx.fritz.box and point lmcp at it via LMCP_SEARXNG_URL. Recommended default backend.
- DDG-html: fetch https://html.duckduckgo.com/html/ and parse the result. Brittle (DDG changes the HTML occasionally) but zero-config. Useful so the tool works out of the box even without setup.
- Tavily (https://api.tavily.com/search): API key, paid past free tier, returns JSON natively. Quality is good; cost is real.
- Brave (https://api.search.brave.com/res/v1/web/search): API key, free tier exists. JSON native.
The backend is selected by environment variable at lmcp start.
If the selected backend isn't configured (missing URL or key), tool surfaces a clean error rather than silently falling back — the operator should know which backend is being used.
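The selection rule can be sketched as follows, in Python for illustration only. It combines the no-silent-fallback rule above with the auto-pick order described in the implementation comment on this issue. The backend identifiers (`searxng`, `tavily`, `brave`, `ddg`) and the function name `select_backend` are hypothetical. The variable names follow the implementation comment; the issue body spells the SearXNG one `LMCP_SEARXNG_URL`, so the exact names are an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Setting each backend needs before it can be used; None means zero-config.
REQUIRED_ENV = {
    "searxng": "SEARXNG_URL",
    "tavily": "TAVILY_API_KEY",
    "brave": "BRAVE_API_KEY",
    "ddg": None,
}
# Auto-pick order when no backend is named explicitly.
AUTO_ORDER = ("searxng", "tavily", "brave")


def select_backend(env: Mapping[str, str] = os.environ) -> Tuple[Optional[str], Optional[str]]:
    """Return (backend, error); exactly one of the two is None."""
    explicit = env.get("LMCP_SEARCH_BACKEND", "").strip().lower()
    if explicit:
        if explicit not in REQUIRED_ENV:
            return None, f"unknown search backend: {explicit}"
        needed = REQUIRED_ENV[explicit]
        if needed and not env.get(needed):
            # Explicit choice but incomplete config: report it, never fall back.
            return None, f"backend {explicit} selected but {needed} is not set"
        return explicit, None
    for name in AUTO_ORDER:
        if env.get(REQUIRED_ENV[name]):
            return name, None
    return "ddg", None  # nothing configured: zero-config default
```

An explicit choice never degrades to another backend; only the "nothing was specified" case lands on DDG-HTML.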
Implementation notes
- curl -sS + json.lua + small per-backend normalization to the common result shape.
- lmcp.new() startup: log which backend is active and whether config is complete; don't error out (the operator may add the tool but not configure it yet; that's fine, errors land at call time).
- safesearch: pass-through to backends that support it (SearXNG, DDG, Brave); silently ignore on backends that don't (Tavily).
Priority
Medium-low.
fetch (issue #3) is the higher-leverage one: once that lands, the model can do research on URLs the user pastes. web_search is needed for the autonomous flow where the user just says "find me an example of X" and Norris mode goes off and gathers URLs itself.
Out of scope
Related
#3 (fetch tool): pairs with this; search→fetch is the canonical flow.
Implemented in server.lua.
web_search tool with backend pluggability: explicit LMCP_SEARCH_BACKEND env wins; otherwise auto-picks the first configured of SEARXNG_URL / TAVILY_API_KEY / BRAVE_API_KEY; falls back to zero-config DDG-HTML. Structured {ok, backend, query, results:[{title,url,snippet,age?}], error?} envelope.
DDG parser: per-block iteration (avoids title↔snippet mispair); per-block URL unwrap from uddg=; drops results with un-decodable href; surfaces anti-bot 202 as structured {ok=false, error="ddg parser matched no results"} rather than a silent empty list.
Memory: project_search_backends.md captures that DDG is anti-bot-blocked from the deployment host; SearXNG self-host is the recommended path. Phase 5 reviewer caught a Phase 0 loopback (DDG worked once, then returned 202 within the same session); success criterion was honestly re-anchored.
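The DDG parsing approach described above might look roughly like this in Python. It is not the Lua code in server.lua. The regexes, the class names `result`, `result__a` and `result__snippet`, and the helpers `unwrap` and `parse_ddg` are assumptions made for illustration; the real DDG markup may differ and changes over time.

```python
import html
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# One block per result container, so a title can only pair with its own snippet.
BLOCK_RE = re.compile(r'<div class="result\b.*?(?=<div class="result\b|\Z)', re.S)
TITLE_RE = re.compile(r'<a[^>]*class="result__a"[^>]*href="([^"]+)"[^>]*>(.*?)</a>', re.S)
SNIPPET_RE = re.compile(r'<a[^>]*class="result__snippet"[^>]*>(.*?)</a>', re.S)
TAG_RE = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Strip inline tags and decode HTML entities."""
    return html.unescape(TAG_RE.sub("", fragment)).strip()


def unwrap(href: str) -> Optional[str]:
    """Resolve a redirect href to its uddg= target; None if it is not an http(s) URL."""
    href = html.unescape(href)
    if href.startswith("//"):
        href = "https:" + href
    target = parse_qs(urlparse(href).query).get("uddg")
    url = target[0] if target else href
    return url if urlparse(url).scheme in ("http", "https") else None


def parse_ddg(page: str, query: str) -> dict:
    results = []
    for block in BLOCK_RE.findall(page):
        title = TITLE_RE.search(block)
        if not title:
            continue
        url = unwrap(title.group(1))
        if url is None:
            continue  # drop results whose href cannot be decoded
        snippet = SNIPPET_RE.search(block)
        results.append({
            "title": _text(title.group(2)),
            "url": url,
            "snippet": _text(snippet.group(1)) if snippet else "",
        })
    envelope = {"ok": bool(results), "backend": "ddg", "query": query, "results": results}
    if not results:
        # Covers the anti-bot case: a page arrives but contains no result blocks.
        envelope["error"] = "ddg parser matched no results"
    return envelope
```

Because the title and snippet are searched inside the same block, a result that lacks a snippet gets an empty string instead of borrowing the next result's snippet.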