needle-rs + Browser MCP Client
Last time I wrote to you all, I showed a simple implementation of the MCP spec in JavaScript. At the time I didn’t have a great way to actually use it - most tool-calling LLMs that had been ported to the browser were pretty large. Smaller machines couldn’t run them at all, and large machines couldn’t run them fast.
Luckily people are doing fun things every day, like the folks at Cactus Compute, who have created Needle, a 26M parameter model that calls tools and does nothing else. It’s perfect for what I need, and because I didn’t find it until it had already been out for a week, somebody else compiled a WASM runtime for it so I can run it in the browser.
It’s pretty simple to connect this with the MCP client I wrote, and the results are really fun - below, you can try out connecting to an MCP server, loading the model and tools, and then querying the model. Once again, I’ll link to this list of public MCP servers you can use for testing. Code is on GitHub if you want to reproduce it!
Challenges
Needle does well for single tool calls based on specific and direct queries, but does a bit worse when requests are chained (Roll a d20, roll a d6, then pick a number between 1 and 100 and flip 2 coins) or vague. Ideally, we could have a separate layer with an additional model specialized to decompose tasks and refer complicated tasks to server-side models.
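As a rough illustration of the decomposition idea, here is a minimal sketch that splits a chained request into sub-queries before each one is routed separately. It is a plain regex stand-in, not the specialized model described above, and `splitChainedQuery` is a hypothetical helper name that is not part of needle-rs or the MCP client.

```javascript
// Naive decomposition: split a chained request on commas, semicolons,
// and the word "then". This is a heuristic stand-in for a real
// task-decomposition model, not a replacement for one.
function splitChainedQuery(query) {
  return query
    .split(/\s*(?:,|;|\bthen\b)\s*/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const parts = splitChainedQuery(
  "Roll a d20, roll a d6, then pick a number between 1 and 100 and flip 2 coins"
);
console.log(parts);
// → [ 'Roll a d20', 'roll a d6',
//     'pick a number between 1 and 100 and flip 2 coins' ]
```

The last sub-query still contains two actions, because splitting on "and" would also break "between 1 and 100". That is the kind of case where a heuristic runs out and a decomposition model or a server-side fallback would be needed.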
There’s also the issue inherent to using this model, which is that you’ll need another model for actual text generation (i.e. passing on the tool results to users in plain language). But for instances where you’re handling the presentation of data with UI components, this model alone is a good fit.
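For the UI-component case, a small formatter can stand in for the text-generation model. The sketch below assumes the tool result follows the MCP tool-result shape (a `content` array of typed items plus an optional `isError` flag). `renderToolResult` is a hypothetical helper, and a real page would more likely feed the result into a component than build a string.

```javascript
// Turn an MCP tool result into a display string without a second model.
// Assumes text items look like { type: "text", text: "..." }; other
// content types (images, resources) are ignored in this sketch.
function renderToolResult(toolName, result) {
  const items = Array.isArray(result?.content) ? result.content : [];
  const text = items
    .filter((item) => item.type === "text")
    .map((item) => item.text)
    .join("\n");
  if (result?.isError) {
    return `${toolName} failed: ${text || "no details"}`;
  }
  return text ? `${toolName}: ${text}` : `${toolName} returned no text content`;
}

console.log(renderToolResult("roll_dice", { content: [{ type: "text", text: "17" }] }));
// → roll_dice: 17
```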
Basic Implementation
The component below wires needle-rs into the MCP flow:
- MCPClient.connect() to establish a session.
- listTools() to get server tools.
- NeedleWasm.load() with safetensor weights + vocab.txt.
- engine.run(query, toolsJson) to produce structured tool-call JSON.
- client.callTool(...) to execute the selected tool.
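The steps above can be sketched roughly as follows. The method names come from the list, but several details are assumptions made for illustration: that `listTools()` resolves to an array of tool objects with a `name`, that `engine.run()` returns a JSON string encoding one call or an array of calls shaped `{ name, arguments }`, and that `callTool` takes a name and an arguments object. The engine is assumed to have been created beforehand with `NeedleWasm.load()`, which is not shown here.

```javascript
// Sketch of the connect -> list -> route -> execute flow.
// `client` and `engine` are passed in, so nothing here depends on how
// they are constructed.
async function routeAndExecute(client, engine, query) {
  // 1. Open the MCP session and fetch the server's tool definitions.
  await client.connect();
  const tools = await client.listTools();

  // 2. Ask the model which tool(s) to call.
  // ASSUMPTION: run() returns a JSON string (one call or an array).
  const raw = await engine.run(query, JSON.stringify(tools));
  const parsed = JSON.parse(raw);
  const calls = Array.isArray(parsed) ? parsed : [parsed];

  // 3. Execute only calls that name a tool the server advertised.
  const known = new Set(tools.map((tool) => tool.name));
  const results = [];
  for (const call of calls) {
    if (!known.has(call.name)) {
      results.push({ call, error: `unknown tool: ${call.name}` });
      continue;
    }
    const result = await client.callTool(call.name, call.arguments ?? {});
    results.push({ call, result });
  }
  return results;
}
```

Checking the model's output against the advertised tool names before calling the server is a cheap guard: a small model can occasionally emit a tool name that does not exist, and it is nicer to surface that locally than as a server error.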
[Interactive demo: Needle + MCP Router, with panels for tool schemas, router output, execution output, and a log.]
Notes
- The default model asset URLs in the demo point to the public Hugging Face files from the needle-rs README.
- For production usage, host the weight files behind your own caching strategy.
- This pattern works well when you want deterministic local routing and remote tool execution over MCP.
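On the caching note, one client-side option is the browser Cache API, so the weights are downloaded once per browser rather than on every page load. This is only a sketch of one possible strategy: `fetchWithCache` and the cache name are hypothetical, and how the resulting bytes are handed to `NeedleWasm.load()` depends on the needle-rs API and is not shown.

```javascript
// Fetch a model asset once and serve later loads from the Cache API.
// `cacheStorage` and `fetchFn` default to the browser globals; they are
// parameters only so the function can be exercised outside a browser.
async function fetchWithCache(
  url,
  cacheName = "needle-assets-v1",
  cacheStorage = globalThis.caches,
  fetchFn = (input) => globalThis.fetch(input)
) {
  const cache = await cacheStorage.open(cacheName);

  // Serve from the cache when the asset was stored on an earlier visit.
  const hit = await cache.match(url);
  if (hit) return new Uint8Array(await hit.arrayBuffer());

  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`failed to fetch ${url}: ${response.status}`);
  }
  // Store a clone, since a response body can only be read once.
  await cache.put(url, response.clone());
  return new Uint8Array(await response.arrayBuffer());
}
```

Bumping the cache name (for example to a `-v2` suffix) is a simple way to invalidate old weights when the model files change.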