Writing a Good MCP Server
May 12, 2026
Every time someone declares MCP dead, a new major platform ships an MCP integration. A few weeks ago I wrote about why MCP is the protocol that sticks. The short version: the moment you have multiple users accessing their own data through an agent, you need a protocol, and the ecosystem is converging on MCP as the one. That fight is settled enough that the interesting question has moved on. Not whether to build an MCP server. How to build one that works.
Most MCP servers are bad. Not architecturally. Functionally. They expose too many tools. The tool names are confusing. The error messages are useless. The model bumbles through them and the developer blames the model. It’s not the model. It’s the interface you gave it.
An MCP server is a developer experience product. The developer is an LLM. Once you internalize that, the design decisions get obvious.
Minimize the Tool Surface
The single most impactful thing you can do is expose fewer tools.
Models choose from tool definitions the same way they choose from context: by attending to all of it and selecting what seems most relevant. Ten tools is a small menu. Fifty is a wall of text the model has to parse on every single turn. Each tool definition eats context tokens, and each additional option makes the selection problem harder. Anthropic’s own guidance says to keep tool counts under a few dozen where possible, and in my experience the number is lower than that.
Cloudflare ran into this at the extreme end. Their API has 2,500+ endpoints. Exposing each as an MCP tool would consume over a million tokens of context just for the definitions. Their solution was Code Mode: collapse the entire API into two tools, search() and execute(). The model searches for relevant API endpoints, then writes code against a typed SDK to call them. Two tools. 1,000 tokens. 99.9% reduction in context overhead.
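The arithmetic behind that reduction is easy to check. A quick sketch using the round figures quoted above; since the definitions would consume "over a million" tokens, one million is treated here as a lower bound, so the real per-endpoint cost and reduction would be at least these values.

```python
# Round figures from the paragraph above; one million is a lower bound.
endpoints = 2_500
tokens_all_endpoints = 1_000_000
tokens_code_mode = 1_000

# Average context cost of one endpoint exposed as its own tool.
tokens_per_endpoint = tokens_all_endpoints / endpoints

# Fraction of definition overhead removed by collapsing to two tools.
reduction = 1 - tokens_code_mode / tokens_all_endpoints

print(f"~{tokens_per_endpoint:.0f} tokens per endpoint definition")
print(f"{reduction:.1%} reduction in definition overhead")
```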
You probably don’t have 2,500 endpoints. The principle still applies. If you have get_user, get_user_by_email, get_user_by_id, get_user_by_name, and search_users, you have four tools too many. One find_user tool with a flexible query parameter does the same work and leaves the model with one choice instead of five.
The question for every tool: does this need to be a separate action, or is it a parameter on an existing action? Most of the time it’s a parameter.
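As a rough illustration of folding lookups into a parameter, here is a minimal sketch of a single find_user function. The User type, the in-memory USERS list, and the matching rules (exact ID or email first, then partial name) are assumptions made for the example, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str
    name: str
    email: str


# Hypothetical in-memory data standing in for a real user store.
USERS = [
    User("u_1", "Sarah Chen", "sarah@acme.co"),
    User("u_2", "Sam Ortiz", "sam@example.com"),
]


def find_user(query: str, limit: int = 5) -> list[User]:
    """Match a user by exact ID, exact email, or partial name.

    One entry point stands in for get_user_by_id, get_user_by_email,
    get_user_by_name, and search_users.
    """
    q = query.strip().lower()
    if not q:
        return []
    # Exact identifiers win outright, so an ID or email never returns noise.
    exact = [u for u in USERS if q in (u.id.lower(), u.email.lower())]
    if exact:
        return exact[:limit]
    # Otherwise fall back to a case-insensitive substring match on the name.
    return [u for u in USERS if q in u.name.lower()][:limit]
```

The model no longer has to decide which lookup applies. It passes whatever identifier the user gave it, and the server does the disambiguation.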
Tool Names Are Prompting
A tool name is not a label for your convenience. It is a token sequence the model reads during planning. It influences how the model reasons about what the tool does and when to use it.
set_book_status(status: "enabled") and enable_book() expose the same functionality. They are different prompts. enable_book tells the model there is a discrete action for enabling books: flip this switch. set_book_status tells the model there is a state machine and it is setting a value, which invites it to ask what other statuses exist.
Both are valid. Neither is neutral. You’re shaping the model’s mental model of your API every time you name a tool.
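To make the difference concrete, here is a sketch of how the two framings might look as tool definitions, written as plain Python dicts. The name, description, and inputSchema fields follow the shape MCP uses when listing tools. The descriptions, the book_id parameter, and every status other than "enabled" are invented for illustration.

```python
# Framing 1: a discrete action. Nothing to choose beyond which book.
enable_book = {
    "name": "enable_book",
    "description": "Enable a book so it becomes active.",
    "inputSchema": {
        "type": "object",
        "properties": {"book_id": {"type": "string"}},
        "required": ["book_id"],
    },
}

# Framing 2: a state machine. The enum advertises that other states exist.
set_book_status = {
    "name": "set_book_status",
    "description": "Set the status of a book.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "book_id": {"type": "string"},
            # "disabled" and "archived" are hypothetical statuses.
            "status": {
                "type": "string",
                "enum": ["enabled", "disabled", "archived"],
            },
        },
        "required": ["book_id", "status"],
    },
}
```

Read side by side, the second definition carries more information and more decisions. Whether that is a feature or a distraction depends on whether the model ever needs the other states.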
Names that work well:
- Verb-noun pairs that describe the action: find_customer, create_invoice, cancel_order. The model parses these instantly.
- Consistent prefixes that group related tools: inventory_check, inventory_update, inventory_reserve. The model infers that these operate on the same domain.
Names that cause problems:
- Generic names like process, handle, execute. The model has to read the full description to understand what these do. On every single turn.
- Abbreviations and internal jargon. upsrt_cust_rec means nothing to a model that didn’t sit through your onboarding.
- Inconsistent patterns. If one tool is getUser and another is fetch_orders and a third is listAllPayments, the model spends attention reconciling three naming conventions instead of understanding three tools.
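Some of these problems can be caught mechanically. Below is a minimal sketch of a name check, assuming the convention is lowercase snake_case with at least two words. The function name lint_tool_names and the rule set are hypothetical, and the generic-name list only covers the three examples above.

```python
import re

# Assumed convention: lowercase snake_case with at least two words.
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z]+)+$")
# Names too generic to tell the model anything on their own.
GENERIC = {"process", "handle", "execute"}


def lint_tool_names(names: list[str]) -> dict[str, str]:
    """Return a map of tool name -> problem for names that break the rules."""
    problems = {}
    for name in names:
        if name.lower() in GENERIC:
            problems[name] = "too generic; say what it acts on"
        elif not SNAKE_CASE.match(name):
            problems[name] = "not multi-word snake_case"
    return problems


report = lint_tool_names(
    ["find_customer", "getUser", "fetch_orders", "listAllPayments", "process"]
)
for name, problem in report.items():
    print(f"{name}: {problem}")
```

A check like this flags getUser, listAllPayments, and process, and passes find_customer and fetch_orders. It cannot catch jargon: upsrt_cust_rec is perfectly valid snake_case, so abbreviations still need a human read.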
Error Messages Are Prompting
When your MCP tool returns an error, you are not logging to a file. You are speaking directly to an LLM that just tried to help a user and hit a wall.
Most developers write error messages like this:
Error: Invalid parameter
Error: Not found
Error: 403
Those are for log files. An LLM gets these and has almost nothing to work with. It will guess. It will retry the same call. It will hallucinate a workaround. The model needed guidance. You gave it nothing.
Write error messages like you’re talking to a junior developer who just ran into this for the first time:
No customer found with email "sarah@acme.co".
Try searching by name with find_customer(query: "Sarah Chen")
or check if the email domain is different.

Cannot update order #4481: status is "shipped".
Only orders in "pending" or "processing" status can be modified.
Use get_order to check current status first.

Authentication expired. The user needs to re-authenticate
before this tool can access their calendar.
Do not retry this call. Tell the user their session has expired.
Each of these tells the model what went wrong, why, and what to try next. The model is good at recovering from failures when it has information to recover with. Your error messages are the information. I’ve watched the same agent loop five times and give up on a vague error, then self-correct in one try when the error told it what to do.
Same principle applies to success responses. If a tool creates a resource, return the ID and a hint about what the model might want to do next. Every response feeds the next step of a reasoning loop.
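One way to make this a habit is to route every failure through a helper that forces the three parts: what happened, why, and what to try next. The sketch below is a hedged illustration. The isError flag and the list of text content items mirror the shape MCP uses for tool results, while tool_error, the ORDERS store, and update_order are hypothetical names made up for the example.

```python
def tool_error(what: str, why: str = "", next_step: str = "", retry: bool = True) -> dict:
    """Build an error result that says what failed, why, and what to try next."""
    parts = [what, why, next_step]
    if not retry:
        parts.append("Do not retry this call.")
    text = " ".join(p for p in parts if p)
    return {"isError": True, "content": [{"type": "text", "text": text}]}


# Hypothetical order store used only for this example.
ORDERS = {"4481": {"status": "shipped"}, "4482": {"status": "pending"}}
MODIFIABLE = ("pending", "processing")


def update_order(order_id: str, note: str) -> dict:
    """Attach a note to an order, returning guidance on every outcome."""
    order = ORDERS.get(order_id)
    if order is None:
        return tool_error(
            f"No order found with ID {order_id}.",
            next_step="Ask the user to confirm the order number.",
        )
    if order["status"] not in MODIFIABLE:
        return tool_error(
            f'Cannot update order #{order_id}: status is "{order["status"]}".',
            why='Only orders in "pending" or "processing" status can be modified.',
            next_step="Use get_order to check current status first.",
        )
    order["note"] = note
    # Success responses also hint at a sensible next step.
    text = f"Updated order #{order_id}. Use get_order to confirm the change."
    return {"isError": False, "content": [{"type": "text", "text": text}]}
```

Calling update_order("4481", "gift wrap") returns the second error message from the examples above, and tool_error("Authentication expired.", retry=False) produces an explicit instruction not to retry.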
Don’t Sweat the Cold Start
I see developers agonize over Lambda cold starts and serverless spin-up times when building MCP servers. Understandable instinct. Wrong target.
Think about what’s happening in the exchange. A user says something to an agent. The agent reasons about it. The agent selects a tool. It formulates the parameters. It makes the call. It reads the result. It reasons about the result. It maybe calls another tool. Then it generates a response.
The model inference on either side of your tool call takes seconds. Often many seconds. The user is already waiting for a thinking, planning, generating loop that dwarfs the latency of a cold Lambda. A 500ms cold start that you spent a week eliminating is invisible inside a 15-second agent turn.
A Lambda cold start is fine. Optimize your server for correctness, clear responses, and sensible tool design. Those directly affect output quality. A 200ms response versus a 700ms response does not. The bottleneck is almost never your server. It’s the model.
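Putting numbers on it, using the 500ms cold start and 15-second turn from above. The 15-second figure is illustrative, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
# Figures taken from the text: a 15-second agent turn and a 500 ms cold start.
turn_seconds = 15.0
cold_start_seconds = 0.5

# Share of the user's total wait that the cold start accounts for.
share = cold_start_seconds / turn_seconds

print(f"Cold start share of the turn: {share:.1%}")
```

Roughly one thirtieth of the wait. A week of optimization buys back a few percent of a turn the user already expects to be slow.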
Build Documentation Endpoints
Your MCP server can serve more than tools. It can serve context.
Add a tool called get_help or get_docs that returns usage guides for your other tools. Not the schema. A guide. When to use what. Common patterns. Examples of multi-step workflows. You can reference it in your tool descriptions: “For complex queries, call get_docs("search") first for query syntax.”
Think of it as a README the model can pull at the moment it needs it, rather than having it eat context space on every turn. On-demand context loading. If your tool API has any complexity beyond CRUD, this alone will improve how well models use it.
You can split documentation by topic. get_docs("search") returns query syntax. get_docs("permissions") returns authorization rules. get_docs("workflows") returns common multi-step patterns. Organize it the same way you’d organize reference docs for a human. Break it up by what someone needs at the moment they’re stuck.
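A documentation tool can be little more than a lookup table with a helpful fallback. The sketch below assumes the three topics named above. The guide text is placeholder content invented for the example, and the fallback wording is one possible choice.

```python
# Placeholder guide text; a real server would return fuller usage guides.
DOCS = {
    "search": "Search guide: pass a name or email to find_customer(query).",
    "permissions": "Permissions guide: tools only see the signed-in user's data.",
    "workflows": "Workflow guide: find the customer first, then create the invoice.",
}


def get_docs(topic: str = "") -> str:
    """Return the guide for a topic, or list the topics if it is unknown."""
    key = topic.strip().lower()
    if key in DOCS:
        return DOCS[key]
    topics = ", ".join(sorted(DOCS))
    if not key:
        return f"Available topics: {topics}. Call get_docs(topic) with one of them."
    # An unknown topic is itself an error message, so it says what to try next.
    return f'No documentation for "{topic}". Available topics: {topics}.'
```

Listing the valid topics on a miss follows the same rule as any other error message: the model gets enough information to correct itself on the next call.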
Models already seek help when they’re uncertain. Give them somewhere to look.
The Whole Server Is a Prompt
Every design decision in your MCP server is context engineering. Tool names, descriptions, parameter schemas, error messages, response formats. All of it lands in a context window and shapes how the model reasons.
The test is simple. Point an agent at your server with no system prompt guidance and see what happens. If it flounders, the problem is your server, not the model.
On each turn, the model reads your tool definitions, reasons about them, and commits to an action. There’s no second pass, no REPL, no docs tab open on the side. What you put in the schema, the descriptions, and the error messages is all it has to work with.
There’s also a surface most servers ignore: MCP Apps let your server render interactive UI directly in the conversation. Same design principles, more surface area.