MCP Apps: When Your Server Needs a Face

I wrote about designing MCP servers that models actually use well a couple of weeks ago. The entire premise was that your server is a developer experience product and the developer is an LLM. Name tools clearly. Write errors that guide recovery. Minimize the surface area. The whole server is a prompt.

MCP Apps flip the audience. Your server can talk directly to the user.

The MCP Apps extension (SEP-1865, stable since January 2026) lets an MCP server declare interactive UI components that render inside the conversation. Not screenshots. Not markdown tables. Actual HTML applications running in sandboxed iframes, with bidirectional communication back to your server. Claude, ChatGPT, VS Code Copilot, and others already support it.

The spec has been stable for months. Almost nobody is using it. Most MCP servers still return text and let the model figure out how to present it. That’s fine for most tools. But there’s an entire class of interaction where it’s actively bad, and the solution has been sitting there.

How It Works

The mechanism is straightforward. Two primitives, one protocol.

Primitive one: your tool declares a UI resource. When you define a tool, you add _meta.ui.resourceUri pointing to a ui:// URI. This tells the host that this tool has a visual component.
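As a rough sketch of what that declaration might look like, here is a tool descriptor with the UI pointer set. The tool name `show_build_history`, the URI `ui://ci-dashboard/builds.html`, and the `ToolDescriptor` type are invented for illustration; only the `_meta.ui.resourceUri` key and the `ui://` scheme come from the description above.

```typescript
// Minimal shape of a tool descriptor; a hypothetical type, not an SDK import.
interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, unknown> };
  _meta?: { ui?: { resourceUri: string } };
}

// Hypothetical tool: the name and URI are placeholders.
const buildHistoryTool: ToolDescriptor = {
  name: "show_build_history",
  description: "List recent CI builds with status and duration.",
  inputSchema: {
    type: "object",
    properties: { limit: { type: "number" } },
  },
  // Tells the host this tool has a visual component at a ui:// URI.
  _meta: { ui: { resourceUri: "ui://ci-dashboard/builds.html" } },
};

// True when a tool declares a UI resource under the ui:// scheme.
function hasUiResource(tool: ToolDescriptor): boolean {
  const uri = tool._meta?.ui?.resourceUri;
  return typeof uri === "string" && uri.startsWith("ui://");
}
```

A tool without the `_meta` block is an ordinary text tool; nothing else about its definition changes.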

Primitive two: you serve HTML at that URI. The host fetches it via resources/read, gets back text/html;profile=mcp-app content, and renders it in a sandboxed iframe.
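The serving side can be sketched in a few lines. This assumes a hypothetical in-memory registry (`uiResources`) and a hand-rolled `readUiResource` function rather than any particular SDK's handler API; the result follows the usual `contents` array shape of a `resources/read` response, with the MIME type named above.

```typescript
// Hypothetical in-memory registry of UI resources, keyed by ui:// URI.
const uiResources = new Map<string, string>([
  [
    "ui://ci-dashboard/builds.html",
    '<!doctype html><title>Builds</title><div id="app"></div>',
  ],
]);

interface ReadResourceResult {
  contents: { uri: string; mimeType: string; text: string }[];
}

// Builds the result of a resources/read request for a UI resource.
function readUiResource(uri: string): ReadResourceResult {
  const html = uiResources.get(uri);
  if (html === undefined) {
    // Unknown URIs fail loudly instead of returning an empty page.
    throw new Error(`Unknown UI resource: ${uri}`);
  }
  return {
    contents: [{ uri, mimeType: "text/html;profile=mcp-app", text: html }],
  };
}
```

In a real server the HTML would more likely be a bundled build artifact read from disk at startup than a string literal.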

Communication between the iframe and the host uses JSON-RPC 2.0 over postMessage. The app can call tools on your server. The host can push data updates to the app. The user interacts with the UI directly. The model stays informed through ui/update-model-context, which feeds user actions back into the conversation context.
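To make the message flow concrete, here is a minimal sketch of the app-side plumbing: a JSON-RPC 2.0 client that numbers its requests and matches responses by `id`. The `createRpcClient` helper and its injected `post` callback are assumptions made for this example so the code stays independent of the DOM; it is not the API of any official SDK.

```typescript
type RpcParams = Record<string, unknown>;

interface RpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: RpcParams;
}

interface RpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

// `post` is injected so the same code works with window.parent.postMessage
// inside an iframe and with a fake transport in tests.
function createRpcClient(post: (message: RpcRequest) => void) {
  let nextId = 1;
  const pending = new Map<
    number,
    { resolve: (value: unknown) => void; reject: (error: Error) => void }
  >();

  return {
    // Sends a request and resolves when the matching response arrives.
    request(method: string, params?: RpcParams): Promise<unknown> {
      const id = nextId++;
      return new Promise((resolve, reject) => {
        pending.set(id, { resolve, reject });
        const message: RpcRequest = { jsonrpc: "2.0", id, method };
        if (params !== undefined) message.params = params;
        post(message);
      });
    },
    // Feed every incoming response here; returns true if it settled a request.
    handleMessage(data: RpcResponse): boolean {
      const entry = pending.get(data.id);
      if (!entry) return false;
      pending.delete(data.id);
      if (data.error) entry.reject(new Error(data.error.message));
      else entry.resolve(data.result);
      return true;
    },
  };
}
```

Inside the iframe, `post` would typically wrap `window.parent.postMessage(message, "*")`, and a `message` event listener on `window` would pass `event.data` to `handleMessage` after checking that it looks like a JSON-RPC response.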

The security model is what you’d expect from iframes: sandboxed, no parent DOM access, no cookies, no storage. CSP rules are declared in your UI resource’s metadata so the host can review and enforce them before rendering anything. All messages are auditable JSON-RPC. No escape hatches.
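One way to picture the enforcement step is a host turning declared domains into a deny-by-default policy string. Everything in this sketch is an assumption for illustration: the field names `connectDomains` and `resourceDomains`, the `buildCspHeader` helper, and the exact directive mapping. Real hosts may construct their policies differently.

```typescript
// Assumed shape of the CSP declaration; field names are illustrative.
interface DeclaredCsp {
  connectDomains?: string[]; // origins the app may send network requests to
  resourceDomains?: string[]; // origins for scripts, styles, and images
}

// Builds a Content-Security-Policy string from the declared domains.
// Anything not declared falls back to a deny-by-default baseline.
function buildCspHeader(csp: DeclaredCsp = {}): string {
  const connect = csp.connectDomains ?? [];
  const resource = csp.resourceDomains ?? [];
  const directives: [string, string[]][] = [
    ["default-src", ["'none'"]],
    ["script-src", ["'unsafe-inline'", ...resource]],
    ["style-src", ["'unsafe-inline'", ...resource]],
    ["img-src", ["data:", ...resource]],
    ["connect-src", connect.length > 0 ? connect : ["'none'"]],
  ];
  return directives
    .map(([name, values]) => `${name} ${values.join(" ")}`)
    .join("; ");
}
```

The point of the design is that the declaration is data: a host can show it to the user or reject it outright before a single line of the app's HTML runs.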

Three display modes: inline (embedded in the conversation flow), fullscreen, and picture-in-picture. The app declares which it supports. The host decides what it actually offers.
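The split between what the app supports and what the host offers amounts to a small intersection problem. A hedged sketch: the mode identifiers (including `pip` for picture-in-picture), the `pickDisplayMode` helper, and the fallback to inline are all assumptions for this example.

```typescript
// Mode identifiers are assumptions; "pip" stands for picture-in-picture.
type DisplayMode = "inline" | "fullscreen" | "pip";

// Returns the mode to render in: the requested mode if both sides allow it,
// otherwise the app's first supported mode that the host also offers.
function pickDisplayMode(
  requested: DisplayMode,
  appSupports: DisplayMode[],
  hostOffers: DisplayMode[],
): DisplayMode {
  const allowed = appSupports.filter((mode) => hostOffers.includes(mode));
  if (allowed.includes(requested)) return requested;
  // Assumed baseline: fall back to inline when nothing overlaps.
  return allowed[0] ?? "inline";
}
```

Whatever the exact negotiation looks like in a given host, the app should lay itself out for the mode it is actually given, not the one it asked for.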

The Display Problem

Here is why this exists. Not the spec motivation. The real one.

Models are bad at being dashboards.

When a tool returns structured data, the model serializes it into prose or a markdown table. Fine for five rows. Unusable for fifty. And every time the user asks a follow-up (“sort by date,” “show me just the failures,” “what about Q3”), the model re-fetches, re-serializes, and re-renders the entire state. No filters. No drill-down. No persistence. Each turn is a from-scratch reconstruction of something that should be interactive.

Maps, charts, multi-step configuration forms. Anything where the user needs to explore rather than read. These are visual, stateful interactions that text handles poorly. The model becomes a bottleneck between the user and the data. Every interaction costs a round trip through inference.

MCP Apps remove the bottleneck. The server renders the interface. The user interacts directly. The model observes but doesn’t mediate. A dashboard stays a dashboard across turns instead of being regenerated from scratch each time.

When Not To Build One

The same instinct from minimizing the tool surface applies here. Every MCP app you build is complexity you maintain. It’s HTML, JavaScript, and CSS bundled into your server. It’s a second interface to test. It’s a visual design problem added to an API design problem.

If the model can describe the result in a paragraph, a widget is overhead. If your tool returns a status and a message, that’s text. Let the model present it. It’s good at that.

Build an app when:

Users need to explore data. Filtering, sorting, drilling into detail. The model re-describing a table on every turn is a symptom that you need direct manipulation.
The interaction is spatial. Maps. Diagrams. Anything where position carries meaning that prose destroys.
Configuration has dependencies. A form where selecting option A changes what options B and C show. The model can walk through this, but it takes three turns and the user hates it.
State persists across turns. If the user is building something up over multiple interactions, a persistent UI holds that state without the model reconstructing it.

Don’t build an app when text works. Most tools return text results. Most of those are fine as text. The bar is: would a human open a dedicated UI for this, or would they just read the answer?

The Model Stays in the Loop

If the app never reports back to the model, you’ve built a side channel.

When a user clicks something in your app, the model doesn’t know. It can’t see the iframe. It doesn’t get DOM events. As far as the conversation context is concerned, nothing happened.

ui/update-model-context fixes this. Your app calls it to push structured data back into the conversation. “User selected items A, B, and C.” “User set the date range to Q1 2026.” “User approved 3 of 5 line items.” Now the model knows what happened and can reason about it in subsequent turns.
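As an illustration, a selection change could be packaged like this. The method name comes from the text above; the `params` shape (a human-readable `content` entry plus machine-readable `structuredContent`) and the `buildSelectionUpdate` helper are assumptions for this sketch, so check them against the spec before relying on them.

```typescript
interface ModelContextUpdate {
  jsonrpc: "2.0";
  id: number;
  method: "ui/update-model-context";
  params: {
    content: { type: "text"; text: string }[];
    structuredContent: Record<string, unknown>;
  };
}

// Builds the message an app might send after the user changes a selection.
function buildSelectionUpdate(id: number, selectedIds: string[]): ModelContextUpdate {
  const summary = `User selected ${selectedIds.length} item(s): ${selectedIds.join(", ")}`;
  return {
    jsonrpc: "2.0",
    id,
    method: "ui/update-model-context",
    params: {
      // Prose the model can read directly in later turns.
      content: [{ type: "text", text: summary }],
      // The same fact in a form that tools and code can consume.
      structuredContent: { selectedIds },
    },
  };
}
```

Sending the current state rather than a stream of individual clicks keeps the context small: the model needs to know where the user ended up, not every step on the way.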

Skip this and you’ve built a blind spot. The user does something in your UI, asks the model about it, and the model has no idea what they’re looking at. Context engineering means the model has the full picture. If your app changes the picture, tell the model.

The inverse also matters. When the model calls a tool that has a UI component, the host sends ui/notifications/tool-result to the app. The model acts, the UI updates. Bidirectional. Both sides stay current.
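On the app side, handling that notification can be a pure state update. This sketch assumes the notification's `params` carry a `structuredContent.rows` array; that payload shape, the `DashboardState` type, and the `applyHostNotification` helper are hypothetical and chosen only to show the pattern.

```typescript
interface HostNotification {
  jsonrpc: "2.0";
  method: string;
  params?: Record<string, unknown>;
}

interface DashboardState {
  rows: unknown[];
  lastUpdatedBy: "initial" | "model";
}

// Returns the next UI state for an incoming host notification.
// Unrelated or malformed messages leave the state untouched.
function applyHostNotification(
  state: DashboardState,
  message: HostNotification,
): DashboardState {
  if (message.method !== "ui/notifications/tool-result") return state;
  const structured = message.params?.["structuredContent"];
  const rows = (structured as { rows?: unknown } | null | undefined)?.rows;
  if (!Array.isArray(rows)) return state;
  return { rows, lastUpdatedBy: "model" };
}
```

Keeping the handler pure makes it easy to test without an iframe and easy to plug into whatever rendering approach the app uses.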

Graceful Degradation

Not every client supports MCP Apps. Not every client that supports them today will support them the same way tomorrow. The spec handles this through capability negotiation during initialize:

{
  "extensions": {
    "io.modelcontextprotocol/ui": {
      "mimeTypes": ["text/html;profile=mcp-app"]
    }
  }
}

If the client doesn’t advertise this, your server must not attach UI components to its tools. Register the same tools without the UI metadata. Text-only fallback.
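A minimal sketch of that check, assuming the capabilities object from `initialize` is available as plain JSON. The `clientSupportsUi` and `toolForClient` helpers and the `ToolDef` type are hypothetical names for this example; the extension key and MIME type are the ones shown above.

```typescript
const UI_EXTENSION = "io.modelcontextprotocol/ui";
const MCP_APP_MIME = "text/html;profile=mcp-app";

interface ClientCapabilities {
  extensions?: Record<string, { mimeTypes?: string[] }>;
}

interface ToolDef {
  name: string;
  description: string;
  _meta?: { ui?: { resourceUri: string } };
}

// True only when the client advertises the UI extension with the app MIME type.
function clientSupportsUi(caps: ClientCapabilities | undefined): boolean {
  const mimeTypes = caps?.extensions?.[UI_EXTENSION]?.mimeTypes ?? [];
  return mimeTypes.includes(MCP_APP_MIME);
}

// Returns the tool as-is for UI-capable clients, or a copy with the
// UI metadata stripped for everyone else.
function toolForClient(tool: ToolDef, caps: ClientCapabilities | undefined): ToolDef {
  if (clientSupportsUi(caps)) return tool;
  const { _meta, ...textOnly } = tool;
  return textOnly;
}
```

Stripping the metadata from a copy, rather than maintaining two tool definitions, keeps the text and UI variants from drifting apart.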

This means you design the text response first. The text version is not a degraded experience. It is the primary experience. The app is an enhancement for clients that support it. If your tool only works as a widget, you’ve locked out every terminal-based client, every client that hasn’t shipped the extension yet, and every context where iframes aren’t appropriate.

Build the text path. Make it good. Then add the app for clients that can render it.
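One way to keep the text path first-class is to have the tool always return a readable summary and attach the full data alongside it. In this sketch the `BuildRecord` type and `buildHistoryResult` helper are hypothetical; the result uses a text `content` entry plus `structuredContent`, on the assumption that an app, when present, reads the structured half.

```typescript
interface BuildRecord {
  id: string;
  status: "passed" | "failed";
  durationSec: number;
}

// Produces a tool result whose text stands on its own; the structured
// rows are extra data for clients that can render an app.
function buildHistoryResult(builds: BuildRecord[]) {
  const failed = builds.filter((b) => b.status === "failed");
  const lines = [
    `${builds.length} builds, ${failed.length} failed.`,
    ...failed.map((b) => `- ${b.id} failed after ${b.durationSec}s`),
  ];
  return {
    content: [{ type: "text" as const, text: lines.join("\n") }],
    structuredContent: { rows: builds },
  };
}
```

A terminal client gets a summary that leads with the failures; a UI-capable client gets the same summary plus every row to filter and sort.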

The UI Is Part of the Prompt Surface

The argument from the MCP server article was that every design decision in your server is context engineering. Tool names shape reasoning. Error messages guide recovery. Documentation endpoints provide on-demand context.

MCP Apps have extended that surface since January. Your UI communicates to users. Your update-model-context calls communicate to the model. Both are part of the prompt: the total information environment that determines what happens next.

The capability is there. The adoption isn’t. Most MCP servers are still text-only, and most of them should be. But the ones that shouldn’t be are paying a real cost in user experience every turn they force the model to serialize a dashboard into prose.