Agents Are About Correction
When people hear “AI agent,” they picture AI that can do things. Schedule a meeting. Run a bash script. Query a database. Not wrong, but incomplete in a way that sends you down the wrong path.
AI with tools can already act on the world. That’s been true since function calling shipped. You give a model a tool definition and a handler. It reaches outside itself. Useful. Also fragile. It makes one attempt. If the result is wrong, nobody finds out until a human looks.
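In code, that single-shot shape is one model call and one tool run, with no second chance. A minimal sketch, where call_model and run_tool are hypothetical placeholders rather than any particular SDK's API:

# Single-shot tool use: one attempt, no recovery path.
# call_model() and run_tool() are hypothetical placeholders.
def answer_once(question: str) -> str:
    step = call_model([{"role": "user", "content": question}])
    if step.tool_call is None:
        return step.text                  # model answered directly
    result = run_tool(step.tool_call)     # if this is wrong, wrong is what you ship
    return result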
The thing that makes an agent an agent is the loop. And the point of the loop is not autonomy. It’s correction.
Let Agents Be Wrong
This is the observation that I think gets insufficient weight. Models produce the wrong bash command. They grep for a pattern that doesn’t exist. They look for information in the wrong place. They misparse the user’s request on the first pass. This isn’t a bug. It’s a property. Models are probabilistic. They are confidently, fluently wrong some percentage of the time.
The instinct is to prevent this. Constrain the tools. Narrow the prompts. Build guardrails so tight that the model can’t make a wrong move. I get it. But it’s the wrong optimization. You’re spending your engineering effort trying to make a probabilistic system deterministic. You will lose that fight.
The better move is to let the model be wrong and give it the machinery to recover. In a single-shot tool call, failure is terminal. The model tried. It was wrong. You got bad output. Done. In a loop, that same failure is a data point. The model sees what went wrong and adjusts. Allow the mistakes. The loop converges on the correct answer, fast.
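Extending the same hypothetical call_model / run_tool placeholders into a loop shows the difference: every result, including a failure, goes back into context for the next call. This is a sketch of the shape, not a specific SDK's API.

# A deliberately generic agent loop. The shape is what matters.
def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = call_model(history)            # model proposes text or a tool call
        if step.tool_call is None:
            return step.text                  # model decided it's done
        result = run_tool(step.tool_call)     # may succeed, may return an error
        # The crucial part: the failure goes back into context as information,
        # so the next call can adjust instead of the run terminating.
        history.append({"role": "assistant", "content": step.text})
        history.append({"role": "tool", "content": result})
    return "Stopped after max_steps without resolving the goal."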
What Correction Looks Like
Put this in a customer service context. An agent hears a problem description. Its goal: find the customer's order and resolve the issue. A typical run looks something like this:

1. Search the knowledge base for the described problem. No matching article.
2. Look up the customer by the phone number on file. No record found.
3. Look up the customer by name. Match. Account located.
4. Pull the order history from the account and find the affected order.

Four steps. Two outright failures. The agent didn't stall on any of them. Each failure was information. The knowledge base miss told the agent this wasn't a known documentation issue. The phone number miss told it the customer's contact info was unreliable. The name lookup succeeded and unlocked everything downstream.
A human support rep does exactly this. They try one thing, it doesn’t work, they try another. The skill isn’t knowing the right path on the first attempt. It’s narrowing the problem space with each attempt until the right path becomes obvious. That behavior, applied to an LLM with tool access, is what makes agents formidable.
This isn’t hypothetical. I’ve built this. You give the model a handful of lookup tools, a system prompt that says “resolve the customer’s issue,” and a loop. You do not tell it which tool to call first. You do not write fallback logic. You do not build a state machine. The model figures out the path by trying things, observing what comes back, and adjusting. The failures aren’t a problem to be engineered away. They’re the mechanism by which the agent finds the answer.
The SDK Pattern
Both the Claude Agent SDK and the GitHub Copilot SDK expose this loop as a first-class primitive. The API shapes differ. The underlying pattern is identical. That convergence is worth paying attention to.
In the Claude Agent SDK, query() returns an async iterator. You stream messages as the model works. Each iteration yields reasoning, a tool call, a tool result, or a conclusion. The SDK runs the tools, feeds results back, and lets the model decide what to do next.
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Grep", "Bash"],
    permission_mode="acceptEdits",
)

async def main():
    async for message in query(
        prompt="Find all callers of this function and check if any pass None",
        options=options,
    ):
        print(message)

asyncio.run(main())
That async for is the loop. The model calls Grep, gets no results, adjusts the pattern, tries again. It reads a file, notices something unexpected, pivots its approach. The SDK manages tool execution and context. You consume the stream. The model owns the recovery path.
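If you want to watch the correction happen rather than just print raw messages, you can filter the stream by type. This sketch reuses query and options from the snippet above and assumes the message and content-block classes exported by the SDK's Python package (AssistantMessage, TextBlock, ToolUseBlock, ResultMessage); check the typings of your installed version.

# Assumed type names from claude_agent_sdk; verify against your installed version.
from claude_agent_sdk import AssistantMessage, TextBlock, ToolUseBlock, ResultMessage

async def trace(prompt: str, options: ClaudeAgentOptions) -> None:
    # Print each tool attempt and each piece of reasoning as the agent works,
    # which makes the retry-after-failure behavior visible.
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, ToolUseBlock):
                    print(f"tool call: {block.name} {block.input}")
                elif isinstance(block, TextBlock):
                    print(f"model: {block.text}")
        elif isinstance(message, ResultMessage):
            print(f"final: {message.result}")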
Custom tools use the @tool decorator and get bundled into an in-process MCP server [1]. You hand the model a set of capabilities and a goal. It decides which to use and in what order.
import asyncio

from claude_agent_sdk import tool, create_sdk_mcp_server, ClaudeSDKClient, ClaudeAgentOptions

# find_customer(), fetch_orders(), and handle() are placeholder stand-ins for
# your own CRM lookups and message handling.

@tool("lookup_customer", "Find customer by name", {"name": str})
async def lookup_customer(args):
    return {"content": [{"type": "text", "text": find_customer(args["name"])}]}

@tool("get_orders", "Get order history", {"account_id": str})
async def get_orders(args):
    return {"content": [{"type": "text", "text": fetch_orders(args["account_id"])}]}

server = create_sdk_mcp_server("crm", "1.0.0", [lookup_customer, get_orders])

options = ClaudeAgentOptions(
    mcp_servers={"crm": server},
    allowed_tools=["mcp__crm__lookup_customer", "mcp__crm__get_orders"],
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Find order #4481 for a customer named Sarah Chen")
        async for msg in client.receive_response():
            handle(msg)

asyncio.run(main())
No orchestration code specifying the sequence. No fallback chains. No state machine tracking which lookup to try next. The model owns the recovery path because the loop gives it the information it needs to recover.
The GitHub Copilot SDK does the same thing with an event-driven model. Tools are defined with Pydantic schemas via @define_tool. The agent runs inside a session. You listen for events. SessionIdleData fires when the model has nothing left to do.
import asyncio

from copilot import CopilotClient, define_tool
from copilot.session import PermissionHandler
from copilot.generated.session_events import AssistantMessageData, SessionIdleData
from pydantic import BaseModel, Field

class CustomerQuery(BaseModel):
    name: str = Field(description="Customer full name")

@define_tool(description="Look up customer record by name")
async def lookup_customer(params: CustomerQuery) -> str:
    return find_customer(params.name)  # placeholder lookup

done = asyncio.Event()

# Assumes the handler receives typed event objects like the two imported above.
def handle_event(event):
    if isinstance(event, AssistantMessageData):
        print(event)  # the model's messages as it works
    elif isinstance(event, SessionIdleData):
        done.set()  # nothing left to do: end the run

async def main():
    async with CopilotClient() as client:
        async with await client.create_session(
            model="claude-sonnet-4.5",
            tools=[lookup_customer],
            on_permission_request=PermissionHandler.approve_all,
        ) as session:
            session.on(handle_event)
            await session.send("Find the order for Sarah Chen")
            await done.wait()

asyncio.run(main())
Different decorator. Different event model. Different company. Same loop. The model tries a tool, observes the result, adjusts, tries again. The SDK handles execution and context. You define what the model can reach and what it’s trying to accomplish. The Copilot SDK also supports BYOK, so you can point it at Anthropic, OpenAI, or Azure. The harness is interchangeable. The pattern is not.
Two competing SDKs, built by two companies with very different business models, converged on the same architecture. A loop that runs tools, feeds results back to the model, and lets the model decide the next step. That’s not a coincidence. It’s the shape of the solution.
What You’re Actually Building
When you’re building an agent, here’s the mental model that I think produces the best results.
Define the tools, not the workflow. Give the model capabilities. Don’t prescribe the order. A customer service agent gets lookup tools, a knowledge base search, an order history endpoint. It doesn’t get a flowchart. The model discovers the workflow by attempting it.
Define success, not the path. The system prompt should describe what a resolved case looks like. Not which tool to call first. The model is better at tactical decisions than you are at predicting every case it will encounter.
Make failures informative. When a tool returns no results, don’t return an empty string. Return “No customer found with phone number 555-0142.” The model uses that to decide what to try next. The richer the failure message, the faster the convergence.
Trust the loop. This is the hardest part for engineers. We are trained to handle every case. To write the fallback. To anticipate the failure mode and route around it. In an agentic system, that instinct produces brittle code that fights the model instead of leveraging it. The model is good at recovering from failures. Let it.
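To make the second and third points concrete, here is a sketch building on the earlier CRM snippet: a phone lookup that returns an informative miss instead of an empty string, and options that state the success condition rather than the call order. The crm_db backend and the prompt wording are illustrative placeholders, and it assumes ClaudeAgentOptions accepts a system_prompt string; check your SDK version.

from claude_agent_sdk import tool, create_sdk_mcp_server, ClaudeAgentOptions

# crm_db is an illustrative placeholder for whatever backs your lookups.
@tool("lookup_by_phone", "Find customer by phone number", {"phone": str})
async def lookup_by_phone(args):
    record = crm_db.find_by_phone(args["phone"])
    text = (
        record.summary()
        if record
        else f"No customer found with phone number {args['phone']}. "
             "Try looking the customer up by name or email instead."
    )
    return {"content": [{"type": "text", "text": text}]}

server = create_sdk_mcp_server("crm", "1.0.0", [lookup_by_phone])

options = ClaudeAgentOptions(
    mcp_servers={"crm": server},
    allowed_tools=["mcp__crm__lookup_by_phone"],
    # Success condition, not a flowchart.
    system_prompt=(
        "Resolve the customer's issue. A case is resolved when you have "
        "identified the customer, located the relevant order, and either "
        "fixed the problem or written a clear handoff for a human."
    ),
)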
The Convergence Property
What makes a good human researcher effective isn’t knowing where to look first. It’s that when they look in the wrong place, what they didn’t find narrows down where to look next. Every failed attempt reduces the search space. You wouldn’t evaluate a researcher by whether their first query returned the right result. You’d evaluate them by how fast they got there.
Same standard applies here. An LLM in a loop with the right tools and a clear success condition will get things wrong on individual steps. It will also converge on the correct answer, fast, because each wrong step carries information about where the right step is.
The people building the most effective agents right now aren’t the ones with the most constrained tool sets or the most defensive prompting. They’re the ones who gave their models room to fail and a loop to fail in. The correction is the capability. Everything else is plumbing.
Footnotes

[1] Custom tools in the Claude Agent SDK run as in-process MCP servers, meaning they execute in your application's process with no subprocess or IPC overhead. See the permissions guide for controlling tool approval.