A Tool Registry Pattern for pydantic-ai: Type-Safe Args, Typed Results, and Error Propagation

Most agent codebases start with one or two tools registered inline next to the Agent constructor, and within a few weeks the file has fifteen @agent.tool decorators, ad-hoc dict returns, and try/except blocks that swallow exceptions the LLM never sees. The registry pattern below treats tools as a first-class collection: each tool is a class that owns its argument schema (a Pydantic model), its result type, and its error contract. The agent stays thin, tools stay unit-testable, and the LLM gets structured failures instead of stack traces.

Why a registry, and why pydantic-ai specifically

pydantic-ai already does the hard part — it inspects function signatures, derives a JSON schema from Pydantic types, and validates the LLM's tool-call payload before your code runs. What it does not give you is organization. If you let @agent.tool proliferate, you lose three things at once: discoverability (which tools exist?), composability (can I share a tool across two agents?), and observability (which tool failed and why?).

A registry solves all three. The registry is a plain dict keyed by tool name, populated at startup, and consumed by a single helper that calls agent.tool_plain or agent.tool per entry. Compare this with the decorator-on-a-free-function style, whether LangChain's @tool or pydantic-ai's own @agent.tool: it is fine for prototypes, but @agent.tool registers each function on one specific Agent instance, which makes cross-agent reuse awkward. pydantic-ai's tool model is close to OpenAI's function-calling primitives (a name, a description, and a JSON schema for the arguments), so a class-per-tool registry maps cleanly onto how the framework already thinks.

Step 1: Define the tool protocol

Start with a Protocol so tools are duck-typed but type-checked. Each tool exposes a name, a Pydantic args model, a Pydantic result model, and an async run method.

from typing import Protocol, TypeVar, Generic
from pydantic import BaseModel

ArgsT = TypeVar("ArgsT", bound=BaseModel)
ResultT = TypeVar("ResultT", bound=BaseModel)

class Tool(Protocol, Generic[ArgsT, ResultT]):
    name: str
    description: str
    args_model: type[ArgsT]
    result_model: type[ResultT]

    async def run(self, args: ArgsT) -> "ToolResult[ResultT]": ...

The Protocol is structural: any class with these attributes and a matching run method counts, and the type checker verifies the fit. No inheritance is required, which keeps tools shareable between agents without base-class lock-in.
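To see structural typing in action, here is a minimal, stdlib-only sketch. It drops the generic parameters for brevity and assumes you add `@runtime_checkable`, which the article's Protocol does not use (it targets static type checkers); `SupportsTool`, `EchoTool`, and `HalfTool` are hypothetical names.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsTool(Protocol):
    # Simplified, non-generic version of the Tool protocol.
    name: str
    description: str

    async def run(self, args: Any) -> Any: ...


class EchoTool:
    # No base class: it satisfies SupportsTool purely by its shape.
    name = "echo"
    description = "Return the arguments unchanged."

    async def run(self, args: Any) -> Any:
        return args


class HalfTool:
    # Has a name but no description or run method, so it is not a tool.
    name = "half"


echo_ok = isinstance(EchoTool(), SupportsTool)
half_ok = isinstance(HalfTool(), SupportsTool)
echo_result = asyncio.run(EchoTool().run({"x": 1}))
```

Note that the runtime isinstance check only verifies that the members exist; it does not compare signatures or types. mypy or pyright remain the real enforcement.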

Step 2: Model results as a discriminated union

The single biggest reason agent loops break is that a tool raises, the harness catches it, the LLM never learns what failed, and it retries the same call. Make failure a value, not an exception.

from typing import Literal, Union
from pydantic import BaseModel, Field

class ToolOk(BaseModel, Generic[ResultT]):
    status: Literal["ok"] = "ok"
    value: ResultT

class ToolErr(BaseModel):
    status: Literal["err"] = "err"
    code: str
    message: str
    retriable: bool = False

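# Generic for type checkers only: at runtime ToolOk[ResultT] returns the bare ToolOk
# class, so this alias carries no type parameters and ToolResult[X] raises TypeError.
# Subscript it only inside quoted annotations, e.g. -> "ToolResult[X]".
ToolResult = Union[ToolOk[ResultT], ToolErr]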

This is the Rust Result<T, E> shape, expressed in Pydantic. Each branch carries a Literal status tag, so whichever one a tool returns reaches the LLM context with an explicit marker, and Pydantic can use the same field as a discriminator when validating a payload back into a model. When the tool fails, the model sees something like {"status": "err", "code": "rate_limit", "message": "upstream returned 429", "retriable": true} and can decide whether to back off, fall back to a sibling tool, or surface the failure to the user. That decision belongs to the LLM, not to a try/except two stack frames away.
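A short round trip makes the tagging concrete. This sketch assumes Pydantic v2 and declares the union with `Field(discriminator="status")` so validation dispatches on the tag directly instead of trying each branch; `Weather` and `WeatherResult` are hypothetical names for one tool's concrete result type.

```python
from typing import Annotated, Generic, Literal, TypeVar, Union

from pydantic import BaseModel, Field, TypeAdapter

ResultT = TypeVar("ResultT", bound=BaseModel)


class ToolOk(BaseModel, Generic[ResultT]):
    status: Literal["ok"] = "ok"
    value: ResultT


class ToolErr(BaseModel):
    status: Literal["err"] = "err"
    code: str
    message: str
    retriable: bool = False


class Weather(BaseModel):
    # Hypothetical payload type, only for this demo.
    temp_c: float


# The discriminator makes Pydantic pick the branch from the `status` tag.
WeatherResult = Annotated[Union[ToolOk[Weather], ToolErr], Field(discriminator="status")]
adapter = TypeAdapter(WeatherResult)

# What the LLM would see for a failure: a plain, tagged dict.
err_payload = ToolErr(code="rate_limit", message="upstream returned 429", retriable=True).model_dump()

# Round trip: the tag routes each dict back to the right model.
ok_back = adapter.validate_python({"status": "ok", "value": {"temp_c": 21.5}})
err_back = adapter.validate_python(err_payload)
```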

Step 3: Implement a concrete tool

Here's a fetch-url tool that respects the protocol. Notice that run never raises — every failure path returns a ToolErr.

import httpx
from pydantic import BaseModel, HttpUrl, Field

class FetchUrlArgs(BaseModel):
    url: HttpUrl
    timeout_s: float = Field(default=10.0, ge=0.1, le=30.0)

class FetchUrlResult(BaseModel):
    status_code: int
    body_preview: str = Field(max_length=2000)
    content_type: str

class FetchUrlTool:
    name = "fetch_url"
    description = "Fetch a URL and return the first 2,000 characters of the body."
    args_model = FetchUrlArgs
    result_model = FetchUrlResult

    # Return annotation is quoted so only type checkers evaluate the generic alias.
    async def run(self, args: FetchUrlArgs) -> "ToolResult[FetchUrlResult]":
        try:
            async with httpx.AsyncClient(timeout=args.timeout_s) as client:
                resp = await client.get(str(args.url))
        except httpx.TimeoutException:
            return ToolErr(code="timeout", message=f"GET {args.url} timed out after {args.timeout_s}s", retriable=True)
        except httpx.HTTPError as exc:
            return ToolErr(code="transport", message=str(exc), retriable=False)

        # Parametrize ToolOk explicitly so `value` is validated and serialized as FetchUrlResult.
        return ToolOk[FetchUrlResult](value=FetchUrlResult(
            status_code=resp.status_code,
            body_preview=resp.text[:2000],
            content_type=resp.headers.get("content-type", ""),
        ))

Two things matter here. First, the args_model does double duty: pydantic-ai validates the LLM's JSON against FetchUrlArgs before run is called, so by the time you have an args object the URL is parsed and the timeout is in range. Second, the exception handling is flat: one try, two excepts, no nesting. The order matters, because httpx.TimeoutException is a subclass of httpx.HTTPError and must be caught first. If a tool needs more failure handling than that, move it into a helper instead of nesting another try.
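To see the double duty in isolation, this sketch runs the validation step by hand on raw JSON payloads, which is what the framework does with the LLM's tool-call arguments before run is called. `parse_tool_call` is a hypothetical helper, and reducing errors to a list of field names is only for the demo; a real harness would pass the full error text back to the model.

```python
from typing import Union

from pydantic import BaseModel, Field, HttpUrl, ValidationError


class FetchUrlArgs(BaseModel):
    # Same args model as the fetch_url tool.
    url: HttpUrl
    timeout_s: float = Field(default=10.0, ge=0.1, le=30.0)


def parse_tool_call(raw_json: str) -> Union[FetchUrlArgs, str]:
    """Validate a raw tool-call payload; return the model or the failing fields."""
    try:
        return FetchUrlArgs.model_validate_json(raw_json)
    except ValidationError as exc:
        # Join the location of each error, e.g. "url, timeout_s".
        return ", ".join(".".join(map(str, e["loc"])) for e in exc.errors())


good = parse_tool_call('{"url": "https://example.com"}')
bad = parse_tool_call('{"url": "not a url", "timeout_s": 60}')
```

Both invalid fields are reported in one pass, so the model can fix every argument in a single retry.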

Step 4: Build the registry and bind to the agent

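from typing import Any
from pydantic_ai import Agent

TOOLS: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    if tool.name in TOOLS:
        raise ValueError(f"duplicate tool name: {tool.name}")
    TOOLS[tool.name] = tool

def _make_runner(t: Tool):
    # A factory gives each runner its own `t`. The `_t=tool` default-arg trick
    # would leak `_t` into the parameter schema the LLM sees.
    async def _runner(args) -> dict[str, Any]:
        return (await t.run(args)).model_dump()

    # pydantic-ai builds the tool schema from the signature; a single BaseModel
    # parameter becomes the tool's parameter object, so annotate `args` with this
    # tool's model. The docstring becomes the tool description.
    _runner.__annotations__["args"] = t.args_model
    _runner.__doc__ = t.description
    return _runner

def bind_tools(agent: Agent) -> None:
    for tool in TOOLS.values():
        # tool_plain because the runner takes no RunContext.
        agent.tool_plain(name=tool.name)(_make_runner(tool))

register(FetchUrlTool())
agent = Agent("anthropic:claude-sonnet-4-6", system_prompt="...")
bind_tools(agent)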

The bind_tools helper is the one place that touches the framework. Every tool flows through it. If you later swap to a different agent framework (LangGraph, OpenAI's Assistants API), bind_tools is the only file that changes — the tool classes themselves are framework-agnostic.
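Because the tool classes never import pydantic-ai, any framework that hands you a tool name and a JSON argument string can drive the same registry. Here is a minimal framework-neutral adapter as a sketch; `dispatch` and `AddTool` are hypothetical, and the `unknown_tool` and `invalid_args` codes are conventions of this example, not anything a framework defines.

```python
import asyncio
from typing import Any, Generic, Literal, TypeVar

from pydantic import BaseModel, ValidationError

ResultT = TypeVar("ResultT", bound=BaseModel)


class ToolOk(BaseModel, Generic[ResultT]):
    status: Literal["ok"] = "ok"
    value: ResultT


class ToolErr(BaseModel):
    status: Literal["err"] = "err"
    code: str
    message: str
    retriable: bool = False


class AddArgs(BaseModel):
    a: int
    b: int


class AddResult(BaseModel):
    total: int


class AddTool:
    # Hypothetical tool with the same shape as FetchUrlTool.
    name = "add"
    description = "Add two integers."
    args_model = AddArgs
    result_model = AddResult

    async def run(self, args: AddArgs) -> ToolOk[AddResult]:
        return ToolOk[AddResult](value=AddResult(total=args.a + args.b))


async def dispatch(tools: dict[str, Any], name: str, raw_args: str) -> dict[str, Any]:
    """Framework-neutral adapter: look up, validate, run, serialize."""
    tool = tools.get(name)
    if tool is None:
        return ToolErr(code="unknown_tool", message=f"no tool named {name!r}").model_dump()
    try:
        args = tool.args_model.model_validate_json(raw_args)
    except ValidationError as exc:
        # The model sent bad arguments; it can correct them, so mark retriable.
        return ToolErr(code="invalid_args", message=str(exc), retriable=True).model_dump()
    return (await tool.run(args)).model_dump()


TOOLS = {AddTool.name: AddTool()}

ok = asyncio.run(dispatch(TOOLS, "add", '{"a": 2, "b": 3}'))
missing = asyncio.run(dispatch(TOOLS, "subtract", "{}"))
bad = asyncio.run(dispatch(TOOLS, "add", '{"a": "two", "b": 3}'))
```

Marking invalid arguments as retriable is a judgment call: the model produced them, so it is usually the one that can fix them.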

Tool registry vs. inline decorators: when each wins

Inline @agent.tool is the right choice for a prototype with three tools and one agent. The registry pays off when you cross any of these thresholds: more than five tools, more than one agent sharing tools, or a need for runtime tool discovery (e.g. the LLM asks "what tools are available for HTTP?"). At that point the inline approach starts producing 80% boilerplate and 20% logic; the registry inverts that ratio.
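For the runtime-discovery case, here is a minimal sketch of what a lookup over the registry might return. `list_tools` and `SearchTool` are hypothetical, the case-insensitive substring match over name and description is the simplest possible policy, and `model_json_schema()` is Pydantic v2's standard way to emit a model's JSON schema. You could expose such a function to the LLM as a tool of its own.

```python
from typing import Any

from pydantic import BaseModel, HttpUrl


class FetchUrlArgs(BaseModel):
    # Trimmed to one field for this demo.
    url: HttpUrl


class SearchArgs(BaseModel):
    query: str


class FetchUrlTool:
    name = "fetch_url"
    description = "Fetch a URL over HTTP and return a preview of the body."
    args_model = FetchUrlArgs


class SearchTool:
    # Hypothetical second tool.
    name = "search_docs"
    description = "Full-text search over internal documentation."
    args_model = SearchArgs


TOOLS = {t.name: t for t in (FetchUrlTool(), SearchTool())}


def list_tools(keyword: str = "") -> list[dict[str, Any]]:
    """Return name, description and argument schema for tools matching keyword."""
    kw = keyword.lower()
    return [
        {
            "name": t.name,
            "description": t.description,
            "parameters": t.args_model.model_json_schema(),
        }
        for t in TOOLS.values()
        if kw in t.name.lower() or kw in t.description.lower()
    ]


http_tools = list_tools("http")
```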

Concretely, in a recent refactor of an 11-tool agent, moving from inline decorators to a registry cut the agent setup file from ~340 lines to ~90 lines, and the per-tool unit tests dropped from "spin up an Agent fixture" to "instantiate the tool class and call await tool.run(args)" — about 3× faster to write and run.

Error propagation: the rule that keeps the loop alive

The single rule that makes this pattern work in production: a tool's run method must not raise. Every failure becomes a ToolErr. The agent loop sees a structured error in its context, the LLM reasons about it, and either retries (when retriable=True), falls back, or asks the user. Compare that to the alternative, a raised exception handled somewhere in the harness, where the LLM either sees nothing (silent failure, infinite retry) or sees a generic "tool errored" message with no actionable code. In pydantic-ai itself, an exception other than ModelRetry never reaches the model at all: it propagates out of the agent run and ends it.

You can enforce the no-raise rule with a thin wrapper that the runner in bind_tools calls instead of calling run directly:

async def _safe_run(t: Tool, args: BaseModel) -> dict:
    try:
        return (await t.run(args)).model_dump()
    except Exception as exc:
        return ToolErr(
            code="unhandled",
            message=f"{type(exc).__name__}: {exc}",
            retriable=False,
        ).model_dump()

Treat a hit on the unhandled branch as a bug to fix, not a feature to rely on. In well-typed tools, that branch should never fire.
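A quick way to check the wrapper is to feed it a deliberately broken tool. The sketch below restates `_safe_run` as a self-contained `safe_run` that also logs the traceback through the standard `logging` module, so hits on the unhandled branch surface as the bugs they are; `BuggyTool` and the logger name are hypothetical.

```python
import asyncio
import logging
from typing import Any, Literal

from pydantic import BaseModel

log = logging.getLogger("tools")


class ToolErr(BaseModel):
    status: Literal["err"] = "err"
    code: str
    message: str
    retriable: bool = False


async def safe_run(tool: Any, args: BaseModel) -> dict[str, Any]:
    """Same contract as _safe_run, plus a logged traceback for the bug report."""
    try:
        return (await tool.run(args)).model_dump()
    except Exception as exc:  # asyncio.CancelledError is a BaseException, so it still propagates
        log.exception("tool %r raised instead of returning ToolErr", tool.name)
        return ToolErr(code="unhandled", message=f"{type(exc).__name__}: {exc}").model_dump()


class EmptyArgs(BaseModel):
    pass


class BuggyTool:
    # Hypothetical tool that breaks the no-raise rule.
    name = "buggy"

    async def run(self, args: EmptyArgs) -> Any:
        return {}["missing"]  # raises KeyError


out = asyncio.run(safe_run(BuggyTool(), EmptyArgs()))
```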

Testing tools without the agent

Because tools are plain classes, tests are plain pytest functions. No Agent fixture, no LLM mocking, no recorded transcripts.

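import httpx
import pytest

# FetchUrlTool, FetchUrlArgs and ToolErr come from the tool module above.
# httpx_mock is the pytest-httpx fixture; the asyncio marker needs pytest-asyncio.

@pytest.mark.asyncio
async def test_fetch_url_timeout(httpx_mock):
    httpx_mock.add_exception(httpx.TimeoutException("slow"))
    tool = FetchUrlTool()
    result = await tool.run(FetchUrlArgs(url="https://example.com"))
    assert isinstance(result, ToolErr)
    assert result.code == "timeout"
    assert result.retriable is True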

That test runs in milliseconds and covers the exact contract the LLM relies on. Any agent built on this registry inherits the same testability for free.
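The same idea scales to the whole registry: one contract test can check every registered tool without calling any of them. A minimal sketch, assuming these particular checks are the ones you care about; `check_registry` and `PingTool` are hypothetical, and in a real suite TOOLS would be imported from your registry module rather than defined inline.

```python
import inspect
from typing import Any

from pydantic import BaseModel


class PingArgs(BaseModel):
    host: str


class PingResult(BaseModel):
    ok: bool


class PingTool:
    # Hypothetical stand-in for a real registered tool.
    name = "ping"
    description = "Check whether a host answers."
    args_model = PingArgs
    result_model = PingResult

    async def run(self, args: PingArgs) -> Any:
        raise NotImplementedError  # never called by the contract check


TOOLS: dict[str, Any] = {PingTool.name: PingTool()}


def check_registry(tools: dict[str, Any]) -> list[str]:
    """Return contract violations; an empty list means every tool conforms."""
    problems: list[str] = []
    for key, tool in tools.items():
        if key != getattr(tool, "name", None):
            problems.append(f"{key}: registry key does not match tool.name")
        if not getattr(tool, "description", "").strip():
            problems.append(f"{key}: empty description")
        for attr in ("args_model", "result_model"):
            model = getattr(tool, attr, None)
            if isinstance(model, type) and issubclass(model, BaseModel):
                # Building the schema eagerly raises on field types with no JSON schema.
                model.model_json_schema()
            else:
                problems.append(f"{key}: {attr} is not a BaseModel subclass")
        if not inspect.iscoroutinefunction(getattr(tool, "run", None)):
            problems.append(f"{key}: run is not an async method")
    return problems


def test_registry_contract() -> None:
    assert check_registry(TOOLS) == []
```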
