An AI Agent Pattern of the Day: Reflexive Tool-Use with Bounded Retries

Most agent failures in production are not reasoning failures. They are tool failures dressed up as reasoning failures. A search API returns 503, the agent hallucinates a recovery, and the trace looks like the model "got confused." It didn't. The tool layer broke and the agent had no rule for what to do next.

This post documents one pattern that addresses that gap: reflexive tool-use with bounded retries. It's not novel research. It's the load-bearing scaffold underneath most production agent stacks worth shipping, and it's worth writing down because the obvious naive version (a try/except around the tool call) fails in subtle, expensive ways.

The Problem with Naive Tool Calls

Here's the version most people write first:

async def call_tool(name: str, args: dict) -> dict:
    try:
        return await tools[name](**args)
    except Exception as e:
        return {"error": str(e)}

The agent sees {"error": "..."}, decides what to do, and life goes on. Until it doesn't. The failure modes that show up after a few hundred runs:

The agent retries the same call with the same args, sometimes forever, because the model can't tell a transient failure from a permanent one.
Errors get stringified with str(e), which loses status codes, headers, and the request payload. Debugging requires rerunning the whole trace.
A 429 from one provider and a 500 from another get treated identically. The agent learns nothing useful from the error shape.
There is no global circuit breaker. A flaky downstream tool can burn an entire context window of retry attempts before anyone notices.

Compare that to a reflexive pattern: the agent's tool wrapper itself enforces retry policy, classifies errors into a small vocabulary, and escalates to the model only when a human-meaningful decision is needed.

The Pattern

Three components, each doing one thing:

**1. A typed error vocabulary.** The agent only ever sees one of: transient, permanent, rate_limited, validation, unauthorized. Five categories cover roughly 95% of real failures and let the model reason at the right level of abstraction.

2. A wrapper that retries on transient and rate_limited only. Bounded — typically 3 attempts with exponential backoff plus jitter. After exhaustion, the wrapper returns permanent to the agent, which is now a meaningful signal: "I tried, it's not coming back, decide what to do."

3. A structured error envelope. The agent receives a small dict, not a stringified stack trace. The model handles the dict; the full trace goes to logs.

import asyncio, random
from typing import Any, Callable, Awaitable

ERROR_KIND = {
    400: "validation",
    401: "unauthorized",
    403: "unauthorized",
    429: "rate_limited",
    500: "transient",
    502: "transient",
    503: "transient",
    504: "transient",
}

async def reflexive_call(
    fn: Callable[..., Awaitable[Any]],
    *args: Any,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    **kwargs: Any,
) -> dict[str, Any]:
    last_error: dict[str, Any] | None = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = await fn(*args, **kwargs)
            return {"ok": True, "value": result, "attempts": attempt}
        except HTTPError as e:
            kind = ERROR_KIND.get(e.status, "transient")
            last_error = {"ok": False, "kind": kind, "status": e.status, "attempts": attempt}
            if kind in ("validation", "unauthorized"):
                return last_error
            if attempt < max_attempts:
                await asyncio.sleep(base_delay * (2 ** (attempt - 1)) + random.random() * 0.1)
                continue
            return {**last_error, "kind": "permanent"}
    return last_error or {"ok": False, "kind": "permanent", "attempts": 0}

The model never sees the HTTPError. It sees {"ok": False, "kind": "rate_limited", "attempts": 3} and can reason: "this provider is throttling, I should switch to the fallback or summarize what I have."

Why This Beats Letting the Model Decide

Two reasons, both empirical.

Cost. A single retry loop driven by the model costs roughly one full agent turn — system prompt, tool definitions, conversation history, the failed result, the model's reasoning, the new tool call. That's typically 4–8k tokens per retry. The reflexive wrapper retries in microseconds at zero token cost. For a tool that fails transiently 5% of the time across a 20-step agent run, you're saving on the order of 50k tokens per run before any of the smarter optimizations kick in.

Determinism. Models are not great at backoff. Ask Claude or GPT-4 to "wait a bit and try again" and you'll get a tool call back in the next turn with no actual delay — the model has no concept of wall-clock time inside a turn. The wrapper enforces real asyncio.sleep, real jitter, real bounded attempts. The model handles policy ("should we retry at all?"); the wrapper handles mechanism ("how, and how often?").

This is the same separation Erlang/OTP figured out 25 years ago: supervisors handle restart strategy, workers handle business logic. The agent is the worker. The tool wrapper is the supervisor.

The Escape Hatch

Bounded retries imply a contract: after N failures, control returns to the model with a clear permanent signal. The model then chooses one of:

Switch to a fallback tool (e.g. a different search provider)
Continue with degraded data and mark the gap in the response
Stop and report the failure to the user

This is where prompt engineering matters. The system prompt should explicitly instruct the agent on the escape hatch:

When a tool returns {"ok": false, "kind": "permanent"}, do not retry it
in this conversation. Either use the fallback tool listed for that
capability, or proceed with what you have and note the gap.

Without that line, you'll see the model retry the same broken tool a fourth time, because its prior probability of "retry on failure" is high. With it, you get the behavior you want.

When This Pattern Is Wrong

Reflexive retries assume the failure is on the network/service axis. They are the wrong tool for:

Logical errors in the agent's own arguments. If the model called search(query=""), more retries won't help. That's a validation error and it short-circuits in the code above for exactly this reason.
Long-running batch jobs. A 20-minute video render that fails at minute 19 should not be naively retried. Use a job-queue pattern with checkpointing instead.
Tools with side effects. If the tool sends an email or charges a card, retry semantics need explicit idempotency keys at the tool layer, not the wrapper layer.

The pattern's sweet spot is read-heavy, idempotent, network-bound tool calls — search, fetch, lookup, classify. Roughly 80% of what production agents do.

What to Build Next

Once the reflexive wrapper is in place, two extensions pay for themselves quickly:

A circuit breaker per tool. If a tool has returned permanent for the last 5 calls across all sessions in the past minute, short-circuit and return permanent immediately without attempting. Most production HTTP clients ship this; the agent layer rarely does.
Structured retry telemetry. Log {tool, kind, attempts, latency_ms} per call. After a week, you'll know which tools cause 80% of your retry budget and can target them for caching or replacement.

Neither requires the agent or the model to know anything new. Both are pure infrastructure improvements that the model benefits from invisibly.

That's the whole pattern. One wrapper, five error kinds, an explicit escape hatch in the prompt. It's the kind of thing that looks too simple to write a post about until you've shipped an agent without it and watched the trace logs.

References: