Isolate the Anthropic SDK Behind One Adapter: A Pattern for Agent Codebases

Wrap Anthropic Sonnet calls in a single adapter module to get testability, prompt versioning, retry policy, and cost tracking without scattering SDK calls across your agent codebase.

Agent codebases rot fastest at the LLM seam. A from anthropic import Anthropic in three modules turns into seven, then twelve. Each callsite invents its own retry logic, its own system prompt assembly, its own model string. When Sonnet 4.6 ships and you want to A/B against 4.5, you grep for claude-sonnet and find nine places to edit. When a test needs to run offline, you monkey-patch anthropic.Anthropic and pray the mock matches the real shape.

The fix is boring and well-understood: put every Anthropic call behind one adapter module. The non-obvious part is what belongs inside the adapter versus what stays at the callsite. Get that boundary wrong and you either build a god-object that leaks prompt logic, or a thin pass-through that solves nothing.

What the adapter owns

A working adapter for an agent codebase owns five things:

  1. SDK instantiation: exactly one Anthropic() client per process, configured from env once.
  2. Model selection: model IDs live as named constants, not string literals at callsites.
  3. Retry + timeout policy: exponential backoff on RateLimitError and APIConnectionError, hard cap on total wall time.
  4. Token + cost accounting: every call records input/output tokens and dollar cost to a metrics sink.
  5. Prompt versioning hook: system prompts are looked up by ID, not pasted inline at the callsite.

What it does NOT own: the actual prompt content, the business logic that decides which prompt to use, or the parsing of tool calls. Those stay at the use-case layer. The adapter is a transport with policy, not a planner.

Minimum viable adapter

Here is the shape that has held up across several production agent codebases. Single file, ~120 lines, no abstractions you don't use.

from __future__ import annotations

import logging
import time
from dataclasses import dataclass
from typing import Any, Literal

from anthropic import Anthropic, APIConnectionError, RateLimitError
from anthropic.types import Message

logger = logging.getLogger(__name__)

# Model IDs as constants: never literal strings at callsites
SONNET_4_6 = "claude-sonnet-4-6"
HAIKU_4_5 = "claude-haiku-4-5-20251001"
OPUS_4_7 = "claude-opus-4-7"

# Pricing per million tokens (USD): keep in sync with Anthropic pricing page
PRICING = {
    SONNET_4_6: {"input": 3.00, "output": 15.00},
    HAIKU_4_5: {"input": 1.00, "output": 5.00},
    OPUS_4_7: {"input": 15.00, "output": 75.00},
}


@dataclass(frozen=True)
class LlmResult:
    text: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    prompt_id: str
    latency_ms: int


class AnthropicAdapter:
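    def __init__(self, client: Anthropic | None = None) -> None:
        # The SDK retries some errors on its own by default; turn that off so
        # _call_with_retry is the only retry policy and attempts don't multiply.
        self._client = client or Anthropic(max_retries=0)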

    def complete(
        self,
        *,
        prompt_id: str,
        system: str,
        user: str,
        model: str = SONNET_4_6,
        max_tokens: int = 2048,
        max_retries: int = 3,
    ) -> LlmResult:
        start = time.monotonic()
        msg = self._call_with_retry(
            system=system, user=user, model=model,
            max_tokens=max_tokens, max_retries=max_retries,
        )
        elapsed_ms = int((time.monotonic() - start) * 1000)
        return self._build_result(msg, model, prompt_id, elapsed_ms)

    def _call_with_retry(
        self, *, system: str, user: str, model: str,
        max_tokens: int, max_retries: int,
    ) -> Message:
        delay = 1.0
        for attempt in range(max_retries):
            try:
                return self._client.messages.create(
                    model=model,
                    max_tokens=max_tokens,
                    system=system,
                    messages=[{"role": "user", "content": user}],
                )
            except (RateLimitError, APIConnectionError) as exc:
                if attempt == max_retries - 1:
                    raise
                logger.warning("anthropic retry %d/%d: %s", attempt + 1, max_retries, exc)
                time.sleep(delay)
                delay *= 2
        raise RuntimeError("unreachable")

    def _build_result(
        self, msg: Message, model: str, prompt_id: str, elapsed_ms: int,
    ) -> LlmResult:
        text = "".join(b.text for b in msg.content if b.type == "text")
        cost = (
            msg.usage.input_tokens * PRICING[model]["input"]
            + msg.usage.output_tokens * PRICING[model]["output"]
        ) / 1_000_000
        return LlmResult(
            text=text, model=model,
            input_tokens=msg.usage.input_tokens,
            output_tokens=msg.usage.output_tokens,
            cost_usd=cost, prompt_id=prompt_id, latency_ms=elapsed_ms,
        )

Notice the helper split. complete is a flat dispatch; _call_with_retry handles transport failures; _build_result normalizes the response. Each method has one job. No nested try blocks, no nested if chains. That matters less for line count and more for the diff: when you change retry policy, you touch one method.
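The retry loop above caps the number of attempts but not total wall time, which the ownership list also asks for. A minimal sketch of one way to add that cap follows. The names call_with_deadline and DeadlineExceeded, and the deadline_s default, are assumptions for illustration and not part of the Anthropic SDK. The sleep and clock parameters are injectable so the policy can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlineExceeded(Exception):
    """Hypothetical error: the retry loop ran out of wall-clock budget."""


def call_with_deadline(
    fn: Callable[[], T],
    *,
    retryable: tuple[type[BaseException], ...],
    max_retries: int = 3,
    deadline_s: float = 30.0,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry fn with exponential backoff, giving up once the budget is spent."""
    start = clock()
    delay = base_delay_s
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable as exc:
            if attempt == max_retries - 1:
                raise  # attempts exhausted: surface the original error
            remaining = deadline_s - (clock() - start)
            if remaining <= delay:
                # Sleeping again would overrun the budget, so stop here.
                raise DeadlineExceeded(
                    f"gave up after {attempt + 1} attempts"
                ) from exc
            sleep(delay)
            delay *= 2
    raise RuntimeError("unreachable when max_retries >= 1")
```

Inside the adapter, fn would be a closure around self._client.messages.create(...) and retryable would be (RateLimitError, APIConnectionError). This only bounds the time spent between attempts; bounding a single in-flight request is a separate setting (the SDK client accepts a timeout option).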

Why prompt_id is a parameter

Every call carries a prompt_id. This is the lever that unlocks the rest of the adapter's value. With it, you can:

  • Log structured rows: prompt_id=plan.decompose model=sonnet-4-6 cost=$0.0034 latency=820ms
  • Aggregate cost-by-prompt across a day: cheaper to debug than cost-by-endpoint
  • A/B prompts behind a flag: route 10% of plan.decompose calls to plan.decompose.v2
  • Catch regressions: if plan.decompose p95 latency jumps from 800ms to 2200ms after a prompt edit, the metrics show it inside an hour
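The A/B bullet can be sketched as a deterministic router that picks which prompt_id a call should carry. This is one possible approach, not a prescribed one: pick_prompt_id and unit_key (for example a task or tenant ID) are hypothetical names, and hashing is used so the same unit always lands in the same arm.

```python
import hashlib


def pick_prompt_id(base_id: str, variant_id: str, unit_key: str, percent: int) -> str:
    """Send a stable `percent` of unit keys to variant_id, the rest to base_id."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hash the experiment name together with the unit so that different
    # experiments split the same population independently.
    digest = hashlib.sha256(f"{base_id}:{unit_key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return variant_id if bucket < percent else base_id
```

The use case calls the router, looks up the system prompt for the returned ID, and passes both to adapter.complete. Because the chosen ID travels with the call, the two arms separate cleanly in the metrics.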

Without prompt_id, your token logs become an undifferentiated stream and you lose the ability to attribute cost or quality to specific prompts.
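As a rough illustration of cost-by-prompt accounting, here is a small in-memory ledger plus a formatter for the log row shown above. CostLedger, PromptStats, and format_row are hypothetical names. A real deployment would presumably forward these numbers to whatever metrics backend is already in place rather than hold them in process memory.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PromptStats:
    calls: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0


class CostLedger:
    """Accumulates usage per prompt_id (in-memory stand-in for a metrics sink)."""

    def __init__(self) -> None:
        self._stats: dict[str, PromptStats] = defaultdict(PromptStats)

    def record(
        self, *, prompt_id: str, input_tokens: int, output_tokens: int, cost_usd: float
    ) -> None:
        stats = self._stats[prompt_id]
        stats.calls += 1
        stats.input_tokens += input_tokens
        stats.output_tokens += output_tokens
        stats.cost_usd += cost_usd

    def by_cost(self) -> list[tuple[str, PromptStats]]:
        """Prompt IDs sorted from most to least expensive."""
        return sorted(self._stats.items(), key=lambda kv: kv[1].cost_usd, reverse=True)


def format_row(prompt_id: str, model: str, cost_usd: float, latency_ms: int) -> str:
    """Render one structured log row in key=value form."""
    return f"prompt_id={prompt_id} model={model} cost=${cost_usd:.4f} latency={latency_ms}ms"
```

The adapter would call ledger.record(...) once per LlmResult, just before returning it from complete.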

Testability: the actual reason this is worth doing

Mocking the Anthropic SDK directly is fragile. The SDK's response shapes change between versions, the streaming API has subtly different types from the non-streaming one, and any test that patches anthropic.Anthropic ends up coupled to internal SDK structure.

The adapter solves this by giving you a single seam to fake:

from typing import Any

# LlmResult comes from the adapter module above; Planner is the use case under test.

class FakeAdapter:
    def __init__(self, response_text: str) -> None:
        self._response = response_text

    def complete(self, *, prompt_id: str, **kwargs: Any) -> LlmResult:
        return LlmResult(
            text=self._response, model="fake",
            input_tokens=10, output_tokens=20, cost_usd=0.0,
            prompt_id=prompt_id, latency_ms=5,
        )

def test_planner_handles_empty_plan():
    adapter = FakeAdapter(response_text="[]")
    planner = Planner(llm=adapter)
    assert planner.decompose("noop task") == []

Use cases inject the adapter (or a fake). Tests run offline, take milliseconds, and don't break when the SDK ships a minor version bump.
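The article does not show the Planner side of that injection, so here is a hedged sketch of what it could look like. LlmClient, PLANNER_SYSTEM, and the JSON-list response format are assumptions made for illustration. LlmResult is repeated only so the block runs on its own. The structural Protocol lets both the real adapter and a fake satisfy the same type without sharing a base class.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass(frozen=True)
class LlmResult:  # same fields as the adapter's result type
    text: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    prompt_id: str
    latency_ms: int


class LlmClient(Protocol):
    """Anything with a compatible complete() method: real adapter or fake."""

    def complete(self, *, prompt_id: str, **kwargs: Any) -> LlmResult: ...


# Hypothetical system prompt; the real text would live at the use-case layer.
PLANNER_SYSTEM = "Break the task into steps. Reply with a JSON list of strings."


class Planner:
    def __init__(self, llm: LlmClient) -> None:
        self._llm = llm

    def decompose(self, task: str) -> list[str]:
        result = self._llm.complete(
            prompt_id="plan.decompose", system=PLANNER_SYSTEM, user=task
        )
        steps = json.loads(result.text)  # parsing is domain logic, not transport
        if not isinstance(steps, list):
            raise ValueError("expected a JSON list of steps")
        return [str(step) for step in steps]
```

Production code constructs Planner(llm=AnthropicAdapter()), while tests pass a fake. Nothing in Planner imports the SDK.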

Adapter vs framework: when to skip LangChain

A reasonable question: why not use LangChain or LlamaIndex, which already wrap the LLM call? For a focused agent codebase where you understand the prompt flow and want deterministic behavior, hand-rolled adapters win on three axes:

  • Surface area: ~120 lines of your code over ~50k lines of framework code you don't control
  • Debuggability: when a call fails, the stack trace points at five frames, not fifty
  • Cost visibility: your accounting integrates directly with your metrics backend, no scraping framework callbacks

Frameworks make sense when you genuinely need their composability: agents-of-agents, complex retrieval graphs, swappable vector stores. For a codebase that calls Sonnet from four or five use cases, the adapter pattern is roughly 80% of the framework value at 5% of the dependency cost.

Migration recipe for an existing codebase

If your codebase already has Anthropic calls scattered around:

  1. Grep for Anthropic( and messages.create( to get a list of every callsite.
  2. Build the adapter module first; do not modify callsites yet.
  3. Pick the smallest callsite. Replace its body with adapter.complete(prompt_id=..., system=..., user=...). Run its tests.
  4. Repeat for each callsite. Each migration is one PR, isolated.
  5. Once all callsites use the adapter, delete the direct from anthropic import Anthropic lines outside the adapter module. Add a lint rule (ruff custom check or a CI grep) that fails on direct imports anywhere else.

The lint rule is what makes this stick. Without it, the next contributor adds a sixth direct import and you lose the invariant.
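One way to implement that check without a custom ruff plugin is a short script that walks the tree with the standard ast module. This is a sketch under assumptions: find_direct_imports and check_tree are hypothetical helper names, and the allowed set is whatever path your adapter module actually lives at.

```python
from __future__ import annotations

import ast
from pathlib import Path


def find_direct_imports(source: str) -> list[int]:
    """Line numbers of statements that import the anthropic package."""
    hits: list[int] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative ones are ignored
            names = [node.module or ""]
        else:
            continue
        if any(n == "anthropic" or n.startswith("anthropic.") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


def check_tree(root: Path, allowed: set[Path]) -> list[str]:
    """Report every direct import under root, skipping the allowed files."""
    violations: list[str] = []
    for path in sorted(root.rglob("*.py")):
        if path.resolve() in allowed:
            continue
        for lineno in find_direct_imports(path.read_text(encoding="utf-8")):
            violations.append(f"{path}:{lineno}: direct anthropic import")
    return violations
```

In CI, a wrapper would call check_tree with the repository root and the resolved adapter path, print the violations, and exit non-zero if the list is not empty. Unlike a plain grep, the AST walk ignores comments and strings that merely mention the package.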

What doesn't belong in the adapter

Resist these temptations:

  • Prompt templating: Jinja, f-strings, prompt composition belong at the use-case layer. The adapter receives finished system + user strings.
  • Tool-call parsing: if you're using tool use, parse the response at the use case. The adapter returns text and usage; what the text means is domain logic.
  • Caching: prompt caching is a transport-level concern for Anthropic's API (the cache_control field), and that does belong in the adapter. But application-level result caching ("we already asked this question yesterday") belongs at the use case where you know the cache key semantics.
  • Streaming: if you need streaming, add a second method stream(...) that returns an iterator of deltas. Don't try to make complete async-generator-shaped.
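For the streaming bullet, a separate method could look roughly like the generator below. It is written as a free function over a duck-typed client so it runs without the SDK installed. In the adapter it would be a method using self._client. It assumes the client exposes the SDK's messages.stream(...) context manager with a text_stream iterator. The name stream_text is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def stream_text(
    client: Any,
    *,
    system: str,
    user: str,
    model: str,
    max_tokens: int = 2048,
) -> Iterator[str]:
    """Yield text deltas as they arrive; complete() stays a plain blocking call."""
    with client.messages.stream(
        model=model,
        max_tokens=max_tokens,
        system=system,
        messages=[{"role": "user", "content": user}],
    ) as stream:
        # The context manager closes the connection even if the consumer
        # stops iterating early and the generator is closed.
        yield from stream.text_stream
```

Keeping this as its own method means callers that want a finished LlmResult never have to think about partial output. This sketch does no token or cost accounting, and that would need to be added from the stream's final message.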

The payoff

Six months in, a codebase with one adapter and forty use cases will have: one place to edit when SDK signatures change, one place to flip when a new model ships, one place to add structured logging, one place to enforce a per-tenant budget cap. A codebase without the adapter will have those same concerns scattered across forty files, and each migration takes a Friday afternoon instead of fifteen minutes.

The adapter pattern isn't novel. The reason to write it down for agent codebases is that the temptation to call the SDK directly is high (the SDK is genuinely pleasant to use), and that pleasantness is exactly what makes the scatter happen.
