aiagent.

Step 1: Scaffold a minimal MCP server with SSE transport and one slow tool

What we're doing this step

This series reproduces, and then fixes, a specific failure mode: an MCP client on the SSE transport hangs (or disconnects with a confusing error) when a tool takes longer to return than the client's socket read timeout. The Model Context Protocol is a JSON-RPC dialect that AI agents use to talk to tool servers; the SSE (Server-Sent Events) transport carries those messages over a long-lived HTTP stream so the client can keep receiving updates without polling. That model is elegant until a tool call outlasts the read timeout. At that point the connection dies without a clear signal, and from the agent's perspective the tool call appears to wedge forever.

To investigate that, we first need a server with a tool that takes a controllable amount of time to respond. So step 1 is intentionally boring: we build the smallest possible MCP server, register exactly one tool called slow_echo that sleeps for delay_seconds and then returns the input message, and wire the SSE transport behind a uvicorn-style ASGI app. No retries, no heartbeats, no cancellation — just enough surface area to provoke the hang in step 2. The smaller this scaffold is, the more confident we can be that any later hang is caused by the transport rather than by something we added.

Setup

We use the official Python MCP SDK (mcp[cli]) so we get FastMCP's declarative tool registration plus a ready-made SSE app. FastMCP is a thin layer over the lower-level mcp.server module that lets you register tools as regular Python coroutines and have the SDK derive the JSON schema, generate the protocol-level tool listing, and wire up the ASGI endpoints automatically. The project layout is the standard src/ layout with pyproject.toml, a tests/ folder, and uv as the dependency manager. Two top-level files matter:

  • codebase/pyproject.toml — pins mcp[cli]>=1.2.0 and uvicorn>=0.30.0 as runtime dependencies, plus pytest + pytest-asyncio as dev extras. It also declares the package as mcp_slow_server under the src/ layout and exposes a mcp-slow-server console script that points at mcp_slow_server.server:main.
  • codebase/src/mcp_slow_server/server.py — the single module that defines the slow_echo coroutine, a build_server() factory, and a CLI entrypoint that boots SSE on 127.0.0.1:8765 by default.

The package is initialised via uv venv plus uv pip install -e .[dev], which gives us an editable install along with the test dependencies. No global state, no plugins, no extra middleware. That matters later: when the hang shows up, we want zero doubt that some custom timeout handler is masking the real failure mode. We also keep the src/ layout rather than a flat mcp_slow_server/ directory next to tests/, because the flat layout occasionally lets Python import the source from the working directory instead of the installed wheel — which would hide packaging mistakes that real users would hit.

Two configuration knobs in pyproject.toml are worth pointing out: asyncio_mode = "auto" lets us drop the @pytest.mark.asyncio decorator from every coroutine test (since every interesting MCP call is async, the decorator soup would dominate the test file), and testpaths = ["tests"] keeps pytest from accidentally collecting example scripts we may add later under examples/.

Implementation

The core of the module is one async function and one factory. Start with the tool itself:

import asyncio

from mcp.server.fastmcp import FastMCP

DEFAULT_DELAY_SECONDS = 5.0
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8765


async def slow_echo(
    message: str,
    delay_seconds: float = DEFAULT_DELAY_SECONDS,
) -> str:
    """Return ``message`` after sleeping for ``delay_seconds``."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    await asyncio.sleep(delay_seconds)
    return message

The function is deliberately small. The signature is annotated so FastMCP can derive the JSON schema for the tool's input arguments without us hand-writing one — the parameter names message and delay_seconds are therefore part of the public protocol contract a client will see when it calls list_tools. We use asyncio.sleep (not time.sleep) because FastMCP's tool dispatcher runs inside the same event loop as the transport — blocking the loop would mask the timeout bug we are chasing by also blocking the SSE keep-alive that the transport relies on to detect dead clients. asyncio.sleep yields cleanly, which is the realistic shape of a slow tool in production: think "waiting on an external HTTP API" rather than "doing CPU work".

We reject negative delays early so the tool never silently returns instantly on a bad payload; that's the kind of subtle correctness issue that makes debugging the timeout much harder later. Imagine spending an afternoon chasing why your reproducer "sometimes" hangs and "sometimes" returns immediately, only to realise a stray -1 slipped through the JSON. A ValueError at the door costs us nothing and saves that debugging hour up front.
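
The three behaviours described so far (echo the input, really sleep, reject negative delays) can be checked with the standard library alone. The sketch below repeats slow_echo so it runs on its own; the check_slow_echo helper and the 0.1-second delay are illustrative and not part of the companion repository.

```python
import asyncio
import time

DEFAULT_DELAY_SECONDS = 5.0


async def slow_echo(message: str, delay_seconds: float = DEFAULT_DELAY_SECONDS) -> str:
    """Copy of the tool above so this snippet is self-contained."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    await asyncio.sleep(delay_seconds)
    return message


async def check_slow_echo() -> dict[str, bool]:
    """Hypothetical helper: exercise the three invariants and report each one."""
    start = time.perf_counter()
    echoed = await slow_echo("hello", delay_seconds=0.1)
    elapsed = time.perf_counter() - start

    rejected = False
    try:
        await slow_echo("hello", delay_seconds=-1)
    except ValueError:
        rejected = True

    return {
        "returns_input": echoed == "hello",
        "slept_long_enough": elapsed >= 0.08,  # tolerance for coarse timers
        "rejects_negative": rejected,
    }


if __name__ == "__main__":
    print(asyncio.run(check_slow_echo()))
```

The real test suite makes the same checks through pytest; this version only shows that the invariants hold without the MCP SDK in the picture.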

Next, the factory that wires the tool into a FastMCP instance:

def build_server(name: str = "slow-server") -> FastMCP:
    """Construct a ``FastMCP`` server with the ``slow_echo`` tool registered."""
    server = FastMCP(name)
    server.add_tool(
        slow_echo,
        name="slow_echo",
        description="Echo the given message after sleeping delay_seconds.",
    )
    return server

Keeping construction in a factory (rather than at module import time) is what lets the tests instantiate fresh servers with different names and inspect them without booting the SSE transport. This separation between building the server and running it shows up again in step 3 when we need to invoke tools in-process to confirm the bug lives in transport-level reads, not in the tool itself. It also keeps the module side-effect-free, which matters because pytest collects test modules by importing them — if FastMCP were constructed at import time, a misconfigured environment variable could blow up the whole test session before any test ran.

Finally, the CLI entrypoint:

def main(argv: list[str] | None = None) -> None:
    args = _parse_args(argv)
    server = build_server()
    server.settings.host = args.host
    server.settings.port = args.port
    server.run(transport="sse")

server.run(transport="sse") is the FastMCP one-liner that mounts the Server-Sent Events endpoint at /sse, exposes a companion POST endpoint for the client to send JSON-RPC requests through, and starts a uvicorn worker on the configured host and port. We expose --host and --port (with env-variable fallbacks MCP_HOST / MCP_PORT) so the reproducer scripts in later steps can pin the bind address without editing the source. The default 127.0.0.1:8765 is arbitrary but stable — every subsequent step hardcodes it.

One detail to watch: FastMCP keeps host and port on a settings object, so we set server.settings.host and server.settings.port before calling run(). Depending on the SDK release, FastMCP(...) may also accept host and port as keyword arguments that populate the same settings object, and settings it does not recognise can be dropped without an error. Check the behaviour of your installed version rather than assuming; mutating the settings object directly also keeps build_server() free of CLI concerns.

Test it

Run the test suite from the codebase/ directory:

pytest

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items

tests/test_server.py ........                                            [100%]

============================== 8 passed in 1.22s ===============================

The eight tests cover the four invariants we care about for step 1: the slow_echo coroutine returns its input, actually sleeps for the requested duration, rejects negative delays, and is reachable through FastMCP's tool API under the name slow_echo. Two more assertions confirm the host/port constants are sensible and that server.sse_app() returns a callable ASGI application — that last one is the smoke test that the SSE transport is wired up at all.

To prove the transport itself boots, start the server in another terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

Uvicorn announces itself, the SSE endpoint goes live at http://127.0.0.1:8765/sse, and the process blocks waiting for connections. Hit Ctrl+C to stop it — there is no client yet, and that is fine. We will write the client in step 2 and only then call into the slow tool over the wire.

What we got

We now have a runnable MCP server with one tool whose latency we can dial up by a single argument, plus a passing test suite that pins the behaviour. The scaffold has no retry policy, no cancellation handling, and no heartbeat — which is exactly what we want, because the read timeout we will trigger in step 2 needs a server that cannot mask the failure with any clever recovery. The surface area is small enough that when the hang reproduces in step 2 we will know the bug lives in the transport layer, not in our tool code. From here, step 2 will write a small SSE client that connects to this server, calls slow_echo with a delay longer than its socket read timeout, and lets us watch the hang happen on the wire.

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: d840d10

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool

Step 2: Wire up an MCP client and reproduce the read-timeout hang on tool call

What we're doing this step

Step 1 left us with a server we can dial up to be arbitrarily slow but without anything on the other side of the wire. Step 2 closes that loop: we add a tiny MCP client that connects to the SSE endpoint, runs the JSON-RPC initialize handshake, and invokes the slow_echo tool with a caller-supplied delay_seconds plus a caller-supplied sse_read_timeout. With those two knobs we can dial the tool's runtime above the client's read budget and observe the exact failure mode this article exists to explain — the SSE transport tears down on the read side while a tool call is in flight, leaving the client either raising a transport-level exception or appearing to wedge on call_tool forever. That ambiguity ("sometimes a clean timeout, sometimes a hang") is itself part of the bug, and pinning it in a test now is what gives later steps a stable target to fix without us second-guessing whether the fix worked.

Setup

Four new files land in codebase/:

  • src/mcp_slow_server/client.py — the call_slow_echo and list_remote_tools coroutines that wrap mcp.client.sse.sse_client and mcp.ClientSession.
  • src/mcp_slow_server/__main__.py — a thin shim so python -m mcp_slow_server boots the server without the RuntimeWarning you get when __init__ already pulls in submodules.
  • tests/conftest.py — a per-test fixture that boots the FastMCP SSE app inside a daemon-thread uvicorn.Server, picks a free port, and yields an http://127.0.0.1:<port>/sse URL to tests.
  • tests/test_client.py — five new tests covering the happy path, the latency observation, the deliberate hang, and a follow-up "fresh client still works after a doomed one" check.

The mcp SDK is already pinned from step 1, so no pyproject.toml churn is needed. The only new runtime concept is the SSE read budget: sse_client accepts an sse_read_timeout parameter that httpx uses to decide how long to wait for the next event on the SSE stream before aborting the read. That is the exact knob the bug rides on, so we expose it as a parameter on call_slow_echo rather than hardcoding it.
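
To make the idea of an idle read budget concrete, here is a small model that uses only asyncio. It is a sketch under loud assumptions: an asyncio.Queue stands in for the SSE stream, and the names next_event, slow_tool, and call_with_read_budget are invented for this illustration. It does not touch httpx or the MCP SDK.

```python
import asyncio


async def next_event(stream: asyncio.Queue[str], read_timeout: float) -> str:
    """Wait for the next event, giving up after read_timeout seconds of silence."""
    return await asyncio.wait_for(stream.get(), timeout=read_timeout)


async def slow_tool(stream: asyncio.Queue[str], delay_seconds: float) -> None:
    """Stand-in for the server: stay silent while 'working', then emit one event."""
    await asyncio.sleep(delay_seconds)
    await stream.put("result")


async def call_with_read_budget(delay_seconds: float, read_timeout: float) -> str:
    """Run the fake tool and read its single event under an idle read budget."""
    stream: asyncio.Queue[str] = asyncio.Queue()
    producer = asyncio.create_task(slow_tool(stream, delay_seconds))
    try:
        return await next_event(stream, read_timeout)
    except asyncio.TimeoutError:
        # The tool is still running; the reader simply stopped waiting.
        return "read timed out"
    finally:
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)
```

With a delay below the budget the call returns "result"; with a delay above it the reader gives up first, even though nothing is wrong with the tool. That asymmetry is the shape of the bug this step reproduces over the real transport.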

We also keep tests/conftest.py deliberately heavyweight: it spins a real uvicorn worker on a background thread instead of exercising the tool in-process through server.call_tool. The in-process path is faster and simpler, but it skips the SSE transport entirely — and the SSE transport is where the bug lives. A test that "passes" by skipping the broken code path is worse than no test.

Implementation

The client wrapper is a single async function whose only job is to open the SSE connection, run the handshake, fire one tool call, and close the connection. Splitting it any finer would make the bug harder to see, not easier — readers should be able to point at one place and say "that's the await that hangs".

from mcp import ClientSession
from mcp.client.sse import sse_client


async def call_slow_echo(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
) -> SlowEchoResult:
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            call_result = await session.call_tool(
                "slow_echo",
                {"message": message, "delay_seconds": delay_seconds},
            )
            return SlowEchoResult(
                text=_extract_text(call_result.content),
                is_error=bool(call_result.isError),
            )

A few design choices are worth calling out. connect_timeout and sse_read_timeout are split into two separate parameters because they map onto two distinct httpx timeout phases: connect-and-write versus read-the-next-SSE-event. Folding them into one number would force the caller to widen both budgets when they only care about one, which is exactly the kind of papercut that hides the bug behind ambient slack. The return type is a frozen dataclass rather than a tuple so the test assertions read as result.is_error is False rather than result[1] is False — small thing, but it pays off the third time you stare at a failed test output.

The _extract_text helper exists because call_tool returns a result whose content is a list of TextContent (and possibly other) items. Concatenating only the text attributes lets the assertions read "hello" in result.text instead of digging into .content[0].text at every call site.

The interesting test is the deliberate hang reproducer:

async def test_short_sse_read_timeout_kills_a_slow_call(
    sse_server_url: str,
) -> None:
    tool_delay = 3.0
    short_read_budget = 0.4
    observation_budget = short_read_budget + 2.5

    start = time.perf_counter()
    with pytest.raises(BaseException) as exc_info:
        await asyncio.wait_for(
            call_slow_echo(
                sse_server_url,
                message="will-not-arrive",
                delay_seconds=tool_delay,
                sse_read_timeout=short_read_budget,
            ),
            timeout=observation_budget,
        )
    elapsed = time.perf_counter() - start

    assert elapsed < tool_delay + 1.0
    assert exc_info.value is not None

Three numbers do the work. tool_delay = 3.0 is the time the server sleeps inside slow_echo. short_read_budget = 0.4 is the SSE read deadline the client gives httpx. Because the read budget is shorter than the tool delay by a comfortable margin, httpx will tear down the read stream while the tool is still sleeping. observation_budget is the outermost asyncio.wait_for cap; it bounds how long pytest is allowed to watch the hang. If the bug ever mutates from "noisy exception" to "indefinite block", we still get a deterministic test failure rather than a stuck CI worker.

The BaseException catch is intentional. Depending on which task wins the race inside anyio's task group, the propagated exception can be httpx.ReadTimeout, anyio.EndOfStream, anyio.ClosedResourceError, or an asyncio.TimeoutError if the outer wait_for fires first. The point of this test is not to declare a winner. The point is to pin the failure: under the configured budgets, the call must fail in strictly less than tool_delay + 1.0 seconds with some exception, not return the wrong value silently. Future steps tighten this — once we know how we want timeouts to surface, we can narrow the assertion to the specific exception class we picked. For now, "pin that it breaks" is the goal.

The companion test test_short_sse_read_timeout_does_not_corrupt_subsequent_call covers the worry that one wedged SSE stream might poison the server process for everyone else. We deliberately blow up the first call, then open a brand new sse_client session with a generous budget and confirm the second call still works. If this test ever flips, the bug is bigger than "one client's read budget" — it would mean a wedged client leaks state into the server, which would change which layer the fix has to live in.

The uvicorn fixture is the other non-trivial piece. We pick a free port up front so parallel pytest-xdist runs do not collide, build the FastMCP app, hand its ASGI callable to a uvicorn.Server, and spin a daemon thread that calls server.run(). We poll server.started for up to five seconds so tests do not race the boot-up, and on teardown we set should_exit = True and join the thread. The fixture is per-test rather than session-scoped because the doomed-call test deliberately leaves a half-dead SSE stream behind, and we want every test to start from a clean server.
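
Two pieces of that fixture are plain standard-library techniques and can be sketched without uvicorn: asking the OS for a free port, and polling a readiness flag with a deadline. The helper names pick_free_port, wait_until, and demo are assumptions made for this illustration; the repository's conftest.py may structure this differently.

```python
import socket
import threading
import time
from collections.abc import Callable


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last look at the deadline


def demo() -> bool:
    """Flip a flag from a background thread and wait for it,
    the same way a fixture could poll a server's started flag."""
    started = threading.Event()
    threading.Timer(0.1, started.set).start()
    return wait_until(started.is_set, timeout=2.0)
```

There is a small window between releasing the probed port and the server binding it in which another process could claim it. For a local test suite that risk is usually acceptable.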

Test it

Run the suite from the codebase/ directory:

pytest

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 13 items

tests/test_client.py .....                                               [ 38%]
tests/test_server.py ........                                            [100%]

============================== 13 passed in 3.52s ==============================

Thirteen tests now: eight from step 1 plus five new client tests. The two failure-shape tests (test_short_sse_read_timeout_kills_a_slow_call and test_short_sse_read_timeout_does_not_corrupt_subsequent_call) are the ones that pin the bug. Both pass because the bug fires — pytest.raises(BaseException) succeeds when the SSE read deadline expires and tears the stream down. If a future change ever made call_slow_echo silently return after tool_delay seconds despite the short read budget, those two tests would fail and tell us the failure mode shifted.

You can also drive it interactively. Start the server in one terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

And in a Python REPL in another:

import asyncio
from mcp_slow_server.client import call_slow_echo

asyncio.run(call_slow_echo(
    "http://127.0.0.1:8765/sse",
    message="hello",
    delay_seconds=3.0,
    sse_read_timeout=0.5,
))

That call will not return cleanly. Either you get an ExceptionGroup out of anyio with httpx.ReadTimeout inside it, or the call appears to sit forever waiting on session.call_tool. Both are the bug.

What we got

We now have a client that can talk to the step 1 server, plus a pytest suite that pins both the happy path and the read-timeout failure shape. The reproducer is deterministic enough to land in CI — the outer asyncio.wait_for caps wall-clock budget at well under ten seconds, so the suite never wedges even if the bug morphs from "raises quickly" into "hangs forever". The five new tests give us the regression net we will need in step 3 when we introduce the first real fix attempt: any change that makes the slow call silently return the wrong value, or that leaks a wedged stream into the next session, will flip a test red. That is the foundation the rest of the article builds on.

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: 1296ceb

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool
  • 1296ceb — step 2: wire up an MCP client and reproduce the read-timeout hang on tool call

Step 3: Instrument the SSE stream to observe where the connection stalls

What we're doing this step

Step 2 left us with a reproducer that fails — sometimes loudly, sometimes by appearing to wedge — but it does not yet tell us where the failure happens. The call_slow_echo coroutine awaits four distinct things in sequence: opening the SSE transport, opening a ClientSession, running the JSON-RPC initialize handshake, and finally call_tool. Any one of those awaits could be the one that never returns, and without instrumentation we are guessing. Step 3 is the diagnostic step: we introduce a tiny in-process tracer that drops a monotonic-clock checkpoint before and after each of those phases, keeps the checkpoints in a caller-owned object that survives an exception, and exposes a gap_after helper so a test can assert "the stall sat on tool_call_start for ~N seconds before the read timeout fired". That single observation — the last recorded stage on a doomed call is always tool_call_start — is what tells us, with no more hand-waving, that the bug lives on the read side of the tool-call response, not in the handshake, not in the transport open, and not in our tool dispatch. Every later step in this series is going to assume that fact, so we pin it in a test now.

Setup

Two new files land in codebase/, plus a one-line re-export update:

  • src/mcp_slow_server/tracing.py — a SseTrace dataclass, a TraceEvent record type, the STAGE_* string constants, and a traced_call_slow_echo coroutine that mirrors call_slow_echo but records a checkpoint at every phase boundary.
  • tests/test_tracing.py — eight new tests covering the unit-level trace bookkeeping (empty trace, append order, gap arithmetic, monotonic timestamps) plus the four wire-level cases we actually care about: happy-path stages in order, the tool-gap matches the requested delay, the doomed call stalls exactly at tool_call_start, and trace.last_stage localises the failure for the assertion.
  • src/mcp_slow_server/__init__.py — re-exports the new public names so callers can import everything from the package root instead of reaching into the submodule.

No new runtime dependencies. The tracer is pure stdlib: time.monotonic for clock readings, a frozen dataclass for events, a mutable dataclass for the trace itself. We deliberately do not pull in structlog or opentelemetry here. The goal of the tracer is to make this one bug observable from a test assertion, not to bolt a production observability stack onto a 200-line reproducer. If we ever graduate this code into something real, swapping the recorder for an OTel span emitter is a one-function change — the stage names already read like span names by design.

Implementation

The core type is a stage-stamped event log keyed off a monotonic clock that starts when the trace is constructed:

import time
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class SseTrace:
    events: list[TraceEvent] = field(default_factory=list)
    _started_at: float = field(default_factory=time.monotonic)

    def record(self, stage: str, detail: str = "") -> None:
        elapsed = time.monotonic() - self._started_at
        self.events.append(
            TraceEvent(elapsed_seconds=elapsed, stage=stage, detail=detail),
        )

    @property
    def stages(self) -> list[str]:
        return [event.stage for event in self.events]

    @property
    def last_stage(self) -> str | None:
        if not self.events:
            return None
        return self.events[-1].stage

    def gap_after(self, stage: str) -> float | None:
        for index, event in enumerate(self.events):
            if event.stage != stage:
                continue
            if index + 1 >= len(self.events):
                return None
            return self.events[index + 1].elapsed_seconds - event.elapsed_seconds
        return None

    def __iter__(self) -> Iterator[TraceEvent]:
        return iter(self.events)

stages and __iter__ are convenience views over events. The tests read STAGE_TOOL_CALL_START in trace.stages instead of walking the event list themselves, and the REPL example below uses for event in trace to print the timeline — both stay readable without coupling the caller to the underlying list shape.

Two design choices are load-bearing. First, the trace is caller-owned — the test constructs the SseTrace, hands it into traced_call_slow_echo, and inspects it after the call returns or raises. If the tracer owned the trace internally and returned it on success, the doomed-call test would have nothing to inspect because the function never returns. Caller-owned state survives the exception, which is exactly the property we need for a hang-shaped bug. Second, gap_after returns None for both "stage never recorded" and "stage was the last event". Folding those into a single None keeps the assertion side readable (if trace.gap_after("tool_call_start") is None: ...) and makes "we stalled right after the dispatch" — which is what a hang looks like — a single, named condition rather than a multi-branch check.

STAGE_* constants are module-level strings rather than an Enum because the trace gets dumped to logs and compared in test assertions, and bare strings round-trip through both without coercion noise:

STAGE_CONNECT_OPEN = "connect_open"
STAGE_TRANSPORT_READY = "transport_ready"
STAGE_SESSION_OPEN = "session_open"
STAGE_INITIALIZE_START = "initialize_start"
STAGE_INITIALIZE_DONE = "initialize_done"
STAGE_TOOL_CALL_START = "tool_call_start"
STAGE_TOOL_CALL_DONE = "tool_call_done"
STAGE_SESSION_CLOSE = "session_close"
STAGE_CONNECT_CLOSE = "connect_close"

The instrumented coroutine is a near-line-for-line mirror of step 2's call_slow_echo, with trace.record(...) calls sandwiching every await that could be the stall point:

async def traced_call_slow_echo(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    trace: SseTrace,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
) -> SlowEchoResult:
    trace.record(STAGE_CONNECT_OPEN, f"read_timeout={sse_read_timeout}")
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        trace.record(STAGE_TRANSPORT_READY)
        async with ClientSession(read_stream, write_stream) as session:
            trace.record(STAGE_SESSION_OPEN)
            trace.record(STAGE_INITIALIZE_START)
            await session.initialize()
            trace.record(STAGE_INITIALIZE_DONE)
            trace.record(
                STAGE_TOOL_CALL_START,
                f"tool=slow_echo delay={delay_seconds}",
            )
            call_result = await session.call_tool(
                "slow_echo",
                {"message": message, "delay_seconds": delay_seconds},
            )
            trace.record(STAGE_TOOL_CALL_DONE, f"is_error={call_result.isError}")
        trace.record(STAGE_SESSION_CLOSE)
    trace.record(STAGE_CONNECT_CLOSE)
    return SlowEchoResult(
        text=_extract_text(call_result.content),
        is_error=bool(call_result.isError),
    )

There is deliberately no try/except here. The codebase rule forbids nested try/except blocks anyway, but more importantly: if we caught the exception inside the tracer we would have to either re-raise it (no value added) or swallow it (which would hide the bug). The async with blocks already guarantee sse_client and ClientSession get torn down on exception, and the caller's outer asyncio.wait_for still bounds wall-clock time. So when the read timeout fires inside session.call_tool, the exception propagates straight out, the async with exits skip every subsequent trace.record, and we are left with a trace whose last_stage is tool_call_start — which is exactly the assertable evidence we wanted.

The test that closes the loop is the one that earns step 3 its keep:

async def test_trace_localizes_stall_to_tool_call_when_read_budget_too_short(
    sse_server_url: str,
) -> None:
    trace = SseTrace()
    tool_delay = 3.0
    short_read_budget = 0.4
    observation_budget = short_read_budget + 2.5

    with pytest.raises(BaseException):
        await asyncio.wait_for(
            traced_call_slow_echo(
                sse_server_url,
                message="never-arrives",
                delay_seconds=tool_delay,
                sse_read_timeout=short_read_budget,
                trace=trace,
            ),
            timeout=observation_budget,
        )

    assert STAGE_INITIALIZE_DONE in trace.stages
    assert STAGE_TOOL_CALL_START in trace.stages
    assert STAGE_TOOL_CALL_DONE not in trace.stages
    assert STAGE_SESSION_CLOSE not in trace.stages

The four assertions are arranged as a triangulation, not a redundancy. INITIALIZE_DONE in stages proves the handshake completed, so we know the SSE transport itself was healthy up to the tool dispatch. TOOL_CALL_START in stages proves the dispatch happened — the client did manage to write the JSON-RPC request. TOOL_CALL_DONE not in stages proves the response never landed within the read budget. SESSION_CLOSE not in stages proves the failure happened during the tool call, not in some unrelated teardown step that ran after a successful return. Together they pinpoint the bug to one specific await on one specific line, in a form a future regression can break loudly against. The companion test test_trace_last_stage_is_tool_call_start_on_doomed_call collapses that triangulation to a single assertion (trace.last_stage == STAGE_TOOL_CALL_START) which is what the prose in later steps will quote when describing the bug.

Test it

Run the suite from the codebase/ directory:

pytest

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 21 items

tests/test_client.py .....                                               [ 23%]
tests/test_server.py ........                                            [ 61%]
tests/test_tracing.py ........                                           [100%]

============================== 21 passed in 6.27s ==============================

Twenty-one tests now: eight from step 1, five from step 2, and eight new tracing tests. The two that matter most for diagnosis are test_trace_localizes_stall_to_tool_call_when_read_budget_too_short and test_trace_last_stage_is_tool_call_start_on_doomed_call. Both pass for the same reason step 2's reproducer passed — because the bug fires — but they additionally encode where the bug fires. If a future change shifted the failure to, say, the handshake, those assertions would flip red and tell us the bug mode had moved.

You can also inspect a trace interactively. Run the server in one terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

And from a REPL in another:

import asyncio
from mcp_slow_server import SseTrace, traced_call_slow_echo

trace = SseTrace()
try:
    asyncio.run(traced_call_slow_echo(
        "http://127.0.0.1:8765/sse",
        message="hi",
        delay_seconds=3.0,
        sse_read_timeout=0.5,
        trace=trace,
    ))
except BaseException as exc:
    print(f"raised: {type(exc).__name__}")

for event in trace:
    print(f"{event.elapsed_seconds:6.3f}s  {event.stage}  {event.detail}")
print(f"last_stage = {trace.last_stage}")

The printed timeline lands every phase up through tool_call_start, then nothing — the trace's last event sits on the dispatch, the wall clock between that line and the raised exception is roughly the read budget, and last_stage reads back as tool_call_start. That is the observation step 4 onward will work to eliminate.

What we got

We added an in-process tracer that turns the read-timeout hang from a folklore event ("sometimes it raises, sometimes it wedges") into a named, assertable failure shape (trace.last_stage == "tool_call_start"). The tracer has zero runtime dependencies, costs a handful of time.monotonic reads per call, and is wired into a caller-owned object so it survives the very exceptions we are trying to diagnose. Eight new tests pin the behaviour: four unit tests on the trace bookkeeping itself, two happy-path wire tests that confirm the full nine-stage timeline is recorded in order with realistic gaps, and two doomed-call tests that lock the bug's location into the test suite. With the bug now pinned to a single await on a single line, step 4 can stop arguing about whether the failure is transport-level versus tool-level and start working on the actual fix: server-side heartbeats plus per-tool timeout cancellation that keep the SSE stream alive while a slow tool is still running.
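To make the "caller-owned object" point concrete, here is a minimal sketch of what such a trace could look like. It is not the repository's SseTrace: the MiniTrace and TraceEvent names, the started_at field, and the connect_start stage are assumptions made up for illustration. Only the tool_call_start stage name and the last_stage idea come from the text above.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TraceEvent:
    stage: str
    elapsed_seconds: float
    detail: str = ""


@dataclass
class MiniTrace:
    """Hypothetical stand-in for SseTrace: an ordered list of timed stages."""

    started_at: float = field(default_factory=time.monotonic)
    events: List[TraceEvent] = field(default_factory=list)

    def record(self, stage: str, detail: str = "") -> None:
        # One monotonic clock read per recorded stage.
        elapsed = time.monotonic() - self.started_at
        self.events.append(TraceEvent(stage, elapsed, detail))

    @property
    def last_stage(self) -> Optional[str]:
        return self.events[-1].stage if self.events else None


def doomed_call(trace: MiniTrace) -> None:
    # "connect_start" is an invented stage name; "tool_call_start" is the
    # stage the article reports as the last one on a doomed call.
    trace.record("connect_start")
    trace.record("tool_call_start", detail="slow_echo")
    raise TimeoutError("simulated read timeout")


trace = MiniTrace()
try:
    doomed_call(trace)
except TimeoutError as exc:
    print(f"raised: {type(exc).__name__}")

# The caller still holds the trace, so the timeline survives the exception.
print(f"last_stage = {trace.last_stage}")
```

Because the trace is created by the caller and only mutated by the call, nothing recorded before the exception is lost when the call unwinds.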

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: 4202474

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool
  • 1296ceb — step 2: wire up an MCP client and reproduce the read-timeout hang on tool call
  • 4202474 — step 3: instrument the SSE stream to observe where the connection stalls

What we're doing this step

Step 3 pinned the bug to a single observable fact: on a doomed call, the trace's last_stage is always tool_call_start. The SSE transport itself was healthy through the handshake, the request was written, and then the client's idle read clock fired before anything came back the other way. That diagnosis points at exactly two interventions, and step 4 lands both of them at once because they only make sense together. First, the server has to put something on the SSE stream while a slow tool is still running. Even a minimal progress notification is enough to reset the client's idle read clock, because httpx measures "connection looks dead" as "no event has arrived in N seconds", not "the tool has not returned in N seconds". Second, the server needs a hard wall-clock budget on the tool itself, so a genuinely runaway tool gets cancelled with a structured error instead of pumping heartbeats forever. The first control keeps the transport honest; the second keeps the tool honest. We deliberately implement them as two independent helpers wired through a single run_with_heartbeat coroutine, register a new slow_echo_with_heartbeat tool that uses both, and leave the original slow_echo untouched so the step 2 and step 3 reproducers still pin the bug. Twenty-three new tests lock in the new behaviour, including a wire-level test that proves a healthy follow-up call still works after a previous call blew its own timeout.
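The difference between an idle read clock and a total call budget is easy to simulate without any network. The sketch below is an assumption-laden toy, not the MCP transport: an asyncio.Queue stands in for the SSE stream, and the server, client, and run names are invented. It shows a 0.6-second "tool" surviving a 0.25-second per-read timeout only when heartbeats arrive in between.

```python
import asyncio
from typing import Optional


async def server(queue: asyncio.Queue, work_seconds: float,
                 heartbeat_interval: Optional[float]) -> None:
    """Simulated tool: optionally emits heartbeats, then one final result."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + work_seconds
    while heartbeat_interval is not None:
        if deadline - loop.time() <= heartbeat_interval:
            break
        await asyncio.sleep(heartbeat_interval)
        await queue.put("heartbeat")
    await asyncio.sleep(max(0.0, deadline - loop.time()))
    await queue.put("result")


async def client(queue: asyncio.Queue, idle_timeout: float) -> str:
    """The timeout applies to each read, not to the call as a whole."""
    while True:
        event = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        if event == "result":
            return event


async def run(heartbeat_interval: Optional[float]) -> str:
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(server(queue, 0.6, heartbeat_interval))
    try:
        return await client(queue, idle_timeout=0.25)
    except asyncio.TimeoutError:
        return "idle timeout"
    finally:
        # Always stop the simulated tool so no task outlives the demo.
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)


print(asyncio.run(run(0.1)))   # heartbeats every 0.1s keep the reader alive
print(asyncio.run(run(None)))  # no heartbeats: the idle clock fires first
```

The same tool duration produces two different outcomes, and the only variable is whether anything reaches the reader inside each idle window.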

Setup

Three new files land in codebase/, plus a tool registration in the existing server.py:

  • src/mcp_slow_server/heartbeat.py — the ToolTimeoutError exception, the HeartbeatEmitter callable alias, the internal _heartbeat_loop task, and the public run_with_heartbeat coroutine that wraps a unit of work with both controls.
  • src/mcp_slow_server/server.py — adds the slow_echo_with_heartbeat tool function, a _progress_emitter_for(ctx, total) helper that builds a report_progress-backed emitter, and registers the new tool in build_server alongside the legacy slow_echo.
  • tests/test_heartbeat.py and tests/test_heartbeat_tool.py — split the new test surface in two: the first file unit-tests the run_with_heartbeat primitive in isolation (cadence, cancellation, exception passthrough, no-op emitter, validation), and the second exercises the registered tool end-to-end, including over the real SSE transport on the uvicorn fixture.

No new runtime dependencies. Heartbeats are an asyncio.Event plus asyncio.wait_for; the per-tool timeout is another asyncio.wait_for. We deliberately do not add a job-scheduler library or an OTel exporter. The point of step 4 is to close the hang with the smallest possible amount of new surface area — anything richer can be layered on once the read-timeout failure mode is dead.

Implementation

The heart of the change is a single helper that runs a unit of work while a background task fires an emit callback on a fixed cadence, and stops cleanly whether the work returns, raises, or blows its timeout:

async def run_with_heartbeat(
    work: Awaitable[T],
    *,
    emit: HeartbeatEmitter,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    tool_name: str = "tool",
) -> T:
    _validate_intervals(heartbeat_interval, tool_timeout)

    stop_event = asyncio.Event()
    heartbeat_task = asyncio.create_task(
        _heartbeat_loop(emit, heartbeat_interval, stop_event),
    )
    try:
        return await asyncio.wait_for(work, timeout=tool_timeout)
    except asyncio.TimeoutError as exc:
        raise ToolTimeoutError(tool_name, tool_timeout) from exc
    finally:
        stop_event.set()
        await _drain_task(heartbeat_task)

Three design choices are doing the heavy lifting here. First, the two budgets are passed as separate parameters, not as a single number. A 30-second tool running on a 1-second heartbeat is a legitimate configuration — the tool is slow on purpose, but the transport must still see traffic every second. Folding them into one knob would force the caller to widen the wrong budget every time they wanted to tune the other, which is exactly the kind of papercut that makes operators disable heartbeats entirely. Second, asyncio.TimeoutError is re-raised as a named ToolTimeoutError with the original chained on as __cause__. The named subclass carries tool_name and timeout_seconds, which makes the failure greppable in logs and distinguishes it from any other asyncio.TimeoutError that happens to bubble up — for example, one fired by the client's outer wait_for. Third, the finally block always sets stop_event and always awaits the heartbeat task through _drain_task. We never leak a background task even if the work raises, and a failing emit callback is intentionally swallowed inside _drain_task so a misbehaving heartbeat cannot mask the real outcome of the work.

The heartbeat loop itself is one of the few places in the codebase that needs asyncio.wait_for inside a try/except — and only one level deep, because the codebase rule bans nested try/except:

async def _heartbeat_loop(
    emit: HeartbeatEmitter,
    interval_seconds: float,
    stop_event: asyncio.Event,
) -> int:
    count = 0
    while not stop_event.is_set():
        try:
            await asyncio.wait_for(
                stop_event.wait(),
                timeout=interval_seconds,
            )
            return count
        except asyncio.TimeoutError:
            count += 1
            await emit(count)
    return count

The pattern is "wait for the stop signal, with a deadline". If the stop signal arrives first, wait_for returns and we exit. If the deadline arrives first, wait_for raises TimeoutError, we treat that as "another interval elapsed", bump the counter, and emit. The counter is what the emit callback sees, which lets a report_progress-backed emitter pass a monotonically increasing progress value without having to keep its own state. The shape also means the loop sleeps as little as possible on shutdown — when stop_event is set during the wait, wait_for unblocks immediately instead of running out the rest of the interval.
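The prompt-shutdown property is easy to check in isolation. The sketch below repeats the loop under a different name so it runs on its own; the demo function, the 0.2-second interval, and the 0.5-second observation window are illustrative choices, not values from the repository.

```python
import asyncio
import time


async def heartbeat_loop(emit, interval_seconds, stop_event) -> int:
    # Same shape as the loop above: wait for the stop signal, with a deadline.
    count = 0
    while not stop_event.is_set():
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
            return count
        except asyncio.TimeoutError:
            count += 1
            await emit(count)
    return count


async def demo():
    seen = []

    async def emit(count: int) -> None:
        seen.append(count)

    stop_event = asyncio.Event()
    task = asyncio.create_task(heartbeat_loop(emit, 0.2, stop_event))
    await asyncio.sleep(0.5)  # long enough for a couple of intervals

    started = time.monotonic()
    stop_event.set()
    total = await task
    shutdown_seconds = time.monotonic() - started
    return seen, total, shutdown_seconds


seen, total, shutdown_seconds = asyncio.run(demo())
print(f"heartbeats seen: {seen}, returned count: {total}")
print(f"shutdown took {shutdown_seconds:.4f}s")
```

The shutdown time should come out far below the 0.2-second interval, because setting the event wakes the pending wait_for instead of letting it run to its deadline.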

The MCP tool that consumes this helper is intentionally small:

async def slow_echo_with_heartbeat(
    message: str,
    delay_seconds: float = DEFAULT_DELAY_SECONDS,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    ctx: Context | None = None,
) -> str:
    emit = _progress_emitter_for(ctx, total=delay_seconds)
    work = slow_echo(message, delay_seconds=delay_seconds)
    return await run_with_heartbeat(
        work,
        emit=emit,
        heartbeat_interval=heartbeat_interval,
        tool_timeout=tool_timeout,
        tool_name=SLOW_ECHO_HEARTBEAT_TOOL_NAME,
    )

The ctx: Context | None annotation is the FastMCP idiom for auto-injected context. FastMCP looks at the type hint, recognises Context, and injects the live request context at call time without exposing the parameter on the wire schema. The test_build_server_heartbeat_tool_schema_hides_context test enforces that — properties lists message, delay_seconds, heartbeat_interval, and tool_timeout but never ctx. The _progress_emitter_for helper closes over ctx and total, so the emitter passed into run_with_heartbeat is just a one-argument async callable that doesn't need to know whether the wire peer exists:

def _progress_emitter_for(
    ctx: Context | None,
    total: float | None,
) -> HeartbeatEmitter:
    if ctx is None:
        return make_noop_emitter()

    async def _emit(count: int) -> None:
        await ctx.report_progress(progress=float(count), total=total)

    return _emit

The ctx is None branch is what lets the same code path run inside a unit test without a wire peer. It's not a workaround — it's the explicit contract that lets slow_echo_with_heartbeat be tested directly with await slow_echo_with_heartbeat(...) instead of having to spin a uvicorn worker for every assertion. The wire-level tests still exist for the round-trip path, but the cadence and cancellation behaviour is unit-tested cheaply.
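The emitter contract is small enough to sketch in full. The HeartbeatEmitter alias and make_noop_emitter are named in the article but their bodies are not shown, so the definitions below are assumptions. make_recording_emitter is a hypothetical test helper that does not appear in the source.

```python
import asyncio
from typing import Awaitable, Callable, List

# Assumed alias: a one-argument async callable that receives the beat count.
HeartbeatEmitter = Callable[[int], Awaitable[None]]


def make_noop_emitter() -> HeartbeatEmitter:
    """Emitter for the ctx-is-None path: accepts the count, does nothing."""

    async def _emit(count: int) -> None:
        return None

    return _emit


def make_recording_emitter(sink: List[int]) -> HeartbeatEmitter:
    """Hypothetical test double: records every count it is handed."""

    async def _emit(count: int) -> None:
        sink.append(count)

    return _emit


async def demo() -> List[int]:
    sink: List[int] = []
    noop = make_noop_emitter()
    recorder = make_recording_emitter(sink)
    for count in (1, 2, 3):
        await noop(count)      # no wire peer needed, nothing observable happens
        await recorder(count)  # a unit test can assert on what was emitted
    return sink


print(asyncio.run(demo()))
```

Both emitters satisfy the same one-argument signature, which is what lets the heartbeat primitive stay ignorant of whether it is talking to a live MCP context, a test double, or nothing at all.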

The most important test on the new surface is the one that proves a follow-up call still works after a previous call blew its own timeout:

async def test_heartbeat_tool_on_wire_returns_tool_error_when_timeout_fires(
    sse_server_url: str,
) -> None:
    doomed = await asyncio.wait_for(
        _call_heartbeat_tool(
            sse_server_url,
            message="blown",
            delay_seconds=2.0,
            heartbeat_interval=0.1,
            tool_timeout=0.3,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert doomed.is_error is True

    healthy = await asyncio.wait_for(
        call_slow_echo(
            sse_server_url,
            message="recovered",
            delay_seconds=0.1,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert healthy.is_error is False
    assert "recovered" in healthy.text

This is the regression target. Before step 4, a misbehaving tool could leak a wedged SSE stream into the server process, and the next client session might inherit the wreckage. After step 4, the timeout cancels the work cleanly, the SSE response carries a structured isError=True payload, and the server is immediately ready for the next call. The asyncio.wait_for wrappers, one around each of the two remote calls, are belt-and-braces: the inner assertions speak to the tool behaviour, and the outer budgets guarantee the test fails fast instead of wedging CI if a future change reintroduces the original hang.

Test it

Run the suite from the codebase/ directory:

pytest

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 44 items

tests/test_client.py .....                                               [ 11%]
tests/test_heartbeat.py ..............                                   [ 43%]
tests/test_heartbeat_tool.py .........                                   [ 63%]
tests/test_server.py ........                                            [ 81%]
tests/test_tracing.py ........                                           [100%]

============================= 44 passed in 11.59s ==============================

Forty-four tests now: eight from step 1, five from step 2, eight from step 3, plus twenty-three new tests across test_heartbeat.py (fourteen unit tests on the primitive) and test_heartbeat_tool.py (nine tests on the registered tool, two of which exercise the real SSE round trip). The two that pay off the step are test_heartbeat_tool_keeps_sse_alive_past_short_read_budget, which runs a 1.2-second tool against a read budget shorter than the tool's runtime and confirms the end-to-end call still returns, and the test_heartbeat_tool_on_wire_returns_tool_error_when_timeout_fires case shown above, which is what locks in "a doomed call no longer poisons the server".

You can also drive it interactively. Start the server in one terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

And in a Python REPL in another:

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client(
        "http://127.0.0.1:8765/sse",
        timeout=5.0,
        sse_read_timeout=2.0,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool(
                "slow_echo_with_heartbeat",
                {
                    "message": "alive",
                    "delay_seconds": 10.0,
                    "heartbeat_interval": 0.5,
                    "tool_timeout": 15.0,
                },
            )
            print("isError =", result.isError)
            print("content =", result.content)

asyncio.run(main())

A 10-second tool runs on a 2-second SSE read budget without tearing the stream down, because heartbeats arrive every 500ms and reset the client's idle read clock. Swap the call to plain slow_echo with the same delay_seconds and read budget (dropping the heartbeat-specific arguments) and the read-timeout hang from earlier steps fires again. That is the evidence that the fix lives in the combination of "the tool emits heartbeats" and "the budget is bounded", not in any change to the client.

What we got

The read-timeout hang is closed. Tools that legitimately need to run longer than the SSE read budget now register on the heartbeat-aware path, fire progress notifications every heartbeat_interval seconds to keep the transport alive, and are cancelled with a structured ToolTimeoutError if they overrun tool_timeout. The original slow_echo is intentionally untouched so the reproducers from step 2 and step 3 still pin the original failure, which gives us a paired "broken tool / fixed tool" surface to point at in regression tests. Twenty-three new tests cover the primitive (cadence, cancellation, exception passthrough, no-op emitter, rejection of zero or negative budgets, failing-emit resilience) and the registered MCP tool (direct invocation, schema shape, FastMCP context injection, wire-level listing, wire-level happy path, wire-level timeout that returns a structured error and leaves the server ready for the next call). The entire 44-test suite passes in roughly eleven seconds. The bug that opened this article, "the client either raises noisily or wedges forever when a tool outruns the SSE read budget", has been replaced with two well-defined outcomes: either the tool finishes and the result returns, or the tool blows its own budget and the client sees an isError=True content payload on the same wire that stayed healthy through every heartbeat in between.

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: 17088ab

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool
  • 1296ceb — step 2: wire up an MCP client and reproduce the read-timeout hang on tool call
  • 4202474 — step 3: instrument the SSE stream to observe where the connection stalls
  • 17088ab — step 4: add server-side heartbeats and per-tool timeout cancellation

What we're doing this step

Step 4 closed the read-timeout hang from the server side. Heartbeats keep the SSE stream alive while a legitimately slow tool is still working, and a per-tool wall-clock budget cancels a runaway tool with a structured ToolTimeoutError so the next call gets a clean wire. That is enough when every tool registered on the server cooperates with the heartbeat-aware path, but a real deployment will always carry at least one tool that doesn't, or face an upstream that drops bytes for reasons neither the server nor the client can negotiate around. Step 5 fixes the client side of that contract. The work splits into three jobs that the existing call_slow_echo does not do. First, detect that the read budget elapsed regardless of which third-party class wins the teardown race (httpx.ReadTimeout, anyio.EndOfStream, anyio.ClosedResourceError, or a raw asyncio.TimeoutError from the client's own outer wait_for) and collapse all of them into one named ClientReadTimeoutError so the caller never has to import a transport exception class to pattern-match on the failure. Second, abort the in-flight tool on the server promptly, by deliberately letting the async with sse_client(...) block tear down on the exception path so the SSE stream closes and the server observes a disconnect. Third, retry transient read-timeouts under a bounded exponential-backoff schedule, where every retry opens a fresh sse_client session so a doomed attempt never poisons the next one. The new helpers live in src/mcp_slow_server/resilient_client.py, the original call_slow_echo stays untouched so the step 2 reproducer still pins the unmitigated hang, and twenty-three new tests lock in the behaviour: seventeen at the unit level plus six wire-level tests on the uvicorn fixture.

Setup

One new module lands in codebase/ and one new test file pairs with it:

  • src/mcp_slow_server/resilient_client.py — exposes ClientReadTimeoutError, RetryPolicy, call_slow_echo_once, retry_on_read_timeout, and the public entry point call_slow_echo_resilient. Internal helpers (_matches_read_timeout, _is_read_timeout) recursively unwrap ExceptionGroup shapes so anyio's nested task-group teardowns classify correctly.
  • tests/test_resilient_client.py — twenty-three tests split into four layers: constant/dataclass sanity (six), pure-Python classification of the read-timeout shape (six), unit-level retry-loop behaviour against in-memory async stubs (five), and wire-level scenarios against the uvicorn fixture (six, including the "follow-up call still works" regression target).

No new runtime dependencies. We pull asyncio.wait_for and asyncio.sleep from stdlib, reuse mcp.client.sse.sse_client / mcp.ClientSession from the existing client, and lean on the test-time sse_server_url fixture already wired up in step 2's conftest.py. We deliberately do not add a retry library like tenacity: the policy fits in three dataclass fields and one arithmetic expression, and the cost of a third-party retry surface (decorator state, hidden sleeps, jitter dispatch tables) is not worth the abstraction at this scale.

Implementation

The named failure type is the contract the caller depends on. It carries the four pieces of metadata that downstream consumers — logs, alerts, retry decisions — always want, and stringifies into a human-readable message that already has them spliced in:

class ClientReadTimeoutError(Exception):
    def __init__(
        self,
        tool_name: str,
        sse_url: str,
        attempt: int,
        timeout_seconds: float,
    ) -> None:
        super().__init__(
            f"Client read-timeout on tool {tool_name!r} after "
            f"{timeout_seconds:.3f}s (attempt {attempt}, url={sse_url})",
        )
        self.tool_name = tool_name
        self.sse_url = sse_url
        self.attempt = attempt
        self.timeout_seconds = timeout_seconds

The attempt field is what turns a single timeout into a useful trace inside the retry loop — "attempt 3 of 3" reads differently from "attempt 1 of 3" even when the underlying transport exception is the same. The tool_name and sse_url mean a single except ClientReadTimeoutError as exc: block in caller code has everything it needs for a structured log line; nobody has to drag the SSE URL down from outer scope.
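As an illustration of that single except block, the sketch below turns the exception into a structured log record. The class is repeated so the snippet runs on its own. The to_log_record helper and the field names in the record are hypothetical, not part of the repository.

```python
import json


class ClientReadTimeoutError(Exception):
    # Repeated from the listing above so this snippet is self-contained.
    def __init__(self, tool_name, sse_url, attempt, timeout_seconds):
        super().__init__(
            f"Client read-timeout on tool {tool_name!r} after "
            f"{timeout_seconds:.3f}s (attempt {attempt}, url={sse_url})",
        )
        self.tool_name = tool_name
        self.sse_url = sse_url
        self.attempt = attempt
        self.timeout_seconds = timeout_seconds


def to_log_record(exc: ClientReadTimeoutError) -> dict:
    """Hypothetical helper: everything needed comes off the exception."""
    return {
        "event": "client_read_timeout",
        "tool": exc.tool_name,
        "url": exc.sse_url,
        "attempt": exc.attempt,
        "timeout_seconds": exc.timeout_seconds,
        "message": str(exc),
    }


try:
    raise ClientReadTimeoutError(
        tool_name="slow_echo",
        sse_url="http://127.0.0.1:8765/sse",
        attempt=3,
        timeout_seconds=0.5,
    )
except ClientReadTimeoutError as exc:
    record = to_log_record(exc)
    print(json.dumps(record, sort_keys=True))
```

Nothing in the handler reaches into outer scope for the URL or the tool name, which is the property the four attributes are there to provide.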

Classification of the transport zoo is the second piece. anyio task groups wrap stream-closure exceptions inside an ExceptionGroup, so a single except clause cannot rely on isinstance against the leaf class. We walk the exceptions attribute recursively so arbitrary group nesting unwraps correctly:

_READ_TIMEOUT_EXCEPTION_NAMES = frozenset(
    {
        "ReadTimeout",
        "ReadError",
        "EndOfStream",
        "ClosedResourceError",
        "BrokenResourceError",
    },
)


def _matches_read_timeout(exc: BaseException) -> bool:
    if isinstance(exc, asyncio.TimeoutError):
        return True
    return type(exc).__name__ in _READ_TIMEOUT_EXCEPTION_NAMES


def _is_read_timeout(exc: BaseException) -> bool:
    if _matches_read_timeout(exc):
        return True
    sub_exceptions = getattr(exc, "exceptions", None)
    if sub_exceptions is None:
        return False
    return any(_is_read_timeout(sub) for sub in sub_exceptions)

Matching by class name rather than isinstance is intentional. The module never imports httpx or anyio — it works against whatever versions the MCP SDK has pulled in, and survives minor version moves in either package without a code change. The trade-off is real: somebody could in principle write a custom class EndOfStream(Exception) in unrelated code and have it classified as a read-timeout. That trade is the right one. The classes we care about are widely-known names inside two well-known packages, and the alternative — pinning hard imports — drags two transport dependencies into a module whose entire purpose is to hide them from the caller. The two tests test_is_read_timeout_unwraps_exception_group and test_is_read_timeout_unwraps_nested_exception_groups cover the recursion explicitly; test_is_read_timeout_rejects_unrelated_exceptions and test_is_read_timeout_rejects_exception_group_of_unrelated cover the negative direction.

The single-attempt call is where the abort actually happens. The call_tool await is wrapped in asyncio.wait_for with an explicit client_call_timeout deadline that defaults to sse_read_timeout:

async def call_slow_echo_once(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    attempt: int = 1,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
) -> SlowEchoResult:
    deadline = (
        client_call_timeout
        if client_call_timeout is not None
        else sse_read_timeout
    )
    try:
        async with sse_client(
            sse_url,
            timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
        ) as (read_stream, write_stream):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()
                call_result = await asyncio.wait_for(
                    session.call_tool(
                        "slow_echo",
                        {"message": message, "delay_seconds": delay_seconds},
                    ),
                    timeout=deadline,
                )
                return SlowEchoResult(
                    text=_extract_text(call_result.content),
                    is_error=bool(call_result.isError),
                )
    except BaseException as exc:
        if _is_read_timeout(exc):
            raise ClientReadTimeoutError(
                tool_name="slow_echo",
                sse_url=sse_url,
                attempt=attempt,
                timeout_seconds=deadline,
            ) from exc
        raise

Two design decisions inside this function are doing the heavy lifting. First, the deadline is an explicit parameter, not "just reuse sse_read_timeout". sse_read_timeout controls how long the SSE client tolerates an idle stream. That is a reasonable per-call deadline when heartbeats are flowing, and a poor one when they are not, because the per-call deadline then wants to be slightly tighter than the transport's own idle clock so the abort lands cleanly through the context manager teardown rather than as a noisy transport error. Defaulting client_call_timeout to sse_read_timeout keeps the simple case simple; exposing it as a separate parameter lets the test suite (and real callers) tune the two clocks independently. Second, the async with sse_client(...) block is outside the wait_for, not inside. When wait_for fires its asyncio.TimeoutError, the exception unwinds through the ClientSession and sse_client context manager exits, which is exactly what tears the SSE stream down and gives the server the disconnect signal. If sse_client were inside wait_for, the cancellation would race the context manager's own cleanup and we could leak a half-open transport. The single-level try/except around the whole block obeys the codebase's no-nested-try rule and still catches every shape: it catches BaseException because anyio task-group teardowns can surface as a BaseExceptionGroup, and _is_read_timeout walks the group for us.
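The ordering claim (the transport closes before the caller sees the timeout) can be checked with a toy context manager. In the sketch below, fake_transport is a hypothetical stand-in for sse_client and asyncio.sleep stands in for the slow call_tool await; neither is the real SDK.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List


@asynccontextmanager
async def fake_transport(log: List[str]):
    """Hypothetical stand-in for sse_client: logs open and close."""
    log.append("open")
    try:
        yield
    finally:
        # Runs while the TimeoutError is unwinding through the async with.
        log.append("closed")


async def call_once(log: List[str], deadline: float) -> None:
    try:
        async with fake_transport(log):
            # wait_for sits inside the context manager, as in the listing.
            await asyncio.wait_for(asyncio.sleep(1.0), timeout=deadline)
            log.append("result")
    except asyncio.TimeoutError:
        log.append("timeout seen by caller")


log: List[str] = []
asyncio.run(call_once(log, deadline=0.05))
print(log)
```

By the time the except clause runs, "closed" is already in the log, which mirrors the guarantee the real function relies on: the disconnect signal is sent before any retry logic gets a chance to act.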

The retry loop on top is intentionally boring:

async def retry_on_read_timeout(
    work: Callable[[int], Awaitable[T]],
    *,
    retry_policy: RetryPolicy | None = None,
) -> T:
    policy = retry_policy if retry_policy is not None else RetryPolicy()
    last_error: ClientReadTimeoutError | None = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await work(attempt)
        except ClientReadTimeoutError as exc:
            last_error = exc
            if attempt >= policy.max_attempts:
                break
            await asyncio.sleep(policy.backoff_for(attempt))
    assert last_error is not None
    raise last_error

The loop only retries on ClientReadTimeoutError — every other exception propagates immediately. That is enforced by test_retry_on_read_timeout_does_not_retry_other_exceptions, which raises a RuntimeError from inside work and asserts the loop sees exactly one attempt. The backoff itself sleeps after the just-failed attempt, so attempt 1 sleeps backoff_for(1) before attempt 2 starts; that ordering is locked in by test_retry_on_read_timeout_sleeps_between_attempts, which measures elapsed wall-clock time against 0.05 + 0.1 (initial 0.05s doubled once) and confirms we are in the right ballpark.
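A compact way to see the loop's behaviour is to drive it with an in-memory stub that fails twice and then succeeds. The sketch below is a simplified re-implementation for illustration, not the repository code: FakeReadTimeout stands in for ClientReadTimeoutError, and the retry, backoff_for, and demo names are assumptions.

```python
import asyncio


class FakeReadTimeout(Exception):
    """Stand-in for ClientReadTimeoutError in this sketch."""


def backoff_for(attempt: int, initial: float = 0.05,
                multiplier: float = 2.0) -> float:
    return initial * multiplier ** (attempt - 1)


async def retry(work, max_attempts: int = 3):
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await work(attempt)
        except FakeReadTimeout as exc:
            last_error = exc
            if attempt >= max_attempts:
                break
            # Sleep after the failed attempt, before the next one starts.
            await asyncio.sleep(backoff_for(attempt))
    assert last_error is not None
    raise last_error


async def demo():
    attempts = []

    async def flaky(attempt: int) -> str:
        attempts.append(attempt)
        if attempt < 3:
            raise FakeReadTimeout(f"attempt {attempt} timed out")
        return "ok"

    loop = asyncio.get_running_loop()
    started = loop.time()
    result = await retry(flaky)
    return result, attempts, loop.time() - started


result, attempts, elapsed = asyncio.run(demo())
print(result, attempts, f"{elapsed:.2f}s")
```

With these numbers the two sleeps add up to roughly 0.05 + 0.1 seconds, the same schedule the sleeps-between-attempts test measures against.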

The RetryPolicy dataclass is frozen and validates in __post_init__, so a misconfigured caller fails at construction time instead of mid-loop:

@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = DEFAULT_MAX_ATTEMPTS
    initial_backoff_seconds: float = DEFAULT_INITIAL_BACKOFF_SECONDS
    backoff_multiplier: float = DEFAULT_BACKOFF_MULTIPLIER

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if self.initial_backoff_seconds < 0:
            raise ValueError("initial_backoff_seconds must be non-negative")
        if self.backoff_multiplier < 1.0:
            raise ValueError("backoff_multiplier must be >= 1.0")

    def backoff_for(self, attempt: int) -> float:
        if attempt < 1:
            raise ValueError("attempt must be >= 1")
        return self.initial_backoff_seconds * (
            self.backoff_multiplier ** (attempt - 1)
        )

Refusing max_attempts=0 is the obvious one. Refusing backoff_multiplier < 1.0 is the less obvious one — a shrinking backoff would accelerate retries against a persistent failure, which is the opposite of what every retry library on earth wants. We catch that in test_retry_policy_rejects_shrinking_backoff so the constraint never quietly regresses.

The top-level entry point composes the three pieces into the surface callers actually want:

async def call_slow_echo_resilient(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
    retry_policy: RetryPolicy | None = None,
) -> SlowEchoResult:
    async def _attempt(attempt: int) -> SlowEchoResult:
        return await call_slow_echo_once(
            sse_url,
            message,
            delay_seconds,
            attempt=attempt,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            client_call_timeout=client_call_timeout,
        )

    return await retry_on_read_timeout(_attempt, retry_policy=retry_policy)

The closure _attempt is defined once per call_slow_echo_resilient invocation, and the retry loop calls it once per attempt through work(attempt). The work it does (opening a fresh sse_client, a fresh ClientSession, calling initialize, calling call_tool) runs from scratch every attempt. There is no shared state between attempts. That is the whole point of the retry contract on this kind of failure: we cannot resume a torn-down SSE stream, so we must open a new one.

The regression target is the test that proves a doomed call does not leave the server wedged for the next caller:

async def test_call_slow_echo_resilient_aborted_call_leaves_server_healthy(
    sse_server_url: str,
) -> None:
    tool_delay = 2.0
    short_budget = 0.4
    policy = RetryPolicy(
        max_attempts=2,
        initial_backoff_seconds=0.0,
        backoff_multiplier=1.0,
    )

    with pytest.raises(ClientReadTimeoutError):
        await asyncio.wait_for(
            call_slow_echo_resilient(
                sse_server_url,
                message="aborted",
                delay_seconds=tool_delay,
                sse_read_timeout=short_budget,
                client_call_timeout=short_budget,
                retry_policy=policy,
            ),
            timeout=HANG_OBSERVATION_BUDGET_SECONDS,
        )

    healthy = await asyncio.wait_for(
        call_slow_echo(
            sse_server_url,
            message="follow-up",
            delay_seconds=0.1,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert healthy.is_error is False
    assert "follow-up" in healthy.text

The first call deliberately picks a tool delay (2.0s) much longer than the call budget (0.4s), runs two attempts with no backoff, and asserts both abort fast and the resilient wrapper re-raises a ClientReadTimeoutError. The second call uses the plain call_slow_echo against the same server, with a generous read budget, and asserts the round-trip succeeds. The pairing matters: if the abort on the doomed call had leaked a half-open transport into the server process, the follow-up would either hang (caught by the outer asyncio.wait_for) or come back as an error. The fact that it returns is_error=False with the expected payload is what locks in the contract.

Test it

Run the suite from the codebase/ directory:

pytest

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 67 items

tests/test_client.py .....                                               [  7%]
tests/test_heartbeat.py ..............                                   [ 28%]
tests/test_heartbeat_tool.py .........                                   [ 41%]
tests/test_resilient_client.py .......................                   [ 76%]
tests/test_server.py ........                                            [ 88%]
tests/test_tracing.py ........                                           [100%]

============================= 67 passed in 15.90s ==============================

Sixty-seven tests now: forty-four carried over from step 4, plus twenty-three new tests in tests/test_resilient_client.py. Six tests cover constant sanity and dataclass validation. Six cover the _is_read_timeout classifier, including the two ExceptionGroup unwrap tests that exercise the recursion against arbitrary group nesting. Five unit-test the retry_on_read_timeout loop against in-memory async stubs (first-attempt success, no-retry-on-other-exc, retries-until-success, exhaust-and-reraise, sleeps-between-attempts). The remaining six exercise the wire path on the uvicorn fixture, among them the happy path, the named-timeout-on-blown-budget shape, the exhaust-retries case that asserts exc.attempt == 2, and the follow-up-call-still-works regression target. The whole suite still finishes in roughly sixteen seconds.

You can also drive it interactively. Start the server in one terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

And in a Python REPL in another:

import asyncio
from mcp_slow_server.resilient_client import (
    RetryPolicy,
    call_slow_echo_resilient,
)

async def main():
    result = await call_slow_echo_resilient(
        "http://127.0.0.1:8765/sse",
        message="resilient",
        delay_seconds=3.0,
        sse_read_timeout=0.5,
        client_call_timeout=0.5,
        retry_policy=RetryPolicy(
            max_attempts=3,
            initial_backoff_seconds=0.1,
            backoff_multiplier=2.0,
        ),
    )
    print("text =", result.text)

asyncio.run(main())

This call asks the unmodified slow_echo to sleep for three seconds against a half-second budget. You will see three rapid attempts — roughly 0.5s + 0.1s + 0.5s + 0.2s + 0.5s ≈ 1.8s of total wall-clock time, not nine — followed by a single ClientReadTimeoutError with attempt=3. The shape of the failure is exactly what the test suite asserts: the abort lands inside the call budget, the retries do not stretch the failure out, and the server process is immediately ready for a follow-up.
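
The 1.8s figure is just the sum of the per-attempt budgets and the backoff sleeps between them. A small helper makes the arithmetic explicit. It is a back-of-the-envelope sketch with a hypothetical name (worst_case_wall_clock), and it assumes every attempt burns its full call budget, that the backoff is slept only between attempts, and that connection setup time is negligible.

```python
def worst_case_wall_clock(
    call_timeout: float,
    max_attempts: int,
    initial_backoff: float,
    backoff_multiplier: float,
) -> float:
    """Upper bound on total wait when every attempt times out."""
    total = 0.0
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        # Each doomed attempt costs one full call budget.
        total += call_timeout
        # Assumption: no sleep after the final attempt, only between attempts.
        if attempt < max_attempts:
            total += backoff
            backoff *= backoff_multiplier
    return total


if __name__ == "__main__":
    # Same numbers as the REPL example: 0.5 + 0.1 + 0.5 + 0.2 + 0.5
    print(round(worst_case_wall_clock(0.5, 3, 0.1, 2.0), 3))
```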

What we got

The remaining client-side gap is closed. A caller now imports one public surface — call_slow_echo_resilient plus RetryPolicy and the named ClientReadTimeoutError — and gets the three behaviours the bug demanded: a single named exception for every shape of SSE teardown, a prompt in-flight abort that propagates through the context manager chain and lets the server clean up its tool task, and a bounded exponential-backoff retry that opens a fresh transport every attempt so a doomed try cannot poison the next one. The original call_slow_echo is deliberately left untouched so the step 2 reproducer still pins the unmitigated hang in regression — that gives the suite a paired "broken / resilient" surface on the same wire. Twenty-three new tests cover the classifier (including ExceptionGroup recursion), the dataclass validation, the retry loop against in-memory stubs, and the round-trip behaviour on the uvicorn fixture — including the "follow-up call still works" target that proves a doomed resilient call does not leak transport state into the next session. The entire 67-test suite passes in roughly sixteen seconds. The arc that opened the article — "the client either raises a transport exception nobody can catch or wedges forever when a tool outruns the SSE read budget" — now resolves cleanly: either the tool returns its payload, or the resilient wrapper raises a single named ClientReadTimeoutError that carries the tool name, the SSE URL, the attempt number, and the exact deadline that elapsed, on a transport that was torn down promptly enough for the next call to start on a clean wire.

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: 37410bc

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool
  • 1296ceb — step 2: wire up an MCP client and reproduce the read-timeout hang on tool call
  • 4202474 — step 3: instrument the SSE stream to observe where the connection stalls
  • 17088ab — step 4: add server-side heartbeats and per-tool timeout cancellation
  • 37410bc — step 5: add client-side read-timeout handling with graceful retry and tool-call abort

What we're doing this step

Steps 4 and 5 each closed one half of the SSE read-timeout / tool-hang bug. The server now emits a heartbeat fast enough to keep httpx's idle read clock from firing while a legitimately slow tool is still working, and cancels a runaway tool with a structured error the moment its per-tool wall-clock budget elapses. The client now bounds every call_tool await with an explicit deadline, lets the sse_client context manager tear down cleanly on the timeout path so the server observes a real disconnect, and collapses the third-party transport exception zoo into one named ClientReadTimeoutError that callers can pattern-match on without importing httpx or anyio.

The two fixes are each well-tested in isolation (twenty-three unit tests on the resilient client, fourteen on the heartbeat loop, plus the wire-level scenarios on the uvicorn fixture), but nothing yet drives both halves through a single round-trip and asserts that the composed behaviour matches the contract. Step 6 is that round-trip. We add one new public helper, call_heartbeat_tool_resilient, that wires the heartbeat-aware tool into the same abort/retry wrapper, refactor the single-attempt plumbing so the slow-echo and heartbeat paths share the named-timeout contract instead of copy-pasting it, and write a ten-test regression suite in tests/test_end_to_end.py that pins the five-part timeout contract (heartbeat liveness, server-side cancellation, client-side abort, post-failure recovery, and the still-reproducing diagnostic from step 2) against the real uvicorn fixture.

The point of the step is not to add new behaviour; it is to lock the existing behaviour in so a future refactor that drops a heartbeat, widens a deadline, raises a default, or silently swallows the named timeout fails loudly in CI rather than walking the fix back into the original hang.

Setup

One new test file lands, one production module grows two new public entry points, and the package root re-exports them:

  • tests/test_end_to_end.py: ten regression tests, seven organised by contract clause (C1 through C5) and three cross-cutting invariants on the package defaults. Every assertion is either a wall-clock bound measured with time.perf_counter, a structured error shape (ClientReadTimeoutError with the expected attempt, timeout_seconds, and tool_name), or a post-failure liveness check that re-uses the same uvicorn fixture.
  • src/mcp_slow_server/resilient_client.py: two new public functions (call_heartbeat_tool_once, call_heartbeat_tool_resilient) plus two internal helpers (_open_session_and_call, _call_tool_with_abort) that factor out the SSE-session-open + wait_for-wrapped call_tool plumbing so the slow-echo path and the heartbeat path share the same named-timeout abort contract.
  • src/mcp_slow_server/__init__.py: re-exports the two new functions so callers import them off the package root.

No new runtime dependencies. The end-to-end tests reuse the sse_server_url fixture from step 2's conftest.py, the DEFAULT_HEARTBEAT_INTERVAL_SECONDS / DEFAULT_TOOL_TIMEOUT_SECONDS constants from step 4, and the RetryPolicy / ClientReadTimeoutError surface from step 5. The whole point is to exercise what we already have through one composed seam, not to introduce a new abstraction.

Implementation

The refactor inside resilient_client.py is the first move. The previous call_slow_echo_once had the SSE-session-open and the wait_for-wrapped call_tool inlined, which was fine when only one tool needed the abort contract but starts to copy-paste itself the moment a second tool wants the same treatment. We pull the wire shape into a single helper that takes the tool name and an arguments dict:

async def _open_session_and_call(
    sse_url: str,
    tool_name: str,
    arguments: dict,
    *,
    connect_timeout: float,
    sse_read_timeout: float,
    deadline: float,
    progress_callback: ProgressCallback | None,
) -> SlowEchoResult:
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            call_result = await asyncio.wait_for(
                session.call_tool(
                    tool_name,
                    arguments,
                    progress_callback=progress_callback,
                ),
                timeout=deadline,
            )
            return SlowEchoResult(
                text=_extract_text(call_result.content),
                is_error=bool(call_result.isError),
            )

The progress_callback parameter is the non-obvious piece that the end-to-end test exposed. The MCP SDK only injects a _meta.progressToken into the outgoing call_tool request when the caller passes a progress_callback. Without that token, the server's ctx.report_progress notifications are silently dropped at the protocol layer, which means the heartbeat fix from step 4 would compile and pass its unit tests while still failing on the wire — no heartbeats actually reach the SSE stream, so httpx's idle clock fires exactly as it did before the fix. Wiring progress_callback through the helper, and defaulting the heartbeat path to a no-op callback that just exists to flip the opt-in switch, is what makes the composed behaviour actually hold.
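
The opt-in behaviour described above can be pictured with a toy model of the request shape. Nothing here is SDK code: build_call_tool_request and progress_reaches_wire are hypothetical helpers, and reusing the request id as the token value is an assumption. What the sketch takes from the protocol is only that a tools/call request carries its progress token under params._meta.progressToken, and that the server has nothing to address progress notifications to when that token is absent.

```python
from typing import Any, Awaitable, Callable, Dict, Optional

# Shape of a progress callback: (progress, total, message) -> awaitable None.
ProgressFn = Callable[[float, Optional[float], Optional[str]], Awaitable[None]]


async def noop_progress(
    progress: float, total: Optional[float], message: Optional[str]
) -> None:
    """Ignore every update; passing this only opts the request in."""
    return None


def build_call_tool_request(
    name: str,
    arguments: Dict[str, Any],
    *,
    request_id: int,
    progress_callback: Optional[ProgressFn] = None,
) -> Dict[str, Any]:
    """Toy model of the outgoing JSON-RPC message (hypothetical helper)."""
    params: Dict[str, Any] = {"name": name, "arguments": arguments}
    if progress_callback is not None:
        # Assumption: the request id doubles as the progress token.
        params["_meta"] = {"progressToken": request_id}
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call", "params": params}


def progress_reaches_wire(request: Dict[str, Any]) -> bool:
    """Without a token the server cannot address progress notifications."""
    return request["params"].get("_meta", {}).get("progressToken") is not None
```

In this model a request built without a callback has no _meta block at all, which is the situation where the heartbeat notifications never reach the SSE stream.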

The named-timeout abort contract gets factored into a second helper:

async def _call_tool_with_abort(
    sse_url: str,
    tool_name: str,
    arguments: dict,
    *,
    attempt: int,
    connect_timeout: float,
    sse_read_timeout: float,
    client_call_timeout: float | None,
    progress_callback: ProgressCallback | None = None,
) -> SlowEchoResult:
    deadline = (
        client_call_timeout
        if client_call_timeout is not None
        else sse_read_timeout
    )
    try:
        return await _open_session_and_call(
            sse_url,
            tool_name,
            arguments,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            deadline=deadline,
            progress_callback=progress_callback,
        )
    except BaseException as exc:
        if _is_read_timeout(exc):
            raise ClientReadTimeoutError(
                tool_name=tool_name,
                sse_url=sse_url,
                attempt=attempt,
                timeout_seconds=deadline,
            ) from exc
        raise

This is the same shape step 5 already pinned — BaseException catch-all so anyio's BaseExceptionGroup teardowns route through _is_read_timeout, single try with no nested except per the codebase's no-nested-try rule, deadline defaults to sse_read_timeout when the caller does not override it. The only difference is that tool_name is now a parameter rather than a hard-coded "slow_echo", so the heartbeat tool can reuse the exact same contract. Two thin wrappers — call_slow_echo_once and call_heartbeat_tool_once — now delegate to _call_tool_with_abort with their tool name and arguments fixed. The slow-echo wrapper does not pass a progress callback (step 2's behaviour is preserved verbatim so the C5 reproducer still pins the unmitigated hang); the heartbeat wrapper passes _noop_progress to flip the protocol-level opt-in on.
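
To see the deadline fallback and the exception translation without a server, here is a stdlib-only sketch of the same idea. The names NamedTimeoutSketch and call_with_abort_sketch are hypothetical stand-ins, and the sketch catches only asyncio.TimeoutError, whereas the real helper catches BaseException and routes it through the classifier.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class NamedTimeoutSketch(Exception):
    """Stand-in for the named timeout: carries the call's metadata."""

    def __init__(self, *, tool_name: str, sse_url: str, attempt: int, timeout_seconds: float) -> None:
        super().__init__(
            f"{tool_name} at {sse_url} timed out after {timeout_seconds}s (attempt {attempt})"
        )
        self.tool_name = tool_name
        self.sse_url = sse_url
        self.attempt = attempt
        self.timeout_seconds = timeout_seconds


async def call_with_abort_sketch(
    make_call: Callable[[], Awaitable[T]],
    *,
    tool_name: str,
    sse_url: str,
    attempt: int,
    sse_read_timeout: float,
    client_call_timeout: Optional[float] = None,
) -> T:
    # The explicit call budget wins; otherwise fall back to the read budget.
    deadline = client_call_timeout if client_call_timeout is not None else sse_read_timeout
    try:
        return await asyncio.wait_for(make_call(), timeout=deadline)
    except asyncio.TimeoutError as exc:
        # Chain the original so the low-level cause stays visible in tracebacks.
        raise NamedTimeoutSketch(
            tool_name=tool_name,
            sse_url=sse_url,
            attempt=attempt,
            timeout_seconds=deadline,
        ) from exc
```

The caller only ever has to catch one exception type, and the timeout_seconds it carries is the deadline that actually applied, whichever of the two budgets that was.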

The composed entry point that the end-to-end tests drive is deliberately minimal:

async def call_heartbeat_tool_resilient(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
    retry_policy: RetryPolicy | None = None,
) -> SlowEchoResult:
    async def _attempt(attempt: int) -> SlowEchoResult:
        return await call_heartbeat_tool_once(
            sse_url,
            message,
            delay_seconds,
            attempt=attempt,
            heartbeat_interval=heartbeat_interval,
            tool_timeout=tool_timeout,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            client_call_timeout=client_call_timeout,
        )

    return await retry_on_read_timeout(_attempt, retry_policy=retry_policy)

The closure body opens a fresh SSE session per attempt, the same no-shared-state-across-retries discipline as the slow-echo wrapper, and the retry_on_read_timeout loop from step 5 handles the bounded exponential backoff. The composed function is nothing more than a signature, a closure, and one return statement because every behaviour it depends on already exists; step 6 is wiring, not invention.
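
For readers who skipped step 5, the way an attempt-numbered closure composes with a bounded retry loop can be sketched against in-memory stubs. This is not the repository's retry_on_read_timeout: RetryPolicySketch, ReadTimeoutSketch, and retry_sketch are hypothetical names, and the exact backoff schedule is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ReadTimeoutSketch(Exception):
    """Stand-in for the named timeout; remembers which attempt failed."""

    def __init__(self, attempt: int) -> None:
        super().__init__(f"read timeout on attempt {attempt}")
        self.attempt = attempt


@dataclass(frozen=True)
class RetryPolicySketch:
    max_attempts: int = 3
    initial_backoff_seconds: float = 0.0
    backoff_multiplier: float = 2.0


async def retry_sketch(
    attempt_fn: Callable[[int], Awaitable[T]],
    policy: RetryPolicySketch,
) -> T:
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    backoff = policy.initial_backoff_seconds
    attempt = 1
    while True:
        try:
            # Each call builds its own state, so a doomed try cannot leak
            # anything into the next one.
            return await attempt_fn(attempt)
        except ReadTimeoutSketch:
            # Only the named timeout is retried; the last one is re-raised.
            if attempt >= policy.max_attempts:
                raise
        await asyncio.sleep(backoff)
        backoff *= policy.backoff_multiplier
        attempt += 1
```

Any exception other than the named timeout escapes on the first attempt, which mirrors the no-retry-on-other-exc case in the step 5 unit tests.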

The regression suite is where the contract gets pinned. The file's module docstring spells out five named clauses and the tests are grouped under the clause they pin.

C1 (heartbeats unblock honest slow tools) is covered by two tests. The first asks the heartbeat tool to sleep for 0.8s against a 0.4s SSE read budget with a 0.1s heartbeat cadence, then asserts that the call returns successfully with is_error=False, that the wall-clock elapsed time is at least tool_delay - 0.05 (so we know the response actually came back, not that we cut the call short), and that the elapsed time is less than tool_delay + 2.0 (so we know we did not silently retry). The second locks the no-spurious-retry invariant on the happy path by configuring a deliberately huge initial_backoff_seconds=1.5 and asserting the elapsed time stays well under that; if any phantom retry fired, the first backoff sleep would blow the bound.
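
The two-sided wall-clock bound is easy to get subtly wrong, so here is one way it could be written. The helper names timed and within_single_attempt_bounds are hypothetical; the 0.05s and 2.0s slack values are the ones quoted above.

```python
import asyncio
import time
from typing import Awaitable, Tuple, TypeVar

T = TypeVar("T")


async def timed(awaitable: Awaitable[T]) -> Tuple[T, float]:
    """Await something and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await awaitable
    return result, time.perf_counter() - start


def within_single_attempt_bounds(
    elapsed: float,
    tool_delay: float,
    *,
    early_slack: float = 0.05,
    late_slack: float = 2.0,
) -> bool:
    # Lower bound: the call really waited for the tool rather than being cut short.
    # Upper bound: no hidden retry or backoff sleep stretched the call out.
    return tool_delay - early_slack <= elapsed < tool_delay + late_slack
```

The lower bound is what distinguishes a genuine slow success from a call that returned early for the wrong reason; the upper bound is what catches a silent retry.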

C2 — server-side tool timeout still cancels runaway work asks the heartbeat tool for a 1.5s sleep with a 0.3s tool_timeout. Wire-level budgets are deliberately generous (sse_read_timeout=5.0, client_call_timeout=5.0) so the only timer that should fire is the server's own per-tool budget. The contract: the result lands with is_error=True (cancellation surfaces as a structured tool error, not a transport exception) and elapsed time is well below the natural delay.
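
The shape C2 pins, a timeout that comes back as a result rather than as an exception, can be sketched in isolation. This is not the step 4 server code: ToolResultSketch and run_with_tool_budget are hypothetical names, and the error text is made up for the example.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class ToolResultSketch:
    text: str
    is_error: bool


async def run_with_tool_budget(
    tool_fn: Callable[[], Awaitable[str]],
    tool_timeout: float,
) -> ToolResultSketch:
    """Run a tool under a wall-clock budget and always return a result."""
    try:
        # wait_for cancels the tool coroutine when the budget elapses.
        text = await asyncio.wait_for(tool_fn(), timeout=tool_timeout)
    except asyncio.TimeoutError:
        # Surface the cancellation as data, not as a transport-level failure.
        return ToolResultSketch(
            text=f"tool exceeded its {tool_timeout}s budget and was cancelled",
            is_error=True,
        )
    return ToolResultSketch(text=text, is_error=False)
```

Because the caller always gets a result object, the client's own timeout machinery never has to fire for this failure mode, which is why the test can keep the wire-level budgets generous.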

C3 — client-side abort still bounds the wait asks the heartbeat tool for a 3.0s sleep but simulates a misconfigured server by setting the heartbeat interval to 2.0s — well above the 0.4s SSE read budget. With no help from the server, the client deadline must win, so we assert pytest.raises(ClientReadTimeoutError), that exc.attempt == policy.max_attempts (we exhausted retries), that exc.timeout_seconds == short_read_budget (the deadline metadata matches the configured budget), and that elapsed time stays below the natural tool delay. A single-attempt variant pins the same contract without retries hiding the timing shape.

C4 — a doomed call leaves the server healthy is the composite recovery test. We fire a doomed resilient call (slow heartbeat, short read budget, short client deadline, two attempts), assert it raises ClientReadTimeoutError, then immediately fire a second resilient call against the same fixture with sane parameters and assert it returns successfully with the expected payload. If the abort path had leaked a half-open transport or a wedged tool task back into the server process, the second call would either hang (caught by the outer asyncio.wait_for) or come back with is_error=True.

C5 — the diagnostic regression still reproduces is the most counter-intuitive test in the file: it asserts the original unfixed path still fails. The whole tutorial hangs off step 2's call_slow_echo + plain slow_echo reproducer — if a future refactor silently raises the default sse_read_timeout or routes the plain client through the heartbeat tool, the educational value of step 2 evaporates. We pin the failure shape (any BaseException raised before the natural delay completes) so that change shows up as a red test rather than a silent rewrite of history.

Three cross-cutting invariants live above the clauses. The first asserts DEFAULT_HEARTBEAT_INTERVAL_SECONDS < DEFAULT_SSE_READ_TIMEOUT: if a future change pushes the heartbeat interval above the read budget, the heartbeat-based fix becomes vacuous because the server would never get a chance to refresh the read clock before httpx tore the stream down. The second asserts DEFAULT_TOOL_TIMEOUT_SECONDS > DEFAULT_HEARTBEAT_INTERVAL_SECONDS: with a tool timeout shorter than the heartbeat interval, the tool would be cancelled before the first heartbeat ever ticked. The third asserts the retry constants stay finite and bounded (DEFAULT_MAX_ATTEMPTS >= 1, DEFAULT_INITIAL_BACKOFF_SECONDS >= 0, DEFAULT_BACKOFF_MULTIPLIER >= 1.0).
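
Written as a single check, the three invariants might look like the sketch below. The function name default_invariant_violations is hypothetical, and it takes the values as arguments rather than importing the package constants, so it says nothing about what the repository's actual defaults are.

```python
import math
from typing import List


def default_invariant_violations(
    *,
    heartbeat_interval: float,
    sse_read_timeout: float,
    tool_timeout: float,
    max_attempts: int,
    initial_backoff: float,
    backoff_multiplier: float,
) -> List[str]:
    """Return a message for every invariant the given values break."""
    violations: List[str] = []
    # 1. Heartbeats must arrive before the read clock can expire.
    if not heartbeat_interval < sse_read_timeout:
        violations.append("heartbeat interval must be below the SSE read timeout")
    # 2. The tool must live long enough for at least one heartbeat.
    if not tool_timeout > heartbeat_interval:
        violations.append("tool timeout must exceed the heartbeat interval")
    # 3. Retry constants must be finite and within their lower bounds.
    retry_ok = (
        max_attempts >= 1
        and math.isfinite(initial_backoff)
        and initial_backoff >= 0
        and math.isfinite(backoff_multiplier)
        and backoff_multiplier >= 1.0
    )
    if not retry_ok:
        violations.append("retry constants must be finite and bounded")
    return violations
```

Returning the list of violations instead of asserting inline makes a failing run say which relationship broke, which is useful when one constant change trips more than one invariant.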

Every test has an outer asyncio.wait_for(..., timeout=HANG_OBSERVATION_BUDGET_SECONDS) wrapper set to 15 seconds. The number is picked to be larger than the longest deliberately-slow tool call in the file but small enough that a regression to the indefinite-hang failure mode manifests as a fast red test rather than a wedged CI job.

Test it

Run the full suite from the codebase/ directory:

uv run pytest tests/

Expected output:

============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 77 items

tests/test_client.py .....                                               [  6%]
tests/test_end_to_end.py ..........                                      [ 19%]
tests/test_heartbeat.py ..............                                   [ 37%]
tests/test_heartbeat_tool.py .........                                   [ 49%]
tests/test_resilient_client.py .......................                   [ 79%]
tests/test_server.py ........                                            [ 89%]
tests/test_tracing.py ........                                           [100%]

============================= 77 passed in 23.52s ==============================

Seventy-seven tests in total now: sixty-seven carried over from step 5, plus ten new end-to-end regression tests in tests/test_end_to_end.py. Three pin the cross-cutting default invariants (heartbeat interval below read timeout, tool timeout above heartbeat interval, retry constants finite). Two pin C1 (heartbeats survive a tight read budget; no spurious retry on the happy path). One pins C2 (server-side tool_timeout surfaces as is_error=True within its budget, not the natural delay). Two pin C3 (client deadline beats tool delay on both the resilient and single-attempt paths). One pins C4 (doomed call → fresh call recovery). One pins C5 (original slow_echo reproducer still fails). The whole suite finishes in roughly twenty-three seconds on the reference Python 3.12 + uv setup.

You can also drive the composed behaviour interactively. Start the server in one terminal:

uv run mcp-slow-server --host 127.0.0.1 --port 8765

And in a Python REPL in another:

import asyncio
from mcp_slow_server import (
    RetryPolicy,
    call_heartbeat_tool_resilient,
)

async def main():
    result = await call_heartbeat_tool_resilient(
        "http://127.0.0.1:8765/sse",
        message="composed",
        delay_seconds=2.0,
        heartbeat_interval=0.1,
        tool_timeout=5.0,
        sse_read_timeout=0.5,
        client_call_timeout=3.0,
        retry_policy=RetryPolicy(max_attempts=3),
    )
    print("is_error =", result.is_error)
    print("text     =", result.text)

asyncio.run(main())

The tool sleeps for two seconds against a half-second SSE read budget — a configuration that would have torn the stream down with the unmitigated step-2 client. With the composed fix wired up, the server emits heartbeats every 0.1s that refresh httpx's idle clock, the tool returns its payload after the natural delay, and the client prints is_error = False with the expected message text. Flip delay_seconds to a value above tool_timeout (say, delay_seconds=8.0 with tool_timeout=0.5) and you will get is_error=True within about half a second — server-side cancellation, not a hang. Flip heartbeat_interval above sse_read_timeout and you will get a ClientReadTimeoutError with attempt=3 — client-side abort, also not a hang. The same three knobs that the regression suite turns drive the same three observable shapes from the REPL.
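
The three shapes can be summarised as a small decision function. This is a simplified mental model with a hypothetical name (predict_outcome), not something the codebase ships: it ignores retries, connection time, and exact-equality edge cases, and it assumes the stream stays alive exactly when the heartbeat interval is below the read budget.

```python
def predict_outcome(
    *,
    delay_seconds: float,
    heartbeat_interval: float,
    tool_timeout: float,
    sse_read_timeout: float,
    client_call_timeout: float,
) -> str:
    """Rough prediction of which of the three observable shapes a call takes."""
    # The server answers at the natural delay, or at its budget if that is sooner.
    response_at = min(delay_seconds, tool_timeout)
    # Assumption: heartbeats help only if they land inside the read budget.
    stream_kept_alive = heartbeat_interval < sse_read_timeout
    # With a live stream only the call deadline applies; without heartbeats
    # the idle read clock can fire first.
    if stream_kept_alive:
        client_gives_up_at = client_call_timeout
    else:
        client_gives_up_at = min(sse_read_timeout, client_call_timeout)
    if response_at > client_gives_up_at:
        return "client_read_timeout"
    if delay_seconds > tool_timeout:
        return "server_cancelled"
    return "success"
```

Feeding it the REPL parameters above gives "success"; the two flipped configurations give "server_cancelled" and "client_read_timeout" respectively.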

What we got

The five-part timeout contract is now pinned end-to-end against the real uvicorn fixture. A caller imports one new public surface — call_heartbeat_tool_resilient — and gets the composed behaviour the bug demanded: a heartbeat-aware server-side tool that keeps the SSE stream alive while a legitimately slow call is working, a per-tool wall-clock budget that cancels a runaway tool with a structured is_error=True result, a client-side deadline that bounds the wait when the heartbeat cannot keep up and collapses the transport exception zoo into a single named ClientReadTimeoutError, a bounded exponential-backoff retry that opens a fresh transport every attempt so a doomed try cannot poison the next one, and a wire path that leaves the server process immediately ready for the next caller after either failure mode fires. The non-obvious progress-callback wiring that the MCP SDK requires to actually emit report_progress notifications onto the wire is now baked into the heartbeat helper, so nobody can compose the two halves of the fix and silently lose the heartbeat flow. The original call_slow_echo and slow_echo paths are deliberately untouched so the step 2 reproducer still pins the unmitigated hang in regression, and the new C5 test locks that behaviour against accidental masking. Ten new tests turn each clause of the contract into a wall-clock bound, a structured error shape, or a post-failure liveness check, and three cross-cutting default invariants guard the constants the fix depends on. The whole 77-test suite passes in about twenty-three seconds, which means a future refactor that drops a heartbeat, widens a deadline, swallows the named timeout, or routes the plain client through the heartbeat tool fails fast in CI rather than silently walking the fix back into the indefinite-hang failure mode that motivated this tutorial.

Repository

The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang

The state of the code after this step: 855aa8a

Key commits to step through:

  • d840d10 — step 1: scaffold the minimal FastMCP SSE server with one slow tool
  • 1296ceb — step 2: wire up an MCP client and reproduce the read-timeout hang on tool call
  • 4202474 — step 3: instrument the SSE stream to observe where the connection stalls
  • 17088ab — step 4: add server-side heartbeats and per-tool timeout cancellation
  • 37410bc — step 5: add client-side read-timeout handling with graceful retry and tool-call abort
  • 855aa8a — step 6: verify the fix end-to-end and write a regression test that pins the timeout contract