Step 1: Scaffold a minimal MCP server with SSE transport and one slow tool
What we're doing this step
The whole point of this series is to reproduce — and then fix — a specific failure mode: an MCP client wired to SSE transport that hangs (or disconnects with a confusing error) when a tool takes longer to return than the underlying socket's read budget. The Model Context Protocol is a JSON-RPC dialect that AI agents use to talk to tool servers; the SSE (Server-Sent Events) transport carries those messages over a long-lived HTTP stream so the client can keep receiving updates without polling. That model is elegant — until a tool takes longer to execute than the client's socket read timeout, at which point the connection silently dies and the tool call appears to wedge forever from the agent's perspective.
To investigate that, we first need a server with a tool that takes a
controllable amount of time to respond. So step 1 is intentionally
boring: we build the smallest possible MCP server, register exactly one
tool called slow_echo that sleeps for delay_seconds and then returns
the input message, and wire the SSE transport behind a uvicorn-style
ASGI app. No retries, no heartbeats, no cancellation — just enough
surface area to provoke the hang in step 2. The smaller this scaffold
is, the more confident we can be that any later hang is caused by the
transport rather than by something we added.
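To make the failure mode concrete before any MCP code exists, here is a minimal stdlib-only sketch of an idle read deadline. It is not the SDK's transport: the asyncio.Queue stands in for the SSE stream, and fake_slow_tool, read_next_event, and simulate are hypothetical names. It only illustrates why a read budget shorter than the tool's runtime fails even though the tool itself is healthy.

```python
import asyncio


async def fake_slow_tool(stream: asyncio.Queue, delay_seconds: float) -> None:
    """Stand-in for the server: stay silent for delay_seconds, then emit one event."""
    await asyncio.sleep(delay_seconds)
    await stream.put("tool result")


async def read_next_event(stream: asyncio.Queue, read_timeout: float) -> str:
    """Stand-in for the client: give up if no event arrives within read_timeout."""
    return await asyncio.wait_for(stream.get(), timeout=read_timeout)


async def simulate(delay_seconds: float, read_timeout: float) -> str:
    stream: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(fake_slow_tool(stream, delay_seconds))
    try:
        return await read_next_event(stream, read_timeout)
    except asyncio.TimeoutError:
        # The tool was still working; only the reader ran out of patience.
        return "read timed out"
    finally:
        producer.cancel()
        await asyncio.gather(producer, return_exceptions=True)
```

Calling asyncio.run(simulate(delay_seconds=0.05, read_timeout=0.5)) returns the tool result, while swapping the two numbers returns "read timed out". The real transport fails less politely than this sketch, which is what the rest of the series is about.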
Setup
We use the official Python MCP SDK (mcp[cli]) so we get FastMCP's
declarative tool registration plus a ready-made SSE app. FastMCP is a
thin layer over the lower-level mcp.server module that lets you
register tools as regular Python coroutines and have the SDK derive the
JSON schema, generate the protocol-level tool listing, and wire up the
ASGI endpoints automatically. The project layout is the standard src/
layout with pyproject.toml, a tests/ folder, and uv as the
dependency manager. Two top-level files matter:
- codebase/pyproject.toml: pins mcp[cli]>=1.2.0 and uvicorn>=0.30.0 as runtime dependencies, plus pytest + pytest-asyncio as dev extras. It also declares the package as mcp_slow_server under the src/ layout and exposes a mcp-slow-server console script that points at mcp_slow_server.server:main.
- codebase/src/mcp_slow_server/server.py: the single module that defines the slow_echo coroutine, a build_server() factory, and a CLI entrypoint that boots SSE on 127.0.0.1:8765 by default.
The package is initialised via uv venv plus uv pip install -e .[dev], which gives us an editable install along with the test
dependencies. No global state, no plugins, no extra middleware. That
matters later: when the hang shows up, we want zero doubt that some
custom timeout handler is masking the real failure mode. We also keep
the src/ layout rather than a flat mcp_slow_server/ directory next
to tests/, because the flat layout occasionally lets Python import
the source from the working directory instead of the installed wheel —
which would hide packaging mistakes that real users would hit.
Two configuration knobs in pyproject.toml are worth pointing out:
asyncio_mode = "auto" lets us drop the @pytest.mark.asyncio
decorator from every coroutine test (since every interesting MCP call
is async, the decorator soup would dominate the test file), and
testpaths = ["tests"] keeps pytest from accidentally collecting
example scripts we may add later under examples/.
Implementation
The core of the module is one async function and one factory. Start with the tool itself:
import asyncio

DEFAULT_DELAY_SECONDS = 5.0
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8765


async def slow_echo(
    message: str,
    delay_seconds: float = DEFAULT_DELAY_SECONDS,
) -> str:
    """Return ``message`` after sleeping for ``delay_seconds``."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    await asyncio.sleep(delay_seconds)
    return message
The function is deliberately small. The signature is annotated so
FastMCP can derive the JSON schema for the tool's input arguments
without us hand-writing one — the parameter names message and
delay_seconds are therefore part of the public protocol contract a
client will see when it calls list_tools. We use asyncio.sleep (not
time.sleep) because FastMCP's tool dispatcher runs inside the same
event loop as the transport — blocking the loop would mask the timeout
bug we are chasing by also blocking the SSE keep-alive that the
transport relies on to detect dead clients. asyncio.sleep yields
cleanly, which is the realistic shape of a slow tool in production:
think "waiting on an external HTTP API" rather than "doing CPU work".
We reject negative delays early so the tool never silently returns
instantly on a bad payload; that's the kind of subtle correctness issue
that makes debugging the timeout much harder later. Imagine spending an
afternoon chasing why your reproducer "sometimes" hangs and "sometimes"
returns immediately, only to realise a stray -1 slipped through the
JSON. A ValueError at the door costs us nothing and saves that
debugging hour up front.
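The difference between the two sleeps can be observed directly. The following stdlib-only sketch runs a 10 ms ticker next to a unit of work and counts how many ticks get through; count_ticks_during, blocking_work, and yielding_work are hypothetical names used only for this illustration.

```python
import asyncio
import time


async def count_ticks_during(work) -> int:
    """Run work() next to a 10 ms ticker and report how many ticks got through."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # give the ticker a chance to start
    await work()
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return ticks


async def blocking_work() -> None:
    time.sleep(0.2)  # holds the event loop; nothing else can run


async def yielding_work() -> None:
    await asyncio.sleep(0.2)  # hands control back to the loop while waiting
```

With blocking_work the ticker is starved for the whole 0.2 seconds, while with yielding_work it keeps firing. Anything the transport schedules on the same loop would be starved in the same way as the ticker.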
Next, the factory that wires the tool into a FastMCP instance:
from mcp.server.fastmcp import FastMCP


def build_server(name: str = "slow-server") -> FastMCP:
    """Construct a ``FastMCP`` server with the ``slow_echo`` tool registered."""
    server = FastMCP(name)
    server.add_tool(
        slow_echo,
        name="slow_echo",
        description="Echo the given message after sleeping delay_seconds.",
    )
    return server
Keeping construction in a factory (rather than at module import time)
is what lets the tests instantiate fresh servers with different names
and inspect them without booting the SSE transport. This separation
between building the server and running it shows up again in step 3
when we need to invoke tools in-process to confirm the bug lives in
transport-level reads, not in the tool itself. It also keeps the
module side-effect-free, which matters because pytest collects test
modules by importing them — if FastMCP were constructed at import
time, a misconfigured environment variable could blow up the whole
test session before any test ran.
Finally, the CLI entrypoint:
def main(argv: list[str] | None = None) -> None:
    args = _parse_args(argv)
    server = build_server()
    server.settings.host = args.host
    server.settings.port = args.port
    server.run(transport="sse")
server.run(transport="sse") is the FastMCP one-liner that mounts the
Server-Sent Events endpoint at /sse, exposes a companion POST endpoint
for the client to send JSON-RPC requests through, and starts a uvicorn
worker on the configured host and port. We expose --host and
--port (with env-variable fallbacks MCP_HOST / MCP_PORT) so the
reproducer scripts in later steps can pin the bind address without
editing the source. The default 127.0.0.1:8765 is arbitrary but
stable — every subsequent step hardcodes it.
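One detail to watch: FastMCP keeps host and port on a settings object,
so we set server.settings.host and server.settings.port before calling
run(). Some SDK releases also accept host and port as keyword arguments
to FastMCP(...); if you rely on that, verify it against the version you
have pinned, because a setting that does not take effect leaves the
server on its default bind address without any error. Setting the
values on server.settings right before run() also keeps build_server()
free of bind-address concerns.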
Test it
Run the test suite from the codebase/ directory:
pytest
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 8 items
tests/test_server.py ........ [100%]
============================== 8 passed in 1.22s ===============================
The eight tests cover the four invariants we care about for step 1:
the slow_echo coroutine returns its input, actually sleeps for the
requested duration, rejects negative delays, and is reachable through
FastMCP's tool API under the name slow_echo. Two more assertions
confirm the host/port constants are sensible and that server.sse_app()
returns a callable ASGI application — that last one is the smoke test
that the SSE transport is wired up at all.
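The actual test file is not reproduced here. As a rough sketch of what the first three invariants look like in code, the following uses plain asyncio.run instead of pytest so it runs on its own; check_slow_echo is a hypothetical name, and slow_echo is copied from above with a short default delay.

```python
import asyncio
import time


async def slow_echo(message: str, delay_seconds: float = 5.0) -> str:
    """Copy of the tool above so this sketch is self-contained."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must be non-negative")
    await asyncio.sleep(delay_seconds)
    return message


async def check_slow_echo() -> float:
    """Check echo and validation, then return the measured time of a 0.1 s call."""
    start = time.perf_counter()
    echoed = await slow_echo("hello", delay_seconds=0.1)
    elapsed = time.perf_counter() - start
    assert echoed == "hello"

    try:
        await slow_echo("hello", delay_seconds=-1)
    except ValueError:
        pass  # expected: negative delays are rejected before any sleep
    else:
        raise AssertionError("negative delay should have been rejected")
    return elapsed
```

A timing assertion on the returned value should leave some slack below 0.1 seconds, since timer resolution varies between platforms.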
To prove the transport itself boots, start the server in another terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
Uvicorn announces itself, the SSE endpoint goes live at
http://127.0.0.1:8765/sse, and the process blocks waiting for
connections. Hit Ctrl+C to stop it — there is no client yet, and that
is fine. We will write the client in step 2 and only then call into the
slow tool over the wire.
What we got
We now have a runnable MCP server with one tool whose latency we can
dial up by a single argument, plus a passing test suite that pins the
behaviour. The scaffold has no retry policy, no cancellation handling,
and no heartbeat — which is exactly what we want, because the read
timeout we will trigger in step 2 needs a server that cannot mask the
failure with any clever recovery. The surface area is small enough that
when the hang reproduces in step 2 we will know the bug lives in the
transport layer, not in our tool code. From here, step 2 will write a
small SSE client that connects to this server, calls slow_echo with a
delay longer than its socket read timeout, and lets us watch the hang
happen on the wire.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: d840d10
Key commits to step through:
- d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
What we're doing this step
Step 1 left us with a server we can dial up to be arbitrarily slow but
without anything on the other side of the wire. Step 2 closes that
loop: we add a tiny MCP client that connects to the SSE endpoint,
runs the JSON-RPC initialize handshake, and invokes the slow_echo
tool with a caller-supplied delay_seconds plus a caller-supplied
sse_read_timeout. With those two knobs we can dial the tool's runtime
above the client's read budget and observe the exact failure mode this
article exists to explain — the SSE transport tears down on the read
side while a tool call is in flight, leaving the client either raising a
transport-level exception or appearing to wedge on call_tool forever.
That ambiguity ("sometimes a clean timeout, sometimes a hang") is itself
part of the bug, and pinning it in a test now is what gives later steps
a stable target to fix without us second-guessing whether the fix worked.
Setup
Four new files land in codebase/:
- src/mcp_slow_server/client.py: the call_slow_echo and list_remote_tools coroutines that wrap mcp.client.sse.sse_client and mcp.ClientSession.
- src/mcp_slow_server/__main__.py: a thin shim so python -m mcp_slow_server boots the server without the RuntimeWarning you get when __init__ already pulls in submodules.
- tests/conftest.py: a per-test fixture that boots the FastMCP SSE app inside a daemon-thread uvicorn.Server, picks a free port, and yields a http://127.0.0.1:<port>/sse URL to tests.
- tests/test_client.py: five new tests covering the happy path, the latency observation, the deliberate hang, and a follow-up "fresh client still works after a doomed one" check.
The mcp SDK is already pinned from step 1, so no pyproject.toml
churn is needed. The only new runtime concept is the SSE read budget:
sse_client accepts an sse_read_timeout parameter that httpx uses to
decide how long to wait for the next event on the SSE stream before
aborting the read. That is the exact knob the bug rides on, so we
expose it as a parameter on call_slow_echo rather than hardcoding it.
We also keep tests/conftest.py deliberately heavyweight: it spins a
real uvicorn worker on a background thread instead of exercising the
tool in-process through server.call_tool. The in-process path is
faster and simpler, but it skips the SSE transport entirely — and the
SSE transport is where the bug lives. A test that "passes" by skipping
the broken code path is worse than no test.
Implementation
The client wrapper is a single async function whose only job is to open the SSE connection, run the handshake, fire one tool call, and close the connection. Splitting it any finer would make the bug harder to see, not easier — readers should be able to point at one place and say "that's the await that hangs".
from mcp import ClientSession
from mcp.client.sse import sse_client


async def call_slow_echo(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
) -> SlowEchoResult:
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            call_result = await session.call_tool(
                "slow_echo",
                {"message": message, "delay_seconds": delay_seconds},
            )
    return SlowEchoResult(
        text=_extract_text(call_result.content),
        is_error=bool(call_result.isError),
    )
A few design choices are worth calling out. connect_timeout and
sse_read_timeout are split into two separate parameters because they
map onto two distinct httpx timeout phases: connect-and-write versus
read-the-next-SSE-event. Folding them into one number would force the
caller to widen both budgets when they only care about one, which is
exactly the kind of papercut that hides the bug behind ambient slack.
The return type is a frozen dataclass rather than a tuple so the test
assertions read as result.is_error is False rather than result[1] is False — small thing, but it pays off the third time you stare at a
failed test output.
The _extract_text helper exists because call_tool returns a list of
TextContent (and possibly other) items. Concatenating only the
text attributes keeps the assertion side "hello" in result.text
instead of digging into .content[0].text every call site.
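Neither SlowEchoResult nor _extract_text is shown in the article. A minimal sketch consistent with the description follows; the field names come from the call site above, while the helper body (including joining with an empty string) is an assumption, and SimpleNamespace stands in for the SDK's content items.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Iterable


@dataclass(frozen=True)
class SlowEchoResult:
    text: str
    is_error: bool


def _extract_text(content: Iterable[Any]) -> str:
    """Join the text of every item that has a string .text; skip the rest."""
    parts = []
    for item in content:
        text = getattr(item, "text", None)
        if isinstance(text, str):
            parts.append(text)
    return "".join(parts)


# SimpleNamespace objects imitate a mixed list of text and non-text items.
fake_content = [
    SimpleNamespace(type="text", text="hello"),
    SimpleNamespace(type="image", data="..."),
    SimpleNamespace(type="text", text=" world"),
]
result = SlowEchoResult(text=_extract_text(fake_content), is_error=False)
```

Because the dataclass is frozen, assigning to result.text raises an error, so a result cannot be modified between the call and the assertion.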
The interesting test is the deliberate hang reproducer:
import asyncio
import time

import pytest


async def test_short_sse_read_timeout_kills_a_slow_call(
    sse_server_url: str,
) -> None:
    tool_delay = 3.0
    short_read_budget = 0.4
    observation_budget = short_read_budget + 2.5
    start = time.perf_counter()
    with pytest.raises(BaseException) as exc_info:
        await asyncio.wait_for(
            call_slow_echo(
                sse_server_url,
                message="will-not-arrive",
                delay_seconds=tool_delay,
                sse_read_timeout=short_read_budget,
            ),
            timeout=observation_budget,
        )
    elapsed = time.perf_counter() - start
    assert elapsed < tool_delay + 1.0
    assert exc_info.value is not None
Three numbers do the work. tool_delay = 3.0 is the time the server
sleeps inside slow_echo. short_read_budget = 0.4 is the SSE read
deadline the client gives httpx. Because the read budget is shorter
than the tool delay by a comfortable margin, httpx will tear down the
read stream while the tool is still sleeping. observation_budget is
the outermost asyncio.wait_for cap; it bounds how long pytest is
allowed to watch the hang. If the bug ever mutates from "noisy
exception" to "indefinite block", we still get a deterministic test
failure rather than a stuck CI worker.
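The budgets are chosen so the arithmetic works out even in the worst case: observation_budget is 0.4 + 2.5 = 2.9 seconds, which is below the tool_delay + 1.0 = 4.0 second bound in the elapsed assertion, so the test still passes when only the outer cap fires. The sketch below shows that outer cap in isolation, with a stand-in coroutine that never returns; wedged_call and observe are hypothetical names.

```python
import asyncio
import time


async def wedged_call() -> str:
    """Stand-in for a call_tool that never returns."""
    await asyncio.Event().wait()  # nobody ever sets this event
    return "unreachable"


async def observe(observation_budget: float) -> tuple:
    """Return (outcome, elapsed_seconds) for one capped attempt."""
    start = time.perf_counter()
    try:
        await asyncio.wait_for(wedged_call(), timeout=observation_budget)
        outcome = "returned"
    except asyncio.TimeoutError:
        # wait_for cancelled the wedged coroutine and surfaced a timeout.
        outcome = "timed out"
    return outcome, time.perf_counter() - start
```

Running asyncio.run(observe(0.1)) reports "timed out" after roughly the budget, which is the property that keeps a CI worker from sitting on an indefinite block.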
The BaseException catch is intentional. Depending on which task wins
the race inside anyio's task group, the propagated exception can be
httpx.ReadTimeout, anyio.EndOfStream, anyio.ClosedResourceError,
or an asyncio.TimeoutError if the outer wait_for fires first. The
point of this test is not to declare a winner. The point is to pin
the failure: under the configured budgets, the call must fail in
strictly less than tool_delay + 1.0 seconds with some exception,
not return the wrong value silently. Future steps tighten this — once
we know how we want timeouts to surface, we can narrow the assertion
to the specific exception class we picked. For now, "pin that it
breaks" is the goal.
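When the time comes to narrow the assertion, the exception may arrive wrapped in an exception group rather than bare. One possible helper for looking inside is sketched below; leaf_exceptions and contains are hypothetical names, not part of the repository, and the helper relies only on the exceptions attribute that exception groups expose. On Python 3.11 and later, except* is an alternative.

```python
from typing import Iterator


def leaf_exceptions(exc: BaseException) -> Iterator[BaseException]:
    """Yield the non-group exceptions inside exc, depth first."""
    children = getattr(exc, "exceptions", None)  # present on exception groups
    if not children:
        yield exc
        return
    for child in children:
        yield from leaf_exceptions(child)


def contains(exc: BaseException, wanted: type) -> bool:
    """True if exc, or any exception nested inside it, is an instance of wanted."""
    return any(isinstance(leaf, wanted) for leaf in leaf_exceptions(exc))
```

A narrowed test could then assert contains(exc_info.value, SomeTimeoutClass) once the series has decided which class timeouts should surface as.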
The companion test test_short_sse_read_timeout_does_not_corrupt_subsequent_call
covers the worry that one wedged SSE stream might poison the server
process for everyone else. We deliberately blow up the first call,
then open a brand new sse_client session with a generous budget and
confirm the second call still works. If this test ever flips, the bug
is bigger than "one client's read budget" — it would mean a wedged
client leaks state into the server, which would change which layer the
fix has to live in.
The uvicorn fixture is the other non-trivial piece. We pick a free
port up front so parallel pytest-xdist runs do not collide, build
the FastMCP app, hand its ASGI callable to a uvicorn.Server, and
spin a daemon thread that calls server.run(). We poll
server.started for up to five seconds so tests do not race the
boot-up, and on teardown we set should_exit = True and join the
thread. The fixture is per-test rather than session-scoped because
the doomed-call test deliberately leaves a half-dead SSE stream
behind, and we want every test to start from a clean server.
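The fixture itself depends on uvicorn and pytest, so the following is only a stdlib stand-in for the same lifecycle: pick a free port, start a server on a daemon thread, hand the address to the caller, and shut down on exit. ThreadingHTTPServer replaces uvicorn.Server here, and pick_free_port, background_server, and demo are hypothetical names.

```python
import contextlib
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Iterator, Tuple


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port (small race: it is released before reuse)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


class _OkHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep test output quiet


@contextlib.contextmanager
def background_server(host: str = "127.0.0.1") -> Iterator[Tuple[str, int]]:
    port = pick_free_port(host)
    # The constructor binds and listens, so no "started" polling is needed
    # here; the uvicorn version has to wait for server.started instead.
    server = ThreadingHTTPServer((host, port), _OkHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield host, port
    finally:
        server.shutdown()  # plays the role of should_exit = True
        server.server_close()
        thread.join(timeout=5.0)


def demo() -> int:
    """Boot the stand-in server, make one request, and return the HTTP status."""
    with background_server() as (host, port):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        try:
            conn.request("GET", "/")
            return conn.getresponse().status
        finally:
            conn.close()
```

Wrapping the context manager in a pytest fixture with the default function scope would give each test its own server, which is the per-test isolation described above.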
Test it
Run the suite from the codebase/ directory:
pytest
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 13 items
tests/test_client.py ..... [ 38%]
tests/test_server.py ........ [100%]
============================== 13 passed in 3.52s ==============================
Thirteen tests now: eight from step 1 plus five new client tests. The
two failure-shape tests (test_short_sse_read_timeout_kills_a_slow_call
and test_short_sse_read_timeout_does_not_corrupt_subsequent_call) are
the ones that pin the bug. Both pass because the bug fires —
pytest.raises(BaseException) succeeds when the SSE read deadline
expires and tears the stream down. If a future change ever made
call_slow_echo silently return after tool_delay seconds despite the
short read budget, those two tests would fail and tell us the failure
mode shifted.
You can also drive it interactively. Start the server in one terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
And in a Python REPL in another:
import asyncio
from mcp_slow_server.client import call_slow_echo

asyncio.run(call_slow_echo(
    "http://127.0.0.1:8765/sse",
    message="hello",
    delay_seconds=3.0,
    sse_read_timeout=0.5,
))
That call will not return cleanly. Either you get an ExceptionGroup
out of anyio with httpx.ReadTimeout inside it, or the call appears to
sit forever waiting on session.call_tool. Both are the bug.
What we got
We now have a client that can talk to the step 1 server, plus a pytest
suite that pins both the happy path and the read-timeout failure
shape. The reproducer is deterministic enough to land in CI — the
outer asyncio.wait_for caps wall-clock budget at well under ten
seconds, so the suite never wedges even if the bug morphs from "raises
quickly" into "hangs forever". The five new tests give us the
regression net we will need in step 3 when we introduce the first
real fix attempt: any change that makes the slow call silently return
the wrong value, or that leaks a wedged stream into the next session,
will flip a test red. That is the foundation the rest of the article
builds on.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: 1296ceb
Key commits to step through:
- d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
- 1296ceb (step 2): wire up an MCP client and reproduce the read-timeout hang on tool call
What we're doing this step
Step 2 left us with a reproducer that fails — sometimes loudly, sometimes
by appearing to wedge — but it does not yet tell us where the failure
happens. The call_slow_echo coroutine awaits four distinct things in
sequence: opening the SSE transport, opening a ClientSession, running
the JSON-RPC initialize handshake, and finally call_tool. Any one of
those awaits could be the one that never returns, and without
instrumentation we are guessing. Step 3 is the diagnostic step: we
introduce a tiny in-process tracer that drops a monotonic-clock checkpoint
before and after each of those phases, keeps the checkpoints in a
caller-owned object that survives an exception, and exposes a gap_after
helper so a test can assert "the stall sat on tool_call_start for ~N
seconds before the read timeout fired". That single observation — the
last recorded stage on a doomed call is always tool_call_start — is
what tells us, with no more hand-waving, that the bug lives on the
read side of the tool-call response, not in the handshake, not in the
transport open, and not in our tool dispatch. Every later step in this
series is going to assume that fact, so we pin it in a test now.
Setup
Two new files land in codebase/, plus a one-line re-export update:
- src/mcp_slow_server/tracing.py: an SseTrace dataclass, a TraceEvent record type, the STAGE_* string constants, and a traced_call_slow_echo coroutine that mirrors call_slow_echo but records a checkpoint at every phase boundary.
- tests/test_tracing.py: eight new tests covering the unit-level trace bookkeeping (empty trace, append order, gap arithmetic, monotonic timestamps) plus the four wire-level cases we actually care about: happy-path stages in order, the tool-gap matches the requested delay, the doomed call stalls exactly at tool_call_start, and trace.last_stage localises the failure for the assertion.
- src/mcp_slow_server/__init__.py: re-exports the new public names so callers can import everything from the package root instead of reaching into the submodule.
No new runtime dependencies. The tracer is pure stdlib: time.monotonic
for clock readings, a frozen dataclass for events, a mutable
dataclass for the trace itself. We deliberately do not pull in
structlog or opentelemetry here. The goal of the tracer is to make
this one bug observable from a test assertion, not to bolt a
production observability stack onto a 200-line reproducer. If we ever
graduate this code into something real, swapping the recorder for an
OTel span emitter is a one-function change — the stage names already
read like span names by design.
Implementation
The core type is a stage-stamped event log keyed off a monotonic clock that starts when the trace is constructed:
import time
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class SseTrace:
    events: list[TraceEvent] = field(default_factory=list)
    _started_at: float = field(default_factory=time.monotonic)

    def record(self, stage: str, detail: str = "") -> None:
        elapsed = time.monotonic() - self._started_at
        self.events.append(
            TraceEvent(elapsed_seconds=elapsed, stage=stage, detail=detail),
        )

    @property
    def stages(self) -> list[str]:
        return [event.stage for event in self.events]

    @property
    def last_stage(self) -> str | None:
        if not self.events:
            return None
        return self.events[-1].stage

    def gap_after(self, stage: str) -> float | None:
        # Seconds between the first event with this stage and the next event.
        for index, event in enumerate(self.events):
            if event.stage != stage:
                continue
            if index + 1 >= len(self.events):
                return None
            return self.events[index + 1].elapsed_seconds - event.elapsed_seconds
        return None

    def __iter__(self) -> Iterator[TraceEvent]:
        return iter(self.events)
stages and __iter__ are convenience views over events. The tests
read STAGE_TOOL_CALL_START in trace.stages instead of walking the
event list themselves, and the REPL example below uses for event in trace to print the timeline — both stay readable without coupling the
caller to the underlying list shape.
Two design choices are load-bearing. First, the trace is
caller-owned — the test constructs the SseTrace, hands it into
traced_call_slow_echo, and inspects it after the call returns or
raises. If the tracer owned the trace internally and returned it on
success, the doomed-call test would have nothing to inspect because the
function never returns. Caller-owned state survives the exception,
which is exactly the property we need for a hang-shaped bug. Second,
gap_after returns None for both "stage never recorded" and "stage
was the last event". Folding those into a single None keeps the
assertion side readable (if trace.gap_after("tool_call_start") is None: ...) and makes "we stalled right after the dispatch" — which is
what a hang looks like — a single, named condition rather than a
multi-branch check.
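The two None cases are easy to see with hand-built traces. The sketch below uses a trimmed copy of SseTrace so it runs on its own; the TraceEvent definition is an assumption inferred from the TraceEvent(...) call in record and from the description of it as a frozen dataclass, and the timestamps are made-up values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TraceEvent:
    # Field names follow the call in SseTrace.record; the rest is assumed.
    elapsed_seconds: float
    stage: str
    detail: str = ""


@dataclass
class SseTrace:
    """Trimmed copy of the class above: just enough to exercise gap_after."""

    events: List[TraceEvent] = field(default_factory=list)

    def gap_after(self, stage: str) -> Optional[float]:
        for index, event in enumerate(self.events):
            if event.stage != stage:
                continue
            if index + 1 >= len(self.events):
                return None
            return self.events[index + 1].elapsed_seconds - event.elapsed_seconds
        return None


# A healthy call: the response lands 3.0 s after the dispatch.
healthy = SseTrace(events=[
    TraceEvent(0.10, "tool_call_start"),
    TraceEvent(3.10, "tool_call_done"),
])
# A doomed call: the dispatch is the last thing ever recorded.
doomed = SseTrace(events=[
    TraceEvent(0.05, "initialize_done"),
    TraceEvent(0.10, "tool_call_start"),
])
```

Here healthy.gap_after("tool_call_start") is about 3.0, while doomed.gap_after("tool_call_start") and doomed.gap_after("never_recorded") are both None.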
STAGE_* constants are module-level strings rather than an Enum
because the trace gets dumped to logs and compared in test assertions,
and bare strings round-trip through both without coercion noise:
STAGE_CONNECT_OPEN = "connect_open"
STAGE_TRANSPORT_READY = "transport_ready"
STAGE_SESSION_OPEN = "session_open"
STAGE_INITIALIZE_START = "initialize_start"
STAGE_INITIALIZE_DONE = "initialize_done"
STAGE_TOOL_CALL_START = "tool_call_start"
STAGE_TOOL_CALL_DONE = "tool_call_done"
STAGE_SESSION_CLOSE = "session_close"
STAGE_CONNECT_CLOSE = "connect_close"
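The "coercion noise" is easiest to see with JSON. In this small sketch the Stage enum is a hypothetical alternative shown only for comparison; it is not part of the codebase.

```python
import json
from enum import Enum

STAGE_TOOL_CALL_START = "tool_call_start"


class Stage(Enum):
    """Hypothetical Enum alternative, shown only for comparison."""

    TOOL_CALL_START = "tool_call_start"


# A bare string serialises as-is.
as_string = json.dumps({"stage": STAGE_TOOL_CALL_START})

try:
    json.dumps({"stage": Stage.TOOL_CALL_START})
    enum_serialized = True
except TypeError:
    # A plain Enum member needs .value or a custom encoder first.
    enum_serialized = False

# A plain Enum member also does not compare equal to its string value.
equal_to_string = Stage.TOOL_CALL_START == "tool_call_start"
```

A str-mixin enum would sidestep both issues, at the cost of one more type for readers to learn; for a reproducer this small, plain strings are the lighter choice.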
The instrumented coroutine is a near-line-for-line mirror of step 2's
call_slow_echo, with trace.record(...) calls sandwiching every
await that could be the stall point:
async def traced_call_slow_echo(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    trace: SseTrace,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
) -> SlowEchoResult:
    trace.record(STAGE_CONNECT_OPEN, f"read_timeout={sse_read_timeout}")
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        trace.record(STAGE_TRANSPORT_READY)
        async with ClientSession(read_stream, write_stream) as session:
            trace.record(STAGE_SESSION_OPEN)
            trace.record(STAGE_INITIALIZE_START)
            await session.initialize()
            trace.record(STAGE_INITIALIZE_DONE)
            trace.record(
                STAGE_TOOL_CALL_START,
                f"tool=slow_echo delay={delay_seconds}",
            )
            call_result = await session.call_tool(
                "slow_echo",
                {"message": message, "delay_seconds": delay_seconds},
            )
            trace.record(STAGE_TOOL_CALL_DONE, f"is_error={call_result.isError}")
        trace.record(STAGE_SESSION_CLOSE)
    trace.record(STAGE_CONNECT_CLOSE)
    return SlowEchoResult(
        text=_extract_text(call_result.content),
        is_error=bool(call_result.isError),
    )
There is deliberately no try/except here. The codebase rule forbids
nested try/except blocks anyway, but more importantly: if we caught
the exception inside the tracer we would have to either re-raise it
(no value added) or swallow it (which would hide the bug). The
async with blocks already guarantee sse_client and ClientSession
get torn down on exception, and the caller's outer asyncio.wait_for
still bounds wall-clock time. So when the read timeout fires inside
session.call_tool, the exception propagates straight out, the
async with exits skip every subsequent trace.record, and we are
left with a trace whose last_stage is tool_call_start — which is
exactly the assertable evidence we wanted.
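The same mechanics can be reproduced without the SDK. In this stdlib-only sketch, fake_session and fake_call_tool are hypothetical stand-ins for the real context managers and call_tool, and a plain list plays the role of the caller-owned trace.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List


@asynccontextmanager
async def fake_session():
    """Stand-in for the sse_client / ClientSession context managers."""
    yield


async def fake_call_tool(read_timeout: float) -> str:
    """Stand-in for session.call_tool: the response never arrives in time."""
    await asyncio.sleep(read_timeout)
    raise TimeoutError("no SSE event within the read budget")


async def traced_call(stages: List[str], read_timeout: float) -> str:
    stages.append("connect_open")
    async with fake_session():
        stages.append("tool_call_start")
        result = await fake_call_tool(read_timeout)
        stages.append("tool_call_done")  # skipped when the await raises
    stages.append("connect_close")  # skipped as well
    return result


async def run_doomed_call() -> List[str]:
    stages: List[str] = []  # owned by the caller, so it outlives the failure
    try:
        await traced_call(stages, read_timeout=0.05)
    except TimeoutError:
        pass
    return stages
```

After the failure the caller still holds ["connect_open", "tool_call_start"], which mirrors the last_stage evidence the real tracer leaves behind.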
The test that closes the loop is the one that earns step 3 its keep:
async def test_trace_localizes_stall_to_tool_call_when_read_budget_too_short(
    sse_server_url: str,
) -> None:
    trace = SseTrace()
    tool_delay = 3.0
    short_read_budget = 0.4
    observation_budget = short_read_budget + 2.5
    with pytest.raises(BaseException):
        await asyncio.wait_for(
            traced_call_slow_echo(
                sse_server_url,
                message="never-arrives",
                delay_seconds=tool_delay,
                sse_read_timeout=short_read_budget,
                trace=trace,
            ),
            timeout=observation_budget,
        )
    assert STAGE_INITIALIZE_DONE in trace.stages
    assert STAGE_TOOL_CALL_START in trace.stages
    assert STAGE_TOOL_CALL_DONE not in trace.stages
    assert STAGE_SESSION_CLOSE not in trace.stages
The four assertions are arranged as a triangulation, not a redundancy.
INITIALIZE_DONE in stages proves the handshake completed, so we know
the SSE transport itself was healthy up to the tool dispatch.
TOOL_CALL_START in stages proves the dispatch happened — the client
did manage to write the JSON-RPC request. TOOL_CALL_DONE not in stages
proves the response never landed within the read budget.
SESSION_CLOSE not in stages proves the failure happened during the
tool call, not in some unrelated teardown step that ran after a
successful return. Together they pinpoint the bug to one specific
await on one specific line, in a form a future regression can break
loudly against. The companion test test_trace_last_stage_is_tool_call_start_on_doomed_call
collapses that triangulation to a single assertion (trace.last_stage == STAGE_TOOL_CALL_START) which is what the prose in later steps will
quote when describing the bug.
Test it
Run the suite from the codebase/ directory:
pytest
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 21 items
tests/test_client.py ..... [ 23%]
tests/test_server.py ........ [ 61%]
tests/test_tracing.py ........ [100%]
============================== 21 passed in 6.27s ==============================
Twenty-one tests now: eight from step 1, five from step 2, and eight
new tracing tests. The two that matter most for diagnosis are
test_trace_localizes_stall_to_tool_call_when_read_budget_too_short
and test_trace_last_stage_is_tool_call_start_on_doomed_call. Both
pass for the same reason step 2's reproducer passed — because the bug
fires — but they additionally encode where the bug fires. If a
future change shifted the failure to, say, the handshake, those
assertions would flip red and tell us the bug mode had moved.
You can also inspect a trace interactively. Run the server in one terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
And from a REPL in another:
import asyncio
from mcp_slow_server import SseTrace, traced_call_slow_echo

trace = SseTrace()
try:
    asyncio.run(traced_call_slow_echo(
        "http://127.0.0.1:8765/sse",
        message="hi",
        delay_seconds=3.0,
        sse_read_timeout=0.5,
        trace=trace,
    ))
except BaseException as exc:
    print(f"raised: {type(exc).__name__}")

for event in trace:
    print(f"{event.elapsed_seconds:6.3f}s {event.stage} {event.detail}")
print(f"last_stage = {trace.last_stage}")
The printed timeline lands every phase up through tool_call_start,
then nothing — the trace's last event sits on the dispatch, the wall
clock between that line and the raised exception is roughly the read
budget, and last_stage reads back as tool_call_start. That is the
observation step 4 onward will work to eliminate.
What we got
We added an in-process tracer that turns the read-timeout hang from a
folklore event ("sometimes it raises, sometimes it wedges") into a
named, assertable failure shape (trace.last_stage == "tool_call_start").
The tracer has zero runtime dependencies, costs a handful of time.monotonic
reads per call, and is wired into a caller-owned object so it survives
the very exceptions we are trying to diagnose. Eight new tests pin the
behaviour: four unit tests on the trace bookkeeping itself, two
happy-path wire tests that confirm the full nine-stage timeline is
recorded in order with realistic gaps, and two doomed-call tests that
lock the bug's location into the test suite. With the bug now pinned
to a single await on a single line, step 4 can stop arguing about
whether the failure is transport-level versus tool-level and start
working on the actual fix: server-side heartbeats plus per-tool
timeout cancellation that keep the SSE stream alive while a slow tool
is still running.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: 4202474
Key commits to step through:
- d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
- 1296ceb (step 2): wire up an MCP client and reproduce the read-timeout hang on tool call
- 4202474 (step 3): instrument the SSE stream to observe where the connection stalls
What we're doing this step
Step 3 pinned the bug to a single observable fact: on a doomed call,
the trace's last_stage is always tool_call_start. The SSE transport
itself was healthy through the handshake, the request was written, and
then the client's idle read clock fired before anything came back the
other way. That diagnosis points at exactly two interventions, and step
4 lands both of them at once because they only make sense together.
First, the server has to put something on the SSE stream while a slow
tool is still running. Even a tiny progress notification is enough
to reset the client's idle read clock, because httpx measures
"connection looks dead" as "no data has arrived in N seconds", not
"the tool has not returned in N seconds". Second, the server needs a
hard wall-clock budget on the tool itself, so a genuinely runaway tool
gets cancelled with a structured error instead of pumping heartbeats
forever. The first control keeps the transport honest; the second
keeps the tool honest. We deliberately implement them as two
independent helpers wired through a single run_with_heartbeat
coroutine, register a new slow_echo_with_heartbeat tool that uses
both, leave the original slow_echo untouched so the step 2 and step 3
reproducers still pin the bug, and add twenty-three new tests that lock in
the new behaviour — including a wire-level test that proves a healthy
follow-up call still works after a previous call blew its own timeout.
Setup
One new module and two new test files land in codebase/, plus a tool
registration in the existing server.py:
- src/mcp_slow_server/heartbeat.py: the ToolTimeoutError exception, the HeartbeatEmitter callable alias, the internal _heartbeat_loop task, and the public run_with_heartbeat coroutine that wraps a unit of work with both controls.
- src/mcp_slow_server/server.py: adds the slow_echo_with_heartbeat tool function, a _progress_emitter_for(ctx, total) helper that builds a report_progress-backed emitter, and registers the new tool in build_server alongside the legacy slow_echo.
- tests/test_heartbeat.py and tests/test_heartbeat_tool.py: split the new test surface in two. The first file unit-tests the run_with_heartbeat primitive in isolation (cadence, cancellation, exception passthrough, no-op emitter, validation), and the second exercises the registered tool end-to-end, including over the real SSE transport on the uvicorn fixture.
No new runtime dependencies. Heartbeats are an asyncio.Event plus
asyncio.wait_for; the per-tool timeout is another asyncio.wait_for.
We deliberately do not add a job-scheduler library or an OTel exporter.
The point of step 4 is to close the hang with the smallest possible
amount of new surface area — anything richer can be layered on once
the read-timeout failure mode is dead.
Implementation
The heart of the change is a single helper that runs a unit of work
while a background task fires an emit callback on a fixed cadence,
and stops cleanly whether the work returns, raises, or blows its
timeout:
async def run_with_heartbeat(
    work: Awaitable[T],
    *,
    emit: HeartbeatEmitter,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    tool_name: str = "tool",
) -> T:
    _validate_intervals(heartbeat_interval, tool_timeout)
    stop_event = asyncio.Event()
    heartbeat_task = asyncio.create_task(
        _heartbeat_loop(emit, heartbeat_interval, stop_event),
    )
    try:
        return await asyncio.wait_for(work, timeout=tool_timeout)
    except asyncio.TimeoutError as exc:
        raise ToolTimeoutError(tool_name, tool_timeout) from exc
    finally:
        stop_event.set()
        await _drain_task(heartbeat_task)
Three design choices are doing the heavy lifting here. First, the two
budgets are passed as separate parameters, not as a single number. A
30-second tool running on a 1-second heartbeat is a legitimate
configuration — the tool is slow on purpose, but the transport must
still see traffic every second. Folding them into one knob would force
the caller to widen the wrong budget every time they wanted to tune
the other, which is exactly the kind of papercut that makes operators
disable heartbeats entirely. Second, asyncio.TimeoutError is
re-raised as a named ToolTimeoutError with the original chained on
as __cause__. The named subclass carries tool_name and
timeout_seconds, which makes the failure greppable in logs and
distinguishes it from any other asyncio.TimeoutError that happens
to bubble up — for example, one fired by the client's outer
wait_for. Third, the finally block always sets stop_event and
always awaits the heartbeat task through _drain_task. We never leak
a background task even if the work raises, and a failing emit
callback is intentionally swallowed inside _drain_task so a
misbehaving heartbeat cannot mask the real outcome of the work.
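The article shows neither ToolTimeoutError nor _drain_task, so the following is only a sketch of what they might look like under stated assumptions: the Exception base class, the message format, and the use of asyncio.gather(..., return_exceptions=True) to absorb a failing background task are all guesses made for illustration. The run_bounded helper is a hypothetical, trimmed copy of the try/except/finally shape above without the heartbeat loop.

```python
import asyncio


class ToolTimeoutError(Exception):
    """Assumed shape: a named error carrying the tool name and its budget."""

    def __init__(self, tool_name: str, timeout_seconds: float) -> None:
        super().__init__(f"Tool {tool_name!r} exceeded {timeout_seconds:.3f}s")
        self.tool_name = tool_name
        self.timeout_seconds = timeout_seconds


async def _drain_task(task: asyncio.Task) -> None:
    """Wait for the task to finish; a failure inside it is collected, not raised."""
    await asyncio.gather(task, return_exceptions=True)


async def run_bounded(work, tool_name: str, tool_timeout: float, side_task: asyncio.Task):
    """Same try/except/finally shape as run_with_heartbeat, minus the loop."""
    try:
        return await asyncio.wait_for(work, timeout=tool_timeout)
    except asyncio.TimeoutError as exc:
        raise ToolTimeoutError(tool_name, tool_timeout) from exc
    finally:
        await _drain_task(side_task)


async def demo() -> ToolTimeoutError:
    async def failing_emit_loop() -> None:
        raise RuntimeError("emit callback failed")

    side_task = asyncio.create_task(failing_emit_loop())
    try:
        await run_bounded(asyncio.sleep(1.0), "slow_echo_with_heartbeat", 0.05, side_task)
    except ToolTimeoutError as exc:
        return exc
    raise AssertionError("the work should have timed out")


error = asyncio.run(demo())
print(error, "| cause:", type(error.__cause__).__name__)
```

The caller sees the named timeout with the original exception chained on, even though the background task failed on its own. That is the ordering the third design choice is after: the outcome of the work wins, and the heartbeat's failure stays out of the way.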
The heartbeat loop itself is one of the few places in the codebase
that needs asyncio.wait_for inside a try/except — and only one
level deep, because the codebase rule bans nested try/except:
async def _heartbeat_loop(
    emit: HeartbeatEmitter,
    interval_seconds: float,
    stop_event: asyncio.Event,
) -> int:
    count = 0
    while not stop_event.is_set():
        try:
            await asyncio.wait_for(
                stop_event.wait(),
                timeout=interval_seconds,
            )
            return count
        except asyncio.TimeoutError:
            count += 1
            await emit(count)
    return count
The pattern is "wait for the stop signal, with a deadline". If the
stop signal arrives first, wait_for returns and we exit. If the
deadline arrives first, wait_for raises TimeoutError, we treat
that as "another interval elapsed", bump the counter, and emit. The
counter is what the emit callback sees, which lets a
report_progress-backed emitter pass a monotonically increasing
progress value without having to keep its own state. The shape also
means the loop sleeps as little as possible on shutdown — when
stop_event is set during the wait, wait_for unblocks immediately
instead of running out the rest of the interval.
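To watch the cadence and the prompt shutdown in isolation, the loop can be driven with a recording emitter. This is a small standalone sketch: heartbeat_loop repeats the function above so the snippet runs on its own, while record, demo, and the 0.1-second interval are choices made for this illustration only.

```python
import asyncio


async def heartbeat_loop(emit, interval_seconds: float, stop_event: asyncio.Event) -> int:
    """Same shape as _heartbeat_loop, repeated so this snippet is standalone."""
    count = 0
    while not stop_event.is_set():
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=interval_seconds)
            return count
        except asyncio.TimeoutError:
            count += 1
            await emit(count)
    return count


async def demo() -> tuple[list[int], int, float]:
    beats: list[int] = []

    async def record(count: int) -> None:
        beats.append(count)

    stop_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    task = asyncio.create_task(heartbeat_loop(record, 0.1, stop_event))
    await asyncio.sleep(0.35)  # long enough for a few intervals to elapse
    stopped_at = loop.time()
    stop_event.set()  # lands in the middle of an interval
    total = await task
    shutdown_latency = loop.time() - stopped_at
    return beats, total, shutdown_latency


beats, total, shutdown_latency = asyncio.run(demo())
print("beats:", beats, "| total:", total, f"| shutdown took {shutdown_latency:.4f}s")
```

The recorded counts should form an unbroken 1, 2, 3, ... sequence, and the shutdown latency should be far smaller than the interval, because setting the event wakes the pending wait_for instead of letting it run down.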
The MCP tool that consumes this helper is intentionally small:
async def slow_echo_with_heartbeat(
    message: str,
    delay_seconds: float = DEFAULT_DELAY_SECONDS,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    ctx: Context | None = None,
) -> str:
    emit = _progress_emitter_for(ctx, total=delay_seconds)
    work = slow_echo(message, delay_seconds=delay_seconds)
    return await run_with_heartbeat(
        work,
        emit=emit,
        heartbeat_interval=heartbeat_interval,
        tool_timeout=tool_timeout,
        tool_name=SLOW_ECHO_HEARTBEAT_TOOL_NAME,
    )
The ctx: Context | None annotation is the FastMCP idiom for
auto-injected context. FastMCP looks at the type hint, recognises
Context, and injects the live request context at call time without
exposing the parameter on the wire schema. The
test_build_server_heartbeat_tool_schema_hides_context test enforces
that — properties lists message, delay_seconds,
heartbeat_interval, and tool_timeout but never ctx. The
_progress_emitter_for helper closes over ctx and total, so the
emitter passed into run_with_heartbeat is just a one-argument async
callable that doesn't need to know whether the wire peer exists:
def _progress_emitter_for(
    ctx: Context | None,
    total: float | None,
) -> HeartbeatEmitter:
    if ctx is None:
        return make_noop_emitter()

    async def _emit(count: int) -> None:
        await ctx.report_progress(progress=float(count), total=total)

    return _emit
The ctx is None branch is what lets the same code path run inside a
unit test without a wire peer. It's not a workaround — it's the
explicit contract that lets slow_echo_with_heartbeat be tested
directly with await slow_echo_with_heartbeat(...) instead of having
to spin a uvicorn worker for every assertion. The wire-level tests
still exist for the round-trip path, but the cadence and cancellation
behaviour is unit-tested cheaply.
The most important test on the new surface is the one that proves a follow-up call still works after a previous call blew its own timeout:
async def test_heartbeat_tool_on_wire_returns_tool_error_when_timeout_fires(
    sse_server_url: str,
) -> None:
    doomed = await asyncio.wait_for(
        _call_heartbeat_tool(
            sse_server_url,
            message="blown",
            delay_seconds=2.0,
            heartbeat_interval=0.1,
            tool_timeout=0.3,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert doomed.is_error is True
    healthy = await asyncio.wait_for(
        call_slow_echo(
            sse_server_url,
            message="recovered",
            delay_seconds=0.1,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert healthy.is_error is False
    assert "recovered" in healthy.text
This is the regression target. Before step 4, a misbehaving tool could
leak a wedged SSE stream into the server process, and the next client
session might inherit the wreckage. After step 4, the timeout
cancels the work cleanly, the SSE response carries a structured
isError=True payload, and the server is immediately ready for the
next call. The asyncio.wait_for wrapper around each of the two remote
calls is belt-and-braces: the assertions speak to the tool behaviour,
and the outer budgets guarantee the test fails fast instead of wedging
CI if a future change reintroduces the original hang.
Test it
Run the suite from the codebase/ directory:
pytest
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 44 items
tests/test_client.py ..... [ 11%]
tests/test_heartbeat.py .............. [ 43%]
tests/test_heartbeat_tool.py ......... [ 63%]
tests/test_server.py ........ [ 81%]
tests/test_tracing.py ........ [100%]
============================= 44 passed in 11.59s ==============================
Forty-four tests now: eight from step 1, five from step 2, eight from
step 3, plus twenty-three new tests across test_heartbeat.py
(fourteen unit tests on the primitive) and test_heartbeat_tool.py
(nine tests on the registered tool, two of which exercise the real
SSE round trip). The two that pay off the step are
test_heartbeat_tool_keeps_sse_alive_past_short_read_budget, which
runs a 1.2-second tool against a read budget shorter than that and
confirms the end-to-end call still returns, and the
test_heartbeat_tool_on_wire_returns_tool_error_when_timeout_fires
case shown above, which is what locks in "a doomed call no longer
poisons the server".
You can also drive it interactively. Start the server in one terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
And in a Python REPL in another:
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    async with sse_client(
        "http://127.0.0.1:8765/sse",
        timeout=5.0,
        sse_read_timeout=2.0,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool(
                "slow_echo_with_heartbeat",
                {
                    "message": "alive",
                    "delay_seconds": 10.0,
                    "heartbeat_interval": 0.5,
                    "tool_timeout": 15.0,
                },
            )
            print("isError =", result.isError)
            print("content =", result.content)


asyncio.run(main())
A 10-second tool runs on a 2-second SSE read budget without tearing
the stream down, because heartbeats arrive every 500ms and reset the
client's idle read clock. Swap the call to plain slow_echo with
the same message and delay_seconds (it does not take the heartbeat or
timeout arguments) and the read-timeout hang from earlier steps fires
again, which shows that the fix is in the combination of "the tool emits
heartbeats" and "the budget is bounded", not in any change to the
client.
What we got
The read-timeout hang is closed. Tools that legitimately need to run
longer than the SSE read budget now register on the heartbeat-aware
path, fire progress notifications every heartbeat_interval seconds
to keep the transport alive, and are cancelled with a structured
ToolTimeoutError if they overrun tool_timeout. The original
slow_echo is intentionally untouched so the reproducers from step 2
and step 3 still pin the original failure — that gives us a paired
"broken tool / fixed tool" surface to point at in regression tests.
Twenty-three new tests cover the primitive (cadence, cancellation,
exception passthrough, no-op emitter, intolerance of zero or negative
budgets, failing-emit resilience), the registered MCP tool (direct
invocation, schema shape, FastMCP context injection, wire-level
listing, wire-level happy path, wire-level timeout that returns a
structured error and leaves the server ready for the next call), and
the entire 44-test suite passes in roughly eleven seconds. The bug
that opened this article — "the client either raises noisily or wedges
forever when a tool outruns the SSE read budget" — has been replaced
with two well-defined outcomes: either the tool finishes and the
result returns, or the tool blows its own budget and the client sees
an isError=True content payload on the same wire that stayed healthy
through every heartbeat in between.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: 17088ab
Key commits to step through:
- d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
- 1296ceb (step 2): wire up an MCP client and reproduce the read-timeout hang on tool call
- 4202474 (step 3): instrument the SSE stream to observe where the connection stalls
- 17088ab (step 4): add server-side heartbeats and per-tool timeout cancellation
What we're doing this step
Step 4 closed the read-timeout hang from the server side. Heartbeats
keep the SSE stream alive while a legitimately slow tool is still
working, and a per-tool wall-clock budget cancels a runaway tool with a
structured ToolTimeoutError so the next call gets a clean wire. That
is enough when every tool registered on the server cooperates with the
heartbeat-aware path — but a real deployment will always carry at least
one tool that doesn't, or face an upstream that drops bytes for reasons
neither the server nor the client can negotiate around. Step 5 fixes
the client side of that contract. The work splits into three jobs
that the existing call_slow_echo does not do. First, detect that
the read budget elapsed regardless of which third-party class wins the
teardown race — httpx.ReadTimeout, anyio.EndOfStream,
anyio.ClosedResourceError, or a raw asyncio.TimeoutError from the
client's own outer wait_for — and collapse all of them into one named
ClientReadTimeoutError so the caller never has to import a transport
exception class to pattern-match on the failure. Second, abort the
in-flight tool on the server promptly, by deliberately letting the
async with sse_client(...) block tear down on the exception path so
the SSE stream closes and the server observes a disconnect. Third,
retry transient read-timeouts under a bounded exponential-backoff
schedule, where every retry opens a fresh sse_client session so a
doomed attempt never poisons the next one. The new helpers live in
src/mcp_slow_server/resilient_client.py, the original call_slow_echo
stays untouched so the step 2 reproducer still pins the unmitigated
hang, and twenty-three new tests lock in the unit-level shape plus
the wire-level scenarios on the uvicorn fixture.
Setup
One new module lands in codebase/ and one new test file pairs with
it:
- src/mcp_slow_server/resilient_client.py: exposes ClientReadTimeoutError, RetryPolicy, call_slow_echo_once, retry_on_read_timeout, and the public entry point call_slow_echo_resilient. Internal helpers (_matches_read_timeout, _is_read_timeout) recursively unwrap ExceptionGroup shapes so anyio's nested task-group teardowns classify correctly.
- tests/test_resilient_client.py: twenty-three tests split into four layers: constant/dataclass sanity (six), pure-Python classification of the read-timeout shape (six), unit-level retry-loop behaviour against in-memory async stubs (five), and wire-level scenarios against the uvicorn fixture (six, including the "follow-up call still works" regression target).
No new runtime dependencies. We pull asyncio.wait_for and
asyncio.sleep from stdlib, reuse mcp.client.sse.sse_client /
mcp.ClientSession from the existing client, and lean on the
test-time sse_server_url fixture already wired up in step 2's
conftest.py. We deliberately do not add a retry library like
tenacity — the policy fits in two dataclass fields and one
arithmetic expression, and the cost of a third-party retry surface
(decorator state, hidden sleeps, jitter dispatch tables) is not worth
the abstraction at this scale.
Implementation
The named failure type is the contract the caller depends on. It carries the four pieces of metadata that downstream consumers — logs, alerts, retry decisions — always want, and stringifies into a human-readable message that already has them spliced in:
class ClientReadTimeoutError(Exception):
    def __init__(
        self,
        tool_name: str,
        sse_url: str,
        attempt: int,
        timeout_seconds: float,
    ) -> None:
        super().__init__(
            f"Client read-timeout on tool {tool_name!r} after "
            f"{timeout_seconds:.3f}s (attempt {attempt}, url={sse_url})",
        )
        self.tool_name = tool_name
        self.sse_url = sse_url
        self.attempt = attempt
        self.timeout_seconds = timeout_seconds
The attempt field is what turns a single timeout into a useful trace
inside the retry loop — "attempt 3 of 3" reads differently from
"attempt 1 of 3" even when the underlying transport exception is the
same. The tool_name and sse_url mean a single
except ClientReadTimeoutError as exc: block in caller code has
everything it needs for a structured log line; nobody has to drag the
SSE URL down from outer scope.
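As a rough illustration of that single except block, the snippet below repeats the class so it runs standalone and turns one caught error into a flat dictionary suitable for a structured logger. The to_log_fields helper and the field names it emits are hypothetical; the article does not prescribe a log format.

```python
class ClientReadTimeoutError(Exception):
    """Repeated from the article so this snippet is standalone."""

    def __init__(
        self,
        tool_name: str,
        sse_url: str,
        attempt: int,
        timeout_seconds: float,
    ) -> None:
        super().__init__(
            f"Client read-timeout on tool {tool_name!r} after "
            f"{timeout_seconds:.3f}s (attempt {attempt}, url={sse_url})",
        )
        self.tool_name = tool_name
        self.sse_url = sse_url
        self.attempt = attempt
        self.timeout_seconds = timeout_seconds


def to_log_fields(exc: ClientReadTimeoutError) -> dict[str, object]:
    """Hypothetical helper: everything a log line needs comes off the exception."""
    return {
        "event": "client_read_timeout",
        "tool": exc.tool_name,
        "url": exc.sse_url,
        "attempt": exc.attempt,
        "timeout_seconds": exc.timeout_seconds,
    }


try:
    raise ClientReadTimeoutError(
        tool_name="slow_echo",
        sse_url="http://127.0.0.1:8765/sse",
        attempt=3,
        timeout_seconds=0.5,
    )
except ClientReadTimeoutError as exc:
    message = str(exc)
    fields = to_log_fields(exc)

print(message)
print(fields)
```

Nothing in the handler reaches into the enclosing scope: the URL, the tool name, the attempt, and the deadline all travel with the exception.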
Classification of the transport zoo is the second piece. anyio task
groups wrap stream-closure exceptions inside an ExceptionGroup, so a
single except clause cannot rely on isinstance against the leaf
class. We walk the exceptions attribute recursively so arbitrary
group nesting unwraps correctly:
_READ_TIMEOUT_EXCEPTION_NAMES = frozenset(
    {
        "ReadTimeout",
        "ReadError",
        "EndOfStream",
        "ClosedResourceError",
        "BrokenResourceError",
    },
)


def _matches_read_timeout(exc: BaseException) -> bool:
    if isinstance(exc, asyncio.TimeoutError):
        return True
    return type(exc).__name__ in _READ_TIMEOUT_EXCEPTION_NAMES


def _is_read_timeout(exc: BaseException) -> bool:
    if _matches_read_timeout(exc):
        return True
    sub_exceptions = getattr(exc, "exceptions", None)
    if sub_exceptions is None:
        return False
    return any(_is_read_timeout(sub) for sub in sub_exceptions)
Matching by class name rather than isinstance is intentional. The
module never imports httpx or anyio — it works against whatever
versions the MCP SDK has pulled in, and survives minor version moves
in either package without a code change. The trade-off is real:
somebody could in principle write a custom class EndOfStream(Exception)
in unrelated code and have it classified as a read-timeout. That trade
is the right one. The classes we care about are widely known names
inside two well-known packages, and the alternative (pinning hard
imports) drags two transport dependencies into a module whose entire
purpose is to hide them from the caller. The two tests
test_is_read_timeout_unwraps_exception_group and
test_is_read_timeout_unwraps_nested_exception_groups cover the recursion
explicitly; test_is_read_timeout_rejects_unrelated_exceptions and
test_is_read_timeout_rejects_exception_group_of_unrelated cover the
negative direction.
The single-attempt call is where the abort actually happens. The
call_tool await is wrapped in asyncio.wait_for with an explicit
client_call_timeout deadline that defaults to sse_read_timeout:
async def call_slow_echo_once(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    attempt: int = 1,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
) -> SlowEchoResult:
    deadline = (
        client_call_timeout
        if client_call_timeout is not None
        else sse_read_timeout
    )
    try:
        async with sse_client(
            sse_url,
            timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
        ) as (read_stream, write_stream):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()
                call_result = await asyncio.wait_for(
                    session.call_tool(
                        "slow_echo",
                        {"message": message, "delay_seconds": delay_seconds},
                    ),
                    timeout=deadline,
                )
                return SlowEchoResult(
                    text=_extract_text(call_result.content),
                    is_error=bool(call_result.isError),
                )
    except BaseException as exc:
        if _is_read_timeout(exc):
            raise ClientReadTimeoutError(
                tool_name="slow_echo",
                sse_url=sse_url,
                attempt=attempt,
                timeout_seconds=deadline,
            ) from exc
        raise
Two design decisions inside this function are doing the heavy lifting.
First, the deadline is an explicit parameter, not "just reuse
sse_read_timeout". sse_read_timeout controls how long the SSE
client tolerates an idle stream — that is the right number when
heartbeats are flowing, and a terrible number when they are not,
because the per-call deadline wants to be slightly tighter than the
transport's own idle clock so the abort lands cleanly through the
context manager teardown rather than as a noisy transport error.
Defaulting client_call_timeout to sse_read_timeout keeps the simple
case simple; exposing it as a separate parameter lets the test suite
(and real callers) tune the two clocks independently. Second, the
async with sse_client(...) block is outside the wait_for, not
inside. When wait_for fires its asyncio.TimeoutError, the exception
unwinds through the ClientSession and sse_client context manager
exits, which is exactly what tears the SSE stream down and gives the
server the disconnect signal. If sse_client were inside wait_for,
the cancellation would race the context manager's own cleanup and we
would leak a half-open transport. The single-level try/except around
the whole block obeys the codebase's no-nested-try rule and still
catches every shape — BaseException because anyio task-group teardowns
are technically BaseExceptionGroup, and _is_read_timeout walks the
group for us.
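The ordering claim (the timeout fires inside the context manager, so the teardown runs before the caller sees anything) can be demonstrated with a fake transport. Nothing here touches the MCP SDK: fake_transport, call_with_deadline, and the events list are hypothetical names for this sketch, and asyncio.sleep stands in for a call_tool that never answers in time.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def fake_transport():
    """Stand-in for sse_client: records when it opens and when it tears down."""
    events.append("open")
    try:
        yield "stream"
    finally:
        events.append("closed")


async def call_with_deadline(deadline: float) -> str:
    async with fake_transport():
        # Only the call is bounded; the transport block stays outside wait_for.
        await asyncio.wait_for(asyncio.sleep(1.0), timeout=deadline)
        return "ok"


async def demo() -> list[str]:
    events.clear()
    try:
        await call_with_deadline(0.05)
    except asyncio.TimeoutError:
        events.append("timeout seen by caller")
    return list(events)


timeline = asyncio.run(demo())
print(timeline)
```

The recorded order is open, closed, then the caller's handler. By the time the exception reaches the code that would classify it, the transport has already been shut, which is the window the server needs to observe a disconnect.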
The retry loop on top is intentionally boring:
async def retry_on_read_timeout(
    work: Callable[[int], Awaitable[T]],
    *,
    retry_policy: RetryPolicy | None = None,
) -> T:
    policy = retry_policy if retry_policy is not None else RetryPolicy()
    last_error: ClientReadTimeoutError | None = None
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return await work(attempt)
        except ClientReadTimeoutError as exc:
            last_error = exc
            if attempt >= policy.max_attempts:
                break
            await asyncio.sleep(policy.backoff_for(attempt))
    assert last_error is not None
    raise last_error
The loop only retries on ClientReadTimeoutError — every other
exception propagates immediately. That is enforced by
test_retry_on_read_timeout_does_not_retry_other_exceptions, which
raises a RuntimeError from inside work and asserts the loop sees
exactly one attempt. The backoff itself sleeps after the just-failed
attempt, so attempt 1 sleeps backoff_for(1) before attempt 2 starts;
that ordering is locked in by
test_retry_on_read_timeout_sleeps_between_attempts, which measures
elapsed wall-clock time against 0.05 + 0.1 (initial 0.05s doubled
once) and confirms we are in the right ballpark.
The RetryPolicy dataclass is frozen and validates in __post_init__,
so a misconfigured caller fails at construction time instead of mid-loop:
@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = DEFAULT_MAX_ATTEMPTS
    initial_backoff_seconds: float = DEFAULT_INITIAL_BACKOFF_SECONDS
    backoff_multiplier: float = DEFAULT_BACKOFF_MULTIPLIER

    def __post_init__(self) -> None:
        if self.max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        if self.initial_backoff_seconds < 0:
            raise ValueError("initial_backoff_seconds must be non-negative")
        if self.backoff_multiplier < 1.0:
            raise ValueError("backoff_multiplier must be >= 1.0")

    def backoff_for(self, attempt: int) -> float:
        if attempt < 1:
            raise ValueError("attempt must be >= 1")
        return self.initial_backoff_seconds * (
            self.backoff_multiplier ** (attempt - 1)
        )
Refusing max_attempts=0 is the obvious one. Refusing
backoff_multiplier < 1.0 is the less obvious one — a shrinking
backoff would accelerate retries against a persistent failure, which
is the opposite of what every retry library on earth wants. We catch
that in test_retry_policy_rejects_shrinking_backoff so the constraint
never quietly regresses.
The top-level entry point composes the three pieces into the surface callers actually want:
async def call_slow_echo_resilient(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
    retry_policy: RetryPolicy | None = None,
) -> SlowEchoResult:
    async def _attempt(attempt: int) -> SlowEchoResult:
        return await call_slow_echo_once(
            sse_url,
            message,
            delay_seconds,
            attempt=attempt,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            client_call_timeout=client_call_timeout,
        )

    return await retry_on_read_timeout(_attempt, retry_policy=retry_policy)
The closure _attempt is created once per call_slow_echo_resilient
invocation and called once per loop iteration as work(attempt), so the
inner work it does (opening a fresh sse_client, a fresh ClientSession,
calling initialize, calling call_tool) runs from scratch every attempt.
There is no shared state between attempts. That is the whole point of
the retry contract on this kind of failure: we cannot resume a
torn-down SSE stream, so we must open a new one.
The regression target is the test that proves a doomed call does not leave the server wedged for the next caller:
async def test_call_slow_echo_resilient_aborted_call_leaves_server_healthy(
    sse_server_url: str,
) -> None:
    tool_delay = 2.0
    short_budget = 0.4
    policy = RetryPolicy(
        max_attempts=2,
        initial_backoff_seconds=0.0,
        backoff_multiplier=1.0,
    )
    with pytest.raises(ClientReadTimeoutError):
        await asyncio.wait_for(
            call_slow_echo_resilient(
                sse_server_url,
                message="aborted",
                delay_seconds=tool_delay,
                sse_read_timeout=short_budget,
                client_call_timeout=short_budget,
                retry_policy=policy,
            ),
            timeout=HANG_OBSERVATION_BUDGET_SECONDS,
        )
    healthy = await asyncio.wait_for(
        call_slow_echo(
            sse_server_url,
            message="follow-up",
            delay_seconds=0.1,
            sse_read_timeout=5.0,
        ),
        timeout=HANG_OBSERVATION_BUDGET_SECONDS,
    )
    assert healthy.is_error is False
    assert "follow-up" in healthy.text
The first call deliberately picks a tool delay (2.0s) much longer
than the call budget (0.4s), runs two attempts with no backoff, and
asserts both abort fast and the resilient wrapper re-raises a
ClientReadTimeoutError. The second call uses the plain
call_slow_echo against the same server, with a generous read budget,
and asserts the round-trip succeeds. The pairing matters: if the abort
on the doomed call had leaked a half-open transport into the server
process, the follow-up would either hang (caught by the outer
asyncio.wait_for) or come back as an error. The fact that it returns
is_error=False with the expected payload is what locks in the
contract.
Test it
Run the suite from the codebase/ directory:
pytest
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
testpaths: tests
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 67 items
tests/test_client.py ..... [ 7%]
tests/test_heartbeat.py .............. [ 28%]
tests/test_heartbeat_tool.py ......... [ 41%]
tests/test_resilient_client.py ....................... [ 76%]
tests/test_server.py ........ [ 88%]
tests/test_tracing.py ........ [100%]
============================= 67 passed in 15.90s ==============================
Sixty-seven tests now: forty-four carried over from step 4, plus
twenty-three new tests in tests/test_resilient_client.py. Six tests
cover constant sanity and dataclass validation. Six cover the
_is_read_timeout classifier — including the two ExceptionGroup
unwrap tests that exercise the recursion against arbitrary group
nesting. Five unit-test the retry_on_read_timeout loop against
in-memory async stubs (first-attempt success, no-retry-on-other-exc,
retries-until-success, exhaust-and-reraise, sleeps-between-attempts).
The remaining six exercise the wire path on the uvicorn fixture,
including the happy path, the named-timeout-on-blown-budget shape, the
exhaust-retries case that asserts exc.attempt == 2, and the
follow-up-call-still-works regression target. The whole suite still
finishes in roughly sixteen seconds.
You can also drive it interactively. Start the server in one terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
And in a Python REPL in another:
import asyncio

from mcp_slow_server.resilient_client import (
    RetryPolicy,
    call_slow_echo_resilient,
)


async def main():
    result = await call_slow_echo_resilient(
        "http://127.0.0.1:8765/sse",
        message="resilient",
        delay_seconds=3.0,
        sse_read_timeout=0.5,
        client_call_timeout=0.5,
        retry_policy=RetryPolicy(
            max_attempts=3,
            initial_backoff_seconds=0.1,
            backoff_multiplier=2.0,
        ),
    )
    print("text =", result.text)


asyncio.run(main())
This call asks the unmodified slow_echo to sleep for three seconds
against a half-second budget. You will see three rapid attempts —
roughly 0.5s + 0.1s + 0.5s + 0.2s + 0.5s ≈ 1.8s of total wall-clock
time, not nine — followed by a single ClientReadTimeoutError with
attempt=3. The shape of the failure is exactly what the test suite
asserts: the abort lands inside the call budget, the retries do not
stretch the failure out, and the server process is immediately ready
for a follow-up.
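The 1.8-second figure is simple enough to recompute for other policies. The helper below is a back-of-the-envelope sketch with hypothetical names (backoff_sleeps, worst_case_seconds). It assumes every attempt burns its full call deadline and ignores the time spent connecting and initializing each fresh session, so real runs will land slightly above it.

```python
import math


def backoff_sleeps(
    max_attempts: int,
    initial_backoff_seconds: float,
    backoff_multiplier: float,
) -> list[float]:
    """Sleeps between attempts: one after every failed attempt except the last."""
    return [
        initial_backoff_seconds * backoff_multiplier ** (attempt - 1)
        for attempt in range(1, max_attempts)
    ]


def worst_case_seconds(
    call_deadline: float,
    max_attempts: int,
    initial_backoff_seconds: float,
    backoff_multiplier: float,
) -> float:
    """Upper-bound estimate when every attempt times out at its deadline."""
    sleeps = backoff_sleeps(max_attempts, initial_backoff_seconds, backoff_multiplier)
    return max_attempts * call_deadline + sum(sleeps)


sleeps = backoff_sleeps(3, 0.1, 2.0)
total = worst_case_seconds(0.5, 3, 0.1, 2.0)
print("sleeps between attempts:", sleeps)
print(f"estimated wall-clock: {total:.1f}s")
```

For the policy in the REPL example this gives three 0.5-second attempts plus sleeps of 0.1 and 0.2 seconds, which matches the arithmetic above.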
What we got
The remaining client-side gap is closed. A caller now imports one
public surface — call_slow_echo_resilient plus RetryPolicy and the
named ClientReadTimeoutError — and gets the three behaviours the bug
demanded: a single named exception for every shape of SSE teardown, a
prompt in-flight abort that propagates through the context manager
chain and lets the server clean up its tool task, and a bounded
exponential-backoff retry that opens a fresh transport every attempt
so a doomed try cannot poison the next one. The original
call_slow_echo is deliberately left untouched so the step 2
reproducer still pins the unmitigated hang in regression — that gives
the suite a paired "broken / resilient" surface on the same wire.
Twenty-three new tests cover the classifier (including
ExceptionGroup recursion), the dataclass validation, the retry loop
against in-memory stubs, and the round-trip behaviour on the uvicorn
fixture — including the "follow-up call still works" target that
proves a doomed resilient call does not leak transport state into the
next session. The entire 67-test suite passes in roughly sixteen
seconds. The arc that opened the article — "the client either raises
a transport exception nobody can catch or wedges forever when a tool
outruns the SSE read budget" — now resolves cleanly: either the tool
returns its payload, or the resilient wrapper raises a single named
ClientReadTimeoutError that carries the tool name, the SSE URL, the
attempt number, and the exact deadline that elapsed, on a transport
that was torn down promptly enough for the next call to start on a
clean wire.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: 37410bc
Key commits to step through:
- d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
- 1296ceb (step 2): wire up an MCP client and reproduce the read-timeout hang on tool call
- 4202474 (step 3): instrument the SSE stream to observe where the connection stalls
- 17088ab (step 4): add server-side heartbeats and per-tool timeout cancellation
- 37410bc (step 5): add client-side read-timeout handling with graceful retry and tool-call abort
What we're doing this step
Steps 4 and 5 each closed one half of the SSE read-timeout / tool-hang
bug. The server now emits a heartbeat fast enough to keep httpx's idle
read clock from firing while a legitimately slow tool is still working,
and cancels a runaway tool with a structured error the moment its
per-tool wall-clock budget elapses. The client now bounds every
call_tool await with an explicit deadline, lets the sse_client
context manager tear down cleanly on the timeout path so the server
observes a real disconnect, and collapses the third-party transport
exception zoo into one named ClientReadTimeoutError that callers can
pattern-match on without importing httpx or anyio. The two fixes are
each well-tested in isolation — twenty-three unit tests on the
resilient client, fourteen on the heartbeat loop, plus the wire-level
scenarios on the uvicorn fixture — but nothing yet drives both halves
through a single round-trip and asserts that the composed behaviour
matches the contract. Step 6 is that round-trip. We add one new public
helper, call_heartbeat_tool_resilient, that wires the heartbeat-aware
tool into the same abort/retry wrapper, refactor the single-attempt
plumbing so the slow-echo and heartbeat paths share the named-timeout
contract instead of copy-pasting it, and write a ten-test regression
suite in tests/test_end_to_end.py that pins the five-part timeout
contract — heartbeat liveness, server-side cancellation, client-side
abort, post-failure recovery, and the still-reproducing diagnostic from
step 2 — against the real uvicorn fixture. The point of the step is
not to add new behaviour; it is to lock the existing behaviour in so a
future refactor that drops a heartbeat, widens a deadline, raises a
default, or silently swallows the named timeout fails loudly in CI
rather than walking the fix back into the original hang.
Setup
One new test file lands, one production module grows two new public entry points, and the package root re-exports them:
tests/test_end_to_end.py: ten regression tests, seven organised by contract clause (C1 through C5) and three pinning cross-cutting invariants on the package defaults. Every assertion is either a wall-clock bound measured with time.perf_counter, a structured error shape (ClientReadTimeoutError with the expected attempt, timeout_seconds, and tool_name), or a post-failure liveness check that re-uses the same uvicorn fixture.
src/mcp_slow_server/resilient_client.py: two new public functions (call_heartbeat_tool_once, call_heartbeat_tool_resilient) plus two internal helpers (_open_session_and_call, _call_tool_with_abort) that factor out the SSE-session-open + wait_for-wrapped call_tool plumbing so the slow-echo path and the heartbeat path share the same named-timeout abort contract.
src/mcp_slow_server/__init__.py: re-exports the two new functions so callers import them off the package root.
No new runtime dependencies. The end-to-end tests reuse the
sse_server_url fixture from step 2's conftest.py, the
DEFAULT_HEARTBEAT_INTERVAL_SECONDS / DEFAULT_TOOL_TIMEOUT_SECONDS
constants from step 4, and the RetryPolicy /
ClientReadTimeoutError surface from step 5. The whole point is to
exercise what we already have through one composed seam, not to
introduce a new abstraction.
Implementation
The refactor inside resilient_client.py is the first move. The
previous call_slow_echo_once had the SSE-session-open and the
wait_for-wrapped call_tool inlined, which was fine when only one
tool needed the abort contract but starts to copy-paste itself the
moment a second tool wants the same treatment. We pull the wire shape
into a single helper that takes the tool name and an arguments dict:
async def _open_session_and_call(
    sse_url: str,
    tool_name: str,
    arguments: dict,
    *,
    connect_timeout: float,
    sse_read_timeout: float,
    deadline: float,
    progress_callback: ProgressCallback | None,
) -> SlowEchoResult:
    async with sse_client(
        sse_url,
        timeout=connect_timeout,
        sse_read_timeout=sse_read_timeout,
    ) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            call_result = await asyncio.wait_for(
                session.call_tool(
                    tool_name,
                    arguments,
                    progress_callback=progress_callback,
                ),
                timeout=deadline,
            )
            return SlowEchoResult(
                text=_extract_text(call_result.content),
                is_error=bool(call_result.isError),
            )
The progress_callback parameter is the non-obvious piece that the
end-to-end test exposed. The MCP SDK only injects a
_meta.progressToken into the outgoing call_tool request when the
caller passes a progress_callback. Without that token, the server's
ctx.report_progress notifications are silently dropped at the
protocol layer, which means the heartbeat fix from step 4 would
compile and pass its unit tests while still failing on the wire — no
heartbeats actually reach the SSE stream, so httpx's idle clock fires
exactly as it did before the fix. Wiring progress_callback through
the helper, and defaulting the heartbeat path to a no-op callback that
just exists to flip the opt-in switch, is what makes the composed
behaviour actually hold.
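The opt-in described above can be modelled without the SDK. What follows is a toy sketch, not the SDK's code: build_call_tool_request, report_progress, and the "heartbeat_tool" name are hypothetical stand-ins, and the real progress callback is an async callable rather than the plain function used here. Only the _meta.progressToken field and the tools/call and notifications/progress method names come from the protocol itself.

```python
from typing import Any, Callable, Optional

ProgressFn = Callable[[float], None]


def build_call_tool_request(
    request_id: int,
    tool_name: str,
    arguments: dict[str, Any],
    progress_callback: Optional[ProgressFn] = None,
) -> dict[str, Any]:
    """Toy client side: attach a progress token only when a callback is given."""
    params: dict[str, Any] = {"name": tool_name, "arguments": arguments}
    if progress_callback is not None:
        # Opt-in switch: without this token the server has nothing to
        # address its progress notifications to.
        params["_meta"] = {"progressToken": request_id}
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call", "params": params}


def report_progress(
    request: dict[str, Any], progress: float, outbox: list[dict[str, Any]]
) -> None:
    """Toy server side: emit a notification only if the request carried a token."""
    token = request["params"].get("_meta", {}).get("progressToken")
    if token is None:
        return  # silently dropped, so nothing reaches the stream
    outbox.append(
        {
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": token, "progress": progress},
        }
    )


def _noop_progress(progress: float) -> None:
    """Does nothing; it exists only to flip the opt-in switch."""


without_cb: list[dict[str, Any]] = []
report_progress(build_call_tool_request(1, "slow_echo", {}), 0.5, without_cb)

with_cb: list[dict[str, Any]] = []
report_progress(
    build_call_tool_request(2, "heartbeat_tool", {}, _noop_progress), 0.5, with_cb
)

print(len(without_cb), len(with_cb))
```

In this model the request built without a callback produces no notification at all, which is the wire-level silence that lets the idle read clock fire.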
The named-timeout abort contract gets factored into a second helper:
async def _call_tool_with_abort(
    sse_url: str,
    tool_name: str,
    arguments: dict,
    *,
    attempt: int,
    connect_timeout: float,
    sse_read_timeout: float,
    client_call_timeout: float | None,
    progress_callback: ProgressCallback | None = None,
) -> SlowEchoResult:
    deadline = (
        client_call_timeout
        if client_call_timeout is not None
        else sse_read_timeout
    )
    try:
        return await _open_session_and_call(
            sse_url,
            tool_name,
            arguments,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            deadline=deadline,
            progress_callback=progress_callback,
        )
    except BaseException as exc:
        if _is_read_timeout(exc):
            raise ClientReadTimeoutError(
                tool_name=tool_name,
                sse_url=sse_url,
                attempt=attempt,
                timeout_seconds=deadline,
            ) from exc
        raise
This is the same shape step 5 already pinned — BaseException
catch-all so anyio's BaseExceptionGroup teardowns route through
_is_read_timeout, single try with no nested except per the
codebase's no-nested-try rule, deadline defaults to sse_read_timeout
when the caller does not override it. The only difference is that
tool_name is now a parameter rather than a hard-coded "slow_echo",
so the heartbeat tool can reuse the exact same contract. Two thin
wrappers — call_slow_echo_once and call_heartbeat_tool_once — now
delegate to _call_tool_with_abort with their tool name and arguments
fixed. The slow-echo wrapper does not pass a progress callback (step
2's behaviour is preserved verbatim so the C5 reproducer still pins
the unmitigated hang); the heartbeat wrapper passes _noop_progress
to flip the protocol-level opt-in on.
The composed entry point that the end-to-end tests drive is deliberately minimal:
async def call_heartbeat_tool_resilient(
    sse_url: str,
    message: str,
    delay_seconds: float,
    *,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    connect_timeout: float = DEFAULT_CONNECT_TIMEOUT,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    client_call_timeout: float | None = None,
    retry_policy: RetryPolicy | None = None,
) -> SlowEchoResult:
    async def _attempt(attempt: int) -> SlowEchoResult:
        return await call_heartbeat_tool_once(
            sse_url,
            message,
            delay_seconds,
            attempt=attempt,
            heartbeat_interval=heartbeat_interval,
            tool_timeout=tool_timeout,
            connect_timeout=connect_timeout,
            sse_read_timeout=sse_read_timeout,
            client_call_timeout=client_call_timeout,
        )

    return await retry_on_read_timeout(_attempt, retry_policy=retry_policy)
The closure body opens a fresh SSE session per attempt — same
no-shared-state-across-retries discipline as the slow-echo wrapper —
and the retry_on_read_timeout loop from step 5 handles the bounded
exponential backoff. The whole composed function is fifteen lines
because every behaviour it depends on already exists; step 6 is wiring,
not invention.
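As a reminder of what the closure is handed to, here is a minimal sketch of a bounded exponential-backoff loop with the same call shape. It is an illustration under assumptions, not the step 5 implementation: the field names follow the text, the default values are placeholders, and ClientReadTimeoutError is reduced to a stand-in that carries only the attempt number.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class ClientReadTimeoutError(Exception):
    """Stand-in for the named timeout; only `attempt` is modelled here."""

    def __init__(self, attempt: int) -> None:
        super().__init__(f"read timeout on attempt {attempt}")
        self.attempt = attempt


@dataclass(frozen=True)
class RetryPolicy:
    # Placeholder defaults, kept tiny so the demo runs quickly.
    max_attempts: int = 3
    initial_backoff_seconds: float = 0.01
    backoff_multiplier: float = 2.0


async def retry_on_read_timeout(
    attempt_fn: Callable[[int], Awaitable[T]],
    *,
    retry_policy: Optional[RetryPolicy] = None,
) -> T:
    policy = retry_policy or RetryPolicy()
    backoff = policy.initial_backoff_seconds
    for attempt in range(1, policy.max_attempts + 1):
        try:
            # Each attempt gets its 1-based number so the error can report it.
            return await attempt_fn(attempt)
        except ClientReadTimeoutError:
            if attempt == policy.max_attempts:
                raise  # retries exhausted: surface the last named timeout
        await asyncio.sleep(backoff)
        backoff *= policy.backoff_multiplier
    raise AssertionError("unreachable when max_attempts >= 1")


calls: list[int] = []


async def flaky(attempt: int) -> str:
    """Fails with the named timeout twice, then succeeds."""
    calls.append(attempt)
    if attempt < 3:
        raise ClientReadTimeoutError(attempt)
    return "ok"


result = asyncio.run(retry_on_read_timeout(flaky))
print(result, calls)
```

Only the named timeout is retried; any other exception escapes on the first attempt, which keeps real bugs from being masked by the backoff.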
The regression suite is where the contract gets pinned. The file's
module docstring spells out five named clauses and the tests are
grouped under the clause they pin. C1, "heartbeats unblock honest
slow tools", is covered by two tests. The first asks the heartbeat
tool to sleep for 0.8s against a 0.4s SSE read budget with a 0.1s
heartbeat cadence, then asserts the call returns successfully with
is_error=False, that the wall-clock elapsed time is at least
tool_delay - 0.05 (so we know the response actually came back, not
that we cut the call short), and that the elapsed time is less than
tool_delay + 2.0 (so we know we did not silently retry). The second
locks the no-spurious-retry invariant on the happy path by configuring
a deliberately huge initial_backoff_seconds=1.5 and asserting the
elapsed time stays well under that — if any phantom retry fired, the
first backoff sleep would blow the bound.
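The two-sided wall-clock bound is easy to get backwards, so here is the shape in isolation. This is a sketch with a hypothetical timed helper and a fake tool standing in for the real SSE round-trip; only the two tolerances (0.05 below, 2.0 above) come from the test described above.

```python
import asyncio
import time


async def timed(awaitable):
    """Await something and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = await awaitable
    return result, time.perf_counter() - start


async def fake_slow_tool(message: str, delay_seconds: float) -> str:
    # Stand-in for the heartbeat tool: sleeps, then echoes the message.
    await asyncio.sleep(delay_seconds)
    return message


tool_delay = 0.2
result, elapsed = asyncio.run(timed(fake_slow_tool("hello", tool_delay)))

# Lower bound: the call really waited for the tool, it was not cut short.
assert elapsed >= tool_delay - 0.05
# Upper bound: no hidden retry or backoff sleep inflated the wait.
assert elapsed < tool_delay + 2.0
print(result)
```

The lower bound guards against a false pass from an early abort, and the upper bound guards against a false pass from a silent second attempt.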
C2, "server-side tool timeout still cancels runaway work", asks the
heartbeat tool for a 1.5s sleep with a 0.3s tool_timeout. Wire-level
budgets are deliberately generous (sse_read_timeout=5.0,
client_call_timeout=5.0) so the only timer that should fire is the
server's own per-tool budget. The contract: the result lands with
is_error=True (cancellation surfaces as a structured tool error, not
a transport exception) and elapsed time is well below the natural
delay.
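The property C2 checks, that cancellation comes back as a result and not as an exception, can be sketched with the standard library alone. The names below (ToolResult, run_with_tool_timeout) are hypothetical and this is not the step 4 server code; it only illustrates the "structured error within the budget" shape.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class ToolResult:
    text: str
    is_error: bool


async def run_with_tool_timeout(work, tool_timeout: float) -> ToolResult:
    """Run a tool coroutine under a wall-clock budget.

    On overrun the work is cancelled and a structured error result is
    returned, so the caller never sees a transport-level exception.
    """
    try:
        text = await asyncio.wait_for(work, timeout=tool_timeout)
    except asyncio.TimeoutError:
        return ToolResult(text=f"tool exceeded its {tool_timeout}s budget", is_error=True)
    return ToolResult(text=text, is_error=False)


async def runaway() -> str:
    await asyncio.sleep(1.5)  # natural delay, far above the budget
    return "never delivered"


async def main():
    start = time.perf_counter()
    result = await run_with_tool_timeout(runaway(), tool_timeout=0.3)
    return result, time.perf_counter() - start


result, elapsed = asyncio.run(main())
print(result.is_error, elapsed < 1.5)
```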
C3, "client-side abort still bounds the wait", asks the heartbeat
tool for a 3.0s sleep but simulates a misconfigured server by setting
the heartbeat interval to 2.0s — well above the 0.4s SSE read budget.
With no help from the server, the client deadline must win, so we
assert pytest.raises(ClientReadTimeoutError), that
exc.attempt == policy.max_attempts (we exhausted retries), that
exc.timeout_seconds == short_read_budget (the deadline metadata
matches the configured budget), and that elapsed time stays below the
natural tool delay. A single-attempt variant pins the same contract
without retries hiding the timing shape.
C4, "a doomed call leaves the server healthy", is the composite
recovery test. We fire a doomed resilient call (slow heartbeat, short
read budget, short client deadline, two attempts), assert it raises
ClientReadTimeoutError, then immediately fire a second resilient
call against the same fixture with sane parameters and assert it
returns successfully with the expected payload. If the abort path had
leaked a half-open transport or a wedged tool task back into the
server process, the second call would either hang (caught by the outer
asyncio.wait_for) or come back with is_error=True.
C5, "the diagnostic regression still reproduces", is the most
counter-intuitive test in the file: it asserts the original
unfixed path still fails. The whole tutorial hangs off step 2's
call_slow_echo + plain slow_echo reproducer — if a future
refactor silently raises the default sse_read_timeout or routes the
plain client through the heartbeat tool, the educational value of step
2 evaporates. We pin the failure shape (any BaseException raised
before the natural delay completes) so that change shows up as a red
test rather than a silent rewrite of history.
Three cross-cutting invariants live above the clauses. The first
asserts DEFAULT_HEARTBEAT_INTERVAL_SECONDS < DEFAULT_SSE_READ_TIMEOUT
— if a future change pushes the heartbeat interval above the read
budget, the heartbeat-based fix becomes vacuous because the server
would never get a chance to refresh the read clock before httpx tore
the stream down. The second asserts
DEFAULT_TOOL_TIMEOUT_SECONDS > DEFAULT_HEARTBEAT_INTERVAL_SECONDS:
with a tool timeout shorter than the heartbeat interval, the server
would cancel the tool before the first heartbeat ever ticked. The third
asserts the retry constants stay finite and bounded
(DEFAULT_MAX_ATTEMPTS >= 1, DEFAULT_INITIAL_BACKOFF_SECONDS >= 0,
DEFAULT_BACKOFF_MULTIPLIER >= 1.0).
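The three invariants are plain comparisons between constants, which a small checker makes concrete. In the sketch below the constant names follow the text, but every value is a placeholder chosen for illustration (the package's real defaults are not reproduced here), and default_violations is a hypothetical helper rather than part of the test file.

```python
# Placeholder values for illustration only; not the package's real defaults.
DEFAULT_HEARTBEAT_INTERVAL_SECONDS = 1.0
DEFAULT_SSE_READ_TIMEOUT = 5.0
DEFAULT_TOOL_TIMEOUT_SECONDS = 30.0
DEFAULT_MAX_ATTEMPTS = 3
DEFAULT_INITIAL_BACKOFF_SECONDS = 0.1
DEFAULT_BACKOFF_MULTIPLIER = 2.0


def default_violations(
    *,
    heartbeat_interval: float = DEFAULT_HEARTBEAT_INTERVAL_SECONDS,
    sse_read_timeout: float = DEFAULT_SSE_READ_TIMEOUT,
    tool_timeout: float = DEFAULT_TOOL_TIMEOUT_SECONDS,
    max_attempts: int = DEFAULT_MAX_ATTEMPTS,
    initial_backoff: float = DEFAULT_INITIAL_BACKOFF_SECONDS,
    backoff_multiplier: float = DEFAULT_BACKOFF_MULTIPLIER,
) -> list[str]:
    """Return a message for every cross-cutting invariant that is broken."""
    violations: list[str] = []
    if not heartbeat_interval < sse_read_timeout:
        violations.append("heartbeat interval must be below the SSE read timeout")
    if not tool_timeout > heartbeat_interval:
        violations.append("tool timeout must exceed the heartbeat interval")
    if not (max_attempts >= 1 and initial_backoff >= 0 and backoff_multiplier >= 1.0):
        violations.append("retry constants out of range")
    return violations


print(default_violations())
print(default_violations(heartbeat_interval=10.0))
```

With the placeholder defaults the first call reports nothing, while raising the heartbeat interval above the read timeout trips the first invariant only.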
Every test has an outer asyncio.wait_for(..., timeout=HANG_OBSERVATION_BUDGET_SECONDS)
wrapper set to 15 seconds. The number is picked to be larger than the
longest deliberately-slow tool call in the file but small enough that
a regression to the indefinite-hang failure mode manifests as a fast
red test rather than a wedged CI job.
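The outer guard can be seen in miniature below. This is a sketch with a deliberately wedged coroutine and a hypothetical guarded helper; the budget is shrunk from the suite's 15 seconds to a fraction of a second so the example finishes quickly.

```python
import asyncio

# The suite uses 15 seconds; a tiny value keeps this demo fast.
HANG_OBSERVATION_BUDGET_SECONDS = 0.2


async def wedged_call() -> str:
    # Models the indefinite hang: waits on an event nobody ever sets.
    await asyncio.Event().wait()
    return "unreachable"


async def guarded(coro, budget: float) -> str:
    """Turn an indefinite hang into a prompt, observable outcome."""
    try:
        await asyncio.wait_for(coro, timeout=budget)
    except asyncio.TimeoutError:
        return "hang detected"
    return "completed"


outcome = asyncio.run(guarded(wedged_call(), HANG_OBSERVATION_BUDGET_SECONDS))
print(outcome)
```

In a real test the timeout would simply propagate and fail the test; returning a string here just makes the outcome easy to print.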
Test it
Run the full suite from the codebase/ directory:
uv run pytest tests/
Expected output:
============================= test session starts ==============================
platform darwin -- Python 3.12.5, pytest-9.0.3, pluggy-1.6.0
rootdir: codebase
configfile: pyproject.toml
plugins: asyncio-1.4.0, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 77 items
tests/test_client.py ..... [ 6%]
tests/test_end_to_end.py .......... [ 19%]
tests/test_heartbeat.py .............. [ 37%]
tests/test_heartbeat_tool.py ......... [ 49%]
tests/test_resilient_client.py ....................... [ 79%]
tests/test_server.py ........ [ 89%]
tests/test_tracing.py ........ [100%]
============================= 77 passed in 23.52s ==============================
Seventy-seven tests in total now: sixty-seven carried over from step 5,
plus ten new end-to-end regression tests in tests/test_end_to_end.py.
Three pin the cross-cutting default invariants (heartbeat interval
below read timeout, tool timeout above heartbeat interval, retry
constants finite). Two pin C1 (heartbeats survive a tight read budget;
no spurious retry on the happy path). One pins C2 (server-side
tool_timeout surfaces as is_error=True within its budget, not the
natural delay). Two pin C3 (client deadline beats tool delay on both
the resilient and single-attempt paths). One pins C4 (doomed call →
fresh call recovery). One pins C5 (original slow_echo reproducer
still fails). The whole suite finishes in roughly twenty-three seconds
on the reference Python 3.12 + uv setup.
You can also drive the composed behaviour interactively. Start the server in one terminal:
uv run mcp-slow-server --host 127.0.0.1 --port 8765
And in a Python REPL in another:
import asyncio

from mcp_slow_server import (
    RetryPolicy,
    call_heartbeat_tool_resilient,
)


async def main():
    result = await call_heartbeat_tool_resilient(
        "http://127.0.0.1:8765/sse",
        message="composed",
        delay_seconds=2.0,
        heartbeat_interval=0.1,
        tool_timeout=5.0,
        sse_read_timeout=0.5,
        client_call_timeout=3.0,
        retry_policy=RetryPolicy(max_attempts=3),
    )
    print("is_error =", result.is_error)
    print("text =", result.text)


asyncio.run(main())
The tool sleeps for two seconds against a half-second SSE read budget
— a configuration that would have torn the stream down with the
unmitigated step-2 client. With the composed fix wired up, the server
emits heartbeats every 0.1s that refresh httpx's idle clock, the tool
returns its payload after the natural delay, and the client prints
is_error = False with the expected message text. Flip
delay_seconds to a value above tool_timeout (say, delay_seconds=8.0
with tool_timeout=0.5) and you will get is_error=True within
about half a second — server-side cancellation, not a hang. Flip
heartbeat_interval above sse_read_timeout and you will get a
ClientReadTimeoutError with attempt=3 — client-side abort, also
not a hang. The same three knobs that the regression suite turns drive
the same three observable shapes from the REPL.
What we got
The five-part timeout contract is now pinned end-to-end against the
real uvicorn fixture. A caller imports one new public surface —
call_heartbeat_tool_resilient — and gets the composed behaviour the
bug demanded: a heartbeat-aware server-side tool that keeps the SSE
stream alive while a legitimately slow call is working, a per-tool
wall-clock budget that cancels a runaway tool with a structured
is_error=True result, a client-side deadline that bounds the wait
when the heartbeat cannot keep up and collapses the transport exception
zoo into a single named ClientReadTimeoutError, a bounded
exponential-backoff retry that opens a fresh transport every attempt
so a doomed try cannot poison the next one, and a wire path that
leaves the server process immediately ready for the next caller after
either failure mode fires. The non-obvious progress-callback wiring
that the MCP SDK requires to actually emit report_progress
notifications onto the wire is now baked into the heartbeat helper, so
nobody can compose the two halves of the fix and silently lose the
heartbeat flow. The original call_slow_echo and slow_echo paths
are deliberately untouched so the step 2 reproducer still pins the
unmitigated hang in regression, and the new C5 test locks that
behaviour against accidental masking. Ten new tests turn each clause
of the contract into a wall-clock bound, a structured error shape, or
a post-failure liveness check, and three cross-cutting default
invariants guard the constants the fix depends on. The whole 77-test
suite passes in about twenty-three seconds, which means a future
refactor that drops a heartbeat, widens a deadline, swallows the named
timeout, or routes the plain client through the heartbeat tool fails
fast in CI rather than silently walking the fix back into the
indefinite-hang failure mode that motivated this tutorial.
Repository
The companion code for this article: https://github.com/vytharion/mcp-sse-read-timeout-tool-hang
The state of the code after this step: 855aa8a
Key commits to step through:
d840d10 (step 1): scaffold the minimal FastMCP SSE server with one slow tool
1296ceb (step 2): wire up an MCP client and reproduce the read-timeout hang on tool call
4202474 (step 3): instrument the SSE stream to observe where the connection stalls
17088ab (step 4): add server-side heartbeats and per-tool timeout cancellation
37410bc (step 5): add client-side read-timeout handling with graceful retry and tool-call abort
855aa8a (step 6): verify the fix end-to-end and write a regression test that pins the timeout contract