Claude Code MCP SIGTERM Keepalive Pattern

If you have spent any time wiring custom tools into an AI coding agent, you have probably hit the moment where the agent appears to "lose its mind" mid-task: a tool call hangs, the next call returns an opaque transport error, and the agent decides, with the full confidence of an LLM, that the tool was simply never registered. More often than not, in my experience, the root cause is not the agent and not the tool. It is the process lifecycle in between.

This piece is a sketch, not a deep dive. It maps out a single pattern — what I have come to call the SIGTERM keepalive pattern — that quietly determines whether your Model Context Protocol (MCP) servers behave like reliable infrastructure or like flaky background scripts under Claude Code and similar agent harnesses. The goal here is to give you the right vocabulary, the right mental model, and a couple of breadcrumbs to follow when you decide to harden your own MCP integrations.

Why MCP server lifecycle is a real problem

The Model Context Protocol is the open specification, originally introduced by Anthropic in late 2024, that lets agent hosts (Claude Code, Claude Desktop, third-party IDEs) talk to external tools through a uniform JSON-RPC interface. A common transport — and the one Claude Code defaults to for local servers — is stdio: the host spawns your server as a subprocess, the host writes JSON-RPC requests to the server's stdin, and the server writes JSON-RPC responses to its stdout. That is genuinely elegant for a "tools as plugins" model. It is also exactly the design that turns process lifecycle into the dominant source of flakiness.

Stdio is a contract between two processes. The host owns the lifetime of the subprocess: it spawned it, it can kill it. In practice "kill" almost always means SIGTERM first, then SIGKILL if the child does not exit within some grace window (the MCP specification's shutdown guidance for stdio also has the host close the server's stdin before it resorts to signals). The host needs to do this on shutdown, on configuration reload, on user-driven "restart this MCP server" actions, and during error recovery. The server, meanwhile, is somewhere mid-handler: maybe writing a file, maybe streaming back a large tool result, maybe inside an asyncio task graph with three open HTTP connections. If the server treats SIGTERM as "drop everything and exit immediately," the host sees half-written JSON-RPC frames, retries land on a dead pipe, and the agent gets the kind of error message LLMs are spectacularly bad at recovering from.
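To make the "half-written frame" failure concrete, here is a minimal sketch. It assumes the newline-delimited JSON framing used by the MCP stdio transport; the helper names encode_frame and try_decode are hypothetical and exist only for this illustration.

```python
import json

def encode_frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")

def try_decode(frame: bytes):
    """Return the parsed message, or None if the frame is incomplete or corrupt."""
    if not frame.endswith(b"\n"):
        return None  # no terminator: the writer stopped mid-frame
    try:
        return json.loads(frame)
    except json.JSONDecodeError:
        return None

response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {"content": [{"type": "text", "text": "ok"}]},
}
whole = encode_frame(response)
# Roughly what the host is left with if the server dies halfway through a write.
truncated = whole[: len(whole) // 2]
```

The complete frame parses back to the original message, while the truncated one cannot be parsed at all, which is why the host can only report a transport-level failure.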

The SIGTERM keepalive pattern is the answer to this. It is not a single API or a library. It is a disciplined way of structuring the server so that termination is a graceful, observable event rather than a sudden disappearance.

What "keepalive" actually means here

The word "keepalive" usually conjures TCP, where it refers to periodic probes that prevent NAT boxes from forgetting your connection exists. That is not what I mean. In an MCP stdio server, "keepalive" is about three things, in this order of importance:

Drain, don't drop. When SIGTERM arrives, the server must finish whatever request it is currently handling, flush its stdout buffer, and only then exit. Anything else hands the host a truncated JSON frame.
Refuse new work cleanly. Between "SIGTERM received" and "process exits," the server must stop starting new tool invocations, and must answer any request that still arrives with a recognizable error rather than silence.
Bound the grace window. The server picks a deadline (often 5–10 seconds), beyond which it gives up on a hung handler, logs the abandonment, and exits anyway. Hosts will SIGKILL eventually; better to exit on your own terms.
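The "refuse new work cleanly" step can be sketched as a small helper that builds a JSON-RPC 2.0 error response. The error code is an assumption of this sketch: -32000 falls in the range JSON-RPC reserves for implementation-defined server errors, and nothing in MCP mandates that particular value. The function name is hypothetical.

```python
from typing import Optional

# Assumption: any code in the JSON-RPC "server error" range would do here.
SHUTTING_DOWN_CODE = -32000

def shutting_down_error(request: dict) -> Optional[dict]:
    """Build the reply for a request that arrives after shutdown has begun.

    Notifications carry no "id" and must not be answered, so None is returned.
    """
    if "id" not in request:
        return None
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "error": {
            "code": SHUTTING_DOWN_CODE,
            "message": "server is shutting down",
        },
    }
```

Echoing the request id is what lets the host match the refusal to the call it made, instead of waiting on a response that never comes.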

You will notice this is the same pattern that nginx, systemd-managed services, and Kubernetes pods all implement. That is not a coincidence. MCP servers are, structurally, network services where the network happens to be a pair of pipes. The same engineering discipline applies; it is just easier to forget because there is no obvious port number.

The default behavior is wrong

Most MCP server scaffolding you find on GitHub today does not implement this pattern out of the box. A typical Python server starts an async event loop, registers the stdio transport, and runs until the loop is cancelled. Python installs no SIGTERM handler by default, so when the signal arrives the operating system's default action applies: the process is terminated on the spot, with no exception raised, no finally blocks run, and no buffered output flushed. (It is SIGINT, not SIGTERM, that Python turns into KeyboardInterrupt in the main thread.) Whatever task was writing to stdout is cut off mid-write. The result: an unterminated JSON object on the wire, and a confused host.
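A quick way to see what an unprepared Python process does on SIGTERM is to try it on a throwaway child. The sketch below is POSIX-only and uses only the standard library; the child script and the function name run_and_terminate are made up for the demonstration.

```python
import signal
import subprocess
import sys

# A child with no SIGTERM handling, but with cleanup code it expects to run.
CHILD = r"""
import time
try:
    print("ready", flush=True)
    time.sleep(30)
finally:
    print("cleanup ran", flush=True)
"""

def run_and_terminate():
    """Start the child, send SIGTERM once it is ready, return (returncode, output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(signal.SIGTERM)
    remaining = proc.stdout.read()  # reads until the child's stdout closes
    proc.wait(timeout=10)
    return proc.returncode, remaining
```

On Linux or macOS the return code is the negated signal number and the "cleanup ran" line never appears: the finally block simply does not execute.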

The Node ecosystem has the same shape. A naive process.on('SIGTERM', () => process.exit(0)) is no improvement on the default: process.exit() ends the process as soon as the handler runs, abandoning pending asynchronous work, including any stdout or stderr writes that have not been flushed yet, and it reports exit code 0 as though nothing had been interrupted.

This is not a criticism of the SDKs. They are doing the reasonable thing for the 80% case. It is just that "agent-driven tool host that may restart you at any time" is the 20% case, and it is the case you are actually in.

The pattern, sketched

The shape of a SIGTERM-aware MCP server looks like this. I am writing the example in Python because that is where most of the recent MCP server growth has happened, but the structure ports cleanly to TypeScript or Go.

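import asyncio
import signal
import sys

GRACE_SECONDS = 8
inflight: set[asyncio.Task] = set()
shutdown_event: asyncio.Event  # created in main(), once the loop is running

# dispatch(req) and run_stdio_server(handler) stand in for your own tool router
# and stdio JSON-RPC loop; they are not defined here. The sketch assumes the
# loop runs each request, including the write of its response, in its own task.

async def handle_request(req):
    task = asyncio.current_task()
    inflight.add(task)
    try:
        # Refuse new work once shutdown has begun.
        if shutdown_event.is_set():
            return {"error": "server is shutting down"}
        return await dispatch(req)
    finally:
        inflight.discard(task)

def _on_signal():
    # Runs as an ordinary event-loop callback, so it only flips the flag.
    shutdown_event.set()

async def main():
    global shutdown_event
    shutdown_event = asyncio.Event()
    loop = asyncio.get_running_loop()
    # add_signal_handler is only available on Unix event loops.
    loop.add_signal_handler(signal.SIGTERM, _on_signal)
    loop.add_signal_handler(signal.SIGINT, _on_signal)
    server_task = asyncio.create_task(run_stdio_server(handle_request))
    # If the read loop ends by itself (stdin closed, crash), shut down as well.
    server_task.add_done_callback(lambda _task: shutdown_event.set())
    await shutdown_event.wait()
    try:
        # Drain: wait for in-flight handlers, bounded by the grace window.
        await asyncio.wait_for(
            asyncio.gather(*inflight, return_exceptions=True),
            timeout=GRACE_SECONDS,
        )
    except asyncio.TimeoutError:
        # wait_for has already cancelled whatever was still running.
        sys.stderr.write("grace window exceeded; abandoning inflight tasks\n")
    server_task.cancel()
    await asyncio.gather(server_task, return_exceptions=True)
    sys.stdout.flush()

if __name__ == "__main__":
    asyncio.run(main())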

There are a handful of things going on in that short listing that matter more than they look:

The add_signal_handler call is the asyncio-native way of catching SIGTERM (it is available on Unix event loops only). Crucially, it does not raise an exception into a random task; it schedules a callback on the event loop. That callback flips a single asyncio.Event. Long-running handlers are now responsible for checking that event before starting expensive work; short handlers can simply run to completion.
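The callback behaviour is easy to observe in isolation. In this minimal sketch the process sends SIGTERM to itself to stand in for the host; the function name wait_for_sigterm is hypothetical, and the example is Unix-only and must run in the main thread, because that is where asyncio allows signal handlers to be installed.

```python
import asyncio
import os
import signal

async def wait_for_sigterm() -> str:
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    # The handler is a plain callback run by the event loop; it just sets the flag.
    loop.add_signal_handler(signal.SIGTERM, stop.set)
    try:
        # Stand-in for the host: deliver SIGTERM to this process shortly.
        loop.call_later(0.05, os.kill, os.getpid(), signal.SIGTERM)
        # No task is interrupted; this coroutine just observes the event.
        await asyncio.wait_for(stop.wait(), timeout=5)
    finally:
        loop.remove_signal_handler(signal.SIGTERM)
    return "shutdown requested"
```

Calling asyncio.run(wait_for_sigterm()) returns normally instead of killing the process, which is the whole point: the signal has become an event the server can react to on its own schedule.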

The inflight set is the bookkeeping that makes "drain" a verifiable operation rather than a hope. When the shutdown event fires, you gather() everything in flight, bounded by the grace window. If the grace window is exceeded, wait_for cancels whatever is still running, and you log loudly and exit anyway. The host's SIGKILL is about to arrive, and a useful log line beats being reaped silently.

The server_task.cancel() at the end is the explicit cleanup: stop reading from stdin, stop the JSON-RPC framing loop, let asyncio drain its remaining I/O before asyncio.run() returns. This is what lets your server emit a final structured log line — "shut down cleanly, 3 requests served, 0 abandoned" — that turns post-mortem debugging from archaeology into reading.
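One way to produce that final log line is to have the drain step report what it did. The following is a sketch under assumptions: it uses asyncio.wait rather than gather so that finished and unfinished tasks can be counted separately, and the drain function, the summary field names, and the demo are all invented for illustration.

```python
import asyncio
import json
import sys

async def drain(inflight: set, grace_seconds: float) -> dict:
    """Wait for in-flight tasks up to a deadline and summarize the outcome."""
    if not inflight:
        return {"event": "shutdown", "drained": 0, "abandoned": 0}
    done, pending = await asyncio.wait(inflight, timeout=grace_seconds)
    for task in pending:
        task.cancel()  # give up on anything that outlived the grace window
    await asyncio.gather(*pending, return_exceptions=True)
    return {"event": "shutdown", "drained": len(done), "abandoned": len(pending)}

async def demo() -> dict:
    async def work(delay: float) -> None:
        await asyncio.sleep(delay)

    # Two quick handlers and one that is effectively hung.
    inflight = {
        asyncio.create_task(work(0.01)),
        asyncio.create_task(work(0.02)),
        asyncio.create_task(work(60)),
    }
    summary = await drain(inflight, grace_seconds=0.2)
    # Logs go to stderr; stdout is reserved for JSON-RPC frames.
    print(json.dumps(summary), file=sys.stderr)
    return summary
```

Running asyncio.run(demo()) reports two drained tasks and one abandoned one. Writing the summary to stderr matters for a stdio server, since anything extra on stdout would corrupt the protocol stream.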

Where this fits in the agent loop

If you zoom out from a single MCP server and look at the full agent loop, the keepalive pattern is one of three lifecycle disciplines you need. The other two:

The handshake discipline says that on startup, your server must complete the MCP initialize handshake before doing anything else. A host that does not receive a timely response to initialize will typically treat that as a transport failure, and no tool invocation will ever reach your server. The MCP specification at spec.modelcontextprotocol.io defines the initialize/initialized exchange in some detail; it is worth reading carefully before you ship.
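As a rough illustration of the handshake discipline, the gate below tracks lifecycle state and decides whether an incoming message may proceed. The method names initialize, notifications/initialized, ping, and tools/call come from the MCP specification; the HandshakeGate class and its three state names are hypothetical, and the sketch simplifies by advancing state when a message is received rather than when the reply is sent.

```python
class HandshakeGate:
    """Allows normal traffic only after the initialize handshake has completed."""

    def __init__(self) -> None:
        self.state = "new"  # new -> initializing -> ready

    def allow(self, message: dict) -> bool:
        method = message.get("method")
        if method == "initialize":
            if self.state != "new":
                return False  # a second initialize is a protocol error
            self.state = "initializing"
            return True
        if method == "notifications/initialized":
            if self.state != "initializing":
                return False
            self.state = "ready"
            return True
        if method == "ping":
            return True  # pings are permitted at any point
        # Everything else, tool calls included, waits for the handshake.
        return self.state == "ready"
```

A server would consult the gate before dispatching each message and answer a rejected request with a JSON-RPC error rather than attempting to run the tool.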

The timeout discipline says that every tool implementation inside your server needs its own bounded execution time, independent of whatever timeout the agent might apply on its side. Agents will happily wait many seconds for a tool to return; users will not. A clear server-side timeout, surfaced as a structured error in the JSON-RPC response, is what keeps an agent from getting stuck in a loop of "the tool is still working, I'll wait." For Claude Code specifically, the documented behavior around long-running tools and user interrupts lives in the Claude Code documentation; the keepalive pattern is the bottom layer that makes those higher-level guarantees possible.
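A server-side timeout can be sketched as a thin wrapper around each tool coroutine. Two assumptions here: the timeout is reported as a tool result with isError set, which is how MCP tool results flag execution failures, and the 30-second default is an arbitrary placeholder. The names call_with_timeout and slow_tool are hypothetical.

```python
import asyncio

TOOL_TIMEOUT_SECONDS = 30.0  # assumption: pick a budget that suits each tool

async def call_with_timeout(request_id, tool, arguments: dict,
                            timeout: float = TOOL_TIMEOUT_SECONDS) -> dict:
    """Run one tool coroutine under a deadline and wrap the outcome as JSON-RPC."""
    try:
        result = await asyncio.wait_for(tool(**arguments), timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for has cancelled the tool; report the failure in a structured way.
        result = {
            "content": [{"type": "text", "text": f"tool timed out after {timeout:g}s"}],
            "isError": True,
        }
    return {"jsonrpc": "2.0", "id": request_id, "result": result}

async def slow_tool(seconds: float) -> dict:
    """Stand-in tool that just sleeps before answering."""
    await asyncio.sleep(seconds)
    return {"content": [{"type": "text", "text": "done"}]}
```

Because the timeout comes back as an ordinary response, the agent gets something it can read and act on, instead of waiting indefinitely for a reply.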

The keepalive pattern is the layer that connects these two: handshake brings the server up cleanly, timeouts keep individual tool calls honest, and keepalive brings the server down cleanly. Without all three, you have a tool catalog that works on average but fails on every interesting edge case.

What this gets you in practice

If you implement the SIGTERM keepalive pattern in your MCP servers, here is what changes day-to-day.

First, your agent transcripts get dramatically cleaner. The agent stops seeing "transport error" entries that it then attempts to reason about. Restarts — whether triggered by the user, by configuration changes, or by the host's internal supervision — become invisible to the agent loop. From the agent's perspective, tools just exist. That is exactly the abstraction you want.

Second, your logs become useful artifacts rather than noise. A server that exits with a structured "drained 3 requests, abandoned 0, grace window unused" message is one whose logs you will actually read. A server that disappears mid-write leaves you with stack traces from the host's transport layer and nothing else.

Third, you stop being afraid of restarts. Agent host configurations change all the time as you iterate: adding tools, adjusting permissions, switching environment variables. With a keepalive-aware server, a restart is a routine operation that costs at most the grace window, and usually far less when nothing is in flight. Without it, every restart is a small gamble against in-flight work.

Fourth, you build the habit. Once you have written one keepalive-aware MCP server, the pattern is muscle memory. You will start applying it to other places where a parent process supervises a child — long-running CLI tools, background daemons, ephemeral worker subprocesses spawned by your main agent host. The discipline generalizes.

Where to go from here

This piece is intentionally a sketch. The full implementation has details I have glossed: how to surface in-flight task counts as MCP server metadata, how to coordinate keepalive across multiple stdio MCP servers run by a single supervisor, whether to expose a /shutdown tool that lets the agent itself request a clean restart. Each of those is its own essay.

If you are starting from a blank repo, here are three concrete next steps. Read the MCP specification's "Lifecycle" section end-to-end; it is short, and it is the source of truth for what handshake messages your server must handle. Pick one existing MCP server you depend on, fork it, and instrument the signal handler with logging so you can watch what currently happens on restart. Then prototype the pattern above on top of one of your simplest tools, measure the difference in your agent transcripts over a week of normal use, and decide whether the cost of writing it is worth the resulting calm. Almost universally, in my experience, it is.

The interesting thing about agent tooling is that the LLM is the cheap, replaceable part. The plumbing around it — the MCP servers, the transports, the supervisors — is what actually has to be engineered. Lifecycle discipline is the largest single lever you have over how reliable that plumbing feels in practice. SIGTERM keepalive is one of the smallest, most concrete pieces of that discipline. It is worth doing.