Build an MCP stdio Server in Python

The Model Context Protocol gives an LLM client a uniform way to call external tools. Most published examples lean on the official SDK and skip the wire format, which is fine until something breaks at 2am and you have no idea whether the bug is in your tool, the framing, or the host. This walkthrough builds a small MCP stdio server in Python from the JSON-RPC layer up, so when something does break you can read the bytes and know exactly where to look.

By the end you'll have a server that registers two tools, handles initialization, returns structured errors, and survives malformed input from the host. About 150 lines of code, no SDK.

Why stdio over HTTP

MCP supports two transports: stdio (the host launches your server as a subprocess and talks over stdin/stdout) and HTTP+SSE. Stdio wins for local tools because there's no port to collide with, no auth to configure, and the host owns the process lifecycle. Spin up, do work, exit. HTTP makes sense when one server multiplexes many clients or runs on a different machine; for a personal toolchain that's overkill.

The tradeoff: stdio servers can't share state across hosts, and stdout is sacred. A stray print("debug") will corrupt the JSON-RPC stream and the host will silently disconnect. Every byte to stdout must be a valid Content-Length-framed JSON-RPC message. Logs go to stderr, always.
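
One way to guard against stray prints is to divert Python's text-mode stdout to stderr while tool code runs. The sketch below is an assumption about how you might structure that, not something the protocol requires; `run_quietly` and its `log_stream` parameter are names invented for this example.

```python
import contextlib
import sys

def run_quietly(fn, *args, log_stream=None):
    """Call fn(*args) while any print() inside it is diverted away from stdout.

    log_stream defaults to stderr; it is a parameter only so the
    behaviour can be checked against an in-memory buffer.
    """
    target = sys.stderr if log_stream is None else log_stream
    with contextlib.redirect_stdout(target):
        return fn(*args)
```

This only reroutes writes that go through Python's sys.stdout. A C extension or child process that writes straight to file descriptor 1 would still reach the host, so it is a safety net rather than a guarantee.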

The framing layer

MCP rides JSON-RPC 2.0 over LSP-style framing. Each message is preceded by Content-Length: N\r\n\r\n and then exactly N bytes of UTF-8 JSON. No newlines between messages, no terminator after the body. Get this wrong and the host hangs forever waiting for bytes that aren't coming.

import sys
import json

def read_message(stream) -> dict | None:
    headers = {}
    while True:
        line = stream.readline()
        if not line:
            return None
        line = line.decode("utf-8").rstrip("\r\n")
        if line == "":
            break
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()

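    length = int(headers.get("content-length", 0))
    if length == 0:
        return None
    body = stream.read(length)
    if len(body) < length:
        # The host closed the pipe mid-message; treat it as end of input.
        return None
    return json.loads(body.decode("utf-8"))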

def write_message(stream, payload: dict) -> None:
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    stream.write(header)
    stream.write(body)
    stream.flush()

Two things to notice. First, stream.read(length) reads bytes, not characters, because the server hands it a binary stream (sys.stdin.buffer). That matters because Content-Length counts UTF-8 bytes, not Python str length: a naive stream.readline() loop that decodes early will desync on any non-ASCII content. Second, flush() after every write. Python block-buffers stdout when it is a pipe, so without the flush the host won't see your response until the buffer fills or the process exits.
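
To see the byte-versus-character point concretely, here is a small round-trip sketch over an in-memory stream. It assumes the Content-Length framing described above; `frame` and `unframe` are names made up for this example, compact stand-ins for write_message and read_message with no error handling.

```python
import io
import json

def frame(payload: dict) -> bytes:
    """Encode one message: header with the body's byte length, then the body."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body

def unframe(stream) -> dict:
    """Read one framed message from a binary stream (sketch: no error handling)."""
    length = 0
    while True:
        line = stream.readline().strip()
        if not line:  # the blank line ends the header block
            break
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.decode("ascii"))
    return json.loads(stream.read(length).decode("utf-8"))

# Two messages back to back, the first with non-ASCII text.
wire = frame({"text": "héllo wörld"}) + frame({"n": 2})
stream = io.BytesIO(wire)
first = unframe(stream)
second = unframe(stream)
```

The first body is 23 characters but 25 bytes, since é and ö each take two bytes in UTF-8. A length computed from the str would leave two bytes in the stream and corrupt the header of the second message.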

Tool registry

Tools are functions the host can call by name. Each tool has a name, a description, an input JSON Schema, and a handler. A simple dict-based registry beats a class hierarchy here:

TOOLS: dict[str, dict] = {}

def register(name: str, description: str, schema: dict):
    def decorator(fn):
        TOOLS[name] = {
            "name": name,
            "description": description,
            "inputSchema": schema,
            "handler": fn,
        }
        return fn
    return decorator

@register(
    "echo",
    "Return the input string unchanged.",
    {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
)
def tool_echo(args: dict) -> str:
    return args["text"]

@register(
    "word_count",
    "Count words in the input string.",
    {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
)
def tool_word_count(args: dict) -> int:
    return len(args["text"].split())

A host may validate arguments against the schema before the call reaches you, but don't count on it: defend in the handler as well. Hosts have bugs too.
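
A minimal sketch of what defending in the handler could look like. The helper `require_str` and its `max_len` limit are hypothetical, chosen for illustration; raising ValueError fits the dispatcher in the next section, which catches handler exceptions and turns them into an error response.

```python
def require_str(args, key: str, max_len: int = 10_000) -> str:
    """Fetch a string argument or raise ValueError with a readable message."""
    if not isinstance(args, dict):
        raise ValueError("arguments must be an object")
    value = args.get(key)
    if not isinstance(value, str):
        raise ValueError(f"'{key}' must be a string, got {type(value).__name__}")
    if len(value) > max_len:
        raise ValueError(f"'{key}' exceeds {max_len} characters")
    return value

def tool_word_count(args: dict) -> int:
    # Same behaviour as the registered tool, but bad input fails loudly
    # instead of raising KeyError or AttributeError deep in the handler.
    return len(require_str(args, "text").split())
```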

Request dispatch

MCP requires three methods at minimum: initialize (handshake, returns server capabilities), tools/list (enumerate tools), and tools/call (invoke one). Dispatch is a flat dict lookup, never nested if/elif chains:

def handle_initialize(req_id, params):
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "demo-server", "version": "0.1.0"},
        },
    }

def handle_tools_list(req_id, params):
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {
            "tools": [
                {k: v for k, v in tool.items() if k != "handler"}
                for tool in TOOLS.values()
            ]
        },
    }

def handle_tools_call(req_id, params):
    name = params.get("name")
    args = params.get("arguments", {})
    tool = TOOLS.get(name)
    if tool is None:
        return error_envelope(req_id, -32602, f"Unknown tool: {name}")
    try:
        result = tool["handler"](args)
    except Exception as e:
        return error_envelope(req_id, -32603, f"Tool {name} failed: {e}")
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "result": {"content": [{"type": "text", "text": str(result)}]},
    }

DISPATCH = {
    "initialize": handle_initialize,
    "tools/list": handle_tools_list,
    "tools/call": handle_tools_call,
}

The content shape inside tools/call results matters: the host expects a list of content parts, each with a type discriminator. Returning a bare string here works in some hosts and silently breaks in others.
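
One way to keep that shape consistent is to route every handler result through a single helper. The sketch below is an assumption about how you might do it: `to_content` is a hypothetical name, and serializing non-string results as JSON (rather than calling str() as the dispatcher above does) is a design choice, not something the protocol mandates.

```python
import json

def to_content(result) -> list[dict]:
    """Wrap a handler's return value as a list of text content parts."""
    if isinstance(result, str):
        text = result
    else:
        # JSON keeps dicts and lists machine-readable; default=str is a
        # fallback for values json cannot encode natively.
        text = json.dumps(result, ensure_ascii=False, default=str)
    return [{"type": "text", "text": text}]
```

With this in place the success branch of a tools/call handler can return {"content": to_content(result)} regardless of what the tool produced.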

Error envelopes

JSON-RPC defines a small set of error codes you should respect, and hosts can treat them differently. A -32601 (method not found) signals that you don't implement that method, so a host can skip the capability; a generic -32603 (internal error) typically shows up as a tool failure to the user.

def error_envelope(req_id, code: int, message: str) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": code, "message": message},
    }

Common codes worth knowing: -32700 parse error, -32600 invalid request, -32601 method not found, -32602 invalid params, -32603 internal error. Anything -32000 to -32099 is yours to define.
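
If the bare integers get hard to read, the codes listed above can be given names. This is a sketch, not part of the server as written: `RpcError` and `is_server_defined` are invented names, and `TOOL_TIMEOUT` is a hypothetical server-defined code picked from the -32000 to -32099 range.

```python
from enum import IntEnum

class RpcError(IntEnum):
    PARSE_ERROR = -32700
    INVALID_REQUEST = -32600
    METHOD_NOT_FOUND = -32601
    INVALID_PARAMS = -32602
    INTERNAL_ERROR = -32603
    TOOL_TIMEOUT = -32001  # hypothetical: a code this server defines itself

def is_server_defined(code: int) -> bool:
    """True for codes in the range left to the server implementation."""
    return -32099 <= code <= -32000

def error_envelope(req_id, code: int, message: str) -> dict:
    # int(code) so the envelope holds a plain integer, not an enum member.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "error": {"code": int(code), "message": message},
    }
```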

The main loop

import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("mcp")

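def main():
    while True:
        try:
            msg = read_message(sys.stdin.buffer)
        except ValueError as e:
            # Bad Content-Length, invalid UTF-8, or malformed JSON: all of these
            # surface as ValueError. Report a parse error and keep serving.
            log.warning("malformed message: %s", e)
            write_message(sys.stdout.buffer, error_envelope(None, -32700, "Parse error"))
            continue
        if msg is None:
            log.info("stdin closed, exiting")
            return
        if not isinstance(msg, dict):
            # Batches and bare values are not supported by this server.
            write_message(sys.stdout.buffer, error_envelope(None, -32600, "Invalid request"))
            continue
        method = msg.get("method")
        req_id = msg.get("id")
        params = msg.get("params") or {}

        handler = DISPATCH.get(method)
        if handler is None:
            response = error_envelope(req_id, -32601, f"Method not found: {method}")
        else:
            response = handler(req_id, params)

        # Notifications carry no id and must not be answered.
        if req_id is not None:
            write_message(sys.stdout.buffer, response)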

if __name__ == "__main__":
    main()

Notifications (messages without an id field) get no response. That's the JSON-RPC contract, and an unexpected reply can confuse the host. Logging via logging.basicConfig(stream=sys.stderr) keeps stdout clean.
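
The request-versus-notification rule can be made explicit with a small classifier. This is a sketch with an invented name (`classify`); it follows the JSON-RPC 2.0 message shapes and treats anything else as invalid.

```python
def classify(msg) -> str:
    """Return 'request', 'notification', 'response', or 'invalid'."""
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        return "invalid"
    if "method" in msg:
        # Same method field either way; only the presence of id differs.
        return "request" if "id" in msg else "notification"
    if "id" in msg and ("result" in msg or "error" in msg):
        return "response"
    return "invalid"
```

For example, the notifications/initialized message a host sends after the handshake has a method but no id, so it classifies as a notification and gets no reply.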

Wiring it to a host

Drop a JSON config block somewhere your host reads MCP server configs from. The exact path depends on the host:

{
  "mcpServers": {
    "demo-server": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}

Restart the host. The server should appear with two tools available. If it doesn't, run the host in verbose mode and watch for framing errors, which are the most likely first-run problem with a hand-rolled server.

Comparing this to the official SDK

The Python mcp SDK from the upstream project gives you all of this in roughly 20 lines of decorator-flavored code. So why hand-roll? Two reasons.

First, the SDK is async-first. If your tool calls are CPU-bound or use a sync library, you're fighting the SDK rather than using it. A 150-line sync server is faster to debug than chasing event-loop interactions. Second, when something goes wrong with tool calls in production, knowing the wire format is the difference between a 5-minute fix and a 2-hour debugging session. Build it once from scratch, then move to the SDK with eyes open.

What to add next

Three obvious extensions: schema validation with jsonschema so you reject bad inputs before they reach the handler; structured logging that includes the request id so you can trace one call through stderr; and a graceful shutdown path that handles SIGTERM cleanly. The bones above support all three without restructuring.
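
As a starting point for the second extension, here is one possible shape for request-scoped logging using the standard library's LoggerAdapter. The function names, the "mcp.trace" logger name, and the log format are assumptions made for this sketch.

```python
import logging
import sys

def make_trace_logger(stream=None) -> logging.Logger:
    """Build a logger whose format expects a req_id field on every record."""
    handler = logging.StreamHandler(sys.stderr if stream is None else stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s req=%(req_id)s %(message)s")
    )
    logger = logging.getLogger("mcp.trace")
    logger.handlers = [handler]  # replace handlers left by earlier calls
    logger.setLevel(logging.INFO)
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger

def for_request(logger: logging.Logger, req_id) -> logging.LoggerAdapter:
    """Bind one request id so every line logged through the adapter carries it."""
    return logging.LoggerAdapter(logger, {"req_id": req_id})
```

In the main loop you would create the adapter once per message, for example log = for_request(trace_logger, req_id), and pass it down to the handler. Logging through the bare logger without an adapter would fail to format, because the req_id field would be missing.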
