Agent WebSocket Delegation: Auth, Envelopes, and Timeout Fallback

A server-side agent — the one taking user requests, planning, and calling tools — eventually hits work it cannot do in-process. Local filesystem operations on a different machine. A long-running IDE skill. A GPU inference job pinned to a specific box. The clean answer is a remote agent: a daemon on the target machine that holds a persistent WebSocket back to the orchestrator and runs jobs on its behalf.

REST works for one-shot calls that finish in under a second. For anything bidirectional (progress events, partial results, server-initiated work), WebSocket beats long-polling on latency (sub-10ms per hop vs 200ms+) and beats Server-Sent Events, which only carry traffic from server to client. The hard parts are not the framing protocol; they are authentication, message envelopes that survive schema drift, and timeout handling that doesn't strand half-completed work.

This article walks through a pattern that has held up across several production-ish deployments: a Python orchestrator (FastAPI), a remote agent written in Rust or Python, and a strict message envelope shared between them.

Why delegation over direct execution

Putting the work inline in the orchestrator looks simpler until you list the failure modes. When the orchestrator process restarts, it kills any in-flight subprocess. Different machines mean different filesystem mounts and different installed binaries. Authorization scopes diverge: the orchestrator runs as a service account, but the remote work needs your dev user's keychain or SSH keys.

A delegation boundary fixes all three:

Process isolation — orchestrator restarts don't kill remote jobs. The remote agent can reconnect, reattach, and report final status.
Locality — files, GPUs, keychain entries stay where they belong. The remote agent reads them directly with no cross-mount nightmare.
Authorization — the remote agent runs as the user it needs to be. The orchestrator delegates the what, not the who.

The cost is two extra moving parts (a transport and an envelope schema) and one extra failure mode (the link itself). All three are tractable.

Authentication handshake

The standard mistake is to use a static bearer token in a query string. It lands in nginx access logs, it ends up in browser history if a browser ever opens that URL, and rotating it requires restarting both endpoints simultaneously. Use a signed handshake instead.

Generate an ed25519 keypair per remote agent. The agent holds the private key; the orchestrator holds the public key. On connect, the agent signs a payload that includes a server-issued nonce, and the server verifies before promoting the socket to "authenticated."

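import secrets

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from fastapi import WebSocket, WebSocketException

# load_agent_pubkeys() is a placeholder for your own loader: agent_id -> public key.
PUBLIC_KEYS: dict[str, Ed25519PublicKey] = load_agent_pubkeys()

async def auth_handshake(ws: WebSocket) -> str:
    """Challenge-response auth. Call after ws.accept(); returns the agent_id."""
    # Fresh server-issued nonce per connection, so old signatures can't be replayed.
    nonce = secrets.token_hex(32)
    await ws.send_json({"kind": "auth_challenge", "nonce": nonce})

    raw = await ws.receive_json()
    if raw.get("kind") != "auth_response":
        # WebSocketException makes FastAPI close the socket with this code.
        raise WebSocketException(code=4001, reason="expected auth_response")

    agent_id = raw.get("agent_id")
    pubkey = PUBLIC_KEYS.get(agent_id)
    if pubkey is None:
        raise WebSocketException(code=4003, reason="unknown agent")

    try:
        signature = bytes.fromhex(raw["signature"])
        # Raises InvalidSignature unless this agent's key signed exactly this nonce.
        pubkey.verify(signature, nonce.encode())
    except (KeyError, TypeError, ValueError, InvalidSignature):
        raise WebSocketException(code=4003, reason="bad signature") from None
    return agent_id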

Two design choices worth flagging:

Nonce is server-issued. With a client-chosen nonce, anyone who captured an earlier handshake could replay its signature. Server nonces force a fresh signature per connection.
Verification happens before the socket is added to any registry. If verification fails, the socket closes with a 4001/4003 application-defined code (RFC 6455 reserves 4000-4999 for private use) and never appears in the connected-agents pool. About 30 lines of code; closes the entire class of unauthenticated-action bugs.
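The agent's half of the handshake is small. A minimal sketch, assuming the challenge frame has already been read off the socket as a dict; `answer_challenge` is a hypothetical helper name, and whatever WebSocket client the agent uses is expected to send the returned dict back as JSON:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def answer_challenge(challenge: dict, agent_id: str, private_key: Ed25519PrivateKey) -> dict:
    """Sign the server-issued nonce and build the auth_response frame (hypothetical helper)."""
    if challenge.get("kind") != "auth_challenge":
        raise ValueError(f"expected auth_challenge, got {challenge.get('kind')!r}")
    # Sign exactly the bytes the server verifies: the nonce string, UTF-8 encoded.
    signature = private_key.sign(challenge["nonce"].encode())
    return {
        "kind": "auth_response",
        "agent_id": agent_id,
        "signature": signature.hex(),  # the server decodes this with bytes.fromhex
    }
```

In a real agent the private key would be loaded from disk (for example with `Ed25519PrivateKey.from_private_bytes`) rather than generated per run.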

Rotate keys by adding the new public key alongside the old, deploying agents with the new private key one by one, then removing the old public key. That requires the orchestrator to accept more than one public key per agent_id during the overlap. Zero downtime, no coordinated restart.
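Because `PUBLIC_KEYS` above maps each agent to a single key, the overlap window needs a small change: keep several accepted keys per agent and accept a signature from any of them. A sketch under that assumption; `KeyRing`, `add_key`, `retire_key`, `verify_any`, and the key labels are illustrative names, not part of any library:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

# agent_id -> {key_id: public key}. key_id is a label you choose (hypothetical).
KeyRing = dict[str, dict[str, Ed25519PublicKey]]


def add_key(ring: KeyRing, agent_id: str, key_id: str, pub: Ed25519PublicKey) -> None:
    """Rotation step 1: accept the new key alongside the old one."""
    ring.setdefault(agent_id, {})[key_id] = pub


def retire_key(ring: KeyRing, agent_id: str, key_id: str) -> None:
    """Final step: once every agent runs the new private key, drop the old one."""
    ring.get(agent_id, {}).pop(key_id, None)


def verify_any(ring: KeyRing, agent_id: str, signature: bytes, nonce: str) -> bool:
    """True if any currently accepted key for agent_id signed this nonce."""
    for pub in ring.get(agent_id, {}).values():
        try:
            pub.verify(signature, nonce.encode())
            return True
        except InvalidSignature:
            continue
    return False


if __name__ == "__main__":
    old, new = Ed25519PrivateKey.generate(), Ed25519PrivateKey.generate()
    nonce = "ab" * 32
    ring: KeyRing = {}
    add_key(ring, "agent-1", "k-old", old.public_key())
    add_key(ring, "agent-1", "k-new", new.public_key())
    print(verify_any(ring, "agent-1", old.sign(nonce.encode()), nonce))
    retire_key(ring, "agent-1", "k-old")
    print(verify_any(ring, "agent-1", old.sign(nonce.encode()), nonce))
    print(verify_any(ring, "agent-1", new.sign(nonce.encode()), nonce))
```

During the overlap both signatures verify; after `retire_key`, only the new key does.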

The message envelope

This is where most home-grown protocols rot. The temptation is to send {"action": "run_skill", "name": "foo", "args": {...}} and call it done. Six months later you need progress events, you bolt on {"action": "skill_progress", ...}, and now the discriminator field is overloaded across request, response, and event categories.

Lock the envelope shape on day one:

from typing import Literal
from pydantic import BaseModel

class Envelope(BaseModel):
    kind: Literal[
        "skill_request", "skill_response", "skill_progress",
        "skill_error", "skill_cancel", "ping", "pong",
    ]
    request_id: str            # UUID4, set by sender
    correlation_id: str | None # set on responses/progress/errors/cancels only
    payload: dict              # kind-specific; validate downstream
    sent_at: float             # unix epoch seconds

Three properties that pay rent:

Discriminator is kind, not action. Reserving kind for the protocol layer leaves action (or skill_name) available inside payloads without collisions.
request_id and correlation_id are mandatory fields (correlation_id is null only on requests). When you have 40 in-flight skill calls and one returns out of order, correlation IDs are the only thing letting the orchestrator route the response to the right waiter.
payload is intentionally dict, not a typed union. Validate it against a per-kind Pydantic model in the handler, not at the envelope layer. Adding a new kind doesn't require touching the envelope class — just register a new handler.
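The per-kind handler registry can be as small as a dict from `kind` to a payload model and a coroutine. A sketch, assuming hypothetical payload models (`SkillRequest` with `skill_name` and `args`, `SkillProgress` with `percent`); it builds the model with `model(**payload)`, which works on both Pydantic v1 and v2:

```python
from collections.abc import Awaitable, Callable
from typing import Any

from pydantic import BaseModel


class SkillRequest(BaseModel):  # hypothetical payload model for "skill_request"
    skill_name: str
    args: dict = {}


class SkillProgress(BaseModel):  # hypothetical payload model for "skill_progress"
    percent: float


Handler = Callable[[dict, Any], Awaitable[Any]]
REGISTRY: dict[str, tuple[type[BaseModel], Handler]] = {}


def register(kind: str, model: type[BaseModel]) -> Callable[[Handler], Handler]:
    """Decorator: bind a kind to its payload model and handler coroutine."""
    def deco(fn: Handler) -> Handler:
        REGISTRY[kind] = (model, fn)
        return fn
    return deco


async def dispatch(envelope: dict) -> Any:
    """Look up the handler by kind, then validate the payload one frame deeper."""
    kind = envelope.get("kind")
    if kind not in REGISTRY:
        raise LookupError(f"no handler for kind {kind!r}")
    model, handler = REGISTRY[kind]
    payload = model(**envelope["payload"])  # raises pydantic.ValidationError if bad
    return await handler(envelope, payload)


@register("skill_request", SkillRequest)
async def on_skill_request(env: dict, req: SkillRequest) -> str:
    return f"run {req.skill_name} for {env['request_id']}"


@register("skill_progress", SkillProgress)
async def on_skill_progress(env: dict, progress: SkillProgress) -> str:
    return f"{progress.percent:.0f}%"
```

Adding a kind means one new model and one decorated function; the `Envelope` class stays untouched.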

Compare this to the alternative of a discriminated union across 12 message types at the envelope level: every new message kind ripples through every consumer. The dict-and-validate-late pattern is roughly 3× faster to extend, at the cost of catching schema errors one frame deeper.

Timeout fallback that doesn't leak

The naive timeout is asyncio.wait_for(future, timeout=30). When it fires, the future is cancelled, the waiter raises, and the orchestrator moves on. The problem is the remote agent has no idea anything changed. It finishes the job 5 seconds later, sends skill_response, and the orchestrator either drops it (best case) or routes it to a future caller with a colliding request_id (worst case).

A correct timeout has three steps:

import asyncio
import time
import uuid
from collections.abc import Awaitable, Callable

async def call_remote(
    ws_send: Callable[[dict], Awaitable[None]],
    waiters: dict[str, asyncio.Future],
    request_id: str,
    payload: dict,
    timeout: float = 30.0,
) -> dict:
    # 1. Register the waiter before anything goes on the wire.
    fut: asyncio.Future = asyncio.get_running_loop().create_future()
    waiters[request_id] = fut

    try:
        await ws_send({
            "kind": "skill_request",
            "request_id": request_id,
            "correlation_id": None,
            "payload": payload,
            "sent_at": time.time(),  # epoch seconds, matching Envelope.sent_at
        })
        return await asyncio.wait_for(fut, timeout=timeout)
    except asyncio.TimeoutError:
        # 2. Tell the remote agent to abort. The cancel frame gets its own id
        #    and points at the original request through correlation_id.
        await ws_send({
            "kind": "skill_cancel",
            "request_id": str(uuid.uuid4()),
            "correlation_id": request_id,
            "payload": {"reason": "orchestrator_timeout"},
            "sent_at": time.time(),
        })
        raise
    finally:
        # 3. Deregister on success, timeout, or a failed send alike.
        waiters.pop(request_id, None)

Three properties:

The waiter is registered before the request is sent. Otherwise the response can arrive before the orchestrator is listening for it, a real race when the remote machine is fast and the local event loop is briefly blocked.
On timeout, send a skill_cancel. The remote agent listens for it and aborts the in-progress job. Without this, you accumulate zombie work on the remote side proportional to your timeout rate.
waiters.pop in finally. Whether success, timeout, or unexpected error, the waiter dict doesn't grow. A waiter dict that grows is the first sign of a memory leak two weeks before it pages someone.
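The other half of `call_remote` lives in the WebSocket route's receive loop: terminal replies are matched to waiters by `correlation_id`, and anything without a live waiter is dropped rather than routed to the wrong caller. A sketch; `resolve_reply` and the `message` key in error payloads are assumptions, not part of the protocol above:

```python
import asyncio
import logging
from typing import Any

log = logging.getLogger("orchestrator")


def resolve_reply(waiters: dict[str, asyncio.Future], envelope: dict[str, Any]) -> bool:
    """Route a skill_response/skill_error to its waiter; return False if dropped."""
    cid = envelope.get("correlation_id")
    fut = waiters.get(cid) if cid is not None else None
    if fut is None or fut.done():
        # Late reply (the waiter timed out and was popped) or unknown id: drop it.
        log.info("dropping %s for unknown/expired request %s", envelope.get("kind"), cid)
        return False
    if envelope["kind"] == "skill_error":
        message = envelope["payload"].get("message", "remote skill failed")
        fut.set_exception(RuntimeError(message))
    else:
        fut.set_result(envelope["payload"])
    return True
```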

For long-running skills, prefer a heartbeat + extend-deadline protocol over a fixed timeout. The remote sends skill_progress every 5s, and each progress message pushes a per-request deadline (say, 10s out) forward on the server side. A job that takes 4 minutes never trips the timeout, but a job that dies silently still gets cancelled within 10s of its last progress message.
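One way to implement the sliding deadline, sketched under assumptions: `Deadline` and `wait_with_heartbeat` are hypothetical names, the `skill_progress` handler is expected to call `deadline.extend()` for the matching request, and `wait_with_heartbeat` would take the place of the `asyncio.wait_for` call in `call_remote`:

```python
import asyncio
from typing import Any


class Deadline:
    """Sliding per-request deadline; skill_progress frames push it forward.

    Must be created inside a running event loop (it reads loop.time()).
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self._loop = asyncio.get_running_loop()
        self.expires_at = self._loop.time() + window

    def extend(self) -> None:
        """Call from the skill_progress handler for the matching request."""
        self.expires_at = self._loop.time() + self.window


async def wait_with_heartbeat(fut: asyncio.Future, deadline: Deadline) -> Any:
    """Await fut until it resolves or no progress arrives for deadline.window seconds."""
    loop = asyncio.get_running_loop()
    while True:
        remaining = deadline.expires_at - loop.time()
        if remaining <= 0:
            fut.cancel()
            raise asyncio.TimeoutError
        # asyncio.wait does not cancel fut when its timeout elapses, so we can
        # re-read the (possibly extended) deadline and wait again.
        done, _ = await asyncio.wait({fut}, timeout=remaining)
        if done:
            return fut.result()
```

With the numbers above, `Deadline(10.0)` plus progress every 5s keeps a 4-minute job alive, while a silent one is cancelled about 10s after its last frame.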

Putting it together

The orchestrator side ends up with three layers: a single WebSocket route that handles handshake + envelope dispatch, a per-kind handler registry, and a call_remote() helper that wraps the request/response correlation. The remote agent mirrors this — one connect loop, one envelope decoder, one skill dispatch table.
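On the agent side, the skill dispatch table also needs to remember in-flight tasks so a skill_cancel can find and abort them. A sketch, assuming the hypothetical payload shape `{"skill_name": ..., "args": ...}` and a `send` coroutine supplied by the agent's connect loop; `AgentDispatcher` is an illustrative name:

```python
import asyncio
import time
import uuid
from collections.abc import Awaitable, Callable
from typing import Any

Skill = Callable[[dict[str, Any]], Awaitable[Any]]
Send = Callable[[dict[str, Any]], Awaitable[None]]


class AgentDispatcher:
    """Skill dispatch table that tracks in-flight tasks so skill_cancel can abort them."""

    def __init__(self, skills: dict[str, Skill], send: Send) -> None:
        self.skills = skills
        self.send = send
        self.running: dict[str, asyncio.Task] = {}

    def handle(self, env: dict[str, Any]) -> None:
        """Call from the connect loop for each decoded frame (inside the event loop)."""
        if env["kind"] == "skill_request":
            rid = env["request_id"]
            self.running[rid] = asyncio.create_task(self._run(rid, env["payload"]))
        elif env["kind"] == "skill_cancel":
            task = self.running.get(env.get("correlation_id"))
            if task is not None:
                task.cancel()  # raises CancelledError at the skill's current await

    async def _run(self, rid: str, payload: dict[str, Any]) -> None:
        try:
            skill = self.skills[payload["skill_name"]]
            result = await skill(payload.get("args", {}))
            await self._reply("skill_response", rid, {"result": result})
        except Exception as exc:  # CancelledError is a BaseException, so it skips this
            await self._reply("skill_error", rid, {"message": str(exc)})
        finally:
            self.running.pop(rid, None)

    async def _reply(self, kind: str, rid: str, payload: dict[str, Any]) -> None:
        await self.send({
            "kind": kind,
            "request_id": str(uuid.uuid4()),
            "correlation_id": rid,
            "payload": payload,
            "sent_at": time.time(),
        })
```

Since Python 3.8, `asyncio.CancelledError` is not an `Exception` subclass, so a cancelled skill ends quietly and sends no reply; the orchestrator has already given up on it.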

Total surface: ~400 lines of orchestrator code, ~300 lines of agent code. The protocol survives schema migrations because new kind values don't touch the envelope, and the failure modes (bad auth, dead link, slow skill) each have one canonical handler.

The pattern scales horizontally to N remote agents with no orchestrator changes; the connected-agents pool is just dict[agent_id, WebSocket]. Round-trip latency on a LAN sits around 8-12ms including JSON encode/decode, which is roughly 20× faster than the equivalent HTTP polling loop you'd otherwise write.
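The pool really can be a dict, with one wrinkle when agents reconnect: a late disconnect from an agent's old socket must not evict its new one. A sketch, assuming sockets expose Starlette-style `send_json` and `close` coroutines; `AgentPool` and close code 4000 for a superseded connection are illustrative choices:

```python
import asyncio
from typing import Any, Protocol


class Sender(Protocol):
    async def send_json(self, data: Any) -> None: ...
    async def close(self, code: int = 1000) -> None: ...


class AgentPool:
    """dict[agent_id, WebSocket] plus reconnect handling (a sketch)."""

    def __init__(self) -> None:
        self._sockets: dict[str, Sender] = {}

    async def register(self, agent_id: str, ws: Sender) -> None:
        old = self._sockets.get(agent_id)
        self._sockets[agent_id] = ws
        if old is not None and old is not ws:
            await old.close(code=4000)  # superseded by a newer connection

    def unregister(self, agent_id: str, ws: Sender) -> None:
        # Only remove the entry if this socket is still the current one.
        if self._sockets.get(agent_id) is ws:
            del self._sockets[agent_id]

    async def send_to(self, agent_id: str, frame: dict) -> None:
        ws = self._sockets.get(agent_id)
        if ws is None:
            raise LookupError(f"agent {agent_id!r} is not connected")
        await ws.send_json(frame)

    async def broadcast(self, frame: dict) -> None:
        await asyncio.gather(*(ws.send_json(frame) for ws in list(self._sockets.values())))
```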
