WebSocket tests are slow by default. Setting up a real connection (TLS handshake, ASGI lifecycle, application startup) takes 100 ms or more before your test fires its first message. Multiply by 475 tests and you're staring at a coffee break every CI run.

We got our suite to 110 seconds wall-clock on pytest-xdist with four workers. That's roughly a 4× speed-up over the single-worker run (~440 s). Two non-obvious moves did most of the work: shared-resource aggregators, and a per-worker startup hook that builds the test app exactly once.

Why WebSocket tests are slow

A typical WebSocket test:

import pytest

@pytest.mark.asyncio
async def test_disconnect_drains_session(app, fake_user):
    async with app.websocket_session(user=fake_user) as ws:
        await ws.send_json({"type": "subscribe", "channel": "trip:42"})
        await ws.receive_json()    # subscription ack
        await ws.close()
    # Assert here that the server-side session registry no longer holds
    # fake_user (the registry lookup is specific to your app).

Setup cost: spin up the ASGI lifespan (Postgres pool, Redis pool, Streams consumer-group registration, auth keyring load), create a test user, sign a JWT, open the WebSocket. That's 100–200 ms before anything domain-relevant runs.

Teardown cost: close cleanly, drain the consumer group, delete the test user. Another ~50 ms.

If every test gets its own app and session, you pay ~150 ms of overhead per test. Across 475 tests that's ~70 seconds of pure overhead, while the actual test logic is fast.

Where you can't share

The reflex is "OK, share the app across tests." That works for stateless reads. It does not work for tests that mutate global state — session registries, consumer-group offsets, rate-limit counters. Sharing the app means test N+1 starts in whatever state test N left it in. Order-dependent failures, flaky on CI, the works.

What you want is isolated per-test state (a fresh session, a fresh test user) on top of shared infrastructure (database, Redis pool, broker). That's where xdist makes things interesting.

The xdist worker isolation problem

pytest-xdist runs N worker processes. Each worker is a separate interpreter — no shared memory, no shared imports across workers. Whatever you initialize in a fixture lives only inside that worker.

Naive setup: each test gets its own app, each app gets its own DB pool, each app gets its own Redis pool. With 4 workers × ~120 tests per worker × ~150 ms setup, each worker still spends ~18 seconds on setup alone. You've parallelized the slow setup, not removed it.

The fix: set up the expensive shared bits (DB pool, Redis pool, app instance) once per worker, not once per test. Each test gets a thin wrapper (a fresh app.websocket_session(...)) over the shared infrastructure.

Worker-startup hooks

pytest-xdist doesn't have a first-class "run once per worker" hook, but there's a clean workaround: a session-scoped fixture combined with the worker_id fixture xdist provides.

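# conftest.py
import pytest_asyncio

# Async session-scoped fixtures need pytest-asyncio. loop_scope="session"
# (pytest-asyncio >= 0.24) keeps the pools on one event loop per worker.
@pytest_asyncio.fixture(scope="session", loop_scope="session")
async def shared_app(worker_id):
    """Build the app + connection pools once per xdist worker."""
    # create_db_pool / create_redis_pool / build_app are the app's own factories.
    pool  = await create_db_pool()
    redis = await create_redis_pool()
    app   = await build_app(pool=pool, redis=redis)
    await app.startup()
    try:
        yield app
    finally:
        # Shut the app down before closing the pools it uses.
        await app.shutdown()
        await pool.close()
        await redis.close()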

scope="session" in xdist means "once per worker," not once globally. That's exactly what we want. worker_id is the xdist-provided fixture (gw0, gw1, ..., or master when you run without xdist) that you use to namespace shared resources so workers don't collide.
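The worker_id fixture only reaches code that fixtures call. pytest-xdist also exports the same worker name in the PYTEST_XDIST_WORKER environment variable, so a small helper can build namespaced names anywhere, including module-level code such as logging setup. A minimal sketch; current_worker_id and namespaced are hypothetical helper names, and the "master" fallback mirrors what the worker_id fixture returns outside xdist.

```python
import os
import re
from typing import Optional


def current_worker_id() -> str:
    """The xdist worker name ("gw0", "gw1", ...), or "master" outside xdist."""
    # pytest-xdist sets PYTEST_XDIST_WORKER in every worker process.
    return os.environ.get("PYTEST_XDIST_WORKER", "master")


def namespaced(kind: str, worker_id: Optional[str] = None) -> str:
    """Hypothetical helper: a per-worker name such as "test_gw0" or "log_gw1"."""
    wid = worker_id if worker_id is not None else current_worker_id()
    # Refuse anything unsafe to embed in a schema name or a key prefix.
    if not re.fullmatch(r"[A-Za-z0-9_]+", wid):
        raise ValueError(f"unexpected worker id: {wid!r}")
    return f"{kind}_{wid}"
```

Inside a fixture you would pass the injected value, namespaced("test", worker_id); elsewhere, namespaced("run") picks the name up from the environment.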

Shared-resource aggregators

Some resources are shared by all workers rather than duplicated per worker, most importantly the test database. Running a separate Postgres per worker would be wasteful, so the workers share one instance, but each needs its own isolated tables.

The pattern: a small registry, scoped per worker, that hands out an isolated slice of the shared resource.

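import os

import psycopg
import pytest
from psycopg import sql

@pytest.fixture(scope="session")
def worker_db_schema(worker_id):
    """Each worker gets a fresh schema in the shared test DB."""
    schema = f"test_{worker_id}_{os.urandom(4).hex()}"
    ident = sql.Identifier(schema)  # quoted identifier, not raw f-string SQL
    # psycopg 3: leaving the `with` block commits the transaction.
    with psycopg.connect(SHARED_TEST_DB_URL) as conn:
        conn.execute(sql.SQL("CREATE SCHEMA {}").format(ident))
        run_migrations(conn, schema)  # the project's migration runner
    yield schema
    with psycopg.connect(SHARED_TEST_DB_URL) as conn:
        conn.execute(sql.SQL("DROP SCHEMA {} CASCADE").format(ident))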

Each worker gets its own schema in the same database. Migrations run per-worker (fast on an empty schema). Tests run isolated. Teardown drops the schema.

For Redis, each worker gets its own key prefix (for example test:gw0:). For the WebSocket app's in-memory session registry, each worker has its own app instance, so no aggregation is needed there.
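A prefix keeps workers apart during the run; deleting a worker's keys at teardown also keeps one run's counters from lingering into the next. A minimal sketch, assuming a redis-py asyncio client (redis.asyncio.Redis), whose scan_iter and delete methods it uses; the test:{worker_id}: prefix format and the worker_key / purge_worker_keys names are our own illustrative choices.

```python
from typing import Any


def worker_key(worker_id: str, key: str) -> str:
    """Prefix a Redis key with the worker's namespace, e.g. "test:gw0:ratelimit:42"."""
    return f"test:{worker_id}:{key}"


async def purge_worker_keys(redis: Any, worker_id: str, batch: int = 500) -> int:
    """Delete every key under this worker's prefix and return how many were removed.

    `redis` is assumed to be a redis.asyncio.Redis, or anything with the same
    async scan_iter/delete methods. SCAN is used instead of KEYS so a large
    keyspace doesn't block the server.
    """
    pattern = worker_key(worker_id, "*")
    pending: list = []
    removed = 0
    async for key in redis.scan_iter(match=pattern, count=batch):
        pending.append(key)
        if len(pending) >= batch:  # delete in batches to bound round-trips
            removed += await redis.delete(*pending)
            pending.clear()
    if pending:
        removed += await redis.delete(*pending)
    return removed
```

Call it from the teardown half of a session-scoped fixture, after the tests that wrote those keys have finished.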

Putting it together

The fixture stack for a typical WebSocket test:

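import pytest_asyncio

# Run on the same session-scoped loop as shared_app, since its pools belong to
# that loop. Tests using these need @pytest.mark.asyncio(loop_scope="session").
@pytest_asyncio.fixture(loop_scope="session")
async def test_user(shared_app, worker_db_schema):
    user = await create_test_user(shared_app, schema=worker_db_schema)
    yield user
    await delete_test_user(shared_app, user)

@pytest_asyncio.fixture(loop_scope="session")
async def ws_session(shared_app, test_user):
    async with shared_app.websocket_session(user=test_user) as ws:
        yield ws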

shared_app costs 1–2 seconds once per worker, call it 8 seconds summed across 4 workers. worker_db_schema is another ~500 ms per worker, 2 more seconds. Total worker-startup overhead: ~10 seconds of worker time, but at most ~2.5 seconds of wall-clock, since the workers start in parallel.

After that, each test is just create_test_user + websocket_session setup — maybe 15 ms each.
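The arithmetic behind these figures fits in a few lines. A rough model using the numbers quoted above (~150 ms per test with a fresh app, ~15 ms per test with the shared app, ~2.5 s of per-worker startup), assuming tests split evenly across workers and that fixture overhead is the only cost that changes; fixture_overhead_s is a hypothetical name.

```python
import math


def fixture_overhead_s(tests: int, workers: int, per_test_s: float,
                       startup_s: float = 0.0) -> float:
    """Fixture overhead, in wall-clock seconds, on the busiest worker.

    Assumes tests are spread evenly and each worker pays `startup_s`
    once before its first test.
    """
    tests_per_worker = math.ceil(tests / workers)
    return startup_s + tests_per_worker * per_test_s


# Fresh app + session per test (~150 ms): one worker vs. four.
serial_naive = fixture_overhead_s(475, 1, 0.150)
parallel_naive = fixture_overhead_s(475, 4, 0.150)
# Shared per-worker app: ~2.5 s startup (app + schema), then ~15 ms per test.
parallel_shared = fixture_overhead_s(475, 4, 0.015, startup_s=2.5)

print(f"{serial_naive:.1f} s, {parallel_naive:.1f} s, {parallel_shared:.1f} s")
```

Four workers cut the naive overhead from ~71 s to ~18 s per worker; sharing the app brings it to ~4 s. Parallelism divides the setup cost, while sharing removes most of it.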

Results

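The headline numbers from CI: ~440 s on a single worker with a fresh app per test, 110 s on four workers with the shared per-worker app.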

The 4× speed-up at 4 workers looks like ideal scaling, but parallelism alone didn't deliver it: we also eliminated per-test app construction. Sharing alone (single worker, shared app) gave ~2.5×; parallelism on top of that added another ~1.5×, and 2.5 × 1.5 ≈ 3.75, which rounds to the 4× we measured.

Lessons

Slow tests usually mean expensive setup, not expensive logic. Profile your fixture stack before you parallelize. The win we got from sharing the app was bigger than the win from parallelism.

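Share the expensive bits, isolate the stateful bits. Connection pools and app instances can be per-worker. Session registries and DB rows need per-test isolation: each test opens its own session and deletes its own user, so the next test starts clean.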

xdist's worker_id is your friend. Namespace everything with it — schemas, key prefixes, log files — and you get clean per-worker isolation for free, no extra coordination.
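For the first lesson, profiling the fixture stack: pytest's built-in --durations=N flag lists the slowest setup, call, and teardown phases individually. If you'd rather see one setup-versus-call total, two standard pytest hooks in the top-level conftest.py can sum them. A minimal sketch; PHASE_TOTALS is our own name. Under pytest-xdist, worker reports are forwarded to the controller and replayed through pytest_runtest_logreport there, so the totals cover all workers.

```python
# conftest.py (top level)
from collections import defaultdict

# Seconds spent in each test phase: "setup", "call", "teardown".
PHASE_TOTALS = defaultdict(float)


def pytest_runtest_logreport(report):
    """Called once per phase per test; report.duration is that phase's seconds."""
    PHASE_TOTALS[report.when] += report.duration


def pytest_terminal_summary(terminalreporter):
    """Print the summed phase times at the end of the run."""
    terminalreporter.write_sep("-", "time by test phase")
    for phase in ("setup", "call", "teardown"):
        terminalreporter.write_line(f"{phase:>8}: {PHASE_TOTALS[phase]:7.1f} s")
```

Under xdist these totals are summed across workers, so compare the phases with each other rather than with wall-clock time. A setup total that dwarfs the call total is the signal to share fixtures before adding workers.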