Scaling WebSocket Signaling with Redis Pub/Sub

A single Node.js signalling process holds its room map in memory, so two peers in the same room only find each other if they happen to land on the same process. The moment you put a second node behind a load balancer, a peer connected to node A stops receiving the offers a peer on node B emits into the same room. This guide is part of the WebSocket Signaling Implementation guide, and it solves exactly that problem: how to fan room messages out across every node so a horizontally scaled signalling tier behaves like one logical server.

Context & Trade-offs

There are two ways to make multi-node signalling correct, and they are not mutually exclusive.

Sticky sessions pin each client to one node for the life of its connection (via a load-balancer cookie or IP hash). Sticky routing keeps a single client’s frames on one process, which matters because the WebSocket upgrade and any per-connection state must stay put. But stickiness alone does not solve fan-out: two peers in the same room can still be sticky to two different nodes, so node A must still be able to deliver a message to a socket living on node B.

A Redis Pub/Sub backplane is what actually carries room messages between nodes. Every node subscribes to Redis channels; when a node receives a message destined for a room, it publishes to Redis, and every node holding a member of that room receives it and forwards to its local sockets. Redis Pub/Sub is fire-and-forget: typically sub-millisecond within a region, with no persistence and no delivery guarantee if a subscriber is momentarily disconnected. That is acceptable for signalling, where a dropped offer is recovered by the reconnect-and-rejoin logic from the parent guide, and where lost candidates are re-gathered on an ICE restart.
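
The fan-out described above can be modelled in a single process without Redis. What follows is a minimal sketch, assuming a Node.js EventEmitter stands in for the Redis channel; createNode, bus, and the fake socket objects are hypothetical names used only for illustration and are not part of any library.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for Redis Pub/Sub: emit() reaches only listeners attached right now.
const bus = new EventEmitter();

function createNode(name) {
  const rooms = new Map(); // Map<roomId, Set<socket>>: this node's sockets only

  function join(roomId, socket) {
    if (!rooms.has(roomId)) {
      rooms.set(roomId, new Set());
      // First local member: subscribe this node to the room's channel.
      bus.on(`room:${roomId}`, ({ origin, payload }) => {
        if (origin === name) return; // the publisher already delivered locally
        for (const peer of rooms.get(roomId) ?? []) peer.inbox.push(payload);
      });
    }
    rooms.get(roomId).add(socket);
  }

  function send(roomId, payload, sender) {
    // Deliver to local members first, then publish for every other node.
    for (const peer of rooms.get(roomId) ?? [])
      if (peer !== sender) peer.inbox.push(payload);
    bus.emit(`room:${roomId}`, { origin: name, payload });
  }

  return { join, send };
}

const nodeA = createNode('A');
const nodeB = createNode('B');
const alice = { inbox: [] }; // fake socket connected to node A
const bob = { inbox: [] };   // fake socket connected to node B

nodeA.join('room-1', alice);
nodeA.send('room-1', 'offer-1', alice); // published before node B subscribed
nodeB.join('room-1', bob);
nodeA.send('room-1', 'offer-2', alice); // node B is subscribed now
```

In this sketch bob.inbox ends up holding only offer-2. The first offer was published before node B subscribed and is simply gone, which is the fire-and-forget behaviour the reconnect-and-rejoin logic has to cover.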

The practical recommendation: use the Socket.IO Redis adapter (which manages the backplane for you) and keep sticky sessions on as well. Sticky sessions are still required for Socket.IO’s HTTP long-polling fallback, because a polling client makes multiple HTTP requests that must reach the same node to reconstruct the session. Plan one Redis connection pair (one publisher, one subscriber) per node, because a Redis Pub/Sub subscriber cannot issue normal commands on the same connection, and expect signalling delivery to stay sub-10 ms end-to-end as long as Redis and your nodes share a region.

Minimal Runnable Implementation

The Socket.IO Redis adapter keeps each node's in-memory room map and relays every room broadcast through Redis Pub/Sub, so each node delivers to its own local members. After io.adapter(...) is installed, socket.to(roomId).emit(...) transparently reaches members on every node; your application code from the WebSocket Signaling with Node.js & Socket.IO guide does not change.

// npm install socket.io @socket.io/redis-adapter redis
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

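// The port is omitted here on purpose: the server starts listening only after
// the adapter is installed, so no client connects to a node that cannot fan out.
const io = new Server({ cors: { origin: process.env.ALLOWED_ORIGIN ?? '*' } });

// Pub/Sub needs TWO connections: a subscriber cannot run normal commands.
const pubClient = createClient({ url: process.env.REDIS_URL });
const subClient = pubClient.duplicate();
// node-redis emits 'error' on connection problems; an unhandled one crashes the process.
pubClient.on('error', (err) => console.error('Redis pub error', err));
subClient.on('error', (err) => console.error('Redis sub error', err));

(async () => {
  await Promise.all([pubClient.connect(), subClient.connect()]);
  io.adapter(createAdapter(pubClient, subClient)); // room emits now fan out
  io.listen(Number(process.env.PORT ?? 3000)); // PORT lets two nodes share one host
  console.log(`Signaling node ${process.pid} joined the Redis backplane`);
})().catch((err) => {
  console.error('Failed to join the Redis backplane', err);
  process.exit(1);
});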

const signaling = io.of('/signaling');

signaling.on('connection', (socket) => {
  socket.on('join', (roomId) => {
    if (typeof roomId !== 'string' || roomId.length > 128) return; // validate
    socket.join(roomId); // membership stays local; the adapter relays broadcasts between nodes
  });
  // These relays now reach peers on ANY node, not just this process:
  socket.on('offer',     (d) => socket.to(d.roomId).emit('offer', d));
  socket.on('answer',    (d) => socket.to(d.roomId).emit('answer', d));
  socket.on('candidate', (d) => socket.to(d.roomId).emit('candidate', d));
});
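
The relays above forward whatever roomId a client sends, including rooms it never joined. A minimal sketch of a guard follows, assuming Socket.IO v3 or later, where socket.rooms is a Set of the room names the socket has joined. canRelay and relay are hypothetical helpers, not Socket.IO APIs.

```javascript
// Hypothetical helper: allow a relay only into a room this socket has joined.
function canRelay(socket, data) {
  return (
    data !== null &&
    typeof data === 'object' &&
    typeof data.roomId === 'string' &&
    socket.rooms.has(data.roomId)
  );
}

// Wire one relay per event name; anything failing the guard is dropped silently.
function relay(socket, eventName) {
  socket.on(eventName, (data) => {
    if (!canRelay(socket, data)) return;
    socket.to(data.roomId).emit(eventName, data);
  });
}
```

Inside the connection handler this could replace the three relay lines with a loop such as: for (const name of ['offer', 'answer', 'candidate']) relay(socket, name);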

If you run raw ws instead of Socket.IO, you wire the same pattern by hand: each node SUBSCRIBEs to a room:<id> channel and PUBLISHes inbound messages, forwarding anything it receives to its local sockets for that room.

// Raw ws + node-redis: hand-rolled fan-out across nodes
const { randomUUID } = require('node:crypto');
const { createClient } = require('redis');

const sub = createClient({ url: process.env.REDIS_URL });
const pub = sub.duplicate();
const nodeId = randomUUID(); // process.pid can collide across hosts and containers
const localRooms = new Map(); // Map<roomId, Set<ws>>: this node's sockets only

// Call once at startup, before accepting sockets (CommonJS has no top-level await)
async function connectRedis() {
  await Promise.all([sub.connect(), pub.connect()]);
}

function joinRoom(roomId, ws) {
  if (!localRooms.has(roomId)) {
    localRooms.set(roomId, new Set());
    // First local member: subscribe this node to the room's channel
    sub.subscribe(`room:${roomId}`, (raw) => {
      const { payload, origin } = JSON.parse(raw);
      if (origin === nodeId) return; // routeToRoom already delivered this locally
      for (const peer of localRooms.get(roomId) ?? [])
        if (peer.readyState === peer.OPEN) peer.send(payload);
    }).catch(console.error);
  }
  localRooms.get(roomId).add(ws);
}

function routeToRoom(roomId, payload, sender) {
  // Deliver locally, then fan out to other nodes via Redis
  for (const peer of localRooms.get(roomId) ?? [])
    if (peer !== sender && peer.readyState === peer.OPEN) peer.send(payload);
  pub.publish(`room:${roomId}`, JSON.stringify({ payload, origin: nodeId }))
    .catch(console.error);
}
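
The hand-rolled version above subscribes on the first local join but never unsubscribes, so a long-lived node keeps receiving traffic for rooms it no longer hosts. A minimal sketch of the cleanup follows, assuming the same Map<roomId, Set<ws>> shape and a subscriber whose unsubscribe(channel) returns a promise, as node-redis v4 clients do. createLeaveRoom and the fake subscriber are hypothetical and exist only so the logic can be exercised without Redis.

```javascript
// Hypothetical factory so the cleanup can be exercised with a fake subscriber.
function createLeaveRoom(localRooms, sub) {
  return function leaveRoom(roomId, ws) {
    const members = localRooms.get(roomId);
    if (!members) return;
    members.delete(ws);
    if (members.size > 0) return; // other local sockets still need the channel
    // Last local member left: drop the room and stop receiving its traffic.
    localRooms.delete(roomId);
    Promise.resolve(sub.unsubscribe(`room:${roomId}`)).catch(console.error);
  };
}

// Demonstration with a fake subscriber that records the channels it drops.
const dropped = [];
const fakeSub = { unsubscribe: async (channel) => { dropped.push(channel); } };
const demoRooms = new Map([['room-1', new Set(['ws-a', 'ws-b'])]]);
const leaveRoom = createLeaveRoom(demoRooms, fakeSub);

leaveRoom('room-1', 'ws-a'); // one member remains, still subscribed
leaveRoom('room-1', 'ws-b'); // room is empty, channel is dropped
```

In a real server this would be called from each socket's close handler, once for every room the socket had joined.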

Reproduction Steps & Debugging Log Patterns

  1. Start Redis locally (redis-server) and launch two signalling nodes on ports 3000 and 3001, both pointed at the same REDIS_URL.
  2. Put a load balancer in front (or open two browser tabs that connect directly, one to each port), and join both clients to room-1.
  3. In a third terminal, watch the traffic with redis-cli PSUBSCRIBE '*'. When tab A emits an offer you should see the adapter publish on the room's Socket.IO channel:
psubscribe socket.io#/signaling#room-1#
pmessage   socket.io#/signaling#room-1#  
[node :3001] relayed offer to room-1 (1 local member)
  4. Confirm tab B (on the other node) receives the offer. If it does not, the two nodes are not on the same backplane: check that both connected to the same Redis instance and that io.adapter(...) ran before any connection was accepted.
  5. Kill node 3000 mid-call. Tab A’s client should reconnect (Socket.IO backoff) and the load balancer should route it to node 3001; on reconnect it re-emits join, which registers it in node 3001's room map and restores fan-out. Watch iceConnectionState stay connected: the media path is independent of which signalling node you land on.
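
The failover step above depends on the client re-emitting join after every reconnect. A minimal sketch of that client behaviour follows, assuming a Socket.IO v3+ client, where the connect event fires on the initial connection and again after each successful reconnection. rejoinOnConnect is a hypothetical helper, and the plain object below only stands in for a real client socket.

```javascript
// Hypothetical helper: (re)join the room every time the transport comes up.
function rejoinOnConnect(socket, roomId) {
  socket.on('connect', () => socket.emit('join', roomId));
}

// Stand-in for a client socket: stores handlers and records outgoing events.
const handlers = {};
const sent = [];
const fakeSocket = {
  on: (event, fn) => { handlers[event] = fn; },
  emit: (event, ...args) => { sent.push([event, ...args]); },
};

rejoinOnConnect(fakeSocket, 'room-1');
handlers.connect(); // initial connection
handlers.connect(); // reconnection after the original node was killed
```

With a real client the same helper would be called once, right after creating the socket, so the room membership is rebuilt on whichever node the load balancer picks.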

Common Implementation Mistakes

FAQ

Do I still need sticky sessions once the Redis adapter is installed?

For pure WebSocket transport, the adapter alone makes fan-out correct, so stickiness is optional. But keep it on if you allow the long-polling fallback, since polling requires every request from a client to reach the same node. Stickiness plus the adapter is the safe default.

How many concurrent connections before I need this?

A single Node process comfortably handles thousands of idle signalling sockets, but signalling is bursty: calls connect in waves. Add a second node and the Redis backplane as soon as you need redundancy or exceed the connection budget of one process; the change is a few lines once you are already on Socket.IO.

Does Redis Pub/Sub guarantee my offer is delivered?

No. It is best-effort with no persistence. That is fine because the signalling layer already recovers from loss: the client reconnects and re-emits its description, and the media plane re-gathers candidates on an ICE restart. If you need at-least-once delivery, Redis Streams or a durable bus like NATS JetStream is the heavier alternative.

Related: this builds on WebSocket Signaling with Node.js & Socket.IO, sits under the WebSocket Signaling Implementation guide, and pairs with Signaling State Machine Patterns for modelling reconnect-and-rejoin transitions across nodes.