Sharding Rooms Across SFU Nodes

This page is part of the Load Balancing & Scaling SFUs guide and answers one specific question: given a pool of SFU nodes and a stream of joining participants, how do you decide which node owns a given room, and how do you keep that decision stable, recoverable, and cheap to look up from every signaling instance at once?

The decision is a routing one, not a media one. Every participant of a room must resolve to the same node so the SFU can forward locally; the only hard parts are choosing the node deterministically, storing the mapping where all signaling instances can read it, and keeping rooms reachable when a node disappears or the pool changes size.

Context & Trade-offs

There are two viable assignment strategies, and they trade rebalance cost against placement quality.

Consistent hashing maps roomId to a node by hashing the room onto a ring of nodes. It needs no shared state to compute a placement (any instance hashing the same roomId against the same node list lands on the same node), and when a node is added or removed, only about 1/N of rooms remap rather than all of them. The cost is that it is capacity-blind: hashing ignores that one 50-person room costs 50 × 49 = 2,450 forwarded streams while a 2-person room costs 2 × 1 = 2, so a purely hash-based ring can pile heavy rooms onto one node.

Explicit affinity in a shared store records room → node once at first-join, chosen by live egress headroom, and pins every later joiner to it. This places rooms by real capacity but requires every lookup to hit the store and requires an explicit rebalance plan on node failure.

Most production tiers use a hybrid: consistent hashing to pick a candidate node cheaply, overridden by an explicit stored mapping when the hashed node is over its egress ceiling. The mapping lives in Redis — the same instance already coordinating Scaling WebSocket Signaling with Redis Pub/Sub — so a room lookup is a sub-millisecond GET shared across every signaling node. Keeping a room on one node matters because the alternative, splitting it, forces a cascade hop whose cost is detailed in SFU vs MCU Cost & Quality Trade-offs; cascade only when a room outgrows a single node.

The numbers that should drive the choice are concrete. A consistent-hash lookup is pure CPU: a single SHA-1 digest and a binary search over the ring, well under 50 µs, with no network round trip. That is why it makes a good first guess even when the authoritative answer lives in Redis. The Redis GET that confirms or pins affinity adds the network hop, typically sub-millisecond on a co-located instance. That lookup sits on the join path of every participant, so it must stay a single round trip; never fan it out into a multi-key transaction per join. The TTL on the stored mapping (3600 s in the implementation below) exists only to garbage-collect rooms that ended without a clean teardown; refresh it on activity so a long meeting never loses its pin mid-call. Set the vnode count once and leave it: 160 virtual nodes per physical node keeps per-node room-count spread under ~7% on a pool of 3–10 nodes, which is the regime most SFU tiers run in.
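The refresh-on-activity rule can be sketched as a small helper. This is a minimal sketch, assuming an ioredis-style client whose expire(key, seconds) resolves to 1 when the key exists and 0 when it does not; touchRoomPin and PIN_TTL_SECONDS are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: extend the pin's TTL whenever the room shows activity
// (a join, a leave, a periodic keepalive), so a long call never loses its pin.
const PIN_TTL_SECONDS = 3600; // same TTL the first-join claim uses

async function touchRoomPin(redis, roomId, ttlSeconds = PIN_TTL_SECONDS) {
  // EXPIRE resets the countdown without rewriting the value. A reply of 0
  // means the key no longer exists, so the room has to be re-resolved.
  const reply = await redis.expire(`room:${roomId}:node`, ttlSeconds);
  return reply === 1;
}
```

Calling this from a periodic room keepalive rather than from every join keeps the join path at a single GET.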

Minimal Runnable Implementation

The core is a hash ring with virtual nodes (so load spreads evenly), wrapped by a Redis-backed mapping that pins affinity and lets capacity override the hashed choice.

const crypto = require('crypto');

// Consistent-hash ring with virtual nodes for even spread across the pool.
class HashRing {
  constructor(nodeIds, vnodes = 160) {
    this.ring = [];                              // sorted [hash, nodeId] pairs
    for (const id of nodeIds) {
      for (let v = 0; v < vnodes; v++) {
        this.ring.push([this._hash(`${id}#${v}`), id]); // many points per node
      }
    }
    this.ring.sort((a, b) => a[0] - b[0]);
  }
  _hash(key) {
    // 32-bit unsigned from the first 4 bytes of a SHA-1 digest.
    return crypto.createHash('sha1').update(key).digest().readUInt32BE(0);
  }
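  // First node clockwise from the room's hash owns it (binary search, O(log n)).
  nodeFor(roomId) {
    if (this.ring.length === 0) return undefined; // empty pool: nothing to route to
    const h = this._hash(roomId);
    let lo = 0, hi = this.ring.length;
    while (lo < hi) {                            // find the first point >= h
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid][0] < h) lo = mid + 1; else hi = mid;
    }
    return this.ring[lo === this.ring.length ? 0 : lo][1]; // wrap around the ring
  }
  // Drop every vnode of a node, e.g. after it is declared dead.
  remove(nodeId) {
    this.ring = this.ring.filter(([, id]) => id !== nodeId);
  }
}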

// Resolve a room to a node: prefer the stored affinity, else hash, else
// override the hashed node if it has no egress headroom for this room.
async function resolveNode(roomId, expectedSize, redis, ring, nodes) {
  const pinned = await redis.get(`room:${roomId}:node`);
  if (pinned) return pinned;                     // affinity wins — keep room whole

  let candidate = ring.nodeFor(roomId);          // cheap, stateless first guess
  const node = nodes.get(candidate);
  if (!node?.healthy || !hasHeadroom(node, expectedSize)) {
    candidate = leastLoadedHealthy(nodes, expectedSize); // capacity override
  }
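  // Claim atomically so two concurrent first-joins can't split the room.
  const ok = await redis.set(`room:${roomId}:node`, candidate, 'NX', 'EX', 3600);
  if (!ok) return redis.get(`room:${roomId}:node`); // lost race → read winner
  // Record the reverse index that rebalanceAfterFailure reads on node death.
  await redis.sadd(`node:${candidate}:rooms`, roomId);
  return candidate;
}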

The NX (set-if-not-exists) claim is what guarantees room affinity under concurrent joins: if two participants of a brand-new room arrive on two signaling instances at once, exactly one SET NX wins and both instances converge on the winner’s node. The hasHeadroom and leastLoadedHealthy helpers reuse the per-node egress accounting from the parent Load Balancing & Scaling SFUs guide.
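The two helpers are defined in the parent guide, so what follows is only a hedged sketch of what they might look like. It assumes a hypothetical node shape of { id, healthy, egressCeiling, egressInUse }, with both egress figures counted in forwarded streams, and it uses the n × (n − 1) room cost from the Context section. The field names and roomEgressCost are illustrative, not taken from the parent guide.

```javascript
// Assumed node shape (hypothetical): { id, healthy, egressCeiling, egressInUse },
// with both egress figures counted in forwarded streams.

// An n-person room forwards each participant's stream to the other n - 1.
function roomEgressCost(size) {
  return size * Math.max(size - 1, 0);
}

// True when the node can absorb the room without crossing its egress ceiling.
function hasHeadroom(node, expectedSize) {
  return node.egressInUse + roomEgressCost(expectedSize) <= node.egressCeiling;
}

// Id of the healthy node with the most spare egress that still fits the room,
// or undefined when nothing fits.
function leastLoadedHealthy(nodes, expectedSize) {
  let best;
  for (const node of nodes.values()) {
    if (!node.healthy || !hasHeadroom(node, expectedSize)) continue;
    const spare = node.egressCeiling - node.egressInUse;
    if (!best || spare > best.egressCeiling - best.egressInUse) best = node;
  }
  return best?.id;
}
```

When leastLoadedHealthy returns undefined, no single node can hold the room; a caller would need to reject the join or fall back to cascading rather than pin an undefined owner.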

Node failure is the other half of the runtime, and it needs its own path because a dead node’s mappings still point at it. A failure detector — heartbeats on the same shared store, flipping a node unhealthy after roughly 3 missed beats — triggers re-resolution for exactly the rooms that node owned, not the whole pool. The rebalance must be surgical: clear only room:*:node entries that resolve to the dead node and re-run resolveNode for each, which (because the dead node is now absent from nodes and the ring) routes them onto healthy nodes and signals their clients to reconnect.

// On node death, re-home only that node's rooms — never flush the whole map.
async function rebalanceAfterFailure(deadNodeId, redis, ring, nodes) {
  ring.remove(deadNodeId);                       // drop its vnodes from the ring
  nodes.delete(deadNodeId);                      // and from the candidate pool
  const rooms = await redis.smembers(`node:${deadNodeId}:rooms`);
  for (const roomId of rooms) {
    const pin = await redis.get(`room:${roomId}:node`);
    if (pin !== deadNodeId) continue;            // expired or already re-homed
    await redis.del(`room:${roomId}:node`);      // clear stale pin to the dead node
    const size = await redis.scard(`room:${roomId}:peers`);
    const target = await resolveNode(roomId, size, redis, ring, nodes); // re-pin
    await signalReconnect(roomId, target);       // clients ICE-restart onto target
  }
  await redis.del(`node:${deadNodeId}:rooms`);   // drop the dead node's room set
}

Removing the dead node’s vnodes from the ring before re-resolving is what bounds the disruption to that node’s share — rooms on surviving nodes keep hashing to where they already are, so only the failed node’s rooms move. That is the same 1/N property that makes consistent hashing worth using over a plain modulo, now applied to failure instead of scale-out.

Reproduction Steps & Debugging Log Patterns

Start a 3-node ring and assign 1,000 synthetic rooms. Log the per-node room count; with 160 vnodes each node should hold roughly 310–360 rooms.

[ring] nodes=3 vnodes=160 rooms=1000
[ring] distribution node-a=341 node-b=338 node-c=321  (spread 6.2%)
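A script along these lines could produce that distribution line. It is a self-contained sketch that reuses the same hashing scheme as the ring (first 4 bytes of SHA-1, 160 vnodes); the synthetic room ids r-0 … r-999 and the spread definition (max − min) / min are assumptions, so the exact counts will differ from the log above.

```javascript
const crypto = require('crypto');

// Same hashing scheme as the ring: first 4 bytes of SHA-1 as a uint32.
const hash32 = (key) =>
  crypto.createHash('sha1').update(key).digest().readUInt32BE(0);

// Build sorted [hash, nodeId] points, `vnodes` per node.
function buildRing(nodeIds, vnodes = 160) {
  const ring = [];
  for (const id of nodeIds) {
    for (let v = 0; v < vnodes; v++) ring.push([hash32(`${id}#${v}`), id]);
  }
  return ring.sort((a, b) => a[0] - b[0]);
}

// First point clockwise from the key's hash, wrapping to the start.
// A linear scan is fine for a one-off measurement script.
function owner(ring, key) {
  const h = hash32(key);
  const hit = ring.find(([point]) => point >= h);
  return (hit ?? ring[0])[1];
}

// Count rooms per node for synthetic ids r-0 .. r-(rooms - 1).
function distribution(nodeIds, rooms = 1000, vnodes = 160) {
  const counts = Object.fromEntries(nodeIds.map((id) => [id, 0]));
  const ring = buildRing(nodeIds, vnodes);
  for (let i = 0; i < rooms; i++) counts[owner(ring, `r-${i}`)] += 1;
  return counts;
}

const counts = distribution(['node-a', 'node-b', 'node-c']);
const values = Object.values(counts);
const spread = (Math.max(...values) - Math.min(...values)) / Math.min(...values);
console.log('[ring] distribution', counts, `spread ${(spread * 100).toFixed(1)}%`);
```

Re-running with vnodes set to 1 is a quick way to see how much the virtual nodes contribute to an even spread.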

Add a 4th node and re-resolve only unpinned rooms. Confirm that pinned rooms stay put and only about 1/N of unpinned rooms move.

[ring] node added=node-d  remapped=247/1000 (24.7%, ~1/4 expected)
[ring] pinned rooms unchanged=612  (affinity held)
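The remap fraction can be measured by comparing placements before and after the pool grows. The sketch below models only unpinned rooms (pinned rooms never consult the ring again) and uses synthetic ids r-0 … r-999; remapOnAdd is a hypothetical helper name, and the measured count will not match the log line exactly.

```javascript
const crypto = require('crypto');

// Same hashing scheme as the ring: first 4 bytes of SHA-1 as a uint32.
const hash32 = (key) =>
  crypto.createHash('sha1').update(key).digest().readUInt32BE(0);

function buildRing(nodeIds, vnodes = 160) {
  const ring = [];
  for (const id of nodeIds) {
    for (let v = 0; v < vnodes; v++) ring.push([hash32(`${id}#${v}`), id]);
  }
  return ring.sort((a, b) => a[0] - b[0]);
}

function owner(ring, key) {
  const h = hash32(key);
  const hit = ring.find(([point]) => point >= h);
  return (hit ?? ring[0])[1];
}

// List every room whose hashed owner changes when `added` joins the pool.
function remapOnAdd(before, added, rooms = 1000) {
  const oldRing = buildRing(before);
  const newRing = buildRing([...before, added]);
  const moved = [];
  for (let i = 0; i < rooms; i++) {
    const id = `r-${i}`;
    const from = owner(oldRing, id);
    const to = owner(newRing, id);
    if (from !== to) moved.push({ id, from, to });
  }
  return moved;
}

const moved = remapOnAdd(['node-a', 'node-b', 'node-c'], 'node-d');
console.log(`[ring] node added=node-d remapped=${moved.length}/1000`);
// Moves should only ever land on the new node; nothing shuffles between old nodes.
console.log('all moves target node-d:', moved.every((m) => m.to === 'node-d'));
```

The second check is the useful one when debugging: a room that moves between two pre-existing nodes points to a ring that was rebuilt with different vnode keys or a different hash.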

Kill a node holding live rooms. Watch the failure detector flip it unhealthy and re-resolution route its rooms elsewhere.

[health] node-b missed 3 heartbeats → marked DOWN
[shard] room=r-8842 owner=node-b unreachable → re-resolving
[shard] room=r-8842 reassigned owner=node-c  (clients signaled to reconnect)
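The detector behind the first log line could be as small as the sketch below. It assumes each node's most recent heartbeat timestamp has already been read from the shared store into a Map; the 1-second beat period, missedBeats, and findDownNodes are all hypothetical and only illustrate the "3 missed beats" rule.

```javascript
// Hypothetical heartbeat bookkeeping: lastBeat maps nodeId -> timestamp (ms)
// of the most recent heartbeat seen in the shared store.
const HEARTBEAT_INTERVAL_MS = 1000; // assumed beat period, not from the guide
const MISSED_BEATS_LIMIT = 3;       // "roughly 3 missed beats"

// Number of whole beat intervals that have elapsed since the last heartbeat.
function missedBeats(lastBeatMs, nowMs, intervalMs = HEARTBEAT_INTERVAL_MS) {
  return Math.max(0, Math.floor((nowMs - lastBeatMs) / intervalMs));
}

// Nodes whose silence has reached the limit and should be marked DOWN.
function findDownNodes(lastBeat, nowMs) {
  const down = [];
  for (const [nodeId, beatMs] of lastBeat) {
    if (missedBeats(beatMs, nowMs) >= MISSED_BEATS_LIMIT) down.push(nodeId);
  }
  return down;
}

const now = 10_000;
const lastBeat = new Map([
  ['node-a', now - 400],  // fresh
  ['node-b', now - 3200], // three full intervals without a beat
  ['node-c', now - 2900], // two missed, still within tolerance
]);
for (const id of findDownNodes(lastBeat, now)) {
  console.log(`[health] ${id} missed ${MISSED_BEATS_LIMIT} heartbeats → marked DOWN`);
}
```

With the sample data only node-b crosses the threshold, and each id returned is what would be handed to rebalanceAfterFailure.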

Confirm clients reconnect via ICE restart and re-pin. Expect a brief renegotiation, bounded at 3 attempts.

[client r-8842] connectionState=failed → iceRestart attempt 1/3
[shard] room=r-8842 re-pinned owner=node-c ttl=3600
[client r-8842] connectionState=connected  (migrated, 2.1s)
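On the client, the 3-attempt bound can be enforced with a small wrapper around the standard RTCPeerConnection events. This is a sketch under assumptions: attachBoundedIceRestart and onGiveUp are hypothetical names, and the application is assumed to already handle negotiationneeded by sending a fresh offer through signaling, since restartIce() only requests the restart.

```javascript
// Hypothetical client-side helper: on 'failed', request an ICE restart, at
// most maxAttempts times per outage. The counter resets once the connection
// is back, so a later outage gets a fresh budget.
function attachBoundedIceRestart(pc, { maxAttempts = 3, onGiveUp = () => {} } = {}) {
  let attempts = 0;
  pc.addEventListener('connectionstatechange', () => {
    if (pc.connectionState === 'connected') {
      attempts = 0;
    } else if (pc.connectionState === 'failed') {
      if (attempts >= maxAttempts) return onGiveUp();
      attempts += 1;
      console.log(`connectionState=failed → iceRestart attempt ${attempts}/${maxAttempts}`);
      pc.restartIce(); // fires 'negotiationneeded'; the app then re-offers
    }
  });
}
```

onGiveUp is where a full rejoin (new peer connection, new resolveNode lookup) would go once the restart budget is spent.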

Verify no orphaned mapping survives: once the dead node is gone, no room:*:node entry should still name it, and every remaining entry should point at a live node.
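That check can be automated by walking the pins and comparing each owner against the live pool. The sketch assumes an ioredis-style client, where scan(cursor, 'MATCH', pattern, 'COUNT', n) resolves to [nextCursor, keys]; findOrphanedPins is a hypothetical name.

```javascript
// Walk room:*:node with SCAN (not KEYS, which blocks a live instance) and
// collect every pin whose owner is not in the set of live node ids.
async function findOrphanedPins(redis, liveNodeIds) {
  const live = new Set(liveNodeIds);
  const orphans = new Map(); // keyed by pin so SCAN duplicates collapse
  let cursor = '0';
  do {
    // Assumed ioredis-style reply: [nextCursor, keys]
    const [next, keys] = await redis.scan(cursor, 'MATCH', 'room:*:node', 'COUNT', 200);
    for (const key of keys) {
      const owner = await redis.get(key);
      if (owner !== null && !live.has(owner)) orphans.set(key, owner);
    }
    cursor = next;
  } while (cursor !== '0');
  return [...orphans].map(([key, owner]) => ({ key, owner }));
}
```

An empty result after a failover means every surviving pin names a live node; any entry returned is a room that rebalanceAfterFailure missed.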

Common Implementation Mistakes

Hashing without virtual nodes. A bare ring with one point per node distributes rooms unevenly — variance can exceed 30% across a small pool. Use 100–200 vnodes per node so the distribution smooths out.
Capacity-blind hashing for large rooms. Pure consistent hashing ignores room size and can stack several heavy rooms on one node. Override the hashed choice with an egress-headroom check before pinning.
Storing the mapping only in node memory. If room → node lives on the signaling instance that assigned it, a failover or a second instance loses it and splits the room. Persist every mapping in Redis so all instances and any failover agree.
No atomic claim on first-join. Without SET NX, two concurrent first-joins race onto two different nodes and the room is silently split across both. Always claim the mapping atomically.
Treating node failure as permanent placement loss. On node death, clear only that node’s mappings and re-resolve those rooms; don’t flush the whole map or remap healthy rooms, which would needlessly reconnect every call.
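The atomic-claim mistake is the easiest of these to reproduce locally. The sketch below uses FakeRedis, a hypothetical in-memory stand-in that models only the SET … NX behaviour, to show two instances with different candidates converging on one owner; claimRoom is an illustrative reduction of the claim step, not the full resolver.

```javascript
// Tiny in-memory stand-in for the one Redis behaviour that matters here:
// SET key value NX succeeds only for the first writer. Not a real client.
class FakeRedis {
  constructor() { this.store = new Map(); }
  async get(key) { return this.store.get(key) ?? null; }
  async set(key, value, ...flags) {
    if (flags.includes('NX') && this.store.has(key)) return null; // claim lost
    this.store.set(key, value);
    return 'OK';
  }
}

// Claim-or-read: whoever loses the NX race adopts the winner's node.
async function claimRoom(redis, roomId, candidate) {
  const key = `room:${roomId}:node`;
  const ok = await redis.set(key, candidate, 'NX', 'EX', 3600);
  return ok ? candidate : redis.get(key);
}

// Two signaling instances pick different candidates for the same new room.
async function demo() {
  const redis = new FakeRedis();
  const [a, b] = await Promise.all([
    claimRoom(redis, 'r-1', 'node-a'),
    claimRoom(redis, 'r-1', 'node-b'),
  ]);
  console.log(a === b ? `converged on ${a}` : `split: ${a} vs ${b}`);
  return [a, b];
}
demo();
```

Dropping the 'NX' flag from claimRoom makes the same demo report a split, which is the failure the mistake list describes.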

FAQ

Consistent hashing or an explicit Redis map — which should I use? Both, layered. Hash to pick a cheap stateless candidate, then store an explicit room → node mapping in Redis at first-join so affinity survives rebalances and so a capacity check can override the hashed node for heavy rooms. The hash bounds rebalance churn to 1/N; the stored map gives you capacity-aware, recoverable placement.

What happens to a room when its node dies? The failure detector marks the node down after missed heartbeats, the shard layer clears that node’s mappings and re-resolves its rooms onto healthy nodes, and affected clients reconnect via a bounded ICE restart (max 3 attempts, 3–5 s fallback). Because state lives in Redis, any signaling instance can perform the reassignment.

Should a large room ever span multiple nodes? Only once it exceeds a single node’s egress ceiling. Up to that point, keep it on one node so forwarding stays local; beyond it, cascade the room across nodes — one relay hop per node rather than a full per-participant split — as described in the parent guide.

Related: this sits under Load Balancing & Scaling SFUs; pair it with SFU vs MCU Cost & Quality Trade-offs for the per-node cost model and Scaling WebSocket Signaling with Redis Pub/Sub for the shared store that holds the room map.
