SFU vs MCU Cost & Quality Trade-offs

Picking between a forwarding server and a mixing server is ultimately a cost-per-participant-hour decision crossed with a quality-and-flexibility constraint. This page is part of the SFU vs MCU Topologies guide, and it answers one concrete question: for a room of a given size and a given client mix, which topology costs less to run, and what quality do you trade to get there? The answer turns on four levers (server CPU, bandwidth math, layout flexibility, and mobile decode limits), which together set your dollar cost per participant-hour.

Context & Trade-offs

The two topologies move cost between CPU and bandwidth, and the cheaper option flips with room shape.

Server CPU. An SFU forwards encoded RTP and never decodes the payload, so its per-participant CPU is a few percent of a core — header rewrites, RTCP translation, and SRTP crypto. An MCU decodes every inbound stream, composites, and re-encodes one output per layout. A single 720p30 decode plus a share of a 720p encode is on the order of 0.2–0.5 of a modern core per active publisher. Concretely, one commodity 16-core cloud instance forwards roughly 500–1,000 simultaneous SFU streams, but mixes only 30–80 MCU publisher-inputs before saturating — a 10–20× CPU density gap.
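
The MCU density range follows from dividing the core count by the per-input CPU share. A minimal sketch, assuming the per-publisher costs quoted above are representative (streamsPerInstance is an illustrative name, and the inputs are the text’s estimates, not measurements):

```javascript
// Rough capacity estimate: how many streams one instance can carry
// before its CPU saturates, given an assumed per-stream core share.
function streamsPerInstance(cores, coresPerStream) {
  return Math.floor(cores / coresPerStream);
}

const CORES = 16;
const mcuLow = streamsPerInstance(CORES, 0.5);   // heaviest mixing cost per input
const mcuHigh = streamsPerInstance(CORES, 0.2);  // lightest mixing cost per input
console.log(`MCU inputs per ${CORES}-core instance: ${mcuLow} to ${mcuHigh}`);
```

The result brackets the 30–80 mixed inputs quoted above; swap in a measured per-input share once you have one.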

Bandwidth math. Define per-participant uplink U (what a sender pushes, e.g. 1.5 Mbps for 720p) and downlink D (what a viewer pulls). In an SFU room of N participants, the server ingests N·U and egresses up to N·(N-1)·U — quadratic, the SFU’s dominant cost. Each client downloads up to (N-1)·U. In an MCU room the server still ingests N·U, but egresses only N·D_mix where D_mix is one composited stream (e.g. 2 Mbps), and each client downloads exactly D_mix regardless of N. So for a 10-party call at 1.5 Mbps uplink: the SFU server pushes up to ~135 Mbps; the MCU server pushes ~20 Mbps but burns the CPU to make it.
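
The formulas above can be checked with a few lines of arithmetic. This is a sketch only; roomBandwidth is an illustrative name, and it assumes every participant publishes and subscribes to everyone else at full rate.

```javascript
// Server and client bandwidth for one room, in Mbps.
// n: participants, u: per-sender uplink (U), dMix: one composited MCU stream (D_mix).
function roomBandwidth(n, u, dMix) {
  return {
    sfu: {
      serverIngress: n * u,
      serverEgress: n * (n - 1) * u,  // every sender forwarded to every other participant
      clientDownlink: (n - 1) * u,
    },
    mcu: {
      serverIngress: n * u,
      serverEgress: n * dMix,         // one mixed stream per participant
      clientDownlink: dMix,           // independent of n
    },
  };
}

const room = roomBandwidth(10, 1.5, 2.0);
console.log(room.sfu.serverEgress);  // 135
console.log(room.mcu.serverEgress);  // 20
```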

Layout flexibility. The SFU hands every client the original streams, so layout is a client-side CSS/canvas decision — any grid, any pinned speaker, per-user, free. The MCU bakes the layout into the encode, so every distinct arrangement is a separate server-side encode. Uniform layouts are nearly free on an MCU; per-user custom views erase its single-encode advantage.
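
One way to see the encode cost is to count distinct layouts. A minimal sketch, assuming the MCU can share one encode among all viewers who request the same layout; the function name and the layout keys are hypothetical.

```javascript
// Count how many encodes an MCU needs: one per distinct layout requested.
// viewerLayouts maps a viewer id to a layout key (both illustrative).
function mcuEncodeCount(viewerLayouts) {
  return new Set(Object.values(viewerLayouts)).size;
}

const uniform = { alice: 'grid', bob: 'grid', carol: 'grid' };
const custom = { alice: 'pin:bob', bob: 'pin:carol', carol: 'grid' };
console.log(mcuEncodeCount(uniform));  // 1
console.log(mcuEncodeCount(custom));   // 3
```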

Mobile decode limits. Phones cap concurrent hardware video decoders, commonly at 1–3. An SFU forwarding N-1 streams overruns that cap on large rooms, forcing software decode (CPU and battery spikes) or dropped frames. An MCU sends exactly one stream, so any client decodes the whole room with a single decoder — its decisive quality win on low-power endpoints.
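
The overflow is simple arithmetic once the decoder budget is known. A minimal sketch, assuming the client’s hardware decoder count is a single known number (decodeLoad is an illustrative name; real devices may also limit decoders by codec or resolution):

```javascript
// Split the N-1 streams an SFU client receives into those that fit the
// hardware decoder budget and those that overflow it.
function decodeLoad(n, hwDecoders) {
  const streams = Math.max(n - 1, 0);
  const hardware = Math.min(streams, hwDecoders);
  return { streams, hardware, overflow: streams - hardware };
}

console.log(decodeLoad(6, 3));  // { streams: 5, hardware: 3, overflow: 2 }
```

Any non-zero overflow is a stream that must be software-decoded or dropped.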

There is a fifth lever that hides inside the other four: latency. An SFU adds only RTP-header rewrite plus network transit, so it contributes little beyond the path RTT you already measure on the candidate-pair report. An MCU routes every frame through decode → composite → re-encode, adding roughly 30–150 ms depending on codec, resolution, and how deep the jitter buffer is set. For conversational calls that delay is felt; for a one-way broadcast it is invisible. Treat latency as a quality cost that the dollar model does not capture: two topologies can cost the same per participant-hour and still deliver very different interactivity. The connectivity tier — ICE, STUN, and TURN relays from WebRTC Protocol Stack & Signaling Servers — adds the same baseline latency to both, so the encode/decode hop is the only differentiator you control here.
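
The path RTT mentioned above can be read from the candidate-pair entry of a WebRTC stats report. The sketch below assumes you already hold a report, for example from `await pc.getStats()`; pathRttMs is an illustrative name, and the sample Map is hand-built so the snippet runs outside a browser.

```javascript
// Extract the current path RTT in milliseconds from a stats report.
// `report` is any Map-like object whose forEach yields stats objects.
function pathRttMs(report) {
  let rtt = null;
  report.forEach((stat) => {
    if (stat.type === 'candidate-pair' && stat.nominated &&
        stat.state === 'succeeded' &&
        typeof stat.currentRoundTripTime === 'number') {
      rtt = Math.round(stat.currentRoundTripTime * 1000);  // field is reported in seconds
    }
  });
  return rtt;
}

// Hand-built sample standing in for a real RTCStatsReport.
const sampleReport = new Map([
  ['cp1', { type: 'candidate-pair', nominated: true, state: 'succeeded', currentRoundTripTime: 0.042 }],
]);
console.log(pathRttMs(sampleReport));  // 42
```

Add the MCU’s 30–150 ms pipeline estimate on top of this figure to approximate what a mixed call would feel like on the same path.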

Lever                          | SFU                           | MCU
Server CPU / publisher         | ~0.02–0.05 core (forward)     | ~0.2–0.5 core (decode+encode)
Streams/instance (16-core)     | ~500–1,000 forwarded          | ~30–80 mixed inputs
Server egress (N=10 @1.5 Mbps) | up to ~135 Mbps               | ~20 Mbps
Client downlink                | up to (N-1)·U                 | one fixed D_mix
Client decoders                | up to N-1                     | exactly 1
Layout                         | client-side, free             | server-encoded per layout
$ / participant-hour (N≈10)    | ~$0.002–0.006 (bandwidth-led) | ~$0.01–0.03 (CPU-led)

The dollar figures are order-of-magnitude cloud estimates: SFU cost tracks egress (priced per GB), MCU cost tracks compute (priced per core-hour). For interactive 5–20 party calls the SFU is usually 3–5× cheaper per participant-hour; for a large passive audience on one fixed layout, the MCU’s shared single encode wins decisively.

Two structural facts make the table behave non-linearly. First, the SFU’s egress term is N·(N-1), so doubling room size roughly quadruples server bandwidth: the curve is gentle at 4 participants and brutal at 40. Second, the MCU’s CPU term is linear in publishers but steps with layouts: one shared grid is one encode, but the moment each viewer wants a personalized active-speaker arrangement you pay N encodes, so each participant carries a full encode instead of a 1/N share, and the MCU gives up its single-encode advantage without gaining the SFU’s zero-encode benefit. The practical reading is that the SFU wins the small-and-interactive quadrant and the MCU wins the large-and-uniform quadrant, with a band in the middle where client capability, not dollars, decides. When you do outgrow a single node, neither cost model survives unchanged; horizontal fan-out and cascading introduce inter-node bandwidth that you plan for in Load Balancing & Scaling SFUs and the room-distribution patterns of Sharding Rooms Across SFU Nodes.

Minimal Runnable Implementation

A cost model that picks the cheaper topology from room shape and client capability.

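// Decide topology from room size, client decode budget, and cloud unit prices.
// Returns the cheaper option plus the estimated $/participant-hour.
// minClientDecoders is the smallest hardware-decoder count among the room's clients;
// it defaults to Infinity so the decode constraint applies only when you pass a value.
function chooseTopology({ n, uplinkMbps = 1.5, mixMbps = 2.0, minClientDecoders = Infinity }) {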
  const HOURS = 1;
  const GB_PER_HOUR = (mbps) => (mbps * 3600) / 8 / 1000;   // Mbps → GB/hour
  const PRICE_EGRESS_GB = 0.08;                              // $/GB cloud egress
  const PRICE_CORE_HOUR = 0.04;                              // $/core-hour

  // SFU: cost is server egress, ~N*(N-1) forwarded streams; CPU negligible
  const sfuEgressGb = GB_PER_HOUR(n * (n - 1) * uplinkMbps) * HOURS;
  const sfuCostPerPH = (sfuEgressGb * PRICE_EGRESS_GB) / n;

  // MCU: cost is CPU (decode all + encode one layout); egress is small
  const mcuCoresPerInput = 0.35;                             // decode + encode share
  const mcuCpuCost = n * mcuCoresPerInput * PRICE_CORE_HOUR * HOURS;
  const mcuEgressGb = GB_PER_HOUR(n * mixMbps) * HOURS;
  const mcuCostPerPH = (mcuCpuCost + mcuEgressGb * PRICE_EGRESS_GB) / n;

  // Hard constraint: phones that can't decode N-1 streams force MCU regardless of cost
  const sfuExceedsDecode = (n - 1) > minClientDecoders;

  const pickMcu = sfuExceedsDecode || mcuCostPerPH < sfuCostPerPH;
  return {
    topology: pickMcu ? 'MCU' : 'SFU',
    reason: sfuExceedsDecode ? 'client-decode-limit' : 'lower-cost',
    sfuCostPerPH: +sfuCostPerPH.toFixed(4),
    mcuCostPerPH: +mcuCostPerPH.toFixed(4)
  };
}

console.log(chooseTopology({ n: 8, minClientDecoders: 7 })); // capable clients: cost alone decides (MCU at these default prices)
console.log(chooseTopology({ n: 8, minClientDecoders: 1 })); // weak clients → MCU (client-decode-limit)
console.log(chooseTopology({ n: 200, mixMbps: 2.5 }));       // big audience → MCU

Reproduction Steps & Debugging Log Patterns

  1. Run the model across n = 2, 4, 8, 16, 32 and log sfuCostPerPH vs mcuCostPerPH to find the crossover room size for your prices.
  2. Stand up one SFU and one MCU instance, drive a synthetic 8-party room into each, and poll getStats() at 1 s intervals on the server, recording totalEncodeTime, framesEncoded, and egress bytes.
  3. On a target phone, subscribe to the SFU room and read inbound-rtp per video track; watch for framesDropped climbing once the decoder count exceeds the hardware ceiling.
  4. Compare measured server egress (SFU) and core utilization (MCU) against the model’s predictions and adjust the unit prices.
  5. Re-run with per-user layouts enabled on the MCU and confirm its CPU cost rises as each viewer’s layout adds its own encode.
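
Step 1 can be scripted as a sweep. The sketch below restates the model’s cost formulas divided through by n so that it runs on its own; costPerPH and findCrossover are illustrative names, and every default is one of the model’s assumed unit prices rather than a quoted cloud rate.

```javascript
// Per-participant-hour cost of each topology, using the model's formulas.
function costPerPH(n, prices = {}) {
  const {
    uplinkMbps = 1.5, mixMbps = 2.0,
    egressPerGb = 0.08, corePerHour = 0.04, coresPerInput = 0.35,
  } = prices;
  const gbPerHour = (mbps) => (mbps * 3600) / 8 / 1000;  // Mbps → GB/hour
  return {
    sfu: gbPerHour((n - 1) * uplinkMbps) * egressPerGb,  // each viewer pulls N-1 streams
    mcu: coresPerInput * corePerHour + gbPerHour(mixMbps) * egressPerGb,
  };
}

// Log both costs for every size; return the first size where the MCU is cheaper.
function findCrossover(sizes, prices) {
  let crossover = null;
  for (const n of sizes) {
    const { sfu, mcu } = costPerPH(n, prices);
    console.log(`n=${n} sfu=$${sfu.toFixed(4)}/ph mcu=$${mcu.toFixed(4)}/ph`);
    if (crossover === null && mcu < sfu) crossover = n;
  }
  return crossover;
}

const crossoverAtDefaults = findCrossover([2, 4, 8, 16, 32]);
console.log('first size where MCU is cheaper:', crossoverAtDefaults);
```

The crossover moves sharply with the egress price, so pass the unit prices from your own bill as the second argument, for example `findCrossover([2, 4, 8, 16, 32], { egressPerGb: 0.001 })`.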

Expected log shape (dollar values are illustrative and depend on the unit prices you plug in):

// n=4   sfu=$0.0018/ph  mcu=$0.0140/ph   pick=SFU
// n=16  sfu=$0.0072/ph  mcu=$0.0140/ph   pick=SFU
// n=32  sfu=$0.0149/ph  mcu=$0.0140/ph   pick=MCU (cost crossover)
// SFU server: totalEncodeTime=0.000s framesEncoded=0   // forwarding, never encodes
// MCU server: totalEncodeTime rising, cores≈n*0.35      // mixing pipeline active
// phone (SFU, n=6): inbound video tracks=5, framesDropped climbing  // decode-limit hit

A healthy SFU shows flat encode time and egress scaling quadratically; a healthy MCU shows encode time and core count scaling linearly with publishers. If an “SFU” reports rising totalEncodeTime, it is transcoding and its cost model has silently become MCU-like.
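
These checks can be automated by summarizing each polled report. A minimal sketch, assuming your server or client exposes a WebRTC-style stats report (many media servers expose their own metrics instead); the function names are illustrative and the sample Map is hand-built so the snippet runs anywhere.

```javascript
// Summarize the video counters used in the steps above from one stats report.
// `report` is any Map-like object whose forEach yields stats objects.
function summarizeVideoStats(report) {
  const sum = { framesEncoded: 0, totalEncodeTime: 0, bytesSent: 0,
                inboundVideoStreams: 0, framesDropped: 0 };
  report.forEach((s) => {
    if (s.kind !== 'video') return;
    if (s.type === 'outbound-rtp') {
      sum.framesEncoded += s.framesEncoded ?? 0;
      sum.totalEncodeTime += s.totalEncodeTime ?? 0;
      sum.bytesSent += s.bytesSent ?? 0;
    } else if (s.type === 'inbound-rtp') {
      sum.inboundVideoStreams += 1;
      sum.framesDropped += s.framesDropped ?? 0;
    }
  });
  return sum;
}

// A forwarding-only server should never accumulate encode work.
function looksLikeTranscoding(sum) {
  return sum.framesEncoded > 0 || sum.totalEncodeTime > 0;
}

// Hand-built sample standing in for a real report.
const sample = new Map([
  ['out1', { type: 'outbound-rtp', kind: 'video', framesEncoded: 0, totalEncodeTime: 0, bytesSent: 1200000 }],
  ['in1', { type: 'inbound-rtp', kind: 'video', framesDropped: 4 }],
  ['in2', { type: 'inbound-rtp', kind: 'audio' }],
]);
const summary = summarizeVideoStats(sample);
console.log(summary, looksLikeTranscoding(summary));
```

Polling once per second and subtracting consecutive summaries turns these cumulative counters into rates.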

Common Implementation Mistakes

FAQ

At what room size does the SFU stop being cheaper?

With typical cloud prices the crossover sits around 25–40 fully-subscribed participants, where the SFU’s quadratic egress finally exceeds the MCU’s linear CPU cost. Below that, the SFU is usually 3–5× cheaper per participant-hour; above it, or for large passive audiences on one layout, the MCU wins. Re-run the model with your own egress and core prices, since the crossover moves with them.

Does the MCU give better quality?

Only on constrained clients. The MCU guarantees a single decodable stream and consistent composition, which is strictly better for phones and embedded endpoints. On capable clients the SFU preserves the original per-stream quality and resolution with lower latency and no re-encode generation loss, so it generally looks better there.

How do I cut SFU bandwidth without switching to an MCU?

Forward fewer or lower layers per subscriber: cap visible tiles with active-speaker culling and let the server pick the right simulcast layer per client, as in Forwarding Simulcast Layers by Subscriber Bandwidth. That keeps SFU CPU economics while bending the quadratic egress curve down.
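
Both techniques reduce to a per-subscriber selection step. The sketch below is illustrative only: the function names, the audioLevel field, and the layer list are hypothetical, and a real SFU would drive this from its own speaker detection and bandwidth estimates.

```javascript
// Keep only the loudest `maxTiles` publishers for one subscriber.
// publishers: [{ id, audioLevel }], where a higher audioLevel means louder.
function cullToActiveSpeakers(publishers, maxTiles) {
  return [...publishers]
    .sort((a, b) => b.audioLevel - a.audioLevel)
    .slice(0, maxTiles)
    .map((p) => p.id);
}

// Pick the highest simulcast layer that fits the per-tile budget.
// layers must be sorted by ascending bitrate; falls back to the lowest layer.
function pickLayer(layers, budgetKbps) {
  let chosen = layers[0];
  for (const layer of layers) {
    if (layer.kbps <= budgetKbps) chosen = layer;
  }
  return chosen.rid;
}

const publishers = [
  { id: 'a', audioLevel: 0.1 }, { id: 'b', audioLevel: 0.8 },
  { id: 'c', audioLevel: 0.4 }, { id: 'd', audioLevel: 0.0 },
];
const layers = [{ rid: 'q', kbps: 150 }, { rid: 'h', kbps: 500 }, { rid: 'f', kbps: 1500 }];
console.log(cullToActiveSpeakers(publishers, 2));  // [ 'b', 'c' ]
console.log(pickLayer(layers, 600));               // h
```

With at most K tiles per subscriber, server egress is bounded by N·K·U instead of N·(N-1)·U, which is what flattens the curve.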

Related: return to SFU vs MCU Topologies, then design the forwarding path in Selective Forwarding Unit Design and plan for growth in Load Balancing & Scaling SFUs.