Bandwidth-Aware Layer Selection in an SFU

When a publisher sends three simulcast resolutions or an SVC stream with multiple spatial/temporal layers, the SFU must decide — per subscriber, continuously — which layer to forward. This guide is part of the Selective Forwarding Unit Design guide, and the exact problem it solves is this: given a subscriber’s transport-cc bitrate estimate that updates several times a second, pick the highest layer that fits inside that budget, switch cleanly when the budget moves, request a keyframe on every upswitch, and damp the decision with hysteresis so the stream does not oscillate.

Context & Trade-offs

The input signal is the subscriber’s available outgoing bitrate, derived from transport-wide congestion control feedback the way the Bandwidth Estimation & Congestion Control guide describes — updated roughly every 200 ms–1 s. The output is a layer index. The naive mapping (pick the highest layer whose nominal bitrate is below the estimate) oscillates badly: a momentary dip from 1.6 Mbps to 1.1 Mbps drops a subscriber from the 1.5 Mbps high layer to the 600 kbps mid layer, the freed headroom immediately reads as available again, and the SFU upswitches — producing a visible quality flap every few hundred milliseconds.
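The instability of the naive mapping is easy to reproduce in isolation. The sketch below applies it to the three target bitrates used in this guide; the function name naiveSelect and the sample trace are illustrative assumptions, not part of any SFU API.

```javascript
// Naive mapping: the highest layer whose nominal bitrate fits under the estimate.
// Layer bitrates are the targets used in this guide; the trace below is made up.
const LAYER_BITRATES = [150_000, 600_000, 1_500_000]; // low, mid, high (bps)

function naiveSelect(availableBps) {
  let chosen = 0; // the lowest layer is the floor
  for (let i = 0; i < LAYER_BITRATES.length; i++) {
    if (LAYER_BITRATES[i] <= availableBps) chosen = i;
  }
  return chosen;
}

// An estimate hovering around the 1.5 Mbps boundary.
const trace = [1_600_000, 1_100_000, 1_600_000, 1_400_000, 1_550_000];
const picks = trace.map(naiveSelect);

// Count how many times consecutive picks differ.
const switches = picks.filter((layer, i) => i > 0 && layer !== picks[i - 1]).length;
console.log(picks, switches);
```

With this trace every sample changes the selected layer: five estimates produce four switches, and in simulcast each of those would cost a keyframe.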

Two mechanisms tame this. Asymmetric thresholds: downswitch fast (protect the link the moment the estimate falls below the current layer’s bitrate) but upswitch slowly (require the estimate to clear the next layer’s bitrate by a margin and hold it). Hysteresis / dwell time: enforce a minimum time at a layer before another switch is allowed, commonly 2–3 s for upswitches, so a flapping estimate cannot drive a flapping stream. The cost of getting this wrong is concrete — every upswitch costs a keyframe (a large intra frame), so excessive switching both spikes bandwidth and undoes the savings simulcast was meant to deliver. The publisher-side encoding this depends on is configured per Simulcast & SVC Implementation, and the room-wide selection policy is the subject of Simulcast-Aware Forwarding.

Layer	Resolution	Target bitrate	Upswitch-into threshold	Downswitch-out threshold
Low	320×180	150 kbps	— (floor)	< 180 kbps
Mid	640×360	600 kbps	> 750 kbps held 2 s	< 660 kbps
High	1280×720	1500 kbps	> 1850 kbps held 2 s	< 1650 kbps

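The asymmetry is deliberate: the upswitch-into threshold sits roughly 25% above the layer’s target bitrate, while the downswitch-out threshold sits roughly 10% above it. The gap between the two (660–750 kbps for mid, 1650–1850 kbps for high) is a hold band: an estimate inside it keeps whichever layer is already selected, so no single estimate value can trigger both an upswitch and a downswitch across the same layer boundary.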
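One way to keep a threshold table honest is to check it mechanically. The following sketch, under the assumption that the table is held as an array of plain objects (the field names and the helper hysteresisBands are hypothetical), verifies that each non-floor layer satisfies bitrate < downOut < upInto and reports the resulting hold band and margins.

```javascript
// Threshold table from this guide, in bps. A null threshold means "not applicable".
const layers = [
  { name: 'low',  bitrate: 150_000,   upInto: null,      downOut: 180_000 },
  { name: 'mid',  bitrate: 600_000,   upInto: 750_000,   downOut: 660_000 },
  { name: 'high', bitrate: 1_500_000, upInto: 1_850_000, downOut: 1_650_000 },
];

// For each layer above the floor, the band between downOut and upInto is where
// the selector holds whatever layer it is already on.
function hysteresisBands(table) {
  return table.slice(1).map((layer) => {
    if (!(layer.bitrate < layer.downOut && layer.downOut < layer.upInto)) {
      throw new Error(`${layer.name}: expected bitrate < downOut < upInto`);
    }
    return {
      layer: layer.name,
      holdFrom: layer.downOut,
      holdTo: layer.upInto,
      upMarginPct: Math.round((layer.upInto / layer.bitrate - 1) * 100),
      downMarginPct: Math.round((layer.downOut / layer.bitrate - 1) * 100),
    };
  });
}

const bands = hysteresisBands(layers);
console.log(bands);
```

For the table above this reports a 25% upswitch margin for mid and about 23% for high, with a 10% downswitch margin on both. A table with downOut at or above upInto throws instead of silently producing a selector that can flap.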

One more variable shapes the table: how much the SFU trusts the estimate. The transport-cc bitrate is itself a smoothed quantity that already absorbs short bursts of loss, so layering a second long smoothing window on top of it makes the selector sluggish without improving stability. Prefer a thin moving median (3–5 samples) over the raw estimate to reject single-sample spikes, then let the dwell timer — not additional averaging — do the rest of the de-noising. On a 1 s feedback cadence a 5-sample median already spans the dwell window, so the two mechanisms compose rather than fight.
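The thin median described above can be sketched as a small sliding-window filter. The class name MedianFilter and the default window of 3 are assumptions for illustration; the averaging of the two middle values only applies while an odd-sized window is still warming up.

```javascript
// Sliding-window median over the most recent estimates.
// The window size is an assumption; the text suggests 3-5 samples.
class MedianFilter {
  constructor(windowSize = 3) {
    this.windowSize = windowSize;
    this.samples = [];
  }

  // Add a sample and return the median of the current window.
  push(bps) {
    this.samples.push(bps);
    if (this.samples.length > this.windowSize) this.samples.shift();
    const sorted = [...this.samples].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Even-sized windows (warm-up only, for an odd windowSize) average the two middle values.
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }
}

const filter = new MedianFilter(3);
const raw = [1_900_000, 1_900_000, 400_000, 1_900_000, 1_900_000]; // one spurious dip
const smoothed = raw.map((bps) => filter.push(bps));
console.log(smoothed);
```

The single 400 kbps sample never reaches the output, so it cannot trigger a downswitch. The trade-off is that a genuine drop is also delayed by one sample with a window of 3, which is one reason to keep the window short.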

Minimal Runnable Implementation

// Per-subscriber layer selector driven by transport-cc estimates.
// Downswitch immediately on starvation; upswitch only past a margin held for a dwell time.
class LayerSelector {
  constructor(layers, { upswitchHoldMs = 2000, requestKeyframe, simulcast = true }) {
    this.layers = layers;              // [{ index, bitrate, upInto, downOut }], ascending
    this.current = 0;                  // start at the lowest layer until the estimate proves headroom
    this.upHoldMs = upswitchHoldMs;
    this.requestKeyframe = requestKeyframe; // () => debounced PLI to the publisher
    this.simulcast = simulcast;        // simulcast: every switch needs a keyframe; SVC: upswitches only
    this.upCandidateSince = null;      // when the next-layer-up first became reachable; null = no hold running
    this.lastSwitch = 0;               // time of the most recent switch, kept for logging
  }

  // Call on every transport-cc estimate update (a few times per second).
  onEstimate(availableBps, now = Date.now()) {
    const cur = this.layers[this.current];

    // --- Downswitch: react immediately, no dwell time, to protect the link. ---
    // Steps down one layer per estimate update, using the CURRENT layer's exit threshold.
    if (availableBps < cur.downOut && this.current > 0) {
      this.current -= 1;
      this.upCandidateSince = null;    // cancel any pending upswitch
      this.lastSwitch = now;
      // Simulcast layers are independently encoded, so the lower layer needs its own keyframe.
      // An SVC lower layer is a decodable subset of the stream already being forwarded.
      if (this.simulcast) this.requestKeyframe();
      return this.current;
    }

    // --- Upswitch: require the NEXT layer's margin, held continuously for the dwell time. ---
    const next = this.layers[this.current + 1];
    if (next && availableBps > next.upInto) {
      if (this.upCandidateSince === null) this.upCandidateSince = now; // start the hold timer
      if (now - this.upCandidateSince >= this.upHoldMs) {
        this.current += 1;
        this.upCandidateSince = null;
        this.lastSwitch = now;
        this.requestKeyframe();        // MUST request a keyframe: new layer needs a fresh intra to decode
        return this.current;
      }
    } else {
      this.upCandidateSince = null;    // estimate dropped back below margin: reset the hold
    }

    return this.current;               // unchanged
  }
}

// Wire it to a subscriber. `sub` is the subscriber object in your SFU; requestKeyframe is the
// debounced PLI from the SFU's KeyframeRequester.
const selector = new LayerSelector(
  [
    { index: 0, bitrate: 150_000,   upInto: 0,         downOut: 180_000 },   // floor: downOut is never acted on
    { index: 1, bitrate: 600_000,   upInto: 750_000,   downOut: 660_000 },
    { index: 2, bitrate: 1_500_000, upInto: 1_850_000, downOut: 1_650_000 }
  ],
  { upswitchHoldMs: 2000, simulcast: true, requestKeyframe: () => sub.keyframeRequester.request('layer-switch') }
);

// Each layer's downOut is its own exit threshold (the table's "Downswitch-out" column):
// the selector leaves layer N for N-1 when the estimate falls below layers[N].downOut.

The keyframe request on upswitch is not optional: forwarding a higher simulcast layer starts a new SSRC/encoding the subscriber’s decoder has never seen, so it cannot begin decoding until a keyframe arrives. Route that request through the SFU’s debounced keyframe requester so a room-wide upswitch coalesces into one PLI per source. Downswitches into a lower simulcast layer also need a keyframe in pure simulcast (each layer is independently encoded); SVC temporal/spatial downswitches within one bitstream often do not, because lower layers are a decodable subset — gate the keyframe call on whether the publisher is simulcast or SVC.
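The coalescing and the simulcast-versus-SVC gate described above might look roughly like the sketch below. Everything here is hypothetical: the KeyframeRequester shape, the sendPli callback, the 500 ms minimum interval, and the injectable clock are assumptions for illustration, not the API of any particular SFU. The needsKeyframe helper simply encodes this guide's rule.

```javascript
// Hypothetical debounced keyframe requester: at most one PLI per source per interval.
// sendPli, minIntervalMs, and the injectable clock are assumptions for this sketch.
class KeyframeRequester {
  constructor(sendPli, { minIntervalMs = 500, now = () => Date.now() } = {}) {
    this.sendPli = sendPli;
    this.minIntervalMs = minIntervalMs;
    this.now = now;
    this.lastSentAt = new Map(); // sourceId -> time the last PLI was sent
  }

  // Returns true if a PLI was sent, false if it was coalesced into a recent one.
  request(sourceId, reason) {
    const t = this.now();
    const last = this.lastSentAt.get(sourceId);
    if (last !== undefined && t - last < this.minIntervalMs) return false;
    this.lastSentAt.set(sourceId, t);
    this.sendPli(sourceId, reason);
    return true;
  }
}

// This guide's rule: simulcast needs a keyframe on any layer change, SVC only on upswitch.
function needsKeyframe(mode, fromLayer, toLayer) {
  if (fromLayer === toLayer) return false;
  return mode === 'simulcast' ? true : toLayer > fromLayer;
}

// Usage with a fake clock so the debounce is deterministic.
let clock = 0;
const sent = [];
const requester = new KeyframeRequester((id, reason) => sent.push(`${id}:${reason}`), { now: () => clock });
requester.request('pub-1', 'layer-switch'); // sent
clock = 100;
requester.request('pub-1', 'layer-switch'); // coalesced into the first request
clock = 700;
requester.request('pub-1', 'layer-switch'); // sent again after the interval
console.log(sent);
```

Keying the debounce on the source rather than the subscriber is what lets many subscribers upswitching at once share a single keyframe from the publisher.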

Reproduction Steps & Debugging Log Patterns

Publish a three-layer simulcast stream and subscribe a single client; throttle the subscriber’s downlink to 500 kbps. The selector should hold the low layer. From a cold start at low it cannot upswitch, because 500 kbps is below mid’s 750 kbps upInto; a selector that somehow started at mid would also drop to low on the first estimate, because 500 kbps is below mid’s 660 kbps downOut.
Raise the throttle to 2 Mbps and hold. Expect exactly one upswitch low→mid after the 2 s dwell, then mid→high after another 2 s, each accompanied by one keyframe.
Oscillate the throttle between 1.4 Mbps and 1.9 Mbps every 500 ms. A correct selector downswitches high→mid once (1.4 Mbps is below high’s 1.65 Mbps downOut) and then stays at mid, because the estimate never sustains the 1.85 Mbps high upInto for 2 s; a broken one flaps high↔mid. Watch for the absence of repeated upswitch log lines.
Drop the throttle sharply to 120 kbps. Expect an immediate downswitch to low with no dwell delay — the downOut path must fire on the very next estimate.

[sub=A] estimate=1920000 layer=1 upCandidate started (next.upInto=1850000)
[sub=A] estimate=1900000 layer=1 upCandidate held 2010ms -> UPSWITCH 1->2, PLI requested
[sub=A] estimate=1610000 layer=2 DOWNSWITCH 2->1 (downOut=1650000), PLI requested (simulcast)
[sub=A] estimate=1700000 layer=1 upCandidate reset (1700000 < next.upInto 1850000)

The diagnostic tell of a flapping bug is alternating UPSWITCH / DOWNSWITCH lines within sub-second spacing; the tell of a stuck-low bug is an estimate well above upInto with upCandidate held resetting to 0 every line (a dwell timer that never accumulates because it is reset on an unrelated branch).
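The flapping signature can also be checked automatically once switch events carry timestamps. The sketch below assumes each event has been parsed into a { t, from, to } object with t in milliseconds; that shape and the findFlaps helper are hypothetical, since the sample log lines above do not include a timestamp field.

```javascript
// Flag flapping: a switch that reverses the previous one within maxGapMs.
// Event shape { t, from, to } (t in ms) is an assumption; adapt it to your own logs.
function findFlaps(events, maxGapMs = 1000) {
  const flaps = [];
  for (let i = 1; i < events.length; i++) {
    const prev = events[i - 1];
    const cur = events[i];
    const reverses = cur.from === prev.to && cur.to === prev.from;
    if (reverses && cur.t - prev.t < maxGapMs) {
      flaps.push({ at: cur.t, between: [prev.from, prev.to] });
    }
  }
  return flaps;
}

const events = [
  { t: 0,    from: 1, to: 2 },
  { t: 400,  from: 2, to: 1 }, // reversal after 400 ms: counted as a flap
  { t: 2600, from: 1, to: 2 }, // reversal after 2200 ms: not a flap
];
console.log(findFlaps(events));
```

A healthy selector should produce an empty list under the oscillating-throttle step, since its upswitches are separated from any preceding switch by at least the dwell time.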

Common Implementation Mistakes

Symmetric thresholds. Using one bitrate boundary for both up and down switching guarantees oscillation around that value. Give each layer an upInto (about 25% above its target) that is higher than its downOut (about 10% above its target), so that estimates between the two leave the selection unchanged.
No dwell time on upswitch. Reacting to a single high estimate sample upswitches on transient headroom and immediately downswitches when it evaporates — at the cost of a keyframe each time. Require the margin to hold for 2–3 s.
Forgetting the keyframe on upswitch. Switching to a higher simulcast layer without requesting a keyframe leaves the subscriber frozen on the last decoded frame until the publisher’s next periodic intra. Always request one (debounced) at the switch.
Requesting a keyframe on every SVC downswitch. SVC lower layers are a decodable subset; asking for an intra on every downswitch wastes publisher bitrate. Gate keyframe requests on simulcast-vs-SVC.
Allocating from a stale estimate. Driving selection off receiver reports or a publisher-side estimate instead of the subscriber’s own transport-cc feedback mis-sizes the layer. Use the per-subscriber availableOutgoingBitrate only.
Over-smoothing the input. Stacking a long moving average on top of the already-smoothed transport-cc estimate makes downswitches too slow to protect the link. Use a short median to reject spikes and let the dwell timer handle the rest.

FAQ

Why upswitch slowly but downswitch immediately? Congestion is asymmetric in cost: forwarding too high a layer onto a starved link tail-drops packets and corrupts the stream right now, so you must react instantly. Spare headroom, by contrast, is cheap to leave unused for a couple of seconds, and a hasty upswitch costs a keyframe and risks an immediate reversal — so you wait for the estimate to prove itself.

How big should the hysteresis margin and dwell time be? A margin of roughly 20–30% above the next layer’s target bitrate plus a 2–3 s dwell works well in practice. Larger values make the stream feel sluggish to recover quality; smaller values reintroduce flapping. Tune against your transport-cc update rate — faster feedback tolerates a shorter dwell.
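When tuning, it can help to derive the thresholds from the target bitrates and two margins instead of hand-editing each number. This is a minimal sketch under that assumption; the helper name buildLayerTable and its option names are hypothetical, and the defaults follow the roughly 25% and 10% figures used in this guide.

```javascript
// Derive switch thresholds from target bitrates and two margins.
// upMargin/downMargin defaults follow the ~25% / ~10% figures in this guide.
function buildLayerTable(targetsBps, { upMargin = 0.25, downMargin = 0.10 } = {}) {
  if (downMargin >= upMargin) throw new Error('downMargin must be smaller than upMargin');
  return targetsBps.map((bitrate, index) => ({
    index,
    bitrate,
    // The floor layer has nothing below it to upswitch from, so its upInto is 0.
    upInto: index === 0 ? 0 : Math.round(bitrate * (1 + upMargin)),
    downOut: Math.round(bitrate * (1 + downMargin)),
  }));
}

const table = buildLayerTable([150_000, 600_000, 1_500_000]);
console.log(table);
```

With the defaults this reproduces the mid-layer row of the table exactly (750 kbps and 660 kbps) and yields 1875 kbps for the high upInto, slightly above the hand-picked 1850 kbps. Rejecting downMargin >= upMargin keeps the hold band from collapsing while margins are being adjusted.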

Does this logic differ between simulcast and SVC? The threshold and hysteresis logic is identical, but keyframe handling differs: simulcast layers are independently encoded so any switch needs a keyframe, whereas SVC lets you drop spatial/temporal layers from one bitstream without a new intra on downswitch. Branch the keyframe request accordingly.

Related: this builds on Selective Forwarding Unit Design and feeds the room-wide policy in Simulcast-Aware Forwarding; ground the estimator in Bandwidth Estimation & Congestion Control.
