Managing Audio Focus and Echo Cancellation Across Devices in WebRTC

This guide is part of the Audio/Video Track Management guide, and it solves a single concrete problem: keeping audio routed correctly and echo-free when a user switches between Bluetooth, wired headsets, and speakerphone mid-call without tearing down the WebRTC session.

Context & Trade-offs

When a WebRTC session starts, the browser hands routing to the platform audio HAL. Android implicitly requests communication mode; iOS routes through AVAudioSession under Safari’s control. The moment the active audio route changes (a Bluetooth headset disconnects, a USB mic is plugged in), the OS reinitialises the routing path, and during that window hardware acoustic echo cancellation (AEC) is briefly disabled. The browser falls back to its software AEC with a stale delay estimate, producing audible echo until the delay line recalibrates, typically within 1–3 seconds.
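
Before choosing a recovery path it helps to know which endpoint actually changed. The sketch below is a minimal illustration, assuming MediaDeviceInfo-like objects from enumerateDevices(); diffAudioDevices and trackAudioDeviceChanges are hypothetical helpers, not browser APIs. Browsers may withhold deviceId and label values until microphone permission is granted, so take the first snapshot after getUserMedia() has succeeded.

```javascript
// Sketch: diff two enumerateDevices() snapshots to see which audio endpoints
// appeared or disappeared. Works on any array of { deviceId, kind, label }.
function diffAudioDevices(before, after) {
  const isAudio = d => d.kind === 'audioinput' || d.kind === 'audiooutput';
  // Key on kind + deviceId: Chrome uses the id "default" for both an input and an output.
  const key = d => `${d.kind}:${d.deviceId}`;
  const beforeKeys = new Set(before.filter(isAudio).map(key));
  const afterKeys = new Set(after.filter(isAudio).map(key));
  return {
    added: after.filter(d => isAudio(d) && !beforeKeys.has(key(d))),
    removed: before.filter(d => isAudio(d) && !afterKeys.has(key(d))),
  };
}

// Browser usage (only runs when called): snapshot once, then diff on every change.
async function trackAudioDeviceChanges(onDiff) {
  let last = await navigator.mediaDevices.enumerateDevices();
  navigator.mediaDevices.addEventListener('devicechange', async () => {
    const current = await navigator.mediaDevices.enumerateDevices();
    onDiff(diffAudioDevices(last, current));
    last = current;
  });
}
```

Video devices are ignored, so plugging in a webcam does not trigger an audio re-acquisition.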

The core trade-off is hardware versus software AEC. Hardware AEC (negotiated when you request echoCancellation: true) has near-zero CPU cost and accurate latency on a stable route, but it loses its calibration on every device switch. Software AEC is portable but assumes a fixed mic-to-speaker delay; after a switch from a 40 ms Bluetooth path to a sub-5 ms wired path, that assumption is wrong and residual echo leaks through. Recreating the track after a switch costs a getUserMedia round trip (roughly 100–300 ms of mic re-acquisition) but resets AEC cleanly, which is usually the right call when the route changes drastically. Attaching tracks with replaceTrack() on the existing sender keeps the SSRC and the receiver's jitter buffer intact, which matters for not perturbing Bandwidth Estimation & Congestion Control, but replaceTrack() by itself does not reset AEC state; only a fresh capture does.
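
One way to encode that trade-off is a small decision helper. This is a sketch under a loud assumption: route classes are inferred from device labels with a regex heuristic (classifyRoute and chooseRecovery are hypothetical names), and labels differ across platforms and browsers, so tune the patterns against the devices you actually support.

```javascript
// Sketch: decide whether a route change warrants a fresh getUserMedia()
// (which re-initialises AEC) or can keep the current track on the sender.
// ASSUMPTION: route classes come from a label heuristic; labels vary by platform.
function classifyRoute(label = '') {
  if (/bluetooth|airpods|hands-?free/i.test(label)) return 'bluetooth';
  if (/usb/i.test(label)) return 'usb';
  if (/headset|headphone|wired/i.test(label)) return 'wired';
  if (/speaker/i.test(label)) return 'speaker';
  return 'builtin';  // e.g. handset earpiece or laptop mic
}

// 'recreate' when the physical path class changes (Bluetooth to wired,
// speakerphone to handset); 'keep' when it stays within the same class.
function chooseRecovery(previousLabel, nextLabel) {
  return classifyRoute(previousLabel) === classifyRoute(nextLabel) ? 'keep' : 'recreate';
}
```

Here 'recreate' means a fresh getUserMedia() followed by replaceTrack() on the existing sender, while 'keep' leaves the current track in place and accepts the brief recalibration glitch.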

There is a second axis to the decision: focus versus routing. Focus is the OS-level question of which application owns the audio path: when a phone call, a navigation prompt, or another conferencing app grabs communication focus, your track may transition to muted even though the device never physically changed. Routing is the question of which physical endpoint the audio flows to. These are independent, and conflating them leads to spurious teardowns: a muted event from focus loss is recoverable and you should keep the sender alive, whereas an ended event means the device is gone and you must reacquire. The mute-vs-ended distinction is the same one drawn in the parent Audio/Video Track Management guide, and it is doubly important for audio because mobile focus arbitration toggles muted far more often than video sources ever do. Budget for a brief 1–3 s glitch on every transition rather than trying to eliminate it; the recovery target is graceful recalibration, not instantaneous perfection.
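
The focus-versus-routing split maps directly onto MediaStreamTrack events: mute and unmute signal that the source is temporarily not delivering audio (for example, during focus loss), while ended signals that the source is gone for good. A minimal sketch follows; watchAudioTrack and its callback names are hypothetical. Note that ended fires when the source goes away, not when your own code calls track.stop().

```javascript
// Sketch: route MediaStreamTrack lifecycle events to distinct recovery paths.
// mute/unmute = focus arbitration (keep the sender alive and wait);
// ended = the device is gone (reacquire). Callback names are hypothetical.
function watchAudioTrack(track, { onFocusLost, onFocusRegained, onDeviceGone }) {
  const handlers = {
    mute: () => onFocusLost?.(track),
    unmute: () => onFocusRegained?.(track),
    ended: () => onDeviceGone?.(track),
  };
  for (const [type, fn] of Object.entries(handlers)) track.addEventListener(type, fn);
  // Return a disposer so listeners can be detached before the track is swapped.
  return () => {
    for (const [type, fn] of Object.entries(handlers)) track.removeEventListener(type, fn);
  };
}
```

Call the disposer before swapping in a replacement track, then attach a fresh watcher to the new one.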

Always request echoCancellation: true and noiseSuppression: true explicitly so the browser negotiates hardware AEC instead of silently using a software pipeline with incorrect latency assumptions. Leave autoGainControl: false on high-gain microphones; aggressive AGC can drive the AEC filter into divergence.
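
Those recommendations can be collected into one constraints builder so every acquisition path requests the same processing. This is a sketch: buildAudioConstraints is a hypothetical helper, and highGainMic is an app-level flag you would derive from your own device knowledge, since browsers do not report microphone gain. Leaving AGC on for other microphones is an assumption of this sketch, not a recommendation from the text. Pinning with exact makes getUserMedia() reject with an OverconstrainedError if that device has disappeared, so fall back to an unpinned request in that case.

```javascript
// Sketch: one place that builds audio constraints for every getUserMedia() call.
// `highGainMic` is a hypothetical app-level flag; the browser does not expose mic gain.
function buildAudioConstraints({ deviceId, highGainMic = false } = {}) {
  const audio = {
    echoCancellation: true,    // request AEC explicitly
    noiseSuppression: true,
    // AGC off on high-gain mics so it cannot push the AEC filter into divergence.
    autoGainControl: !highGainMic,
  };
  // exact pins the device; getUserMedia() rejects if it is no longer present.
  if (deviceId) audio.deviceId = { exact: deviceId };
  return { audio, video: false };
}

// Browser usage: navigator.mediaDevices.getUserMedia(buildAudioConstraints({ highGainMic: true }))
```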

Minimal Runnable Implementation

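// Acquire audio with explicit AEC constraints, then swap devices on the
// SAME RTCRtpSender, preserving SSRC and jitter buffer (no SDP renegotiation).
const AUDIO_CONSTRAINTS = {
  echoCancellation: true,   // negotiate hardware AEC with the platform driver
  noiseSuppression: true,
  autoGainControl: false,   // keep AGC off on high-gain mics to stabilise AEC
};

async function acquireAndManageAudio(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({
    // 'default' is a Chrome pseudo-device id; as an ideal value it is only a hint.
    audio: { ...AUDIO_CONSTRAINTS, deviceId: { ideal: 'default' } }
  });
  let currentTrack = stream.getAudioTracks()[0];

  // Reuse an existing audio sender; otherwise attach the track
  // (this one-time addTrack does trigger negotiation).
  const existing = pc.getSenders().find(s => s.track?.kind === 'audio');
  const sender = existing ?? pc.addTrack(currentTrack, stream);
  if (sender.track !== currentTrack) await sender.replaceTrack(currentTrack);

  // React to OS-level device changes (unplug/replug, Bluetooth drop).
  navigator.mediaDevices.addEventListener('devicechange', async () => {
    try {
      // A fresh getUserMedia re-initialises AEC against the new route
      // instead of reusing the stale delay estimate.
      const newStream = await navigator.mediaDevices.getUserMedia({ audio: AUDIO_CONSTRAINTS });
      const newTrack = newStream.getAudioTracks()[0];
      newTrack.enabled = currentTrack.enabled;  // carry over the mute state
      await sender.replaceTrack(newTrack);      // same sender: no renegotiation
      stream.removeTrack(currentTrack);         // keep the returned stream current
      stream.addTrack(newTrack);
      currentTrack.stop();                      // release the old capture
      currentTrack = newTrack;
      console.log('audio route swapped, track re-acquired');
    } catch (err) {
      console.warn('audio re-acquisition failed', err);
    }
  });

  // Mute the current mic while the tab is backgrounded to prevent echo accumulation.
  document.addEventListener('visibilitychange', () => {
    currentTrack.enabled = !document.hidden;
  });

  return stream;
}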

Reproduction Steps & Debugging Log Patterns

  1. Start a call on a mobile device with Bluetooth headphones connected and echoCancellation: true.
  2. Force-disconnect Bluetooth via OS settings while the call is live; observe routing fall back to speakerphone.
  3. Poll getStats() for echoReturnLossEnhancement and totalAudioEnergy on the media-source report. Expect ERLE to dip for 1–3 s as software AEC recalibrates against the new delay.
  4. Confirm whether MediaStreamTrack.muted toggled (focus arbitration) or the track ended (device gone); the two demand different recovery.
  5. If echo persists past recalibration, recreate the track with a fresh getUserMedia() to force AEC re-init.
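
Step 3 can be automated with a small poller. This sketch assumes the audio media-source entry of the RTCStatsReport carries echoReturnLossEnhancement, a field a given browser may omit, so missing values come back as null; readAecStats and pollAecStats are hypothetical helpers.

```javascript
// Sketch: pull AEC-related fields from the audio 'media-source' entry of an
// RTCStatsReport (a Map-like object). Missing fields are reported as null.
function readAecStats(report) {
  for (const stat of report.values()) {
    if (stat.type === 'media-source' && stat.kind === 'audio') {
      return {
        timestamp: stat.timestamp,                       // milliseconds
        erle: stat.echoReturnLossEnhancement ?? null,    // dB, optional
        totalAudioEnergy: stat.totalAudioEnergy ?? null,
      };
    }
  }
  return null;
}

// Browser usage: poll during a device switch; returns a function that stops polling.
function pollAecStats(pc, onSample, intervalMs = 1000) {
  const id = setInterval(async () => {
    const sample = readAecStats(await pc.getStats());
    if (sample) onSample(sample);
  }, intervalMs);
  return () => clearInterval(id);
}
```

Start polling just before the forced disconnect in step 2 and stop once ERLE is back at its pre-switch level.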

A note on the setSinkId path for output routing: input device changes go through getUserMedia/replaceTrack as above, but steering playback to a specific speaker is a separate call on the playback HTMLMediaElement, not on the track. Maintain a registry of available sinks from enumerateDevices() (filtering kind === 'audiooutput'), feature-detect setSinkId, and reassign on devicechange. Switching sinks does not require recreating the capture track, but the new output device changes the acoustic echo path (speaker to room to mic), so the AEC still has to reconverge; expect a similar brief ERLE dip after a sink change as after an input change.
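
The sink registry can stay small. A minimal sketch, assuming devices comes from enumerateDevices() and preferredSinkId is a hypothetical app preference; pickSinkId and applySink are hypothetical helpers. An empty string passed to setSinkId() selects the user agent's default output, which this sketch uses as the fallback when the preferred sink disappears.

```javascript
// Sketch: choose an output sink from the current device list, falling back to
// '' (the user agent's default output) if the preferred sink is gone.
function pickSinkId(devices, preferredSinkId) {
  const outputs = devices.filter(d => d.kind === 'audiooutput');
  return outputs.some(d => d.deviceId === preferredSinkId) ? preferredSinkId : '';
}

// Apply the sink to a playback element; returns false where setSinkId is unsupported.
async function applySink(mediaElement, devices, preferredSinkId) {
  if (typeof mediaElement.setSinkId !== 'function') return false;  // feature-detect
  const sinkId = pickSinkId(devices, preferredSinkId);
  if (mediaElement.sinkId !== sinkId) await mediaElement.setSinkId(sinkId);
  return true;
}
```

Re-run applySink with a fresh enumerateDevices() result on each devicechange.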

Representative console / internals output (exact wording varies by browser and version):

// chrome://webrtc-internals audio processing graph, or about:webrtc on Firefox:
// AEC: Hardware AEC disabled, falling back to WebRTC APM
// AudioDeviceModule: Audio delay compensation applied: 120ms
// AEC: Divergence detected, resetting filter
// MediaStreamTrack: muted state changed to true (focus lost)

Platform-Specific Routing Behaviour

The same code produces materially different routing on each platform, and knowing the defaults saves hours of guesswork. On Android, requesting getUserMedia with audio implicitly puts the device into communication mode, which biases routing toward the earpiece or the connected headset and engages hardware AEC tuned for voice. Disconnecting a Bluetooth device hands the route back to the speakerphone, and the brief gap is where hardware AEC drops out. On iOS, Safari drives AVAudioSession and you have no direct API to pin the route: the OS decides, and a pagehide or incoming phone call will preempt your session entirely. Your only levers are muting via enabled on visibilitychange and reacquiring on return. On desktop Chrome and Firefox, routing is far more deterministic: enumerateDevices exposes stable deviceId values you can pin with deviceId: { exact: ... }, and setSinkId reliably steers output. Device-switch logic that works on desktop will therefore still need the mobile focus handling described above before it ships.
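
For iOS in particular, the mute-on-background and reacquire-on-return levers can be wired up as follows. This is a sketch; installBackgroundHandling, getTrack, and reacquire are hypothetical, and the doc and win parameters exist only so the logic can be exercised outside a browser.

```javascript
// Sketch: mute while backgrounded; on return, reacquire if the OS ended the
// capture (e.g. an incoming phone call). getTrack/reacquire are hypothetical callbacks.
function installBackgroundHandling(getTrack, reacquire,
                                   doc = globalThis.document, win = globalThis) {
  const onHidden = () => {
    const track = getTrack();
    if (track) track.enabled = false;   // send silence instead of the live mic
  };
  const onVisible = async () => {
    const track = getTrack();
    try {
      if (!track || track.readyState === 'ended') {
        await reacquire();              // capture was torn down while hidden
      } else {
        track.enabled = true;           // same track survived; just unmute
      }
    } catch (err) {
      console.warn('reacquire after background failed', err);
    }
  };
  const onVisibilityChange = () => (doc.hidden ? onHidden() : onVisible());
  doc.addEventListener('visibilitychange', onVisibilityChange);
  // pagehide fires on window; skip it where no window-like target exists.
  const hasWin = Boolean(win) && typeof win.addEventListener === 'function';
  if (hasWin) win.addEventListener('pagehide', onHidden);
  return () => {
    doc.removeEventListener('visibilitychange', onVisibilityChange);
    if (hasWin) win.removeEventListener('pagehide', onHidden);
  };
}
```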

A consequence of these differences is that you cannot test echo behaviour purely on desktop. A wired headset on a laptop almost never exposes the AEC recalibration glitch, because the mic-to-speaker path barely changes. Reproduce on a real handset with a Bluetooth-to-speaker transition to see the 1–3 s ERLE dip and confirm your recovery path. Tie this telemetry into the broader Media Handling, Codecs & Bandwidth Estimation observability you already run so audio regressions surface in the same dashboards as video and bandwidth ones.
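
To confirm the recovery path from telemetry, measure how long ERLE stays depressed after a transition. A sketch, assuming samples shaped like { timestamp, erle } (timestamp in milliseconds, erle in dB) and a thresholdDb you choose from your own baseline; measureErleDip is a hypothetical helper, and nothing in the stats spec defines a "good" ERLE value.

```javascript
// Sketch: length (ms) of the first ERLE dip below thresholdDb after a route switch.
// Returns 0 if no dip is seen, Infinity if ERLE never recovers within the samples.
function measureErleDip(samples, switchTimeMs, thresholdDb) {
  let dipStart = null;
  for (const { timestamp, erle } of samples) {
    if (timestamp < switchTimeMs || erle == null) continue;  // before switch or no data
    if (erle < thresholdDb) {
      if (dipStart === null) dipStart = timestamp;           // dip begins
    } else if (dipStart !== null) {
      return timestamp - dipStart;                           // recovered
    }
  }
  return dipStart === null ? 0 : Infinity;
}
```

A dip length well beyond the expected 1–3 s window is a reasonable signal to trigger the recreate path and to flag the transition in your dashboards.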

FAQ

Why does echo return when switching from Bluetooth to wired headphones during a call?

The OS reinitialises the routing path and temporarily disables hardware AEC. The browser’s software AEC keeps its old delay estimate, which is wrong for the new, much shorter wired path, so echo leaks until it recalibrates in roughly 1–3 seconds. Recreating the audio track with a fresh getUserMedia() forces immediate recalibration.

How can I verify hardware echo cancellation is actually active?

Set echoCancellation: true, then call track.getSettings() and check the echoCancellation field. A value of true confirms AEC is on but does not distinguish hardware from software; inspect the chrome://webrtc-internals audio processing graph for the definitive answer.
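
The getSettings() check is easy to automate at call start. A minimal sketch; checkAudioProcessing is a hypothetical helper that treats any value other than the requested one (including a missing field) as a mismatch, and like getSettings() itself it cannot tell hardware AEC from software AEC.

```javascript
// Sketch: compare applied settings against the processing flags you requested.
// Returns human-readable mismatches; an empty array means everything matched.
function checkAudioProcessing(settings,
                              wanted = { echoCancellation: true, noiseSuppression: true }) {
  const mismatches = [];
  for (const [key, value] of Object.entries(wanted)) {
    if (settings[key] !== value) mismatches.push(`${key}: wanted ${value}, got ${settings[key]}`);
  }
  return mismatches;
}

// Browser usage: checkAudioProcessing(stream.getAudioTracks()[0].getSettings())
```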

Does iOS Safari support audio focus management?

iOS Safari enforces AVAudioSession routing and requests communication mode automatically, but you must handle pagehide/visibilitychange to mute tracks when backgrounded, since iOS suspends Web Audio and MediaStream processing in background tabs.

Should I switch input devices with replaceTrack or by recreating the track?

Either way, keep the existing sender. When the route change is mild, leave the current track in place; when the physical path changes substantially (Bluetooth to wired, or speakerphone to handset), recreate the track with a fresh getUserMedia and attach it with replaceTrack. replaceTrack preserves jitter-buffer and SSRC continuity, but only re-acquisition forces the software AEC to recalibrate its delay line against the new latency.

Related: return to Audio/Video Track Management, or compare with Replacing Video Tracks Without Renegotiation and the Media Constraints & Device Enumeration guide.