Managing Audio Focus and Echo Cancellation Across Devices in WebRTC
This page is part of the Audio/Video Track Management guide and solves one concrete problem: keeping audio routed correctly and echo-free when a user switches between Bluetooth, wired headsets, and speakerphone mid-call, without tearing down the WebRTC session.
Context & Trade-offs
When a WebRTC session starts, the browser hands routing to the platform audio HAL. Android implicitly requests communication mode; iOS routes through AVAudioSession under Safari's control. The moment the active audio device changes (a Bluetooth headset disconnects, a USB mic is plugged in), the OS reinitialises the routing path, and during that window hardware acoustic echo cancellation (AEC) is briefly disabled. The browser falls back to its software AEC with a stale delay estimate, producing audible echo until the delay line recalibrates, typically within 1 to 3 seconds.
The core trade-off is hardware versus software AEC. Hardware AEC (which the platform may engage when you request echoCancellation: true) has near-zero CPU cost and accurate latency on a stable route, but it loses its calibration on every device switch. Software AEC is portable but assumes a fixed mic-to-speaker delay; after a switch from a 40 ms Bluetooth path to a sub-5 ms wired path, that assumption is wrong and residual echo leaks through. Forcing a track recreation after a switch costs a getUserMedia round trip (roughly 100 to 300 ms of mic re-acquisition) but resets AEC cleanly, which is usually the right call when the route changes drastically. Rerouting with replaceTrack() alone, using a track you already hold, keeps the sender, its SSRC, and the remote jitter buffer undisturbed, which matters for not perturbing Bandwidth Estimation & Congestion Control, but it does not reset AEC state.
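The recreate-versus-reroute decision can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not a definitive policy: chooseSwitchStrategy, its latency inputs, and the 20 ms threshold are all hypothetical. Browsers do not expose route latency directly, so the estimates would have to come from your own measurements or a per-device-type table.

```javascript
// Hypothetical helper: decide how to handle an input route change.
// prevLatencyMs / nextLatencyMs are the caller's own estimates of the
// mic-to-speaker delay on the old and new routes (assumed inputs).
// thresholdMs is an assumed tuning value, not a browser constant.
function chooseSwitchStrategy(prevLatencyMs, nextLatencyMs, thresholdMs = 20) {
  const delta = Math.abs(prevLatencyMs - nextLatencyMs);
  // A large jump in path delay invalidates the AEC delay estimate, so pay
  // the getUserMedia round trip; a small jump can keep the existing track.
  return delta >= thresholdMs ? 'recreate' : 'replaceTrack';
}
```

With the figures from the paragraph above, a 40 ms Bluetooth path dropping to a 5 ms wired path gives a 35 ms delta and selects 'recreate'.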
There is a second axis to the decision: focus versus routing. Focus is the OS-level question of which application owns the audio path: when a phone call, a navigation prompt, or another conferencing app grabs communication focus, your track may transition to muted even though the device never physically changed. Routing is the question of which physical endpoint the audio flows to. These are independent, and conflating them leads to spurious teardowns: a muted event from focus loss is recoverable and you should keep the sender alive, whereas an ended event means the device is gone and you must reacquire. The mute-vs-ended distinction is the same one drawn in the parent Audio/Video Track Management guide, and it is doubly important for audio because mobile focus arbitration toggles muted far more often than video sources ever do. Budget for a brief 1 to 3 s glitch on every transition rather than trying to eliminate it; the recovery target is graceful recalibration, not instantaneous perfection.
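One way to keep the two recovery paths separate is to wire the track's mute, unmute, and ended events to distinct callbacks. The sketch below assumes a hypothetical helper name (watchAudioTrack) and a hypothetical handlers shape; only the three event names come from the MediaStreamTrack API.

```javascript
// Hypothetical helper: map MediaStreamTrack events onto the two recovery paths.
// `handlers` is an assumed shape: { onFocusLost, onFocusRegained, onDeviceGone }.
function watchAudioTrack(track, handlers) {
  const onMute = () => handlers.onFocusLost?.(track);       // recoverable: keep the sender
  const onUnmute = () => handlers.onFocusRegained?.(track); // focus is back
  const onEnded = () => handlers.onDeviceGone?.(track);     // terminal: reacquire
  track.addEventListener('mute', onMute);
  track.addEventListener('unmute', onUnmute);
  track.addEventListener('ended', onEnded);
  // Return a disposer so listeners do not leak across track swaps.
  return () => {
    track.removeEventListener('mute', onMute);
    track.removeEventListener('unmute', onUnmute);
    track.removeEventListener('ended', onEnded);
  };
}
```

Call the returned disposer before swapping in a new track, then attach a fresh watcher to the replacement. Note that ended fires when the source goes away, not when your own code calls track.stop().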
Always request echoCancellation: true and noiseSuppression: true explicitly so the browser can negotiate hardware AEC where the platform offers it, instead of silently using a software pipeline with incorrect latency assumptions. Set autoGainControl: false on high-gain microphones: aggressive AGC can drive the AEC filter into divergence.
Minimal Runnable Implementation
// Acquire audio with explicit AEC constraints, then swap devices on the
// SAME RTCRtpSender, preserving the SSRC (no SDP renegotiation on a swap).
const AUDIO_CONSTRAINTS = {
  echoCancellation: true,  // ask for AEC (hardware where the platform offers it)
  noiseSuppression: true,
  autoGainControl: false   // keep AGC off on high-gain mics to stabilise AEC
};

async function acquireAndManageAudio(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { ...AUDIO_CONSTRAINTS, deviceId: { ideal: 'default' } }
  });
  let track = stream.getAudioTracks()[0];

  // Reuse an existing audio sender if there is one; otherwise attach the track
  // now (the first addTrack does trigger negotiation).
  const sender =
    pc.getSenders().find(s => s.track?.kind === 'audio') ?? pc.addTrack(track, stream);
  if (sender.track !== track) await sender.replaceTrack(track);

  // React to OS-level device changes (unplug/replug, Bluetooth drop).
  navigator.mediaDevices.addEventListener('devicechange', async () => {
    try {
      const newStream = await navigator.mediaDevices.getUserMedia({ audio: AUDIO_CONSTRAINTS });
      const newTrack = newStream.getAudioTracks()[0];
      newTrack.enabled = !document.hidden;
      // A freshly acquired track lets AEC re-initialise against the new route
      // instead of reusing a stale delay; replaceTrack swaps it in place.
      await sender.replaceTrack(newTrack);
      track.stop();                 // release the old device
      stream.removeTrack(track);
      stream.addTrack(newTrack);    // keep the returned stream current
      track = newTrack;
      console.log('audio route swapped, fresh track acquired');
    } catch (err) {
      console.warn('audio re-acquisition failed, keeping current track', err);
    }
  });

  // Mute the mic when the tab is backgrounded to prevent echo accumulation.
  document.addEventListener('visibilitychange', () => {
    track.enabled = !document.hidden; // always targets the current track
  });
  return stream;
}
Reproduction Steps & Debugging Log Patterns
- Start a call on a mobile device with Bluetooth headphones connected and echoCancellation: true.
- Force-disconnect Bluetooth via OS settings while the call is live; observe routing fall back to speakerphone.
- Poll getStats() for echoReturnLossEnhancement and totalAudioEnergy on the media-source report. Expect ERLE to dip for 1 to 3 s as software AEC recalibrates against the new delay.
- Confirm whether MediaStreamTrack.muted toggled (focus arbitration) rather than ended (device gone); they demand different recovery.
- If echo persists past recalibration, recreate the track with a fresh getUserMedia() to force AEC re-init.
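The getStats() polling step can be sketched as follows. The helper name readAudioSourceStats is hypothetical; the field names are those defined for the audio media-source entry in the WebRTC statistics specification. Not every browser populates the echo fields, so treat undefined as "not reported" rather than as zero.

```javascript
// Hypothetical helper: read AEC-related fields from the audio "media-source"
// stats entry. `pc` is an RTCPeerConnection (or anything whose getStats()
// resolves to a Map-like report). Missing fields come back as undefined.
async function readAudioSourceStats(pc) {
  const report = await pc.getStats();
  let result = null;
  report.forEach((stat) => {
    if (stat.type === 'media-source' && stat.kind === 'audio') {
      result = {
        timestamp: stat.timestamp,
        erle: stat.echoReturnLossEnhancement, // dB; a dip suggests AEC is recalibrating
        erl: stat.echoReturnLoss,             // dB
        totalAudioEnergy: stat.totalAudioEnergy
      };
    }
  });
  return result; // null if no audio media-source entry exists yet
}
```

Calling this on a timer (for example once a second) during the Bluetooth disconnect gives a series of ERLE samples to inspect.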
A note on the setSinkId path for output routing: input device changes go through getUserMedia/replaceTrack as above, but steering playback to a specific speaker is a separate call on the playback HTMLMediaElement, not on the track. Maintain a registry of available sinks from enumerateDevices() (filtering kind === 'audiooutput'), feature-detect setSinkId, and reassign on devicechange. Output routing never touches AEC calibration (the echo path is governed by the input device and the OS mixer), so you can switch sinks freely without recreating tracks.
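A minimal sketch of that output path, assuming a hypothetical helper named routeOutput and a caller-supplied pickSink function that chooses among the available sinks. Only enumerateDevices(), the 'audiooutput' kind, and setSinkId() are real API surface; everything else is illustrative.

```javascript
// Hypothetical helper: steer playback to a chosen output with feature detection.
// `el` is the playback HTMLMediaElement; `pickSink` is an assumed caller-supplied
// chooser that receives the audiooutput list and returns one entry (or nothing);
// `mediaDevices` defaults to navigator.mediaDevices and is injectable for tests.
async function routeOutput(el, pickSink, mediaDevices = navigator.mediaDevices) {
  // Without setSinkId there is nothing to steer: stay on the system output.
  if (typeof el.setSinkId !== 'function') return null;
  const devices = await mediaDevices.enumerateDevices();
  const sinks = devices.filter((d) => d.kind === 'audiooutput');
  const chosen = pickSink(sinks);
  if (!chosen) return null;
  await el.setSinkId(chosen.deviceId);
  return chosen.deviceId;
}
```

Re-running routeOutput from a devicechange listener covers the "reassign on devicechange" step; a null return means playback stayed on the system default.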
Expected console / internals output:
// chrome://webrtc-internals audio processing graph, or about:webrtc on Firefox:
// AEC: Hardware AEC disabled, falling back to WebRTC APM
// AudioDeviceModule: Audio delay compensation applied: 120ms
// AEC: Divergence detected, resetting filter
// MediaStreamTrack: muted state changed to true (focus lost)
Platform-Specific Routing Behaviour
The same code produces materially different routing on each platform, and knowing the defaults saves hours of guesswork. On Android, requesting getUserMedia with audio implicitly puts the device into communication mode, which biases routing toward the earpiece or the connected headset and engages hardware AEC tuned for voice. Disconnecting a Bluetooth device hands the route back to the speakerphone, and the brief gap is where hardware AEC drops out. On iOS, Safari drives AVAudioSession and you have no direct API to pin the route: the OS decides, and a pagehide or incoming phone call will preempt your session entirely. Your only lever is muting via enabled on visibilitychange and reacquiring on return. On desktop Chrome and Firefox, routing is far more deterministic: enumerateDevices exposes stable deviceId values you can pin with deviceId: { exact: ... }, and setSinkId reliably steers output. Even so, device-switch logic that works on desktop still needs the mobile focus handling described above before it ships.
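On desktop, pinning an input can be sketched like this. The helper pinnedAudioConstraints and its matchLabel parameter are hypothetical; device labels are only populated once the user has granted microphone permission, so matching by label is an assumption that holds after the first successful getUserMedia call.

```javascript
// Hypothetical helper: build getUserMedia constraints pinned to one input.
// `devices` is the array from enumerateDevices(); `matchLabel` is an assumed
// caller-supplied substring of the device label.
function pinnedAudioConstraints(devices, matchLabel) {
  const input = devices.find(
    (d) => d.kind === 'audioinput' && d.label.includes(matchLabel)
  );
  const audio = { echoCancellation: true, noiseSuppression: true, autoGainControl: false };
  // `exact` makes getUserMedia reject (OverconstrainedError) if the device is
  // gone, rather than silently picking another one.
  if (input) audio.deviceId = { exact: input.deviceId };
  return { audio };
}
```

If no input matches, the result carries no deviceId and the browser falls back to its default device selection.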
A consequence of these differences is that you cannot test echo behaviour purely on desktop. A wired headset on a laptop almost never exposes the AEC recalibration glitch, because the mic-to-speaker path barely changes. Reproduce on a real handset with a Bluetooth-to-speaker transition to see the 1 to 3 s ERLE dip and confirm your recovery path. Tie this telemetry into the broader Media Handling, Codecs & Bandwidth Estimation observability you already run so audio regressions surface in the same dashboards as video and bandwidth ones.
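To turn the ERLE dip into a number a dashboard can chart, one option is to measure how long consecutive samples stay below a floor. This is a sketch under assumptions: erleDipDurationMs, the sample shape, and the floor value are hypothetical, and the result is only as precise as the polling interval.

```javascript
// Hypothetical helper: longest run of consecutive below-floor ERLE samples,
// measured from the first to the last sample of that run.
// `samples` is a time-ordered array of { timestamp (ms), erle (dB) }, e.g.
// collected by polling getStats(); `floorDb` is an assumed threshold.
function erleDipDurationMs(samples, floorDb) {
  let runStart = null;
  let longest = 0;
  for (const s of samples) {
    if (typeof s.erle !== 'number') continue; // stat not reported: skip sample
    if (s.erle < floorDb) {
      if (runStart === null) runStart = s.timestamp;
      longest = Math.max(longest, s.timestamp - runStart);
    } else {
      runStart = null; // ERLE recovered, close the run
    }
  }
  return longest;
}
```

Because the run is measured between samples, the true dip is somewhat longer than the value returned; polling more often narrows the gap.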
Common Implementation Mistakes
- Double AEC. Running a custom processing chain while the browser also runs its AEC causes comb filtering and metallic echo. Pick one, and prefer the browser's negotiated hardware AEC.
- Routing instead of resetting after a drastic switch. replaceTrack() preserves the stale software-AEC delay line; a Bluetooth-to-wired change needs a fresh getUserMedia() to recalibrate.
- Ignoring visibilitychange/pagehide. Background routing conflicts accumulate; iOS suspends MediaStream processing in background tabs, so mute the track on hide.
- autoGainControl: true on a high-gain mic. Drives the AEC filter into divergence and intermittent echo.
- Calling setSinkId without feature detection. Firefox before 116 and Safari before 17 throw TypeError; detect with typeof el.setSinkId === 'function' and fall back to system output.
FAQ
Why does echo return when switching from Bluetooth to wired headphones during a call?
The OS reinitialises the routing path and temporarily disables hardware AEC. The browser's software AEC keeps its old delay estimate, which is wrong for the new, much shorter wired path, so echo leaks until it recalibrates in roughly 1 to 3 seconds. Recreating the audio track with a fresh getUserMedia() forces immediate recalibration.
How can I verify hardware echo cancellation is actually active?
Set echoCancellation: true, then call track.getSettings() and check the echoCancellation field. true confirms AEC is on but does not distinguish hardware from software; inspect the chrome://webrtc-internals audio processing graph for the definitive answer.
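As a sketch of the getSettings() check, the hypothetical helper below compares the requested processing flags with what the track reports. It confirms only whether each flag is on or off, not which AEC implementation is in use.

```javascript
// Hypothetical helper: list processing flags whose reported value differs
// from what was requested. `track` is a MediaStreamTrack (or any object with
// a getSettings() method); `wanted` defaults to this guide's constraints.
function checkProcessingSettings(
  track,
  wanted = { echoCancellation: true, noiseSuppression: true, autoGainControl: false }
) {
  const actual = track.getSettings();
  const mismatches = [];
  for (const [key, value] of Object.entries(wanted)) {
    if (actual[key] !== value) {
      mismatches.push({ key, wanted: value, actual: actual[key] });
    }
  }
  return mismatches; // empty array means every requested flag was honoured
}
```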
Does iOS Safari support audio focus management?
Not through any direct API. iOS Safari enforces AVAudioSession routing and requests communication mode automatically, so the OS arbitrates focus for you. You must still handle pagehide/visibilitychange to mute tracks when backgrounded, since iOS suspends Web Audio and MediaStream processing in background tabs.
Should I switch input devices with replaceTrack or by recreating the track?
Use replaceTrack when the route change is mild and you want to preserve the jitter buffer and SSRC continuity. Recreate the track with a fresh getUserMedia when the physical path changes substantially (Bluetooth to wired, or speakerphone to handset), because only re-acquisition forces the software AEC to recalibrate its delay line against the new latency.
Related: return to Audio/Video Track Management, or compare with Replacing Video Tracks Without Renegotiation and the Media Constraints & Device Enumeration guide.