Audio/Video Track Management in WebRTC

An RTCPeerConnection is only as stable as its tracks. Every camera swap, microphone mute, headset unplug, and OS-level interruption flows through the MediaStreamTrack → RTCRtpSender → RTCRtpTransceiver chain, and calling the wrong API at the wrong moment produces black frames, zombie senders consuming bandwidth, or a renegotiation storm that stalls media for seconds. This article is part of the Media Handling, Codecs & Bandwidth Estimation guide and covers the full track lifecycle for production WebRTC: attaching and replacing tracks, controlling transceiver direction, distinguishing mute from disable, handling ended events, and keeping the remote renderer stable through all of it.

The audience is engineers shipping multi-party calling, screen sharing, or device-switching UIs who have learned the difference between track.enabled = false and sender.replaceTrack(null) the hard way. The goal is a track layer that survives hardware hot-swaps and source changes without dropping streams or forcing a new offer/answer exchange unless one is genuinely required.

[Figure: Track lifecycle and the sender/receiver/transceiver triad. A MediaStreamTrack is live while frames flow, can go muted (source paused) and unmute back to live, and ends at the terminal ended state. An RTCRtpTransceiver (e.g. direction: sendrecv) pairs one RTCRtpSender, whose replaceTrack() keeps the outbound SSRC, with one RTCRtpReceiver, whose remote track is delivered via ontrack.]
Track states and the transceiver that binds a sender to a receiver.

Step 1 — Attaching tracks with addTrack and replaceTrack

There are two ways media enters a peer connection, and they have very different renegotiation consequences. pc.addTrack(track, stream) attaches the track to an RTCRtpSender, either reusing an existing transceiver of the same kind whose sender has no track and has never been used to send (typically one created recvonly by a remote offer) or minting a fresh transceiver, and fires negotiationneeded, requiring a new offer/answer. sender.replaceTrack(newTrack) swaps the media source on an existing sender, keeps the SSRC and the negotiated codec, and does not trigger renegotiation (the new track must be of the same kind, or the promise rejects with a TypeError). That makes it the correct tool for swapping a camera for a screen share, which is detailed in Replacing Video Tracks Without Renegotiation.

// First track in: addTrack mints a sender and fires negotiationneeded.
const [videoTrack] = localStream.getVideoTracks();
const sender = pc.addTrack(videoTrack, localStream); // → renegotiation required

// Later source change: replaceTrack reuses the sender, no SDP exchange.
const newTrack = screenStream.getVideoTracks()[0];
await sender.replaceTrack(newTrack); // SSRC preserved, no offer/answer

Attach every track you know about before generating the first offer. Adding tracks one at a time after the connection is live costs a separate offer/answer round for each, and on slow signaling paths every extra round widens the window for glare with an offer the remote peer sends at the same time. Cap getUserMedia constraints early so the encoder never sees a resolution it must immediately scale down; coordinate those caps with your Adaptive Bitrate Streaming in WebRTC targets so the negotiated bitrate ceiling matches the source. A common batching pattern is to acquire camera and microphone in a single getUserMedia call, iterate the resulting MediaStream, and addTrack each track synchronously within one event-loop turn. Because the negotiationneeded event is queued rather than fired synchronously, and is not fired again while the negotiation-needed flag is already set, you exchange one offer/answer for the whole bundle instead of two or three.
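
The batching pattern above can be sketched as follows, assuming a browser RTCPeerConnection named pc. attachAllTracks and startLocalMedia are hypothetical helper names, and the 1280×720 / 30 fps caps are placeholder values rather than recommendations from this guide.

```javascript
// Hypothetical helper: add every track of one MediaStream in a single
// synchronous pass. addTrack is synchronous, so with no await between calls
// the browser fires one negotiationneeded for the whole batch.
function attachAllTracks(pc, stream) {
  return stream.getTracks().map((track) => pc.addTrack(track, stream));
}

// Hypothetical caller: one getUserMedia call for camera + microphone, with
// video capped up front. The caps below are placeholder values.
async function startLocalMedia(pc) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: true,
    video: {
      width: { ideal: 1280, max: 1280 },
      height: { ideal: 720, max: 720 },
      frameRate: { max: 30 },
    },
  });
  const senders = attachAllTracks(pc, stream);
  return { stream, senders };
}
```

Because attachAllTracks never awaits between addTrack calls, all senders are created in one task, and a single negotiationneeded handler produces one offer covering both the audio and video m-lines.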

A subtlety: replaceTrack(null) removes the media source while leaving the sender, transceiver, and m-line in place. This is the cheapest possible “stop sending video”: RTP stops without any SDP churn, and the receiver sees the track go silent. Use it instead of removeTrack() when you intend to resume on the same transceiver shortly. By contrast, removeTrack(sender) clears the sender’s track and fires negotiationneeded; the transceiver’s direction drops its send half (sendrecv becomes recvonly, sendonly becomes inactive) and the next offer carries that direction instead of deleting the m-line, because m-lines are never removed, only recycled. Reaching for removeTrack to “free a slot” is therefore usually a mistake: you pay for renegotiation and the slot lingers anyway. Reserve addTrack/removeTrack for genuine structural changes and let replaceTrack handle day-to-day source churn.
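
To make the removeTrack behaviour concrete, here is a tiny model of the direction change it applies to the sender’s transceiver under the WebRTC 1.0 specification. directionAfterRemoveTrack is a hypothetical name, not a browser API; a function like this is mainly useful for asserting expected transceiver state in tests or logs.

```javascript
// Hypothetical model (not a browser API). pc.removeTrack(sender) clears the
// sender's track and drops the "send" half of the transceiver's direction;
// this function models only the direction part. The m-line itself is kept,
// and the new direction reaches the wire in the next offer/answer.
function directionAfterRemoveTrack(direction) {
  switch (direction) {
    case 'sendrecv':
      return 'recvonly';
    case 'sendonly':
      return 'inactive';
    default:
      // recvonly and inactive have no send half to drop.
      return direction;
  }
}
```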

Step 2 — Controlling transceiver direction

Each RTCRtpTransceiver carries a direction attribute — sendrecv, sendonly, recvonly, or inactive — and the negotiated result is currentDirection. Changing direction fires negotiationneeded; it is how you stop or start a media flow at the SDP level rather than the track level. Set recvonly to keep receiving while you stop sending, or inactive to pause both directions without removing the m-line.

// Explicitly add a transceiver and control its direction.
const transceiver = pc.addTransceiver('video', { direction: 'sendrecv' });

// Stop sending but keep the slot for a later resume → renegotiation.
transceiver.direction = 'recvonly';

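// Read what was actually negotiated. This is only meaningful after the answer
// has been applied (e.g. after await pc.setRemoteDescription(answer)); before
// the first negotiation completes, currentDirection is null.
console.log('negotiated:', transceiver.currentDirection); // e.g. "recvonly"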

The distinction that trips people up: direction is your request, currentDirection is the result. If you ask for sendrecv but the remote answers recvonly, your currentDirection becomes sendonly. Always read currentDirection, never direction, when deciding whether media is actually flowing. Reusing transceivers via direction changes still costs a renegotiation, but it is cheaper than addTrack/removeTrack cycles: no senders are minted or orphaned, and because the m-line count stays constant it avoids the m-line ordering hazards covered in Debugging SDP m-line Mismatches.
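
A minimal sketch of the request-versus-result distinction, assuming only the standard direction and currentDirection attributes. mirrorDirection and isSending are hypothetical helpers: the first shows why a recvonly answer leaves the offerer at sendonly (the answer describes the answerer’s side), and the second gates “is media flowing?” on currentDirection alone.

```javascript
// Hypothetical helpers. An SDP answer states direction from the answerer's
// side, so the offerer's negotiated direction is its mirror image:
// an answer of "recvonly" means the offerer ends up "sendonly".
const MIRROR = {
  sendrecv: 'sendrecv',
  sendonly: 'recvonly',
  recvonly: 'sendonly',
  inactive: 'inactive',
};

function mirrorDirection(answerDirection) {
  return MIRROR[answerDirection];
}

// Gate "is media leaving this transceiver?" on currentDirection only.
// currentDirection is null before the first negotiation completes and
// "stopped" after stop(); both correctly count as not sending.
function isSending(transceiver) {
  const dir = transceiver.currentDirection;
  return dir === 'sendrecv' || dir === 'sendonly';
}
```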

There is also a sequencing rule worth internalising: a transceiver created recvonly or inactive has no sender track yet, so to start sending you must both replaceTrack and flip direction. Flipping direction without ever attaching a track produces a sendrecv m-line that carries no media: the remote negotiates a send slot, allocates a decoder, and waits on an SSRC that never appears. The safe ordering is to attach the track first, then change the direction. When you pre-allocate transceivers up front (a common pattern for fixed-layout conferences where every participant slot is reserved before anyone joins), initialise them inactive, then promote each to sendrecv with replaceTrack as real media arrives. This keeps the m-line section of every offer identical across participants, which is exactly what makes SDP Renegotiation Without Dropping Streams tractable at scale.
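
The pre-allocation pattern, sketched under the assumption of a fixed number of video slots. preallocateVideoSlots and promoteSlot are hypothetical names; the only browser APIs used are addTransceiver, sender.replaceTrack, and the direction setter.

```javascript
// Hypothetical helpers for a fixed-layout room: reserve `count` video slots as
// inactive transceivers before anyone joins, so every offer carries the same
// m-line layout.
function preallocateVideoSlots(pc, count) {
  const slots = [];
  for (let i = 0; i < count; i += 1) {
    slots.push(pc.addTransceiver('video', { direction: 'inactive' }));
  }
  return slots;
}

// Promote one slot when real media arrives. Attach the track first, then
// change direction: the direction change is what fires negotiationneeded, so
// the resulting offer describes a sender that already has media behind it.
async function promoteSlot(transceiver, track) {
  await transceiver.sender.replaceTrack(track);
  transceiver.direction = 'sendrecv';
}
```

Awaiting replaceTrack before touching direction means that, by the time a negotiationneeded handler builds the offer, the sender already has a track attached.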

Step 3 — Mute, enabled, and what the remote actually sees

There are three separate “off” states, and conflating them causes most track bugs. track.enabled = false is a local gate: the track stays live, but the browser replaces its output with black frames (video) or silence (audio) and keeps sending RTP. Black frames encode cheaply, but the stream is never released, and the remote sees a frozen-to-black image, not a paused stream. The muted property is read-only and reflects the source being unable to produce data (the OS took the mic, or the camera is in use by another app); you cannot set it. To genuinely stop sending and free bandwidth, use replaceTrack(null) or flip the transceiver to recvonly/inactive.

// Local UI mute: cheap, instant, but keeps the RTP stream alive (black/silence).
function setMuted(track, muted) {
  track.enabled = !muted; // remote sees black frames / silence; RTP keeps flowing
}

// True bandwidth release on the same sender, no renegotiation:
async function stopSendingVideo(sender) {
  await sender.replaceTrack(null); // RTP stops; transceiver and m-line remain
}

The rule of thumb: use enabled for a transient mic-mute button where you want instant resume and don’t care about the residual black-frame or silence RTP; use replaceTrack(null) when the stream will be off long enough that keeping it alive matters, or when you want the remote’s bitrate estimator to recover that capacity. The remote detects enabled = false only as a content change (black or silent media), while replaceTrack(null) stops RTP entirely, which typically surfaces as the receiver track firing mute (the exact timing is implementation-specific).

This distinction matters for congestion control as much as for UX. When ten participants each leave a muted-but-enabled video track alive, the SFU is still forwarding ten black-frame streams, and the per-subscriber estimator never reclaims that headroom — the symptom is an availableOutgoingBitrate that stays artificially low even though nobody is actually transmitting useful video. Routing real mutes through replaceTrack(null) or inactive lets the estimator recover the capacity within a getStats poll or two (sampling at the usual 1 s interval). On the audio side the calculus differs: an enabled = false audio track with Opus DTX still sends comfort-noise frames at a handful of kbps, which is negligible, so enabled is almost always the right mute for microphones and replaceTrack(null) the right one for cameras. Document this split in your call-control layer so a “mute” button maps to the correct primitive per track kind.
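
One way to encode the audio/video split in a call-control layer, as a sketch: setMediaMuted is a hypothetical helper, and it assumes the caller keeps its own reference to the camera track, since sender.track is null while video is muted via replaceTrack(null).

```javascript
// Hypothetical call-control helper mapping one "mute" button onto the
// per-kind primitive recommended above:
//   audio: track.enabled, instant and cheap (silence, or comfort noise with DTX)
//   video: sender.replaceTrack(null), RTP stops with no renegotiation
// `track` is the caller's own reference: while video is muted this way,
// sender.track is null, so it cannot be used to restore the camera.
async function setMediaMuted(sender, track, muted) {
  if (track.kind === 'audio') {
    track.enabled = !muted;
    return;
  }
  await sender.replaceTrack(muted ? null : track);
}
```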

Step 4 — Verification: ended events and renderer stability

MediaStreamTrack fires ended when its source terminates permanently: the user unplugs a USB camera, the OS revokes the device, or the user clicks the browser’s “Stop sharing” control during a screen share. It does not fire when your own code calls track.stop(). Unlike muted, ended is terminal: the track will never produce frames again, and ignoring it leaves a sender transmitting nothing useful while the remote stares at a frozen last frame. Bind the listener at attach time, push a signaling notification, and attempt reacquisition.

function watchTrack(track, sender) {
  track.addEventListener('ended', async () => {
    console.warn(`track ${track.id} ended (source gone)`);
    // Free the RTP stream immediately so the remote stops waiting on a dead source.
    await sender.replaceTrack(null);
    // Then attempt reacquisition or signal the peer to update its UI.
  });
}
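
The reacquisition step left as a comment in watchTrack could look like this. It is a sketch under two assumptions: acquire is a caller-supplied async function that opens a fresh track of the same kind (for example via navigator.mediaDevices.getUserMedia), and a failed reacquisition should leave the sender empty rather than retry. handleEndedSource and watchAndRecover are hypothetical names.

```javascript
// Hypothetical recovery for an ended source. `acquire` is caller-supplied and
// must resolve to a fresh MediaStreamTrack of the same kind, e.g.
//   () => navigator.mediaDevices.getUserMedia({ video: true })
//           .then((s) => s.getVideoTracks()[0])
// Resolves to the replacement track, or null if recovery failed.
async function handleEndedSource(sender, acquire) {
  await sender.replaceTrack(null); // drop the dead source immediately
  let fresh = null;
  try {
    fresh = await acquire();
    await sender.replaceTrack(fresh); // same sender and SSRC, no renegotiation
    return fresh;
  } catch (err) {
    if (fresh) fresh.stop(); // release a device we opened but could not attach
    console.warn('reacquisition failed; sender left empty:', err);
    return null;
  }
}

// Re-arm on each replacement so a second unplug is handled too.
function watchAndRecover(track, sender, acquire) {
  track.addEventListener(
    'ended',
    async () => {
      const fresh = await handleEndedSource(sender, acquire);
      if (fresh) watchAndRecover(fresh, sender, acquire);
    },
    { once: true },
  );
}
```

Re-arming on the replacement track matters because a second unplug ends the new track, not the one originally watched.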

On the remote, renderer stability hinges on never reassigning videoElement.srcObject when only the track changed. Because replaceTrack keeps the same SSRC, the remote MediaStreamTrack object stays identical — the <video> element keeps rendering through the swap with no flicker. Only ontrack (a genuinely new receiver) should cause you to touch srcObject. Verify the swap worked by polling getStats() for outbound-rtp.framesEncoded climbing and track.muted === false on the receiver. Frozen remote video despite a successful local replaceTrack almost always means the renderer rebound srcObject and lost the decode pipeline, or the encoder is starved by Bandwidth Estimation & Congestion Control backpressure rather than a track fault.
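
The framesEncoded check can be automated with two getStats() snapshots taken on the sender. This is a sketch: framesEncoded and encoderIsAdvancing are hypothetical helpers, and the 1000 ms default interval is an assumption, not a value the stats API prescribes. Summing over outbound-rtp entries also covers simulcast, where one sender reports several encodings.

```javascript
// Hypothetical helper: total framesEncoded across the video outbound-rtp
// entries of an RTCStatsReport (a Map-like object).
function framesEncoded(report) {
  let total = 0;
  report.forEach((stat) => {
    if (stat.type === 'outbound-rtp' && stat.kind === 'video') {
      total += stat.framesEncoded ?? 0;
    }
  });
  return total;
}

// Take two snapshots `intervalMs` apart and report whether the encoder
// produced new frames in between.
async function encoderIsAdvancing(sender, intervalMs = 1000) {
  const before = framesEncoded(await sender.getStats());
  await new Promise((resolve) => setTimeout(resolve, intervalMs));
  const after = framesEncoded(await sender.getStats());
  return after > before;
}
```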

FAQ

When should I use replaceTrack() versus toggling track.enabled?

Use replaceTrack() for any physical source swap (camera change, camera-to-screen) because it preserves the SSRC and skips SDP renegotiation entirely. Use track.enabled only for a transient mute where you want instant resume and accept that black-frame RTP keeps flowing. For a true bandwidth-freeing stop, replaceTrack(null) is the correct middle ground.

Does adding a track require an ICE restart?

No. addTrack fires negotiationneeded, which needs a new offer/answer exchange but reuses the existing ICE and DTLS transport. Reserve ICE restarts (createOffer({ iceRestart: true })) for genuine network topology changes or a connectionState of failed — not for track or direction changes.
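
A minimal sketch of that policy, assuming current browsers: pc.restartIce() has the same effect on the next offer as passing { iceRestart: true } to createOffer, and it fires negotiationneeded so your existing offer/answer path carries the restart. restartIceOnFailure is a hypothetical helper name.

```javascript
// Hypothetical wiring: restart ICE only when the connection has actually
// failed, never in response to track or direction changes.
function restartIceOnFailure(pc) {
  pc.addEventListener('connectionstatechange', () => {
    if (pc.connectionState === 'failed') {
      // Marks the next offer as an ICE restart and fires negotiationneeded.
      pc.restartIce();
    }
  });
}
```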

Why does the remote video freeze after a successful local replaceTrack?

Almost always the remote renderer rebound srcObject when it didn’t need to, or the encoder is starved by congestion control rather than the swap failing. Confirm outbound-rtp.framesEncoded is climbing locally; if it is, the fault is on the remote render path, not the track layer.

What is the difference between muted and ended?

muted is a recoverable, read-only state meaning the source temporarily cannot produce frames (OS grabbed the device, tab backgrounded on iOS). ended is terminal — the source is gone for good. Never tear down a sender on muted; always handle ended.
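
As a sketch of that rule, the handlers below keep mute and unmute in the UI layer and reserve sender-level action for ended. bindTrackState is a hypothetical helper, and ui.showPaused and onEnded are caller-supplied callbacks assumed for illustration.

```javascript
// Hypothetical wiring for the muted/ended split. mute and unmute are
// recoverable, so they only touch UI state; ended is terminal and is the one
// event that hands control to sender-level cleanup (onEnded).
function bindTrackState(track, ui, onEnded) {
  track.addEventListener('mute', () => ui.showPaused(true));
  track.addEventListener('unmute', () => ui.showPaused(false));
  track.addEventListener('ended', () => onEnded(track), { once: true });
}
```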

Related: return to the Media Handling, Codecs & Bandwidth Estimation guide, or dive into Managing Audio Focus & Echo Cancellation Across Devices, Replacing Video Tracks Without Renegotiation, and SDP Renegotiation Without Dropping Streams.