WebRTC over CGNAT (Carrier-Grade NAT)
Carrier-grade NAT (CGNAT) sits between millions of mobile and broadband subscribers and the public internet, sharing a small pool of public IPv4 addresses across thousands of customers. Subscriber-side addresses often come from the 100.64.0.0/10 shared address space that RFC 6598 reserves for this purpose. This page is part of the ICE Candidate Gathering & Filtering guide, and it addresses one decision: how to keep WebRTC connections alive over CGNAT, where bindings expire fast, ports run out, and direct paths frequently never form.
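When debugging, it can help to check whether a device's own address falls inside the RFC 6598 range, which is a strong hint that it sits behind a carrier NAT. The helper below is a minimal sketch; isCgnatAddress is a hypothetical name, and it only handles dotted-quad IPv4 strings. Browsers usually mask host candidates behind mDNS names, so the check is most useful where raw addresses are visible (native apps, server-side logs).

```javascript
// Hypothetical helper: true if a dotted-quad IPv4 string falls inside
// 100.64.0.0/10, the shared address space RFC 6598 reserves for CGNAT.
function isCgnatAddress(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) return false;
  const octets = parts.map(Number);
  // Reject empty, non-numeric or out-of-range octets.
  if (octets.some((o, i) => !/^\d{1,3}$/.test(parts[i]) || o > 255)) return false;
  // A /10 prefix fixes the first 8 bits (100) plus the top 2 bits of the
  // second octet (01), i.e. the second octet must be in 64..127.
  return octets[0] === 100 && (octets[1] & 0xc0) === 0x40;
}

console.log(isCgnatAddress('100.72.3.9'));   // true
console.log(isCgnatAddress('100.128.0.1'));  // false
console.log(isCgnatAddress('192.168.1.20')); // false
```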
Context & Trade-offs
CGNAT magnifies every NAT problem, and two properties dominate. The first is binding lifetime: to conserve table space across thousands of subscribers, carriers age out idle UDP mappings aggressively, sometimes in under 30 seconds and occasionally as low as 20 s. A srflx candidate discovered at call setup can be dead before connectivity checks finish, and a connection that goes briefly idle can lose its mapping mid-call. The second is port exhaustion: with thousands of subscribers behind one public IP, the carrier may enforce a per-subscriber port budget. Under load, new mappings are refused, so additional candidate gathering or a fresh TURN allocation simply fails.
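The port budget is easy to underestimate, so here is the arithmetic as a minimal sketch. It assumes, purely for illustration, that the carrier splits the non-privileged range 1024–65535 of one public address evenly across subscribers; real deployments often hand out fixed-size port blocks instead, and portsPerSubscriber is a hypothetical name.

```javascript
// Illustrative arithmetic only: assumes ports 1024-65535 of one public IPv4
// address are shared evenly between subscribers.
const USABLE_PORTS = 65536 - 1024; // 64512 ports per public IP

function portsPerSubscriber(subscribersPerIp) {
  if (!Number.isInteger(subscribersPerIp) || subscribersPerIp <= 0) {
    throw new RangeError('subscribersPerIp must be a positive integer');
  }
  return Math.floor(USABLE_PORTS / subscribersPerIp);
}

console.log(portsPerSubscriber(1000)); // 64
console.log(portsPerSubscriber(4000)); // 16
```

Under these assumptions a subscriber gets a few dozen ports shared by every app on the device, and with a symmetric mapping each distinct STUN or TURN destination consumes a mapping of its own, which is why repeated gathering drains the budget quickly.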
Most CGNAT deployments are also symmetric, which means STUN srflx candidates rarely produce a usable direct path — the same failure mode covered in Traversing Symmetric NAT with TURN. The practical consequence: assume a relay will be needed, keep mappings warm with frequent keepalives, and design for fast re-establishment rather than fighting for a direct path.
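To confirm whether a call actually ended up on a relay, you can inspect getStats(). The sketch below assumes the W3C webrtc-stats field names (selectedCandidatePairId on the transport entry, localCandidateId on the candidate-pair, candidateType on the local-candidate); the fallback to a nominated, succeeded pair covers browsers that omit selectedCandidatePairId. selectedLocalCandidateType is a hypothetical helper, and a plain Map stands in for the RTCStatsReport so the example runs outside a browser.

```javascript
// Return the candidateType ('host' | 'srflx' | 'prflx' | 'relay') of the local
// candidate on the selected pair in a map-like stats report, or null.
function selectedLocalCandidateType(report) {
  let pair = null;
  // Prefer the transport's explicit pointer to the selected pair.
  report.forEach((stat) => {
    if (stat.type === 'transport' && stat.selectedCandidatePairId) {
      pair = report.get(stat.selectedCandidatePairId) || pair;
    }
  });
  // Otherwise fall back to a nominated, succeeded candidate-pair.
  if (!pair) {
    report.forEach((stat) => {
      if (!pair && stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
        pair = stat;
      }
    });
  }
  if (!pair) return null;
  const local = report.get(pair.localCandidateId);
  return local ? local.candidateType : null;
}

// A Map mimicking the shape of an RTCStatsReport.
const fakeReport = new Map([
  ['T1', { type: 'transport', selectedCandidatePairId: 'CP1' }],
  ['CP1', { type: 'candidate-pair', localCandidateId: 'L1', nominated: true, state: 'succeeded' }],
  ['L1', { type: 'local-candidate', candidateType: 'relay' }],
]);
console.log(selectedLocalCandidateType(fakeReport)); // relay
```

In a browser the same function would be called as selectedLocalCandidateType(await pc.getStats()) once ICE reports connected.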
The trade-offs are concrete. Sending consent/keepalive traffic every 5–15 s holds the binding open at the cost of a trickle of background bandwidth and battery on mobile. Routing through a TURN relay (see TURN Server Configuration & Auth) adds 20–40 ms of one-way latency but converts a near-certain failure into a reliable call. Skipping keepalives saves battery but invites a silent drop the moment the user stops talking.
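To put the "trickle of background bandwidth" in numbers, here is a rough cost model. Both inputs are assumptions rather than measurements: roughly 100 bytes on the wire per heartbeat once SCTP, DTLS, UDP and IP headers wrap the 1-byte payload, and a 32 kbit/s audio stream for comparison. keepaliveCost and mediaBytesPerHour are hypothetical helpers.

```javascript
// Rough heartbeat cost model. bytesOnWire is an assumed per-packet size
// including protocol overhead; measure your own stack for real numbers.
function keepaliveCost(intervalMs, bytesOnWire) {
  const packetsPerHour = Math.floor(3600000 / intervalMs);
  return { packetsPerHour, bytesPerHour: packetsPerHour * bytesOnWire };
}

// Bytes per hour for a constant-bitrate media stream of `kbps` kilobits/s.
function mediaBytesPerHour(kbps) {
  return (kbps * 1000 / 8) * 3600;
}

const heartbeat = keepaliveCost(10000, 100); // 10 s interval, ~100 B per packet (assumed)
const audio = mediaBytesPerHour(32);         // assumed 32 kbit/s audio stream
console.log(heartbeat); // { packetsPerHour: 360, bytesPerHour: 36000 }
console.log(audio);     // 14400000
console.log(((heartbeat.bytesPerHour / audio) * 100).toFixed(2) + '%'); // 0.25%
```

Under these assumptions the heartbeat costs about 36 kB per hour, a fraction of a percent of active audio; the packet count, and the radio wake-ups it implies, matters more than the bytes for long-idle background connections.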
It is worth separating the two mechanisms that keep a CGNAT call alive, because they fail differently. WebRTC’s built-in ICE consent freshness (RFC 7675) sends a STUN binding request on the selected pair roughly every 5 s (randomized between 4 and 6 s) and stops sending on that pair if no response arrives within 30 s; that protects the active media path. But consent only runs on the pair carrying media, and a paused or muted call can still let the underlying UDP mapping age out before consent notices, especially when the OS suspends the radio. An application-level keepalive on a data channel forces actual packets through the mapping on a schedule you control, independent of whether audio or video is flowing.
The cleanest design uses both: rely on consent freshness for liveness detection, and add a short-interval data-channel heartbeat to keep the NAT binding warm during silence. When the mapping is lost anyway (port exhaustion, a radio handoff, a genuinely long idle), an ICE restart re-gathers candidates and re-nominates a pair without dropping the session, which is far cheaper than rebuilding the peer connection or forcing a user-visible reconnect.
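The timers involved are easy to mix up, so the sketch below models them side by side. The constants follow RFC 7675 (a 5 s base period randomized to 0.8x–1.2x, and a 30 s expiry); browsers run consent internally, so this is only a reasoning aid. The function names are hypothetical, and consentExpired simplifies the RFC rule to "time since the last successful response".

```javascript
// Timing model for ICE consent freshness (RFC 7675); browsers run this
// internally, so the model only helps reason about which timer fires first.
const CONSENT_BASE_MS = 5000;    // average spacing of consent checks
const CONSENT_EXPIRY_MS = 30000; // consent is lost after 30 s without a response

// Delay before the next consent check, randomized to 0.8x-1.2x of the base
// period. `rand` is injectable (defaults to Math.random) for deterministic tests.
function nextConsentDelay(rand = Math.random) {
  return CONSENT_BASE_MS * (0.8 + 0.4 * rand());
}

// Simplified expiry test: more than 30 s since the last successful response.
function consentExpired(lastResponseAtMs, nowMs) {
  return nowMs - lastResponseAtMs > CONSENT_EXPIRY_MS;
}

// Does a periodic sender keep a NAT mapping alive? True when the largest gap
// between packets is shorter than the mapping's idle timeout.
function bindingStaysWarm(maxGapMs, natIdleTimeoutMs) {
  return maxGapMs < natIdleTimeoutMs;
}

console.log(consentExpired(0, 25000));       // false
console.log(consentExpired(0, 31000));       // true
console.log(bindingStaysWarm(10000, 20000)); // true: 10 s heartbeat vs 20 s timer
console.log(bindingStaysWarm(30000, 25000)); // false: interval longer than the timer
```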
Minimal Runnable Implementation
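const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    {
      urls: [
        'turn:turn.example.com:3478?transport=udp',
        'turns:turn.example.com:5349?transport=tcp' // TLS relay survives UDP-hostile carriers
      ],
      username: 'time-limited-user',  // placeholder for a short-lived TURN username
      credential: 'base64-hmac-token' // placeholder for the matching HMAC credential
    }
  ],
  iceTransportPolicy: 'all', // gather host, srflx and relay; ICE selects relay when direct checks fail
  bundlePolicy: 'max-bundle',
  rtcpMuxPolicy: 'require'
});
// WebRTC sends STUN consent checks every ~5 s automatically; a data-channel
// heartbeat additionally pushes packets through the mapping while media is idle,
// keeping the binding hot below the carrier's <30 s aging timer.
function startKeepalive(pc, intervalMs = 10000) {
  // Negotiated channel: BOTH peers must create it with the same id.
  const dc = pc.createDataChannel('keepalive', { negotiated: true, id: 0 });
  let timer = null;
  dc.onopen = () => {
    timer = setInterval(() => {
      // 1-byte binary heartbeat, sent well inside the binding lifetime
      if (dc.readyState === 'open') dc.send(new Uint8Array([0]));
    }, intervalMs); // 10 s default: safe margin under a 20–30 s CGNAT timeout
  };
  dc.onclose = () => clearInterval(timer);
  return dc;
}
startKeepalive(pc); // call before createOffer() so the SCTP transport is negotiated
// On a dropped binding, re-gather rather than tearing the call down.
const MAX_ICE_RESTARTS = 3;
let iceRestarts = 0;
pc.oniceconnectionstatechange = () => {
  const state = pc.iceConnectionState;
  if (state === 'connected' || state === 'completed') {
    iceRestarts = 0; // recovered: reset the retry budget
  } else if ((state === 'disconnected' || state === 'failed') && iceRestarts < MAX_ICE_RESTARTS) {
    iceRestarts += 1;
    pc.restartIce(); // fires negotiationneeded; your handler must send the new offer
  }
};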
Set the keepalive interval to roughly half the observed binding lifetime; 10 s is a safe default against a 20–30 s timeout. Always offer a TURN-over-TLS (turns:) endpoint on 5349 (or 443), because some carriers throttle or block raw UDP.
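The half-the-lifetime rule can be captured in a small helper. This is a minimal sketch: keepaliveIntervalFor is a hypothetical name, and the 5 s floor and 15 s cap are illustrative assumptions (the cap reflects "assume the worst" when a carrier's measured timeout is long or unknown), not carrier data.

```javascript
// Pick a heartbeat interval of roughly half the observed binding lifetime,
// clamped between minMs and maxMs (both defaults are assumptions).
function keepaliveIntervalFor(bindingLifetimeMs, { minMs = 5000, maxMs = 15000 } = {}) {
  const half = Math.floor(bindingLifetimeMs / 2);
  return Math.min(maxMs, Math.max(minMs, half));
}

console.log(keepaliveIntervalFor(20000));  // 10000
console.log(keepaliveIntervalFor(25000));  // 12500
console.log(keepaliveIntervalFor(120000)); // 15000 (capped)
```

The result can be passed straight to the intervalMs parameter of startKeepalive once you have measured a carrier's binding lifetime.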
Reproduction Steps & Debugging Log Patterns
- Place a client on a mobile carrier known to use CGNAT, establish a call, then stop all media and data for 35 s.
- Poll pc.getStats() at 1 s intervals and watch the nominated candidate-pair for consentRequestsSent rising while responsesReceived stalls.
- Observe iceConnectionState flip to disconnected shortly after the binding ages out, then watch whether restartIce() recovers it.
- Re-run with a 10 s keepalive enabled and confirm the binding survives the idle window.
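The "requests rising, responses stalling" check in the second step can be automated. Because consent checks go out only every ~5 s, comparing two 1 s polls is too noisy; the sketch below instead tracks how many consent requests have been sent since responsesReceived last advanced. It assumes the browser populates consentRequestsSent (defined in the webrtc-stats specification but not implemented everywhere), and createConsentWatcher is a hypothetical name.

```javascript
// Feed one candidate-pair stats object per poll; returns how many consent
// requests have gone out since responsesReceived last increased.
function createConsentWatcher() {
  let baselineSent = null;  // consentRequestsSent when responses last advanced
  let lastResponses = null; // last observed responsesReceived
  return function observe(pairStats) {
    const sent = pairStats.consentRequestsSent ?? 0;
    const responses = pairStats.responsesReceived ?? 0;
    if (lastResponses === null || responses > lastResponses) {
      baselineSent = sent;
      lastResponses = responses;
    }
    return sent - baselineSent; // unanswered consent checks
  };
}

const observe = createConsentWatcher();
console.log(observe({ consentRequestsSent: 2, responsesReceived: 4 })); // 0
console.log(observe({ consentRequestsSent: 3, responsesReceived: 5 })); // 0
console.log(observe({ consentRequestsSent: 5, responsesReceived: 5 })); // 2
```

Two or more unanswered checks on the nominated pair is a reasonable (assumed) threshold for logging that the mapping is aging out.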
Expected log on binding expiry without keepalive:
// t+0s candidate-pair (relay/srflx) state: succeeded nominated: true
// t+28s consentRequestsSent: 6 responsesReceived: 4 <- mapping aging out
// t+31s iceConnectionState: disconnected
// t+31s restartIce() -> iceConnectionState: checking -> connected
If restartIce() cannot recover and you see no new relay candidate, suspect port exhaustion on the carrier — the allocation request is being refused. Fall back to the already-established TLS relay rather than gathering fresh candidates.
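One way to detect "no new relay candidate" is to snapshot getStats() before calling restartIce() and again after gathering settles, then diff the local candidates. This sketch assumes candidates gathered by the restart get fresh stats ids, which is typical but not guaranteed; newLocalCandidateIds is a hypothetical helper, and Maps stand in for the RTCStatsReport objects.

```javascript
// List ids of local candidates of the given type that appear in `after`
// but not in `before` (both map-like stats reports).
function newLocalCandidateIds(before, after, candidateType = 'relay') {
  const ids = [];
  after.forEach((stat, id) => {
    if (stat.type === 'local-candidate' && stat.candidateType === candidateType && !before.has(id)) {
      ids.push(id);
    }
  });
  return ids;
}

const beforeRestart = new Map([['L1', { type: 'local-candidate', candidateType: 'relay' }]]);
const afterRestart = new Map([
  ['L1', { type: 'local-candidate', candidateType: 'relay' }],
  ['L2', { type: 'local-candidate', candidateType: 'srflx' }],
]);
console.log(newLocalCandidateIds(beforeRestart, afterRestart)); // []
```

An empty list after a restart is consistent with the carrier refusing the new allocation, which is the point to stop re-gathering and keep the existing relay path.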
Common Implementation Mistakes
- No keepalive on idle connections. A muted, paused call goes silent, the CGNAT binding ages out in under 30 s, and the next packet is dropped — the user perceives a random disconnect.
- Keepalive interval too long. A 30 s interval against a 25 s timeout still loses the binding; set it to roughly half the lifetime.
- STUN-only configuration. CGNAT is usually symmetric; without a relay, direct paths fail and there is nothing to keep alive.
- Tearing down on the first disconnected. The disconnected state is often transient; restartIce() recovers most CGNAT mapping losses without tearing down the peer connection.
- Ignoring port exhaustion. Repeatedly re-gathering under a carrier port budget makes things worse; reuse the existing relay allocation instead.
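The first and last mistakes suggest a recovery policy that tolerates a brief disconnected state, restarts ICE only if it persists (or immediately on failed), and stops after a few attempts. The class below is a minimal sketch of that idea as pure logic so it can be tested without a browser. IceRecoveryPolicy is a hypothetical name, and the 3 s grace period and restart cap are illustrative assumptions.

```javascript
// Hypothetical recovery policy. Returns 'wait', 'restart' or 'give-up';
// the caller decides what to do with the real RTCPeerConnection.
class IceRecoveryPolicy {
  constructor({ graceMs = 3000, maxRestarts = 3 } = {}) {
    this.graceMs = graceMs;         // how long a 'disconnected' may last before acting
    this.maxRestarts = maxRestarts; // restarts allowed before giving up
    this.restarts = 0;
    this.disconnectedSince = null;
  }

  // Call on every iceconnectionstatechange.
  onState(state, nowMs) {
    if (state === 'connected' || state === 'completed') {
      this.restarts = 0; // recovered: reset the budget
      this.disconnectedSince = null;
      return 'wait';
    }
    if (state === 'disconnected') {
      if (this.disconnectedSince === null) this.disconnectedSince = nowMs;
      return this.check(nowMs);
    }
    if (state === 'failed') return this.restartOrGiveUp();
    return 'wait';
  }

  // Call periodically (for example from a 1 s timer) while disconnected.
  check(nowMs) {
    if (this.disconnectedSince === null) return 'wait';
    if (nowMs - this.disconnectedSince < this.graceMs) return 'wait';
    return this.restartOrGiveUp();
  }

  restartOrGiveUp() {
    if (this.restarts >= this.maxRestarts) return 'give-up';
    this.restarts += 1;
    this.disconnectedSince = null; // start a fresh grace window after each restart
    return 'restart';
  }
}

const policy = new IceRecoveryPolicy({ graceMs: 3000, maxRestarts: 2 });
console.log(policy.onState('disconnected', 0)); // wait
console.log(policy.check(3000));                // restart
console.log(policy.onState('failed', 5000));    // restart
console.log(policy.onState('failed', 9000));    // give-up
```

In a browser you would call policy.onState(pc.iceConnectionState, Date.now()) from oniceconnectionstatechange, call policy.check(Date.now()) from a timer, and invoke pc.restartIce() whenever either returns 'restart'.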
FAQ
How short can a CGNAT binding lifetime really be?
Commonly 30–120 s for UDP, but aggressive carriers age idle mappings in under 30 s — some near 20 s. Always assume the worst and keep mappings warm.
Does a keepalive drain mobile battery noticeably?
A 1-byte heartbeat every 10 s is negligible compared to active media. The radio is already awake during a call; the cost only matters for long-idle background connections, where you can stretch the interval slightly.
Why does my call work on Wi-Fi but fail on cellular?
Home Wi-Fi is usually a single cone NAT with generous timeouts; cellular is symmetric CGNAT with short binding lifetimes. Provision TURN and keepalives specifically for the cellular path.
Related: return to ICE Candidate Gathering & Filtering, and see Traversing Symmetric NAT with TURN and ICE Candidate Trickle vs Bulk Gathering.