ICE Candidate Gathering & Filtering: Architecture, Configuration & Debugging
Real-time connectivity hinges on deterministic ICE candidate generation, strict filtering, and tightly synchronised exchange. This article is part of the WebRTC Protocol Stack & Signaling Servers guide and provides a step-by-step implementation path for production deployments: how the ICE agent discovers candidates, how to filter and prioritise them, how to transmit them over your signalling channel, and how to verify the result under real NAT topologies. The goal is a connection that reaches iceConnectionState === 'connected' quickly and predictably, with explicit handling for browser constraints and network fallbacks rather than the hopeful defaults most demos ship.
ICE (Interactive Connectivity Establishment, RFC 8445) sits between session negotiation and media transport. It enumerates every plausible network path between two peers, probes them in priority order, and nominates the best working pair. Get the gathering and filtering policy right and you collapse Time-to-First-Frame; get it wrong and you ship a product that “works on my laptop” but fails on cellular, in enterprises, and behind carrier-grade NAT.
Candidate Types & Discovery Flow
ICE produces four candidate types, each representing a different vantage point on the network. Host candidates come from local interfaces. Server-reflexive (srflx) candidates expose your public NAT mapping via STUN. Peer-reflexive (prflx) candidates are discovered mid-connectivity-check when a packet arrives from an address neither side advertised. Relay candidates are allocated on a TURN server and used when no direct path exists.
The priority each candidate receives follows the RFC 8445 formula:
$$\text{priority} = 2^{24} \times \text{type\_pref} + 2^{8} \times \text{local\_pref} + (256 - \text{component\_id})$$
Host candidates carry the highest type preference, followed by prflx, then srflx, then relay (RFC 8445 recommends 126, 110, 100, and 0 respectively), so direct paths are always tried before falling back to a relay you pay for.
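As a worked instance of the formula, the sketch below computes priorities from the type preferences RFC 8445 recommends. It assumes the maximum local preference (65535, as for a single-interface host) and component 1 (RTP); real agents choose their own local preferences per interface, so treat icePriority and its defaults as illustrative rather than what any particular browser emits.

```javascript
// RFC 8445 recommended type preferences (agents may choose other values).
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)
function icePriority(type, localPref = 65535, componentId = 1) {
  const typePref = TYPE_PREFERENCE[type];
  if (typePref === undefined) throw new Error(`unknown candidate type: ${type}`);
  if (localPref < 0 || localPref > 65535) throw new RangeError('localPref must be 0..65535');
  if (componentId < 1 || componentId > 256) throw new RangeError('componentId must be 1..256');
  // The result never exceeds 2^31 - 1, so plain Number arithmetic is exact.
  return typePref * 2 ** 24 + localPref * 2 ** 8 + (256 - componentId);
}

for (const type of ['host', 'prflx', 'srflx', 'relay']) {
  console.log(type, icePriority(type));
}
```

The top term dominates: one step of type preference (2^24) exceeds the largest possible sum of the lower two terms, so any host candidate outranks any srflx candidate, and local preference only orders candidates of the same type.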
The distinction matters operationally because each type has a different cost, reliability, and privacy profile. Host candidates are free and instant but only work when both peers share a routable network or sit behind the same NAT. Server-reflexive candidates cost a single STUN round trip and work for the large majority of home and small-office NATs, but fail against symmetric NAT. Peer-reflexive candidates cannot be gathered ahead of time: they only materialise during connectivity checks when a STUN binding request arrives from a transport address neither peer advertised, which commonly happens when a NAT remaps a port. You never configure them; you only observe them in getStats(). Relay candidates work even behind symmetric NAT but route every packet through infrastructure you operate and pay for, adding latency. A correct deployment gathers host, srflx, and relay candidates, lets prflx candidates emerge during checks, and lets ICE nominate the cheapest pair that survives connectivity checks.
| Type | Source | Latency added | Survives symmetric NAT |
|---|---|---|---|
| host | local interface | none | only if same network |
| srflx | STUN binding | one STUN RTT (gathering only) | no |
| prflx | discovered mid-check | none | sometimes |
| relay | TURN allocation | 20–40 ms one-way (media path) | yes |
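Since the table's distinctions drive both filtering and auditing, it helps to read the type straight from the candidate line. A minimal parsing sketch, assuming the standard attribute layout (foundation, component, transport, priority, address, port, then typ and the type); parseCandidate and its mdns field are hypothetical names, and in browsers that expose RTCIceCandidate.type and .address you can read those directly on the sending side.

```javascript
// Parses an ICE candidate line (the RTCIceCandidate.candidate string or an
// SDP "a=candidate:" attribute) into the fields needed to classify it.
function parseCandidate(line) {
  const text = line.trim().replace(/^a=/, '');
  if (!text.startsWith('candidate:')) return null;
  const parts = text.slice('candidate:'.length).split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') return null;
  const [foundation, component, protocol, priority, address, port, , type] = parts;
  const candidate = {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type,                              // host | srflx | prflx | relay
    mdns: /\.local$/i.test(address),   // browser-obfuscated host address
  };
  // Optional name/value pairs follow: raddr, rport, tcptype, generation, ...
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') candidate.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') candidate.relatedPort = Number(parts[i + 1]);
  }
  return candidate;
}
```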
Step 1: Map Candidate Discovery Phases
The ICE agent systematically probes local interfaces and external servers to build a connectivity matrix. The phases run concurrently as soon as setLocalDescription() applies the local description (or earlier, for candidates pre-gathered via iceCandidatePoolSize).
- Host candidates: Enumerate local interfaces (Wi-Fi, Ethernet, loopback). Disable loopback in production to prevent local-only routing and IP leakage.
- Server-reflexive (srflx): Issue STUN binding requests to discover the public IP:port your NAT assigned. This is the cheapest path that survives most home and small-office NATs. Provisioning is covered in STUN Server Deployment Strategies.
- Relay (relay): Allocate TURN sessions for symmetric NAT traversal or when UDP is wholly blocked. Configuration and credential handling live in TURN Server Configuration & Auth.
- mDNS handling: Modern browsers (Chrome, Safari) obfuscate local IPs with .local mDNS hostnames for privacy. Accept them as-is; do not attempt to resolve or strip them client-side.
// Pre-warm the candidate pool so gathering overlaps SDP creation
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' }, // srflx discovery
    { urls: 'turn:turn.example.com:3478', username: 'u', credential: 'p' } // relay (placeholder credentials)
  ],
  iceCandidatePoolSize: 4,    // pre-gather candidates before the offer is created
  bundlePolicy: 'max-bundle', // multiplex all m-lines onto one transport
  rtcpMuxPolicy: 'require'    // RTP and RTCP share one component; mandatory in modern browsers
});
Step 2: Apply Filtering & Priority Algorithms
Not every discovered path is viable, and some are actively harmful (leaking a VPN interface, routing over an unreachable IPv6 link-local address). Enforce transport policy before signalling begins.
- Transport policy: Set iceTransportPolicy: 'relay' for compliance-heavy environments. This drops host and srflx candidates entirely, routing all traffic through your TURN infrastructure. Expect 20–40 ms of added one-way latency in exchange for predictable, auditable paths.
- Dual-stack filtering: Drop IPv6 link-local (fe80::/10) candidates; they never traverse NAT and only waste connectivity checks. If your infrastructure lacks symmetric IPv6 routing, prefer IPv4 to avoid asymmetric packet loss. The nuances are in IPv6 Dual-Stack ICE Handling.
- Component mapping: Enable BUNDLE (bundlePolicy: 'max-bundle') to multiplex all media over a single transport, so ICE checks candidate pairs and allocates TURN for one transport instead of one per m-line.
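// Filter candidates as they are gathered, before they hit the wire.
// Assumes signalingChannel is your already-open signalling transport (e.g. a WebSocket).
pc.onicecandidate = (event) => {
  const c = event.candidate;
  if (!c) return; // null = end-of-gathering, not a candidate
  // c.address can be null; fall back to the address field of the candidate line
  const address = c.address ?? c.candidate.split(' ')[4] ?? '';
  if (/^fe[89ab][0-9a-f]:/i.test(address)) return; // drop IPv6 link-local (fe80::/10)
  signalingChannel.send(JSON.stringify({
    type: 'candidate',
    candidate: c.candidate,
    sdpMid: c.sdpMid,
    sdpMLineIndex: c.sdpMLineIndex,
    usernameFragment: c.usernameFragment // ties the candidate to its ICE generation across restarts
  }));
};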
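Matching only the literal text fe80:: catches part of the fe80::/10 range, which spans fe80 through febf. The sketch below tests the prefix bits exactly and applies the IPv4-preference rule from the list above as an opt-in flag, for example when screening remote candidates or sanitising a bulk SDP. The function names and the dropIpv6 option are hypothetical.

```javascript
// True for IPv6 link-local addresses (fe80::/10): the top 10 bits are
// 1111111010, i.e. the first hextet masked with 0xffc0 equals 0xfe80.
function isIpv6LinkLocal(address) {
  if (!address.includes(':')) return false; // IPv4 or mDNS hostname
  const firstHextet = parseInt(address.split(':')[0] || '0', 16);
  return (firstHextet & 0xffc0) === 0xfe80;
}

// Hypothetical policy: always drop link-local; optionally drop every IPv6
// candidate when the deployment lacks symmetric IPv6 routing.
function filterCandidateLines(lines, { dropIpv6 = false } = {}) {
  return lines.filter((line) => {
    const fields = line.trim().replace(/^a=/, '').split(/\s+/);
    const address = fields[4];
    if (address === undefined) return false; // malformed line: drop
    if (isIpv6LinkLocal(address)) return false;
    if (dropIpv6 && address.includes(':')) return false;
    return true;
  });
}
```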
Step 3: Synchronise Signalling Exchange
Filtered candidates must be transmitted without blocking the SDP Offer/Answer Lifecycle. A robust WebSocket Signaling Implementation handles out-of-order delivery and state transitions gracefully and typically delivers each candidate in under 10 ms.
Buffer incoming candidates if the remote SDP has not yet been applied, then flush them on setRemoteDescription() resolution to avoid InvalidStateError.
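let pendingCandidates = [];

// Route every incoming candidate message through here.
async function handleIncomingCandidate(init) {
  if (pc.remoteDescription) {
    await pc.addIceCandidate(init); // safe: remote description already applied
  } else {
    pendingCandidates.push(init); // buffer until setRemoteDescription() resolves
  }
}

// Call right after `await pc.setRemoteDescription(...)`.
async function onRemoteDescriptionSet() {
  const queued = pendingCandidates;
  pendingCandidates = []; // detach first so a failure cannot leave stale entries behind
  for (const c of queued) {
    try {
      await pc.addIceCandidate(c); // flush in arrival order
    } catch (err) {
      console.warn('Dropping unusable remote candidate', err); // one bad candidate must not block the rest
    }
  }
}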
Ordering is the subtle part. Candidates can arrive at the remote peer before the offer/answer exchange has fully settled, so a queue that buffers until remoteDescription is set and then flushes in arrival order is mandatory, not optional. Reordered candidate messages are harmless and dropped ones degrade gracefully (ICE simply has fewer pairs to check), but calling addIceCandidate() before the remote description is set rejects with InvalidStateError, and an unhandled rejection there can derail the whole negotiation. Keep the signalling channel idempotent: re-delivering the same candidate must be harmless, because at-least-once delivery is far easier to build than exactly-once.
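One way to keep both rules (buffer until ready, tolerate re-delivery) testable is to separate the bookkeeping from the RTCPeerConnection calls. A sketch, assuming a hypothetical RemoteCandidateQueue class keyed on the RTCIceCandidateInit fields candidate, sdpMid, sdpMLineIndex, and usernameFragment:

```javascript
// Buffers remote candidates until the remote description is applied and
// drops re-delivered duplicates, so at-least-once signalling is harmless.
class RemoteCandidateQueue {
  constructor() {
    this.pending = [];
    this.seen = new Set();
  }

  // Identity of a candidate within a session.
  static key(init) {
    return [init.sdpMid ?? '', init.sdpMLineIndex ?? '', init.usernameFragment ?? '', init.candidate].join('|');
  }

  // Returns 'apply', 'buffered', or 'duplicate' for the caller to act on.
  receive(init, remoteDescriptionSet) {
    const k = RemoteCandidateQueue.key(init);
    if (this.seen.has(k)) return 'duplicate';
    this.seen.add(k);
    if (remoteDescriptionSet) return 'apply';
    this.pending.push(init);
    return 'buffered';
  }

  // Hands back buffered candidates in arrival order and empties the buffer.
  drain() {
    const queued = this.pending;
    this.pending = [];
    return queued;
  }

  // After an ICE restart a new generation may legitimately resend candidates.
  reset() {
    this.pending = [];
    this.seen.clear();
  }
}
```

In the signalling handler, call queue.receive(init, pc.remoteDescription !== null) and pass the candidate to pc.addIceCandidate() only when it returns 'apply'; after await pc.setRemoteDescription(...), add every entry from queue.drain() in order.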
Whether you stream each candidate the moment it arrives or wait for gathering to complete is the single biggest latency lever here; it is covered in depth below.
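For the bulk side of that trade-off, the wait for gathering needs a deadline. A minimal sketch using Promise.race() with the 5 s default recommended in this guide; it assumes pc is an RTCPeerConnection (or any EventTarget exposing iceGatheringState), and waitForIceGatheringComplete is a hypothetical helper name.

```javascript
// Resolves 'complete' when ICE gathering finishes, or 'timeout' after
// timeoutMs, so a bulk (non-trickle) flow never hangs on a stalled network.
function waitForIceGatheringComplete(pc, timeoutMs = 5000) {
  if (pc.iceGatheringState === 'complete') return Promise.resolve('complete');
  let onChange;
  let timer;
  const done = new Promise((resolve) => {
    onChange = () => {
      if (pc.iceGatheringState === 'complete') resolve('complete');
    };
    pc.addEventListener('icegatheringstatechange', onChange);
  });
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });
  // Whichever settles first wins; always clean up the listener and timer.
  return Promise.race([done, timeout]).finally(() => {
    clearTimeout(timer);
    pc.removeEventListener('icegatheringstatechange', onChange);
  });
}
```

In bulk mode, await this after setLocalDescription() and then send pc.localDescription.sdp, which includes the candidates gathered so far; on 'timeout' you still send what you have rather than stalling the call.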
Step 4: Verification
Confirm the connection nominated the pair you expected and that gathering completed without silent failures.
- Gathering states (iceGatheringState) progress new → gathering → complete. Set an explicit timeout (5 s) so your code stops waiting on unstable networks rather than hanging indefinitely.
- Poll getStats() at 1 s intervals and correlate local-candidate/remote-candidate entries with the nominated candidate-pair to see which path actually carries media.
- Trigger re-gathering on network handoffs (Wi-Fi → cellular) via pc.restartIce(), capped at 3 retries.
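async function auditIceStats() {
  const stats = await pc.getStats();
  for (const r of stats.values()) {
    if (r.type === 'candidate-pair' && r.nominated && r.state === 'succeeded') {
      // Resolve the pair's endpoints to see whether the live path is host, srflx, prflx, or relay
      const local = stats.get(r.localCandidateId);
      const remote = stats.get(r.remoteCandidateId);
      const rtt = r.currentRoundTripTime !== undefined
        ? `${(r.currentRoundTripTime * 1000).toFixed(1)} ms`
        : 'n/a'; // may be absent before the first measurement
      console.log(`Nominated ${local?.candidateType} -> ${remote?.candidateType}, RTT=${rtt}`);
    }
  }
}

pc.onicecandidateerror = (e) => {
  // 701 = STUN/TURN server unreachable from any host candidate, 401 = TURN credentials rejected
  console.error(`ICE error [${e.errorCode}] ${e.errorText} on ${e.url}`);
};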
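The restart rule from the list above (pc.restartIce(), capped at 3 retries) can be sketched as a small policy function with exponential backoff. It assumes restarts are triggered only on 'failed', since 'disconnected' frequently recovers on its own; createIceRestartPolicy, baseDelayMs, and the injectable schedule parameter are hypothetical.

```javascript
// Restarts ICE with exponential backoff when the connection fails, giving up
// after maxRetries; reaching connected/completed resets the counter.
// `schedule` is injectable so the policy can be exercised without real timers.
function createIceRestartPolicy(pc, { maxRetries = 3, baseDelayMs = 1000, schedule = setTimeout } = {}) {
  let attempts = 0;
  return function onIceConnectionStateChange() {
    const state = pc.iceConnectionState;
    if (state === 'connected' || state === 'completed') {
      attempts = 0;
      return 'reset';
    }
    if (state !== 'failed') return 'ignored'; // e.g. 'disconnected' often self-heals
    if (attempts >= maxRetries) return 'gave-up';
    const delayMs = baseDelayMs * 2 ** attempts; // 1 s, 2 s, 4 s with the defaults
    attempts += 1;
    schedule(() => pc.restartIce(), delayMs);
    return delayMs;
  };
}
```

Attach it with pc.addEventListener('iceconnectionstatechange', createIceRestartPolicy(pc)). Note that restartIce() only marks the connection for an ICE restart: your negotiationneeded handler still has to send a fresh offer, which carries new ICE credentials.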
Section Deep-Dives
Each scenario below has its own focused guide:
- ICE Candidate Trickle vs Bulk Gathering: when to stream candidates incrementally (saving 200–800 ms of Time-to-First-Frame) versus waiting for gathering to complete, with a bulk fallback timeout.
- Traversing Symmetric NAT with TURN: why symmetric NAT defeats srflx candidates entirely and forces relay paths, plus the iceTransportPolicy and TURN config that fixes it.
- WebRTC over CGNAT: sub-30-second binding lifetimes, port exhaustion, keepalive tuning, and relay fallback on carrier networks.
- IPv6 Dual-Stack ICE Handling: happy-eyeballs-style pairing, IPv4/IPv6 prioritisation, fe80 link-local filtering, and per-browser differences.
The verification step is also where you catch the most expensive class of bug: a connection that appears to work in development because both peers are on the same LAN (nominating a host pair) but fails in production because the real path needed srflx or relay. Force iceTransportPolicy: 'relay' in at least one CI path so the relay is exercised deterministically rather than only when a tester happens to be behind symmetric NAT. Pair that with getStats() assertions that the nominated pair is the type you expect, not merely that iceConnectionState reached connected.
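A CI assertion on the nominated pair can be built from standard getStats() fields: candidate-pair entries carry nominated, state, localCandidateId, and remoteCandidateId, and candidate entries carry candidateType. A sketch with hypothetical helper names; where a browser reports a transport's selectedCandidatePairId, following that pointer is the more precise way to find the live pair.

```javascript
// Finds the nominated, succeeded candidate pair in a getStats() report
// (any Map-like object) and returns the candidate types at each end.
function selectedPathTypes(report) {
  for (const stat of report.values()) {
    if (stat.type === 'candidate-pair' && stat.nominated && stat.state === 'succeeded') {
      const local = report.get(stat.localCandidateId);
      const remote = report.get(stat.remoteCandidateId);
      return {
        local: local ? local.candidateType : undefined,
        remote: remote ? remote.candidateType : undefined,
      };
    }
  }
  return null; // no pair selected yet
}

// Throws unless the live path uses the expected local candidate type,
// e.g. 'relay' in a CI job that forces iceTransportPolicy: 'relay'.
function assertSelectedLocalType(report, expected) {
  const path = selectedPathTypes(report);
  if (!path) throw new Error('no nominated candidate pair yet');
  if (path.local !== expected) {
    throw new Error(`expected local ${expected} candidate, got ${path.local}`);
  }
  return path;
}
```

In a relay-forced CI job, for example, call assertSelectedLocalType(await pc.getStats(), 'relay') once iceConnectionState reaches connected.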
Edge Cases & Browser Quirks
- Chrome (≥ 90): Replaces host IPs with mDNS .local hostnames by default unless the page already has camera/mic permission. With iceCandidatePoolSize set, host and srflx are pre-gathered before the offer, shaving gathering time.
- Firefox: Controls ICE TCP via media.peerconnection.ice.tcp; relay-only behaviour and candidate ordering can differ from Chromium. Firefox is stricter about emitting the final null end-of-candidates signal, so rely on iceGatheringState, not just the null event.
- Safari (WebKit): Restricts non-standard UDP ports and is conservative about IPv6 candidate generation. Always run TURN on 3478 and TURN over TLS on 5349 (or 443) so Safari clients behind restrictive firewalls still reach a relay.
- Mobile (all browsers): NAT bindings on carrier networks can expire in under 30 s, so srflx candidates gathered early may already be stale by the time the remote peer uses them.
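The Safari advice translates into an RTCIceServer entry that offers the same relay over UDP, TCP, and TLS. A sketch, assuming the host name and credentials come from your own deployment; turnServerConfig is a hypothetical helper, and the TLS port defaults to 5349 with 443 as an option.

```javascript
// Builds one RTCIceServer entry exposing the relay over UDP and TCP on 3478
// plus TURN over TLS, so restrictive firewalls still leave a path open.
function turnServerConfig(host, username, credential, { tlsPort = 5349 } = {}) {
  return {
    urls: [
      `turn:${host}:3478?transport=udp`,
      `turn:${host}:3478?transport=tcp`,
      `turns:${host}:${tlsPort}?transport=tcp`,
    ],
    username,
    credential,
  };
}
```

Pass the result in iceServers when constructing the RTCPeerConnection; the browser attempts each URL, so clients that cannot use UDP still have a TCP or TLS path to the relay.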
Common Implementation Mistakes
- Hardcoded endpoints: Never bake a single STUN/TURN host into the client. Use environment-aware or geo-routed configuration so each user reaches the nearest relay; multi-region STUN alone can cut connect latency by 40–60%.
- Silent failures: Ignoring onicecandidateerror masks firewall blocks and expired TURN credentials. Always log errorCode and url.
- Over-filtering: Dropping all host candidates behind symmetric NAT without a TURN fallback guarantees connection failure.
- State hangs: Waiting on iceGatheringState without a timeout can hang indefinitely on cellular. Wrap gathering in a Promise.race() with a 5 s deadline.
- Premature application: Calling addIceCandidate() before setRemoteDescription() resolves rejects with InvalidStateError. Queue incoming candidates and flush them once the remote description is set.
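To act on errorCode rather than just print it, the sketch below buckets the values the W3C WebRTC spec defines: 701 when no host candidate could reach the server, otherwise the STUN error code (300–699) the server returned. classifyIceError and its cause labels are hypothetical.

```javascript
// Maps an RTCPeerConnectionIceErrorEvent-like object to a log record.
function classifyIceError(e) {
  let cause;
  if (e.errorCode === 701) cause = 'unreachable';       // blocked port, firewall, DNS
  else if (e.errorCode === 401) cause = 'auth';         // bad or expired TURN credentials
  else if (e.errorCode >= 300 && e.errorCode <= 699) cause = 'server-error';
  else cause = 'unknown';
  return { cause, code: e.errorCode, url: e.url, text: e.errorText };
}
```

Use it as pc.onicecandidateerror = (e) => console.error(classifyIceError(e)). An error on one server is not necessarily fatal, since other STUN/TURN URLs may still succeed, so judge the connection by iceConnectionState rather than by the error event alone.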
FAQ
How do I force WebRTC to use only TURN relays for compliance?
Set iceTransportPolicy: 'relay' in the RTCPeerConnection config. This suppresses host and srflx candidates so all media traverses your audited TURN infrastructure.
Why does media latency spike despite successful signalling?
ICE is likely stuck in gathering or failing candidate-pair validation. Verify UDP 3478 reachability, check firewall rules, and pre-warm candidates with iceCandidatePoolSize: 4.
How many candidates should a typical peer generate?
Usually 3–10: one or two host, one srflx per STUN server, and one relay per TURN allocation. A peer emitting dozens usually has multiple unfiltered interfaces (VPN, virtual adapters) leaking; filter them.
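Those ranges can be checked automatically on the gathered candidate lines. A sketch with a hypothetical summariseCandidates helper and a default ceiling of 10; note that a missing srflx type can also be legitimate, for instance under iceTransportPolicy: 'relay' or when the host address is already public (ICE discards srflx candidates that duplicate a host candidate).

```javascript
// Tallies gathered candidate lines by type and flags the symptoms above:
// a missing srflx/relay type, or an unusually large total.
function summariseCandidates(lines, { maxExpected = 10 } = {}) {
  const counts = { host: 0, srflx: 0, prflx: 0, relay: 0 };
  for (const line of lines) {
    const match = /\styp\s+(host|srflx|prflx|relay)\b/.exec(line);
    if (match) counts[match[1]] += 1;
  }
  const total = counts.host + counts.srflx + counts.prflx + counts.relay;
  const warnings = [];
  if (counts.srflx === 0) warnings.push('no srflx: STUN unreachable or not configured');
  if (counts.relay === 0) warnings.push('no relay: TURN allocation failed or not configured');
  if (total > maxExpected) warnings.push(`${total} candidates: check for VPN or virtual adapters`);
  return { counts, total, warnings };
}
```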
What triggers onicecandidateerror?
An unreachable STUN/TURN server (errorCode 701, commonly a blocked port or firewall), an authentication rejection (401), or another STUN error code returned by the server. Log the payload, back off exponentially, and call pc.restartIce() (max 3 attempts) if the connection degrades.
Related: continue with the WebRTC Protocol Stack & Signaling Servers guide, or dig into Traversing Symmetric NAT with TURN, WebRTC over CGNAT, and ICE Candidate Trickle vs Bulk Gathering.