Limitations of ICMP-Based Network Measurements

By Thom Haddow

Summary


When manually diagnosing network performance issues, several common techniques rely on ICMP probing, e.g. measuring round-trip time, measuring end-to-end loss, and tracerouting. In this article we show that ICMP introduces bias into these measurements, and that TCP-based probing overcomes most of ICMP's limitations.

ICMP probes can be routed differently than TCP traffic

The diagram below shows a visualization of three routes to a US-based destination host of a popular website, as measured using a multipath-aware version of traceroute with ICMP probes (refer to our previous post to understand multipath bias in traceroute).

ICMP probes not experiencing multipath routing
Figure 1: Three complete ICMP paths to a remote destination host, color coded by network.

ICMP measurement suggests a simple underlying network topology, with paths from different locations converging as they enter the network of the service provider. All intermediate nodes responded to ICMP-based measurement probes, suggesting we have a complete view of the network topology. However, if we switch to using TCP-based measurement probes, a different picture emerges:

Multipath routing discovered by TCP-based probing
Figure 2: The same three paths as measured using TCP-based probes.

Here we can see the pervasiveness of multipath routing within the Internet core. Nodes highlighted in blue represent a tier-1 backbone network that deploys multipath routing extensively. Additionally, the uplink hops from the San Francisco-based node belong to a different major network service provider that also employs multipath routing. Both measurement processes used above were multipath-aware, but the routes discovered using TCP probes were significantly different. We can conclude that routers do not necessarily apply the same multipath routing to ICMP packets as they do to TCP packets, so TCP connections can take completely separate routes from those taken by ICMP diagnostic probes.

The consequence is that end-to-end performance metrics such as latency, jitter, loss or bandwidth, when measured with ICMP-based probes, describe only the ICMP path. Performance problems on paths traversed only by TCP packets cannot be observed using ICMP probes, which may lead to misdiagnosis of network issues.
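One plausible mechanism for this divergence, which the measurements above do not confirm, is per-flow hashing: a load-balancing router picks a next hop from a hash of packet header fields, and TCP carries port numbers that ICMP lacks. The sketch below is a toy model of that idea. The function `pick_next_hop`, the hop names, the choice of SHA-256, and the policy of treating ICMP ports as zero are all assumptions made for illustration, not the behavior of any particular router.

```python
import hashlib

def pick_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    """Toy per-flow load balancer: hash the flow fields, pick one next hop.

    Hypothetical policy for illustration only; real routers use
    vendor-specific hash functions and field selections.
    """
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

# Hypothetical equal-cost next hops behind one router.
NEXT_HOPS = ["hop-a", "hop-b", "hop-c", "hop-d"]

# TCP probes (IP protocol 6): each source port is a distinct flow,
# so a set of probes can spread across several next hops.
tcp_paths = {
    pick_next_hop("192.0.2.1", "198.51.100.7", 6, sport, 80, NEXT_HOPS)
    for sport in range(33000, 33064)
}

# ICMP probes (IP protocol 1): no ports exist, and this toy router
# hashes them as zero, so every probe maps to the same next hop.
icmp_paths = {
    pick_next_hop("192.0.2.1", "198.51.100.7", 1, 0, 0, NEXT_HOPS)
    for _ in range(64)
}

print("next hops seen by TCP probes: ", sorted(tcp_paths))
print("next hops seen by ICMP probes:", sorted(icmp_paths))
```

Under these assumptions the ICMP probes only ever see one branch, while the TCP probes reveal several, which mirrors the difference between Figures 1 and 2.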

ICMP traffic is often rate-limited

Because ICMP is designed for network diagnostics, ICMP traffic is often subject to rate limiting. For example, a bandwidth estimation algorithm that can use either TCP or ICMP-based probes produces different results depending on the protocol. Measuring bandwidth to 100 independent destinations, we found that ICMP probes were either throttled or blocked in 83% of cases.
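To see why throttling distorts measurements, consider a device that limits its ICMP replies with a token bucket, a common rate-limiting scheme. The simulation below is a minimal sketch under that assumption. The function name `simulate_rate_limited_replies` and the rate, burst and probe spacing values are made up for illustration and are not taken from the measurements above.

```python
def simulate_rate_limited_replies(probe_times, rate_per_sec, burst):
    """Return one boolean per probe: True if the device replied.

    Models a token bucket: tokens refill at rate_per_sec up to burst,
    and each reply consumes one token. Illustrative model only.
    """
    tokens = float(burst)
    last_time = None
    answered = []
    for t in probe_times:
        if last_time is not None:
            # Refill for the time elapsed since the previous probe.
            tokens = min(float(burst), tokens + (t - last_time) * rate_per_sec)
        last_time = t
        if tokens >= 1.0:
            tokens -= 1.0
            answered.append(True)
        else:
            answered.append(False)  # reply suppressed by the limiter
    return answered

# 100 probes sent 10 ms apart, as a bandwidth estimator might do.
probe_times = [i * 0.01 for i in range(100)]
answered = simulate_rate_limited_replies(probe_times, rate_per_sec=10, burst=5)
apparent_loss = 1 - sum(answered) / len(answered)
print(f"apparent loss seen by the prober: {apparent_loss:.0%}")
```

In this model the path itself drops nothing, yet the prober sees most of its probes go unanswered. A loss or bandwidth estimate built on those replies would describe the limiter rather than the network.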

Pie graph comparing throttling between protocols
Figure 3: ICMP probes are much more prone to being throttled than TCP probes.

When ICMP-based probing is needed

There are cases where ICMP probing is needed. First, some intermediate nodes (shown in white above), typically network devices, do not respond to TCP probes. We can infer their existence from the hop count of subsequent probes, but they are otherwise silent, leading to incomplete path information. Second, aggressive firewall configurations may be more restrictive toward TCP-based probes than toward diagnostic ICMP measurements, and may block the TCP reset packets needed to discover that probes have reached their destination. Both limitations can also apply in reverse: intermediate nodes or firewalls may silently discard ICMP probes where TCP probes would be permitted.
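The hop-count inference described above can be sketched as follows. The sketch assumes a prober has already recorded, for each TTL it tried, the address that answered or `None` when nothing came back. The function `reconstruct_path`, the `"*"` placeholder and the example addresses are hypothetical names chosen for this illustration.

```python
from typing import Dict, List, Optional, Tuple

def reconstruct_path(
    replies: Dict[int, Optional[str]], destination: str
) -> Tuple[List[str], bool]:
    """Build a hop list from per-TTL replies.

    replies maps TTL -> responding address, or None if no reply arrived.
    Returns (hops, reached), where silent hops appear as "*".
    """
    if not replies:
        return [], False
    hops: List[str] = []
    reached = False
    for ttl in range(1, max(replies) + 1):
        addr = replies.get(ttl)
        hops.append(addr if addr else "*")
        if addr == destination:
            reached = True
            break
    # A silent hop is only proven to exist when a later TTL answers.
    # Trailing silence is ambiguous (silent hops or a filtering
    # firewall), so it is not reported as part of the path.
    while hops and hops[-1] == "*":
        hops.pop()
    return hops, reached

# TTL 2 never answers, but TTL 3 does, so a silent hop must exist.
print(reconstruct_path(
    {1: "10.0.0.1", 2: None, 3: "203.0.113.9", 4: "198.51.100.7"},
    "198.51.100.7",
))

# Nothing answers after TTL 2: the destination may be filtered.
print(reconstruct_path(
    {1: "10.0.0.1", 2: "203.0.113.9", 3: None, 4: None},
    "198.51.100.7",
))
```

The second case corresponds to the firewall scenario: without the reset (or any other reply) from the destination, the prober cannot tell that its probes arrived.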

From the above, it is clear that TCP-based probing should be used whenever possible due to the bias introduced by ICMP measurements. While ICMP is sometimes better supported, for example by intermediate network devices that do not support TCP connectivity, it will often present an incomplete view of network topology and is more likely to be dropped or throttled by network destinations. In practice, TCP is much more representative of the network conditions experienced by real applications.
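As a rough, unprivileged approximation of a TCP-based latency probe, one can time the TCP handshake itself. This is only a sketch and not the probing method used for the measurements in this article: the helper `tcp_connect_rtt` is a hypothetical name, it completes a full connection rather than sending a bare SYN, and its timing includes operating system overhead.

```python
import socket
import time

def tcp_connect_rtt(host, port, timeout=2.0):
    """Return seconds taken to get a TCP answer from host:port, or None.

    Pass an IP address rather than a hostname so that DNS resolution
    time is not included in the measurement.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            # Handshake completed: the SYN-ACK came back.
            return time.perf_counter() - start
    except ConnectionRefusedError:
        # A refused connection (TCP reset) still shows that the probe
        # reached the destination, so the elapsed time is meaningful.
        return time.perf_counter() - start
    except OSError:
        # Timeout, unreachable network, or filtered probe: no answer.
        return None

# Example usage (replace with a destination you are allowed to probe):
# rtt = tcp_connect_rtt("198.51.100.7", 443)
# print("no answer" if rtt is None else f"{rtt * 1000:.1f} ms")
```

Because the probe uses the same protocol and port as the application traffic, it is more likely to be routed and filtered the way that traffic is, which is the point made above.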
