The ThousandEyes team spent several days last week at Interop monitoring the health of InteropNet, the volunteer-built network that powers the conference. We set out to instrument InteropNet in order to monitor the health of services used by conference attendees and vendors. Over the course of the week while the network was up and humming, we gathered performance data and dug into network problems. Let’s take a look inside InteropNet to see how it performed.
Instrumenting InteropNet
Gathering performance data for InteropNet was a relatively simple exercise. Our in-house InteropNet guru and ThousandEyes solution engineer Ken Guo installed six agents to generate tests and record performance data. These six agents, virtual appliances running on Dell servers, were distributed across InteropNet. Four were located on VLANs that served the exhibit hall and conference rooms (PEDs). Two were located in InteropNet data centers, one in Sunnyvale and the other in Las Vegas. With these six agents deployed, we were able to generate a view of the InteropNet topology.
Testing InteropNet Performance
We set up a number of tests that actively probed services on InteropNet (a simplified sketch of this kind of probing follows the list), including:
- From InteropNet to key services: mobile app, registration server, social media sites, Salesforce.com and Webex
- From InteropNet to the LV and SFO data centers as well as EWR and DEN edges
- From POPs around the US to the Interop registration site, website and CenturyLink circuits
- Local DNS resolver
- Authoritative DNS server for Interop.com and Interop.net
- Interop BGP prefixes
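For anyone curious what these tests reduce to under the hood, here is a minimal sketch in Python of two of the simpler probe types: a timed HTTP fetch and a local DNS resolution. This is not the ThousandEyes agent code; it uses only the standard library, and the hostnames are stand-ins rather than the actual Interop test targets.

```python
# Illustrative only: stripped-down versions of an HTTP availability probe and a
# local DNS resolver probe. Hostnames below are placeholders, not the real test set.
import socket
import time
import urllib.request

def http_probe(url, timeout=5):
    """Fetch a URL and return (HTTP status, response time in seconds)."""
    start = time.time()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, time.time() - start

def dns_probe(hostname):
    """Resolve a hostname with the local resolver and return (address, lookup time in seconds)."""
    start = time.time()
    addr = socket.gethostbyname(hostname)
    return addr, time.time() - start

if __name__ == "__main__":
    for url in ("https://www.interop.com/", "https://www.salesforce.com/"):
        status, elapsed = http_probe(url)
        print(f"{url}: HTTP {status} in {elapsed:.2f}s")
    for host in ("interop.net", "salesforce.com"):
        addr, lookup = dns_probe(host)
        print(f"{host} -> {addr} ({lookup * 1000:.0f} ms)")
```

A real deployment, like the one at Interop, runs probes along these lines on a schedule from multiple vantage points and stores the results centrally, which is what makes the cross-location comparisons below possible.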
Troubleshooting Critical Services
While at Interop we were on the lookout for service interruptions. One that we noticed involved Salesforce, popular with the sales and BD folks at the show. We saw two periods when Salesforce was unavailable from InteropNet, each lasting up to 10 minutes.
The drop in Salesforce availability coincided with high packet loss and latency on the path between InteropNet and Salesforce: packet loss averaged 57% and latency jumped to 2 seconds.
When we drilled into the path visualization between InteropNet and Salesforce.com we immediately saw the culprit. InteropNet’s primary ISP, CenturyLink, peers with Comcast Business Network en route to Salesforce data centers in California. The two spikes in packet loss coincide with traffic dropping on the San Jose edge between CenturyLink (Qwest) and Comcast.
Rewinding 30 minutes shows how these nodes were performing when availability was unaffected. At that point, CenturyLink and Comcast Business were peering in San Jose without issue.
From this information, we can conclude that the two Salesforce.com service interruptions on InteropNet were caused by changes occurring at the peering point between CenturyLink and Comcast in San Jose. In this particular case, the network hiccups happened when most attendees were likely not using the show network. But having visibility allowed the InteropNet team to monitor for problems as they arose throughout the week.
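As a rough illustration of the kind of loss and latency check that flags an event like this, the sketch below shells out to the system ping utility and alerts when loss or round-trip time crosses a threshold. The target and thresholds are assumptions chosen to mirror the 57% loss and 2-second latency figures above; the actual monitoring at the show was done with the ThousandEyes agents, not this script.

```python
# Illustrative only: parse the summary line of the system ping utility and alert on
# high loss or latency. Assumes Linux/macOS-style ping output.
import re
import subprocess

def ping_stats(target, count=20):
    """Run ping and return (loss percent, average RTT in ms) parsed from its summary."""
    out = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True, text=True
    ).stdout
    loss_match = re.search(r"([\d.]+)% packet loss", out)
    loss = float(loss_match.group(1)) if loss_match else 100.0
    rtt_match = re.search(r"= [\d.]+/([\d.]+)/", out)  # min/avg/max summary
    avg_rtt = float(rtt_match.group(1)) if rtt_match else None
    return loss, avg_rtt

if __name__ == "__main__":
    loss, rtt = ping_stats("salesforce.com")
    if loss > 50 or (rtt is not None and rtt > 1000):
        print(f"ALERT: {loss:.0f}% loss, average RTT {rtt} ms")
    else:
        print(f"OK: {loss:.0f}% loss, average RTT {rtt} ms")
```

What a script like this cannot show is where along the path the loss happens; that per-hop view is exactly what the path visualization added, pinpointing the CenturyLink and Comcast peering point in San Jose.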
Interop Show Network: Viewing InteropNet’s Autonomous System
We also monitored Border Gateway Protocol (BGP) routing to InteropNet over the course of the conference in order to gain visibility into any routing issues that might occur. BGP defines the preferred routes that traffic takes from networks around the Internet to InteropNet, identified by the Interop Show Network autonomous systems (AS 290 and AS 53692). Both networks route to the rest of the Internet via the primary ISP, CenturyLink (Qwest, AS 209).
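For readers who want to keep an eye on their own prefix announcements from the outside, here is a minimal sketch that queries RIPEstat's public announced-prefixes data call for the two Interop ASNs. This is an assumption-laden illustration, not how the ThousandEyes BGP monitoring works, and the exact response fields may differ from what is shown here.

```python
# Illustrative only: list the prefixes public route collectors currently see
# announced by an AS, via RIPEstat's data API. Response field names are assumed.
import json
import urllib.request

RIPESTAT_URL = "https://stat.ripe.net/data/announced-prefixes/data.json?resource={asn}"

def announced_prefixes(asn):
    """Return the prefixes currently seen announced by the given AS."""
    with urllib.request.urlopen(RIPESTAT_URL.format(asn=asn)) as resp:
        data = json.load(resp)
    return [entry["prefix"] for entry in data["data"]["prefixes"]]

if __name__ == "__main__":
    for asn in ("AS290", "AS53692"):
        prefixes = announced_prefixes(asn)
        print(f"{asn}: {len(prefixes)} prefixes announced")
        for prefix in prefixes:
            print("  ", prefix)
```

A check like this, run periodically, would surface a withdrawn prefix; continuous BGP monitoring adds the path and timing detail needed to diagnose why routes changed.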
It’s a Wrap
By now InteropNet has been torn down, only to be rebuilt again next year. In the end, InteropNet performed beautifully: performance to key applications was speedy and service interruptions were minimal. We had a blast helping the InteropNet team build a network from scratch!