The Internet Report

How Third-party Issues Led to McDonald’s, DMV Outages

By Mike Hicks
23 min read
Internet Report on Apple Podcasts Internet Report on Spotify Internet Report on SoundCloud

Summary

Third parties appear responsible as outages impact McDonald’s, DMV, Microsoft Outlook, and more.


This is the Internet Report: Pulse Update, where we analyze outages and trends across the Internet over the previous two weeks through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. As always, you can read the full analysis below or tune in to the podcast for firsthand commentary.



Internet Outages & Trends

The web of dependencies and interdependencies involved in end-to-end digital service delivery introduces an untold number of potential failure points. Some of these failure points are from direct relationships, while others are from third-party dependencies that introduce layers of abstraction that operations teams have to stay on top of.

Managing this complexity at scale can be challenging, even overwhelming. Unexpected issues can arise from seemingly insignificant components, which can catch even the largest, most technologically sophisticated organizations by surprise.

This past fortnight, some McDonald’s customers out for a late-night snack might have been surprised to find McDonald's ordering and payment systems unavailable. A configuration change initiated by a third-party system supplier had impacted them. 

Speaking of organizations with large customer-facing operations, several Departments of Motor Vehicles across the United States had problems delivering their core functions when a third-party information clearinghouse experienced network issues.

Meanwhile, fiber cuts impacted cloud service traffic in parts of Africa. This fault was potentially difficult to diagnose for some providers, leading to vague early status notifications. Another fiber cut affected ASX Net, a connectivity service for linking organizations to Australian financial markets; however, this fiber cut went largely unnoticed because of the large-scale redundancy that was in place.

In most cases, the path to resiliency is to have redundant services or alternate paths where it makes economic sense and to maintain independent observability across the end-to-end digital delivery chain so that problems can be detected and pinpointed at the earliest signs before larger flow-on effects materialize.

Read on to learn about all the outages and degradations from the past two weeks.


McDonald’s Payment and Ordering Systems Outage

On March 15, McDonald’s experienced a global technology outage at 5 AM (UTC), which was 10 PM (PDT) on March 14. While this incident is unlikely to have significantly impacted customers in North America, the timing and global reach of the outage meant that customers in Asia, Oceania, parts of Western Europe, and the UK appeared to be the most impacted.
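To make the timing concrete, the short Python sketch below converts the reported 5 AM UTC start into local time for a few cities. It is illustrative only: the year (2024), the choice of cities, and the fixed UTC offsets are assumptions added here, not details taken from the official statement.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year is 2024; the article gives only the month and day.
OUTAGE_START_UTC = datetime(2024, 3, 15, 5, 0, tzinfo=timezone.utc)

# Assumption: illustrative cities with the fixed UTC offsets in effect in mid-March.
OFFSETS_HOURS = {
    "Los Angeles (PDT)": -7,
    "London (GMT)": 0,
    "Tokyo (JST)": 9,
    "Sydney (AEDT)": 11,
}


def local_times(start_utc, offsets_hours):
    """Return the outage start expressed in each city's local time."""
    return {
        city: start_utc.astimezone(timezone(timedelta(hours=hours)))
        for city, hours in offsets_hours.items()
    }


for city, local in local_times(OUTAGE_START_UTC, OFFSETS_HOURS).items():
    print(f"{city:<20} {local:%a %b %d, %H:%M}")
```

Under these assumptions, the outage began late on Thursday evening on the U.S. West Coast but in the middle of Friday afternoon in Tokyo and Sydney, which fits the observation that customers in Asia and Oceania were among the most affected.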

According to an official statement, the root cause of the outage was a “configuration change” made by a third-party provider. The nature of the provider and the piece of the delivery chain it looked after have not been disclosed. However, reports indicate that restaurants were unable to accept digital orders or process digital payments, including through McDonald's in-store kiosks and app. While there are reports of some stores reverting to manual orders and accepting cash-only payments, this does not appear to have been a universal workaround.

This outage is illustrative of the complexity of end-to-end digital systems in omnichannel operations. When the end-to-end transactional process is completely digitized, retailers—whether in fast food restaurants, apparel, grocery, or other retail categories—rely on a carefully choreographed and synchronized set of software platforms and APIs to deliver a seamless experience to customers across multiple channels.

Each component of that delivery chain must function properly. When a component is sourced from a third-party provider, a degree of reliance is placed on that third party to maintain the resiliency, uptime, and availability of their services, so that nothing downstream of them breaks. In the event that a change on their end triggers an outage condition, the onus falls back on the retailer to be able to look across their omnichannel delivery, at a glance, and pinpoint exactly where in the chain the issue lies. 

Department of Motor Vehicles Disruptions

On the morning of March 21, Department of Motor Vehicles (DMV) locations were down across the United States, impacting driver’s license services, test bookings, and data access. According to reports, it appears that “there was no ability to process messages that support transactions of driver licenses and motor vehicle titles. This prevented a number of motor vehicle agencies from issuing driver licenses and vehicle titles during the outage.”

Based on a report from the Colorado DMV, the cause was a national outage with the American Association of Motor Vehicle Administrators (AAMVA), a third-party “information clearinghouse” whose network connects DMVs to “various verification services.” According to reports, AAMVA attributed the issue to a “loss of cloud connectivity” and said that it was working with its cloud providers on a resolution.

Undersea Cable Cuts in West Africa

On March 14, at approximately 10:30 AM (UTC), ThousandEyes observed a significant increase in traffic loss for some routes between Europe and South Africa. The varying nature of the loss, together with increased network latencies, was consistent with congestion due to capacity constraints.

The root cause soon became clear: large-scale undersea fiber cuts along the west coast of Africa disrupted connectivity to numerous services, including Microsoft Outlook.

Screenshot showing multiple services impacted
Figure 1. Impacts on performance and service access were experienced across multiple countries in Africa

ThousandEyes data shows that the issues impacted multiple countries in Africa, with the duration of the issues varying considerably between countries, from under an hour to half a day. ThousandEyes observed connectivity issues, including traffic loss and increased latency, affecting users across Africa and global users whose traffic traversed affected paths.

Screenshot showing increased latency on the AWS network
Figure 2. Latency to the AWS network in South Africa increased for users globally

ThousandEyes also observed what appeared to be attempts to redistribute load and use spare capacity by redirecting traffic across an alternate path during the incident. Enterprise customers in some countries appeared to work around the breakage by turning to terrestrial routes to carry traffic to the north of Africa, where it could be handed off to active subsea infrastructure.

By March 16, congestion conditions had been resolved for most routes, although higher network latencies continued to be observed. This suggests that some traffic continued to traverse sub-optimal paths, thereby impacting users.

ASX Net Fiber Cut

On March 19 (March 20 in Australia), a third-party fiber cut impacted a number of primary circuits for ASX Net, a connectivity service for linking organizations into Australian financial markets. While it took a day and a half to repair the fiber and fully resolve the issue, it appears that no customer services were significantly impacted as backup and redundant circuits were in place for users of the ASX Net services.

This is an example of how physical damage to network cables does not always impact connectivity. With sufficient alternate connections and capacity available—particularly for crucial services such as those connected to trading and financial markets access—an issue with one physical cable can be seamlessly routed around.

PlayStation Network, GeForce Issues

Cloud gamers experienced issues with two different services on March 21. The issues do not appear to be linked, aside from both affecting cloud-based gaming services at around the same time.

According to reports, PlayStation users may have encountered a variety of error messages on March 21 during a partial outage of the PlayStation Network that lasted up to seven hours. Existing users who were already connected to the PlayStation Network appeared to be better off than those who had to re-authenticate for access. That latter cohort of users saw messages attributing some of the problems to “a communication error with PlayStation Network.” Others reportedly encountered issues when trying to access content and the web version of the PlayStation store.

Screenshot showing availability issues for the PlayStation Store
Figure 3. Accessibility issues impacting users globally connecting to the PlayStation store

Meanwhile, at about the same time, Nvidia’s cloud gaming service, GeForce Now, also experienced an issue that prevented users from accessing game streams on their devices.


By the Numbers

In addition to the outages highlighted above, let’s close by taking a look at some of the global trends ThousandEyes observed across ISPs, cloud service provider networks, collaboration app networks, and edge networks over the past two weeks (March 11-24):

  • After a continuous decline since early February, there was a temporary increase in global outages before they returned to the downward trend. In the first week of the period (March 11-17), the number of outages increased from 142 to 206, a 45% rise compared to the previous week (March 4-10). The subsequent week (March 18-24) saw a decrease of 20%.

  • The United States followed a similar pattern. Outages increased 38% from 63 to 87, before decreasing by 33% the following week (March 18-24).

  • Between March 11 and March 24, only 39% of all recorded outages were observed in the United States. This represents a deviation from the long-standing trend of at least 40% of all outages being U.S.-centric.
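The percentages in the list above can be sanity-checked with a few lines of arithmetic. The Python sketch below uses only the counts quoted in the bullets. The second-week counts are not stated in this report, so they are estimated here from the rounded percentage decreases and should be treated as approximations rather than ThousandEyes data.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


# Counts quoted in the bullets above.
global_prev, global_week1 = 142, 206
us_prev, us_week1 = 63, 87

print(round(pct_change(global_prev, global_week1)))  # 45
print(round(pct_change(us_prev, us_week1)))          # 38

# Assumption: week-two counts are estimated from the rounded 20% and 33%
# decreases, because the report does not give them directly.
global_week2_est = round(global_week1 * (1 - 0.20))  # about 165
us_week2_est = round(us_week1 * (1 - 0.33))          # about 58

# Estimated U.S. share of all outages across the two-week period.
us_share_est = (us_week1 + us_week2_est) / (global_week1 + global_week2_est) * 100
print(round(us_share_est))  # 39
```

The estimated U.S. share comes out at roughly 39%, which agrees with the figure reported in the final bullet.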

Bar chart depicting global and US outages over the past eight weeks
Figure 4. Global and U.S. network outage trends over the past eight weeks
