This is The Internet Report, where we analyze outages and trends across the Internet through the lens of ThousandEyes Internet and Cloud Intelligence. I’ll be here every other week, sharing the latest outage numbers and highlighting a few interesting outages. As always, you can read the full analysis below or listen to the podcast for firsthand commentary.
Internet Outages & Trends
The term "triage" originates from the French word trier, meaning "to sort" or "to pick," and was initially used for sorting agricultural products, with its medical usage coming in the 18th century. In the context of Internet incident analysis and response, troubleshooting is supported by a variety of signals from throughout the entire service delivery chain. However, relying solely on individual service metrics, without considering the context, may not provide a complete understanding of the situation. Therefore, it is essential to examine all metrics holistically, distinguishing relevant signals from irrelevant ones. This process of digital triage is crucial for efficiently resolving problems as quickly as possible.
Read on to learn more and explore recent incidents at Workday, X, and Mastercard, or use the links below to jump to the sections that most interest you:
Workday Outage
X Outage
Mastercard Service Disruption
By the Numbers
Workday Outage
On March 17, users of Workday, a cloud-based provider of enterprise applications for finance and human resources, encountered a “service unavailable” message when trying to access the platform. During the disruption, which lasted over an hour, Workday applications appeared to be reachable, but attempts to interact resulted in an error message.

ThousandEyes observed no significant network issues, such as packet loss or increased latency, on the network paths connecting to multiple Workday data centers (see Figure 2). The absence of network-level problems, combined with the lack of service degradation on the frontend web servers, suggested that the issues were specific to the application itself.

From ThousandEyes’ observations, it appeared that not only was the service reachable, but the login process also seemed responsive. However, attempts to interact further with the service encountered the “Workday currently unavailable” message (see Figure 3). Notably, this message was not the result of an HTTP 5xx server-side error; it was a direct response to a service request, generated by the Workday application itself.
This distinction is relevant because server-side HTTP codes such as HTTP 502 or HTTP 503 indicate that while the frontend of the service is reachable, there is an issue retrieving content or calling services in the backend. In this case, the absence of HTTP 5xx status codes, combined with the service-generated error message, suggested that the issue lay on the application side, further narrowing the fault domain.
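As a rough illustration of this kind of status-code triage, the sketch below separates server-side HTTP 5xx failures from an application-generated error page that arrives with an otherwise healthy status code. The URL and error-marker string are hypothetical placeholders, not the actual Workday endpoints or page content.

```python
import requests

# Hypothetical endpoint and error marker, for illustration only; these are not
# the actual Workday URLs or maintenance-page text.
URL = "https://tenant.example-workday.invalid/app/home"
APP_ERROR_MARKER = "currently unavailable"


def classify_response(url: str) -> str:
    """Roughly separate server-side HTTP failures from application-generated errors."""
    try:
        resp = requests.get(url, timeout=10)
    except requests.exceptions.Timeout:
        return "network or server issue (request timed out)"
    except requests.exceptions.ConnectionError:
        return "network/transport issue (could not connect)"

    if 500 <= resp.status_code < 600:
        # 502/503/504, etc.: the frontend answered, but a backend call failed
        return f"server-side error (HTTP {resp.status_code})"
    if resp.ok and APP_ERROR_MARKER in resp.text.lower():
        # A healthy status code wrapping an error page points at the application itself
        return "application-generated error page (healthy HTTP status)"
    return f"apparently healthy (HTTP {resp.status_code})"


print(classify_response(URL))
```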

To further test whether the fault domain was on the application side, ThousandEyes examined the waterfall chart—a timeline of how elements of the Workday service were loaded into the browser. During the outage, after the login process (and its associated calls) completed, the application received an indication that something was wrong and redirected the request to a maintenance page on a different domain. That page loaded static service-related information, including a graphic displaying a “service unavailable” error message. This redirection, occurring during the page load, further reinforced the conclusion that the fault domain was on the application side.
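A crude version of that check can be approximated outside the browser by following redirects and comparing the final hostname against the original one. This is only a minimal sketch of the idea; the starting URL is a hypothetical placeholder, and a real waterfall analysis captures far more detail (DNS, TLS, per-object timing) than this shows.

```python
from urllib.parse import urlparse

import requests

# Hypothetical starting URL; substitute the page you are testing. The real
# Workday tenant URL and maintenance domain are not reproduced here.
START_URL = "https://tenant.example-workday.invalid/app/home"

try:
    resp = requests.get(START_URL, timeout=10, allow_redirects=True)
except requests.exceptions.RequestException as exc:
    raise SystemExit(f"Could not fetch {START_URL}: {exc}")

origin_host = urlparse(START_URL).hostname
final_host = urlparse(resp.url).hostname

# resp.history holds each intermediate redirect response, in order
for hop in resp.history:
    print(f"{hop.status_code} -> {hop.headers.get('Location')}")

if final_host != origin_host:
    print(f"Redirected off-domain to {final_host}: likely a static maintenance "
          "or error page rather than the application itself.")
else:
    print("No off-domain redirect observed.")
```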

Evaluating signals in isolation, without considering all data points collectively, can lead to incorrect assumptions about the cause of an outage and misguided attempts to resolve it. It is tempting to jump to conclusions about the source of a disruption, especially when it coincides with a significant event that seems a likely culprit. For example, Workday’s R1 feature release occurred just before the outage, which could have led users to initially believe the disruption was related to the release. However, Workday later clarified that there was no connection between the two.
When investigating and addressing an outage or disruption, it’s important to be wary of speculation that could lead you astray: it can delay resolution and create additional problems if incompatible mitigation strategies are applied. Instead, focus on gathering and analyzing all available data points from the service delivery chain in context. This means conducting a digital triage to sift through the clues and accurately identify the root cause of the outage. Quickly identifying the fault domain—or eliminating possible fault domains—can significantly influence the decision on whether to reroute traffic or roll back a software release.
X Outage
On March 10, the social media platform X (formerly Twitter) experienced a series of disruptions that impacted users worldwide, leading to various service downtimes. The initial problems caused widespread inaccessibility that persisted intermittently throughout the day.
Around 9:45 AM (UTC), ThousandEyes began observing a decrease in availability for X services. The issue appeared to impact multiple geographical regions and manifested predominantly as connection errors during the TCP signaling phase. Connection errors at this stage typically indicate a deeper problem at the network layer.
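For context, a failure during the TCP signaling phase means the three-way handshake itself never completes, before any TLS or HTTP exchange takes place. The sketch below attempts only that handshake and reports how it fails; the target host and port are illustrative, and production monitoring would repeat this from many vantage points rather than a single machine.

```python
import socket

# Illustrative target; substitute the host and port you are testing.
HOST, PORT, TIMEOUT = "x.com", 443, 5


def tcp_handshake(host: str, port: int) -> str:
    """Attempt only the TCP three-way handshake; no TLS or HTTP is exchanged."""
    try:
        with socket.create_connection((host, port), timeout=TIMEOUT):
            return "handshake completed"
    except socket.timeout:
        return "connection timed out (SYN or SYN-ACK likely lost)"
    except ConnectionRefusedError:
        return "connection refused (RST received)"
    except OSError as exc:
        return f"connection error ({exc})"


print(tcp_handshake(HOST, PORT))
```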
Explore the X outage further on the ThousandEyes platform (no login required).

During the disruptions, ThousandEyes observed network conditions similar to those seen in a distributed denial-of-service (DDoS) attack, including packet loss. The objective of a DDoS attack is to overwhelm resources and temporarily disable access to a service. This can be accomplished using various techniques; one common method is to flood the network with excessive traffic. This type of attack is often referred to as a volumetric attack, in which the attack traffic competes with legitimate traffic. If the volume of attack traffic is large enough relative to the legitimate traffic, valid packets are dropped, resulting in a denial of service for users. In this instance, the significant packet loss appeared to prevent users from accessing the application.
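The arithmetic behind a volumetric attack is straightforward. The sketch below uses entirely made-up traffic volumes and assumes a single bottleneck link that drops excess packets indiscriminately, simply to show how attack traffic crowds out legitimate traffic.

```python
# Back-of-the-envelope illustration of a volumetric attack, using made-up numbers.
# Assumes one bottleneck link that drops excess traffic without distinguishing flows.
link_capacity_gbps = 10.0   # hypothetical bottleneck capacity
legit_gbps = 2.0            # hypothetical legitimate traffic
attack_gbps = 40.0          # hypothetical attack traffic

offered = legit_gbps + attack_gbps
drop_fraction = max(0.0, 1.0 - link_capacity_gbps / offered)

legit_delivered = legit_gbps * (1.0 - drop_fraction)
print(f"Overall packet loss: {drop_fraction:.0%}")
print(f"Legitimate traffic delivered: {legit_delivered:.2f} Gbps of {legit_gbps} Gbps")
```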

The detection of a DDoS attack will typically trigger some form of mitigation. This mitigation can be manual or automated and can take many forms, including traffic filtering, rate limiting, blackholing, and using CDNs to distribute traffic. In some cases, the mitigation strategy may involve initiating route changes that redirect traffic for mitigation purposes. In this specific instance, there did not appear to be any visible BGP route changes or new advertisements related to X’s prefixes, suggesting that any mitigation likely occurred within the X network environment itself.
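Rate limiting, one of the mitigation techniques listed above, is commonly implemented with a token bucket. The sketch below is a minimal, generic illustration of that mechanism, not a representation of how X’s defenses actually work; the rate and burst parameters are arbitrary.

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter, one mitigation technique mentioned above."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec      # sustained requests per second allowed
        self.capacity = burst         # size of bursts tolerated
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Admit the request if a token is available; otherwise drop or defer it."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = TokenBucket(rate_per_sec=100, burst=50)
allowed = sum(limiter.allow() for _ in range(1000))
print(f"{allowed} of 1000 back-to-back requests admitted")
```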
Similar to the Workday outage, the recent X outage serves as a reminder to IT Operations teams about the importance of having a comprehensive understanding of the entire service delivery chain when addressing a disruption. With this contextual visibility, administrators can quickly identify the cause of an outage and determine the appropriate next steps. Additionally, having a thorough overview of operations allows teams to evaluate the impact of their mitigation efforts. It's essential to understand whether these interventions are producing positive outcomes or unintentionally making the situation worse.
Mastercard Service Disruption
On March 9, a worldwide Mastercard service disruption led to declined payments. The disruption appeared to affect both physical cards and mobile wallet services, forcing businesses to accept alternative payment methods or cash.
The specific cause of the outage has yet to be disclosed, but it appeared limited to Mastercard and businesses that rely on Mastercard’s services. Other credit card providers—such as Visa and American Express—seemed unaffected.
During the disruption, ThousandEyes observed no systemic network issues that could be attributed to the problem. The outage was reported across multiple regions, including the United States, United Kingdom, Japan, Italy, and Australia. It appeared to impact various platforms, such as ATMs, online payments, and payment platforms like Tyro and Square. Each of these platforms connects through different networks and regions, while alternative card providers used at the same locations and on the same devices appeared to be unaffected. This indicated that the common factor was Mastercard’s service itself. While no official cause was identified, a Mastercard spokesperson confirmed, “There was a period of time earlier today during which some Mastercard transactions were declined. The situation has been resolved and all systems are working as normal.”
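The elimination logic described here—many differing platforms, networks, and regions, but one shared card network—amounts to looking for the attribute common to every failure and absent from every success. The observations below are hypothetical and heavily simplified, purely to illustrate that reasoning.

```python
# Hypothetical, heavily simplified observations for illustration only. Each failed
# or successful transaction notes the card network, platform, ISP, and region.
failed = [
    {"card": "Mastercard", "platform": "ATM",    "network": "ISP-A", "region": "US"},
    {"card": "Mastercard", "platform": "Square", "network": "ISP-B", "region": "UK"},
    {"card": "Mastercard", "platform": "Tyro",   "network": "ISP-C", "region": "AU"},
]
succeeded = [
    {"card": "Visa",             "platform": "Square", "network": "ISP-B", "region": "UK"},
    {"card": "American Express", "platform": "ATM",    "network": "ISP-A", "region": "US"},
]

# An attribute with a single value shared by every failure, and never seen in a
# success, is a strong candidate for the common factor.
for attr in ("card", "platform", "network", "region"):
    failing_values = {obs[attr] for obs in failed}
    succeeding_values = {obs[attr] for obs in succeeded}
    if len(failing_values) == 1 and not failing_values & succeeding_values:
        print(f"Common factor across failures: {attr} = {failing_values.pop()}")
```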
This Mastercard service disruption again highlights the importance of accurately identifying both the source of a problem and aspects that are functioning correctly. For instance, businesses experiencing declined payments may have initially assumed the issue was with the payment platform or network connection. However, verifying with a different card provider could quickly eliminate concerns about the payment processing hardware and network connectivity, indicating Mastercard as the potential source of the problem. Rapidly diagnosing the cause of an outage enables businesses to take appropriate mitigation steps and resolve the issue as quickly as possible.
This disruption also serves as a reminder of the impact outages can have on financial services and the need for digital resilience in this—and every—industry. In some cases, a lack of digital resilience can even put financial services institutions at risk of regulatory action. For example, the European Union’s Digital Operational Resilience Act (DORA) requires banks, insurance companies, investment firms, and their third-party ICT providers to meet an enhanced set of requirements covering risk management, the resilience of their networks, incident reporting, and more.
By the Numbers
Let’s close by taking a look at some of the global trends ThousandEyes observed over recent weeks (March 3-16) across ISPs, cloud service provider networks, collaboration app networks, and edge networks.
- Global outages decreased throughout this period, marking a return to the downward trend last observed in mid-February. In the first week (March 3-9), ThousandEyes observed a 5% decrease, with outages dropping from 447 to 425. This decline continued in the following week (March 10-16), as the number of outages further decreased from 425 to 378, representing an 11% reduction compared to the previous week.
- In contrast, outages in the United States followed a different pattern. Initially, they increased slightly, rising from 189 to 199—a 5% increase compared to the previous week. However, during the second week (March 10-16), outages dropped from 199 to 154, reflecting a 22% decrease.
- From March 3 to 16, an average of 44% of all network outages occurred in the United States, down from the 46% reported in the previous period (February 17 - March 2). This 44% figure continues a trend observed throughout 2024, where U.S.-based outages typically accounted for at least 40% of all recorded outages. (The percentage arithmetic is sketched below.)
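For reference, the week-over-week percentages above can be reproduced directly from the outage counts cited in the bullets; the figures in the text are rounded to whole percentages, so small differences can appear.

```python
def pct_change(prev: int, curr: int) -> float:
    """Week-over-week change in outage count; negative means a decrease."""
    return (curr - prev) / prev * 100


# Global outage counts cited above
print(f"Global, Mar 3-9 vs. prior week:  {pct_change(447, 425):+.1f}%")
print(f"Global, Mar 10-16 vs. Mar 3-9:   {pct_change(425, 378):+.1f}%")

# United States outage counts cited above
print(f"U.S., Mar 3-9 vs. prior week:    {pct_change(189, 199):+.1f}%")
print(f"U.S., Mar 10-16 vs. Mar 3-9:     {pct_change(199, 154):+.1f}%")

# Share of all outages occurring in the U.S. across March 3-16
us_share = (199 + 154) / (425 + 378) * 100
print(f"U.S. share of outages, Mar 3-16: {us_share:.1f}%")
```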
