Internet Outages That Disrupted 2022 and How to Prepare for 2023

By Internet Research Team
10 min read


Summary

We’re ringing in the New Year with a look back at how Internet outages disrupted business operations in 2022, be it through canceled flights, connectivity failures, or communication mishaps. And we share how your IT teams can better prepare in 2023.


Outages happen every day, both big and small, and all over the world. In 2022, outages were as disruptive as ever—causing poor user experiences and sometimes crippling business operations. ThousandEyes recorded thousands of outage events last year with our network-agnostic data, which lets us see clearly across the Internet and into the cloud. We provide these insights to our customers so they can plan proactively and mitigate downtime where possible. From that work, we created this timeline recapping some of the events we observed and their lessons. Our aim is to help you stay online and operational in 2023.


British Airways, February 25, 2022

What Happened: An outage of British Airways' online services caused hundreds of flight cancellations and disrupted the airline's operations, including at its London Heathrow hub, one of the busiest international airports in the world. Our monitoring shows that the incident occurred because application servers became unresponsive, not because of a network issue.

Geo Impact: Global → Read the Outage Analysis

Learning: Architecting backends that avoid single points of failure can reduce the likelihood of a chain of events, like the one experienced by British Airways, that can ground your entire fleet.

Twitter, March 28, 2022

What Happened: Twitter was rendered unreachable after a Russian Internet and satellite communications provider blackholed traffic by announcing one of Twitter’s prefixes. BGP misconfigurations are not uncommon, but route announcements can also be used to block traffic deliberately, and it is not always easy to tell an accident from an intentional act.

Geo Impact: Global → Read the Outage Analysis

Learning: Even if your company has implemented RPKI to fend off BGP threats, your telco might not have. Keep that in mind when selecting ISPs.
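
To make the RPKI point concrete, here is a minimal Python sketch of route origin validation in the style of RFC 6811. It is an illustration only, not how any particular router or ISP implements the check. The `Roa` tuple, the `validate_origin` function, and the example prefixes and AS numbers (taken from ranges reserved for documentation) are all hypothetical.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Roa(NamedTuple):
    """A simplified Route Origin Authorization (hypothetical structure)."""
    prefix: str      # prefix the holder has authorized, e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length allowed for announcements
    asn: int         # AS number authorized to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, roas: Iterable[Roa]) -> str:
    """Classify a BGP announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # Skip ROAs of the other IP version or that do not cover the route.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches only if the length and origin AS both agree.
        if route.prefixlen <= roa.max_length and origin_asn == roa.asn:
            return "valid"
    # Covered by at least one ROA but matched by none: treat as a bad origin.
    return "invalid" if covered else "not-found"


# Assumed example data: one ROA authorizing AS 64500 for 192.0.2.0/24.
roas = [Roa("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, roas))     # valid
print(validate_origin("192.0.2.0/24", 64511, roas))     # invalid
print(validate_origin("198.51.100.0/24", 64500, roas))  # not-found
```

A provider that drops "invalid" routes would ignore an announcement from an unauthorized origin for a covered prefix. A provider that skips validation would accept it, which is why it matters whether your ISPs validate.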

Atlassian, April 5, 2022

What Happened: Atlassian's Jira, Confluence, and OpsGenie are three products that many developer teams rely on. Due to a maintenance script error, these services experienced a days-long outage that impacted roughly 400 of Atlassian's customers. Despite the relatively small subset of impacted customers, the generic updates presented on Atlassian’s status page could have caused confusion for those whose experiences did not match up.

Geo Impact: Global → Read the Outage Analysis

Learning: One cannot rely on status pages alone to communicate about outages. Customers can be left worrying for hours or even days with no answer as to how serious an outage is and when it will be fixed.

Rogers Communications, July 8, 2022

What Happened: Rogers Communications withdrew its prefixes due to an internal routing issue, rendering the Tier 1 provider unreachable across the Internet for nearly 24 hours. This outage affected millions of users and many critical services across Canada.

Geo Impact: Americas → Read the Outage Analysis

Learning: No provider is immune to outages, no matter how large. For crucial services like hospitals and banking, plan for a backup network provider that can reduce the length and scope of an outage.

Figure 1. Packet loss observed for locations connecting to a Rogers customer

Amazon Web Services, July 28, 2022

What Happened: This AWS outage was caused by an Availability Zone power failure and impacted applications such as Webex, Okta, and Splunk. Not all users or services were affected equally, however, with Webex components located in Cisco data centers remaining operational.

Geo Impact: Global → Read the Outage Analysis

Learning: Be sure to architect for redundancy across Availability Zones. Multi-AZ deployments are typically active/active, which removes the need to execute a backup plan when one zone fails.

Figure 2. Interfaces affected in the AWS network

Google, August 9, 2022

What Happened: Google Search and Google Maps became unavailable to users worldwide, with those attempting to reach the services receiving error messages. Users from the United States to Australia, and from Japan to South Africa, could not load sites or execute functions. Applications that depend on Google’s software also stopped working during this rare outage.

Geo Impact: Global → Explore This Outage in ThousandEyes | Read the Outage Analysis

Learning: It is important to monitor not just your application frontends but also the performance-critical dependencies that power your app.

Figure 3. Outage renders Google domain properties unreachable in several countries

Zoom, September 15, 2022

What Happened: The brief outage impacted users globally, leaving them unable to log in or join Zoom meetings. Rescheduled telehealth appointments and job interviews were just two of the ways users felt the disruption caused by this application issue.

Geo Impact: Global → Read the Outage Analysis

Learning: The app itself, rather than the network, may be causing the issue. Visibility into which one is at fault can prevent confusion and finger-pointing during root cause analysis.
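
As a rough sketch of that triage, the Python below sorts a single probe result into "network", "application", or "healthy". It is deliberately simplified and is not a description of ThousandEyes' method. The `ProbeResult` fields, the `classify` function, and the 5% loss threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """One synthetic test against a service (hypothetical structure)."""
    packet_loss_pct: float      # loss measured on the path to the server
    tcp_connected: bool         # whether the TCP handshake completed
    http_status: Optional[int]  # None if no HTTP response was received


def classify(result: ProbeResult, loss_threshold_pct: float = 5.0) -> str:
    """Attribute a failure to the network or the application layer."""
    # If packets are lost or the connection never opens, suspect the path.
    # Simplification: a refused connection can also mean the server is down.
    if not result.tcp_connected or result.packet_loss_pct >= loss_threshold_pct:
        return "network"
    # The path is clean, so a missing or 5xx response points at the app.
    if result.http_status is None or result.http_status >= 500:
        return "application"
    return "healthy"


print(classify(ProbeResult(0.0, True, 503)))     # application
print(classify(ProbeResult(100.0, False, None))) # network
print(classify(ProbeResult(0.0, True, 200)))     # healthy
```

Running the same classification from many vantage points matters more than any single result: a clean network path everywhere combined with server errors is the pattern that points to an application issue.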

Zscaler, October 25, 2022

What Happened: Customers using Zscaler Internet Access (ZIA) experienced connectivity failures or high latency in reaching Zscaler proxies. Because Security Service Edge (SSE) implementations typically proxy web traffic, including traffic to critical business tools and SaaS applications, this incident could have made services such as Salesforce, ServiceNow, and Microsoft 365 unreachable for some customers.

Geo Impact: Global → Read the Outage Analysis

Learning: SSE is another piece of the Internet puzzle to consider when things go awry. Having network-agnostic data for complex scenarios like this can enable quicker attribution and remediation.

Figure 4. Traffic to the Zscaler proxy spikes to 100% packet loss

WhatsApp, October 25, 2022

What Happened: The two-hour outage left WhatsApp users unable to send or receive messages and was related to backend application service failures rather than a network failure. Occurring during peak hours in India, where the app has a user base in the hundreds of millions, the incident left people unable to communicate for personal or business matters.

Geo Impact: Global → Read the Outage Analysis

Learning: A thriving SaaS business relies on continuous improvement, which is why an immediate feedback loop—whereby mistakes can be rectified quickly—is necessary. Having data that can help rule out the network as the culprit when a production system error occurs can speed up the resolution of technical issues.

Amazon Web Services, December 5, 2022

What Happened: ThousandEyes observed significant packet loss between two global locations and AWS' us-east-2 region for more than an hour. The event affected end users who connect through ISPs to AWS services hosted in that region.

Geo Impact: Global → Explore This Outage in ThousandEyes | Read the Outage Analysis

Learning: With public cloud, it’s important to monitor not just the applications themselves but also the cloud infrastructure components, including individual cloud regions and cloud availability zones and any dependent cloud software services.

Downtime is inevitable, and outages are a fact of life for every ISP and cloud provider. But by doing the work to build a resilient infrastructure, you can safeguard your applications from their negative impacts while improving your users' experiences.


Don't miss our upcoming webinar, "The Top Outages of 2022: Analysis and Takeaways," to hear expert reflections on last year's major outages and how you can prevent or plan for them this year. It happens on January 19th, and you can register here.

