    ThousandEyes Report: Top Cloud Outages of 2023

    A year in review: Major cloud outages of 2023 and the lessons learned for better digital infrastructure.

    By
    Zeus Kerravala
    -
    March 13, 2024


      To say the cloud is important to how we work, live, and play is a major understatement – the cloud is now a critical element of tech infrastructure. However, when cloud services go down, what underlies the outage is often a mystery. That’s why I was intrigued by a recent webcast from ThousandEyes, which looked under the covers at the major cloud outages of 2023.

      Hosted by Brian Tobia, ThousandEyes’ Lead Technical Marketing Engineer, the webcast included a look at the anatomy of an outage. “It’s important to understand the different types of outages we see,” he said. “Understanding them can help you understand how to mitigate some of their impact.” He said outages can vary in blast radius, in whether they’re planned or unplanned, and in their mean time to recovery.

      Let’s take a look at what caused the year’s major cloud outages – and what we can learn from these unfortunate incidents.

      TABLE OF CONTENTS

      • Different Types of Cloud Outages
      • Application Outages On the Rise
      • Top Cloud Outages of 2023
      • Bottom Line: Need for Monitoring to Prevent Cloud Outages

      Different Types of Cloud Outages

      “The distributed architecture of today’s applications means there are a lot of different moving parts that need to be orchestrated for something to actually work,” Tobia said. “And a lot of these parts are often single points of failure. Because they’re reused in multiple applications, like an API or a common service, we can see the impact of an outage more widely felt, despite it being a single service.”
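
      To make the point about shared components concrete, here is a minimal sketch that estimates the blast radius of a single failed service by walking a dependency map. The application and service names are hypothetical and chosen only for illustration; they do not come from the webcast or from any ThousandEyes data.

```python
from collections import deque

# Hypothetical dependency map: each component lists the services it calls.
DEPENDS_ON = {
    "checkout-app": ["payments-api", "auth-service"],
    "mobile-app": ["auth-service", "catalog-api"],
    "reporting-app": ["catalog-api"],
    "payments-api": ["auth-service"],
    "catalog-api": [],
    "auth-service": [],
}


def blast_radius(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a service to the things that call it.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk upward from the failed service.
    affected, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


print(sorted(blast_radius("auth-service", DEPENDS_ON)))
print(sorted(blast_radius("catalog-api", DEPENDS_ON)))
```

      In this made-up map, the shared authentication service is called by more components than any other, so its failure reaches three of them while the catalog API reaches only two. That is the "single service, wide impact" pattern Tobia describes.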

      Tobia noted that tracking cloud computing outages can help teams identify patterns and prevent customer service disruptions.

      Looking at the ThousandEyes report from 2023, Tobia said there were many different types of outages. “Overall, we still saw the most common type being ISP-related outages,” he said. “But we saw that there was an increase in CSP outages in 2023 compared to the previous years.”

      In 2023, the share of US-centric outages increased from 34% to 37%, and minor outages became more common. “We’re seeing that these smaller, more contained outages are becoming more common,” he said. “Before, there were traditionally a lot of bigger network outages, like really big ones, that would take down a whole bunch of services. But now we’re also seeing smaller ones.”

      But even a cloud outage that starts in the US due to maintenance activity at night can cascade into other geographies in the middle of their business days.

      Application Outages On the Rise

      Tobia said that application outages, which continued to rise in 2023, can have a greater impact. A network outage will affect a single provider, but that is not the case for applications. “The application outage really cascades because a bunch of people are relying on that one application,” he said. “It doesn’t matter what network you’re coming from.”

      He then moved on to look at some outage examples from 2023, focusing on how ThousandEyes works. “We’re able to collect all this data through ThousandEyes,” he said. “Being able to correlate that and collect all this data, it’s really important to get the end-to-end picture of where an outage might occur. And then, also really important, correlate that across every layer.”

      He added that ThousandEyes can show users every layer of a connection, whether it’s related to Border Gateway Protocol (BGP), networks, applications, HTTP errors, or page load times.
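
      The idea of correlating across layers can be sketched in a few lines of code. The example below is an illustrative assumption, not a description of how ThousandEyes works: the `Sample` fields, the thresholds, and the `classify` function are all hypothetical. It checks layers from the bottom up and reports the lowest one that looks unhealthy.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical measurement of a service from a single vantage point."""
    bgp_route_present: bool  # is there a route to the destination prefix?
    packet_loss_pct: float   # loss observed along the network path
    latency_ms: float        # round-trip latency
    http_status: int         # application response code (0 = no response)


def classify(sample, loss_threshold=5.0, latency_threshold=500.0):
    """Name the lowest layer that looks unhealthy (thresholds are illustrative)."""
    if not sample.bgp_route_present:
        return "routing"
    if sample.packet_loss_pct > loss_threshold or sample.latency_ms > latency_threshold:
        return "network"
    if sample.http_status == 0 or sample.http_status >= 500:
        return "application"
    return "healthy"


# Clean network path but server errors: points at the application layer.
print(classify(Sample(True, 0.0, 40.0, 503)))
# No route to the destination: points at routing, regardless of the rest.
print(classify(Sample(False, 100.0, 0.0, 0)))
```

      Checking the lower layers first matters because a routing or network failure will also produce application errors. Only when those layers look clean is it reasonable to point at the application itself.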

      Top Cloud Outages of 2023

      Tobia detailed the list of 2023 outages, including:

      • A 90-minute outage for Microsoft on January 25: This was due to BGP changes that caused network issues. “This was total chaos from a BGP perspective,” Tobia said.
      • A two-hour outage for Outlook on February 7: This resulted in service unavailable/application errors. “The last outage was more around some changes on their ISP routers and other WAN routers,” he said. “This may have been more on the application side.”
      • A seven-hour Virgin Media outage on April 4: This outage arose because of a BGP route withdrawal that caused network traffic loss. “It was kind of similar to what we saw on the Microsoft side, when those BGP changes were occurring,” he said. “Without a route to the Virgin Media UK network, a lot of the Internet and transit providers dropped the traffic.”
      • A two-hour AWS cloud outage on June 14: This outage caused latency, server timeouts, and HTTP errors. “They eventually identified the issue as being part of their capacity management system located within US-EAST-1,” he said. “And this impacted services like Lambda, API Gateway, the actual management console itself, and Global Accelerator.”
      • A two-hour Slack cloud outage on August 2: As a result of this outage, users couldn’t send or receive messages. “Network paths were totally fine,” he said. “We didn’t see any packet loss, latency, or anything like that. So it was purely an application or client issue.”
      • An 18-hour Square cloud outage on September 8: This outage resulted in app errors and backend transactions failing. “This outage prevented it from processing transactions,” he said. “So end users were actually able to submit a transaction – some sellers who were using this to receive payments were successful, [but some users] reported connections dropping out or things not working.”
      • A 36-hour Workday/Cloudflare outage that started on November 2: A complete power failure at a Cloudflare data center caused application and service issues. “Cloudflare was the provider and Workday was an application that runs on Cloudflare infrastructure,” he said. He noted that disaster recovery (DR) resources took six hours to come online. “So there was a complete outage until that facility came online,” he added. “It was able to serve requests at a diminished rate and then full resolution didn’t happen until more than 36 hours later.”
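
      As a rough worked example of the recovery-time differences in this list, the snippet below summarizes the approximate durations given above. The figures are taken from the list as reported (the Cloudflare incident is rounded to 36 hours, although full resolution took longer), so the result is a back-of-the-envelope illustration rather than an official statistic.

```python
from statistics import mean, median

# Approximate outage durations in hours, as listed in the article.
OUTAGE_HOURS = {
    "Microsoft (Jan 25)": 1.5,
    "Outlook (Feb 7)": 2,
    "Virgin Media (Apr 4)": 7,
    "AWS (Jun 14)": 2,
    "Slack (Aug 2)": 2,
    "Square (Sep 8)": 18,
    "Workday/Cloudflare (Nov 2)": 36,
}

durations = list(OUTAGE_HOURS.values())
mean_hours = mean(durations)
median_hours = median(durations)
longest = max(OUTAGE_HOURS, key=OUTAGE_HOURS.get)

print(f"mean: {mean_hours:.1f} h, median: {median_hours} h, longest: {longest}")
```

      The mean comes out at roughly 9.8 hours while the median is only 2 hours. Most of the listed incidents were resolved quickly, and the Square and Workday/Cloudflare outages account for most of the total downtime.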

      Bottom Line: Need for Monitoring to Prevent Cloud Outages

      It was a busy year, and Tobia’s examples of cloud outages were sobering. Who would have thought a full power outage was still possible at the sophisticated data centers that providers like Cloudflare run? Yet Cloudflare still had to go through the arduous process of getting its DR up and running while working with a power company that may not move at the fastest pace.

      Tobia’s presentation also underscored the importance of monitoring resources to understand what happens when a service goes down so that one can learn and avoid repeating the same mistakes.

      Unfortunately, for most businesses, having a backup for every cloud service the organization uses would be fiscally challenging. Instead, IT leaders can use data from companies like ThousandEyes to make uptime part of their evaluation criteria when choosing providers.

      For a complete guide to the cloud computing sector, see our in-depth coverage: Top Cloud Service Providers and Companies

      Zeus Kerravala
      https://zkresearch.com/
      Zeus Kerravala is an eWEEK regular contributor and the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions. Kerravala is considered one of the top 10 IT analysts in the world by Apollo Research, which evaluated 3,960 technology analysts and their individual press coverage metrics.
