How Fastly's Bad Software Update Took Down Half the Internet

Tech Insights for Professionals: The latest thought leadership for IT pros

Tuesday, July 27, 2021

How were websites around the globe taken offline by a tiny bug hidden in a software update made by one of the world's biggest CDN providers?

In June, vast swathes of the internet were knocked out after an error at one of the web's biggest cloud computing providers. The outage left many of the world's most-visited websites unavailable for almost an hour and highlighted how dependent much of the internet is on a small number of critical services.

With major sites including Amazon, the BBC, Reddit, eBay, the Financial Times, the Guardian and the UK government all affected, the disruption was widespread and hard to ignore.

But what actually happened, and how can sites guard against similar incidents in future?

What happened at Fastly?

The cause of the mass outage was traced back to a company called Fastly, which is a content delivery network (CDN) provider. While most consumers visiting the affected sites will likely never have heard of it, Fastly and other firms like it are key parts of the underlying infrastructure of the net.

It all started when the company rolled out a software update on May 12th. The update contained a bug which meant that a single customer changing their settings in a specific way, under certain circumstances, could cause the entire network to fail.

This is exactly what happened almost a month later on June 8th, when a single user made a valid configuration change that triggered the bug and led to errors being reported across 85% of Fastly's network.

Nick Rockwell, Senior Vice-President of Engineering and Infrastructure at Fastly, said the issue was detected in under a minute, while the company identified and isolated the cause of the disruption within half an hour.

While 95% of the network was operating as normal within 49 minutes, Rockwell noted this still resulted in a "broad and severe" outage.

Why CDNs are the backbone of the internet

The Fastly outage made such waves because CDNs are an integral part of the way the internet operates, with almost every major website dependent on them to provide fast, reliable service to users. A CDN consists of a large network of geographically diverse servers that host content as close as possible to the end user.

For example, if you try to access a page on a US-based website like CNN from the UK, the browser will connect to a copy of the page held on a server in Europe rather than being routed to North America. This reduces the amount of time taken to retrieve the content and delivers a much smoother user experience.
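To make the idea concrete, here is a minimal sketch of picking the closest server for a visitor. It is an illustration only: the server list and coordinates are hypothetical, and it uses straight-line distance as a stand-in for "closest". Real CDNs typically steer traffic using DNS or anycast routing and weigh network conditions and load as well as geography.

```python
import math

# Hypothetical edge locations (name -> latitude, longitude).
# These are illustrative and not any provider's real server map.
EDGE_SERVERS = {
    "london": (51.5074, -0.1278),
    "frankfurt": (50.1109, 8.6821),
    "new_york": (40.7128, -74.0060),
    "san_francisco": (37.7749, -122.4194),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user_location, servers=EDGE_SERVERS):
    """Return the name of the edge server geographically closest to the user."""
    return min(servers, key=lambda name: haversine_km(user_location, servers[name]))


# A visitor in Manchester, UK, is matched to a European server
# rather than one in North America.
manchester = (53.4808, -2.2426)
print(nearest_edge(manchester))
```

In this toy model a UK visitor is served from London, while a visitor on the US east coast would be served from New York.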

For a simple page, the difference in response time is minuscule - fractions of a second - and wouldn't be noticeable to a human. But when all the complex elements of a modern website, such as images and videos, are taken into account, across large numbers of users, the effect of using a CDN is significant.
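A rough back-of-the-envelope calculation shows how those fractions of a second add up. The model below is deliberately simplistic, and every number in it is an assumption chosen for illustration rather than a measurement: it supposes a page needs 90 separate requests, that a browser fetches six at a time, and that each batch costs one round trip of 150 ms to a distant origin server or 20 ms to a nearby CDN server.

```python
def page_load_seconds(num_requests, round_trip_ms, parallel_connections=6):
    """Very rough load time: requests go out in batches of
    `parallel_connections`, and each batch costs one network round trip."""
    batches = -(-num_requests // parallel_connections)  # ceiling division
    return batches * round_trip_ms / 1000


# Illustrative assumptions only, not measured figures.
REQUESTS = 90
ORIGIN_RTT_MS = 150   # assumed round trip to a server on another continent
EDGE_RTT_MS = 20      # assumed round trip to a nearby CDN server

print(page_load_seconds(REQUESTS, ORIGIN_RTT_MS))  # 2.25
print(page_load_seconds(REQUESTS, EDGE_RTT_MS))    # 0.3
```

Under these assumptions a single request is only 130 ms slower without a CDN, but the whole page goes from roughly 0.3 seconds to more than two seconds, and that gap is repeated for every visitor.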

However, the nature of these networks means that when things do go wrong, problems can spread quickly, as the Fastly outage proved.

Highlighting the importance of resilience

The Fastly incident has led to questions about the resilience of the internet and the wisdom of relying so heavily on a small number of providers to deliver these critical services. Therefore, it may be a good idea for companies to investigate how they can guard against such issues in the future.

For instance, having a multi-CDN strategy is one way in which firms can protect themselves. Even Fastly notes there are "serious pros" to this, including "resiliency, scale [and] improved performance", though the company also warned it’ll result in greater network complexity.
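At its simplest, a multi-CDN setup means keeping a prioritised list of providers and falling back to the next one when the first is unavailable. The sketch below shows that logic under stated assumptions: the hostnames are hypothetical, and the health check is a stand-in function rather than a real network probe. In practice this decision is often made at the DNS or load-balancer level, not in application code.

```python
from typing import Callable, Iterable, Optional


def pick_cdn(hosts: Iterable[str],
             is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in priority order, that passes the health check.

    Returns None if no provider is currently healthy.
    """
    for host in hosts:
        try:
            if is_healthy(host):
                return host
        except Exception:
            # A health check that raises is treated as a failed check,
            # so we move on to the next provider.
            continue
    return None


# Hypothetical hostnames for two different CDN providers.
CDN_HOSTS = ["assets.cdn-a.example", "assets.cdn-b.example"]


def fake_health_check(host: str) -> bool:
    """Stand-in check that simulates the primary provider being down."""
    return host != "assets.cdn-a.example"


print(pick_cdn(CDN_HOSTS, fake_health_check))  # assets.cdn-b.example
```

Even this small example hints at the added complexity Fastly warns about: content has to be kept in sync on every provider, and the health check itself becomes something that must be monitored.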

Businesses may therefore have to decide whether the reduced risk is worth the added time and effort a multi-CDN solution entails. But as Fastly's outage shows, it only takes one small bug to have serious knock-on effects, so if availability matters, it's worth considering.
