Downtime can be a disaster for businesses of any size. Whether it's an offline website preventing customers completing purchases, hardware failures that prevent access to critical servers or cloud outages that keep employees from communicating with each other, the cost of failure can be high.
IT outages are estimated to cost organizations in North America $700 billion a year. That figure covers outages caused by human error, system failures, cyber crime and natural disasters, and not even the biggest firms are immune from the effects.
For large enterprises, downtime can be even more costly, as it tends to come with a huge amount of negative publicity. However, this also offers other firms an opportunity to learn from these incidents and improve their own systems. So what can you learn from some of the biggest and costliest IT outages of recent years?
1. Facebook, 2021
Our dependence on social apps to stay in touch was made clear in 2021 when Facebook (now known as Meta) suffered a widespread network outage that affected users across its platforms. The outage hit 3.5 billion users of Facebook, WhatsApp and Instagram for 5.5 hours and cost the firm as much as $60 million in lost advertising revenue, as well as wiping almost 5% off its share value.
The cause of the downtime was traced to an error in a configuration change that meant routers were unable to find and connect to the firm's data center. The knock-on effects were so widespread that IT staff couldn't even get into the building to fix the problem, as their physical access cards relied on the same systems that were offline.
The lesson: Small changes can have big impacts. Even an error in a single line of code can be enough to cause an outage, so it's vital any updates are checked thoroughly.
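One way to picture that kind of check is a pre-flight validation that refuses any change that would leave a critical destination unreachable. The sketch below is purely illustrative: the route table format, the `REQUIRED_DESTINATIONS` set and the `validate_change` function are assumptions made for this example, not a description of Facebook's actual tooling.

```python
# Hypothetical pre-flight check: all names and data shapes are illustrative.
REQUIRED_DESTINATIONS = {"datacenter-a", "datacenter-b"}


def validate_change(current_routes, change):
    """Return a list of problems the change would cause (empty means safe).

    current_routes: dict mapping destination name -> set of next hops
    change: dict mapping destination name -> set of next hops to withdraw
    """
    problems = []
    for destination in sorted(REQUIRED_DESTINATIONS):
        existing = current_routes.get(destination, set())
        remaining = existing - change.get(destination, set())
        if not remaining:
            problems.append(f"{destination} would have no remaining route")
    return problems


routes = {
    "datacenter-a": {"router-1", "router-2"},
    "datacenter-b": {"router-3"},
}

# Withdrawing one of two routes is fine; withdrawing the only route is not.
safe_change = {"datacenter-a": {"router-1"}}
risky_change = {"datacenter-b": {"router-3"}}

print(validate_change(routes, safe_change))
print(validate_change(routes, risky_change))
```

A check like this only catches the failure modes someone thought to encode, so it complements rather than replaces staged rollouts and human review.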
2. Fastly, 2021
Facebook's downtime wasn't even the first major incident to affect big internet brands in 2021. Just a few months earlier, a network outage at content delivery network (CDN) provider Fastly resulted in some of the world's most popular websites going offline, including:
- The BBC
Those firms, like most large businesses, all rely on CDN providers like Fastly to distribute their content quickly to users. But when those services fail, there's not a lot they can do but wait. In this case, the cause was said to be one of Fastly's customers, who performed a routine, valid configuration change that triggered a bug within the system, resulting in 85% of the network returning errors.
The lesson: Understand your dependencies. With much of the IT world built on cloud systems and other third-party services, issues outside your control can still have a major impact. Know where your weak points are and be sure you have a strong backup plan in place.
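As a rough illustration of what a backup plan can look like in code, the sketch below tries a primary provider first and falls back to an alternative when it fails. The provider names and the `fetch_with_fallback`, `primary_cdn` and `backup_origin` functions are hypothetical stand-ins; a real setup would more likely handle this at the DNS or load-balancer level.

```python
# Hypothetical failover helper: provider names and fetchers are illustrative.
def fetch_with_fallback(path, providers):
    """Try each (name, fetch) pair in order and return (name, content)
    from the first provider that succeeds."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(path)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def primary_cdn(path):
    # Simulates the primary provider being down.
    raise ConnectionError("503 Service Unavailable")


def backup_origin(path):
    # Simulates serving the same content from a fallback origin.
    return f"content for {path}"


served_by, body = fetch_with_fallback(
    "/index.html",
    [("primary-cdn", primary_cdn), ("backup-origin", backup_origin)],
)
print(served_by, body)
```

The ordering of the provider list is the design choice that matters here: it makes the dependency, and what happens when it fails, explicit rather than implicit.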
3. Amazon, 2018
An IT outage doesn't have to last for days to be costly, as Amazon found out in 2018 at the worst possible moment. Over the last few years, the company has turned its Prime Day promotion into its very own Black Friday - a hugely hyped one-day event where consumers hunt for limited-time deals. But in 2018, it underestimated just how popular it would be.
The company's servers weren't able to cope with the huge volume of traffic to the site, resulting in a cascading series of failures that left the site unavailable within 15 minutes of the sale starting. The problems lasted for hours while the firm scrambled to manually add capacity, costing it an estimated $1.2 million a minute in sales.
The lesson: Fail to plan and you plan to fail. Even the biggest organizations need contingencies in place to cope with surges in traffic, especially when a major event is coming up.
4. British Airways, 2017
Outages that prevent customers accessing digital services are one thing, but when they spill over into the real world, they can be even more severe. For a big, international business like British Airways, the consequences can be especially disruptive, as it found out in 2017. The airline suffered an almost total system failure after an IT contractor accidentally switched off an uninterruptible power supply at a key data center, and the resulting power surge damaged critical servers.
As a result, hundreds of flights around the world were canceled, leaving tens of thousands of passengers stranded, while planes and crews were left out of position, further complicating efforts to get back on track. All told, it took around a week to get operations up and running again, at a total cost of around £80 million ($109 million).
The lesson: Beware the domino effect. No system works in isolation, and the knock-on effects of a single error can have widespread repercussions across the business, especially for those in complex industries.
5. Sony, 2011 (and 2014)
The hack on Sony that knocked its PlayStation Network offline for 24 days and compromised the personal data of 77 million users still stands as one of the most damaging attacks in history, in terms of both the length of the downtime and the cost. Some estimates suggest the company was out of pocket to the tune of $170 million once all expenses, including fines and compensation, were factored in.
However, it seems that key lessons from the incident weren't learned, as Sony was hacked again a few years later. The second attack shut down its movie studio division, with cost estimates reaching as much as $300 million.
The lesson: Cyber attacks can have huge impacts. From lost revenue to regulatory penalties and legal action, the potential consequences of failing to take security seriously are almost limitless - and those who don't learn from history are destined to repeat it.