What do you do when disaster strikes? It's not a subject any business likes to think about, but preparing for these eventualities is a vital part of any company's planning.
However, making plans for the worst is one thing; putting them into action in a real emergency is quite another. The worst outcome is discovering too late that what you thought was a carefully crafted response is actually far from adequate, leaving you at a loss about how to recover.
So what can you do to prevent this? The answer is to make sure that your disaster recovery (DR) plan is not only thoroughly researched and covers every reasonable eventuality, but has also been fully tested beforehand to expose any holes in it.
Have the right plan
Make sure you have a comprehensive plan that covers every potential type of disaster the business may face. This should spell out what to do in the event of a hardware or power failure, a cybersecurity incident or a natural disaster and ensure everyone within the organization knows their role.
There is a range of key steps that need to be taken in order to create a useful DR plan. While the specifics will be unique to every business, it's essential that every firm has a clear strategy, because the longer downtime lasts, the more the business is at risk. For instance, the US National Archives and Records Administration notes that more than nine out of ten firms that lose access to their computer systems for ten days or more will declare bankruptcy within a year.
However, devising a strategy is just the first step. A plan that sits in a cupboard gathering dust won't be of any use, and an untested plan could actively harm the organization if it is put into action in a real situation. If there are gaps or contradictions within it that leave employees confused or taking the wrong course of action, it can greatly increase the recovery time and lead to much higher costs.
How often should you test?
It's therefore vital that such plans are fully tested, and tested frequently. It's not enough to run a training drill just once when you've finished crafting your plan. The tech world and the nature of the threats businesses face are constantly evolving, so it is easy for a plan to become outdated. New applications or processes, the opening of new locations and emerging external challenges can all change your environment and leave existing plans ineffective.
However, frequent testing is an area where many businesses fall short. According to one survey, fewer than a third of organizations (31%) test their DR plans more than once a year. Given how quickly a firm's situation can change, this could end up costing them huge amounts of money or, even worse, put the business itself under threat.
While there's no fixed rule for how often you should test your DR plans, there is a range of factors that should be taken into consideration when setting a schedule. These include:
- The size of the business - larger firms with more moving parts will need to test more regularly than smaller businesses
- How often infrastructure is updated - any time you make significant changes to storage, backup, or data retrieval processes, you'll need to run tests to identify the effect these alterations have had
- Turnover rate - it's important that staff are well trained on what their responsibilities are, so if you frequently have new people in the IT department, you'll need to make sure they're up to speed
Testing every year should be the absolute minimum, and even that is only appropriate for relatively small firms with a fairly stable environment. For large enterprises where circumstances are changing constantly, quarterly or even monthly testing should be the norm.
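As a rough illustration of how the factors above might feed into a schedule, the following Python sketch maps them to a suggested maximum gap between tests. The `Environment` fields and every numeric threshold are hypothetical, chosen only to mirror the guidance that yearly testing is a floor for small, stable firms while large, fast-changing enterprises should test quarterly or monthly. Any real policy would need thresholds of its own.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical summary of the factors listed above."""
    is_large_enterprise: bool
    infra_changes_per_year: int  # significant storage, backup or retrieval changes
    it_turnover_rate: float      # fraction of IT staff replaced per year


def suggested_test_interval_months(env: Environment) -> int:
    """Return a suggested maximum gap between DR tests, in months.

    All thresholds are illustrative assumptions, not industry standards.
    """
    # Large and constantly changing: test monthly.
    if env.is_large_enterprise and env.infra_changes_per_year >= 12:
        return 1
    # Large, or frequently changing, or high IT turnover: test quarterly.
    if (
        env.is_large_enterprise
        or env.infra_changes_per_year >= 4
        or env.it_turnover_rate >= 0.25
    ):
        return 3
    # Small and stable: yearly testing is the absolute minimum.
    return 12
```

For example, a small firm with no infrastructure changes and no IT turnover would get 12 months, while the same firm with half its IT team replaced in a year would move to a quarterly schedule.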
What should a DR test involve?
Once you've set a schedule that ensures you're keeping up with any changes in the business, the next step is to figure out what the test will look like, and what key questions you need to answer. There’ll be a wide range of issues to be scrutinized, from seeing how long it’ll take the team to bring a server back online to how members communicate with each other in difficult or stressful circumstances.
There's no one type of test that will meet all these goals, so you need to be prepared to run through a variety of scenarios. At the same time, you also need to consider how the test will run and how in-depth it needs to be. There are a few key types of testing you can undertake, each suited to different situations.
Key methods to consider include:
- Walkthrough testing - the simplest type of test, this involves your team working through each stage of the plan verbally to determine if it's logical and to identify any weaknesses
- Simulation testing - this involves running through a specific scenario such as a hardware failure or natural disaster. It's more in-depth than a walkthrough and may involve role-playing
- Parallel testing - this involves testing failover systems to determine if they can support actual business transactions, while production systems continue to operate as normal
- Full interruption/cutover testing - this is the most in-depth test and uses actual production data and equipment to test your DR plan. This may involve actually disconnecting primary systems and so has the potential to disrupt business operations. However, it can be the best way to identify any issues with the plan in a real environment
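Whichever method you choose, one of the questions raised earlier, how long it takes the team to bring a server back online, is easy to capture during the exercise. The sketch below is a minimal example, assuming you set a target duration for each step yourself. The `timed_step` helper, the step name and the target value are all hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, target_seconds, results):
    """Time one recovery step and record whether it met its target.

    `results` is a plain list that collects one dict per step.
    """
    start = time.monotonic()  # monotonic clock is unaffected by clock changes
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        results.append({
            "step": name,
            "elapsed_seconds": elapsed,
            "target_seconds": target_seconds,
            "within_target": elapsed <= target_seconds,
        })


# Usage: wrap whatever the team actually does for this step.
results = []
with timed_step("bring server back online", target_seconds=1800, results=results):
    time.sleep(0.01)  # placeholder for the real recovery work
```

Recording the result in a `finally` clause means the timing is kept even if the step fails partway through, which is often the most useful case to review afterwards.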
The next steps
Whatever type of test you perform, it's vital that each step is fully documented, so the findings can be fed back into DR planning and the right changes can be made to the strategy.
Strong note-taking throughout the test itself is a must. If you're only reviewing what happened after the fact, it's easy to forget exactly what happened and in what order. Detailed notes also allow anyone not directly involved to review the test easily.
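One lightweight way to preserve the order of events is to timestamp each note as it is taken. The following is a minimal sketch rather than a recommended tool: the `DrTestLog` class and its field names are hypothetical, and a shared document or ticketing system would serve the same purpose.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DrTestLog:
    """Hypothetical in-memory log of observations made during a DR test."""
    scenario: str
    entries: list = field(default_factory=list)

    def note(self, step, observation, when=None):
        """Record an observation, stamped with the current UTC time by default."""
        if when is None:
            when = datetime.now(timezone.utc)
        self.entries.append((when, step, observation))

    def timeline(self):
        """Return the notes as text lines in chronological order."""
        ordered = sorted(self.entries, key=lambda entry: entry[0])
        return [
            f"{when.isoformat()} | {step} | {observation}"
            for when, step, observation in ordered
        ]
```

Calling `note()` as each step happens and `timeline()` at the debrief gives reviewers an ordered account of the test, even if notes were entered out of sequence.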
The post-test debrief should use these notes to identify where any weaknesses lie and to make direct recommendations about how to improve the plan. Then, when changes have been made, test again. A good DR plan should never be static, so there should be no point when it's declared 'complete'. Instead, testing needs to be an ongoing process, with the plan continually updated and refined in response to new circumstances.