
Rethinking disaster recovery for the cloud


DR practices that were designed for in-house computing are out of sync with the cloud world. If you haven't already revised your DR plan for cloud computing, now's the time.

On September 4, 2018, a cooling problem in a data center caused an outage in the Microsoft Azure cloud that affected companies across the south-central United States. “Azure was down for most of one business day,” said one IT professional. “Though we’re a nationwide company, all of our traffic goes through Dallas, Texas, so the entire company was affected. It caused a slowdown in many of our business processes.”

Azure is not the only leading public cloud provider to suffer outages. Google Cloud and Amazon Web Services (AWS) have also experienced outages that adversely affected their corporate clients. If you haven't already revised your DR plan for cloud-based computing, you need to do it now.

Rethinking DR

“We really haven’t thought about modifying our DR plan until now,” said an IT manager at a West Coast financial services firm. “When we went back over our contracts with cloud vendors, we discovered that almost all of the contracts contained disclaimer clauses saying that the cloud providers would not be responsible for service or data recovery SLAs if a disaster occurred. That really concerned us.”

The plot thickens further for companies using software-as-a-service (SaaS) vendors that in turn rely on third-party cloud providers to host their services.

What happens when the third-party cloud provider the SaaS company is using experiences an outage in its data center? “In that case, which is highly unlikely, we would simply put the client in touch with our cloud provider,” said one California SaaS company executive.

Unfortunately, in the middle of a disaster, the last place you want to be is face to face with a third party that you don't know and have no contract with.

In the cloud, you have to think differently. DR practices that were designed for in-house computing are out of sync with the cloud world, where strategies such as replication of systems and data, cooperative testing with vendors, and even failover to alternate vendors need to be considered.

Here are seven recommended best practices for revising your DR plan for the cloud.

1. Regularly back up and replicate your systems and data

“There is just huge exposure now with cloud computing that companies haven’t thought about,” said Michael Flavin, director of sales at Saalex IT, a network infrastructure company. “One of the ways that companies can protect themselves against a cloud outage is by maintaining a secure backup of their systems and data offsite that they can failover to. This can be accomplished by regularly replicating your data to this second backup data center.”
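Replication only protects you if the replica actually matches the primary. As a minimal sketch, assuming both copies are reachable as directory trees (for example, mounted volumes or synced exports), the check below compares file checksums and reports what is missing or out of date on the backup side. The function names and directory layout are illustrative assumptions, not part of any vendor's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_replication_gaps(primary: Path, replica: Path) -> dict[str, list[str]]:
    """Compare every file under `primary` with its counterpart under `replica`.

    Returns relative paths that are missing from the replica, and paths
    whose contents differ from the primary copy.
    """
    missing: list[str] = []
    mismatched: list[str] = []
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        target = replica / relative
        if not target.is_file():
            missing.append(relative.as_posix())
        elif sha256_of(source) != sha256_of(target):
            mismatched.append(relative.as_posix())
    return {"missing": missing, "mismatched": mismatched}
```

Running a check like this after each replication cycle, and alerting when either list is non-empty, gives early warning that the backup site could not take over cleanly.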

2. Understand the order in which you restore systems during an outage

In the old central data center days, it was relatively uncomplicated to determine which systems had to be restored first during an outage, and which came after, because all of these systems were under your own direct control.

This isn’t the case with hybrid computing, where applications and data can move from one cloud to another, or between clouds and your in-house data center.

“When clients come to us, one of the first things we do is to sit down with them and determine which systems need to be restored first,” said Derrin Rummelt, director of cloud engineering and R&D for U.S. Signal, a hybrid IT solutions provider. “Then we perform testing to ensure that recovery actually works.”

It’s critical to know the order of restoration and also where different systems and groups of data operate and are stored—because in some cases, it might be necessary to reach out to another cloud or the internal data center to complete system transactions. If even one of these resources is unavailable, your recovery is in jeopardy.

This becomes more complicated as applications and data are modified. Each change introduces risk, and when organizations fail to retest after a modification, a recovery that once worked can fail.
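One way to make the restoration order explicit is to record which systems depend on which, and where each one runs. The sketch below uses Python's standard `graphlib` module to derive an order in which every system is restored after the systems it depends on, and to show which systems are stuck when one location is unavailable. The system names, locations, and dependencies are hypothetical placeholders.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each system maps to the systems it depends on.
DEPENDS_ON = {
    "identity-service": set(),
    "customer-database": {"identity-service"},
    "order-api": {"customer-database", "identity-service"},
    "reporting": {"order-api"},
}

# Hypothetical record of where each system runs.
LOCATION = {
    "identity-service": "cloud-a",
    "customer-database": "in-house",
    "order-api": "cloud-b",
    "reporting": "cloud-b",
}


def restore_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which each system follows all of its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def blocked_systems(
    depends_on: dict[str, set[str]],
    location: dict[str, str],
    unavailable: set[str],
) -> set[str]:
    """Return systems that cannot be restored while `unavailable` locations are down.

    A system is blocked if it runs in an unavailable location or if
    anything it depends on is itself blocked.
    """
    blocked: set[str] = set()
    for system in restore_order(depends_on):
        if location[system] in unavailable or depends_on.get(system, set()) & blocked:
            blocked.add(system)
    return blocked
```

Keeping an inventory like this under version control alongside the applications makes it harder for a modification to slip through without the recovery order being revisited.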

3. Test your DR regularly

Even if your own systems and data remain relatively unchanged, your cloud vendors can introduce changes into their infrastructures and platforms that affect how your systems and data perform. The only way to safeguard against this is to test your DR plans with your cloud vendors regularly, at least once a year, to confirm that a recovery really does work.

“A company can be using multiple SaaS, PaaS and IaaS cloud platforms in its IT,” said Saalex’s Flavin. “By regularly testing these systems, even through replication, you can assure that recovery works in each cloud scenario.”
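With several cloud platforms in play, it is easy to lose track of which ones have been through a recent recovery test. A minimal sketch, assuming you keep a simple record of the last successful DR test per platform, is shown below. The platform names and the 365-day default are illustrative assumptions; a shorter interval may suit systems that change often.

```python
from datetime import date, timedelta


def overdue_dr_tests(
    last_tested: dict[str, date],
    today: date,
    max_age_days: int = 365,
) -> list[str]:
    """Return the platforms whose last DR test is older than `max_age_days`."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, tested in last_tested.items() if tested < cutoff)


# Hypothetical test log for three cloud platforms.
example_log = {
    "saas-crm": date(2018, 3, 1),
    "iaas-compute": date(2017, 6, 15),
    "paas-analytics": date(2018, 8, 20),
}
overdue = overdue_dr_tests(example_log, today=date(2018, 9, 4))
```

Here `overdue` would contain only `iaas-compute`, the one platform whose last test is more than a year old on the chosen date.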

Can organizations realistically take this task on?