Contingency Planning for High Availability Networks (Case Study)

Introduction:

Uninterrupted operation of our information systems is vital to maintaining a high-availability network that provides continuous service to our customers. Information system resources are essential to our business success, and it is crucial that we identify the services in these systems that need to operate efficiently. Despite greater awareness of the need for business continuity planning, research suggests that the cost of data center downtime has increased significantly in recent years. In 2016, total costs were estimated at $2.4 million, up 39 percent over the prior three years.

Proper contingency planning is the foundation on which alternative information systems can be established and operated effectively if an unplanned outage cuts off access to primary systems. This is achieved by instituting procedures, technical measures, and thorough planning that allow the quickest possible system recovery after a service disruption.

Findings & Recommendations:

It is worth noting that uninterruptible power supply (UPS) system failures are the number one cause of unplanned outages, accounting for 25 percent of events, as seen in Figure 1 below. Over the past year we experienced a large number of power outages at our network operations centers on the West Coast in which all of our backup power sources failed to activate, causing major disruptions in service. As a result, we have concluded that we must implement a preventive maintenance program that mandates periodic, scheduled checks of all UPS units and all other power sources, including our backup gas generators and lead-acid batteries. The cost of such checks is trivial, and even the cost of a replacement UPS is justified by the future losses it prevents. Corrective maintenance should always be the last resort.
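As a rough illustration of what a periodic maintenance schedule could look like in practice, the sketch below flags power assets whose next check is overdue. The asset names, dates, and check intervals are hypothetical placeholders, not figures from our sites or from the research cited here.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PowerAsset:
    name: str            # hypothetical asset label
    last_checked: date   # date of the most recent maintenance check
    interval_days: int   # assumed interval between checks (placeholder value)

    def next_due(self) -> date:
        """Date on which the next scheduled check falls due."""
        return self.last_checked + timedelta(days=self.interval_days)


def overdue_assets(assets, today):
    """Return names of assets whose next check date has passed, most overdue first."""
    late = [a for a in assets if a.next_due() < today]
    return [a.name for a in sorted(late, key=lambda a: a.next_due())]


# Illustrative inventory only; every value below is an assumption.
assets = [
    PowerAsset("UPS-west-1", date(2016, 6, 1), 90),
    PowerAsset("gas-generator-west", date(2016, 9, 15), 30),
    PowerAsset("lead-acid-bank-A", date(2016, 10, 1), 30),
]

print(overdue_assets(assets, today=date(2016, 10, 21)))
```

A report like this could be run on a schedule so that overdue checks surface before a failure does, which is the point of preferring preventive over corrective maintenance.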

The second most common cause of unplanned outages is distributed denial-of-service (DDoS) attacks, which accounted for 22 percent of all failures, as seen in Figure 1 below. Cyber criminal activity climbed from 2 percent of outages in 2010 to 18 percent in 2013 and now stands at 22 percent. This rise can largely be attributed to DDoS attacks made easy by widely available, freely downloadable “script kiddie” software such as Low Orbit Ion Cannon (LOIC). Although we have not experienced catastrophic outages as a result of such attacks, we deem it necessary to plan for them by putting policies in place as part of our contingency plans. Doing so will help us maintain a highly available network without service interruptions while also helping us meet regulatory compliance requirements. We can meet these challenges head on with a well-trained, skilled workforce of cybersecurity experts.

Figure 1

It was also noted that the current average partial downtime is 64 minutes, at an average cost of $8,851 per minute, up 12 percent from the previous year. Reported per-minute costs of unplanned outages ranged from $926 to as much as $17,244. Evidently, the efforts put forth so far have not made a large impact. Inadequate planning is a widely known vulnerability that plagues the recovery plans of many organizations, yet remedying it would incur a negligible expense compared with the potential losses.
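To put these per-minute figures in perspective, the short sketch below multiplies the 64-minute average downtime by each per-minute cost quoted above. The function name is our own, and pairing the 64-minute duration with the $926 and $17,244 figures is an illustrative assumption rather than a result reported by the research.

```python
def outage_cost(minutes: int, cost_per_minute: int) -> int:
    """Total cost of an outage lasting `minutes` at a flat per-minute cost."""
    return minutes * cost_per_minute


AVERAGE_DOWNTIME_MINUTES = 64  # average partial downtime quoted above

# Per-minute costs quoted above, in US dollars.
per_minute_costs = {"lowest": 926, "average": 8851, "highest": 17244}

for label, rate in per_minute_costs.items():
    total = outage_cost(AVERAGE_DOWNTIME_MINUTES, rate)
    print(f"{label:>8}: ${total:,}")
```

At the average rate, a single 64-minute outage works out to $566,464, which is why even modest spending on planning compares favorably.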

Downtime can have adverse effects on an organization in several ways, some of which can be seen in Figure 2 below. The average costs by category are as follows:

  • Business Disruption – $201,550
  • Lost Revenue – $197,500
  • Recovery – $17,570
  • IT Productivity – $56,789
  • Equipment – $8,865

Figure 2
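The category averages listed above can be compared more easily as shares of a combined figure. The sketch below sums them and prints each category's share. The combined total is our own arithmetic on the listed averages, offered as an illustration rather than a number reported by the source.

```python
# Average cost per category, in US dollars, as listed above.
costs = {
    "Business Disruption": 201_550,
    "Lost Revenue": 197_500,
    "Recovery": 17_570,
    "IT Productivity": 56_789,
    "Equipment": 8_865,
}

total = sum(costs.values())

# Largest categories first, each with its share of the combined figure.
ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
for category, amount in ranked:
    print(f"{category:<20} ${amount:>8,}  {amount / total:6.1%}")

print(f"{'Combined':<20} ${total:>8,}")
```

Business disruption and lost revenue together make up the bulk of the combined figure, while equipment is the smallest share.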

Poor business continuity planning carries over into data recovery. Areas where many organizations fall short include:

  • 53 percent reported not backing up their data on a daily basis.
  • 32 percent of IT administrators said that backing up every day is not an efficient use of their time.
  • By industry, the healthcare field is considered among the most negligent, as an alarming 66 percent of respondents reported not testing their backup systems to gauge effectiveness.
  • 75 percent of organizations claimed that daily backups threaten workplace productivity.

These are areas where we feel we can make huge inroads at minimal cost. By implementing a sound contingency plan, we can transform the existing culture through interactive training events and unannounced audits. We feel these steps would help change the lackadaisical attitude toward contingency planning and preparedness that has developed over prior years.
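One simple check an unannounced audit might include is whether every system has been backed up within the last day. The sketch below is a minimal example under that assumption. The system names, timestamps, and the 24-hour threshold are hypothetical, and a real audit would also need to test that backups can actually be restored.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup_times, now, max_age=timedelta(hours=24)):
    """Return names of systems whose newest backup is older than max_age, sorted by name."""
    return sorted(
        name for name, taken_at in last_backup_times.items()
        if now - taken_at > max_age
    )


# Hypothetical audit snapshot: system name -> time of its most recent backup.
records = {
    "billing-db": datetime(2016, 10, 20, 23, 0),
    "file-server": datetime(2016, 10, 18, 2, 30),
    "mail": datetime(2016, 10, 21, 1, 15),
}

audit_time = datetime(2016, 10, 21, 9, 0)
print(stale_backups(records, audit_time))
```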

Figure 3

Disasters can do far more damage than hitting an organization's pocketbook. There are documented cases in which the repercussions hindered core operations and placed business continuity in serious jeopardy. Some of the reported effects are listed below:

  • 6 percent of organizations suffered damage to their brand reputation.
  • 8 percent of recoveries cost the company money that was not included in the budget.
  • 1 percent of organizations suffered permanent losses.
  • 2 percent of recovery efforts resulted in disruptions that had a major impact on revenue potential.
  • 9 percent of recovery efforts consumed staff time that impacted the business.

The approaches we have identified as options for establishing a fully inclusive business continuity management plan include the following:

  • Software-based disaster recovery solutions
  • Remote disaster recovery sites that mirror most of our primary sites
  • Dissimilar secondary sites
  • Hardware-based disaster recovery solutions (replication)
  • Cloud-based solutions

The success of these initiatives relies primarily on adequate funding but also, in part, on plans of action and milestones. Beyond funding, successful contingency planning depends on sound planning informed by business intelligence and analytics. This includes opening clear communication channels with employees and upper-level management about their roles and responsibilities during and after unplanned service disruptions, including all processes involved in service restoration.

References:

Austin, T. (2015, February 9). Business continuity, disaster recovery among top 2015 IT investments. Retrieved October 21, 2016, from http://continuitycenters.com/news/business-continuity-disaster-recovery-among-top-2015-it-investments/

UpGuard. (2016, October). 5 Things About Configuration Management Your Boss Needs To Know. Retrieved October 19, 2016, from https://www.upguard.com/blog/5-configuration-management-boss

Bradford, C. (2016, June 14). Statistics: Business Continuity For IT Pros. Retrieved October 19, 2016, from https://www.storagecraft.com/blog/statistics-continuity-tech/
