Big Data: 5 Considerations for Disaster Recovery

Big Data pushes the issue of disaster recovery to the front of the discussion.  How critical your Big Data application is to the bottom line of the business should be directly proportional to your Big Data disaster recovery planning efforts.
The following five considerations will help you evaluate the data volumes, level of detail, and costs involved in planning your Big Data disaster recovery.


  1. Recognize that the odds are your company will declare a disaster.  According to numerous business surveys from storage, backup, and government sources, roughly 76% of companies will experience some type of business outage or declare a disaster.  Some outages are self-inflicted, such as the recent NASDAQ outage, the Amazon global slowdown, or the Goldman Sachs trading issues; others are caused by natural disasters like hurricanes, floods, or earthquakes.  Big Data makes IT’s job much more difficult during disaster recovery.
  2. It’s Big Data: you can’t have everything.  There are basically five types of recovery: local tape/storage backup, remote backup, virtual snapshot, local cluster replication, and remote cluster replication.  These solutions offer different recovery time objectives (RTOs) and recovery point objectives (RPOs).  Big Data disaster recovery planning presents an even bigger problem because of the cost of the storage required.  Replicating Big Data storage for disaster recovery is cost prohibitive, so you need to evaluate the value of your data over time.
  3. Understand your company’s unique RTO requirements.  Every business is different, and an alternative to full storage replication is to prioritize the Big Data input types by their value as they age, minimizing your Big Data disaster recovery storage infrastructure.  This additional categorization helps drive the most cost-effective solution from a business value point of view.  Discussing RTOs and RPOs may even eliminate Big Data backups entirely, focusing the disaster recovery conversation instead on getting the application and the Big Data feeds operational as soon as possible and bypassing any history requirements.
  4. Logically prioritize your applications and business functions.  Big Data disaster recovery planning cuts to the core of the business processing dependencies, their financial costs and benefits, and their social impact on the business.  These financial considerations set the proper priority of Big Data disaster recovery ahead of or behind other core operational or revenue-generating applications.  Applying an 80/20 analysis to the operational data truly necessary for your Big Data disaster recovery plan can minimize storage and properly prioritize a cost-effective solution.
  5. Leverage disaster recovery practices and assets for true business value.  Given the scale, complexity, and cost of Big Data disaster recovery planning, consider designing your business continuity solution to serve purposes beyond disaster recovery alone, which can help justify or absorb some of the costs.  For example, using your disaster recovery Big Data replica or VM environment as a testing, reporting, or training environment can benefit all types of IT operations and cost considerations.  Big Data disaster recovery costs are not only a tremendous storage consideration but also a software licensing and processor capacity consideration.  By leveraging all aspects of your Big Data disaster recovery infrastructure, you can bring the most cost-effective and efficient solution to your Big Data system.
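To make the RTO/RPO trade-off in considerations 2 and 3 concrete, the comparison can be sketched in a few lines of Python.  The hours and relative costs below are hypothetical illustrations for discussion, not measured or vendor-supplied figures:

```python
# Hypothetical RTO/RPO comparison for the five recovery types discussed above.
# All hours and relative costs are illustrative assumptions, not vendor figures.
RECOVERY_OPTIONS = [
    # (name, rto_hours, rpo_hours, relative_cost)
    ("local tape/storage backup",   48.0, 24.0, 1),
    ("remote backup",               24.0, 24.0, 2),
    ("virtual snapshot",             4.0,  1.0, 3),
    ("local cluster replication",    1.0,  0.1, 4),
    ("remote cluster replication",   0.5,  0.0, 5),
]

def viable_options(rto_limit_hours, rpo_limit_hours):
    """Return the recovery types that meet the business's RTO and RPO,
    cheapest first -- the core trade-off in considerations 2 and 3."""
    hits = [o for o in RECOVERY_OPTIONS
            if o[1] <= rto_limit_hours and o[2] <= rpo_limit_hours]
    return sorted(hits, key=lambda o: o[3])

# Example: the business tolerates 8 hours of downtime and 2 hours of data loss.
for name, rto, rpo, cost in viable_options(8, 2):
    print(f"{name}: RTO {rto}h, RPO {rpo}h, relative cost {cost}")
```

Even this simple sketch shows why the RTO/RPO discussion matters: tightening either limit quickly eliminates the cheaper options and drives you toward replication-class costs.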

Big Data disaster recovery planning is fundamentally no different from planning for any other IT disaster recovery situation.  Unfortunately, if management is not committed to disaster recovery for the smaller systems, it definitely won’t be committed to disaster recovery for the Big Data system.  Use these five considerations to drive the discussion and arrive at a cost-effective disaster recovery solution that will keep applications operational and the business viable.

I invite you to my session at the IOD conference where I will discuss some of the solutions for “Big Data Disaster Recovery Performance” that are working and available today.
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions.  He is an IBM Gold Consultant, Information Champion, President of DAMA-NCR, former President of the International DB2 Users Group, and a frequent speaker at national and international conferences.  His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.


Also, as President of the DAMA National Capital Region in Washington, DC, I want to let you know about the great speakers and topics we have lined up for our September 19th DAMA Day.  Register today!



Also I will be talking more about Big Data design considerations, the new BLU technology, Hadoop considerations, UNION ALL Views and Materialized Query Tables during my presentation at the International DB2 Users Group IDUG EMEA conference in Barcelona, Spain, October 13-17, 2013.  My speech is Wednesday October 16th at 9:45 “Data Warehouse Designs for Big Data” in the Montjuic room.

This presentation details the design, prototyping, and implementation of a 22+ billion row data warehouse in only six months using an agile development methodology.  This complex analytics Big Data warehouse architecture took processes for this federal government agency from 37 hours to seconds.  For more information on the conference go to


I will also be presenting at the Information on Demand (IOD) conference in Las Vegas November 3-7, 2013.  I will be presenting “Big Data Disaster Recovery Performance” Wednesday November 6, at 3 pm in the Mandalay Bay North Convention Center – Banyan D.

This presentation will detail the latest techniques and design architectures for the best Big Data disaster recovery performance.  The various hardware and software techniques will be discussed, highlighting the FlashCopy and replication procedures critical to today’s Big Data systems.
