3 Big Data Disaster Recovery Performance Factors

As I discussed recently in my blog post “Big Data: 5 Considerations for Disaster Recovery,” Big Data disaster recovery drives companies into intense analysis of their Big Data usage and its importance. Getting the business to discuss and define the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for Big Data is a very interesting team-building and consensus-building challenge. Herding cats is easier than getting real answers for these Big Data disaster recovery RTO and RPO initiatives.

The following three factors are vital for starting those conversations and building consensus on the RTO and RPO for your Big Data disaster recovery performance objectives.

  1. RTO – Recovery Time Objectives are I/O sensitive. Big Data disaster recovery is about restoring your Big Data, which can sometimes run to petabytes of information. Defining how much of those petabytes is absolutely needed for operations is vital to defining and minimizing your RTO. To get to the truth about your Big Data usage patterns, analyze the system, processing dependencies, end-user usage, and data access history patterns. Leverage the 80/20 rule to analyze the minimums and maximums for each functional access to determine the data parts accessed, the data freshness timeframe, and the amount needed for your Big Data disaster recovery. This analysis will provide the raw figures for calculating the best RTO possible through the simple formula RTO = max(T, S/B), measured in minutes, where T stands for processing time, S for the resulting data set size, and B for the effective available bandwidth; recovery is gated by whichever is longer, the reprocessing or the data transfer.
  2. Establish a Recovery Point Objective (RPO) that is easily understandable. The Big Data disaster recovery point needs to be understood and agreed to by a wide variety of people, from C-level executives, IT management, and operations to business personnel, store managers, and register assistants. Make sure to communicate all your disaster recovery plans, especially your Big Data disaster recovery RPO, to everyone. This can make all the difference when a customer posts a Twitter comment and your social Big Data analytics isn’t yet available to address their issue or situation.

    Having an easily understood RPO also helps when disaster strikes and the pressure is on everyone to execute the Big Data disaster recovery plan. A solid RPO agreement helps everyone understand what systems, databases, and applications will be synchronized, how they will interact with the Big Data disaster recovery RPO, and when they will begin serving business processing again after the outage.
  3. Leverage the new Big Data technologies whenever possible to tremendously minimize RTO. DB2 BLU Acceleration, which I detailed in my blog post “IBM BLU Acceleration – Best Yet for Big Data!,” is a great example of how compression technology offers an easy way to minimize the Big Data disaster recovery RTO. By achieving up to 95% data compression through DB2 BLU Acceleration, the Big Data disaster recovery time can also be cut by up to 95%, because there is far less data to move and restore. This compression technology turns critical databases into easily backed-up Big Data disaster recovery assets. The BLU technology provides much better RTO timeframes than many other DBMS environments, especially Hadoop systems, which by default keep three copies of their uncompressed data.

    Also, leveraging the new storage FlashCopy technology, which offloads copy processes to storage memory and storage processors, is a great way to quickly copy your data and move the backup processing away from the general CPUs and into the storage infrastructure itself. By using a quick FlashCopy, a Big Data disaster recovery image can be created in moments. FlashCopy also has the advantage of providing an easy Big Data disaster recovery point for RPO flexibility, one that can fit into any business processing timeframe by moving the work off the mainframe CPUs.

    There are also the new features in the IBM DB2 Analytics Accelerator (IDAA), including the new High Performance Storage Saver (HPSS). The basic HPSS capability backs up a portion of your Big Data and pushes it into the IDAA, shifting it out of the DB2 environment. This is another great way to minimize the Big Data disaster recovery footprint by eliminating that data from it altogether. With a new IDAA HPSS version and features coming out this fall, HPSS provides a lot of flexibility to improve query performance through multiple IDAAs and to eliminate data from the main CPU, minimizing your RTO timeframe and backup operations.
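The RTO formula from the first factor can be turned into a quick back-of-the-envelope calculator, which also shows how the compression discussed in the third factor changes the answer. This is an illustrative Python sketch only, not a vendor tool; the function name and all of the sizes, bandwidth, and compression figures are assumptions for the example:

```python
# Sketch of the RTO estimate RTO = max(T, S/B): recovery is gated by
# the slower of reprocessing time and data-transfer time.
# All names and figures below are hypothetical illustrations.

def estimate_rto_minutes(processing_minutes: float,
                         data_set_tb: float,
                         bandwidth_tb_per_min: float,
                         compression_ratio: float = 1.0) -> float:
    """Estimate RTO in minutes.

    processing_minutes   -- T: time to reprocess/rebuild after the restore
    data_set_tb          -- S: size of the data set needed for operations
    bandwidth_tb_per_min -- B: effective available restore bandwidth
    compression_ratio    -- fraction of the original size actually moved
                            (e.g. 0.05 for the 95% compression cited above)
    """
    transfer_minutes = (data_set_tb * compression_ratio) / bandwidth_tb_per_min
    return max(processing_minutes, transfer_minutes)

# 100 TB over 0.1 TB/min: the 1000-minute transfer dominates the 120-minute T.
uncompressed = estimate_rto_minutes(120, 100, 0.1)            # -> 1000.0
# With 95% compression only 5 TB moves (50 min), so T = 120 dominates instead.
compressed = estimate_rto_minutes(120, 100, 0.1, compression_ratio=0.05)  # -> 120.0
print(uncompressed, compressed)
```

Running the same figures with and without the compression ratio makes the point of factor 3 concrete: once the transfer term shrinks below the reprocessing term, further compression no longer improves the RTO.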

There are many other Big Data disaster recovery considerations, but addressing these three factors will help your company achieve the best RTO, RPO, and Big Data disaster recovery possible.

I invite you to my session at the IOD conference where I will discuss some of the solutions for “Big Data Disaster Recovery Performance” that are working and available today.

Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high-performance internet business solutions. He is an IBM Gold Consultant, Information Champion, President of DAMA-NCR, former President of the International DB2 Users Group, and a frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.


Also as President of the Washington DC DAMA National Capital Region, I wanted to let you know of the great speakers and topics that we have for our September 19th DAMA Day.  Register today!



Also, I will be talking more about Big Data design considerations, the new BLU technology, Hadoop considerations, UNION ALL views, and Materialized Query Tables during my presentation at the International DB2 Users Group (IDUG) EMEA conference in Barcelona, Spain, October 13-17, 2013. My session, “Data Warehouse Designs for Big Data,” is Wednesday, October 16, at 9:45 in the Montjuic room.

This presentation details designing, prototyping, and implementing a 22+ billion-row data warehouse in only six months using an agile development methodology. This complex analytics Big Data warehouse architecture took processes for this federal government agency from 37 hours to seconds. For more information on the conference, go to www.idug.org.


I will also be presenting at the Information on Demand (IOD) conference in Las Vegas November 3-7, 2013.  I will be presenting “Big Data Disaster Recovery Performance” Wednesday November 6, at 3 pm in the Mandalay Bay North Convention Center – Banyan D.

This presentation will detail the latest techniques and design architectures for providing the best Big Data disaster recovery performance. Various hardware and software techniques will be discussed, highlighting the FlashCopy and replication procedures critical to today’s Big Data systems.
