Big Data: 3 Questions to Ask When Comparing Relational to NoSQL Databases

Actually working with Big Data, versus hearing the hype from Big Data and NoSQL vendors, continues to amaze me.  If I see Volume, Variety, and Velocity as the description of Big Data one more time, I won’t be able to click away from the web page fast enough.  So below are three main questions you should ask before you even start thinking about Big Data and picking a new NoSQL database over an established relational database.

What features justify using NoSQL over a relational database?  Hopefully, the answer is not buzzwords or acronyms but something truly specific to NoSQL databases, such as Entity-Attribute-Value (EAV), graph, or extensible data stores.  All of these data store types can also be built with relational databases, and some databases, such as DB2, even have special key-value data types built in.  Key-value pair table designs, columnar table designs, or EAV tables can be defined easily and quickly in relational databases.

It’s often said that NoSQL provides more schema flexibility with these designs than relational databases do.  Defining a key-value design is simple in both NoSQL and relational databases because the definition has only two or three attributes.  These types of data stores are fundamental and can be modified and managed quickly and easily within any relational or NoSQL database environment.
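As a rough illustration of how small such a definition is, here is a minimal sketch of a three-column Entity-Attribute-Value table in a relational database. It uses Python's built-in sqlite3 module only as a convenient stand-in for any relational engine; the table name, column names, and helper functions are hypothetical and not taken from any particular product.

```python
import sqlite3


def build_eav_store():
    """Create an in-memory relational database holding one EAV table."""
    conn = sqlite3.connect(":memory:")
    # Three attributes are enough: which entity, which attribute, what value.
    conn.execute(
        """
        CREATE TABLE entity_attr (
            entity_id TEXT NOT NULL,
            attribute TEXT NOT NULL,
            value     TEXT,
            PRIMARY KEY (entity_id, attribute)
        )
        """
    )
    return conn


def put(conn, entity_id, attribute, value):
    """Insert or overwrite one attribute of one entity."""
    conn.execute(
        "INSERT OR REPLACE INTO entity_attr VALUES (?, ?, ?)",
        (entity_id, attribute, value),
    )


def get_entity(conn, entity_id):
    """Return all attributes of an entity as a dict."""
    rows = conn.execute(
        "SELECT attribute, value FROM entity_attr WHERE entity_id = ?",
        (entity_id,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = build_eav_store()
    put(store, "customer:42", "name", "Ada")
    # A new attribute needs no schema change, only a new row.
    put(store, "customer:42", "loyalty_tier", "gold")
    print(get_entity(store, "customer:42"))
```

Adding a new attribute is just another row, which is the same flexibility usually credited to key-value stores.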

What type of Big Data transaction consistency is required?  Ask whether the Big Data analysis in the NoSQL database needs to be repeatable and reliable.  The answer is usually “absolutely,” but unfortunately most NoSQL databases don’t really enforce transaction integrity or complete transaction synchronization across Big Data.  NoSQL transaction integrity runs the full spectrum of reliability: some products have strong ACID consistency capabilities like relational databases, others offer only weak, eventually consistent BASE behavior, and some supply only approximate answers.

If the business is going to make customer- or product-impacting decisions based on the answers from your Big Data analytics project, shouldn’t the answers and processes be repeatable, reliable and re-creatable?  With some of the weak consistency models within the Big Data NoSQL databases, this may be very difficult.
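To make the ACID end of that spectrum concrete, here is a small sketch of an all-or-nothing transaction. It again uses Python's sqlite3 module as a stand-in for a relational database, and the account table, names, and amounts are invented for illustration. When the second update violates a constraint, the first update is rolled back with it, so the data never shows a half-applied change.

```python
import sqlite3


def make_accounts():
    """Create a tiny accounts table with a rule that balances never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return True if it committed."""
    try:
        # 'with conn' commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone as well.
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    db = make_accounts()
    print(transfer(db, "alice", "bob", 500), balances(db))
    print(transfer(db, "alice", "bob", 30), balances(db))
```

With an eventually consistent store, the application itself would have to detect and repair the half-applied case, which is the kind of work to budget for when answering this question.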

Do you realize that a Big Data application using an open source NoSQL database is going to cost more than a relational database?  Many NoSQL databases start out with a small free version.  The prototype that the CEO saw works on a laptop with several gigabytes of data.  The enterprise version may come under a different license costing thousands or more.  Building out an enterprise Big Data NoSQL infrastructure can require a huge hardware investment, hundreds of terabytes of storage and a lengthy setup time.  Also, since all of this is new to your personnel, training is going to be needed, which will lengthen your schedule.  All of this can add up to bigger costs to the business than using or augmenting your existing relational database infrastructure.

Remember that not every NoSQL database has storage compression technology comparable to what relational databases offer, which means that the full data size may be needed.  Relational databases, and especially the DB2 family, leverage compression technology, typically saving 80% of the storage space and tremendous amounts of CPU and I/O for all processes.  These relational database compression savings matter even more with Big Data, so do a storage estimate immediately and compare the cost of the NoSQL storage with what is needed for the compressed relational database.
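A back-of-the-envelope version of that storage estimate might look like the sketch below. The 80% savings figure is the one quoted above; the 200 TB raw size and the cost per terabyte are made-up placeholders, and the zero-compression default for the NoSQL side is an assumption you should replace with the measured figure for the product you are evaluating.

```python
def storage_estimate(raw_tb, compression_savings_pct, cost_per_tb):
    """Estimate stored size and cost after compression.

    raw_tb: uncompressed data size in terabytes
    compression_savings_pct: percent of space saved by compression (0 to 99)
    cost_per_tb: storage cost per terabyte, in whatever currency you use
    """
    if not 0 <= compression_savings_pct < 100:
        raise ValueError("compression_savings_pct must be between 0 and 99")
    stored_tb = raw_tb * (100 - compression_savings_pct) / 100
    return {"stored_tb": stored_tb, "cost": stored_tb * cost_per_tb}


def compare(raw_tb, cost_per_tb, relational_savings_pct=80, nosql_savings_pct=0):
    """Compare storage needs for the same raw data under two savings rates."""
    relational = storage_estimate(raw_tb, relational_savings_pct, cost_per_tb)
    nosql = storage_estimate(raw_tb, nosql_savings_pct, cost_per_tb)
    return {
        "relational": relational,
        "nosql": nosql,
        "extra_cost": nosql["cost"] - relational["cost"],
    }


if __name__ == "__main__":
    # Hypothetical inputs: 200 TB of raw data at 1,000 per TB.
    print(compare(raw_tb=200, cost_per_tb=1000))
```

Replication copies, indexes and backups are left out here; a real estimate would add them to both sides.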

NoSQL does not use SQL as its interface.  So users, tools and interfaces have to reference the database through different, new interfaces and/or Java APIs.  This requires your staff to learn all these new interfaces, their considerations and their performance tuning aspects.  With limited tools available for these new Big Data NoSQL databases, you have few shortcuts for solving your Big Data problems.  With corporate budgets and staffing timelines tight, a learning curve for NoSQL administration, interface programming and integrity issues will only make the project timeline and its problems bigger.
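The difference in interface can be sketched with one business question, total order amount per region, answered two ways. The first function uses a single SQL statement through sqlite3; the second uses a plain Python dict as a stand-in for a generic key-value store with no query language, so the aggregation has to be coded by hand. The sample orders and function names are hypothetical, and real NoSQL products each have their own APIs that will differ from this.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (order id, region, amount).
ORDERS = [("o1", "east", 120), ("o2", "west", 80), ("o3", "east", 40)]


def totals_with_sql(orders):
    """Answer the question declaratively with one SQL statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, region TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return dict(
        conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
    )


def totals_with_kv_scan(orders):
    """Answer the same question against a key-value stand-in, coded by hand."""
    store = {
        oid: {"region": region, "amount": amount}
        for oid, region, amount in orders
    }
    totals = defaultdict(int)
    # No query language here: scan every record and aggregate in application code.
    for record in store.values():
        totals[record["region"]] += record["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(totals_with_sql(ORDERS))
    print(totals_with_kv_scan(ORDERS))
```

Both return the same totals, but the second approach is code your staff has to write, test and tune for every new question.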

So, truly look at all the factors of your Big Data project.  Your relational database infrastructure can truly be an advantage for getting business value out of your Big Data.  Any of these questions that are answered with acronyms, buzzwords or technology jargon are big warning signs.  Focus the Big Data applications on bringing true business value to the customer or product situations.  Ask users to define the business questions that will get answered with the Big Data, and you’ll see that the answer has nothing to do with the NoSQL technology.  The relational database can provide a better, more cost-effective solution faster by leveraging the proven performance, reliability and savings of your relational database infrastructure, processes and personnel.

Dilbert gets it right again. http://www.dilbert.com/2013-01-09/


Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.
