Big Data Day Recap – 5 Very Interesting Items

During the Big Data DAMA-NCR Meeting in Washington D.C. this week, I heard from Svetlana Sicular, Research Director of Data Management Strategies at Gartner Group; Raul F. Chong, Senior Big Data and Cloud Program Manager at IBM; and John Adler and Madina Kassengaliyeva from Think Big Analytics.

Their presentations gave the attendees a lot of good real-world insight into the status of Big Data projects at various companies.  Below are five things that really stood out during the presentations and my discussions with the other attendees.

  1. Companies are finding previously hidden information, unknown trends, and new ideas in their existing data, and those discoveries are paying off.  Big Data analytics is about asking new questions of new or existing data and taking another look at all your company’s data from different perspectives.  Some companies are combining their transactions with weather, traffic, location, or competitors’ price information to gain new insights and draw more information from their existing data.
  2. Big Data projects are failing.  Big Data is different from previous large data mining or data warehouse exercises since some companies are analyzing completely new and different types of data stores.  These new data stores often contain unstructured data, constant sensor data, or other new forms of information previously unknown, never modeled, and never analyzed before.  Since all of these data types, modeling efforts, and data sources are new, it was estimated that only 2 out of 10 new Big Data projects are adding value.  Some projects are adding extraordinary costs in terms of personnel, massive expensive storage, and server farm infrastructure even while they are failing.  But given the vendor hype and a few stories of huge potential value, businesses continue to make Big Data a top priority.
  3. It’s a good thing Big Data projects are failing.  By failing, these Big Data projects are uncovering new data sources and different analytical ideas, and validating the tried and true data management standards, governance, and security policies.  The IT department’s discussions and collaboration with the business people about business profitability are opening new IT-business partnerships.  Everyone learns from failures, and Big Data projects have a lot of new challenges.  The take-away advice was to fail faster and explore more business areas widely to uncover possible successes.
  4. Start with the Big Data sources first and do the technology last.  The media and vendors are happy to help you think that the open source NoSQL databases are the only answer to Big Data analytics.  Nothing could be further from the truth, since your existing MPP or relational infrastructure can likely handle all the demands of your Big Data project.  Big Data needs data governance, masking, encryption, and security protections.  Most of the new NoSQL databases don’t have any facilities for these efforts and certainly don’t come close to the ACID characteristics of the vetted, existing infrastructure solutions.  Companies that automatically think Big Data requires a new platform will end up with another new, expensive, problematic silo project looking for a solution.
  5. Big Data requires data management expertise.  Most of the Big Data projects are being driven by the business.  Unfortunately, most of the business people driving the Big Data NoSQL database projects are data management illiterate: they don’t recognize the lack of NoSQL data management facilities that I mentioned earlier and don’t know anything about availability, referential integrity, and normalized data designs.  When these Big Data projects get underway, security, privacy, technology, and performance issues come up very quickly.  Having experienced data management functions, data design procedures, and protections in place is critical.
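
To make the first item a little more concrete, here is a minimal sketch of combining transactions with outside weather data and then asking a new question of the result.  It is only an illustration: the sample rows, the field names (date, store, amount), and the helper functions enrich and sales_by_weather are all assumptions made up for this example, not anything the speakers presented.

```python
from collections import defaultdict

# Hypothetical sample data; the field names and values are assumptions
# made up for illustration only.
transactions = [
    {"date": "2013-09-02", "store": "DC-01", "amount": 120.0},
    {"date": "2013-09-02", "store": "DC-01", "amount": 80.0},
    {"date": "2013-09-03", "store": "DC-01", "amount": 310.0},
    {"date": "2013-09-04", "store": "DC-01", "amount": 95.0},
]

# Hypothetical external weather feed keyed by (date, store).
weather = {
    ("2013-09-02", "DC-01"): "rain",
    ("2013-09-03", "DC-01"): "clear",
}


def enrich(transactions, weather):
    """Attach the weather for the same date and store to each transaction."""
    enriched = []
    for t in transactions:
        # Transactions with no matching weather record are kept, not dropped.
        condition = weather.get((t["date"], t["store"]), "unknown")
        enriched.append({**t, "weather": condition})
    return enriched


def sales_by_weather(enriched):
    """Total the transaction amounts for each weather condition."""
    totals = defaultdict(float)
    for row in enriched:
        totals[row["weather"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(sales_by_weather(enrich(transactions, weather)))
```

The same join and aggregation could just as easily be done in SQL on an existing relational platform, which is in keeping with the fourth item above.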

One attendee told me a story about a recent NoSQL database conference where there was a big announcement that the new version of the NoSQL database would support secondary indexes.  The announcement got a standing ovation from the conference attendees.  He found it very curious that they would get so excited about a secondary index.  Almost all experienced database professionals know that secondary indexes have been around for over 30 years.  Maybe the reason a majority of Big Data projects are failing is that they are using a DBMS that doesn’t have state-of-the-art data management capabilities.

It was great hearing all the speakers and talking with all the attendees about the realities of Big Data projects.  Hopefully, these five observations will help your Big Data project avoid some of the problems.

____________________________________________________
Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions.  He is an IBM Gold Consultant, Information Champion, President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences.  His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.

________________________________________________________

I will also present twice at the International DB2 Users Group (IDUG) EMEA conference in Barcelona, Spain, October 13-17, 2013.  The first presentation, “Data Warehouse Designs for Big Data,” is Wednesday, October 16th at 9:45 in the Montjuic room.  It covers more about Big Data design considerations, the new BLU technology, Hadoop considerations, UNION ALL views, and Materialized Query Tables.

This presentation details designing, prototyping, and implementing a 22+ billion row data warehouse in only six months using an agile development methodology.  This complex analytics big data warehouse architecture took processes for a federal government agency from 37 hours to seconds.  For more information on the conference go to www.idug.org.

The second presentation is “Agile Data Performance and Design Techniques” Tuesday October 15, at 16:30-17:30.  Would you like to have enough time to look at all the DB2 database performance options and design alternatives?  In this presentation you will learn how Agile application development techniques can help you, your architects, and developers get the optimum database performance and design.

Learn how Agile development techniques can quickly get you to the best database design.  This presentation will take you through the Agile continuous iterations with releases that reflect the strategy of optimum database design and application performance.  Using these techniques you and your developers will be able to uncover the performance considerations early and resolve them without slowing down the Agile time boxing processes and incremental development schedule.  For more information on the conference go to www.idug.org.

__________________________________________________________

I will also be presenting at the Information on Demand (IOD) conference in Las Vegas November 3-7, 2013.  I will be presenting “Big Data Disaster Recovery Performance” Wednesday November 6, at 3 pm in the Mandalay Bay North Convention Center – Banyan D.

This presentation will detail the latest techniques and design architectures to provide the best Big Data disaster recovery performance.  The various hardware and software techniques will be discussed, highlighting the FlashCopy and replication procedures that are critical to Big Data systems these days.
