As president of the Washington, DC chapter of DAMA, the National Capital Region (NCR), I had the chance to put together the agenda for our annual DAMA Day. This year's topic was Big Data, and I was fortunate to convince Jeff Jonas to give the keynote. Jeff and I met several years ago through the IBM Gold Consultant program and at many International DB2 User Group and Information on Demand conferences, and we have frequently discussed our big data processes and warehouses. If you are not familiar with Jeff, he is leading the industry with cutting-edge research and real-life implementations of big data for all types of real-world applications. He also works with privacy groups to ensure that laws, protections and common sense are incorporated within these big data systems.
Jeff was kind enough to make time in his busy schedule, and his keynote met all expectations, covering the considerations and experiences of processing big data. If you have read any of his past blog posts, you’ll know he highlights the concept that the “data finds the data.” He posits that the data you already have helps you find and match up your data to achieve your big data goal. Whether the field is marketing, medical research or management, big data processing will drive the data context and the data analysis to correlate your data relationships and turn data value into results.
He also talked about sizing these big data systems and how they typically need more storage than designers originally think. This is because the big data system needs more working space to store all the possibilities first. As big data processing continues and correlates the data, the number of unique items is consolidated, storage needs shrink, and your big data value truly is discovered.
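To make the sizing point concrete, here is a minimal Python sketch of my own. It is not Jeff's method or any product's algorithm. It assumes, as a deliberate oversimplification, that two records describe the same item whenever they share any field value, and the sample records and the `consolidate` function are hypothetical. The raw records are the working space you have to store first; the consolidated list is what remains once the data has been correlated.

```python
def consolidate(records):
    """Merge records that share any (field, value) pair, using union-find.

    Assumption for illustration only: one shared value means "same item".
    Real matching has to weigh probabilities and context.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (field, value) -> index of the first record that carried it
    for idx, rec in enumerate(records):
        for pair in rec.items():
            if pair in seen:
                # This value was seen before, so join the two groups.
                parent[find(idx)] = find(seen[pair])
            else:
                seen[pair] = idx

    # Fold every record into the combined item for its group.
    merged = {}
    for idx, rec in enumerate(records):
        merged.setdefault(find(idx), {}).update(rec)
    return list(merged.values())


# Hypothetical sample data: four raw records that describe two people.
records = [
    {"email": "a@example.com", "phone": "555-0100"},
    {"phone": "555-0100", "name": "A. Smith"},
    {"email": "b@example.com"},
    {"name": "A. Smith", "email": "a@example.com"},
]

entities = consolidate(records)
print(len(records), "raw records ->", len(entities), "consolidated items")
```

Three of the four sample records collapse into one item because each shares a value with another, so the stored item count drops from four to two. The peak storage need comes before that consolidation, which is why the working space has to be sized for the raw possibilities and not for the final result.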
He also talked about how some big data initiatives are really outrageous, saying that just because you have all this big data doesn’t mean you can get valuable information from it. He said that you need to design, understand probabilities and analyze all your assumptions so that your processing arrives at the correct big data context and draws the correct, valuable conclusions.
Years ago I read a magazine survey about hyped data warehousing projects that said 82% of all new data warehouse initiatives were failing, and I wonder what percentage of big data development projects will run into trouble. I guess Jeff can relate to businesses trying to do too much, because he said his next blog entry is going to be named “Fantasy Analytics.” If Jeff is involved in a project, I am sure his experience will guide the team away from fantasies and toward the realities of big data development: practical and realistic processes for success.
(Check out Jeff’s blog. A link is in my BlogRoll in the bottom right corner.)
I look forward to supporting the DB2 community through the local DB2 User Groups.
I am coming to Dallas and Austin, Texas, on October 10th and 11th and look forward to presenting my “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse” and “Java DB2 Developer Performance Best Practices” speeches. Check the website www.db2forum.org for more information on the Dallas event. The Austin one should be updated soon.
I will be talking more about Big Data, UNION ALL views and Materialized Query Tables at the Information on Demand (IOD) conference in Las Vegas, October 21st through 25th. My speech, “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse,” is on Monday, October 22, 10:15 – 11:15 am in the Mandalay Bay North Convention Center – Islander C. The presentation details designing, prototyping and implementing a 22+ billion row data warehouse in only six months using an agile development methodology. This complex big data analytics warehouse architecture took processes for a federal government agency from 37 hours to seconds.
I also look forward to supporting the International DB2 Users Group (IDUG) conference in Berlin, Germany, November 5th-9th, with two topics, “Data Warehouse Designs for Performance” and “Java DB2 Developer Performance Best Practices,” on Tuesday, November 6th.
On December 4th, 5th and 6th I will be presenting at the Minneapolis, Milwaukee and Chicago DB2 user groups.
Please come by any of these presentations and say, “Hi.”
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor. Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand, saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.