Content Analytics: Four techniques to make your big data project a success

Would you start a trek through a jungle without a map or GPS? Many companies are attempting content analytics on their new big data, the petabytes generated from the web, mobile devices and other connections, without a plan. They are trying to leverage their big data but getting lost in billions of rows and petabytes of information. As companies begin to apply analytics to their big data content, they need detailed information about techniques for dealing with the volume of big data, its unique content and the analytic processes that will drive business profitability.

First, understand and measure the amount of big data your company plans to push through the content analytics process. Processing big data on a daily, weekly or any regular schedule takes capacity planning, large storage and significant processing power. Big data arrives in many forms, and processing billions or hundreds of millions of rows daily can become a storage management nightmare. Plan for handling multiple instances of your big data sources and determine your storage requirements before your storage is overwhelmed.
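As a back-of-the-envelope sketch of that capacity planning, the storage footprint of a daily feed can be estimated from row volume, row size, retention and the number of copies kept. All figures below are illustrative assumptions, not numbers from any real system:

```python
# Rough storage-capacity estimate for a regularly scheduled big data feed.
# Row size, daily volume, retention and copy count are all hypothetical.

def storage_estimate_tb(rows_per_day, avg_row_bytes, retention_days, copies=3):
    """Estimate raw storage in terabytes, including extra instances
    of the data (e.g. staging, backup and replica copies)."""
    total_bytes = rows_per_day * avg_row_bytes * retention_days * copies
    return total_bytes / 1e12  # decimal terabytes

# Example: 500 million rows/day, 200 bytes/row, kept for one year,
# with three instances of the data on hand.
print(storage_estimate_tb(500_000_000, 200, 365))  # 109.5 TB
```

Even modest-sounding daily volumes compound quickly once retention and multiple copies are factored in, which is why sizing this before the data arrives matters.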

Next, determine the level of data quality within your big data. Data quality problems are rarely caused by electronic sensors, but the quality of social media and other unstructured data can be very uneven and may need major cleansing and analysis. "One in a million" happens a thousand times in a billion rows, and big data sometimes means dealing with hundreds of billions of rows. So be ready for data quality issues: identify your data sources, their reliability, the amount of data cleansing to perform and the level of effort required to clean up the big data.
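The "one in a million" arithmetic, plus a quick way to gauge cleansing effort from a sample, can be sketched as follows; the defect rate and the sample records are hypothetical:

```python
# How often a rare defect shows up at big data scale, and a simple
# sample profiler to estimate how much cleansing work lies ahead.

def expected_defects(total_rows, defect_rate):
    """Expected number of defective rows at a given defect rate."""
    return total_rows * defect_rate

def profile_sample(rows):
    """Fraction of sampled records with an obvious quality problem
    (missing or empty fields). Rows are dicts of field -> value."""
    if not rows:
        return 0.0
    issues = sum(
        1 for row in rows
        if any(v is None or v == "" for v in row.values())
    )
    return issues / len(rows)

# A one-in-a-million issue across 1 billion rows:
print(expected_defects(1_000_000_000, 1 / 1_000_000))  # 1000.0
```

Profiling a sample before committing to full-volume cleansing gives an early read on whether a source is reliable enough to trust.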

Next, determine what big data golden nugget will be uncovered. These days big data, petabytes of it, is easy to come by. Web pages and their clicks, electronic sensors, orders and their details, and social media about your company’s products are only a few of the sources companies are using in their content analytics. In-depth knowledge of the data, including its original input sources, context and unique qualities, is what provides value through the analytical process, and that knowledge is paramount for mining value from the data. Favorable or unfavorable sentiment gleaned from big data can help your company change its products or business processes before the competition does. Understand, evaluate and communicate the value of your big data findings.
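As a toy illustration of tallying favorable versus unfavorable mentions, here is a naive keyword-matching sketch. The word lists and messages are made up, and a real project would use proper sentiment analysis rather than bare keyword matching:

```python
# Naive keyword tally of favorable vs. unfavorable social media mentions.
# Word lists are illustrative assumptions; punctuation is not handled.

FAVORABLE = {"love", "great", "excellent", "recommend"}
UNFAVORABLE = {"broken", "terrible", "refund", "disappointed"}

def sentiment_tally(messages):
    """Classify each message by which keyword set its words hit."""
    counts = {"favorable": 0, "unfavorable": 0, "neutral": 0}
    for text in messages:
        words = set(text.lower().split())
        if words & FAVORABLE and not words & UNFAVORABLE:
            counts["favorable"] += 1
        elif words & UNFAVORABLE and not words & FAVORABLE:
            counts["unfavorable"] += 1
        else:
            counts["neutral"] += 1
    return counts

print(sentiment_tally([
    "I love this great product",
    "this is terrible I want a refund",
    "it arrived today",
]))  # {'favorable': 1, 'unfavorable': 1, 'neutral': 1}
```

The point is the shape of the pipeline, raw mentions in, a favorable/unfavorable signal out, that lets a company react before the competition.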

Finally, and most important, know the business value of the big data content analytics. What is the value of offering a product first? What unique insight is the content analytics process targeting, and which insight will benefit the company most? What is that answer worth to the company? What happens to the bottom line if you learn earlier in the sales cycle that you need to change the products offered, discover a unique trend or idea, or uncover manufacturing problems early, all based on knowledge gleaned from big data? The value proposition for content analytics of big data is critical, and having the return on investment story early is vital.
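The return on investment story can start as a very simple calculation; every figure below is a hypothetical assumption used only to show the shape of the math:

```python
# Simple ROI check for a content analytics project.
# Benefit, cost and horizon are illustrative assumptions.

def roi(annual_benefit, annual_cost, years=3):
    """Return (net value, ROI ratio) over the project horizon."""
    net = (annual_benefit - annual_cost) * years
    ratio = net / (annual_cost * years) if annual_cost else float("inf")
    return net, ratio

# Hypothetical: $2M/year of earlier product decisions vs. $500K/year
# of analytics platform and staffing cost, over three years.
net, ratio = roi(annual_benefit=2_000_000, annual_cost=500_000)
print(net, ratio)  # 4500000 3.0
```

Even a rough version of this calculation, done early, makes the value proposition concrete enough to defend the project.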

So use these ideas and follow these techniques to help lead your company through the new jungle of big data content analytics.


I will be speaking at the Heart of America DB2 Users Group on March 12 in Overland Park, Kansas.

I will be presenting my new speech: Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse

This presentation discusses the design, architecture, metadata, performance and other experiences of building a big data analytics data warehouse system. You will learn the real-life issues, agile development considerations and solutions involved in building a data warehouse of 22+ billion rows in only six months.

This presentation will help you understand techniques to manage, design and leverage big data for a more in-depth understanding of your business. The agile development processes will also be detailed, showing how to uncover complex analytics requirements and other issues early in the development cycle. You will hear about the experiences that took processes from 37 hours to seconds, so you can create a successful big data design and a scalable data warehouse analytics architecture.

Also, I am beginning to plan my regional DB2 user group support for 2012. Please send me an email at moc.ekluebevadnull@evad if you would like me to come and speak or offer a DB2 class at your local user group.

I look forward to speaking at the IDUG DB2 Tech Conference 2012 North America. The conference will be held in Denver, Colorado on May 14-18, 2012. Sign up today.

