Big Data Analytics: DB2 Data Types

Building a big data warehouse with billions of rows always needs an extra bit of attention to get the best performance out of the database processing.  Application programming is always interesting as the efficiency of the code and the DB2 data types handled by the application code are critical for data quality and integrity.

With our design from last week utilizing DB2 Date Functions we were able to optimize, centralize and standardize the date processing.  Working through the big data database definitions and realizing that the requirements show that the METER_NBR will continue to expand over the lifetime of the system the INTEGER definition is not going to be sufficient and the data type definition needs to be changed to BIGINT to handle the billions of rows.  The INTEGER data type maximum value 2,147,483,647 is only 2.1 billion and having projections of hundreds of billions of rows or trillions over the lifetime of the system the maximum value needs to be defined as a BIGINT data type with the 9,223,372,036,854,775,807 that is Quintillion – 10 to the 18th power limit.

Using these new expanded BIGINT data type number changes our big data table definition to the following.

CREATE TABLE PSIDB088.MST_PROD_MST_2012 (
 PROD_ID   INTEGER NOT NULL WITH DEFAULT 0,
 DEPT_KEY SMALLINT NOT NULL WITH DEFAULT 0,  
 METER_NBR BIGINT NOT NULL WITH DEFAULT 0,
 METER_VALU INTEGER NOT NULL WITH DEFAULT 0,  
 PRS_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
 DGE_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’, 
 SFT_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
 CRT_TS    TIMESTAMP NOT NULL GENERATED ALWAYS 
 FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
 IMPLICITLY HIDDEN)
         IN PROD.MSTTS001;

The changes and data migration are easily handled since the numeric data type flows into the bigger BIGINT data type.  The Java programmers, on the other hand, thought that no changes were necessary within their frameworks and code since it is just a bigger number data type.  Unfortunately the Java application codes needs to reference a bigger Java data type to handle the BIGINT.  The data type code changes reverberated through the application changing the Java application from using a java.lang.Integer to the correct java.lang.Long data type for BIGINT for the database table row.

So remember to have your application frameworks and the application code checked to verify the proper handling and matching of the DB2 data type to the Java data types.  Understanding the data integrity issues along with the lack of error handling common within Java applications always makes for an interesting day.

___________________________________________________________

I look forward to supporting the DB2 community through the local DB2 User Groups.  I will be at the St. Louis DB2 Users Group September 11 presenting my “Performance Discoveries and Recommendation” speech detailing the many system, database, application and SQL performance issues and discoveries I have found at clients over the many years of DB2 zOS and DB2 LUW performance health checks.

 

Also I am coming to Dallas and Austin, Texas October 10th and 11th and look forward to presenting my “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse” and “Java DB2 Developer Performance Best Practices” speeches.

I will be talking more about Big Data, UNION ALL Views and Materialized Query Tables during my presentation at the Information on Demand (IOD) conference in Las Vegas October 21st through 25th.  Through my speech “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse” Monday, October 22, 10:15 – 11:15 am in the Mandalay Bay North Convention Center – Islander C.  This presentation details the designing, prototyping and implementing a 22+ billion row data warehouse in only six months using an agile development methodology.  This complex analytics big data warehouse architecture took processes for this federal government agency from 37 hours to seconds.

Also I look forward to supporting the International DB2 Users Group IDUG conference in Berlin, Germany November 5th-9th with two topics “Data Warehouse Designs for Performance” and “Java DB2 Developer Performance Best Practices” on Tuesday November 6th.

On December 4th 5th and 6th I will be presenting at the Minneapolis, Milwaukee and Chicago DB2 User groups.

Please come by any of these presentations and say, “Hi.”

___________________________________________________________

Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>