Big Data Analytics: Leveraging DB2 Functions

When building a large data warehouse with billions of rows any extra processing is magnified exponentially.  Any inefficiency quickly becomes a bigger performance problem as it is multiplied by each of the billions of rows.

To squeeze every performance drop out the big data standard, centralized processes or built-in functionality always helps the team focus on every function so they can be optimized for maximum efficiency.

One of the best ways to get optimum performance out of any DBMS, especially DB2, is by leveraging functionality that is built into the database.  This helps because of two main reasons: first it is a built-in function that has had the best and brightest DB2 developers analyze and optimize its performance.  The second reason is because the function is embedded into DB2 and executes automatically at DB2’s higher priority within the overall system and does not require an outside of the database call, or a call to an outside program or a call across the network.  Built-in functionality always performs better. When you are working with the multiplication factor of big data, all those little performance related items add up.

Most database table designs have processing that maintains when the rows are inserted and updated.  Since the processing can be handled a variety of ways the following details how this can be done automatically within DB2 and be standardized, centralized and efficient as possible.

CREATE TABLE PROD.MST_PROD_MST_2012 (
PROD_ID   INTEGER NOT NULL WITH DEFAULT 0,
DEPT_KEY SMALLINT NOT NULL WITH DEFAULT 0,
METER_NBR INTEGER NOT NULL WITH DEFAULT 0,
METER_VALU INTEGER NOT NULL WITH DEFAULT 0,
PRS_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
DGE_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
SFT_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
CRT_TS    TIMESTAMP NOT NULL
WITH DEFAULT ‘1900-01-01 01:01:01.000000’)
IN PROD.MSTTS001;

So using our big data table definition from the previous weeks the CRT_TS timestamp column has a default of “1900-01-01 01:01:01.000000.”  This is common default definition for auditing and easily identifying the database table rows added through non approved or non-standard processes.  If you want to lock down the timestamp column DB2 provides the keywords GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP phase.  This timestamp definition, as the phase says, generates the value for timestamp during INSERT and UPDATE operations.  This eliminates any programming and any additional calls to routines to get the current timestamp.  By always generating the timestamp value, the timestamp is locked down from being manipulated directly within any application SQL statement, guaranteeing compliance for any audit situation.

Additionally, if the column needs to be almost hidden the IMPLICITLY HIDDEN keyword can be used to hide the column from all the application programming. Using this keyword allows the column to be in the database table, but won’t allow any direct SQL manipulation via an INSERT or UPDATE statement. Instead DB2 will always generate a value for the CRT_TS column. SQL such as SELECT * FROM PROD.MST_PROD_MST_2012 will not return the CRT_TS column and it can only be seen when it is explicitly named in a SELECT statement which again is great for auditing and compliance tracking.  If an application tries to INSERT with a value for the CRT_TS timestamp column the application will get:

DSNT408I SQLCODE = -117, ERROR:  THE NUMBER OF ASSIGNED VALUES IS NOT THE SAME AS THE NUMBER OF SPECIFIED OR IMPLIED COLUMNS

And if the application tries to modify the CRT_TS timestamp column with an UPDATE the application will get:

DSNT408I SQLCODE = -798, ERROR: A VALUE CANNOT BE SPECIFIED FOR COLUMN CRT_TS WHICH IS DEFINED AS GENERATED ALWAYS

Using these methods changes our big data table definition to the following.

CREATE TABLE PSIDB088.MST_PROD_MST_2012 (
PROD_ID   INTEGER NOT NULL WITH DEFAULT 0,
DEPT_KEY SMALLINT NOT NULL WITH DEFAULT 0,
METER_NBR INTEGER NOT NULL WITH DEFAULT 0,
METER_VALU INTEGER NOT NULL WITH DEFAULT 0,
PRS_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
DGE_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,

SFT_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
CRT_TS    TIMESTAMP NOT NULL GENERATED ALWAYS
FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
IMPLICITLY HIDDEN)
IN PROD.MSTTS001;

Sometimes the database table doesn’t need to be locked down that much.  If that is the case you can substitute the GENERATE ALWAYS syntax for GENERATED BY DEFAULT which allows manipulation of the CRT_TS column by directly referencing the column.  Either definition provides improved performance by having the DB2 database engine generate the value needed for the column without calling any extra routine, improving performance that extra little bit for each and every one of the billions of rows within our big data petabyte table.

_______________________________________________________

I look forward to supporting the DB2 community through the local DB2 User Groups.  I will be at the St. Louis DB2 Users Group September 11 presenting my “Performance Discoveries and Recommendation” speech detailing the many system, database, application and SQL performance issues and discoveries I have found at clients over the many years of DB2 zOS and DB2 LUW performance health checks.

Also I am finalizing coming to Dallas and Austin, Texas October 10th and 11th and look forward to presenting my “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse” and “Java DB2 Developer Performance Best Practices” speeches.

I will be talking more about Big Data, UNION ALL Views and Materialized Query Tables during my presentation at the Information on Demand (IOD) conference in Las Vegas October 21st through 25th through my speech “Agile Big Data Analytics: Implementing a 22 Billion Row Data Warehouse” Monday, October 22, 10:15 – 11:15 am in the Mandalay Bay North Convention Center – Islander C.  This presentation details the designing, prototyping and implementing a 22+ billion row data warehouse in only six months using an agile development methodology.  This complex analytics big data warehouse architecture took processes for this federal government agency from 37 hours to seconds.

Also I look forward to supporting the International DB2 Users Group (IDUG) conference in Berlin, Germany November 5th-9th with two topics “Data Warehouse Designs for Performance” and “Java DB2 Developer Performance Best Practices” on Tuesday November 6th.

Please come by any of these presentations and say, “Hi.”

______________________________________________________

Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>