Big Data Analytics: Union ALL View Petabyte Performance

Last week’s example built a petabyte table by putting multiple tables under the definition of a UNION ALL View.  This tactic provides unlimited growth potential for the logical big data table to all types of regular and analytical users.  Unfortunately, as some commented, this UNION ALL view may need help to get performance in DB2.  While other DBMSs push down, prune and slice views based on the SQL WHERE clause DB2 needs additional information in the UNION ALL view definition to do predicate push down, match up and limit access to the appropriate table within the view.

Given our view definition from last week:

CREATE VIEW PROD.MST_PROD_MTR_ALL AS (
SELECT
PROD_ID,
DEPT_KEY,
METER_NBR,
METER_VALU,
PRS_VALU,
DGE_VALU,
SFT_VALU,
CRT_TS
FROM PROD.MST_PROD_MTR_2012
UNION ALL
SELECT
PROD_ID,
DEPT_KEY,
METER_NBR,
METER_VALU,
PRS_VALU,
DGE_VALU,
SFT_VALU,
CRT_TS
FROM PROD.MST_PROD_MTR_2013);

When we run a simple SQL statement scanning the entire view’s information we get the expected Explain results accessing both tables under the UNION ALL view.  The SQL needs to reference both of the entire tables under the UNION ALL View to get the data as the Explain graphic displayed below shows.

SELECT *
FROM PSIDB001.MST_PROD_MTR_ALL;

Explain for Big Data Business Analytics DB2 Performance 1

Even when we have a big data analytics SQL query WHERE clause that matches the table constraints criteria, the DB2 optimizer still goes through both of the base tables under the UNION ALL View as shown by the query and DB2 Explain below.

SELECT *
FROM PROD.MST_PROD_MTR_ALL
WHERE CRT_TS = ‘2012-03-01-00.00.00.000000’;

Big Data Business Analytics DB2 Performance 2

Creating another UNION ALL View with the table range constraints defined within the view definition provides the DB2 optimizer with additional information to optimize the query and only access the 2012 base table information.

CREATE VIEW PROD.MST_PROD_MTR_UAL_WCRT_TS AS (
SELECT
PROD_ID,
DEPT_KEY,
METER_NBR,
METER_VALU,
PRS_VALU,
DGE_VALU,
SFT_VALU,
CRT_TS
FROM PROD.MST_PROD_MTR_2012
WHERE ( CRT_TS >= ‘2012-01-01-00.00.00.000000’
AND   CRT_TS < ‘2013-01-01-00.00.00.000000’)
UNION ALL
SELECT
PROD_ID,
DEPT_KEY,
METER_NBR,
METER_VALU,
PRS_VALU,
DGE_VALU,
SFT_VALU,
CRT_TS
FROM PSIDB001.MST_PROD_MTR_2013
WHERE ( CRT_TS >= ‘2013-01-01-00.00.00.000000’
AND   CRT_TS < ‘2014-01-01-00.00.00.000000’));

This additional information helps the SQL optimization by matching the view criteria with the WHERE clause and provides enough information to restrict the additional access steps of referencing  the other base 2013 table underneath the UNION ALL view definition.

SELECT *
FROM PROD.MST_PROD_MTR_UAL_WCRT_TS
WHERE CRT_TS = ‘2012-03-01-00.00.00.000000’;

Big Data Business Analytics DB2 Performance 3

Coding daily INSERTs, UPDATEs and DELETEs should be done against the current base tables whenever possible (PROD.MST_PROD_MTR_2012) or a single table view that only references the current year such as the one below.  Referencing the single 2012 base table directly is always best as the DB2 Optimizer doesn’t have to push down, prune or materialize any additional WHERE clause predicates.

CREATE VIEW PROD.MST_PROD_MTR_CURR_YR AS (
SELECT
PROD_ID,
DEPT_KEY,
METER_NBR,
METER_VALU,
PRS_VALU,
DGE_VALU,
SFT_VALU,
CRT_TS
FROM PROD.MST_PROD_MTR_2012));

Using UNION ALL View technique is just another technique to have a single table view of all the data no matter how large. With unlimited petabytes of capacity, users will be able to reference the same object for many years using this view.

Also I want to say a big “Thank You” to all my readers.  It is nice to hear from all of you and know you support hearing about these maddening sometimes basic database design ideas.  As one person said it best with their comment “With new groups of management and developers, hearing about these types of issues reinforces the discussions and nicely explains techniques to improve performance.”

_____________________________________________________

Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>