Big Data Analytics: Union ALL View Building a Petabyte Table

There are many companies that are realizing business insights through new big data analytics.  Many are finally looking at data that they only once dreamed of analyzing because of the continuing decline in the cost of storage and CPU as well as the availability of cloud computing resources.  The amount of data and the number of questions that companies want to analyze only continues to increase.  Even the DB2 team for LUW recently built a 3 petabyte data warehouse that was featured in Guinness Book of World Records. Also with Oracle doing nothing to fix their security problems and lying about their technology in the press, DB2 is the place to develop your Big Data Analytics project.

The last two weeks I have talked about the maddening basic issues that seem to get glossed over, ignored or missed in the rush to get these new big data systems set up and questions answered.  But doing the basics paid off tremendously for building this database by saving 57% of the storage costs for the system.  Also the I/O savings over the lifetime of the system will be huge because from the very beginning of the space estimates that were done the team knew that we were building a multi-petabyte table to store many years of data and created the design accordingly.  Below is a UNION ALL View design technique that works out very nicely for creating petabyte or even multi-petabyte tables on any DB2 platform.

Our physical design from last week’s blog shows our base table create statement.  One of the ways to expand the capacities of DB2 tables on both DB2 for zOS and DB2 for LUW is to create multiple tables and put them together and create a logical single table through a UNION ALL View.  This technique provides many flexible options for each of the tables within the UNION ALL View for clustering, indexing, partitioning and storage control.

With the example below, every year’s table could be partitioned differently, the current year could have many more indexes, or each year’s objects could be assigned to a different storage group.  Also, each table could hold data up to its DB2 platform and DB2 software release storage limit, allowing the logical table unlimited capacity.

Another advantage of UNION ALL View is administration of the data.  If structure changes are needed for the underlying tables a new table could be created and then the UNION All View could be redefined with the new table definition.  This would minimize downtime and switching to a new table structure could be done quickly allowing the old data to be restructured whenever desired.

CREATE TABLE PROD.MST_PROD_MTR_2012 (
  PROD_ID   INTEGER NOT NULL WITH DEFAULT 0,
  DEPT_KEY SMALLINT NOT NULL WITH DEFAULT 0,
  METER_NBR INTEGER NOT NULL WITH DEFAULT 0,
  METER_VALU INTEGER NOT NULL WITH DEFAULT 0,
  PRS_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  DGE_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  SFT_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  CRT_TS    TIMESTAMP NOT NULL
      WITH DEFAULT ‘1900-01-01 01:01:01.000000’,
  CONSTRAINT P_KEY_1 PRIMARY KEY (CRT_TS))
         IN D900551D.G0551D01;

  COMMIT;

–==============================================================
— INDEX:
–==============================================================
   CREATE UNIQUE INDEX PROD.X2012001
   ON  PROD.MST_PROD_MTR_2012 (
      CRT_TS ASC
   )
   CLUSTER
    USING STOGROUP PSISG
       PRIQTY 720
       SECQTY 360
       ERASE NO
   FREEPAGE 0
   PCTFREE 10
   GBPCACHE CHANGED
   BUFFERPOOL BP1
   CLOSE YES
   ;

 ALTER TABLE PROD.MST_PROD_MTR_2012
 ADD CONSTRAINT
 RANGKEY1 CHECK ( CRT_TS > ‘2012-01-01-00.00.00.000000’
            AND   CRT_TS < ‘2013-01-01-00.00.00.000000’) ;

–==============================================================
— DROP   TABLE PROD.MST_PROD_MTR_2013;
—     COMMIT;
–==============================================================

 CREATE TABLE PROD.MST_PROD_MTR_2013 (
  PROD_ID   INTEGER NOT NULL WITH DEFAULT 0,
  DEPT_KEY SMALLINT NOT NULL WITH DEFAULT 0,
  METER_NBR INTEGER NOT NULL WITH DEFAULT 0,
  METER_VALU INTEGER NOT NULL WITH DEFAULT 0,
  PRS_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  DGE_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  SFT_VALU  CHAR(1) NOT NULL WITH DEFAULT ‘*’,
  CRT_TS    TIMESTAMP NOT NULL
      WITH DEFAULT ‘1900-01-01 01:01:01.000000’,
  CONSTRAINT P_KEY_1 PRIMARY KEY (CRT_TS))
         IN D900551D.G0551D01;

  COMMIT;

–==============================================================
— INDEX:
–==============================================================
   CREATE UNIQUE INDEX PROD.X2013001
   ON  PROD.MST_PROD_MTR_2013 (
      CRT_TS ASC
   )
   CLUSTER
    USING STOGROUP PSISG
       PRIQTY 720
       SECQTY 360
       ERASE NO
   FREEPAGE 0
   PCTFREE 10
   GBPCACHE CHANGED
   BUFFERPOOL BP1
   CLOSE YES
   ;

 ALTER TABLE PROD.MST_PROD_MTR_2013
 ADD CONSTRAINT
 RANGKEY1 CHECK ( CRT_TS > ‘2013-01-01-00.00.00.000000’
            AND   CRT_TS < ‘2014-01-01-00.00.00.000000’);

  CREATE VIEW PROD.MST_PROD_MTR_ALL AS (
    SELECT
       PROD_ID,
       DEPT_KEY,
       METER_NBR,
       METER_VALU,
       PRS_VALU,
       DGE_VALU,
       SFT_VALU,
       CRT_TS
FROM PROD.MST_PROD_MTR_2012
    UNION ALL
    SELECT
       PROD_ID,
       DEPT_KEY,
       METER_NBR,
       METER_VALU,
       PRS_VALU,
       DGE_VALU,
       SFT_VALU,
       CRT_TS
FROM PROD.MST_PROD_MTR_2013);

What is also good about the UNION ALL View structure is that the DB2 Optimizers on DB2 for zOS and DB2 for LUW are smart enough to only access the proper base table(s) when the SQL includes the CRT_TS in the WHERE clause.  This SQL information gets evaluated and DB2 quickly prunes the unnecessary base tables underneath the UNION ALL View from the DB2 access path.

Unfortunately, there are some considerations for the UNION ALL View for a large single table configuration.  First, the application would need to know the underlying base table to Insert, Update and Delete during the course of the year.  Also, if you needed to keep PROD_ID uniqueness within all the base tables under the UNION ALL View logical table, INSTEAD Triggers would be needed.  The INSTEAD Triggers would use SELECT SQL statements to verify the data value against the base tables.  This overhead can be avoided by letting all SELECT activity go against the UNION ALL View and having all the updates, inserts and deletes go directly against the base tables.

By using this UNION ALL View technique, many more base tables for additional years can be put under the View.  Each new year can have separate and customizable settings allowing for all types of performance tuning for partitioning, clustering, indexing and storage management making the UNION ALL View to be as many 100s of petabytes necessary.

_______________________________________________________

Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>