DB2 Temporal Table: Table Space Clustering Design for Your DB2 Data Warehouse

Table Space Clustering Design for Your DB2 Data Warehouse

Last week we talked about the partitioning choices, options and considerations. This week we’ll talk about the other important aspect of table space design, clustering. The clustering of the data can have a huge impact on the overall flow and locking within any table in a DB2 data warehouse and especially our DB2 temporal table.

The first consideration for your DB2 table space clustering design is to determine the biggest workload within your DB2 data warehouse table environment. Is the table mainly a reference dimension table or is it a fact table with major or minor amounts of changes, inserts or deletes. The DB2 data warehouse dimension tables’ change activities usually are minor factors, but still need to be clustered to facilitate efficient access to the fact table data. Usually the biggest workload within any DB2 data warehouse is the query workload against the main DB2 data warehouse fact tables. These fact tables provide the details, summaries and totals for the entire business intelligence workload against the DB2 warehouse.

To cluster the DB2 data warehouse fact tables appropriately, the different types of workloads such as data loading, archiving process, SQL queries and their answers need to be analyzed. The clustering should be focused on the biggest workload in elapsed time and CPU activity. The most labor-intensive processing in a DB2 data warehouse can sometimes be the loading of the data, but in most cases the biggest workload is the DB2 SQL query workload.

These application DB2 SQL queries along with the update, insert and delete processes all need to reference the data efficiently. For DB2 SQL queries within the DB2 data warehouse the data sometimes needs to be sorted through an ORDER BY SQL clause. The natural sequence of these activities should be researched and their order noted for potential as the clustering sequence. Research all, yes all, of the DB2 SQL queries’ ORDER BY requirements and understand the uses of the data within your DB2 data warehouse. Are the results sets retrieved via order number and order date? Or do the applications get the data by department type and then department number to understand the best and worst selling departments quickly. All of the applications should be researched and the different percentages of the overall workload their ORDER BY requirements determined within the application. For a DB2 data warehouse the large read activity usually does not present locking considerations, but if there are constant insert and update processes running against the DB2 data warehouse the proper clustering of your data can greatly reduce deadlocking within your database since all the data is referenced in the same order and direction within the database table.

Another performance advantage of matching the clustering to all of our DB2 temporal table SQL queries ORDER BY clauses is that the system doesn’t need to sort the data and can avoid this extra processing within your DB2 data warehouse. Since the biggest workload results sets are via order number and order date, clustering this way will eliminate the extra sorts and have the most positive performance impact. Having the insert, update, and delete processing improved to reference the DB2 temporal table this same way can help minimize database deadlocking. By having the clustering in the DB2 temporal table design this way and the insert, update, and delete processes referencing the data in the same direction, deadlocks are minimized and the buffering of the data all flows the same direction.

So research and analyze your various DB2 temporal table workloads and find the biggest processing against each of your database table space tables within your DB2 data warehouse. Then design the table space table clustering to minimize sorting of extra I/Os and minimize your deadlocks within your environment.

 

Come to my presentation at International DB2 Users Group – IDUG Conference in Anaheim May 3-9, 2011.

If you are or someone at your company is attending the IDUG NA Conference in Anaheim, Dave will give his “Database Performance Discoveries and Recommendations” presentation on Tuesday, May 3rd at 4:30-5:30.

Come to my presentation at the Information on Demand – IOD Conference in Montpellier, France July 6-9, 2011.

If you are or someone at your company is attending the IOD-EU Conference in Montpellier, France Dave will give his “Database Performance Discoveries and Recommendations” presentation and maybe his “Java DB2 Developer Performance Best Practices Part 2” presentation.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>