Five More SQL Performance Tips for your Big Data

I have talked about many DB2 SQL performance tips before (10 Performance Guidelines and  Five Big Data SQL Performance Tips). Dealing recently with tables with tens of billions of rows and crazy generated SQL from GUI interfaces has resulted in these five more SQL performance tips for your big data systems.  These five SQL performance tips leverage DB2 11’s improvements and can benefit all applications, not just Big Data systems.  Incorporate these five SQL performance tips and help out your DB2 databases’, systems’ and applications’ processing.

  1. Understand all SQL phrases that cause a SORT.  DB2 SQL workloads invoke sorts within their processing and it’s almost always an expensive SQL performance operation.  Sorts are especially performance killers when referencing Big Data tables that have billions of rows.   Sort requirements are determined by the DB2 optimizer when translating your SQL into an access path.  Certain SQL keywords within your query can cause sorts and need to be only be used when absolutely necessary.  If possible try to avoid using DISTINCT, ORDER BY, UNION, UNION ALL, and other SQL phrases that might invoke a DB2 SORT.  Remember to check your DB2 EXPLAIN of your SQL to discover and minimize the SORTs. Eliminate these SQL phrases and reference the data in the order that it is stored and sequenced in the database table definition to improve performance.

  2. Verify your SQL Scalar functions are index-able.  In DB2 11 many scalar functions highlighted below became index-able.  Unfortunately there is a huge number of scalar DB2 functions that can be used in SQL.  Verify that your scalar function is index-able through your DB2 EXPLAIN process to guarantee its SQL performance.  Once your system is migrated to DB2 11, these common scalar functions below will be index-able and will improve your overall performance.
    • value BETWEEN col1 AND col2
    • value BETWEEN column-expression AND column-expression
    • SUBSTR(COLX, 1, n) = value   **NOTE only from first position**
    • DATE(timestamp-column) = value
    • YEAR(date-column) = value
    • CASE WHEN THEN ELSE END = value
  1. Remove all math from the WHERE Clause if possible.  Math in WHERE clauses disrupts SQL performance for data retrieval most of the time.  When trying to retrieve data from the table, the math may cause a lot of performance overhead.  Depending on how the math is coded, the DB2 system may have to perform the calculation for every row examined which can be quite a SQL performance overhead.  Also, if the calculation data type’s comparison and/or precision needs to be translated or resolved, these extra operations can also negatively impact the SQL performance.  Pre-calculate math variables before they are used within the SQL statement WHERE clause when possible.  Also verify your WHERE clause variable data formats and data types against their comparison database columns because any mismatch can cause extra overhead and hurt your SQL performance.

  2. Understand the number of trips to the database.  Interactive web sites and nightly batch processes can all benefit from using multi-row SELECT and INSERT operations.  Using a multi-row SELECT can sometimes return all the customer’s data into the application’s memory providing a full view of all the data within the web application.  Robust batch processes can benefit from making fewer call to DB2 through multi-row INSERT processes.  Making fewer calls to the database minimizes the trips and improves performance.

  3. Minimize and optimize any WHERE ‘OR’ clauses.  DB2 SQL WHERE clauses can become quite complex with detailed criteria for retrieving the proper data.  Unfortunately the WHERE clause sometimes needs multiple AND and OR logic to get the data the application needs.  Especially when writing all the WHERE clause OR conditions make sure that each OR condition is index-able.  Sometimes the OR condition excludes other columns from being combined with index-able columns. If appropriate, the index-able columns need to be repeated within the OR portion of the WHERE clause.  Verify that the WHERE OR clause portion is index-able through the DB2 EXPLAIN process and rewrite where necessary, adding other columns within the OR block so that all of your SQL WHERE clause criteria continues to be index-able.

Writing efficient index-able WHERE clauses makes sure your DB2 SQL performance is as efficient as possible.  Use these five SQL performance tips and the other tips that I written about in my blog to improve your application and overall DB2 processing CPU efficiency and I/O performance.

____________________________________________________

Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, and President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organization better leverage their information assets, saving millions in processing costs.

Here’s a list with links to other posts that may help you improve your DB2 performance, security, and ease-of-use:

5 More DB2 SQL Performance Tips
Another 5 More DB2 SQL Performance Tips
5 Big Data SQL Performance Tips – Fixing Generated SQL
DB2 11 SQL Performance Improvements
5 More SQL Performance Tips for your Big Data
Hadoop SQL: 4 Reasons Why BigInsights Is the Best
Vital Java DB2 SQL Performance Considerations
More DB2 Family Security Best Practices Part 7: Preventing SQL Injection
DB2 SQL Security Audits

_______________________________________________________

Sign up for the IDUG DB2 Technical Conference in Phoenix, Arizona this May 12-16th.

  • Sign up for dinner with me during the IDUG Dine Around Dinner Thursday night at IDUG.
  • Also plan on attending my presentation 1221-F04 “Big Data Disaster Recovery Performance,” Tuesday 4:30 at the IDUG conference in Phoenix.
  • The IDUG European conference is still accepting presentation abstracts to be potentially picked for the IDUG EU conference in Prague, Czech Republic November 9-14th.

For more details on any of these items go to www.idug.org.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>