Big Data: Five Simple Database Design Performance Tips

Doing performance consulting, database system and application tuning exposes me to many different database designs. There are many different aspects to properly analyze a database schema design.  The following five simple database design performance tips are the first five critical aspects that can be quickly analyzed and evaluated especially in the early stages of development.

 

  1. Is the database design normalized?  Database normalization using at least third normal form and maybe up to fifth normal form is the starting critical evaluation criteria.  The reason the database design normalization processes have been endorsed forever is because they are effective for identifying all the insert, update and delete data anomalies and support the integrity of the application data. 

    Look at the number of tables and the unique keys within each of the tables.  Too many tables with the same unique key(s) can be a sign of over normalization.  Over normalization can be a database design performance killer.  If your team doesn’t know about database design normalization search the web; there are many great deep resources.

  2. Do the database design tables leverage the business’ natural keys?  Good database design leverages, accesses and fully filters using the natural keys within the business.  Make sure to confirm these natural business keys are used within the database design.  Too often recently with some of the new Big Data NoSQL databases or some of the Object Relational Mapping products being used for web development, a ROWID, SEQUENCE or IDENTITY column is being used or required.  These types of keys will limit the database integration within the rest of the business environment and are usually detrimental for getting good SQL application performance by requiring a business key to ROWID translation for every database activity.

  3. Does the database have good column definitions?  Database column names and definitions are going to be used for as long as the database will be active so confirm the proper names, abbreviations and short standard names are used for your column names.  Also analyze the database column data type definitions for proper business use, domain and range.  Having all your columns as a single data type such as VARCHAR, CHAR or XML does not reflect the business diversity of data.

    Even though column definitions can be easily changed, make sure to reflect the numeric type and range of values so that the proper DECIMAL, SMALLINT, INT or BIGINT is used.  Having a database design with tables that are defined with good numeric business centric keys is preferred if possible.

  4. Are all the columns NULLable?   Within the database columns definitions good data domains, ranges and values should be analyzed, evaluated and prototyped for the business application.  Having good default values, a limited scope of values and always a value are best for performance and application logic.  NULLable columns are only good when data is unknown or doesn’t have a value yet.  Someone’s death date data is the classic example of a NULLable column because it is unknown unless they are already dead.  Make sure your database design represents data that is known and only uses a minimum of NULLable columns.
  5. Do you leverage the correct amount of Referential Integrity (RI)?  I endorse RI defined within the database schema definitions because it enforces the business policies, is usually more efficient than application or multiple applications enforcing it and database defined RI will always be there years later to validate the data relationships. 

    RI needs to be defined within the database schema during the development phase of the application so the coders can understand the RI and how to efficiently minimize the RI overhead by referencing and performing all operations in the proper RI sequence. 

    The RI should be developed to the correct depth of the business application and the various transactions within the schema.  Good RI database structures can usually be built within five to ten levels.  Using over ten levels of RI usually leads to database performance issues, indicates table design over normalization or business transactions with a scope requires too many tables.  

These are only five simple database design performance tips and there are many more that I thought about while writing this article.  Use these first five database design performance tips as the starting criteria for your next database development and your system will be designed with a good perspective on performance. 

_____________________________________________________

Have you made your plans for IDUG in Orlando this year?  Also make sure to register early and get the IDUG early bird discount. Sign up today!

I look forward to speaking at the IDUG DB2 Tech Conference 2013 North America conference.  The conference will be held in Orlando, Florida on April 29-May 2, 2013.  Get more information at www.idug.org.

I will be speaking at the conference presenting Big Data and Analytics Session F07 – “Data Warehouse Designs for Big Data Performance” Wed, May 01, 2013 (02:15 PM – 03:15 PM) in Bonaire 5&6.
_____________________________________________________
Dave Beulke is an internationally recognized DB2 consultant, DB2 trainer and education instructor.  Dave helps his clients improve their strategic direction, dramatically improve DB2 performance and reduce their CPU demand saving millions in their systems, databases and application areas within their mainframe, UNIX and Windows environments.

Leave a Reply

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>