3 Critical Programming Performance Criteria

The new year always presents great opportunities after the holiday rush and the year-end code freeze-thaw.  Unfortunately, the freeze-thaw can also surface problems, and those problems are opportunities for growth.  Recently these three critical factors came to light during the analysis of performance aspects of the new version of a website.

  1. There must always be a process commit scope.

    No matter how big or small the system, its memory (z/OS or UNIX), or the amount of data being processed, there needs to be an understanding of the processing commit scope.  A commit scope is critical because the processing will always find some new way to fail.

    This is especially important in the new big data analytics world we live in, because one in a million happens a thousand times when you are processing a billion transactions.  The failure can be the process's own fault, or some other new process can run wild, stomp on memory or storage, and break the system or database, ruining everyone's holiday break.

    Build a commit scope into your process, and make sure its restart logic is tested during development and verified during production implementation.  Without this type of testing, your big data analytic processing can fail and leave your database in a recovery situation.
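The batching-plus-restart idea above can be sketched in a few lines. This is a minimal illustration using Python's sqlite3 module as a stand-in for DB2; the table names, columns, and batch size are all hypothetical, and a real implementation would use your database's unit-of-work and checkpoint conventions.

```python
import sqlite3

COMMIT_SCOPE = 1000  # hypothetical batch size; tune for your system


def process_with_commit_scope(conn):
    """Process rows in batches, committing each batch and recording a
    restart checkpoint so a failed run can resume where it left off."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS checkpoint (last_id INTEGER)")
    row = cur.execute("SELECT last_id FROM checkpoint").fetchone()
    last_id = row[0] if row else 0  # resume point from a prior failed run

    while True:
        batch = cur.execute(
            "SELECT id, amount FROM transactions"
            " WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, COMMIT_SCOPE),
        ).fetchall()
        if not batch:
            break
        for tid, amount in batch:
            cur.execute(
                "UPDATE transactions SET processed = 1 WHERE id = ?", (tid,)
            )
        last_id = batch[-1][0]
        cur.execute("DELETE FROM checkpoint")
        cur.execute("INSERT INTO checkpoint VALUES (?)", (last_id,))
        conn.commit()  # one unit of work per batch; restartable from checkpoint
```

If the process dies mid-run, only the current uncommitted batch is rolled back; the checkpoint row tells the rerun where to pick up, instead of reprocessing everything or leaving the database half-updated.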
     
  2. Data integrity is most important.

    Whether to implement database-based or application-based referential integrity (RI) will continue to be debated.  RI is especially important now that Hadoop systems, which have no database referential integrity mechanisms, are being brought into enterprise environments.  Depending on your company's standards, history, and application programming expertise, the decision on how best to implement RI should be revisited for each new system and each proposed database environment.

    My opinion is that RI should be defined in the database table schema and enforced for all SQL INSERT, UPDATE, and DELETE operations.  These embedded DB2 RI functions are optimized for performance, and if even their tiny bit of overhead is too much, the RI definitions alone can still be used with DB2 utilities to easily inspect the data relationships.  The reason I believe RI should be fully defined in the database is that I've seen application developers change or remove RI-related code.  The database schema definitions are more permanent than application code, which will be altered through enhancements, changes, or performance tuning, and any of those changes may remove the RI checking.  Either way, schema-defined RI or application-code RI needs to be monitored to maintain the database relationships and integrity.
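Schema-defined RI looks like this in miniature. The sketch below uses SQLite (where foreign-key enforcement must be switched on per connection) purely as an illustration; in DB2 the equivalent is a FOREIGN KEY clause in the CREATE TABLE DDL, and the table and column names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; DB2 enforces defined RI automatically

# RI defined in the schema: every order row must reference an existing customer
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           cust_id  INTEGER NOT NULL REFERENCES customer(cust_id))"""
)

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # parent exists: accepted

try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")  # no customer 99
except sqlite3.IntegrityError:
    pass  # orphan insert rejected by schema-defined RI, not application code
```

No application code had to check anything: the orphan row is rejected at the engine level, which is exactly the protection that survives when developers later rewrite the application.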

    Since Hadoop has no built-in database RI facilities, implementing application-based RI is more complex and requires extra processing.  Hadoop needs external RI processes, or other RI checks and balances, to verify data relationships.  These RI application processes are also more complex because of the volume and the unstructured or streaming characteristics of Hadoop data.
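An external RI reconciliation job boils down to an orphan check between two datasets. The sketch below is a minimal, hedged example of the idea; the dataset shapes and field names are hypothetical, and a real Hadoop job would express the same anti-join in MapReduce, Hive, or Spark over far larger volumes.

```python
def find_orphans(child_rows, parent_keys, fk_field):
    """Return child rows whose foreign key has no matching parent key.

    This is the core of an external RI check: nothing enforced the
    relationship on write, so a batch process must find violations after
    the fact.
    """
    parents = set(parent_keys)  # hash the parent keys once for O(1) lookups
    return [row for row in child_rows if row[fk_field] not in parents]


# Hypothetical extracts from two Hadoop datasets
customers = {1, 2, 3}
orders = [
    {"order_id": 100, "cust_id": 1},
    {"order_id": 101, "cust_id": 7},  # orphan: no customer 7 exists
]

orphans = find_orphans(orders, customers, "cust_id")
```

Note that this check only detects broken relationships after they exist; unlike schema-defined RI, it cannot prevent them, which is why the reconciliation has to run continuously and why someone has to decide what to do with the orphans it finds.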

    Your data integrity is guaranteed in DB2, and this is another reason performance is enhanced through these built-in DB2 facilities over Hadoop add-on application processes.  DB2 handles all types of data management requirements efficiently, whether it's RI, unique constraints, check constraints/conditions, functional dependencies, or global variables, where Hadoop requires extra processing and applications to balance and reconcile your big data.  DB2 has the capabilities built in to serve every business requirement and related RI situation.  A good discussion, with definition examples of all these RI-type DB2 features, can be found on IBM developerWorks.

  3. Languages are not programming and don't guarantee performance.

    With the proliferation of new programming languages, especially for website development, there is a broader and richer landscape for programmers, so they can choose the right tool for the functional situation.  Performance in some of these new languages is better than in others, and for critical processes performance is always paramount.

    Avoid the temptation to use these new languages for performance-sensitive functions.  While programmers may be able to code the function quickly, its performance may suffer.  Some of the new languages are built through open source, on top of other languages, or use runtime-interpreted code.  All of that may be good for ease of programming but may not be best for a critical performance function that is executed millions or billions of times a day.

    Verifying the performance of these new languages is sometimes difficult, and your performance analysis needs to be prepared to isolate the new language's different functionality into sections.  Even the performance tracking of Java code continues to baffle some developers, despite the tools available to understand garbage collection, memory footprint, and CPU utilization.  Making these types of Java adjustments is difficult enough, but getting in-depth performance details from a new programming language, or making the same kinds of adjustments in it, can be a considerable amount of complex work.  Of course, the performance issue will be fixed in the next release of the open source.
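Isolating a suspect function into its own measured section can be as simple as a micro-benchmark harness. The sketch below, using Python's standard timeit module, times two hypothetical implementations of the same hot function separately so the per-section cost is visible rather than buried in an end-to-end run; the functions themselves are illustrative, not from any real system.

```python
import timeit


def concat_naive(n):
    """Hot function, version 1: repeated string concatenation."""
    s = ""
    for i in range(n):
        s += str(i)
    return s


def concat_join(n):
    """Hot function, version 2: single join over a generator."""
    return "".join(str(i) for i in range(n))


# Time each implementation in isolation, over many repetitions, so the
# difference is attributable to this section alone.
naive_secs = timeit.timeit(lambda: concat_naive(1000), number=200)
join_secs = timeit.timeit(lambda: concat_join(1000), number=200)
```

The same discipline applies in any language: verify the two versions produce identical output first, then compare isolated timings, because a function executed millions of times a day amplifies even small per-call differences.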

Application developers will continue to have commit scope issues in new open source languages running against databases with no RI defined.  Unfortunately, the DBAs and systems people will have to help them fully recover the situation and rerun their process until it works.  Try to make sure your development teams handle these issues BEFORE problems occur.  Everyone will be happier, especially your customers who want to use the new mobile application.


Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, President of DAMA-NCR, former President of the International DB2 User Group, and a frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs.
