Data Pedigree Analysis Eliminates Data Sprawl

During the Triple Crown horse races, the commentator detailed each horse's parents, grandparents, great-grandparents and even great-great-grandparents. I quickly understood the pedigree and heritage of the horses, along with their racing wins and very impressive successes. Those pedigrees got me thinking about a recent conversation with a CIO and the data sprawl and data pedigree problems IT is facing.

At a recent client engagement the CIO said, “We have 442+ applications on disparate platforms, with many redundant systems on Windows, UNIX and legacy platforms, and we need new applications that use the same unique and duplicate data, with its complex relationships, to solve new business issues.”

This statement and his frustration with IT told me we have reached the tipping point where the number of systems is overwhelming IT resources. The cost of hardware and data storage has come down so much that it is easier to duplicate and proliferate systems than to properly manage the data pedigree of our information. This approach leads to terabytes, even petabytes, of data sprawl and duplicated processing, costing the business millions of dollars every year in maintenance, processing and software.

With or without economic struggles, businesses will continue buying competitors, merging with other companies, deploying tactical systems and developing emergency applications to seize special business opportunities, all without concern for data pedigree. This business activity promotes global or local systems, stove-piped applications, non-standard IT systems, specialized just-in-time tactical systems and unique applications within business divisions. Each of these systems has its own platform, programming language, databases, data pedigree issues and complex relationships, which is exactly what the CIO described.

As I talked further with the CIO, still thinking about the horse race, I asked, “Do we know the data pedigree of the data in these 442+ systems? How many duplicate functions or data do we have to support? Is there a project plan or budget for retiring, consolidating or better sharing some of these systems, data and functionality?” His reply was, “There isn’t one yet, but there is going to be one. We can’t continue duplicating everything.”

IT organizations around the world are beginning to recognize and come to grips with data sprawl and disparate, complex systems. Thinking about the Triple Crown commentator discussing horse pedigrees, I realized IT needs to understand where our data comes from; it needs a data pedigree process. We need to know our data’s parents, grandparents, great-grandparents and great-great-grandparents. We need to understand its travels through our systems and where it is tested for quality, governed and secured within the landscape of our data sprawl.
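To make the parents and grandparents idea concrete, here is a minimal Python sketch that walks a data lineage map one generation at a time. It assumes lineage has already been captured as a map from each system to the systems it receives data from; the system names and the `pedigree()` helper are hypothetical, for illustration only.

```python
# Hypothetical lineage map: each system lists the upstream systems it receives data from.
LINEAGE = {
    "analytics_dw": ["crm", "billing"],
    "crm": ["web_orders"],
    "billing": ["web_orders", "erp"],
    "erp": ["legacy_mainframe"],
    "web_orders": [],
    "legacy_mainframe": [],
}


def pedigree(system, lineage):
    """Return upstream ancestors grouped by generation.

    Index 0 holds the parents, index 1 the grandparents, and so on.
    Each ancestor is listed only once, at the nearest generation where it appears.
    """
    generations = []
    seen = {system}          # avoid revisiting systems (and looping on cycles)
    frontier = [system]
    while frontier:
        next_gen = []
        for node in frontier:
            for parent in lineage.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    next_gen.append(parent)
        if next_gen:
            generations.append(sorted(next_gen))
        frontier = next_gen
    return generations


if __name__ == "__main__":
    for depth, ancestors in enumerate(pedigree("analytics_dw", LINEAGE), start=1):
        print(depth, ancestors)
```

For the sample map, the warehouse's parents are `billing` and `crm`, its grandparents are `erp` and `web_orders`, and its great-grandparent is `legacy_mainframe`. Because each system is recorded at its nearest generation, a source that feeds the target both directly and indirectly is listed once, as the closer ancestor.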

Given that data content is king, understanding your data pedigree is the best way to address and manage these issues. Yet which system received the first data input from the outside world, what business rules are applied to the data and where, where its quality is checked, where it is protected, where it is reported for compliance and where it travels throughout our disparate systems are mostly unknown these days. Data pedigree analysis can identify duplicate data, duplicate functions and duplicate systems, even for companies that have 442+ systems. Identifying all the data elements, their pedigree, the programs and processes that use them, and how they are used within these systems is the first step in a data pedigree program, because it reveals the true scope of the data sprawl.
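One possible starting point for finding duplicate data elements is to fingerprint sample values from each system's data elements, group the matches, and then check which member of each group lives in a system with no upstream sources, a likely point of first entry from the outside world. This is a hedged sketch: the inventory, the `UPSTREAM` map and the helper names are hypothetical, and a content fingerprint is only one heuristic; a real pedigree program would also compare definitions, business rules and usage.

```python
import hashlib
from collections import defaultdict

# Hypothetical inventory: (system, data element) -> sample of that element's values.
INVENTORY = {
    ("crm", "customer_email"): ["a@x.com", "b@y.com", "c@z.com"],
    ("billing", "cust_email_addr"): ["c@z.com", "a@x.com", "b@y.com"],
    ("erp", "vendor_id"): ["V1", "V2"],
    ("web_orders", "email"): ["a@x.com", "b@y.com", "c@z.com"],
}

# Hypothetical lineage: systems each system receives data from (empty = entry point).
UPSTREAM = {
    "crm": ["web_orders"],
    "billing": ["web_orders"],
    "erp": [],
    "web_orders": [],
}


def fingerprint(values):
    """Order-insensitive SHA-256 hash of trimmed, lower-cased values."""
    normalized = sorted(v.strip().lower() for v in values)
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def find_duplicates(inventory):
    """Group (system, element) pairs whose sampled values fingerprint identically."""
    groups = defaultdict(list)
    for (system, element), values in inventory.items():
        groups[fingerprint(values)].append((system, element))
    return [sorted(members) for members in groups.values() if len(members) > 1]


def likely_origin(group, upstream):
    """Return members of a duplicate group whose system has no upstream sources."""
    return [member for member in group if not upstream.get(member[0])]


if __name__ == "__main__":
    for group in find_duplicates(INVENTORY):
        print("duplicates:", group)
        print("likely origin:", likely_origin(group, UPSTREAM))
```

With the sample data, the three email columns are flagged as one duplicate group, and `web_orders` is reported as the likely origin because nothing feeds it.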

The next time business units ask for new data warehouse applications for analytics, ask them which source data to use. Picking the right instance of duplicated data and processing within your business will determine the answers, so make sure you pick the one that produces the best answers.
