Analytics Lessons from the Election

I have been building analytical systems for many years, starting with my first data warehouse database design and application for E F Hutton in 1988. This election illustrated many of the analytics lessons I have learned and experienced on analytical data warehouse projects over the years.

First, don’t believe the early estimates or the first analytical results. The pre-election polling always had Clinton in the lead, or at least the polls quoted by the major media outlets did. Early analytical results may be biased by the specific data chosen for the project or by poor data cleansing. Either condition causes problems once the next iteration of new data arrives or improved data cleansing routines are finally tuned. When the true data comes into the system, with all data exceptions included and governance rules implemented within an analytical architecture framework, the likelihood of correcting earlier incorrect or erroneous assumptions increases dramatically. This lesson is especially important because such revisions are very common in every analytical project schedule.

Next, not everything everyone tells you is true. In this election, neither candidate proved to be the shining star people wanted to vote for. According to some reports, people voted mainly against a candidate rather than for one. Various TV, print, and Internet media outlets did not act truthfully or ethically, with insider email scandals and fake or exaggerated news reports that only cloaked the candidates’ shortcomings, issues, and issue positions. Even Facebook had problems with fake candidate stories and news that may have influenced the election.

The fake candidate information and news confused people. In analytics, improper data can likewise confuse end users and lead to wrong conclusions for a variety of reasons, such as poor data quality, purchased data sources used outside their proper context, or combined data sources with conflicting data properties.

Next, small data changes or small percentages make a big difference. In this election, and in most elections of the last twenty years, the winning margin has been only a small percentage. Everyone remembers Florida’s hanging chads from the 2000 election. The same is true of any analysis that measures customer retention, product marketing results, or customer profitability ratios.

Marketplace competition between bricks-and-mortar and online stores continues to squeeze margins and profitability razor thin. These minuscule margins force a laser focus on data quality, data context, and data governance processes for all analytical elements. Data sourcing, data lineage, the percentage of occurrence of each value, and the impact of single records or values all require exacting, detailed analysis to show their effect on the ultimate conclusions.
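As a small illustration of the "percentage of occurrence for each value" check, here is a minimal Python sketch. It assumes records arrive as dictionaries; the function name `value_occurrence` and the customer fields are hypothetical, chosen only to show how a simple profile surfaces inconsistent values and missing data before they skew the conclusions.

```python
from collections import Counter


def value_occurrence(records, field):
    """Return each distinct value of `field` with its share of all records, in percent.

    Records missing the field are counted under None so gaps stay visible.
    Values are ordered from most to least frequent.
    """
    counts = Counter(record.get(field) for record in records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100.0 * n / total for value, n in counts.most_common()}


# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": 1, "channel": "online"},
    {"id": 2, "channel": "store"},
    {"id": 3, "channel": "online"},
    {"id": 4, "channel": "Online"},  # inconsistent casing: a data quality issue
    {"id": 5},                       # missing channel value
]

for value, pct in value_occurrence(customers, "channel").items():
    print(f"{value!r}: {pct:.1f}%")
```

In this toy data, "online" and "Online" show up as separate values and one record has no channel at all, exactly the kind of small discrepancy that matters when margins are thin.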

Sometimes your analysis proves something you don’t like. Amid all the election results, protests, and media meltdowns, this election revealed a surprise that the polling analysis did not detect, expose, or report. Were the polls deliberately slanted? Some say yes, but several factors were at play, as discussed here. The gap between the polls and the election results will reverberate across all types of election polling and citizen canvassing efforts for some time to come.

With all types of open source data warehousing systems and data scientists analyzing all types of data, sometimes the data elements don’t reveal the difference, profit margin, or significance until the final event or last bit of data is included in the analytics. Then the results show that the hypothesis or profit margin is not attainable. That can be very disappointing, but it can still be a huge success for the company because it keeps the company from pursuing an unprofitable business scenario for a long time. Remember that going to market with the correct product at the correct price is still very valuable, even if one pricing model or product model doesn’t work out.

Every single entry or record can cause erroneous analytical results. If your analytics uses millions or billions of entries, any one record can cause problems, especially as profit margins, and with them the margin for error, keep getting smaller. That is why analytical projects, and getting accurate results from them, are so difficult.
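To make the single-record point concrete, here is a minimal leave-one-out sketch in Python. It assumes each record carries a revenue and a cost; the field names, the function names, and the order data are all hypothetical. For each record it computes how far the aggregate profit margin would move if that record were dropped, so the most influential records can be verified first.

```python
def margin(records):
    """Aggregate profit margin: (total revenue - total cost) / total revenue."""
    revenue = sum(r["revenue"] for r in records)
    cost = sum(r["cost"] for r in records)
    if revenue == 0:
        raise ValueError("total revenue is zero; margin is undefined")
    return (revenue - cost) / revenue


def leave_one_out_impact(records):
    """Return (id, change in margin if that record is removed), largest movers first.

    Uses running totals so each record costs O(1) to evaluate, which matters
    when the data set holds millions of rows. Assumes at least two records.
    """
    total_rev = sum(r["revenue"] for r in records)
    total_cost = sum(r["cost"] for r in records)
    base = (total_rev - total_cost) / total_rev
    impacts = []
    for r in records:
        rev = total_rev - r["revenue"]
        cost = total_cost - r["cost"]
        impacts.append((r["id"], (rev - cost) / rev - base))
    impacts.sort(key=lambda item: abs(item[1]), reverse=True)
    return impacts


# Hypothetical data: 1,000 ordinary orders at a 2% margin plus one bad record
# whose cost was keyed in with extra digits.
orders = [{"id": i, "revenue": 100.0, "cost": 98.0} for i in range(1000)]
orders.append({"id": 1000, "revenue": 100.0, "cost": 2500.0})

print(f"margin with all records: {margin(orders):.2%}")
print("most influential record:", leave_one_out_impact(orders)[0])
```

In this made-up example, a single bad record out of 1,001 turns a 2% margin negative, and the leave-one-out ranking points straight at it.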

Dave Beulke is a system strategist, application architect, and performance expert specializing in Big Data, data warehouses, and high performance internet business solutions. He is an IBM Gold Consultant, Information Champion, President of DAMA-NCR, former President of International DB2 User Group, and frequent speaker at national and international conferences. His architectures, designs, and performance tuning techniques help organizations better leverage their information assets, saving millions in processing costs. Follow him on Twitter or connect through LinkedIn.
