Data Quality – Garbage in Garbage out

While reviewing the content of this course, “Data Management and Analytics”, and considering my next report topic, it occurred to me that there is a very strong central theme throughout the course – “Data”. OK, so this is stating the blindingly obvious, but data does underpin nearly everything in the business world. But it is not just data though, is it? Data is simply a series of characters, a mixture of letters and digits, until we put some context to it. Ultimately, it’s what we do with data, and how and where we do it, that gives it any real meaning. The buzzwords of the decade, including Big Data, Data Analytics and Business Intelligence, are all reliant on data. All of these trends would be useless without data and, more importantly, without meaningful data.

The quality of the data we use underpins the success, or lack thereof, of our daily decisions. It is for this reason that I believe data quality should be front and centre among the buzzwords of the decade.

Data Quality

There are well-recognised papers by industry experts that advocate four core dimensions of Data Quality. Nancy Couture, in her paper “Implementing an Enterprise Data Quality Strategy” (2013), suggested “fitness for use” as a broad definition when considering a data quality assessment programme. The article suggests that, rather than trying to focus on every dimension, you start with the basics of completeness and timeliness and then move on to validity and consistency.

Components

Below is a simple illustration of the dimensions of data quality.

[Figure: data quality dimensions]

As illustrated above, there are six core dimensions of data quality.

Completeness can be described as the expected comprehensiveness of the data. Data can be complete even if optional data is missing. For example, customer contact information should hold name, address and phone number as mandatory fields, but could treat the customer’s middle initial as optional. Remember, though, that data can be complete but not accurate.
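As a rough sketch of the customer contact example, a completeness check only needs to look at the mandatory fields. The field names below are hypothetical, chosen to mirror the example rather than taken from any real system.

```python
# Hypothetical field names that mirror the customer contact example.
MANDATORY_FIELDS = ("name", "address", "phone")
OPTIONAL_FIELDS = ("middle_initial",)  # never counted against completeness


def missing_mandatory_fields(record):
    """Return the mandatory fields that are absent or blank in a record."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def is_complete(record):
    """A record is complete when no mandatory field is missing."""
    return not missing_mandatory_fields(record)


# No middle initial is supplied, yet the record still counts as complete.
customer = {"name": "A. Smith", "address": "1 Main Street", "phone": "555-0100"}
print(is_complete(customer))  # True
```

Note that this says nothing about whether the phone number is the right one; that is a question of accuracy, not completeness.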


Timeliness – “Delayed data is data denied”. Timeliness is really about having the right information at the right time. User expectation drives timeliness. For example, income tax returns are due on a certain date, and filing a late return incurs a penalty. In the good old days, we went to a travel agent to book a holiday. Nowadays, the user expectation is to be able to see real-time availability and prices. We suffer real frustration in decision making when we occasionally come across a system where real-time information is not available. According to Jim Harris of Information-Management, due to the increasing demand for real-time data-driven decisions, timeliness is the most important dimension of data quality.
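One simple way to make timeliness measurable is to compare the age of a piece of data with the maximum age its users will tolerate. The sketch below assumes each record carries a last-updated timestamp; the function name and the five-minute threshold are illustrative assumptions only.

```python
from datetime import datetime, timedelta, timezone


def is_timely(last_updated, max_age, now=None):
    """Return True if the data was refreshed within the allowed age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= max_age


# Fixed timestamps so the example behaves the same every time it runs.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 1, 11, 58, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

# Assumed expectation: availability data should be at most 5 minutes old.
print(is_timely(fresh, timedelta(minutes=5), now))  # True
print(is_timely(stale, timedelta(minutes=5), now))  # False
```

The threshold is the business decision here: a holiday booking site and a monthly report will set very different values for it.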

Consistency of data refers to data across the organisation being in sync, so that identical information is available across all processes and departments. This can be difficult to achieve where there are multiple processing systems taking information from potentially different sources. A Master Data Management (MDM) strategy seeks to address inconsistency. In database parlance, consistency problems may also arise during database recovery situations. In this case it is essential to understand the back-up methodologies and how the primary data is created and accessed.
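To illustrate what being out of sync looks like in practice, the following sketch compares the same customer as held in two systems and reports the fields that disagree. The two systems (a CRM and a billing system) and their field names are assumptions made purely for the example.

```python
def find_inconsistencies(record_a, record_b, fields):
    """Return {field: (value_a, value_b)} for every field whose values differ."""
    return {
        field: (record_a.get(field), record_b.get(field))
        for field in fields
        if record_a.get(field) != record_b.get(field)
    }


# The same customer as held by two hypothetical systems.
crm = {"customer_id": 42, "email": "a.smith@example.com", "phone": "555-0100"}
billing = {"customer_id": 42, "email": "a.smith@example.com", "phone": "555-0199"}

print(find_inconsistencies(crm, billing, ["email", "phone"]))
# {'phone': ('555-0100', '555-0199')}
```

A check like this only detects the disagreement; deciding which system holds the correct value is where an MDM strategy comes in.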

Validity – Is the data itself valid? Validation rules are required to ensure that data is captured in a particular manner and that the detail is valid. They also ensure that the same fields are used consistently to capture the same information. Nancy Couture describes validity as “correctness” of the actual data content. This is the concept that most data consumers think about when they envision data quality.
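Validation rules can be as simple as a pattern that each field must match at the point of capture. The rules below are deliberately loose, illustrative patterns and are assumptions for the sake of the example; real rules would come from the business.

```python
import re

# Hypothetical validation rules: one pattern per captured field.
RULES = {
    "phone": re.compile(r"\+?[0-9][0-9 \-]{6,14}"),
    "postcode": re.compile(r"[A-Za-z0-9 ]{3,10}"),
}


def invalid_fields(record):
    """Return the fields present in the record that break their rule."""
    return [
        field
        for field, pattern in RULES.items()
        if field in record and not pattern.fullmatch(str(record[field]))
    ]


print(invalid_fields({"phone": "555-0100", "postcode": "D02 X285"}))  # []
print(invalid_fields({"phone": "call me", "postcode": "D02 X285"}))   # ['phone']
```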

Integrity refers to data that has a complete or whole structure, i.e. overall completeness, accuracy and consistency. The business rules define how pieces of data relate to each other, and those relationships define the integrity of the data. Data integrity is usually built into database design with the use of entity and referential integrity rules.
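As a small sketch of how a database can enforce these rules, the example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical. The primary key provides entity integrity and the foreign key provides referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Entity integrity: every row is identified by a unique primary key.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Referential integrity: every order must point at an existing customer.
conn.execute(
    """CREATE TABLE customer_order (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id)
       )"""
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'A. Smith')")
conn.execute("INSERT INTO customer_order (id, customer_id) VALUES (100, 1)")


def try_insert_order(order_id, customer_id):
    """Insert an order; return False if an integrity rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO customer_order (id, customer_id) VALUES (?, ?)",
            (order_id, customer_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert_order(101, 1))    # True: customer 1 exists
print(try_insert_order(102, 999))  # False: no such customer
print(try_insert_order(100, 1))    # False: order id 100 is already used
```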

Accuracy. Data values stored for an object are the correct values. It may seem an obvious component of data quality, but the data that is captured needs to be correct, i.e. accurate. There are two aspects. The first is that the information is recorded correctly, without typos or data entry errors. The second is that data needs to be represented in a consistent and unambiguous form. Take, for example, the manner in which a date of birth is recorded: US style 12/10/1972 or European style 10/12/1972. So when is the birthday? Good database design should resolve issues of this nature.
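The date of birth problem is easy to demonstrate. In the sketch below the same string is read under both conventions and gives two different dates. Storing the value in an unambiguous form such as ISO 8601 (YYYY-MM-DD) is one common way to avoid this, offered here as a suggestion rather than the only fix.

```python
from datetime import date, datetime

raw = "12/10/1972"

# The same characters, read under two different conventions.
us_reading = datetime.strptime(raw, "%m/%d/%Y").date()  # month first
eu_reading = datetime.strptime(raw, "%d/%m/%Y").date()  # day first

print(us_reading.isoformat())  # 1972-12-10
print(eu_reading.isoformat())  # 1972-10-12
```

Once the convention is known at the point of capture, the date can be stored in the ISO form and displayed in whatever local style the user expects.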


Business Benefits

Data Quality, as a subset of Data Management, is aligned with Master Data Management (MDM) and Data Governance. They all focus on data as an asset to the business. Modern businesses seek a Return on Investment (ROI) from their Data Management strategies.

Data Analytics: With quality data, we can undertake sound analysis of the business and improve the quality of decision making, which in turn improves business performance. The business can investigate potential new areas of revenue not previously considered.

Timeliness of good data and analytics affords new opportunities to reach the market with new offerings ahead of the competition. A further competitive edge can be achieved with rapid decision turnaround and rapid reaction to market conditions. Predictive analytics can lead to a proactive position in the marketplace.

Customer satisfaction ratings can be improved through more accurate interactions with the business.

Customer trust in the information and how it is stored is likely to be important in the future.

“Gartner predicts that 30 percent of businesses will have begun directly or indirectly monetizing information assets via bartering or selling them outright by 2016”.

Compliance: Knowing your organisational data, i.e. who, what, where, how, why and when, goes a long way towards achieving compliance, whether that is compliance with Data Protection requirements, financial regulations, Sarbanes-Oxley (SOX) or PCI (Payment Card Industry) security, or seeking to achieve ISO 8000, the International Standard for Data Quality.

This is by no means an exhaustive list of the business benefits of good Data Quality. What about the cost to business of poor data quality? It depends on the business.

Customers: Poor data leads to poor marketing, sales, support or service experiences, which will cost your business customers and revenue.

Shareholders: Data accuracy, auditability and transparency are crucial to shareholders’ trust. Loss of trust will mean downgrading of shares and weak stock market performance.

Employee Productivity and Retention: Endless hours spent scrubbing data for report input reduce employee performance and lead to poor morale and, ultimately, staff churn.

The list of impacts on the business of poor quality data is endless.

Perspective

Taking a step back, it is a matter of perspective. Some aspects of Data Quality are critical to the business, others less so. It is a matter of prioritisation and of understanding the impact, risk and/or advantage to the business of pursuing Data Quality. But therein lies the Catch-22: if your data quality is not good enough, how can you make balanced, informed decisions?

