
The Essential Guide to Data Quality Monitoring

Data quality issues cost US businesses over $600 billion per year according to Gartner, roughly 3-4% of revenue for the average company. As data volume and velocity keep growing, getting ahead of data quality will only become more critical.

In my role as a data analyst, I’ve seen firsthand how flawed data slows down operations, causes leaders to make misguided decisions, and frustrates customers. Once errors enter databases and spread downstream, they become exponentially more difficult and expensive to clean up.

That’s why implementing data quality monitoring has to be a priority.

In this comprehensive guide, we’ll go deep on how to create an effective data quality monitoring program.

You’ll learn:

  • Common data quality problems organizations face
  • The far-reaching business impacts of poor data health
  • Steps for building a data quality monitoring framework
  • Metrics and methods for tracking data quality
  • Real-world examples and war stories from other analysts
  • Emerging technologies that automate monitoring

Let’s dive in!

The High Costs of Poor Data Health

Bad data is a business epidemic. Based on surveys, analysts estimate that data accuracy in many organizations sits between 60% and 70%. That means 30-40% of data, a third or more, contains defects.

The costs of low data quality are staggering:

  • Inaccurate reporting and metrics lead to poor strategic decisions – up to $15 million lost in revenue per year for large companies.

  • Staff productivity declines by up to 30% as workers struggle with faulty data and manual rework.

  • Irrelevant messaging and improper targeting caused by inaccurate CRM data decrease marketing ROI by 15-25%.

  • Operational and system downtime from crashes related to bad data costs up to $100k per hour.

Addressing downstream quality issues has also become phenomenally expensive:

  • Correcting defective data by hand costs firms $150,000 on average per year according to The Data Warehousing Institute.

  • Remediating inaccurate customer and product data costs organizations $5-25 per record according to Gartner.

  • Forrester estimates that decisions driven by bad data waste over $9.7 million per year for Fortune 1000 companies.

But those hard costs don’t account for softer impacts like frustrated customers and employees. It’s clear organizations can’t afford to ignore data quality anymore.

Why Monitoring Quality Continuously is Essential

Most organizations perform some type of periodic data quality assessment. But sporadic audits are insufficient – defects easily creep in between evaluations.

Like your car, data requires constant monitoring to catch issues as they emerge. A proactive approach reduces costly crises down the road.

“Think of data quality monitoring as preventative medicine – it keeps small problems from turning into major diseases.”

Continuous monitoring gives both technical and business users visibility into data health trends and patterns. When teams share common quality metrics and insights, they can set data policies and standards collaboratively.

Monitoring quality at each stage of the data pipeline prevents exponential downstream impacts. Assessing data quality should be an integral part of ingesting, processing, storing and analyzing data.

Let’s examine the key elements to include in your data quality monitoring program.

Design Data Quality Metrics Tailored to Your Business

The first step is determining metrics that indicate data health for your organization. Here are important categories of metrics to consider:

Accuracy – How consistent is data with real-world values? E.g. % of inaccurate records based on audits or validation checks.

Completeness – What data is missing? E.g. % of null field values or incomplete records.

Uniqueness – What issues exist around duplication? E.g. % of redundant or overlapping records.

Timeliness – How stale or outdated is the data? E.g. % of values not updated per frequency thresholds.

Conformity – Does data meet necessary structural standards? E.g. % of records with incorrect formats.
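To make these categories concrete, here is a minimal sketch of how four of them might be computed over a small batch of records using only the Python standard library. The sample records, field names, email pattern and 90-day freshness window are all illustrative assumptions, not figures from any particular tool. Accuracy is left out because it needs a trusted reference source to compare against.

```python
import re
from datetime import date

# Hypothetical customer records; the field names are assumptions for illustration.
records = [
    {"id": 1, "email": "ann@example.com", "phone": "555-0100", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "phone": "555-0101", "updated": date(2024, 4, 20)},
    {"id": 3, "email": "bob@example", "phone": "555-0102", "updated": date(2023, 1, 15)},
    {"id": 4, "email": "ann@example.com", "phone": "555-0100", "updated": date(2024, 5, 2)},
]

# Deliberately simple format check, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def completeness(rows, field):
    """Percent of rows where the field is populated."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return pct(filled, len(rows))


def uniqueness(rows, key_fields):
    """Percent of rows that remain after collapsing rows with identical keys."""
    distinct = {tuple(r.get(f) for f in key_fields) for r in rows}
    return pct(len(distinct), len(rows))


def conformity(rows, field, pattern):
    """Percent of populated values that match the expected format."""
    values = [r[field] for r in rows if r.get(field)]
    return pct(sum(1 for v in values if pattern.match(v)), len(values))


def timeliness(rows, field, as_of, max_age_days):
    """Percent of rows updated within the allowed age window."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return pct(fresh, len(rows))


metrics = {
    "completeness_email": completeness(records, "email"),
    "uniqueness_email_phone": uniqueness(records, ["email", "phone"]),
    "conformity_email": conformity(records, "email", EMAIL_PATTERN),
    "timeliness_90d": timeliness(records, "updated", date(2024, 5, 10), 90),
}
print(metrics)
```

Expressing each metric as a percentage keeps them comparable across datasets, which helps later when rolling them up for reporting.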

There are hundreds of potential metrics, so choose ones aligned with business priorities and actual user needs. Avoid “vanity metrics” that sound impressive but offer little value.

"Leading companies like CVS developed data quality metrics linked directly to business outcomes – they measure data quality in terms of impacts to customer satisfaction and revenue."

Different metrics apply to different datasets. Collaborate with stakeholders to determine the 5-10 key data quality metrics for your organization right now.

Continuously Monitor Metrics Across the Data Lifecycle

The next step is actively monitoring your metrics across the lifecycle:

Ingestion – Assess data as it’s streamed into your systems. Catch quality issues arising at the source.

Processing – Measure quality during and after ETL, as data is transformed, enriched and cleaned.

Storage – Profile data at rest within databases, data warehouses and lakes to identify defects.

Analysis – Monitor quality metrics on downstream analytic outputs and reports.

Usage – Collect feedback on data quality from end users and customer-facing systems.

Automate measurement where possible. Also scan samples and perform audits to uncover hidden issues. Validate data prior to critical business decisions.

“Think of data quality metrics like Key Performance Indicators (KPIs) – they require active tracking and management.”

Continuous monitoring provides visibility into quality trends. Data stewards can intervene when metrics fall below acceptable thresholds.
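As a rough illustration of threshold-based intervention, the sketch below compares metrics measured at several lifecycle stages against minimum acceptable values and lists the breaches a data steward would need to look at. The threshold values, stage names and measurements are hypothetical; in practice they would come from your own monitoring jobs.

```python
# Hypothetical minimum acceptable percentage for each metric.
THRESHOLDS = {"completeness": 95.0, "uniqueness": 99.0, "conformity": 98.0}


def find_breaches(stage_metrics, thresholds):
    """Return (stage, metric, value, minimum) for every metric below its threshold."""
    breaches = []
    for stage, metrics in stage_metrics.items():
        for name, value in metrics.items():
            minimum = thresholds.get(name)
            # Metrics without a configured threshold are skipped rather than flagged.
            if minimum is not None and value < minimum:
                breaches.append((stage, name, value, minimum))
    return breaches


# Illustrative measurements, one dict of metrics per lifecycle stage.
stage_metrics = {
    "ingestion": {"completeness": 97.2, "conformity": 96.5},
    "processing": {"completeness": 99.1, "uniqueness": 99.4},
    "storage": {"completeness": 94.0, "uniqueness": 99.8},
}

breaches = find_breaches(stage_metrics, THRESHOLDS)
for stage, name, value, minimum in breaches:
    print(f"ALERT {stage}: {name} at {value}% is below the {minimum}% threshold")
```

Keeping thresholds in one shared mapping means business and technical users are arguing about the same numbers when they decide what counts as acceptable.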

Build a Data Quality Monitoring Toolbox

Combining the right tools is key for automated, continuous monitoring. Assemble tools for:

Profiling – Scan data at rest to build quality metadata like value accuracy, conformance, overlap analysis and dependency analysis. Leading options include Ataccama DQ Analyzer, Informatica DQ Director, and IBM InfoSphere Discovery.

Parsing and Standardization – Repair issues around formatting discrepancies, abbreviation inconsistencies, casing differences and more. Human Inference, Melissa and Tamr offer top parsing solutions.

Record Matching – Identify duplicate records and resolve identity confusion across datasets through probabilistic matching. Leaders include WinPure, Talend and Oracle.

Monitoring – Centralize data quality management and enable collaboration between technical and business users through governance hubs. Look to dedicated monitoring tools like those from SAP, Informatica, SAS or Information Builders.

Data Virtualization – Abstract data complexities and enable quality checks via virtual layers rather than moving data. Key players include Denodo, AtScale, and Data Virtuality.

The right mixture of tools can automate significant portions of quality management. But always factor in your in-house skills, data infrastructure and use cases when choosing tools.
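Before buying anything, it can help to see what standardization and record matching do in miniature. The sketch below normalizes company names and then flags likely duplicates with a simple string-similarity score from Python's standard `difflib` module. This is a stand-in for illustration only: commercial matching tools use more sophisticated probabilistic models, and the abbreviation table, sample names and 0.9 threshold here are assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Small, hypothetical abbreviation table; real standardization rules are far larger.
ABBREVIATIONS = {"corp": "corporation", "inc": "incorporated", "co": "company"}


def standardize(name):
    """Lowercase, strip punctuation, collapse whitespace and expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in cleaned.split())


def likely_duplicates(names, threshold=0.9):
    """Return (name_a, name_b, score) for pairs whose standardized forms are similar."""
    normalized = {n: standardize(n) for n in names}
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalized[a], normalized[b]).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


customers = ["Acme Corp.", "ACME Corporation", "Acme  Corporatoin", "Globex Inc"]
duplicates = likely_duplicates(customers)
for a, b, score in duplicates:
    print(f"{a!r} ~ {b!r} (similarity {score})")
```

Standardizing first matters: without it, casing and abbreviation differences would drag the similarity score down and hide obvious duplicates. Comparing every pair is fine for a sample but does not scale to large tables, which is one reason dedicated matching tools exist.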

Make Data Quality Visible with Scorecards

Visibility into quality empowers action. Data scorecards give business stakeholders an easy snapshot of enterprise data health.

Figure: Sample Data Quality Scorecard

Scorecards roll up key quality metrics into green/yellow/red health ratings. Trends expose where quality is improving or worsening.
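Here is one way the roll-up could work, as a minimal sketch. The rating cut-offs (95% for green, 85% for yellow), the half-point trend tolerance and the sample monthly values are assumptions you would tune to your own standards.

```python
def rate(value, green_at=95.0, yellow_at=85.0):
    """Map a metric percentage to a green/yellow/red health rating."""
    if value >= green_at:
        return "green"
    if value >= yellow_at:
        return "yellow"
    return "red"


def trend(current, previous, tolerance=0.5):
    """Label the change since the last period, ignoring moves within the tolerance."""
    delta = current - previous
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "worsening"
    return "stable"


def build_scorecard(current, previous):
    """Combine current and prior metric values into one row per metric."""
    return [
        {
            "metric": name,
            "value": value,
            "rating": rate(value),
            # A metric with no prior value is compared to itself, so it reads as stable.
            "trend": trend(value, previous.get(name, value)),
        }
        for name, value in sorted(current.items())
    ]


# Hypothetical monthly metric values.
this_month = {"accuracy": 96.4, "completeness": 88.0, "uniqueness": 79.5}
last_month = {"accuracy": 96.1, "completeness": 91.5, "uniqueness": 74.0}

scorecard = build_scorecard(this_month, last_month)
for row in scorecard:
    print(f"{row['metric']:<14}{row['value']:>6.1f}%  {row['rating']:<8}{row['trend']}")
```

Showing rating and trend side by side is useful because they can disagree: in this sample, uniqueness is red but improving, while completeness is yellow and getting worse.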

Share scorecards through centralized data quality portals as well as email reports and dashboards. Quality becomes a priority when deficiencies and impacts are made transparent.

Foster a Data Quality Culture

Sustaining quality necessitates cultural change across the enterprise. Quality must become everyone’s responsibility – not just IT and data teams.

Promote Awareness – Educate staff on quality challenges and impacts through training and internal marketing campaigns. Tie data quality behaviors to performance management.

Incentivize Actions – Motivate business teams to participate in quality efforts through competitions, recognition and rewards programs.

Empower Users – Provide self-service data quality tools and make it easy to report issues. Users become agents of change.

Lead by Example – Executives and managers should visibly prioritize and invest in quality initiatives.

While challenging, building a workplace culture vigilant about data quality pays off through greater operational excellence and performance.

Overcoming Roadblocks to Quality Monitoring

Here are common challenges I’ve encountered when implementing quality monitoring, along with some tips to address them:

Problem: No alignment on metrics and priorities

Solution: Workshop with stakeholders to determine metrics delivering business value. Focus initial efforts on high-impact datasets.

Problem: Legacy systems restrict real-time monitoring

Solution: Explore modern data integration and preparation tools to enable more agility. Start monitoring batch ETL processes.

Problem: Lack of skilled resources to perform monitoring

Solution: Train business users through data literacy programs. Automate monitoring activities with tools where possible.

Problem: Poor data quality is an accepted norm

Solution: Perform impact analysis and calculate the hard costs of bad data. Publicize examples of quality crises and wins to change attitudes.

While tough, all of these roadblocks can be overcome through education and by starting small. Demonstrate quick wins to build support.
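The impact analysis suggested for the last roadblock can start as back-of-the-envelope arithmetic. The sketch below uses the $5-25 per-record remediation range cited earlier in this guide; the record count and the 30% defect rate are hypothetical inputs, so swap in numbers from your own audits.

```python
def remediation_cost_range(total_records, defect_rate, cost_low=5.0, cost_high=25.0):
    """Estimate how many records are defective and the low/high cost to fix them.

    cost_low and cost_high default to the $5-25 per-record range quoted in this
    guide; total_records and defect_rate are supplied by the caller.
    """
    defective = round(total_records * defect_rate)
    return defective, defective * cost_low, defective * cost_high


# Hypothetical dataset: 100,000 customer records with a 30% defect rate.
defective, low, high = remediation_cost_range(100_000, 0.30)
print(f"{defective:,} defective records: ${low:,.0f} to ${high:,.0f} to remediate")
```

Even a rough range like this gives stakeholders a dollar figure to react to, which tends to land better than a defect percentage on its own.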

Sustaining Data Quality Excellence

The work doesn’t end once a quality monitoring program launches. You need mechanisms to lock in data quality gains long-term:

  • Data governance policies – Maintain standards for entry, storage and use of data. Review policies quarterly.

  • Enterprise data strategy – Include objectives for continuously improving quality. Update the strategy annually.

  • Master data management (MDM) – Centrally define and manage critical domains like customer, product and account data.

  • Training – Provide annual quality training focusing on behaviors and responsibilities.

  • Change management – Introduce new processes gradually with extensive communication and support.

Great data quality requires great habits practiced by everyone touching data across its lifecycle.

Data Quality is a Competitive Advantage

In closing, I hope I’ve convinced you that monitoring and governing data quality must become a top priority.

The costs of ignoring quality are massive – lost revenue, frustrated customers, increased risk. But organizations achieving excellence gain real competitive advantages:

  • Agility – Quality data increases speed and flexibility in responding to market changes.

  • Cost Savings – Less waste and fewer quality failures cut expenses.

  • Actionable Insights – Reliable analytics and AI lead to smarter decisions.

  • Customer Loyalty – Consistent and accurate customer data improves experiences.

  • Risk Reduction – Higher quality data minimizes compliance failures and lawsuits.

While becoming a data quality leader takes commitment, the rewards are game-changing. I encourage you to start building a quality monitoring regimen tailored to your organization’s needs. Reach out if you need help getting started!


Written by Alexis Kestler

A female web designer and programmer, now a 36-year-old IT professional with over 15 years of experience, living in NorCal. I enjoy keeping my feet wet in the world of technology through reading, working, and researching topics that pique my interest.