regex, data-warehouse, data-quality

Algorithm for data quality in a data warehouse


I'm looking for a good algorithm or method to check data quality in a data warehouse. I want an algorithm that "knows" the valid structure of the values, checks whether each value conforms to that structure, and then decides whether the value is correct or not.

I thought about defining a regexp and then checking whether each value matches it.
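
To make the idea concrete, here is a minimal sketch of what I have in mind, assuming one pattern per column. The column names (`zip_code`, `order_date`), the patterns, and the helper `check_row` are made up for illustration and do not come from any real schema or library.

```python
import re

# Hypothetical rules: column name -> pattern the whole value must match.
RULES = {
    "zip_code": re.compile(r"\d{5}"),
    "order_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def check_row(row):
    """Return the names of columns whose value does not fully match its pattern."""
    bad_columns = []
    for column, pattern in RULES.items():
        value = row.get(column)
        # A missing value counts as a failure; fullmatch requires the
        # entire string to match, not just a prefix.
        if value is None or pattern.fullmatch(str(value)) is None:
            bad_columns.append(column)
    return bad_columns


if __name__ == "__main__":
    print(check_row({"zip_code": "12345", "order_date": "2020-01-31"}))
    print(check_row({"zip_code": "1234a", "order_date": "2020-13-99"}))
```

Note that in the second row only `zip_code` is flagged: `2020-13-99` has the right shape, so a pattern like this checks the format of a value, not whether it is a real date.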

Is this a good way? Are there some good alternatives? (Any research papers?)


Solution

  • I have seen some authors suggest adding a special dimension, called a data quality dimension, to describe each fact table record further.

    Typical values in a data quality dimension could then be “Normal value,” “Out-of-bounds value,” “Unlikely value,” “Verified value,” “Unverified value,” and “Uncertain value.”
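
As a rough sketch of how such a dimension might be populated, the snippet below assigns each numeric measure a key into a small data quality dimension. Everything here is an assumption for illustration: the `Bounds` class, the surrogate keys, the thresholds, and the `quality_key` helper are hypothetical, and only three of the labels listed above are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical limits for one measure."""
    hard_min: float    # below this the value is impossible
    hard_max: float    # above this the value is impossible
    likely_min: float  # below this the value is possible but suspicious
    likely_max: float  # above this the value is possible but suspicious


# Illustrative data quality dimension: surrogate key -> description.
QUALITY_DIM = {
    1: "Normal value",
    2: "Out-of-bounds value",
    3: "Unlikely value",
}


def quality_key(value, bounds):
    """Return the data quality dimension key for a single measure."""
    if value < bounds.hard_min or value > bounds.hard_max:
        return 2
    if value < bounds.likely_min or value > bounds.likely_max:
        return 3
    return 1


if __name__ == "__main__":
    # Assumed limits for an order quantity, chosen only for the example.
    quantity_bounds = Bounds(hard_min=0, hard_max=1000, likely_min=1, likely_max=100)
    for quantity in (50, -5, 500):
        print(quantity, QUALITY_DIM[quality_key(quantity, quantity_bounds)])
```

The fact row would then store the returned key as a foreign key to the dimension, so suspicious records stay in the warehouse and can be filtered or reported on later instead of being rejected at load time.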