database, database-design, metadata, data-quality

Data quality database model


I need an example of a database model that can be attached to an existing database to track data quality. Ideally the answer would include, at the very least, DDL that is executable in MySQL. DDL for another RDBMS is fine too; I'll just post another question asking for a port of the code.

A good explanation would be a huge plus.

Questions, comments, feedback, etc. -- just comment, thanks!!


Solution

  • The biggest problem is identifying meaningful measures of quality. That's so highly application-dependent, I doubt that anybody will be able to help you very much. (At least not without a lot more information--perhaps more than you're allowed to give.)

    But let's say your application records observations of birds by individuals. (I'm just throwing this together off the top of my head. Read it for the gist, and expect the details to crumble under scrutiny.) Under average field conditions,

    • some species are hard for even a beginner to get wrong
    • some species are hard for an expert to get right
    • a specific individual's ability varies irregularly over time (good days, bad days)
    • individuals usually become more skilled over time
    • an individual might be highly skilled at identifying hawks, and totally suck at identifying gulls
    • individuals are prone to suggestion (who they're with makes a difference in their reliability)

    So, to take a shot at assessing the quality of an identification, you might try to record a lot of information besides the observation "3 red-tailed hawks at Cape May on 05-Feb-2011 at 4:30 pm". You might try to record

    • weather
    • lighting
    • temperature (some birders suck in the cold)
    • hours afield (some birders suck after 3 hours, or after 20 cold minutes)
    • names of others present
    • average difficulty of correctly identifying red-tailed hawks
    • probability that this individual could correctly identify red-tails under these field conditions
    • alcohol intake

    Although this might be "meta" to field birders, to the database designer it's just data. And you'd design the tables just like you'd design them for any other application. (That's what I did, anyway.)
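To make the "it's just data" point concrete, here is a rough sketch of how the bird example might be laid out as tables. Everything in it is an assumption for illustration: the table and column names, the 0-to-1 scales for difficulty and probability, and every sample value except the red-tailed hawk sighting itself. The DDL sticks to fairly generic SQL and is run through Python's built-in `sqlite3` module so it can be tried without a server; it has not been checked against MySQL, where column types, engine settings, and `CHECK` enforcement may need adjusting.

```python
import sqlite3

# Hypothetical schema: all table and column names are assumptions made
# for illustration, not something specified in the answer above.
DDL = """
CREATE TABLE observer (
    observer_id INTEGER PRIMARY KEY,
    full_name   VARCHAR(100) NOT NULL
);

CREATE TABLE species (
    species_id    INTEGER PRIMARY KEY,
    common_name   VARCHAR(100) NOT NULL UNIQUE,
    -- average difficulty of a correct identification: 0 (easy) to 1 (hard)
    id_difficulty DECIMAL(3,2) NOT NULL CHECK (id_difficulty BETWEEN 0 AND 1)
);

CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    observer_id    INTEGER NOT NULL,
    species_id     INTEGER NOT NULL,
    location       VARCHAR(100) NOT NULL,
    observed_at    DATETIME NOT NULL,
    bird_count     INTEGER NOT NULL CHECK (bird_count > 0),
    FOREIGN KEY (observer_id) REFERENCES observer (observer_id),
    FOREIGN KEY (species_id) REFERENCES species (species_id)
);

-- Quality metadata: one row per observation, stored like any other data.
CREATE TABLE observation_conditions (
    observation_id INTEGER PRIMARY KEY,
    weather        VARCHAR(50),
    lighting       VARCHAR(50),
    temperature_c  DECIMAL(4,1),
    hours_afield   DECIMAL(4,2),
    alcohol_units  DECIMAL(4,1),
    -- estimated probability that this observer identified this species correctly
    p_correct      DECIMAL(3,2) CHECK (p_correct BETWEEN 0 AND 1),
    FOREIGN KEY (observation_id) REFERENCES observation (observation_id)
);

-- Names of others present at the time of the observation.
CREATE TABLE observation_companion (
    observation_id INTEGER NOT NULL,
    observer_id    INTEGER NOT NULL,
    PRIMARY KEY (observation_id, observer_id),
    FOREIGN KEY (observation_id) REFERENCES observation (observation_id),
    FOREIGN KEY (observer_id) REFERENCES observer (observer_id)
);
"""


def build_demo_db():
    """Create the sketch schema in memory and load one sample sighting."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    # The sighting comes from the answer's example; all other values are made up.
    conn.execute("INSERT INTO observer VALUES (1, 'Example Birder')")
    conn.execute("INSERT INTO species VALUES (1, 'Red-tailed Hawk', 0.20)")
    conn.execute(
        "INSERT INTO observation "
        "VALUES (1, 1, 1, 'Cape May', '2011-02-05 16:30:00', 3)"
    )
    conn.execute(
        "INSERT INTO observation_conditions "
        "VALUES (1, 'clear', 'good', -2.0, 1.5, 0.0, 0.90)"
    )
    conn.commit()
    return conn


def reliable_observations(conn, min_p_correct):
    """Return sightings whose estimated reliability meets the threshold."""
    return conn.execute(
        """
        SELECT s.common_name, o.bird_count, o.location, c.p_correct
        FROM observation AS o
        JOIN species AS s ON s.species_id = o.species_id
        JOIN observation_conditions AS c ON c.observation_id = o.observation_id
        WHERE c.p_correct >= ?
        ORDER BY o.observation_id
        """,
        (min_p_correct,),
    ).fetchall()
```

Calling `reliable_observations(build_demo_db(), 0.8)` returns the single sample sighting, while a threshold of 0.95 filters it out. Keeping the conditions in their own table, keyed by `observation_id`, is one possible choice: it leaves the core observation table unchanged, so the quality data can be bolted onto an existing database.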