I am building a DQ testing application in Snowflake. Below is a simplified version of the data model (I renamed objects for the sake of clarity). I have multiple config tables for different tests (e.g. unique_tests
each has test_id
column (serving as PK) which is an MD5 hash of key columns to help avoiding duplicate tests).
There is also a result table test_results
, with the results for every test type run keeping history of the jobs.
Now I am adding a table test_impact_category
where the results with issues can be categorized by users in terms of impact (e.g. high impact, low impact etc).
I don't see a use case to create a separate primary key in test_impact_category
table as this is snowflake where primary keys are not enforced and not used for indexes.
So the controversy here is:
It actually makes sense to have foreign key referencing another foreign key (many) (test_results.test_id
(many) = test_impact_category.test_id
(one))
There are reasons not to add impact columns to test config tables
So if in terms of data modelling best practices and common sense, I am curious how other experienced modellers/engineers would model this in Snowflake?
Apparently, there is no dilemma as foreign key can only reference PK/UK. So I guess this relationship cannot be documented in the DDL. Keeping the post, may be there will be some interesting answers and some other people can benefit as well.