Search code examples
sqlsnowflake-cloud-data-platformdata-modeling

Primary and foreign keys dilemma in Snowflake Data Model


I am building a DQ testing application in Snowflake. Below is a simplified version of the data model (I renamed objects for the sake of clarity). I have multiple config tables for different tests (e.g. unique_tests each has test_id column (serving as PK) which is an MD5 hash of key columns to help avoiding duplicate tests).

There is also a result table test_results, with the results for every test type run keeping history of the jobs.

Now I am adding a table test_impact_category where the results with issues can be categorized by users in terms of impact (e.g. high impact, low impact etc).

I don't see a use case to create a separate primary key in test_impact_category table as this is snowflake where primary keys are not enforced and not used for indexes.

So the controversy here is:

  • It actually makes sense to have foreign key referencing another foreign key (many) (test_results.test_id (many) = test_impact_category.test_id (one))

  • There are reasons not to add impact columns to test config tables

So if in terms of data modelling best practices and common sense, I am curious how other experienced modellers/engineers would model this in Snowflake?

enter image description here


Solution

  • Apparently, there is no dilemma as foreign key can only reference PK/UK. So I guess this relationship cannot be documented in the DDL. Keeping the post, may be there will be some interesting answers and some other people can benefit as well.