snowflake-cloud-data-platform query-optimization primary-key

How do Snowflake primary key definitions affect query performance?

I need to define primary keys on the Snowflake tables into which I'm loading data, since the ELT framework I'm using (Singer.io) uses them to define the match condition for merge/upsert. However, the order of the columns in the key seems to affect partitioning and query performance.

My primary key contains both a low-cardinality and a high-cardinality column. I've tried defining the key with the columns listed in either order. When the low-cardinality column is listed first, the table has fewer partitions and some queries are slightly faster, but other queries can be much slower than with no primary key or with the high-cardinality column first.

So Snowflake appears to use primary keys as clustering/micro-partitioning hints, even though it doesn't enforce them. Are there any guidelines or information on how the choice of primary key (not clustering key) can affect query performance?

Solution

As of April 2023, there are no performance drawbacks to defining primary and foreign keys. The only performance benefit is join elimination, which you can read about here:

https://docs.snowflake.com/en/user-guide/join-elimination
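
If it helps to see the pieces side by side, here is a minimal sketch, not a definitive setup. The table, column, and constraint names (`events`, `tenant_id`, `event_id`, `pk_events`) are made up for illustration, and it assumes you want the primary key to remain purely a declaration for Singer's merge condition while expressing any physical layout preference through an explicit clustering key. As far as I know, the optimizer only considers a constraint for join elimination when it carries the `RELY` property, so that is set here as well.

```sql
-- Hypothetical table: names are illustrative only, not from the question.
-- The primary key is declared for the merge/upsert match condition;
-- Snowflake records it but does not enforce it.
CREATE OR REPLACE TABLE events (
    tenant_id VARCHAR NOT NULL,   -- low-cardinality column
    event_id  VARCHAR NOT NULL,   -- high-cardinality column
    payload   VARIANT,
    -- RELY tells the optimizer it may trust the constraint (join elimination).
    CONSTRAINT pk_events PRIMARY KEY (tenant_id, event_id) RELY
);

-- Assumption: if you want to influence data layout, state it explicitly
-- with a clustering key rather than through primary key column order.
ALTER TABLE events CLUSTER BY (tenant_id);

-- Inspect how well the table is clustered on the chosen column(s).
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(tenant_id)');
```

Running the last query against each variant of your table should show whether the layout really differs between the two key orders, which is easier to compare than query timings alone.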