Search code examples
snowflake-cloud-data-platformprimary-keyassert

What is the best way to assert that a set of columns could form a primary key in Snowflake?


Infamously primary key constraints are not enforced in snowflake sql:

-- Generating a table with 4 rows that contain duplicates and NULLs:
CREATE OR REPLACE TEMP TABLE PRIMARY_KEY_TEST AS
SELECT
*
FROM (
           SELECT 1    AS PK, 'TEST_TEXT' AS TEXT
UNION ALL  SELECT 1    AS PK, 'TEST_TEXT' AS TEXT
UNION ALL  SELECT NULL AS PK, NULL        AS TEXT
UNION ALL  SELECT NULL AS PK, NULL        AS TEXT
)
;

SELECT *
FROM PRIMARY_KEY_TEST
;
PK TEXT
1 TEST_TEXT
1 TEST_TEXT
NULL NULL
NULL NULL
-- These constraints will NOT throw any errors in Snowflake
ALTER TABLE PRIMARY_KEY_TEST ADD PRIMARY KEY (PK);
ALTER TABLE PRIMARY_KEY_TEST ADD UNIQUE (TEXT);

However knowing that a set of colums has values that are uniuqe for every row and never NULL is vital to check when updating a set of data.

So I'm looking for a easy to write and read (ideally 1-2 lines) piece of code (proably based on some Snowflake function) that throws an error if a set of columns no longer forms a viable primary key in Snowflake SQL.

Any Suggestions?


Solution

  • So I'm looking for a easy to write and read (ideally 1-2 lines) piece of code (proably based on some Snowflake function) that throws an error if a set of columns no longer forms a viable primary key in Snowflake SQL

    Such test query is easy to write using QUALIFY and windowed COUNT. The pattern is to place primary key column list into PARTITION BY part and search for non-unique values, additional check for nulls could be added too. If the column list is a valid candidate for Primary key, it will not return any rows, if there are rows violating the rules they will be returned:

    -- checking if PK is applicable
    SELECT *
    FROM PRIMARY_KEY_TEST
    QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
         OR PK IS NULL;
      
     -- chekcing if TEXT column is applicable 
    SELECT *
    FROM PRIMARY_KEY_TEST
    QUALIFY COUNT(*) OVER(PARTITION BY TEXT) > 1
         OR TEXT IS NULL;
         
     -- chekcing if PK,TEXT columns are applicable 
    SELECT *
    FROM PRIMARY_KEY_TEST
    QUALIFY COUNT(*) OVER(PARTITION BY PK,TEXT) > 1
        OR PK IS NULL
        OR TEXT IS NULL;
    

    I'd still prefer code that can throw an error though

    It is possible using Snowflake Scripting and RAISE exception:

    BEGIN
       LET my_exception EXCEPTION (-20002, 'Columns cannot be used as PK.');
    
       IF (EXISTS(SELECT *
                 FROM PRIMARY_KEY_TEST
                  QUALIFY COUNT(*) OVER(PARTITION BY PK) > 1
                   OR PK IS NULL
           )) THEN
         RAISE my_exception;
      END IF;
    END;
    

    -20002 (P0001): Uncaught exception of type 'MY_EXCEPTION' on line 8 at position 5 : Columns cannot be used as PK.