Search code examples
data-warehousesnowflake-cloud-data-platform

How to define a cluster key based on a expression in Snowflake?


As per Snowflake documentation (https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html) a cluster key can be defined as one or more table columns/expressions. The example they bring is:

-- cluster by expressions
create or replace table t2 (c1 timestamp, c2 string, c3 number) cluster by (to_date(c1), substring(c2, 0, 10));

I want to extract from a date column the year,month and day and to create a cluster key based on those expressions, but didn't find a workaround. This is what i tried already:

 CREATE TABLE TBL_DATECREATED  (DATECREATED_UTC)  
 CLUSTER BY (
              TO_DATE(DATECREATED_UTC)
              )  
 AS
 SELECT DATECREATED_UTC FROM BASETABLE_CONTACTS

Result:

 SQL compilation error: invalid type [TO_DATE(TBL_DATECREATED.DATECREATED_UTC)] for parameter 'TO_DATE'

**Mention: SELECT TO_DATE(DATECREATED_UTC) FROM BASETABLE_CONTACTS works fine!

CREATE MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC, EMAILADDRESS)
CLUSTER BY ( year(DATECREATED_UTC)
             -- extract(year from DATECREATED_UTC)
             ,EMAILADDRESS
            )
AS
SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS 

Result:

SQL compilation error: Function EXTRACT does not support UNKNOWN argument type
(for commented expression i received the same error message)

CREATE MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC, EMAILADDRESS)
CLUSTER BY (  DATECREATED_UTC
             ,substring(EMAILADDRESS, 1, 3)
            )
AS
SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS 

Result:

SQL compilation error: error line 3 at position 14 Invalid argument types for function 'SUBSTRING': (UNKNOWN, NUMBER(1,0), NUMBER(1,0))

Thanks in advance for each suggestion/solution!


Solution

  • We have to use the below statements to define a cluster key based on the expression for a materialized view.

    CREATE or replace MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC,EMAILADDRESS) cluster by(DATECREATED_UTC,EMAILADDRESS ) AS SELECT to_date(DATECREATED_UTC), EMAILADDRESS FROM BASETABLE_CONTACTS;
    
    CREATE or replace MATERIALIZED VIEW MV_DATECREATED (DATECREATED_UTC,EMAILADDRESS) cluster by(DATECREATED_UTC,EMAILADDRESS ) AS SELECT DATECREATED_UTC, EMAILADDRESS FROM BASETABLE_CONTACTS;
    
    alter materialized view MV_DATECREATED cluster by(TO_DATE(DATECREATED_UTC),EMAILADDRESS );