Search code examples
performancesnowflake-cloud-data-platformclustering-key

Snowflake delete query scanning all partitions


I have an ETL process that it's deleting a couple hundred thousand rows from a table with 18 billion rows using a unique hashed surrogate key like: 1801b08dd8731d35bb561943e708f7e3

delete from CUSTOMER_CONFORM_PROD.c360.engagement
            where (
                engagement_surrogate_key) in (
                select (engagement_surrogate_key)
                from CUSTOMER_CONFORM_PROD.c360.engagement__dbt_tmp
            );

This is taking from 4 to 6 minutes each time on a Small warehouse. I have added a clustering key on the engagement_surrogate_key but since it's unique with high cardinality it didn't help. I have also enabled search optimization service but that also didn't help and it's still scanning all partitions. How can I speed up the deletion?


Solution

  • The deletion can be speed up limiting the scan on the destination table by adding a date range, for example, filtering for only the past month worth of data: loaded_date>=dateadd(MM, -1, current_date). If you are using dbt they have implemented that functionality using this macro:

    {% macro default__get_incremental_merge_sql(arg_dict) %}
    
      {% do return(get_merge_sql(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"], arg_dict["dest_columns"], arg_dict["predicates"])) %}
    
    {% endmacro %}
    

    So you can add the predicate to the dbt incremental model config like this:

    {{ config(materialized= 'incremental', unique_key='engagement_surrogate_key', predicates=['loaded_date>=dateadd(M, -1, current_date)'])}}
    

    When you run your model, the code generated will be this:

    delete from CUSTOMER_CONFORM_PROD.c360.engagement
            where (
                engagement_surrogate_key) in (
                select (engagement_surrogate_key)
                from CUSTOMER_CONFORM_PROD.c360.engagement__dbt_tmp
            )
                    and loaded_date>=dateadd(MM, -1, current_date);