Search code examples
sqlsnowflake-cloud-data-platform

Snowflake SQL finding the rows diffs between 2 large tables


I have 2 tables, around 300M rows each

CREATE TABLE AllIDs
(
ID int NULL,
isUsed BOOLEAN NULL
)
CREATE TABLE UsedIDs
(
ID int NULL
)

I want to find all the IDs that are not used, that is, all the ID rows in AllIDs that do not appear in UsedIDs

I have tried the below and a few similar variations but the query runs indefinitely (16+ hours) even with a Medium warehouse.

    UPDATE AllIDs t 
SET t.IsUsed = 0  
from AllIDs  
where not exists (select 1 FROM UsedIDs f 
WHERE t.ID = f.ID )

Any ideas how this can be done efficiently?


Solution

  • I believe the from clause in the update statement is unnecessary and results in a a more complex query plan, including an unnecessary Cartesian join.

    The following may be more performant:

    UPDATE allids t
    SET t.isused = 0
    WHERE NOT EXISTS (
        SELECT 1
        FROM usedids f
        WHERE t.id = f.id
    )