Search code examples
sqlsnowflake-cloud-data-platform

Update null values in a column with values from same table but by referencing indirectly related matching value in another table


I need to write a Snowflake SQL to update a null value(cid) in a table transaction from cid values in the same table but by referencing indirectly related matching value(hash) in another table payment.

Please refer the attached sample data & expected result for high level view. Also added insert queries now. I've tried explaining the data below:

  • Suppose, I have two tables in snowflake - transaction and payment.
  • transaction table has columns - tid, cid, revenue1, revenue2.
    • tid is primary key. cid can be null.
  • payment table has columns - id, tid, hash, amount.
    • id is primary key.
    • tid is always not null but not unique
    • hash can be null.

Rules:

  • not all tid in transaction are present in payment.
  • for some tid in transaction table there can be a hash in the payment table.
  • other tid can be present in payment table but with null hash.

Example: Please refer the attached sample data and expected result. let's say there is a tid 123 with null cid in transaction but has a corresponding hash in the payment table. the same hash value in payment table is also mapped with transaction tid 770. Now this tid 770 back in transaction table has its cid populated as cid_2. we want to use this situation so that cid value cid_2 of tid 770 gets copied to cid value of tid 123 in transaction table due to their common hash in payment table.

Question: I need to write an snowflake query for this to populate all such null cid in transaction table wherever possible. some cid might still remain blank in transaction table if a matching hash is not found in payment table for their respective tid

Sample data:

Transaction:

TID CID revenue1 revenue2
123 null 1313121 1309075
456 null 1312312 1309076
789 null 1311503 1309077
312 null 1310694 1309078
345 null 1309885 1309079
678 null 1309076 1309080
679 null 1309077 1309081
770 cid_2 309885 1309079
880 cid_1 1309076 1309080
234 null 1309885 1309079

Payment:

ID TID hash Revenue
23423 123 3333 56
34534 456 1111 65
23423 789 2222 67
32432 880 1111 87
54754 770 3333 45
98797 312 null 65
14284 234 3333 54

Expected Output for final transaction table: (The bold values are new and achieved through payment's hash reference.

TID CID revenue1 revenue2
123 cid_2 1313121 1309075
456 cid_1 1312312 1309076
789 null 1311503 1309077
312 null 1310694 1309078
345 null 1309885 1309079
678 null 1309076 1309080
679 null 1309077 1309081
770 cid_2 1309885 1309079
880 cid_1 1309076 1309080
234 cid_2 1309885 1309079

Insert queries for transaction & payment:

CREATE TABLE transaction (
    TID INT PRIMARY KEY,
    CID VARCHAR(255),  -- Assuming CID is a string, adjust type as needed
    revenue1 INT,
    revenue2 INT
);

INSERT INTO transaction (TID, CID, revenue1, revenue2)
VALUES
(123, NULL, 1313121, 1309075),
(456, NULL, 1312312, 1309076),
(789, NULL, 1311503, 1309077),
(312, NULL, 1310694, 1309078),
(345, NULL, 1309885, 1309079),
(678, NULL, 1309076, 1309080),
(679, NULL, 1309077, 1309081),
(770, 'cid_2', 1309885, 1309079),
(880, 'cid_1', 1309076, 1309080),
(234, NULL, 1309885, 1309079);

------------------------------------------------------------------------------------------
CREATE TABLE payment (
    TPMID INT,
    TID INT,
    cardhash INT,
    Revenue INT,
    PRIMARY KEY (TPMID, TID)  -- Assuming the combination of TPMID and TID is unique
);

INSERT INTO payment (TPMID, TID, cardhash, Revenue)
VALUES
(23423, 123, 3333, 56),
(34534, 456, 1111, 65),
(23423, 789, 2222, 67),
(32432, 880, 1111, 87),
(54754, 770, 3333, 45),
(98797, 312, NULL, 65),
(14284, 234, 3333, 54);

Solution

  • I got this to work using a CTE and then self joining based on the hash value. This essentially dupes the cardhash value for every unique TID. This was a fun one, hope it helps.

    WITH JOINED AS (SELECT T.TID AS TID
    , CID
    , REVENUE1
    , REVENUE2
    , CARDHASH
    FROM TRANSACTION T
    
    LEFT JOIN PAYMENT P
    ON P.TID = T.TID)
    , CARDHASH_DUPES AS (SELECT J.CARDHASH
    , J.TID
    , R.CID
    FROM JOINED J
    
    INNER JOIN JOINED R
    ON J.CARDHASH = R.CARDHASH
    AND R.CID IS NOT NULL)
    SELECT J.TID
    , H.CID
    , J.REVENUE1
    , J.REVENUE2
    FROM JOINED J
    
    LEFT JOIN CARDHASH_DUPES H
    ON J.TID = H.TID
    ;