I need to write a Snowflake SQL to update a null value(cid) in a table transaction from cid values in the same table but by referencing indirectly related matching value(hash) in another table payment.
Please refer the attached sample data & expected result for high level view. Also added insert queries now. I've tried explaining the data below:
Rules:
Example: Please refer the attached sample data and expected result. let's say there is a tid 123 with null cid in transaction but has a corresponding hash in the payment table. the same hash value in payment table is also mapped with transaction tid 770. Now this tid 770 back in transaction table has its cid populated as cid_2. we want to use this situation so that cid value cid_2 of tid 770 gets copied to cid value of tid 123 in transaction table due to their common hash in payment table.
Question: I need to write an snowflake query for this to populate all such null cid in transaction table wherever possible. some cid might still remain blank in transaction table if a matching hash is not found in payment table for their respective tid
Sample data:
Transaction:
TID | CID | revenue1 | revenue2 |
---|---|---|---|
123 | null | 1313121 | 1309075 |
456 | null | 1312312 | 1309076 |
789 | null | 1311503 | 1309077 |
312 | null | 1310694 | 1309078 |
345 | null | 1309885 | 1309079 |
678 | null | 1309076 | 1309080 |
679 | null | 1309077 | 1309081 |
770 | cid_2 | 309885 | 1309079 |
880 | cid_1 | 1309076 | 1309080 |
234 | null | 1309885 | 1309079 |
Payment:
ID | TID | hash | Revenue |
---|---|---|---|
23423 | 123 | 3333 | 56 |
34534 | 456 | 1111 | 65 |
23423 | 789 | 2222 | 67 |
32432 | 880 | 1111 | 87 |
54754 | 770 | 3333 | 45 |
98797 | 312 | null | 65 |
14284 | 234 | 3333 | 54 |
Expected Output for final transaction table: (The bold values are new and achieved through payment's hash reference.
TID | CID | revenue1 | revenue2 |
---|---|---|---|
123 | cid_2 | 1313121 | 1309075 |
456 | cid_1 | 1312312 | 1309076 |
789 | null | 1311503 | 1309077 |
312 | null | 1310694 | 1309078 |
345 | null | 1309885 | 1309079 |
678 | null | 1309076 | 1309080 |
679 | null | 1309077 | 1309081 |
770 | cid_2 | 1309885 | 1309079 |
880 | cid_1 | 1309076 | 1309080 |
234 | cid_2 | 1309885 | 1309079 |
Insert queries for transaction & payment:
CREATE TABLE transaction (
TID INT PRIMARY KEY,
CID VARCHAR(255), -- Assuming CID is a string, adjust type as needed
revenue1 INT,
revenue2 INT
);
INSERT INTO transaction (TID, CID, revenue1, revenue2)
VALUES
(123, NULL, 1313121, 1309075),
(456, NULL, 1312312, 1309076),
(789, NULL, 1311503, 1309077),
(312, NULL, 1310694, 1309078),
(345, NULL, 1309885, 1309079),
(678, NULL, 1309076, 1309080),
(679, NULL, 1309077, 1309081),
(770, 'cid_2', 1309885, 1309079),
(880, 'cid_1', 1309076, 1309080),
(234, NULL, 1309885, 1309079);
------------------------------------------------------------------------------------------
CREATE TABLE payment (
TPMID INT,
TID INT,
cardhash INT,
Revenue INT,
PRIMARY KEY (TPMID, TID) -- Assuming the combination of TPMID and TID is unique
);
INSERT INTO payment (TPMID, TID, cardhash, Revenue)
VALUES
(23423, 123, 3333, 56),
(34534, 456, 1111, 65),
(23423, 789, 2222, 67),
(32432, 880, 1111, 87),
(54754, 770, 3333, 45),
(98797, 312, NULL, 65),
(14284, 234, 3333, 54);
I got this to work using a CTE and then self joining based on the hash value. This essentially dupes the cardhash value for every unique TID. This was a fun one, hope it helps.
WITH JOINED AS (SELECT T.TID AS TID
, CID
, REVENUE1
, REVENUE2
, CARDHASH
FROM TRANSACTION T
LEFT JOIN PAYMENT P
ON P.TID = T.TID)
, CARDHASH_DUPES AS (SELECT J.CARDHASH
, J.TID
, R.CID
FROM JOINED J
INNER JOIN JOINED R
ON J.CARDHASH = R.CARDHASH
AND R.CID IS NOT NULL)
SELECT J.TID
, H.CID
, J.REVENUE1
, J.REVENUE2
FROM JOINED J
LEFT JOIN CARDHASH_DUPES H
ON J.TID = H.TID
;