I know there are many questions over this and many solutions, most of them specific to each individual's case. However, in my case I cannot seem to get just the sets of records that are to be deleted, to leave only the 1st ID or the lowest-value ID.
This is a sample of my table:
ID COL_A COL_B what I need to do
24556552 KD PT PERSONAL TRAINING, AND 58 MOU 5691212
24556924 KD PT PERSONAL TRAINING, AND 58 MOU 5691212 <-- remove
24556869 KHETSHI DHARAMSHI & CO LTD, 5690443
24556951 KHETSHI DHARAMSHI & CO LTD, 5690443 <-- remove
24528435 KINNY TINT LDA 5689791
24528436 KINNY TINT LDA 5689791 <-- remove
24528437 KINNY TINT LDA 5689791 <-- remove
24528438 KINNY TINT LDA 5689791 <-- remove
24528439 KINNY TINT LDA 5689791 <-- remove
Is there a way to link this table back to itself, so I can only get the first occurrence of it, keep it and delete the rest? I am not familiar with PARTITION OVER etc.
Thanks
If you just want to view your data sans the duplicates, use ROW_NUMBER()
:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY COL_A, COL_B ORDER BY ID) rn
FROM yourTable
)
SELECT ID, COL_A, COL_B
FROM cte
WHERE rn = 1;
If you really need to delete the duplicate records, then use exists logic:
DELETE
FROM yourTable t1
WHERE EXISTS (
SELECT 1
FROM yourTable t2
WHERE t2.COL_A = t1.COL_A AND
t2.COL_B = t1.COL_B AND
t2.ID < t1.ID
);