Search code examples
sqlsql-serverduplicates

How to keep the 1st record of duplicate records


I know there are many questions over this and many solutions, most of them specific to each individual's case. However, in my case I cannot seem to get just the sets of records that are to be deleted, to leave only the 1st ID or the lowest-value ID.

This is a sample of my table:

enter image description here

ID          COL_A                               COL_B   what I need to do
24556552    KD PT PERSONAL TRAINING, AND 58 MOU 5691212 
24556924    KD PT PERSONAL TRAINING, AND 58 MOU 5691212 <-- remove
24556869    KHETSHI DHARAMSHI & CO LTD,         5690443 
24556951    KHETSHI DHARAMSHI & CO LTD,         5690443 <-- remove
24528435    KINNY TINT LDA                      5689791 
24528436    KINNY TINT LDA                      5689791 <-- remove
24528437    KINNY TINT LDA                      5689791 <-- remove
24528438    KINNY TINT LDA                      5689791 <-- remove
24528439    KINNY TINT LDA                      5689791 <-- remove

Is there a way to link this table back to itself, so I can only get the first occurrence of it, keep it and delete the rest? I am not familiar with PARTITION OVER etc.

Thanks


Solution

  • If you just want to view your data sans the duplicates, use ROW_NUMBER():

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY COL_A, COL_B ORDER BY ID) rn
        FROM yourTable
    )
    
    SELECT ID, COL_A, COL_B
    FROM cte
    WHERE rn = 1;
    

    If you really need to delete the duplicate records, then use exists logic:

    DELETE
    FROM yourTable t1
    WHERE EXISTS (
        SELECT 1
        FROM yourTable t2
        WHERE t2.COL_A = t1.COL_A AND
              t2.COL_B = t1.COL_B AND
              t2.ID < t1.ID
    );