Search code examples
t-sqlsql-server-2008deduplication

SQL Server 2008 De-duping


Long story short, I took over a project and a table in the database is in serious need of de-duping. The table looks like this:

supply_req_id | int      | [primary key]
supply_req_dt | datetime | 
request_id    | int      | [foreign key]
supply_id     | int      | [foreign key]
is_disabled   | bit      |

The duplication is exists with records having the same request_id and supply_id. I'd like to find a best practice way to de-dupe this table.

[EDIT]
@Kirk_Broadhurst, thanks for the question. Since supply_req_id is not referenced anywhere else, I would answer by saying keep the first, delete any subsequent occurances.

Happy Holidays


Solution

  • This creates a rank for each row in the (supply_req_dt, request_id) grouping, starting with 1 = lowest supply_req_id. Any dupe has a value > 1

    ;WITH cDupes AS
    (
        SELECT
         supply_req_id,
         ROW_NUMBER() OVER (PARTITION BY supply_req_dt, request_id ORDER BY supply_req_id) AS RowNum
        FROM
         MyTable
    )
    DELETE
       cDupes
    WHERE
       RowNum > 1
    

    Then add a unique constraint or INDEX

    CREATE UNIQUE INDEX IXU_NoDupes ON MyTable (supply_req_dt, request_id)