Reallocate duplicate record references and remove redundant records

I have duplicated data in Table 1. All information in the fields listed below must be identical to be considered a duplicate. The table has other columns but these are inconsequential. The unique identifier is PDS_Item_Ref.

PDS_Item_Ref	ItemRef	Age	Life	Condition	Comments	Picturefile	DETAILS_REF
1830	32976	5	26	Average	No access	NULL	16
1872	32976	5	26	Average	No access	NULL	16
1900	32976	5	26	Average	No access	NULL	16

These records are referenced to other data in Table 2 using the PDS_Item_Ref column. Some or all of the various records may be referenced.

Collection_Ref	PDS_Item_Ref	Quantity
1000	1830	3
1001	1872	5
1002	1900	6
1003	1830	6

I need to update the PDS_Item_Ref field in Table 2 to select the duplicated item with the lowest [PDS_Item_Ref] and then delete the remainder of the now redundant records in Table 1. In the example above, item 1830 would be retained and all others would be removed.

Solution

This can be done in a lot of ways, I prefer to use a three-step model:

Get duplicates into a temporary table
Update referenced tables
Remove duplicates from main table

DROP TABLE #maintable
DROP TABLE #dupes
DROP TABLE #updatetable
SELECT  *
INTO #maintable
FROM    (
        VALUES  (1830, 32976, 5, 26, N'Average', N'No access', 0, N'NULL', 0, 16)
        ,   (1872, 32976, 5, 26, N'Average', N'No access', 0, N'NULL', 0, 16)
        ,   (1900, 32976, 5, 26, N'Average', N'No access', 0, N'NULL', 0, 16)
        ,   (1901, 32976, 5, 26, N'Average', N'Has access', 0, N'NULL', 0, 16)
    ) t (PDS_Item_Ref,ItemRef,Age,Life,Condition,Comments,HS,Picturefile,Immediate,DETAILS_REF)
    
SELECT  PDS_item_ref, min_ref
INTO #dupes
FROM    (
    SELECT  *
    ,   MIN(PDS_item_ref) OVER(partition BY ItemRef,Age,Life,Condition,Comments,HS,Picturefile,Immediate,DETAILS_REF ORDER BY pds_item_ref) AS min_ref
    FROM    #maintable
) x
WHERE   x.min_ref <> PDS_Item_Ref

-- create index...
CREATE UNIQUE CLUSTERED INDEX IX_#dupes ON  #dupes (PDS_item_ref)

SELECT  *
INTO #updatetable
FROM    (

    VALUES  (1000, 1830, 3)
    ,   (1001, 1872, 5)
    ,   (1002, 1900, 6)
    ,   (1003, 1830, 6)
) t (Collection_Ref,PDS_Item_Ref,Quantity)


-- update dups...
SET xact_abort ON;

BEGIN TRAN
    
    UPDATE  t
    SET PDS_Item_Ref = d.min_ref
    FROM    #updatetable t
    INNER JOIN #dupes d
        ON  d.PDS_item_ref = t.PDS_item_ref
    
    -- update others...
    
    -- Finally delete
    
    DELETE  t
    FROM    #maintable t
    INNER JOIN #dupes d
        ON  d.PDS_item_ref = t.PDS_Item_Ref

    SELECT  *
    FROM    #updatetable
    SELECT  *
    FROM    #maintable
COMMIT TRAN;

First I create some test data and then get all duplicate rows. MIN(PDS_item_ref) OVER(partition BY ItemRef,Age,Life,Condition,Comments,HS,Picturefile,Immediate,DETAILS_REF ORDER BY pds_item_ref) is a window aggregate designed for two things, to find groups by all the columns and get the minimum ID for each group.

WHERE x.min_ref <> PDS_Item_Ref this ensures we only get the duplicate rows of the groups

Finally we can update the related tables and delete from the main one. It's a good idea to do things in transaction so you don't end up with partially updated data in case something goes wrong.

And most importantly, create a backup before starting working on this.