Search code examples
sql-server

How do I update my record when there is a duplicate value in SQL Server?


I am trying to figure out how to update my record when there is a duplicate value in SQL server.

My data warehouse table has an old record and a new record in it that looks like this:

dw_id hash_key hash_diff week_day active_ind expiry_date updated_date created_date
699 oaijfoieae 93onjios SATURDAY 1 12/31/3030 12/31/3030 04/24/2024
699 oaijfoieae 34hrjios SUNDAY 1 12/31/2030 12/31/3030 04/25/2024

I want it to look like this:

dw_id hash_key hash_diff week_day active_ind expiry_date updated_date created_date
699 oaijfoieae 93onjios SATURDAY 0 04/25/2024 04/25/2024 04/24/2024
699 oaijfoieae 34hrjios SUNDAY 1 12/31/2030 04/25/2024 04/25/2024

I want to update the value where the hash_key is the same for that record but the hash_diff is different for that record. So above, both hash_keys are the same but the hash_diffs are different. I want my query to:

UPDATE DW.dbo.stg_table_name
WHERE hash_key is duplicated for the single record but the hash_difs are different
SET active_ind = 0 WHERE active_ind = 1
AND where expiry_date = 12/31/3030
SET expiry_date = GETDATE()

but then update the new record (SUNDAY) updated_date to the current date as well?

I am not sure if I can do this because I know what I want to do in my mind but my SQL doesn't make sense, outside of not being written in appropriate syntax. Wouldn't both records be updated since both initially have an active_ind of 1 and technically both hash_diff's are different so how would it know which record to update? Maybe I should pick which created_date is more recent? I have it set to the current date so when it gets imported from the source to the data warehouse, it should update the record to the current day...


Solution

  • Try this:

    DROP TABLE IF EXISTS #source;
    CREATE TABLE #source (dw_id integer, hash_key nvarchar(50),hash_diff nvarchar(50),week_day nvarchar(50),active_ind bit, expiry_date date, updated_date date, created_date date);
     
    INSERT INTO #source
    VALUES
    (699, 'oaijfoieae','93onjios','SATURDAY',1,   '3030-12-31','3030-12-31', '2024-04-24'),
    (699, 'oaijfoieae','34hrjios','SUNDAY', 1,  '2030-12-31', '2030-12-31', '2024-04-25')
    
    -- Get all duplicated hash_keys 
    ;with cte as (
    select hash_key  from #source
    group by hash_key
    having count(hash_diff)>1
    )
    
    -- update rows based on hash_keys
    -- rw = 1 is the first one based on created_date= 04/24/2024
    update #source set active_ind = case when tbl.rw=1 then 0 else 1 end,
    expiry_date = case when tbl.rw=1 then GETDATE() else expiry_date end,
    updated_date=GETDATE()
    
    from
    (select hash_key, hash_diff, ROW_NUMBER() over(partition by hash_key order by created_date) rw  from #source
    where hash_key in (select hash_key from cte)
    ) tbl
    
    where #source.hash_key = tbl.hash_key and #source.hash_diff=tbl.hash_diff
    
    select * from #source
    

    First I got the duplicated rows based on hash_key and then updated the table based on Row_Number.

    SQL Fiddle