Search code examples
sqlsql-serverduplicatessql-delete

I want to design a sql code to delete duplicates, the only thing is it is a bit more complex than just using a CTE


Below is the how the table looks

PatientKey     Accotno     Date    EHR
yur567          123         4/5    HUB
yur567          123         4/5    REF 
yur789          780         4/9    HUB
yte789          567         7/8    HUB
yte765          569         7/7    REF
yte765          564         7/8    HUB
yur789          654         4/5    REF
yur789          654         4/5    HUB

I want to write a sql code so that where the patientKey is the same in both rows, delete REF and Keep HUB. At the sametime I dont want an other 'REF' data to be deleted. I only want to delete data where other REF(which are not duplicated to be deleted)

Desired Output:

PatientKey     Accotno     Date    EHR
yur567          123         4/5    HUB
yur789          780         4/9    HUB
yte789          567         7/8    HUB
yte765          569         7/7    REF
yte765          564         7/8    HUB
yur789          654         4/5    HUB

If you can see in the output all the data that had EHR=REF is deleted and now we have unique rows.


Solution

  • You can use exists:

    delete t
    from mytable t
    where ehr = 'REF' and exists (
        select 1
        from mytable t1
        where t1.PatientKey = t.PatientKey and t1.date = t.date and t1.ehr = 'HUB'
    )
    

    This deletes "REF"s for which another "HUB" row exists for the same date and PatientKey.