Search code examples
sqlsql-serversql-delete

Delete duplicate rows using certain conditions in MS SQL Server


sid cid arrival departure country state pid email phone
ab1 a101 2023-01-01 2023-04-05 US ohio ly1 [email protected] 12345
ab2 a101 2023-04-05 2023-04-06 US ohio ly1 [email protected] 12345
ab7 a106 2023-01-20 2023-04-26 US Texas ly6 [email protected] 87468
ab8 a107 2023-01-20 2023-04-26 US Texas ly7 [email protected] 55555
ab1 a109 2023-01-20 2023-04-28 US Texas ly9 [email protected] 55555

If there are more than 1 row with same cid delete it if departure dates between them are 30 days apart. (Here Cid 101 is present more than 1 so we check departure date here, one day difference therefore we keep the latest departure date)


Solution

  • Using the window lead() to get the next departure date, then we check date diff if less than 30 days then we remove the record :

    with cte as (
      select *, 
          lead(stayid) over (partition by cid order by departure) as lead_stayid,
          DATEDIFF(day, departure, lead(departure) over (partition by cid order by departure)) as date_diff
      from mytable
    )
    delete from mytable
    Where stayid in (
                    select stayid
                    from cte
                    where date_diff is not null and date_diff < 30
                    );
    

    Demo here