Search code examples
sqlsql-serverdatetimegreatest-n-per-groupsql-delete

Efficient way to have only the latest data and delete the rest from a table


I am trying to find the quickest way to delete the data from my Table. My logic is to only have the latest 2 days of data within Table1 for a given Lot_ID and ID is a unique Primary Key in the Table.

My data isn't much but it still takes me around 8-9 mins for the below query execution.

WITH CTE AS
(
 select t.ID
from (select t1.*,
             DENSE_RANK() over (partition by Lot_ID order by TRY_CONVERT(DATE, DATEADD(second, t1.starttime, '19700101') )           
              desc) as seqnum
      from  Table1 t1
     ) t
where seqnum >2
)
DELETE Table1 WHERE EXISTS(select 1 from CTE where CTE.ID = Table1.ID )

Is there a fastest or a better approach to do this?


Solution

  • You could try and delete directly from the cte rather than reopening the table:

    with cte as(
        select dense_rank() over (
            partition by lot_id 
            order by try_convert(date, dateadd(second, t1.starttime, '19700101')) desc
        ) as seqnum
        from  table1 t1
    )
    delete from cte where seqnum > 2
    

    Your query suggests that startime is an epoch timestamp (so an int datatype), so another possible optimization is to use arithmetics rather than date conversion:

    with cte as(
        select dense_rank() over (
            partition by lot_id 
            order by t1.starttime / 60 / 60 / 24 desc
        ) as seqnum
        from  table1 t1
    )
    delete from cte where seqnum > 2
    

    If none of that helps, then you might want to consider inversing the logic: that is, moving the records you want to retain to a temporary table, then tuncating and refilling the original table.