Search code examples
sql-serverperformancestored-proceduresdatabase-performance

Fastest way to delete millions of registers in sql server


I have a big table in SQL server which contains 136.651.894 of rows. One month ago were inserted 7.668.309 new rows with an error in one field. In the moment of the insert I create a copy table to ensure that this does not happen, but nobody saw the error and I dropped the table one week after the load process.

To delete the rows I create a procedure which select the max values per myID from the original table (this is because the error was that we add more zeros in the field value, so the max value is wrong and this register has to be deleted), and in a backup table I search this registers for myID and value and I delete the row.

For example one row is:

ID myId Value Other fields...
2 2345 25948238400 Other values...

The procedure is:

CREATE PROCEDURE [dbo].[P_DELETE_ROWS] 
AS BEGIN
    DECLARE @myId VARCHAR(22)
    DECLARE @value VARCHAR(20)

  DECLARE c_max CURSOR 
  FOR
    SELECT myId,max ([value]) as maxValue
    FROM t1Original
    group by  myId
    order by maxValuedesc

  OPEN c_max
    FETCH NEXT FROM c_max
    INTO @myId ,@value

  WHILE @@FETCH_STATUS = 0  
    BEGIN
        print 'Deleting '+@myId+' y caudal '+@value +''
        DELETE FROM [t1OriginalCopy] WHERE myId=@myIdAND value=@value
        FETCH NEXT FROM c_max 
        INTO @myId,@value
    END

  CLOSE c_maxs
  DEALLOCATE c_max
END

The problem of this is that it takes too much time, now is over 1 day of execution...

How can I improve the performance of this process?


Solution

  • Use DENSE_RANK instead of CURSOR.

    Try this

    ;WITH cte
         AS (SELECT Dense_rank()OVER(partition BY myId ORDER BY value DESC) rn,*
             FROM   [t1OriginalCopy])
    DELETE FROM cte
    WHERE  rn = 1 
    

    If the table(t1OriginalCopy) will be used in parallel in some other operation then you may have to split the delete into batches to avoid locks