Tags: sql, sql-server, database-administration

Deleting rows without overloading the database


The table contains about 100M rows of data per year, and the data starts in 2017. I wanted to remove everything except the last 2 years of data.
To do so, instead of a simple delete, I used a while loop that deletes the top 10,000 rows on each iteration.
While doing this, I wondered: how can I work out the best number of rows to delete per loop?
And how can I do it without leaving a trail in the transaction log?
I want the looped delete to be as efficient as possible without burdening the database.

DECLARE @TargetDT VARCHAR(10), @Counter INT = 1;
SET @TargetDT = '2018-01-01';

WHILE (@Counter <= 100)
BEGIN
    DELETE TOP (10000) FROM MYTABLE WHERE DATE_COLUMN < @TargetDT;
    SET @Counter = @Counter + 1;
END

I tried deleting different numbers of rows per loop: 500, 1000, 2000, 5000, and 10000.


Solution

  • The best method is not to delete at all. Deleting can be very slow, it causes huge amounts of transaction log growth, and if you then try to SHRINK the log you can run into even more problems.

    Instead, copy the data you need into a new table, then do a switcheroo and drop the other table.

    The best way to do the switcheroo is not sp_rename but ALTER TABLE ... SWITCH. For this, it's best to use two separate staging tables with the exact same definition as the main table.

    Switch your database to the BULK_LOGGED recovery model, which means large inserts generate far less logging. Also use the TABLOCK hint on the INSERT to get minimal logging as well as a parallel insert.
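
    A minimal sketch of the recovery-model change, assuming the database is called YourDatabase (a placeholder name). If you normally run in FULL recovery, switch back and take a log backup once the copy is done:

    ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED;
    
    -- ... run the minimally logged INSERT ... SELECT shown below ...
    
    ALTER DATABASE YourDatabase SET RECOVERY FULL;   -- switch back when the copy is finished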

    CREATE TABLE Staging1 (
      -- exact same columns, types, nullable
      -- exact same primary, unique, foreign keys, check constraints
    );
    
    CREATE TABLE Staging2 (
      -- exact same columns, types, nullable
      -- exact same primary, unique, foreign keys, check constraints
    );
    
    INSERT Staging2 WITH (TABLOCK)
      (ColumnsHere)
    SELECT ColumnsHere
    FROM YourTable
    WHERE whatever;  --all data to keep
    

    If you are constantly inserting into the table, you need a way to keep the new table in sync while copying (for example, using triggers). You can also use SNAPSHOT ISOLATION so the copy doesn't block the old table.
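
    As a sketch, enabling snapshot isolation (again assuming the database is named YourDatabase) lets the long-running copy read a consistent, row-versioned view of the old table instead of taking shared locks on it:

    ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
    
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    -- the INSERT ... SELECT above now reads row versions instead of blocking, or being blocked by, writers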


    Now you can do the switch:

    SET XACT_ABORT, NOCOUNT ON;      -- force immediate rollback if session is killed
    
    BEGIN TRAN;
    
    ALTER TABLE YourTable SWITCH TO Staging1
    WITH ( WAIT_AT_LOW_PRIORITY ( MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = BLOCKERS ));
    -- not strictly necessary to use WAIT_AT_LOW_PRIORITY but better for blocking
    -- use SELF instead of BLOCKERS to kill your own session
    
    ALTER TABLE Staging2 SWITCH TO YourTable
    WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 0 MINUTES, ABORT_AFTER_WAIT = BLOCKERS));
    -- force blockers off immediately
    
    COMMIT TRAN;
    

    Finally, drop or truncate the old data:

    TRUNCATE TABLE Staging1;
    

    All of this switching is entirely metadata-only and very, very fast (a matter of milliseconds), and it generates only a tiny amount of logging. You also get the benefit of indexes that were freshly built for the new, smaller amount of data.


    If you don't have enough space on your disk, you can add a temporary disk (an internal or external SSD drive, etc.) and add a FILE to the same filegroup. Then, after you truncate the old data, you can delete the file from the filegroup as explained here. Do not create a new filegroup; this code won't work in that case.
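
    A rough sketch of adding and later removing such a file. The database name, logical file name, path, size, and the assumption that the table lives on the PRIMARY filegroup are all placeholders, not values from the original answer:

    ALTER DATABASE YourDatabase
      ADD FILE (
        NAME = TempSpace,                      -- placeholder logical name
        FILENAME = 'E:\TempSpace.ndf',         -- placeholder path on the temporary disk
        SIZE = 50GB
      ) TO FILEGROUP [PRIMARY];
    
    -- ... do the copy, the switch, and the TRUNCATE of the old data ...
    
    DBCC SHRINKFILE (TempSpace, EMPTYFILE);    -- move any remaining pages off the file
    ALTER DATABASE YourDatabase REMOVE FILE TempSpace;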