Search code examples
sql-servert-sqlsql-deletelarge-data

Deleting 3 million data takes a lot of time


I want to delete three million data on data basis in sql server. I am using a very simple query and batch deleting the files but the command is executing and still running for more than 3 hours. What could be the possible reasons for such low performance. Please find the code below that I am using

Create PROC [dbo].[DeleteOneWeekOldData] 
@StartDate DateTime,
@EndDate DateTime

AS

DECLARE @continue INT
DECLARE @rowcount INT

SET @continue = 1
WHILE @continue = 1
BEGIN
    SET ROWCOUNT 1000
    BEGIN TRANSACTION
    DELETE FROM WorkflowContentDetails WHERE  StartDateTime BETWEEN @StartDate AND @EndDate
    SET @rowcount = @@rowcount 
    COMMIT
       IF @rowcount = 0
    BEGIN
        SET @continue = 0
    END
END

GO

Solution

  • A few things that can cause delete to be slow. You already read about it in the comments.

    For ex. you need 3M rows / 1k rows = 3000 times to do search without idndex. And what's more, you're imposes an exclusive lock on the table 3000 times because of the absence an index.

    This is my "DELETE" pattern on production. It solves these problems, even if there is no suitable index. A lock is only on the primary key and only during delete.

    SET NOCOUNT ON
    
    DECLARE 
        @StartDate DateTime = '20130101',
        @EndDate DateTime = '20131201'
    
    DECLARE @chunk_size bigint = 1000
    DECLARE @RowCount bigint;
    DECLARE @delay      DATETIME = '00:00:01'       --- 1 second by defaul, Used for delaying the updates inside the loop, can be 0
    
    DECLARE @Chunk_IDs as TABLE (
        [ID] int NOT NULL,
        [SourcePKID] int not null
    );
    
    IF OBJECT_ID('tempdb..#temp_OldDataIDs') is not null
        DROP TABLE #temp_OldDataIDs;
    
    CREATE TABLE #temp_OldDataIDs (
        [ID]            INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        [SourcePKID]    INT NOT NULL,
        [DeletedStatus] BIT NOT NULL DEFAULT 0);
    
    INSERT INTO #temp_OldDataIDs ([SourcePKID])
    SELECT [ID]
    FROM WorkflowContentDetails 
    WHERE StartDateTime BETWEEN @StartDate AND @EndDate
    
    SET @RowCount = @@ROWCOUNT;
    
    CREATE NONCLUSTERED INDEX IX_#temp_OldDataIDs on #temp_OldDataIDs ([DeletedStatus]) include([ID],SourcePKID)
    
    WHILE (@RowCount != 0)
    BEGIN
    
        DELETE @Chunk_IDs;
    
        INSERT INTO @Chunk_IDs ([ID], [SourcePKID])
        SELECT TOP (@chunk_size)
            [ID], [SourcePKID]
        FROM #temp_OldDataIDs
        WHERE [DeletedStatus] = 0
        ORDER BY [ID];
    
        SET @RowCount = @@ROWCOUNT;
        IF @RowCount = 0 BREAK;
    
    
        DELETE WorkflowContentDetails
        FROM WorkflowContentDetails
            INNER JOIN @Chunk_IDs ChunkIDs ON WorkflowContentDetails.[ID] = ChunkIDs.[SourcePKID];
    
        UPDATE OldIDs
            SET [DeletedStatus] = 1
        FROM #temp_OldDataIDs OldIDs
            INNER JOIN @Chunk_IDs ChunkIDs ON OldIDs.[ID] = ChunkIDs.[SourcePKID];
    
    -- debug
    -- PRINT CAST(@RowCount as varchar(30)) + ' ' + CONVERT(varchar(30), GETDATE(),121)
    
        -- The requested delay will hold the loop here as requested.
        WAITFOR DELAY  @delay
    END
    
    
    GO