Search code examples
sql-serverdatabase-performancebulkupdate

Updating billion rows table


We are having a table with billion rows. We just want to UPDATE two columns to NULL for all rows. We were trying to do this in batches. But, we are concerned about the performance issues. We are having clustered columnstore index on this table. Does creating composite non-clustered index on these three columns would help ? or can we re-load the table to new table & swap the tables ?

Any inputs will be very helpful.

DECLARE @notNULLRecordsCount INT; 

SET @notNULLRecordsCount = 1;

WHILE @notNULLRecordsCount > 0 
    BEGIN 
    BEGIN TRANSACTION;

    UPDATE  TOP (100000)
    dbo.BillionRowsTable
    SET     Column1 = NULL,
            Column2 = NULL,
            Column3 = NULL     
    WHERE   Column1 IS NOT NULL OR Column2 IS NOT NULL OR Column3 IS NOT NULL;


SET @notNULLRecordsCount = @@ROWCOUNT;
COMMIT TRANSACTION;
END 

Solution

  • My experience is that indexes actually slow this down as the indexes must be maintained.

    Leaving a transaction open is not good. Wrapping all these in a transaction is not good. If you are going to wrap them all in a transaction then no purpose to breaking them up.

    If you could key on just one column (no or) it would be faster.

    You can shorten this

    select 1;
    WHILE @@ROWCOUNT > 0 
    BEGIN 
        UPDATE  TOP (100000)
        dbo.BillionRowsTable
        SET     Column1 = NULL,
                Column2 = NULL,
                Column3 = NULL     
        WHERE   Column1 IS NOT NULL 
             OR Column2 IS NOT NULL 
             OR Column3 IS NOT NULL
    END
    

    To get rid of the or

    select 1;
    WHILE @@ROWCOUNT > 0 
    BEGIN 
        UPDATE  TOP (100000)
        dbo.BillionRowsTable
        SET     Column1 = NULL,
                Column2 = NULL,
                Column3 = NULL     
        WHERE   Column1 IS NOT NULL
    END
    
    select 1;
    WHILE @@ROWCOUNT > 0 
    BEGIN 
        UPDATE  TOP (100000)
        dbo.BillionRowsTable
        SET     Column2 = NULL,
                Column3 = NULL     
        WHERE   Column2 IS NOT NULL
    END
    
    select 1;
    WHILE @@ROWCOUNT > 0 
    BEGIN 
        UPDATE  TOP (100000)
        dbo.BillionRowsTable
        SET     Column3 = NULL     
        WHERE   Column3 IS NOT NULL
    END