We are having a table with billion rows. We just want to UPDATE two columns to NULL for all rows. We were trying to do this in batches. But, we are concerned about the performance issues. We are having clustered columnstore index on this table. Does creating composite non-clustered index on these three columns would help ? or can we re-load the table to new table & swap the tables ?
Any inputs will be very helpful.
DECLARE @notNULLRecordsCount INT;
SET @notNULLRecordsCount = 1;
WHILE @notNULLRecordsCount > 0
BEGIN
BEGIN TRANSACTION;
UPDATE TOP (100000)
dbo.BillionRowsTable
SET Column1 = NULL,
Column2 = NULL,
Column3 = NULL
WHERE Column1 IS NOT NULL OR Column2 IS NOT NULL OR Column3 IS NOT NULL;
SET @notNULLRecordsCount = @@ROWCOUNT;
COMMIT TRANSACTION;
END
My experience is that indexes actually slow this down as the indexes must be maintained.
Leaving a transaction open is not good. Wrapping all these in a transaction is not good. If you are going to wrap them all in a transaction then no purpose to breaking them up.
If you could key on just one column (no or
) it would be faster.
You can shorten this
select 1;
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE TOP (100000)
dbo.BillionRowsTable
SET Column1 = NULL,
Column2 = NULL,
Column3 = NULL
WHERE Column1 IS NOT NULL
OR Column2 IS NOT NULL
OR Column3 IS NOT NULL
END
To get rid of the or
select 1;
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE TOP (100000)
dbo.BillionRowsTable
SET Column1 = NULL,
Column2 = NULL,
Column3 = NULL
WHERE Column1 IS NOT NULL
END
select 1;
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE TOP (100000)
dbo.BillionRowsTable
SET Column2 = NULL,
Column3 = NULL
WHERE Column2 IS NOT NULL
END
select 1;
WHILE @@ROWCOUNT > 0
BEGIN
UPDATE TOP (100000)
dbo.BillionRowsTable
SET Column3 = NULL
WHERE Column3 IS NOT NULL
END