I have two tables with same column structure in the same database: TableA
and TableB
.
TableA
doesn't have any indexes, but TableB
has a non-clustered unique index.
TableA
has 290 Million rows of data that needs to be copied to TableB
.
As they both have same structure, I've tried
INSERT INTO TableB
SELECT *
FROM TableA;
It was executing for hours and produced a huge log file that filled the disk. As a result the disk ran out of space and the query was killed.
I can shrink the log file. How can I copy these many rows of data to another table efficiently?
First of all, disable the index on TableB
before inserting the rows. You can do it using T-SQL:
ALTER INDEX IX_Index_Name ON dbo.TableB DISABLE;
Make sure to disable all the constraints (foreign keys, check constraints, unique indexes) on your destination table.
Re-enable (and rebuild) them after the load is complete.
Now, there's a couple of approaches to solve the problem:
INSERT INTO ... SELECT ... FROM ...
syntax you have but switch your database to Bulk-logged recovery mode first (read before switching). Won't help if you're already in Bulk-logged or Simple.ROWS_PER_BATCH
option.Old-school "I don't give a damn" method: to prevent the log from filling up you will need to perform the inserts in batches of rows, not everything at once. If your database is running in Full recovery mode you will need to keep log backups running, maybe even trying to increase the frequency of the job.
To batch-load your rows you will need a WHILE
(don't use them in
day-to-day stuff, just for batch loads), something like the
following will work if you have an identifier in the dbo.TableA
table:
DECLARE @RowsToLoad BIGINT;
DECLARE @RowsPerBatch INT = 5000;
DECLARE @LeftBoundary BIGINT = 0;
DECLARE @RightBoundary BIGINT = @RowsPerBatch;
SELECT @RowsToLoad = MAX(IdentifierColumn) dbo.FROM TableA
WHILE @LeftBoundary < @RowsToLoad
BEGIN
INSERT INTO TableB (Column1, Column2)
SELECT
tA.Column1,
tB.Column2
FROM
dbo.TableA as tA
WHERE
tA.IdentifierColumn > @LeftBoundary
AND tA.IdentifierColumn <= @RightBoundary
SET @LeftBoundary = @LeftBoundary + @RowsPerBatch;
SET @RightBoundary = @RightBoundary + @RowsPerBatch;
END
For this to work effectively you really want to consider creating an
index on dbo.TableA (IdentifierColumn)
just for the time you're
running the load.