Tags: sql-server, sql-server-2014

How to copy a large number of rows from one table to another in the same database?


I have two tables with same column structure in the same database: TableA and TableB.

TableA doesn't have any indexes, but TableB has a non-clustered unique index.

TableA has 290 million rows of data that need to be copied to TableB.

As they both have the same structure, I've tried

INSERT INTO TableB 
    SELECT * 
    FROM TableA;

It ran for hours and produced a huge transaction log that eventually filled the disk, so the disk ran out of space and the query was killed.

I can shrink the log file. How can I copy this many rows of data to another table efficiently?


Solution

  • First of all, disable the index on TableB before inserting the rows. You can do it using T-SQL:

    ALTER INDEX IX_Index_Name ON dbo.TableB DISABLE;  
    

    Make sure to disable all the constraints (foreign keys, check constraints, unique indexes) on your destination table.

    Re-enable (and rebuild) them after the load is complete.
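
    For example, a minimal sketch of the disable/re-enable cycle (reusing the index name above; WITH CHECK re-validates all rows when the constraints come back on):

    -- Disable foreign key and check constraints on the destination
    ALTER TABLE dbo.TableB NOCHECK CONSTRAINT ALL;

    -- ... perform the load here ...

    -- Re-enable the constraints and re-validate the loaded rows
    ALTER TABLE dbo.TableB WITH CHECK CHECK CONSTRAINT ALL;

    -- Rebuilding the disabled index re-enables it
    ALTER INDEX IX_Index_Name ON dbo.TableB REBUILD;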

    Now, there are a couple of approaches to solve the problem:

    1. You have to be OK with a slight chance of data loss: use the INSERT INTO ... SELECT ... FROM ... syntax you already have, but switch your database to the Bulk-logged recovery model first (read up on the implications before switching; there's a sketch after this list). This won't help if you're already in Bulk-logged or Simple.
    2. With exporting the data first: you can use the BCP utility to export and re-import the data. It supports loading data in batches. Read more about using the BCP utility here.
    3. Fancy, with exporting the data first: with SQL Server 2012+ you can export the data into a binary file (using the BCP utility) and load it with the BULK INSERT statement, setting the ROWS_PER_BATCH option (both are sketched after this list).
    4. Old-school "I don't give a damn" method: to keep the log from filling up, perform the inserts in batches of rows rather than everything at once. If your database is running in the Full recovery model you will need to keep log backups running, and maybe even increase the frequency of that job.

      To batch-load the rows you will need a WHILE loop (avoid loops in day-to-day, set-based work; they're fine for batch loads like this). Something like the following will work if you have a numeric identifier column in the dbo.TableA table:

      DECLARE @RowsToLoad BIGINT;
      DECLARE @RowsPerBatch INT = 5000;
      DECLARE @LeftBoundary BIGINT = 0;
      DECLARE @RightBoundary BIGINT = @RowsPerBatch;

      -- Upper bound for the loop: the highest identifier value in the source
      SELECT @RowsToLoad = MAX(IdentifierColumn) FROM dbo.TableA;

      WHILE @LeftBoundary < @RowsToLoad
      BEGIN
          -- Copy one slice of identifier values per iteration
          INSERT INTO dbo.TableB (Column1, Column2)
          SELECT
              tA.Column1,
              tA.Column2
          FROM
              dbo.TableA AS tA
          WHERE
              tA.IdentifierColumn > @LeftBoundary
              AND tA.IdentifierColumn <= @RightBoundary;

          -- Slide the window to the next batch
          SET @LeftBoundary = @LeftBoundary + @RowsPerBatch;
          SET @RightBoundary = @RightBoundary + @RowsPerBatch;
      END;
      

      For this to work effectively you really want to consider creating an index on dbo.TableA (IdentifierColumn), just for the time you're running the load, as sketched below.
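
      A sketch of that temporary index (the name is hypothetical):

      CREATE NONCLUSTERED INDEX IX_TableA_IdentifierColumn
          ON dbo.TableA (IdentifierColumn);

      -- Drop it once the load is finished
      DROP INDEX IX_TableA_IdentifierColumn ON dbo.TableA;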
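
    Going back to approach 1, a minimal sketch of the recovery-model switch (the database name is hypothetical; note that INSERT ... SELECT is only minimally logged if the destination gets a TABLOCK hint, and with its index disabled TableB is effectively a heap):

    ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

    -- TABLOCK makes the insert eligible for minimal logging
    INSERT INTO dbo.TableB WITH (TABLOCK)
    SELECT * FROM dbo.TableA;

    ALTER DATABASE MyDatabase SET RECOVERY FULL;
    -- Take a log backup after switching back to re-establish the log chain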
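
    And for approaches 2 and 3, a command-line sketch (server name, file path, and batch size are assumptions; -n uses native binary format, -T a trusted connection, -b the batch size):

    bcp MyDatabase.dbo.TableA out C:\temp\TableA.dat -S MyServer -T -n
    bcp MyDatabase.dbo.TableB in C:\temp\TableA.dat -S MyServer -T -n -b 100000

    Instead of bcp in, the exported file can also be loaded with BULK INSERT:

    BULK INSERT dbo.TableB
    FROM 'C:\temp\TableA.dat'
    WITH (DATAFILETYPE = 'native', ROWS_PER_BATCH = 100000, TABLOCK);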