sql-server, oracle-database, t-sql, database-administration, data-migration

Is there a way to optimize data migration from Oracle to SQL Server and reduce the time spent?


I am migrating data from an Oracle (12c, non-pluggable) database to SQL Server (2012) using T-SQL scripts in SSMS, with the Oracle DB set up as a linked server. The schemas are already synchronized, so the only thing left is the data. The issues I face are the time the migration takes and the amount of memory it consumes.

When running my script it takes a very long time, and initially I ran into memory issues on the SQL Server machine when migrating full tables, so I decided to split the migration of each table into chunks of 1,000,000 rows at a time. However, it seems as if there is a "memory leak" in my code: the memory consumed by the query increases with every iteration, and the query takes a very long time.

EDIT: I have dropped the indexes in the MS SQL Server database.

My script works, but when migrating the larger tables the query closes in on the available memory, and the migration takes around 5 minutes per million rows at the start (increasing slowly with every iteration). The time also depends on the number of rows in the table, of course.

Data statistics:

  • Tables: ~1,600
  • Rows in total: ~1 billion (largest table: ~300 million rows)

USE INFODBA
GO

SET NOCOUNT ON

DECLARE @start BIGINT
DECLARE @end BIGINT
DECLARE @maxrows BIGINT
DECLARE @step BIGINT
DECLARE @sqlstr NVARCHAR(4000)
DECLARE @table_name VARCHAR(255)
DECLARE @counter INT
DECLARE @time TIME
DECLARE @error NVARCHAR(4000)

-- Iterates in @step rows at a time
SET @step = 1000000;
SET @start = 0;
SET @end = @start + @step;
SET @counter = 1;

SET @table_name = 'sourceTable'
PRINT @table_name;

-- GET exact rowcount of Oracle table
SELECT @maxrows = NUM_ROWS FROM OPENQUERY(ORACLETC, 'SELECT COUNT(*) AS NUM_ROWS FROM sourceTable')

WHILE @start < @maxrows
BEGIN
    SELECT @time = CONVERT (time, CURRENT_TIMESTAMP)

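    -- NOTE: the Oracle-side OFFSET/FETCH below has no ORDER BY, so the row order
    -- is not guaranteed and successive chunks can overlap or skip rows; ordering by
    -- a unique key in the Oracle query makes the paging deterministic. OFFSET also
    -- forces Oracle to produce and discard @start rows on every iteration, which is
    -- likely part of why later chunks get slower.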
    SET @sqlstr = 'INSERT INTO targetTable SELECT * FROM OPENQUERY(ORACLETC,''SELECT COL1,COL2,COL3,COL4 FROM sourceTable'
    SET @sqlstr = @sqlstr + ' OFFSET ' + CAST(@start AS NVARCHAR(255)) + ' ROWS FETCH NEXT ' + CAST(@step AS NVARCHAR(255)) + ' ROWS ONLY'') AS ROWSET_1';

    -- Print output immediately to capture progress
    PRINT 'Iteration;' + CAST(@counter AS VARCHAR(255)) + ';Time;' + CAST(@time AS VARCHAR(255)) + ';Start;' + CAST(@start AS VARCHAR(255)) + ';End;' + CAST(@end AS VARCHAR(255)) + ';MAX;' + CAST(@maxrows AS VARCHAR(255)) + ';Query;' + @sqlstr
    RAISERROR (N'', 0, 1) WITH NOWAIT

    -- Start the migration query and catch error messages
    BEGIN TRY
        BEGIN TRANSACTION;
            EXEC dbo.sp_executesql @sqlstr
        COMMIT;
    END TRY
    BEGIN CATCH
        SELECT @error = ERROR_MESSAGE();
        PRINT 'ERROR on iteration: ' + CAST(@counter AS VARCHAR(255)) + ' with query: ' + @sqlstr + ' - Error: ' + @error
        SELECT ERROR_MESSAGE() AS ErrorMessage;
        RETURN
    END CATCH

    SET @counter += 1
    SET @start = @end
    SET @end += @step
END

This script migrates the data in sets of 1 million rows, but now the Oracle query seems to take most of the time (about 80% according to the execution plan). Also, although I try to use BEGIN TRANSACTION and COMMIT (there may be a better way to use them), the script, as mentioned, increases its memory footprint with each iteration: most of the memory is released on COMMIT, but it still creeps up slowly in the background.


Solution

  • There are a number of approaches.

    1. Remove the indexes on the SQL Server table you import the data into and recreate them after the import (see the index sketch after this list).

    2. Separate export, transfer and load. That means: extract the data on the Oracle server to plain text files, transfer them to the SQL Server machine by FTP, then load them into SQL Server.

    3. (Most effective way) Use the SQL Server bulk import facilities: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/bulk-import-and-export-of-data-sql-server?view=sql-server-2017 In this case your batches of 1M records are not processed through the whole database engine row by row but are bulk-written to the database files, which is much faster (see the BULK INSERT sketch below).
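
For approach 1, a minimal T-SQL sketch of the disable/rebuild pattern (the index name IX_targetTable_COL2 is hypothetical; substitute the real nonclustered indexes on targetTable):

-- Disable nonclustered indexes before the load so SQL Server does not
-- have to maintain them for every inserted chunk.
ALTER INDEX IX_targetTable_COL2 ON dbo.targetTable DISABLE;

-- ... run the chunked INSERT loop (or the bulk load below) here ...

-- Rebuild once after all rows are in; a single rebuild is far cheaper
-- than incremental index maintenance during the load.
ALTER INDEX IX_targetTable_COL2 ON dbo.targetTable REBUILD;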
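
For approaches 2 and 3, once the Oracle data has been spooled to flat files and copied to the SQL Server machine, a BULK INSERT along these lines can load it with minimal logging (assuming SIMPLE or BULK_LOGGED recovery); the file path and delimiters here are assumptions and must match however the export was formatted:

BULK INSERT dbo.targetTable
FROM 'D:\migration\sourceTable.dat'    -- hypothetical path to the exported flat file
WITH (
    FIELDTERMINATOR = '|',             -- assumed column delimiter used during the export
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 1000000,         -- commit in 1M-row batches, like the original loop
    TABLOCK                            -- table lock, one of the requirements for minimally logged bulk import
);

Running bcp in from the command line achieves the same thing outside SSMS, and either way the linked-server round trips are avoided entirely.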