Search code examples
c#sqlt-sqletlstaging-table

Best strategy to perform ETL from staging to warehouse


I am building a C# console application that will be executed by SQL Server Agent in a scenario where we want to transfer a big amount of employee data:

  1. Between two employee tables with the same schema on two different networks from one SQL Server instance towards a second SQL Server instance.

  2. Between two employee tables on the same SQL Server instance (without network latency), one table being the staging table and the other being the warehouse.

The process is sequential, so the ETL job would have to transfer #1 first, then #2.

I was wondering what was the best strategy to adopt in order to get the best performance considering that the process #1 is inter-network and process #2 involves deleting the destination table before transferring the new data to it.

Here is what I thought of doing:

Process #2:

BEGIN TRANSACTION EmployeesInsert
    WITH MARK N'Transfering employees from Staging to Warehouse';
GO
USE CorporateWarehouse;
GO
DELETE FROM CorporateWarehouse.WarehouseEmployee
INSERT INTO CorporateWarehouse.WarehouseEmployee
            (FirstName,
             LastName,
             Address,
             PhoneNumber)
SELECT FirstName,
       LastName,
       Address,
       PhoneNumber
FROM CorporateWarehouse.StagingEmployee
GO
COMMIT TRANSACTION EmployeesInsert;
GO

Solution

  • After considering SSIS, BCP and linked servers, I finally went with the decision to implement a solution using the SqlBulkCopy class. Starting from a list of users obtained by the GetUserDataReader() method that returns a IDataReader type object.

    This option was chosen because of the UseInternalTransaction option, you can explicitly cause it to execute in its own transaction:

    var usersDataReader = _warehouseRepository.GetUserDataReader();
    var connectionString = ConfigurationManager.ConnectionStrings["CorporateWarehouse"].ToString();
    
    using (var bulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.UseInternalTransaction | SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.BatchSize = extractInfo.BatchSize;
        bulkCopy.BulkCopyTimeout = extractInfo.BatchTimeout;
        bulkCopy.DestinationTableName = "StagingEmployee";
    
        try
        {
            bulkCopy.WriteToServer(usersDataReader);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
        finally
        {
            usersDataReader.Close();
        }
    }