Search code examples
sql-serverparallel-processingbulkinsertsqlbulkcopybulk-load

Importing Data in Parallel in SQL Server


I have more than 100 files to import into the sql server and most of them are of 500 MB. I want to leverage the the parallel import utility of SQL server and have read a number of webpages, like the following ones:

How to load 1 TB data in 30 minutes

https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx

Importing Data in Parallel with Table Level Locking

https://technet.microsoft.com/en-us/library/ms186341(v=sql.105).aspx

Controlling Locking Behavior for Bulk Import

https://technet.microsoft.com/en-us/library/ms180876(v=sql.105).aspx

and the answers in stackoverflow

Fastest way to insert in parallel to a single table

However, none of them have given a simple example with code. I know how to use bulk insert/bcp, but I don't know where to start with parallel import? Can anyone help me with it?

My system is Windows and I'm using SQL server 2016. The source data file is in txt format.

Thanks in advance for your help!

Jason


Solution

    1. Load the file path details into a tracking table

      Create table FileListCollection TABLE (Id int identity(1,1), filepath VARCHAR(500), ThreadNo tinyint, isLoaded int)
      
      DECLARE @FileListCollection TABLE (filepath VARCHAR(500))
      DECLARE @folderpath NVARCHAR(500)
      DECLARE @cmd NVARCHAR(100)
      SET @folderpath = '<FolderPath>' 
      SET @cmd = 'dir ' + @folderpath + ' /b /s'
      
      INSERT INTO @FileListCollection
      EXECUTE xp_cmdshell @cmd
      
      DELETE
      FROM @FileListCollection
      WHERE filepath IS NULL
      
      insert into FileListCollection(filepath, isLoaded)
      select filepath, 0
      from @FileListCollection
      
    2. Schedule for each thread

      declare @ThreadNo int = 3
      update f set ThreadNo=(id%@ThreadNo)
      from FileListCollection f 
      
    3. Open three session and assign thread number to each

    4. Run the below script to load the data

      DECLARE @filepath NVARCHAR(500)
      DECLARE @filepath NVARCHAR(500)
      DECLARE @bcpquery NVARCHAR(MAX);
      DECLARE @ThreadNo int = 1
      WHILE EXISTS (
              SELECT TOP 1 *
              FROM FileListCollection
              where ThreadNo = @ThreadNo
              and isLoaded = 0
              )
      BEGIN
          SELECT TOP 1 @filepath = filepath
          FROM FileListCollection
          where ThreadNo = @ThreadNo
              and isLoaded = 0
      
          SET @bcpquery = 'bulk insert <Database>.dbo.Table from '''+ @filepath+''' with (fieldterminator = ''|'', rowterminator = ''\n'')';
      
          print @bcpquery
          --Load the Content in table
          execute sp_executesql @bcpquery;
      
          Update FileListCollection set isLoaded = 1
          WHERE filepath = @filepath
      END