Search code examples
ssislarge-data-volumesforeach-loop-container

How to use for each loop to help load large dataset


I'm trying to load a large dataset from SQL Server 2008 in SSIS. However, it's too slow for Visual Studio load everything at once. Then I decide to use for-each loop to just load part of the table each time.

E.g. If there are 10 million records, I wish I could just load 1 million each time and run for 10 times to complete processing.

This is just my "brain design" and I've no idea how to make it with Foreach Loop component. Is there any other approach to deal with a large dataset?


Solution

  • The best way in my opinion is to functionally partition your data. A date column is in most cases appropriate to do this. Let's take an order date as an example.

    For that column, find the best denominator, for example each year of your order date produces about a million rows.

    Instead of a for each loop container, use a for loop container.

    To make this loop work, you'll have to find the minimum and maximum year of all order dates in your source data. These can be retrieved with SQL statements that save their scalar result into SSIS variables.

    Next, set up your for loop container to loop between the minimum and maximum year that you stored in variables earlier, adding one year per iteration.

    Lastly, to actually retrieve your data, you'll have to save your source SQL statement as an expression in a variable with a where clause that specifies the current year of produced by your for loop container:

    "SELECT * FROM transactions WHERE YEAR(OrderDate) = " + @[User::ForLoopCurrentYear]
    

    Now you can use this variable in a data flow source to retrieve your partitioned data.

    Edit:

    A different solution using a for each loop container would be to retrieve your partition keys with a Execute SQL Task and saving that resultset in a SSIS variable of type Object:

    SELECT YEAR(OrderDate) FROM transaction GROUP BY YEAR(OrderDate)
    

    With a for each loop container you can loop through the object using the ADO enumerator and use the same method as above to inject the current year into your source SQL statement.