Search code examples
concurrencyssisscaleetl

How to scale concurrent ETL tasks up to an arbitrary number in SSIS?


Problem (See Context below)

How can I scale individual tasks (e.g. downloading and parsing) to an arbitrary number of concurrent executions (e.g. 500) in SSIS?

Setup Description

Our setup is that we have a list of feed urls we want to visit, get all items and insert them into the database.

Currently a php script downloads them concurrently, parses them sequentially and dumps them into csv which are later on inserted into the database using load data infile. ETL packages can handle one way or another all steps above.


Solution

  • This is controlled by the Package Property: MaxConcurrentExecutables. The default is -1 which means machine cores x 2, and usually works well.

    You can also affect this by setting EngineThreads on each Data Flow Task.

    Here's a good summary: http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx