.net  vb.net  visual-studio-2010  task-parallel-library  taskfactory

Parallel.For or Task.Factory.StartNew for multithreaded processing?


I have a list of strings that I need to pass to a process in a different class. I want to know which of the two approaches below would be better in terms of speed, efficiency, and parallel processing. The list contains roughly 10,000 strings, and I want to limit the work to about 5 threads running at one time:

For i as integer = 0 to searchPages.Count - 1
    Parallel.For(0,10,Sub(x)
                        ps.processPage(searchPages.Item(i))
                 End Sub)
Next

The task factory version seems to work fine, but I'm not sure which one to implement.

For i As Integer = 0 To searchPages.Count - 1
    Dim fact As Task = Task.Factory.StartNew(Sub() ps.processPage(searchPages.Item(i)))
    If i = 11 Then
        Tasks.Task.WaitAll()
    End If
Next

Any ideas appreciated.


Solution

  • For this type of pure data parallelism, I would recommend using Parallel.ForEach:

    Parallel.ForEach(searchPages, Sub(page) ps.processPage(page))
    

    If you want to restrict this to 5 threads, you can do that via ParallelOptions.MaxDegreeOfParallelism, which caps the number of concurrent operations:

    Dim po as New ParallelOptions
    po.MaxDegreeOfParallelism = 5
    Parallel.ForEach(searchPages, po, Sub(page) ps.processPage(page))
    

    This will have less overhead than Task.Factory.StartNew, since the partitioning within the Parallel class reuses Tasks and prevents over-scheduling. It also uses the current thread for some of the processing instead of forcing it into a wait state, which further reduces the total overhead.
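
To see the concurrency cap in action, here is a minimal, self-contained sketch. The question does not show the real processPage, so the PageProcessor class below is a hypothetical stand-in that sleeps briefly and records how many calls overlap. The ProcessAll helper, the Demo module, and the generated page names are likewise assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Threading
Imports System.Threading.Tasks

' Hypothetical stand-in for the asker's class; the real processPage is not shown.
Public Class PageProcessor
    Private _running As Integer = 0
    Private _peak As Integer = 0
    Private _processed As Integer = 0

    Public ReadOnly Property Peak As Integer
        Get
            Return _peak
        End Get
    End Property

    Public ReadOnly Property Processed As Integer
        Get
            Return _processed
        End Get
    End Property

    Public Sub processPage(ByVal page As String)
        Dim current As Integer = Interlocked.Increment(_running)

        ' Record the highest number of overlapping calls seen so far.
        Dim seen As Integer
        Do
            seen = _peak
        Loop While current > seen AndAlso
                   Interlocked.CompareExchange(_peak, current, seen) <> seen

        Thread.Sleep(5) ' simulated work
        Interlocked.Increment(_processed)
        Interlocked.Decrement(_running)
    End Sub
End Class

Public Module Demo
    ' Processes every page with at most maxParallel calls running at once.
    Public Function ProcessAll(ByVal pages As IEnumerable(Of String),
                               ByVal maxParallel As Integer) As PageProcessor
        Dim ps As New PageProcessor()
        Dim po As New ParallelOptions()
        po.MaxDegreeOfParallelism = maxParallel

        ' Blocks until all pages are processed; no manual WaitAll is needed.
        Parallel.ForEach(pages, po, Sub(page) ps.processPage(page))
        Return ps
    End Function

    Public Sub Main()
        Dim searchPages As List(Of String) =
            Enumerable.Range(1, 200).Select(Function(n) "page" & n.ToString()).ToList()

        Dim ps As PageProcessor = ProcessAll(searchPages, 5)
        Console.WriteLine("Processed: {0}", ps.Processed)
        Console.WriteLine("Peak concurrency: {0}", ps.Peak)
    End Sub
End Module
```

Every page should be processed exactly once, and the reported peak should not exceed 5. The exact peak can vary from run to run, because MaxDegreeOfParallelism is an upper bound rather than a guarantee that 5 operations run at all times.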