I have a list of strings that I need to pass to a process in a different class. I want to know which of the following two ideas would be the better approach in terms of speed, efficiency and parallel processing. The list contains roughly 10,000 strings, and I want to limit the threads so that only about 5 threads are running at any one time:
For i As Integer = 0 To searchPages.Count - 1
    Parallel.For(0, 10, Sub(x)
        ps.processPage(searchPages.Item(i))
    End Sub)
Next
The task factory approach seems to work fine, but I am not sure which one to implement:
For i As Integer = 0 To searchPages.Count - 1
    Dim fact As Task = Task.Factory.StartNew(Sub() ps.processPage(searchPages.Item(i)))
    If i = 11 Then
        Tasks.Task.WaitAll()
    End If
Next
Any ideas would be appreciated.
For this type of pure data parallelism, I would recommend using Parallel.ForEach:
Parallel.ForEach(searchPages, Sub(page) ps.processPage(page))
If you want to restrict this to use 5 threads, you can do that via ParallelOptions.MaxDegreeOfParallelism:
Dim po as New ParallelOptions
po.MaxDegreeOfParallelism = 5
Parallel.ForEach(searchPages, po, Sub(page) ps.processPage(page))
This will have less overhead than Task.Factory.StartNew, since the partitioning within the Parallel class will reuse Tasks and prevent over-scheduling from occurring. It will also use the current thread for some of the processing instead of forcing it into a wait state, which further reduces the total overhead involved.
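If you want to convince yourself that the limit is respected, here is a minimal sketch that records the highest number of calls in flight at once. The pages list, the Thread.Sleep call and the counters are hypothetical stand-ins for your searchPages and ps.processPage, so treat it as an illustration rather than a benchmark:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Threading
Imports System.Threading.Tasks

Module ParallelLimitDemo
    Sub Main()
        ' Hypothetical stand-in for searchPages.
        Dim pages As List(Of String) =
            Enumerable.Range(0, 200).Select(Function(n) "page" & n.ToString()).ToList()

        Dim po As New ParallelOptions With {.MaxDegreeOfParallelism = 5}

        Dim running As Integer = 0   ' calls currently executing
        Dim peak As Integer = 0      ' highest value of running observed
        Dim gate As New Object()

        Parallel.ForEach(pages, po,
            Sub(page)
                Dim current As Integer = Interlocked.Increment(running)
                SyncLock gate
                    If current > peak Then peak = current
                End SyncLock

                Thread.Sleep(5)      ' stand-in for ps.processPage(page)

                Interlocked.Decrement(running)
            End Sub)

        ' Parallel.ForEach blocks until every item is processed,
        ' so peak is final by the time we get here.
        Console.WriteLine("Peak concurrent calls: " & peak.ToString())
    End Sub
End Module
```

The reported peak should not exceed 5. It can be lower, because MaxDegreeOfParallelism is an upper bound and the scheduler is free to use fewer threads.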