I have C# list which contains around 8000 items (file paths). I want to run a method on all of these items in parallel. For this i have below 2 options:
1) Manually divide list into small-small chunks (say of 500 size each) and create array of actions for these small lists and then call Parallel.Invoke like below:
var partitionedLists = MainList.DivideIntoChunks(500);
List<Action> actions = new List<Action>();
foreach (var lst in partitionedLists)
{
actions.Add(() => CallMethod(lst));
}
Parallel.Invoke(actions.ToArray())
2) Second option is to run Parallel.ForEach like below
Parallel.ForEach(MainList, item => { CallMethod(item) });
Please suggest, thanks in advance.
The first option is a form of task-parallelism
, in which you divide your task into group of sub-tasks and execute them in parallel. As is obvious from the code you provided, you are responsible for choosing the level of granularity [chunks] while creating the sub-tasks. The selected granularity might be too big or too low, if one does not rely on appropriate heuristics, and the resulting performance gain might not be significant. Task-parallelism
is used in scenarios where the operation to be performed takes similar time for all input values.
The second option is a form of data-parallelism
, in which the input data is divided into smaller chunks based on the number of hardware threads/cores/processors available, and then each individual chunk is processed in isolation. In this case, the .NET library chooses the right level of granularity for you and ensures better CPU utilization. Conventionally, data-parallelism
is used in scenarios when the operation to be performed can vary in terms of time taken, depending on the input value.
In conclusion, if your operation is more or less uniform over the range of input values and you know the right granularity [chunk size], go ahead with the first option. If however that's not the case or if you are unsure about the above questions, go with the second option which usually pans out better in most scenarios.
NOTE: If this is a very performance critical component of your application, I will advise bench-marking the performances in production like environment with both approaches to get more data, in addition to the above recommendations.