I've written a parallel algorithm in C# to partition an array into two lists, one that contains elements which satisfies a given predicate and the other list contains the elements that fails to satisfy the predicate. It is an order preserving algorithm.
I have written it as follows, but I want to know how to maximize the opportunity to profit from hardware concurrency.
static void TestPLinqPartition(int cnt = 1000000)
{
Console.WriteLine("PLINQ Partition");
var a = RandomSequenceOfValuesLessThan100(cnt).ToArray();
var sw = new Stopwatch();
sw.Start();
var ap = a.AsParallel();
List<int> partA = null;
List<int> partB = null;
Action actionA = () => { partA = (from x in ap where x < 25 select x).ToList(); };
Action actionB = () => { partB = (from x in ap where !(x < 25) select x).ToList(); };
Parallel.Invoke(actionA, actionB);
sw.Stop();
Console.WriteLine("Partion sizes = {0} and {1}", partA.Count, partB.Count);
Console.WriteLine("Time elapsed = {0} msec", sw.ElapsedMilliseconds);
}
I'd partition the data into small segments (e.g. using the Partitioner class), and assign an index to each partition in relation to its position. For each numbered partition I'd create a Task that splits the partition into two groups, one that matches the predicate and one that doesn't and return the two groups, together with the index of the partition that they originated from, as the Task's return value. I'd then wait for all the tasks to complete, and then .Concat()
(to prevent wasting time on actually merging all the data) the matching groups according to their index, and the same for the non-matching groups. You should be able to achieve an arbitrary degree of parallelism this way, while preserving relative item order.