Search code examples
c#parallel-processingenumerator

C# parallel mapping iterator retaining order


I have an Enumerator<T> as input, a mapping function that converts a T to an S, and I want to deliver an Enumerator<S> containing the results of the mapping. The mapping function is potentially expensive (for example, it might parse a 1Mb XML document) so I want to process multiple items in parallel; but I want the output Enumerator<S> to retain order. This means there will have to be some buffering; but I don't want to retain the whole sequence in memory.

I currently have this working using a Parallel.ForEach() and a ChannelWriter/ChannelReader pair: the callback in the ForEach applies the mapping function and writes the result to the ChannelWriter, and the result Enumerator reads the items using the ChannelReader. The only problem is that this doesn't retain order (mapped items are delivered in the order they become available, which is different from the input order).

I get the impression that there's a role here for an OrderablePartitioner, but I'm having great difficulty finding any usable information on how to use this class.


Solution

  • The simplest solution is probably the PLINQ library. The Select operator does the mapping, the AsOrdered ensures the preservation of order, and a couple of options regarding buffering ensures that the PLINQ buffers as little as possible:

    public static IEnumerable<TResult> Map<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> mapping,
        int degreeOfParallelism)
    {
        return Partitioner
            .Create(source, EnumerablePartitionerOptions.NoBuffering)
            .AsParallel()
            .AsOrdered()
            .WithDegreeOfParallelism(degreeOfParallelism)
            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
            .Select(mapping);
    }
    

    In case you wish to chain the Map method multiple times consecutively in a pattern of a pipeline, see this answer.