Right way to parallelize pixel access across multiple images using ImageSharp

I'm trying to parallelize the processing of an image using ImageSharp. The documentation at https://docs.sixlabors.com/articles/imagesharp/pixelbuffers.html has an example of processing two images at once with the following code:

// Extract a sub-region of sourceImage as a new image
private static Image<Rgba32> Extract(Image<Rgba32> sourceImage, Rectangle sourceArea)
{
    Image<Rgba32> targetImage = new(sourceArea.Width, sourceArea.Height);
    int height = sourceArea.Height;
    sourceImage.ProcessPixelRows(targetImage, (sourceAccessor, targetAccessor) =>
    {
        for (int i = 0; i < height; i++)
        {
            Span<Rgba32> sourceRow = sourceAccessor.GetRowSpan(sourceArea.Y + i);
            Span<Rgba32> targetRow = targetAccessor.GetRowSpan(i);

            sourceRow.Slice(sourceArea.X, sourceArea.Width).CopyTo(targetRow);
        }
    });

    return targetImage;
}

But that scenario differs from mine in one key way: I need to access totally arbitrary pixels from the source image, like so:

Image<Rgb24> sourceImage = GetImage();
Image<Rgb24> outImage = GetImage();

for (int outY = 0; outY < outImage.Height; outY++)
{
    for (int outX = 0; outX < outImage.Width; outX++)
    {
        // Access arbitrary pixels from the source image based on some calculation,
        // probably a block of between 2x2 and 4x4 pixels.
        var outColor = GetArbitraryPixelFromAnywhereInsideSourceImage(sourceImage, outX, outY);
        outImage[outX, outY] = outColor;
    }
}

I've already tried using the ProcessPixelRows method on the outImage, but I suspect that accessing the pixels in the sourceImage while inside that block prevents parallelization.

Simply replacing the for loops with Parallel.For scrambles the output image.

Note that each outImage pixel is written to exactly once, the sourceImage never changes, and the calculation of the value for the outImage pixel is deterministic based on the source sample.

Solution

Following up on my own question, I had a chance to work through @James's answer and made some discoveries that I thought might be useful to share. If you don't give a shit about what is happening and just want code, skip to the end.

First, I looked into his suggestion to avoid the Advanced namespace and instead use the ProcessPixelRowsAsVector4 variant of the higher-level pixel buffer manipulation API. I discovered that I couldn't use it, because in my case I need the row index (i.e. int y) to do my calculations, and ProcessPixelRowsAsVector4 doesn't provide it.

I opened a discussion here about providing an overload of ProcessPixelRowsAsVector4 that actually does have the row index, so it's possible by the time you're reading this that the library actually has a signature like this that you should try using.

Meanwhile, in the present, I went about implementing James's other suggestion, the ParallelRowIterator.IterateRows solution.

I did get it working, but as I was putting final touches, I realized that Invoke(int y, Span<Rgba32> span) was giving me a span that I wasn't using. What is that span? Why was I not using it? Could I use it? Should I discard it?

This mattered to me because in my case these spans could be between 25,000 and 50,000 pixels long, and about that same number of spans could be allocated, so not doing that if it wasn't necessary seemed like a good idea. (It was possible that it's just returning a pointer to a particular place in already-allocated memory, which makes this less of an issue, but I wanted to know.)
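
To put rough numbers on that worry, here is a back-of-the-envelope sketch. It assumes the worst case, where every row gets its own fresh, unpooled span of Rgba32 (4 bytes per pixel), and it treats the image as square at the two ends of the range above. SpanCost and its members are made-up names for illustration only.

```csharp
using System;

public static class SpanCost
{
    // Rgba32 packs four 8-bit channels, so one pixel takes 4 bytes.
    public const long BytesPerPixel = 4;

    // Bytes needed for a single row-sized span of `width` pixels.
    public static long BytesPerSpan(long width) => width * BytesPerPixel;

    // Worst case: one fresh, unpooled span per row.
    // This is an assumption for the estimate, not a measurement.
    public static long TotalBytes(long width, long rows) => BytesPerSpan(width) * rows;

    // Human-readable summary for a square image with the given side length.
    public static string Describe(long side)
    {
        long perSpanKb = BytesPerSpan(side) / 1_000;
        long totalMb = TotalBytes(side, side) / 1_000_000;
        return $"{side} x {side}: {perSpanKb} KB per span, {totalMb} MB in total";
    }
}
```

SpanCost.Describe(25_000) gives "25000 x 25000: 100 KB per span, 2500 MB in total", and SpanCost.Describe(50_000) gives "50000 x 50000: 200 KB per span, 10000 MB in total". If the buffers are pooled or reused, the real cost would be far lower, which is exactly why it was worth checking.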

My first guess was that the normal use case for IterateRows involved looping over a source image, so maybe that span was like sourceRowSpan. If that was true, maybe I could somehow get the iterator to loop over the destination image and just return destinationRowSpan, without me having to additionally get it inside the loop using DangerousGetRowSpan, which sounds like the kind of method call that wears a leather jacket and disrespects your mother.

But I couldn't see how the iterator was even choosing where to get the span from at all, like my source and destination buffers were just members on my custom RowOperation class with no special relationship to the IRowOperation interface.

So I looked inside the ParallelRowIterator class and followed the rabbit hole into RowOperationWrapper which seemed to be what was actually calling the Invoke, and I discovered 3 things:

1. The span being passed in was just a newly allocated memory span of whatever pixel type. So it WAS, in fact, allocating a bunch of memory for this and not just returning a pointer to the memory.
2. The span wasn't really referencing either the source or the destination image, so I could safely ignore it. The originally suggested code de facto did that already, but I could also explicitly discard it with _ if I wanted.
3. There is a whole other signature that doesn't allocate or pass a span at all! It just passes the row index, which is all I actually need!
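
A dependency-free sketch of the difference between those two shapes might look like this. Everything in it is a stand-in: IBufferedRowOp, IPlainRowOp, RowLoops and the two MarkRows structs are made-up names, the loops are sequential rather than parallel, and renting from ArrayPool is only an assumption about how a scratch buffer could be obtained, not a copy of what ImageSharp does internally.

```csharp
using System;
using System.Buffers;

// Hypothetical stand-ins for the two operation shapes; these are NOT ImageSharp types.
public interface IBufferedRowOp { void Invoke(int y, Span<int> scratch); }
public interface IPlainRowOp { void Invoke(int y); }

public static class RowLoops
{
    // Buffered shape: scratch memory is obtained and handed to every call,
    // whether or not the operation ever touches it.
    public static void RunBuffered<T>(int minY, int maxY, int width, ref T op)
        where T : struct, IBufferedRowOp
    {
        int[] rented = ArrayPool<int>.Shared.Rent(width);
        try
        {
            Span<int> scratch = rented.AsSpan(0, width);
            for (int y = minY; y < maxY; y++)
            {
                op.Invoke(y, scratch);
            }
        }
        finally
        {
            ArrayPool<int>.Shared.Return(rented);
        }
    }

    // Plain shape: only the row index is passed, so no scratch memory is needed.
    public static void RunPlain<T>(int minY, int maxY, ref T op)
        where T : struct, IPlainRowOp
    {
        for (int y = minY; y < maxY; y++)
        {
            op.Invoke(y);
        }
    }
}

// Records which rows were visited; needs nothing but the row index.
public readonly struct MarkRows : IPlainRowOp
{
    private readonly int[] seen;
    public MarkRows(int[] seen) => this.seen = seen;
    public void Invoke(int y) => this.seen[y] = y + 1;
}

// Same work, but forced to accept a scratch span it never uses.
public readonly struct MarkRowsIgnoringScratch : IBufferedRowOp
{
    private readonly int[] seen;
    public MarkRowsIgnoringScratch(int[] seen) => this.seen = seen;
    public void Invoke(int y, Span<int> _) => this.seen[y] = y + 1;
}
```

Both structs produce the same result; the only difference is that the buffered loop has to find scratch memory for a span the operation throws away.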

So having discovered that, I modified the originally suggested code as follows:

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Advanced;
using SixLabors.ImageSharp.Memory;
using SixLabors.ImageSharp.PixelFormats;

using Image<Rgba32> source = new(100, 100);
using Image<Rgba32> destination = new(100, 100);

Configuration configuration = Configuration.Default;

// You need access to individual frame pixel buffers in order
// to access some of the advanced APIs
RowOperation operation = new RowOperation(
    configuration,
    source.Frames[0].PixelBuffer,
    destination.Frames[0].PixelBuffer);

// Ensure we don't go out of bounds
var interest = Rectangle.Intersect(source.Bounds(), destination.Bounds());
ParallelRowIterator.IterateRows<RowOperation>(
    configuration,
    interest,
    in operation);

// Save the output.

private readonly struct RowOperation : IRowOperation
{
    private readonly Buffer2D<Rgba32> source;
    private readonly Buffer2D<Rgba32> destination;
    private readonly Configuration configuration;

    public RowOperation(
        Configuration configuration,
        Buffer2D<Rgba32> source,
        Buffer2D<Rgba32> destination)
    {
        this.source = source;
        this.destination = destination;
        this.configuration = configuration;
    }

    // Called once per row, possibly from several threads at the same time.
    public void Invoke(int y)
    {
        Span<Rgba32> destinationRowSpan = this.destination.DangerousGetRowSpan(y);
        for (int x = 0; x < destinationRowSpan.Length; x++)
        {
            destinationRowSpan[x] = this.GetRandomPixel();
        }
    }

    private Rgba32 GetRandomPixel()
    {
        // Random.Shared (.NET 6+) is thread-safe. A single Random instance
        // shared between parallel Invoke calls is not.
        int y = Random.Shared.Next(this.source.Height);
        int x = Random.Shared.Next(this.source.Width);
        return this.source[x, y];
    }
}

Basically, I made RowOperation implement IRowOperation instead of IRowOperation<Rgba32>, and invoked it with .IterateRows<RowOperation>() instead of .IterateRows<RowOperation, Rgba32>(), so Invoke receives just the row index instead of the row index plus a Span.
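
For the kind of sampling described in the question, the same pattern could be adapted roughly as follows. This is only a sketch: BlockAverageOperation, BlockAverage.Apply and the 2x2 averaging rule are hypothetical placeholders for whatever calculation you actually need, and the code assumes an ImageSharp version that exposes Frames[0].PixelBuffer and DangerousGetRowSpan as used above. Reads are clamped to the source bounds, so the two images do not have to be the same size.

```csharp
using System;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Advanced;
using SixLabors.ImageSharp.Memory;
using SixLabors.ImageSharp.PixelFormats;

// Hypothetical operation: each output pixel is the average of a 2x2 source block.
public readonly struct BlockAverageOperation : IRowOperation
{
    private readonly Buffer2D<Rgb24> source;
    private readonly Buffer2D<Rgb24> destination;

    public BlockAverageOperation(Buffer2D<Rgb24> source, Buffer2D<Rgb24> destination)
    {
        this.source = source;
        this.destination = destination;
    }

    // Each call writes only to row y of the destination, so rows never overlap.
    public void Invoke(int y)
    {
        Span<Rgb24> row = this.destination.DangerousGetRowSpan(y);
        for (int x = 0; x < row.Length; x++)
        {
            row[x] = this.Sample(x, y);
        }
    }

    // The source is only read, never written, so concurrent access is safe.
    private Rgb24 Sample(int x, int y)
    {
        // Clamp so the block never reads outside the source image.
        int x0 = Math.Min(x, this.source.Width - 1);
        int y0 = Math.Min(y, this.source.Height - 1);
        int x1 = Math.Min(x0 + 1, this.source.Width - 1);
        int y1 = Math.Min(y0 + 1, this.source.Height - 1);

        Rgb24 a = this.source[x0, y0];
        Rgb24 b = this.source[x1, y0];
        Rgb24 c = this.source[x0, y1];
        Rgb24 d = this.source[x1, y1];

        return new Rgb24(
            (byte)((a.R + b.R + c.R + d.R) / 4),
            (byte)((a.G + b.G + c.G + d.G) / 4),
            (byte)((a.B + b.B + c.B + d.B) / 4));
    }
}

public static class BlockAverage
{
    // Fills every row of `destination` from `source`, one row per work item.
    public static void Apply(Image<Rgb24> source, Image<Rgb24> destination)
    {
        BlockAverageOperation operation = new(
            source.Frames[0].PixelBuffer,
            destination.Frames[0].PixelBuffer);

        Rectangle rows = new(0, 0, destination.Width, destination.Height);
        ParallelRowIterator.IterateRows<BlockAverageOperation>(
            Configuration.Default,
            rows,
            in operation);
    }
}
```

This relies on the three properties from the question: each destination pixel is written exactly once, the source never changes, and the sample is deterministic. If any of those stops being true, the operation would need its own synchronization.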