c# .net-core garbage-collection unsafe-pointers

Unsafe slicing a 2D array with ReadOnlySpan<T>


On .NET Core 3.1, I have many large 2D arrays where I need to operate on a slice of a single row in the array. The same slice may be used by multiple operations, so I would like to create the slice just once and reuse it.

The sample code below slices an array and then calls 2 functions to operate on the slice.

public void MyFunc()
{
    double[,] array = ...;  // populate the array

    // select which part of the array to slice, values not important
    int index0 = 0;
    int startIndex1 = 1;
    int sliceLength = 2;

    // slice the array
    ReadOnlySpan<double> slice = Slice(array, index0, startIndex1, sliceLength);

    // do things with the slice
    DoSomething1(slice);
    DoSomething2(slice);
}

public unsafe ReadOnlySpan<double> Slice(double[,] array, int index0, int startIndex1, int sliceLength)
{
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int arrayStartIndex = index0 * array.GetLength(1) + startIndex1;
    ReadOnlySpan<double> slice;
    fixed (double* arrayPtr = array)
    {
        slice = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(arrayStartIndex, sliceLength);
    }

    // does it matter if slice is returned inside or outside of the fixed block?
    return slice;
}

public void DoSomething1(ReadOnlySpan<double> slice)
{
    ...
}

public void DoSomething2(ReadOnlySpan<double> slice)
{
    ...
}

"Fixed" ensures that the GC won't move "array" while I'm creating "slice". After creating "slice", if the GC moves "array", will it also update "slice" to refer to the new "array" address or will "slice" still reference the old address? In other words, will DoSomething1(...) and DoSomething2(...) always operate on the intended slice of the original array, or could they inadvertently operate on a random block of memory?

Also, does it matter whether "return slice;" is inside or outside the "fixed" block?

EDIT: With inspiration from https://stackoverflow.com/a/40589439/13532170 I managed to write a test to prove V0ldek is correct about the GC updating the ReadOnlySpan's address when the parent array is moved.

public static unsafe void ReadOnlySpanTest()
{
    // create 2D array
    double[,] array = new double[,] { {1, 2, 3}, {4, 5, 6} };

    // parameters to convert 2D array to 1D span
    int arrayLength = array.GetLength(0) * array.GetLength(1);
    int sliceStartIndex = 1;
    int sliceLength = 2;

    // create span
    IntPtr arrayAddressBeforeMove;
    ReadOnlySpan<double> spanFromPointer;
    fixed (double* arrayPtr = array)
    {
        arrayAddressBeforeMove = (IntPtr)arrayPtr;

        // spanFromPointer should contain { 2, 3 }
        spanFromPointer = new ReadOnlySpan<double>(arrayPtr, arrayLength).Slice(sliceStartIndex, sliceLength);
    }

    // trick GC into moving the array
    GC.AddMemoryPressure(10000000);
    GC.Collect();
    GC.RemoveMemoryPressure(10000000);

    // check array address and span contents again
    IntPtr arrayAddressAfterMove;
    fixed (double* arrayPtr = array)
    {
        // arrayAddressAfterMove should be different from arrayAddressBeforeMove
        arrayAddressAfterMove = (IntPtr) arrayPtr;

        // spanFromPointer should still contain { 2, 3 }
    }
}

Stepping through ReadOnlySpanTest() in the debugger, I can see that arrayAddressAfterMove != arrayAddressBeforeMove, indicating that the GC did move my array. I can also see that spanFromPointer contains { 2, 3 } both before and after the array was moved. So it doesn't matter that the ReadOnlySpan was created inside a "fixed" block; it can still be used safely after leaving the block.


Solution

  • Once a Span<T>, ReadOnlySpan<T> or Memory<T> has been created, all subsequent uses are safe.

    Here's a reference by Stephen Toub (quoted below):

    First, Span is a value type containing a ref and a length, defined approximately as follows:

    public readonly ref struct Span<T>
    {
      private readonly ref T _pointer;
      private readonly int _length;
      ...
    }
    

    The concept of a ref T field may be strange at first—in fact, one can’t actually declare a ref T field in C# or even in MSIL. But Span is actually written to use a special internal type in the runtime that’s treated as a just-in-time (JIT) intrinsic, with the JIT generating for it the equivalent of a ref T field.

    Span is a ref-like type as it contains a ref field, and ref fields can refer not only to the beginning of objects like arrays, but also to the middle of them (...) These references are called interior pointers, and tracking them is a relatively expensive operation for the .NET runtime’s garbage collector. As such, the runtime constrains these refs to only live on the stack, as it provides an implicit low limit on the number of interior pointers that might be in existence.

    So the GC does track the reference held by your ReadOnlySpan<T>, which means a span over a managed array stays valid after it has been constructed. The span will always point to the array you sliced, and it doesn't matter whether you return it inside or outside the "fixed" block. The implementation details of how exactly this is done are specific to the CLR. Keywords to search for are "managed pointers" and "interior pointers". I recommend this article if you want to get more nitty-gritty.
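
Since the span only needs a managed reference to its first element, the same slice can probably be built without "unsafe" or "fixed" at all. The following is a minimal sketch, not part of the original answer: SliceSketch and SliceRow are hypothetical names, and it assumes MemoryMarshal.CreateReadOnlySpan (System.Runtime.InteropServices) is available on your target, which should be the case on .NET Core 3.1. It relies on the elements of one row of a double[,] being contiguous in memory. CreateReadOnlySpan performs no bounds checking, so the helper validates its arguments itself.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SliceSketch
{
    // Hypothetical helper (not from the original post): returns a span over
    // `length` elements of row `row`, starting at column `startColumn`.
    public static ReadOnlySpan<double> SliceRow(double[,] array, int row, int startColumn, int length)
    {
        // CreateReadOnlySpan does not validate anything, so check the range here.
        if ((uint)row >= (uint)array.GetLength(0))
            throw new ArgumentOutOfRangeException(nameof(row));
        if (startColumn < 0 || length < 0 || startColumn > array.GetLength(1) - length)
            throw new ArgumentOutOfRangeException(nameof(length));
        if (length == 0)
            return ReadOnlySpan<double>.Empty;

        // `ref array[row, startColumn]` is a managed (interior) pointer into the
        // array, so the GC keeps it up to date if the array is relocated.
        return MemoryMarshal.CreateReadOnlySpan(ref array[row, startColumn], length);
    }

    public static void Main()
    {
        double[,] array = { { 1, 2, 3 }, { 4, 5, 6 } };
        ReadOnlySpan<double> slice = SliceRow(array, 0, 1, 2);

        GC.Collect(); // the array may or may not move; the span stays valid either way

        // The span refers to the array's own storage, so later writes show through.
        array[0, 2] = 30;
        Console.WriteLine(slice[0]); // prints 2
        Console.WriteLine(slice[1]); // prints 30
    }
}
```

The slice can then be passed to DoSomething1 and DoSomething2 exactly as in the question. Because ReadOnlySpan<double> is a ref struct, it still cannot be stored in a field of a class or captured by a lambda.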