c# .net-core intrinsics

Can you pass generics to .NET Core hardware intrinsics methods?


I'm writing a basic library to experiment with C# hardware intrinsics (the System.Runtime.Intrinsics* namespaces) and have a method that could support any 'hardware' type (Byte, SByte ... UInt64, Double).

When I try to use a generic signature, the compiler cannot work with the generic type and cannot choose the correct overload. For example:

public static unsafe void GenericSimd<T>(T value, ReadOnlySpan<T> span) where T : unmanaged
{
    fixed (T* fixedSpan = span)
    {
        Vector128<T> vec0 = Vector128.Create(value);       // CS1503, Cannot convert T to byte
        Vector128<T> vec1 = Sse2.LoadVector128(fixedSpan); // CS1503, Cannot convert T* to byte*
    }
}

ref: CS1503

I think this is due to the unmanaged constraint allowing additional non-'hardware' types (Decimal, enum etc.), therefore not being restrictive enough to guarantee an appropriate overload will exist.

Defining an interface to use as an additional constraint alongside unmanaged is also unworkable as it would require partial-ing built-in types.

Is there a way to implement this method using generics and avoid writing an overload for each type?


Solution

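  • In general you can't do that with generics, at least on .NET Core 3.x: the Vector128 factory and load methods there are overloaded per element type, and there is no generic Create. (.NET 7 and later add generic helpers such as Vector128.Create<T>(T).) But there's an option for Span<T>: reinterpret it as bytes.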

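    public static unsafe void GenericSimd<T>(ReadOnlySpan<T> span)
        where T : struct
    {
        // Reinterprets the same memory as bytes; no data is copied.
        ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<T, byte>(span);
        if (bytes.Length < Vector128<byte>.Count)
            return; // fewer than 16 bytes: a full vector load would read out of bounds

        fixed (byte* fixedSpan = bytes)
        {
            // this way (plain pointer dereference)
            Vector128<byte> vec1 = *(Vector128<byte>*)fixedSpan;
            // or this way (unaligned SSE2 load; requires Sse2.IsSupported)
            Vector128<byte> vec2 = Sse2.LoadVector128(fixedSpan);
        }
    }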
    

    But make sure the span holds at least 16 bytes, enough to fill a whole Vector128<byte>.
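
    The length requirement above can be sketched as a loop over full 16-byte blocks followed by a scalar tail. This is only an illustration, not part of the original answer: the class and method names (ByteBlocks, IsAllZeroBits) are hypothetical, it assumes the project is compiled with unsafe code enabled, and it falls back to the scalar loop when SSE2 is not available.

    ```csharp
    using System;
    using System.Runtime.InteropServices;
    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;

    public static class ByteBlocks
    {
        // Hypothetical helper: true if every byte of every element is zero.
        public static unsafe bool IsAllZeroBits<T>(ReadOnlySpan<T> span)
            where T : unmanaged
        {
            // Same memory viewed as bytes; nothing is copied.
            ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<T, byte>(span);
            int i = 0;

            if (Sse2.IsSupported && bytes.Length >= Vector128<byte>.Count)
            {
                Vector128<byte> acc = Vector128<byte>.Zero;
                fixed (byte* p = bytes)
                {
                    // Last index at which a full 16-byte load still fits.
                    int lastBlock = bytes.Length - Vector128<byte>.Count;
                    for (; i <= lastBlock; i += Vector128<byte>.Count)
                        acc = Sse2.Or(acc, Sse2.LoadVector128(p + i));
                }
                // CompareEqual sets a lane to 0xFF where acc is zero;
                // MoveMask packs the top bit of each of the 16 lanes into an int.
                if (Sse2.MoveMask(Sse2.CompareEqual(acc, Vector128<byte>.Zero)) != 0xFFFF)
                    return false;
            }

            // Scalar tail: the bytes that do not fill a whole vector.
            for (; i < bytes.Length; i++)
                if (bytes[i] != 0)
                    return false;
            return true;
        }
    }
    ```

    Because the check is bitwise, a floating-point -0.0 counts as non-zero here.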

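    You can also get the size of T:

    // Managed size in bytes. Marshal.SizeOf reports the marshaled size, which can differ (e.g. for char and bool).
    int size = Unsafe.SizeOf<T>();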
    

    And then switch on the size of the element. But integer and floating-point numbers need different handling, so the size alone is not enough.

    That is a lot of switching logic, which is no friend of SSE/AVX code: such code must be as fast as possible, and every switch or if, and even the Cast, consumes CPU resources.

    I suggest writing non-generic overloads, similar to the .NET SSE/AVX methods.
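
    If you still want a single generic entry point, one way to limit the cost of the switching is to branch on typeof(T) instead of on a runtime size value: for value-type T the JIT typically compiles a separate copy of the method per type and can drop the branches that do not apply. The sketch below is an assumption-laden illustration rather than the answer's own method. SimdHelpers and Broadcast are hypothetical names, only four element types are shown, and it relies on Unsafe.As and the Vector128.As<TFrom, TTo> extension to reinterpret values without copying.

    ```csharp
    using System;
    using System.Runtime.CompilerServices;
    using System.Runtime.Intrinsics;

    public static class SimdHelpers
    {
        // Hypothetical helper: fills every lane of a Vector128<T> with 'value'.
        public static Vector128<T> Broadcast<T>(T value) where T : unmanaged
        {
            // Each branch reinterprets 'value' as the concrete type, calls the
            // matching non-generic Create overload, then reinterprets the result
            // back to Vector128<T>.
            if (typeof(T) == typeof(byte))
                return Vector128.Create(Unsafe.As<T, byte>(ref value)).As<byte, T>();
            if (typeof(T) == typeof(int))
                return Vector128.Create(Unsafe.As<T, int>(ref value)).As<int, T>();
            if (typeof(T) == typeof(float))
                return Vector128.Create(Unsafe.As<T, float>(ref value)).As<float, T>();
            if (typeof(T) == typeof(double))
                return Vector128.Create(Unsafe.As<T, double>(ref value)).As<double, T>();

            // The remaining primitive types would follow the same pattern.
            throw new NotSupportedException(typeof(T).Name);
        }
    }
    ```

    The per-type code still has to be written once, so this mostly moves the overloads behind one generic method rather than removing them.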

    By the way, if you need a purely generic hardware-accelerated vector, take a look at Vector<T> in System.Numerics. In my tests it showed the same performance as the intrinsics in most cases on my Core i7.

    public static void GenericSimd<T>(T value, ReadOnlySpan<T> span)
        where T : struct
    {
        Vector<T> vector1 = new Vector<T>(value); // fine
        Vector<T> vector2 = new Vector<T>(span); // also fine, if span has at least Vector<T>.Count elements
    }
    

    You can also check e.g. Vector<int>.Count to get the number of elements a vector holds.
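
    As a rough sketch of how Vector<T>.Count drives a generic loop, the example below searches a span for a value one vector at a time and finishes with a scalar tail. VectorSearch and Contains are hypothetical names chosen for illustration; the code assumes T is one of the primitive numeric types that Vector<T> supports (other types throw NotSupportedException).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Numerics;

    public static class VectorSearch
    {
        // Hypothetical helper: true if 'value' occurs anywhere in 'span'.
        public static bool Contains<T>(T value, ReadOnlySpan<T> span)
            where T : struct
        {
            int i = 0;

            if (Vector.IsHardwareAccelerated && span.Length >= Vector<T>.Count)
            {
                Vector<T> needle = new Vector<T>(value); // value copied into every lane
                // Last index at which a full vector still fits in the span.
                int lastBlock = span.Length - Vector<T>.Count;
                for (; i <= lastBlock; i += Vector<T>.Count)
                {
                    Vector<T> block = new Vector<T>(span.Slice(i));
                    if (Vector.EqualsAny(block, needle))
                        return true;
                }
            }

            // Scalar tail: elements that do not fill a whole vector.
            EqualityComparer<T> comparer = EqualityComparer<T>.Default;
            for (; i < span.Length; i++)
                if (comparer.Equals(span[i], value))
                    return true;
            return false;
        }
    }
    ```

    Note that the two paths treat floating-point NaN differently: the vector comparison never matches NaN, while EqualityComparer<T>.Default considers NaN equal to itself.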