I'm writing a basic library to experiment with C# hardware intrinsics (the System.Runtime.Intrinsics* namespaces) and have a method that could support any 'hardware' type (Byte, SByte ... UInt64, Double).
When I try to use a generic signature, the compiler cannot work with the type parameter and cannot choose the correct overload. For example:
public static unsafe void GenericSimd<T>(T value, ReadOnlySpan<T> span) where T : unmanaged
{
    fixed (T* fixedSpan = span)
    {
        Vector128<T> vec0 = Vector128.Create(value); // CS1503, Cannot convert T to byte
        Vector128<T> vec1 = Sse2.LoadVector128(fixedSpan); // CS1503, Cannot convert T* to byte*
    }
}
ref: CS1503
I think this is due to the unmanaged constraint allowing additional non-'hardware' types (Decimal, enum etc.), therefore not being restrictive enough to guarantee that an appropriate overload will exist.
Defining an interface to use as an additional constraint alongside unmanaged is also unworkable, as it would require the built-in types to be partial so that they could implement it.
Is there a way to implement this method using generics and avoid writing an overload for each type?
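In general you can't do that with generics, at least because the methods used here (Vector128.Create, Sse2.LoadVector128) are overloaded per concrete element type and have no generic versions in the API targeted here. But there is an option for Span<T>: reinterpret it as bytes.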
public static unsafe void GenericSimd<T>(ReadOnlySpan<T> span)
    where T : struct
{
    // Reinterprets the same memory as bytes; no data is copied
    ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<T, byte>(span);
    // A full Vector128<byte> needs 16 bytes; bail out rather than read past the end
    if (bytes.Length < Vector128<byte>.Count)
        return;
    fixed (byte* fixedSpan = bytes)
    {
        // this way
        Vector128<byte> vec1 = *(Vector128<byte>*)fixedSpan;
        // or this way (requires Sse2.IsSupported)
        Vector128<byte> vec2 = Sse2.LoadVector128(fixedSpan);
    }
}
But make sure the Span holds enough bytes (16 or more) to fill a full Vector128<byte>.
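One way to make the "enough bytes" check explicit is to let MemoryMarshal.Cast do the truncation for you. The following is only a sketch: the class and method names (VectorSplit, Split) are made up for illustration, and it assumes the tail that does not fill a whole vector is handled separately by scalar code.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class VectorSplit
{
    // Hypothetical helper: reports how many complete Vector128<byte> blocks
    // a span holds and how many bytes are left over at the end.
    public static (int FullVectors, int TailBytes) Split<T>(ReadOnlySpan<T> span)
        where T : struct
    {
        // View the elements as raw bytes; no data is copied.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(span);

        // Cast rounds the length down, so only complete 16-byte blocks are exposed.
        ReadOnlySpan<Vector128<byte>> vectors =
            MemoryMarshal.Cast<byte, Vector128<byte>>(bytes);

        int tailBytes = bytes.Length - vectors.Length * Vector128<byte>.Count;
        return (vectors.Length, tailBytes);
    }
}
```

With this, a span of ten int values (40 bytes) yields two full vectors and 8 leftover bytes, and a span shorter than 16 bytes yields zero vectors instead of an out-of-bounds read.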
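You can also get the size of T:
int size = Unsafe.SizeOf<T>(); // System.Runtime.CompilerServices; Marshal.SizeOf reports the marshaled size, which differs for e.g. char and bool
And then switch on the size of the type. But integer and floating-point numbers need different handling of the data, so the size alone is not always enough.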
That is a lot of switching logic, which is no friend of SSE/AVX code: such code must be as fast as possible, yet every switch or if, and even Cast, consumes CPU resources.
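To give an idea of what the size-based switch looks like, here is a rough sketch of a generic broadcast. The names GenericBroadcast and Broadcast are hypothetical, and it assumes System.Runtime.CompilerServices.Unsafe is available to your target framework. Broadcasting only copies a bit pattern, so the size is enough here; arithmetic would additionally have to distinguish integer from floating-point T.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

public static class GenericBroadcast
{
    // Hypothetical helper: fills every lane of a Vector128<T> with `value`
    // by reinterpreting it as an unsigned integer of the same size.
    public static Vector128<T> Broadcast<T>(T value) where T : unmanaged
    {
        switch (Unsafe.SizeOf<T>())
        {
            case 1:
                return Vector128.Create(Unsafe.As<T, byte>(ref value)).As<byte, T>();
            case 2:
                return Vector128.Create(Unsafe.As<T, ushort>(ref value)).As<ushort, T>();
            case 4:
                return Vector128.Create(Unsafe.As<T, uint>(ref value)).As<uint, T>();
            case 8:
                return Vector128.Create(Unsafe.As<T, ulong>(ref value)).As<ulong, T>();
            default:
                // e.g. Decimal (16 bytes) ends up here
                throw new NotSupportedException(typeof(T) + " has no Vector128 lane size");
        }
    }
}
```

Usage would be something like Vector128<int> v = GenericBroadcast.Broadcast(7);. Note that a type with a matching size that is not a supported vector element type (a custom 4-byte struct, say) should still be expected to fail at run time, which is exactly the gap the unmanaged constraint leaves open.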
I suggest writing non-generic overloads, similar to the .NET SSE/AVX methods.
By the way, if you need a purely generic hardware-accelerated vector, there is Vector<T> in System.Numerics.Vectors. In my tests it showed the same performance as the intrinsics in most cases on my Core i7.
public static void GenericSimd<T>(T value, ReadOnlySpan<T> span)
    where T : struct
{
    Vector<T> vector1 = new Vector<T>(value); // fine
    Vector<T> vector2 = new Vector<T>(span); // also fine
}
You can also check e.g. Vector<int>.Count to get the vector's capacity.
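As an illustration of how Vector<T> and Count fit together in a fully generic method, here is a minimal sketch of a vectorized search. VectorSearch and Contains are made-up names, and the scalar tail loop is just one possible way to handle the elements that do not fill a whole vector.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class VectorSearch
{
    // Hypothetical helper: returns true if `value` occurs anywhere in `span`.
    public static bool Contains<T>(T value, ReadOnlySpan<T> span) where T : struct
    {
        Vector<T> target = new Vector<T>(value); // broadcast value to every lane
        int width = Vector<T>.Count;             // elements per vector on this machine
        int i = 0;

        // Compare one full vector of elements per iteration.
        for (; i <= span.Length - width; i += width)
        {
            Vector<T> chunk = new Vector<T>(span.Slice(i, width));
            if (Vector.EqualsAny(chunk, target))
                return true;
        }

        // Scalar fallback for the remaining elements.
        // Note: for floating-point T, NaN is treated differently by the two paths.
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (; i < span.Length; i++)
        {
            if (comparer.Equals(span[i], value))
                return true;
        }
        return false;
    }
}
```

This compiles for any struct T, but Vector<T> throws NotSupportedException at run time for types it does not support, so the restriction to 'hardware' types is still only enforced when the code runs.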