Recently I tried to implement compile-time dispatch using generics (example below):
    public interface IAbstraction
    {
        public void Initialize();
    }

    public sealed class Implementation : IAbstraction
    {
        public void Initialize()
        {
        }
    }

    public sealed class GenericUsage<T> where T : class, IAbstraction
    {
        private readonly T _abstraction;

        public GenericUsage(T abstraction)
        {
            _abstraction = abstraction;
        }

        public void CallAction()
        {
            _abstraction.Initialize();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var genericUsage = new GenericUsage<Implementation>(new Implementation());
            genericUsage.CallAction();
        }
    }
As you can see, I've also explicitly used the sealed keyword as a signal that the type passed to the constructor can never differ from the type used as the generic argument.
In the JIT-generated assembly I see no devirtualization. The call still happens via the vtable.
Is there any chance of implementing compile-time dispatch via generics? Maybe I'm missing something. If not, why doesn't the example above work?
There is an open issue from 2015 that is very similar to what you are trying to achieve: RyuJIT call optimization and aggressive inlining with known generic types
From the comments (2018):
For generics instantiated over ref types we're unlikely to do devirtualization anytime soon, as the jit only sees the shared version. This might change down the road, if we somehow enabled unshared ref type instantiations or started looking into speculative devirtualization.
The "shared version" is explained in more detail in Shared Generics Design:
The idea is that for certain instantiations, the generated code will almost be identical with the exception of a few instructions, so in order to reduce the memory footprint, and the amount of time we spend jitting these generic methods, the runtime will generate a single special canonical version of the code, which can be used by all compatible instantiations of the method.
This feature is currently only supported for instantiations over reference types because they all have the same size/properties/layout/etc... For instantiations over primitive types or value types, the runtime will generate separate code bodies for each instantiation.
If we look at the disassembly (.NET 6 and .NET 7, Debug, x86) it makes sense:
public void CallAction()
{
...
je ConsoleApp64.GenericUsage`1[[System.__Canon, System.Private.CoreLib]].CallAction()+01Fh (057C997h)
call 10045230
//callvirt instance void IAbstraction::Initialize()
_abstraction.Initialize();
mov ecx,dword ptr [ebp-38h]
mov ecx,dword ptr [ecx+4]
call dword ptr [Pointer to: CLRStub[VSD_LookupStub]@d82df850017a042 (0170020h)]
}
There is a single "canonical" jitted method that is used for every GenericUsage<T>.CallAction() where T is a class:
je ConsoleApp64.GenericUsage`1[[System.__Canon, System.Private.CoreLib]].CallAction()+01Fh (057C997h)
with the body:
//callvirt instance void IAbstraction::Initialize()
_abstraction.Initialize();
mov ecx,dword ptr [ebp-38h]
mov ecx,dword ptr [ecx+4]
call dword ptr [Pointer to: CLRStub[VSD_LookupStub]@d82df850017a042 (0170020h)]
The JIT cannot devirtualize the call and insert a direct call to (or inline) Implementation.Initialize(), because the same jitted code is shared with GenericUsage<SecondImplementation>.CallAction(), GenericUsage<ThirdImplementation>.CallAction(), and every other reference-type instantiation.
As mentioned in the GitHub issue comment quoted above, "unshared ref type" instantiations would be needed for this to work.
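Since value-type instantiations already get their own code body, one possible workaround is to pass the sealed class through a small struct that implements the same interface. The following is only a minimal sketch under that assumption: ImplementationWrapper and WrapperDemo are hypothetical names that do not appear in the question, and whether the JIT actually inlines the forwarded call should be checked in the disassembly for your runtime version.

```csharp
public interface IAbstraction
{
    int Initialize(int param);
}

public sealed class Implementation : IAbstraction
{
    public int Initialize(int param) => param;
}

// Hypothetical wrapper (not part of the original code): a struct that
// forwards to the sealed class. Because T is a value type here, the runtime
// compiles a separate CallAction body for GenericUsage<ImplementationWrapper>
// instead of reusing the shared System.__Canon version.
public readonly struct ImplementationWrapper : IAbstraction
{
    private readonly Implementation _inner;

    public ImplementationWrapper(Implementation inner) => _inner = inner;

    // The receiver's static type is the sealed class, not the interface,
    // so the JIT does not need an interface lookup for this call.
    public int Initialize(int param) => _inner.Initialize(param);
}

// Same shape as the question's class, but constrained to structs.
public sealed class GenericUsage<T> where T : struct, IAbstraction
{
    private readonly T _abstraction;

    public GenericUsage(T abstraction) => _abstraction = abstraction;

    public int CallAction(int param) => _abstraction.Initialize(param);
}

public static class WrapperDemo
{
    public static int Run()
    {
        var usage = new GenericUsage<ImplementationWrapper>(
            new ImplementationWrapper(new Implementation()));
        return usage.CallAction(42);
    }
}
```

The cost of this approach is one extra wrapper type per implementation, and each struct instantiation produces its own jitted code, which is exactly the memory trade-off that shared generics were designed to avoid.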
EDIT: Either this or something else was implemented in .NET 8 for the case where T is a sealed class (I will edit if somebody comments with the exact issue, but maybe Dynamic PGO?). The effect is visible when we compare a generic instantiated over a class with one instantiated over a struct. A struct normally always gets an "unshared" implementation, so the JIT can make a faster static dispatch to it. Slightly modified code to test and benchmark:
    public interface IAbstraction {
        public int Initialize(int param);
    }

    public sealed class Implementation : IAbstraction {
        public int Initialize(int param) {
            return param;
        }
    }

    public struct StructImplementation : IAbstraction {
        public int Initialize(int param) {
            return param;
        }
    }

    public sealed class GenericUsage<T> where T : IAbstraction {
        private readonly T _abstraction;

        public GenericUsage(T abstraction) {
            _abstraction = abstraction;
        }

        public int CallAction(int param) {
            return _abstraction.Initialize(param);
        }
    }

    // Benchmark methods
    public int TestRef() {
        var genericUsage = new GenericUsage<Implementation>(new Implementation());
        var sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += genericUsage.CallAction(i);
        }
        return sum;
    }

    public int TestStruct() {
        var genericUsage = new GenericUsage<StructImplementation>(new StructImplementation());
        var sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += genericUsage.CallAction(i);
        }
        return sum;
    }
.NET 6:

    Case        Mean       Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRef     450.70 μs  346.59 μs  637.78 μs  65%    49              97,280      Complete
    TestStruct  85.76 μs   49.76 μs   123.15 μs  86%    24              819,200     Complete
.NET 7:

    Case        Mean       Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRef     434.15 μs  331.40 μs  572.71 μs  56%    49              95,232      Complete
    TestStruct  89.63 μs   50.85 μs   119.15 μs  76%    24              819,200     Complete
.NET 8 (sealed T matters):

    Case              Mean      Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRefNonSealed  1.03 ms   626.93 μs  1.34 ms    69%    49              102,400     Complete
    TestRef           84.58 μs  57.82 μs   112.69 μs  65%    48              819,200     Complete
    TestStruct        63.55 μs  40.90 μs   95.03 μs   85%    24              811,008     Complete
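The code for the TestRefNonSealed case is not shown above. Presumably it is the TestRef benchmark run against a class that is identical to Implementation except for the missing sealed modifier. A sketch under that assumption (NonSealedImplementation and NonSealedBenchmark are hypothetical names, not taken from the benchmark project):

```csharp
public interface IAbstraction
{
    int Initialize(int param);
}

// Assumption: same body as Implementation, but without `sealed`, so the
// runtime cannot rule out a derived type being stored in the field.
public class NonSealedImplementation : IAbstraction
{
    public int Initialize(int param)
    {
        return param;
    }
}

public sealed class GenericUsage<T> where T : IAbstraction
{
    private readonly T _abstraction;

    public GenericUsage(T abstraction)
    {
        _abstraction = abstraction;
    }

    public int CallAction(int param)
    {
        return _abstraction.Initialize(param);
    }
}

public static class NonSealedBenchmark
{
    // Mirrors TestRef from the answer; only the type argument differs.
    public static int TestRefNonSealed()
    {
        var genericUsage = new GenericUsage<NonSealedImplementation>(new NonSealedImplementation());
        var sum = 0;
        for (int i = 0; i < 100_000; i++)
        {
            // As in the original benchmark, the int sum wraps around in the
            // default unchecked context; the value only keeps the loop alive.
            sum += genericUsage.CallAction(i);
        }
        return sum;
    }
}
```

If this matches what was measured, the only difference between the TestRef and TestRefNonSealed rows in the .NET 8 results is the sealed modifier on the type argument.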