How to achieve compile-time dispatch via generics?


Recently I tried to implement compile-time dispatch using generics (example below):

public interface IAbstraction
{
    public void Initialize();
}

public sealed class Implementation : IAbstraction
{
    public void Initialize()
    {
    }
}

public sealed class GenericUsage<T> where T : class, IAbstraction
{
    private readonly T _abstraction;

    public GenericUsage(T abstraction)
    {
        _abstraction = abstraction;
    }

    public void CallAction()
    {
        _abstraction.Initialize();
    }
}

public static void Main()
{
    var genericUsage = new GenericUsage<Implementation>(new Implementation());
    genericUsage.CallAction();
}

As you can see, I've also explicitly used the sealed keyword to signal that the type passed to the constructor cannot differ from the one used as the generic argument.

In the JIT-generated assembly I see no devirtualization. The call is still dispatched through the vtable.

Is there any chance of implementing compile-time dispatch via generics? Maybe I'm missing something. If not, why doesn't the example above work?


Solution

  • There is an open issue from 2015 that is very similar to what you are trying to achieve: RyuJIT call optimization and aggressive inlining with known generic types

    From the comments (2018):

    For generics instantiated over ref types we're unlikely to do devirtualization anytime soon, as the jit only sees the shared version. This might change down the road, if we somehow enabled unshared ref type instantiations or started looking into speculative devirtualization.

    The "shared version" is explained in more detail in Shared Generics Design:

    The idea is that for certain instantiations, the generated code will almost be identical with the exception of a few instructions, so in order to reduce the memory footprint, and the amount of time we spend jitting these generic methods, the runtime will generate a single special canonical version of the code, which can be used by all compatible instantiations of the method.

    This feature is currently only supported for instantiations over reference types because they all have the same size/properties/layout/etc... For instantiations over primitive types or value types, the runtime will generate separate code bodies for each instantiation.

    The disassembly (.NET 6 and .NET 7, Debug, x86) confirms this:

            public void CallAction() 
                {
    ...
    je          ConsoleApp64.GenericUsage`1[[System.__Canon, System.Private.CoreLib]].CallAction()+01Fh (057C997h)  
    call        10045230   
                //callvirt instance void IAbstraction::Initialize()
                _abstraction.Initialize();
    mov         ecx,dword ptr [ebp-38h]  
    mov         ecx,dword ptr [ecx+4]  
    call        dword ptr [Pointer to: CLRStub[VSD_LookupStub]@d82df850017a042 (0170020h)]  
            }
    

    There is a "canonical" jitted method that is used for every GenericUsage<T>.CallAction() where T is a class.

    je          ConsoleApp64.GenericUsage`1[[System.__Canon, System.Private.CoreLib]].CallAction()+01Fh (057C997h) 
    

    with the body:

                //callvirt instance void IAbstraction::Initialize()
                _abstraction.Initialize();
    mov         ecx,dword ptr [ebp-38h]  
    mov         ecx,dword ptr [ecx+4]  
    call        dword ptr [Pointer to: CLRStub[VSD_LookupStub]@d82df850017a042 (0170020h)]  
    

    The JIT cannot devirtualize the call and insert a direct call to (or inline) Implementation.Initialize(), because the same jitted code is shared with GenericUsage<SecondImplementation>.CallAction(), GenericUsage<ThirdImplementation>.CallAction(), and so on.

    As mentioned in the original GitHub issue comment, there need to be "unshared ref type" instantiations for this to work.
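
    Since value-type instantiations get their own code body, one possible workaround is to pass the sealed class through a small struct that implements the same interface and forwards the call. The sketch below is only an illustration under that assumption: ImplementationAdapter is a hypothetical name that does not appear in the question, and whether the JIT actually emits a direct call or inlines it depends on the runtime version, so the disassembly should be checked.

    ```csharp
    using System;

    public interface IAbstraction
    {
        int Initialize(int param);
    }

    public sealed class Implementation : IAbstraction
    {
        public int Initialize(int param) => param;
    }

    // Hypothetical adapter (not part of the original code): a struct that
    // implements the interface and forwards to the sealed class instance.
    public readonly struct ImplementationAdapter : IAbstraction
    {
        private readonly Implementation _inner;

        public ImplementationAdapter(Implementation inner) => _inner = inner;

        // _inner has the exact static type Implementation (a sealed class),
        // so this call does not need interface dispatch.
        public int Initialize(int param) => _inner.Initialize(param);
    }

    // T is constrained to struct, so every instantiation is a value-type
    // instantiation and gets its own jitted code instead of the shared one.
    public sealed class GenericUsage<T> where T : struct, IAbstraction
    {
        private readonly T _abstraction;

        public GenericUsage(T abstraction) => _abstraction = abstraction;

        public int CallAction(int param) => _abstraction.Initialize(param);
    }

    public static class Program
    {
        public static void Main()
        {
            var usage = new GenericUsage<ImplementationAdapter>(
                new ImplementationAdapter(new Implementation()));

            Console.WriteLine(usage.CallAction(42)); // prints 42
        }
    }
    ```

    The cost of this design is one extra wrapper type per implementation, and the struct constraint means plain class implementations can no longer be passed directly.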

    EDIT: Either this or something else was implemented in .NET 8 for the case where T is a sealed class (I will edit if somebody comments with the exact issue; maybe Dynamic PGO?). The effect is visible when comparing a generic instantiated over a class with one instantiated over a struct. A struct instantiation would normally always get its own "unshared" code, so the JIT can use faster static dispatch for it. Slightly modified code to test and benchmark:

    public interface IAbstraction {
        public int Initialize(int param);
    }
    
    public sealed class Implementation : IAbstraction {
        public int Initialize(int param) {
            return param;
        }
    }
    
    public struct StructImplementation : IAbstraction {
        public int Initialize(int param) {
            return param;
        }
    }
    
    public sealed class GenericUsage<T> where T : IAbstraction {
        private readonly T _abstraction;
    
        public GenericUsage(T abstraction) {
            _abstraction = abstraction;
        }
    
        public int CallAction(int param) {
            return _abstraction.Initialize(param);
        }
    }
    
    // Benchmark methods
    public int TestRef() {
        var genericUsage = new GenericUsage<Implementation>(new Implementation());
    
        var sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += genericUsage.CallAction(i);
        }
        
        return sum;
    
    }
    
    public int TestStruct() {
        var genericUsage = new GenericUsage<StructImplementation>(new StructImplementation());
    
        var sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += genericUsage.CallAction(i);
        }
    
        return sum;
    }
    

    .NET 6:

    Case        Mean       Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRef     450.70 μs  346.59 μs  637.78 μs  65%    49              97,280      Complete
    TestStruct  85.76 μs   49.76 μs   123.15 μs  86%    24              819,200     Complete
    

    .NET 7:

    Case        Mean       Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRef     434.15 μs  331.40 μs  572.71 μs  56%    49              95,232      Complete
    TestStruct  89.63 μs   50.85 μs   119.15 μs  76%    24              819,200     Complete
    

    .NET 8 (sealed T matters):

    Case              Mean      Min        Max        Range  AllocatedBytes  Operations  Phase
    TestRefNonSealed  1.03 ms   626.93 μs  1.34 ms    69%    49              102,400     Complete
    TestRef           84.58 μs  57.82 μs   112.69 μs  65%    48              819,200     Complete
    TestStruct        63.55 μs  40.90 μs   95.03 μs   85%    24              811,008     Complete
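
    To repeat the comparison without the tool that produced the tables above, a rough Stopwatch-based harness can be used. This is only a sketch: MeasureMicroseconds and the warm-up and round counts are assumptions made for illustration, and the numbers it prints are not comparable to the tables. For trustworthy results a dedicated benchmarking tool is the better choice.

    ```csharp
    using System;
    using System.Diagnostics;

    public interface IAbstraction
    {
        int Initialize(int param);
    }

    public sealed class Implementation : IAbstraction
    {
        public int Initialize(int param) => param;
    }

    public struct StructImplementation : IAbstraction
    {
        public int Initialize(int param) => param;
    }

    public sealed class GenericUsage<T> where T : IAbstraction
    {
        private readonly T _abstraction;

        public GenericUsage(T abstraction) => _abstraction = abstraction;

        public int CallAction(int param) => _abstraction.Initialize(param);
    }

    public static class Program
    {
        // Same loop as TestRef in the answer (the int sum wraps on overflow).
        public static int TestRef()
        {
            var genericUsage = new GenericUsage<Implementation>(new Implementation());
            var sum = 0;
            for (int i = 0; i < 100_000; i++) sum += genericUsage.CallAction(i);
            return sum;
        }

        // Same loop as TestStruct in the answer.
        public static int TestStruct()
        {
            var genericUsage = new GenericUsage<StructImplementation>(new StructImplementation());
            var sum = 0;
            for (int i = 0; i < 100_000; i++) sum += genericUsage.CallAction(i);
            return sum;
        }

        // Hypothetical helper: average time per call of 'test' in microseconds.
        public static double MeasureMicroseconds(Func<int> test, int rounds)
        {
            // Warm-up calls give the tiered JIT a chance to recompile hot
            // methods before timing starts (the count is an arbitrary choice).
            for (int i = 0; i < 200; i++) test();

            var stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < rounds; i++) test();
            stopwatch.Stop();

            return stopwatch.Elapsed.TotalMilliseconds * 1000.0 / rounds;
        }

        public static void Main()
        {
            Console.WriteLine($"TestRef:    {MeasureMicroseconds(TestRef, 1_000):F2} us");
            Console.WriteLine($"TestStruct: {MeasureMicroseconds(TestStruct, 1_000):F2} us");
        }
    }
    ```

    Run it in a Release build on each runtime version of interest; Debug builds disable most JIT optimizations and would hide the difference.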