How does C# handle calling an interface method on a struct?

Consider:

interface I { void M(); }
struct S: I { public void M() {} }
// in Main:
S s;
I i = s;
s.M();
i.M();

And the IL for Main:

.maxstack 1
.entrypoint
.locals init (
    [0] valuetype S s,
    [1] class I i
)

IL_0000: nop
IL_0001: ldloc.0
IL_0002: box S
IL_0007: stloc.1
IL_0008: ldloca.s s
IL_000a: call instance void S::M()
IL_000f: nop
IL_0010: ldloc.1
IL_0011: callvirt instance void I::M()
IL_0016: nop
IL_0017: ret

First (IL_000a), S::M() is called with a value type for this. Next (IL_0011), it's called with a reference (boxed) type.

How does this work?

I can think of three ways:

1. Two versions of S::M (the implementation of I::M) are compiled, one for a value-type this and one for a reference-type this. The vtable stores the reference-type version, while statically dispatched calls use the value-type version. This is ugly and unlikely, but possible.
2. The vtable stores a "wrapper" method that unboxes this and then calls the actual method. This sounds inefficient, because all of the method's arguments would have to be copied through two calls.
3. callvirt contains special logic that checks for this case. That would be even more inefficient: every callvirt would incur a (slight) penalty.

Solution

The short answer is that inside the method itself, the struct is always accessed through a pointer. The method therefore does not behave as if the struct were passed as a normal by-value parameter; this acts more like a ref parameter. It also means that the method does not know whether it is operating on a boxed value or not.
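The ref-like behaviour of this can be observed from C# with a mutable struct. The following is only a minimal sketch: ICounter, Counter and Demo are hypothetical names introduced for illustration and do not appear in the question.

```csharp
using System;

// Hypothetical interface and struct, used only for this illustration.
interface ICounter
{
    void Increment();
    int Value { get; }
}

struct Counter : ICounter
{
    private int _value;
    public int Value => _value;

    // Mutates the storage that `this` points at, wherever that storage lives.
    public void Increment() => _value++;
}

static class Demo
{
    public static (int Direct, int Boxed) Run()
    {
        Counter s = default;
        s.Increment();      // `this` is the address of the local s, so s itself changes
        ICounter i = s;     // boxing copies the current contents of s to the heap
        i.Increment();      // `this` now points into the box, so only the boxed copy changes
        return (s.Value, i.Value);
    }
}

static class Program
{
    static void Main()
    {
        var (direct, boxed) = Demo.Run();
        Console.WriteLine($"local: {direct}, boxed: {boxed}");
    }
}
```

The same Increment code runs in both calls; only the address it receives as this differs, which is why the local and the boxed copy end up with different values.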

The long answer:

First, if I compile your code as is, s.M(); does not generate any code: the JIT compiler is smart enough to inline the method, and inlining an empty method produces nothing. To avoid this, I applied [MethodImpl(MethodImplOptions.NoInlining)] to S.M.
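For reference, the modified program probably looks roughly like this. It is a sketch reconstructed from the question's code with the attribute added; the explicit default initialization and the Program class are assumptions made so that the snippet compiles on its own.

```csharp
using System.Runtime.CompilerServices;

interface I { void M(); }

struct S : I
{
    // Ask the JIT not to inline M, so that the direct call s.M() still emits a real call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void M() { }
}

static class Program
{
    static void Main()
    {
        S s = default;  // assumption: explicit initialization, equivalent for an empty struct
        I i = s;        // boxes s
        s.M();          // direct call on the stack-allocated struct
        i.M();          // interface call on the boxed copy
    }
}
```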

Now, here is the native code generated for your method (function prolog and epilog omitted):

// initialize s in register AX
xor         eax,eax  
// move s from register AX to stack (SP+28h)
mov         qword ptr [rsp+28h],rax  
// load pointer to MethodTable for S to register CX
mov         rcx,7FFDB00C5B08h  
// allocate memory for i on heap
call        JIT_TrialAllocSFastMP_InlineGetThread (07FFE0F824C10h)  
// copy contents of s from stack to register CX
movsx       rcx,byte ptr [rsp+28h]  
// copy from register CX to heap
mov         byte ptr [rax+8],cl  
// copy pointer to i from register AX to register SI
mov         rsi,rax  
// load address of s on stack to register CX
lea         rcx,[rsp+28h]  
// call S::M
call        00007FFDB01D00C8  
// copy pointer to i from register SI to register CX
mov         rcx,rsi  
// move address of stub for I::M to register R11
mov         r11,7FFDB00D0020h  
// null check: reading from [CX] faults (NullReferenceException) if i is null
cmp         dword ptr [rcx],ecx  
// call stub for I::M
call        qword ptr [r11]

In both cases, the call ends up executing the same code (which is just a single ret instruction). The first time, the CX register points to the stack-allocated s (SP+28h in the above code); the second time, it points to the struct data inside the heap-allocated box i (AX+8 just after the call to the heap allocation function).