Efficiency of load-value instructions versus load-address instructions for fields of structs

Consider the following C# struct definitions:

public struct A
{
    public B B;
}

public struct B
{
    public int C;
}

Also consider the following static method:

public static int Method(A a) => a.B.C;

Calling this method copies the A value that is passed in. For example, in the following code:

A a = default;
Method(a);

the call to Method will compile to IL that looks something like this:

IL_0008: ldloc.0      // V_0
IL_0009: call         int32 Class::Method(valuetype A)

ldloc copies the value of the local variable a (V_0) onto the evaluation stack, and that copy is what Method receives. If A (or B) were a large struct, this copy could be expensive. The IL for Method also uses load-value instructions:

IL_0000: ldarg.0      // a
IL_0001: ldfld        valuetype B A::B
IL_0006: ldfld        int32 B::C
IL_000b: ret
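A minimal sketch of the by-value behaviour described above, reusing the question's A and B. ByValueDemo and MutateCopy are hypothetical names added for illustration: the write inside MutateCopy lands in the callee's copy, so the caller's variable is never touched.

```csharp
// The question's types, unchanged.
public struct A
{
    public B B;
}

public struct B
{
    public int C;
}

public static class ByValueDemo
{
    // Same shape as the question's Method: `a` is a copy of the caller's value.
    public static int Method(A a) => a.B.C;

    // Hypothetical helper that writes to its parameter. Because A is passed by value,
    // the write changes only the callee's copy, not the caller's variable.
    public static int MutateCopy(A a)
    {
        a.B.C = 42;
        return a.B.C;
    }
}
```

For example, with A a = default;, ByValueDemo.MutateCopy(a) returns 42 while a.B.C is still 0 afterwards.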

Recent versions of C# include features that can help make working with structs more efficient. C# 7.2 introduced the in modifier on parameters, which passes a value-type argument by reference while the compiler prevents the called method from modifying it. For example, applying the in modifier to parameter a:

public static int Method(in A a) => a.B.C;

will result in the following compiled IL at the call site:

IL_0008: ldloca.s     a
IL_000a: call         int32 Class::Method(valuetype A&)

and in the implementation of Method:

IL_0000: ldarg.0      // a
IL_0001: ldflda       valuetype B A::B
IL_0006: ldfld        int32 B::C
IL_000b: ret
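A sketch of what the in modifier changes at the source level, assuming C# 7.2 or later. InDemo and Counter are hypothetical names. ReadThroughIn reads the caller's value through the reference, and assigning a.B.C inside it would be a compile error. IncrementThroughIn shows a caveat that matters for efficiency: calling a method that is not marked readonly on an in parameter makes the compiler work on a hidden defensive copy, so a copy is made after all and the caller's value stays unchanged.

```csharp
public struct A
{
    public B B;
}

public struct B
{
    public int C;
}

// Hypothetical mutable struct used only to show the defensive-copy caveat.
public struct Counter
{
    public int Value;

    // Not marked readonly, so the compiler must assume it may modify `this`.
    public int Increment() => ++Value;
}

public static class InDemo
{
    // Same shape as the question's `in` version of Method: `a` is passed by reference.
    // Writing `a.B.C = 1;` here would not compile, because `in` parameters are read-only.
    public static int ReadThroughIn(in A a) => a.B.C;

    public static int IncrementThroughIn(in Counter c)
    {
        // The compiler copies c to a hidden temporary and calls Increment on that copy,
        // so the caller's Counter is never modified.
        c.Increment();
        return c.Value; // still the caller's original value
    }
}
```

Declaring Counter as a readonly struct would avoid the defensive copy, but then Increment could no longer modify Value.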

Note the load-address instructions (ldloca.s at the call site and ldflda inside Method). My assumption (please correct me if I am wrong) is that for deep field reads, such as reading C inside B inside A, load-address instructions are more efficient than load-value instructions.

With that in mind, consider changing the example code:

A a = default;
var c = a.B.C;

The second line then compiles to:

IL_0008: ldloc.1      // V_1
IL_0009: ldfld        valuetype B A::B
IL_000e: ldfld        int32 B::C
IL_0013: stloc.0      // c

Why wouldn't the compiler prefer to use load-address instructions in this case too? Is there an efficiency difference simply because a is a local variable versus a method parameter, or is there something else I'm missing here?
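If you want address-based access for a local variable, C# ref locals let you request it explicitly. A sketch, with RefLocalDemo, ReadThroughRefLocal and CompareCopyAndAlias as hypothetical names: copy is an independent copy of a.B, while alias is a reference to the B stored inside a, which the compiler can only produce by taking the field's address. Whether this changes the final machine code is a separate matter, since the JIT may emit the same load either way.

```csharp
public struct A
{
    public B B;
}

public struct B
{
    public int C;
}

public static class RefLocalDemo
{
    // Reads C through a reference to the B stored inside the local a;
    // creating that reference requires taking the field's address.
    public static int ReadThroughRefLocal()
    {
        A a = default;
        a.B.C = 3;
        ref readonly B b = ref a.B; // ref readonly locals need C# 7.2 or later
        return b.C;
    }

    // Contrasts a value copy of the nested struct with a reference to it.
    public static (int viaCopy, int viaAlias) CompareCopyAndAlias()
    {
        A a = default;
        B copy = a.B;            // independent copy of a.B
        ref B alias = ref a.B;   // refers to the B inside a
        a.B.C = 5;               // modify a after taking both
        return (copy.C, alias.C); // copy.C is still 0; alias.C sees 5
    }
}
```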

Solution

It is not related to a being a local variable rather than a method argument, at least not from an efficiency point of view.

The first thing to understand is that a struct in C# is stored inline where it is declared; for a local variable, that means directly on the stack. More importantly, nested structs behave the same way: the B inside A is stored inline inside A. Because of that, the JIT knows exactly where B sits inside A, where C sits inside B, and therefore where a.B.C lies relative to a. (This layout is settled by the runtime when the type is loaded, not always by the C# compiler at compile time; see StructLayoutAttribute.)
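These offsets can be measured at run time. A minimal sketch assuming .NET 5 or later (where System.Runtime.CompilerServices.Unsafe is part of the framework), using the layout from this answer's test with an extra int a1 in front of B; LayoutDemo and AExplicit are hypothetical names. The explicit-layout variant shows that StructLayoutAttribute can move B, and with it the offset of a.B.C.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Layout used in this answer's test: an extra int a1 ahead of B.
public struct A
{
    public int a1;
    public B B;
}

public struct B
{
    public int C;
}

// Hypothetical variant: explicit layout pins B at byte offset 8 instead of 4.
[StructLayout(LayoutKind.Explicit)]
public struct AExplicit
{
    [FieldOffset(0)] public int a1;
    [FieldOffset(8)] public B B;
}

public static class LayoutDemo
{
    // Byte distance from the start of `a` to a.B.C, measured on a live instance.
    public static long OffsetOfBC(ref A a) =>
        (long)Unsafe.ByteOffset(
            ref Unsafe.As<A, byte>(ref a),
            ref Unsafe.As<int, byte>(ref a.B.C));

    public static long OffsetOfBC(ref AExplicit a) =>
        (long)Unsafe.ByteOffset(
            ref Unsafe.As<AExplicit, byte>(ref a),
            ref Unsafe.As<int, byte>(ref a.B.C));
}
```

With the default (sequential) layout the offset comes out as 4; with the explicit layout it is 8.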

When you look at the machine code the JIT produces (compile in Release, because Debug builds are not optimized the same way, and make sure the compiler does not optimize the variables away), you will see that wherever you write a.B.C, it becomes a single direct load from memory at a fixed offset from where a is stored.
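A sketch of a method you could use to reproduce this kind of listing; DisasmTarget and ReadNested are hypothetical names. [MethodImpl(MethodImplOptions.NoInlining)] keeps the method from being inlined into its caller, and making the stored value depend on an argument and returning c keeps the JIT from folding the code away. On .NET 7 and later, setting the DOTNET_JitDisasm environment variable to the method name prints the JIT's output for it. Depending on the runtime version, the JIT may also promote a small struct like A into registers, in which case no stack access appears at all.

```csharp
using System.Runtime.CompilerServices;

public struct A
{
    public int a1;
    public B B;
}

public struct B
{
    public int C;
}

public static class DisasmTarget
{
    // NoInlining keeps ReadNested as its own method, so its machine code shows up
    // separately in a disassembly dump instead of being merged into the caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ReadNested(int seed)
    {
        A a = default;
        a.B.C = seed;   // depends on an argument, so the result cannot be folded to a constant
        var c = a.B.C;  // the read being inspected
        return c;       // using c keeps it from being discarded as dead code
    }
}
```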

In my test, I added another field, int a1, at the start of A to shift B's position in memory. This is the resulting code:

A a = default;

xor         ecx,ecx  
mov         qword ptr [rbp-30h],rcx

var c = a.B.C;

mov         esi,dword ptr [rbp-2Ch]

where esi is the register that holds c, and [rbp-30h] is where a sits on the stack (the xor/mov pair zeroes all 8 bytes of a with a single qword store). B has its int C at offset 0, and A has a1 at offset 0 and B at offset 4, so a.B.C is always at a+4, that is, [rbp-2Ch].
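The address arithmetic in the listing, worked through in bytes using the layout just described:

```latex
\begin{aligned}
\operatorname{off}(a.B.C) &= \operatorname{off}_A(B) + \operatorname{off}_B(C) = 4 + 0 = 4 \\
\&a.B.C &= (\texttt{rbp} - \texttt{0x30}) + 4 = \texttt{rbp} - \texttt{0x2C}
\end{aligned}
```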