Tags: c++, this, class-members

Is there any penalty in returning members of nested objects?


Consider the following code, regarding nested access of members:

#include <cstddef>   // for size_t

struct A
{
  size_t m_A;
};
struct B
{
  A m_A_of_B;
};
class D
{
  B instance_B;
  A instance_A;
  size_t m_D;
public:
  size_t direct (void) { return m_D; }                     // direct member
  size_t ind1   (void) { return instance_A.m_A; }          // one level of nesting
  size_t ind2   (void) { return instance_B.m_A_of_B.m_A; } // two levels of nesting
};

I can imagine two different cases here:

1. No difference

To my understanding, there should be no difference, since each function returns a value whose position relative to this (i.e., within the class memory layout) is a compile-time constant.

I expect the compiler to recognize it.

Therefore, I assume that there is no penalty in returning members from nested structures like the ones shown above (or even more deeply nested ones); see the sketch below.
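
A minimal sketch of that intuition, using mirror structs with public members (hypothetical names A2, B2, D2, not part of the code above) so that offsetof is usable from outside the class; most current compilers accept the nested member designator used here:

#include <cstddef>

// Mirrors of the structs above, with public members only so that
// offsetof can name them from outside the class.
struct A2 { std::size_t m_A; };
struct B2 { A2 m_A_of_B; };
struct D2 { B2 instance_B; A2 instance_A; std::size_t m_D; };

// The offset of the innermost member is just the sum of three
// compile-time constants; nothing needs to be looked up at run time.
static_assert(offsetof(D2, instance_B.m_A_of_B.m_A)
              == offsetof(D2, instance_B)
               + offsetof(B2, m_A_of_B)
               + offsetof(A2, m_A),
              "nested offset is a single compile-time constant");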

2. Pointer indirection

It may also be possible that the whole chain of indirections is actually carried out. In ind2, for example:

fetch this -> fetch relative position of instance_B -> fetch relative position of m_A_of_B -> return m_A
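
Written out by hand, that chain would look roughly like the sketch below (again using public-member mirrors A2, B2, D2, which are not part of the original code). Note that even in this worst-case reading, each step only adds a constant offset to a pointer; nothing is loaded from memory until the final read of m_A:

#include <cstddef>

struct A2 { std::size_t m_A; };
struct B2 { A2 m_A_of_B; };

struct D2
{
  B2 instance_B;
  A2 instance_A;
  std::size_t m_D;

  // The feared "step by step" version of ind2, spelled out explicitly.
  std::size_t ind2_stepwise ()
  {
    B2* pB = &instance_B;     // this + constant offset of instance_B
    A2* pA = &pB->m_A_of_B;   // pB   + constant offset of m_A_of_B
    return pA->m_A;           // one load at pA + constant offset of m_A
  }
};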


Questions

  1. Is it compiler-dependent how this nested access is handled?
  2. Is there any difference between those three functions?

I ask this because what I have is only an assumption, based on what I know about how things work. Since some of my assumptions have proven wrong in the past, I want to ask to be sure.

Excuse me if this has already been asked; if so, please point me to the appropriate answer.

PS: There is no need for hints about "premature optimization being the root of all evil" or about profiling. I can profile this issue with the compiler I am developing with, but the program I am aiming at may be compiled with any conforming compiler. So even if I cannot detect any difference, one may still be present.


Solution

  • The standard places no constraints on this. A compiler writer with a really twisted mind could, for example, generate a loop which does nothing at the start of every function, with the number of times through the loop depending on the number of letters in the function name. Fully conforming, but... I rather doubt he'd have many users for his compiler.

    In practice, it's just (barely) conceivable that a compiler would work out the address of each sub-object separately; e.g. on Intel, it might do something like:

    D::direct:
        mov eax, [ecx + offset m_D]
        ret

    D::ind1:
        lea ebx, [ecx + offset instance_A]
        mov eax, [ebx + offset m_A]
        ret

    D::ind2:
        lea ebx, [ecx + offset instance_B]
        lea ebx, [ebx + offset m_A_of_B]
        mov eax, [ebx + offset m_A]
        ret
    

    In fact, all of the compilers I've ever seen work out the complete layout of the directly contained objects, and would generate something like:

    D::direct:
        mov eax, [ecx + offset m_D]
        ret

    D::ind1:
        mov eax, [ecx + offset instance_A + offset m_A]
        ret

    D::ind2:
        mov eax, [ecx + offset instance_B + offset m_A_of_B + offset m_A]
        ret
    

    (The addition of the offsets in the square brackets occurs in the assembler; each such expression corresponds to a single constant within the instruction in the actual executable.)
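
    If you want to check this yourself, a quick, non-authoritative way (assuming g++ or clang++) is to compile out-of-line wrappers like the ones below with -O2 -S and compare the generated assembly; with the compilers described above, each wrapper collapses to a single load with one constant displacement:

        // check.cpp, compile with: g++ -O2 -S check.cpp
        #include <cstddef>

        struct A2 { std::size_t m_A; };
        struct B2 { A2 m_A_of_B; };
        struct D2 { B2 instance_B; A2 instance_A; std::size_t m_D; };

        std::size_t get_direct(D2& d) { return d.m_D; }
        std::size_t get_ind1  (D2& d) { return d.instance_A.m_A; }
        std::size_t get_ind2  (D2& d) { return d.instance_B.m_A_of_B.m_A; }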

    So in answer to your questions: for 1, it's completely compiler-dependent; for 2, in actual practice, there will be absolutely no difference.

    Finally, all of your functions are inline, and they are simple enough that every compiler will inline them, at least with any degree of optimization activated. Once inlined, the optimizer may find additional optimizations: it may be able to detect that you initialized instance_B.m_A_of_B.m_A with a constant, for example, in which case it will just use the constant, and there won't be any access whatsoever. In fact, you're wrong to worry about this level of optimization, because the compiler will take care of it for you, better than you can.
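
    As an illustration of that last point (a sketch only, with hypothetical names; the exact output depends on the compiler and options):

        #include <cstddef>

        struct A2 { std::size_t m_A; };
        struct B2 { A2 m_A_of_B; };

        class D2
        {
            B2 instance_B { { 42 } };   // nested member initialized with a known constant
        public:
            std::size_t ind2 () { return instance_B.m_A_of_B.m_A; }
        };

        std::size_t use ()
        {
            D2 d;
            return d.ind2 ();   // typically optimized down to just returning 42; no member access remains
        }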