Performance of chained public member access compared to pointer


I couldn't find any question about chained member access, only about chained function calls, so I would like to ask a couple of questions about it.

I have the following situation:

for(int i = 0; i < largeNumber; ++i)
{
  //do calculations with the same chained struct:
  //myStruct1.myStruct2.myStruct3.myStruct4.member1
  //myStruct1.myStruct2.myStruct3.myStruct4.member2
  //etc.
}

It is obviously possible to break this down using a pointer:

MyStruct4* myStruct4_pt = &myStruct1.myStruct2.myStruct3.myStruct4;
for(int i = 0; i < largeNumber; ++i)
{
  //do calculations with pointer:
  //(*myStruct4_pt).member1
  //(*myStruct4_pt).member2
  //etc.
}

Is there a difference between member access (.) and a function access that, e.g., returns a pointer to a private variable?

Will/Can the first example be optimized by the compiler and does that strongly depend on the compiler?

If no optimizations are done at compile time, will/can the CPU optimize the behaviour (e.g. by keeping the data in the L1 cache)?

Does chained member access make any difference at all in terms of performance, since variables are "wildly reassigned" at compile time anyway?

I would kindly ask to leave discussions out regarding readability and maintainability of code, as the chained access is, for my purposes, clearer.

Update: Everything is running in a single thread.


Solution

  • The chained access resolves to a constant offset from the outermost object, and a modern compiler will realize that.

    But - don't trust me, let's ask a compiler (see here).

    #include <stdio.h>
    
    struct D { float _; int i; int j; };
    
    struct C { double _; D d; };
    
    struct B { char _; C c; };
    
    struct A { int _; B b; };
    
    int bar(int i);
    int foo(int i);
    
    void foo(A &a) {
      for (int i = 0; i < 10; i++) {
        a.b.c.d.i += bar(i);
        a.b.c.d.j += foo(i);
      }
    }
    

    Compiles to

    foo(A&):
        pushq   %rbp
        movq    %rdi, %rbp
        pushq   %rbx
        xorl    %ebx, %ebx
        subq    $8, %rsp
    .L3:
        movl    %ebx, %edi
        call    bar(int)
        addl    %eax, 28(%rbp)
        movl    %ebx, %edi
        addl    $1, %ebx
        call    foo(int)
        addl    %eax, 32(%rbp)
        cmpl    $10, %ebx
        jne     .L3
        addq    $8, %rsp
        popq    %rbx
        popq    %rbp
        ret
    

    As you can see, the chaining has been folded into a single constant offset from the address of a (held in %rbp) in both cases: 28(%rbp) and 32(%rbp).
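
The folding can also be checked from within C++ itself. The following is a minimal sketch, assuming the same struct definitions as above; the names offset_of_i, offset_of_j, byte_distance and verify_layout are hypothetical helpers introduced only for this illustration. It sums the per-level offsets with offsetof (each term is a compile-time constant) and checks that the chained access, the pointer variant from the question, and the summed offset all refer to the same address.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the structs in the example above.
struct D { float _; int i; int j; };
struct C { double _; D d; };
struct B { char _; C c; };
struct A { int _; B b; };

// Each offsetof is a constant expression, so the whole chain folds into
// one constant. The concrete values depend on the target ABI.
constexpr std::size_t offset_of_i =
    offsetof(A, b) + offsetof(B, c) + offsetof(C, d) + offsetof(D, i);
constexpr std::size_t offset_of_j =
    offsetof(A, b) + offsetof(B, c) + offsetof(C, d) + offsetof(D, j);

// Hypothetical helper: byte distance from the start of `a` to `member`.
template <typename T>
std::size_t byte_distance(const A &a, const T &member) {
  return static_cast<std::size_t>(
      reinterpret_cast<const char *>(&member) -
      reinterpret_cast<const char *>(&a));
}

// Checks that chained access, pointer access, and the folded offset agree.
void verify_layout() {
  A a{};
  D *d_pt = &a.b.c.d;  // the pointer variant from the question

  // Chained access lands exactly at the precomputed constant offset.
  assert(byte_distance(a, a.b.c.d.i) == offset_of_i);
  assert(byte_distance(a, a.b.c.d.j) == offset_of_j);

  // The pointer variant addresses the very same members.
  assert(&d_pt->i == &a.b.c.d.i);
  assert(&d_pt->j == &a.b.c.d.j);
}
```

Calling verify_layout() from main should pass silently. On the x86-64 target used for the assembly above, the two constants are expected to come out as 28 and 32, matching 28(%rbp) and 32(%rbp); other ABIs may pad the structs differently and yield other values, but the chain still reduces to a single constant.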