c++ assembly inline

Inline c++ method in assembly


Let's take a small class called myClass. I was interested in how the .asm output differs when a method is inlined versus not. I built two programs, with and without the inline keyword in the cpp file, but the .asm output was the same. I know that inline is only a hint to the compiler, and most likely I was a victim of an optimization, but is it possible to see the difference between an inlined and a non-inlined method in asm on a small cpp example?

h:

#ifndef CLASS_H
#define CLASS_H

class myClass{
private:
  int a;
public:
  int getA() const;
};

#endif

cpp:

#include "class.h"
// Variant with the inline keyword; the second program is identical without it.
inline int myClass::getA() const {
  return a;
}

main:

#include "class.h"

int main(){
    myClass a;
    a.getA();
    return 0;
}

gcc:

gcc -S -O0 main.cpp

asm output in both cases:

    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 10, 14
    .globl  _main                   ## -- Begin function main
    .p2align    4, 0x90
_main:                                  ## @main
    .cfi_startproc
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    -8(%rbp), %rdi
    movl    $0, -4(%rbp)
    callq   __ZNK7myClass4getAEv
    xorl    %ecx, %ecx
    movl    %eax, -12(%rbp)         ## 4-byte Spill
    movl    %ecx, %eax
    addq    $16, %rsp
    popq    %rbp
    retq
    .cfi_endproc
                                        ## -- End function

.subsections_via_symbols

Solution

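  • gcc -O0 doesn't enable -finline-functions, so it wouldn't try to inline even if the functions were in the same file. Here they aren't: getA is defined in a separate .cpp, so while compiling main.cpp the compiler sees only the declaration from class.h and has no body to inline. See also Why is this C++ wrapper class not being inlined away?. (Don't bother trying to use __attribute__((always_inline)): you'll get inlining, but things won't optimize away.)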

    You could get things inlined with gcc -O3 -fwhole-program *.cpp to enable inlining across source files. (Regardless of whether they were declared inline or not, it's just up to the compiler to decide what's best).

    The main point of inline is to let the compiler know that it doesn't need to emit a stand-alone definition of a function if it does choose to inline it into all callers. (Because a definition, not just a declaration, of this function will appear in all translation units that use it. So if some other file decides not to inline it, a definition can be emitted there.)

    Modern compilers still use their normal heuristics to decide whether inlining is worthwhile. For example, a large function with multiple callers will probably not be inlined, to avoid code bloat. static tells the compiler that no other translation unit can see the function, so if there's only one caller in this file it will very likely be inlined there.

    If you have a large function, it's a bad idea to make it static inline: you'll get a separate copy of the definition in each file where it doesn't inline, and the compiler may inline too aggressively. For a small function that's probably going to inline everywhere, you should probably still use plain inline, not static inline, so that if anything takes the address of the function there is only one definition shared across all files. inline tells the linker to merge duplicate definitions of a function instead of reporting an error. This behaviour is one of the more important parts of what inline really does, more so than the hint to the compiler that you want it inlined.
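
    The inline versus static inline distinction can be sketched as the contents of a shared header. This is only an illustration under assumptions: the header name util.h and the names twice, twice_local, IntFn and address_of_twice are hypothetical and do not come from the question.

    ```cpp
    // Sketch of what a shared header (hypothetical name: util.h) might contain.

    // inline: every .cpp that includes this header carries a definition, and the
    // linker merges them, so &twice is the same address in every translation unit.
    inline int twice(int x) { return 2 * x; }

    // static inline: internal linkage. Each translation unit that does not inline
    // every call keeps its own private copy of the function.
    static inline int twice_local(int x) { return 2 * x; }

    // Taking the address means a stand-alone definition of twice must exist
    // somewhere, even if every direct call to it was inlined.
    using IntFn = int (*)(int);
    inline IntFn address_of_twice() { return &twice; }
    ```

    Calling address_of_twice() from two different .cpp files should yield the same pointer for twice; doing the same with twice_local would yield a different pointer per file.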


    gcc -fwhole-program (with all the source files on the same command line) gives the compiler enough information to make all these decisions itself. It can see if a function has only one caller across the whole program, and inline it instead of creating a stand-alone definition plus arg setup and a call.

    gcc -flto allows link-time optimization similar to whole-program, but doesn't require all the .cpp files on the command line at once. Instead it stores GIMPLE code in the .o files and finishes optimizing at link time.
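
To actually see the difference the question asks about, one option is to keep everything in a single translation unit and compare two otherwise identical methods, one of which is barred from inlining. The sketch below is an illustration under assumptions, not the asker's code: the member initializer a = 0, the second method getA_call, and the helpers use_inlined and use_called are hypothetical additions, and __attribute__((noinline)) is a GCC/Clang extension. Compiled with g++ -S -O2, use_inlined should typically shrink to a plain load of the member, while use_called still transfers control to getA_call (as a call or a tail-call jmp). With -O0, both are expected to keep a call.

```cpp
// Single translation unit: the compiler can see the method bodies at the call sites.
class myClass {
    int a = 0;  // initialized here (assumption) so reading it is well defined
public:
    // Defined inside the class body, so it is implicitly inline.
    int getA() const { return a; }

    // Same body, but the compiler is told never to inline it (GCC/Clang extension).
    __attribute__((noinline)) int getA_call() const { return a; }
};

// Compare the assembly generated for these two functions.
int use_inlined(const myClass& c) { return c.getA(); }       // candidate for inlining
int use_called(const myClass& c)  { return c.getA_call(); }  // must go through getA_call
```

Writing the callers as free functions that take the object by reference, rather than calling from main on a local object, keeps the optimizer from folding the whole thing into a constant, so the load or call stays visible in the output.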