Let's take a small class called myClass. I was interested in how the .asm differs when a method is inlined versus when it is not. I made two programs, with and without the inline keyword in the cpp file, but the .asm output was the same. I know that inline is just a hint to the compiler, and most likely I was a victim of an optimization, but is it possible to see the difference between an inlined and a non-inlined method in the asm of a small cpp example?
h:
#ifndef CLASS_H
#define CLASS_H
class myClass{
private:
int a;
public:
int getA() const;
};
#endif
cpp:
#include "class.h"
inline int myClass::getA() const{
return a;
}
main:
#include "class.h"
int main(){
myClass a;
a.getA();
return 0;
}
gcc:
gcc -S -O0 main.cpp
asm output in both cases:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 10, 14
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
leaq -8(%rbp), %rdi
movl $0, -4(%rbp)
callq __ZNK7myClass4getAEv
xorl %ecx, %ecx
movl %eax, -12(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
## -- End function
.subsections_via_symbols
gcc -O0 doesn't enable -finline-functions, so even if the functions were in the same file it wouldn't try. See also "Why is this C++ wrapper class not being inlined away?". (Don't bother trying to use __attribute__((always_inline)): you'll get inlining, but things still won't optimize away.)
You could get things inlined with gcc -O3 -fwhole-program *.cpp, which enables inlining across source files. (That applies regardless of whether the functions were declared inline or not; it's just up to the compiler to decide what's best.)
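To actually see the difference in the asm, one option is to put everything in a single file and compare a call that is forbidden from inlining with one that is allowed to inline. The following is only a sketch: the names getA_call, getA_inlined, use_call and use_inlined are made up for illustration, the member is given a default value of 42 so the result is well defined, and __attribute__((noinline)) is a GNU extension (GCC and Clang).

```cpp
// inline_demo.cpp (hypothetical file name)
// Inspect with:  g++ -O2 -S inline_demo.cpp   then read inline_demo.s
class myClass {
    int a = 42;  // default value added for this sketch only
public:
    // GNU attribute: the compiler must not inline this, so a real call remains.
    __attribute__((noinline)) int getA_call() const { return a; }

    // Defined inside the class body, so implicitly inline; small enough
    // that it is very likely to be inlined at -O2.
    int getA_inlined() const { return a; }
};

// Likely to contain a call to the mangled name of myClass::getA_call.
int use_call(const myClass& obj) { return obj.getA_call(); }

// Likely to be just a load of the member followed by a return, with no call.
int use_inlined(const myClass& obj) { return obj.getA_inlined(); }
```

Neither file needs a main for -S to work, since only assembly is produced. The exact instructions depend on the compiler version and target, so treat the comments as what to look for rather than a guaranteed result.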
The main point of inline is to let the compiler know that it doesn't need to emit a stand-alone definition of a function if it does choose to inline it into all callers. (Because a definition, not just a declaration, of this function will appear in all translation units that use it. So if some other file decides not to inline it, a definition can be emitted there.)
Modern compilers still use their normal heuristics to decide whether it's worth inlining or not, e.g. a large function with multiple callers will probably not be inlined, to avoid code bloat. static tells the compiler that no other translation unit can see the function, so if there's only one caller in this file it will very likely inline there. (If you have a large function, it's a bad idea to make it static inline. You'll get a copy of the definition in each file where it doesn't inline, and overly aggressive inlining. For a small function that's probably going to inline everywhere, you should probably still just use inline, not static inline, so in case anything takes the address of the function there will only be one definition shared across all files. inline tells the linker to merge duplicate definitions of a function instead of erroring. This behaviour is one of the more important parts of what inline really does, more so than the actual hint to the compiler that you want it to inline.)
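The inline versus static inline distinction can be sketched as follows. The function names are hypothetical, and in a real project the first two definitions would sit in a header included by several .cpp files; they are shown in one file here so the example is self-contained.

```cpp
// Would normally live in a header shared by several translation units.

// inline: every file that includes this gets the definition, and the linker
// merges the duplicates, so the function has one address program-wide.
inline int shared_twice(int x) { return 2 * x; }

// static inline: internal linkage, so each file that needs a stand-alone
// copy (for example because it takes the address) gets its own private one.
static inline int local_twice(int x) { return 2 * x; }

using fn_ptr = int (*)(int);

// Taking the address forces a stand-alone definition to exist,
// even if every direct call elsewhere was inlined.
fn_ptr address_of_shared() { return &shared_twice; }
fn_ptr address_of_local() { return &local_twice; }
```

If two different .cpp files each defined address_of_local this way, the returned pointers would refer to separate copies, while the shared_twice pointers would compare equal.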
gcc -fwhole-program (with all the source files on the same command line) gives the compiler enough information to make all these decisions itself. It can see if a function has only one caller across the whole program, and inline it instead of creating a stand-alone definition plus arg setup and a call.
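Within a single file, static already gives the compiler the same kind of knowledge that -fwhole-program provides for the whole program (GCC documents that option as treating public functions other than main as if they were static). A minimal sketch of the single-caller case, with made-up function names:

```cpp
// single_caller.cpp (hypothetical file name)
// Inspect with:  g++ -O2 -S single_caller.cpp

// Internal linkage: the compiler can see every caller of this function.
static int scale_and_offset(int x) { return 3 * x + 1; }

// The only caller. At -O2 the helper is very likely inlined here, and no
// stand-alone scale_and_offset symbol needs to be emitted at all.
int only_caller(int x) { return scale_and_offset(x); }
```

Removing static and compiling the same file would typically still inline the call, but a stand-alone definition has to be kept in case another file calls it.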
gcc -flto allows link-time optimization similar to whole-program, but doesn't require all the .cpp files on the command line at once. Instead it stores GIMPLE code in the .o files and finishes optimizing at link time.