Tags: c++, optimization, compiler-construction, g++, llvm

Are the optimizations done in LTO the same as in normal compilation?


While compiling a translation unit, the compiler performs many optimizations: inlining, constant folding/propagation, alias analysis, loop unrolling, dead code elimination, and many others I haven't even heard of. Are all of them performed across translation units when using LTO/LTCG/WPO, or only a subset (or a variant) of them (I've heard about inlining)? If not all optimizations are performed, I would consider unity builds superior to LTO (or maybe use both when there is more than one unity source file).

My guess is that it's not the same (unity builds having the full set of optimizations) and also that it varies a lot across compilers.

The LTO documentation of each compiler doesn't answer this precisely (or I am failing to understand it).

Since LTO involves saving the intermediate representation in the object files, in theory LTO could do all the optimizations... right?

Note that I am not asking about build speed - that is a separate issue.

EDIT: I am mostly interested in gcc/llvm.


Solution

  • If you have a look at the GCC documentation, you find:

    -flto[=n]

    This option runs the standard link-time optimizer. When invoked with source code, it generates GIMPLE (one of GCC's internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.

    To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. For example:

              gcc -c -O2 -flto foo.c
              gcc -c -O2 -flto bar.c
              gcc -o myprog -flto -O2 foo.o bar.o
    

    The first two invocations to GCC save a bytecode representation of GIMPLE into special ELF sections inside foo.o and bar.o. The final invocation reads the GIMPLE bytecode from foo.o and bar.o, merges the two files into a single internal image, and compiles the result as usual. Since both foo.o and bar.o are merged into a single image, this causes all the interprocedural analyses and optimizations in GCC to work across the two files as if they were a single one. This means, for example, that the inliner is able to inline functions in bar.o into functions in foo.o and vice-versa.

    As the documentation says: yes, all optimizations are applied as if the program had been compiled as a single file. The "same" optimization result can also be obtained with -fwhole-program.

    If you compile this very simple example:

    f1.cpp:

    int f1() { return 10; }
    

    f2.cpp:

    int f2(int i) { return 2*i; }
    

    main.cpp:

    int f1();        // defined in f1.cpp
    int f2(int i);   // defined in f2.cpp

    int main()
    {
        int res = f1();
        res = f2(res);
        res++;

        return res;
    }
    

    With LTO, I got the following disassembly for main:

    00000000004005e0 <main>:
      4005e0:   b8 15 00 00 00          mov    $0x15,%eax
      4005e5:   c3                      retq   
      4005e6:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
      4005ed:   00 00 00
    

    All code is inlined as expected: main just returns the constant 0x15 (21 = 2*10 + 1), with no calls to f1 or f2 left.

    My experience is that current gcc optimizes with LTO exactly as if everything were compiled in a single file. On very rare occasions I got an ICE while using LTO, but with the current version (5.2.0 at the time of writing) I have not seen an ICE again.

    (ICE = Internal Compiler Error)
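
The constant in the disassembly can be checked against the source. The sketch below puts the three functions into one file and lets the compiler evaluate the result at compile time. The constexpr qualifiers and the helper name run() are assumptions made for this check only; the original f1 and f2 are plain functions in separate files, and run() stands in for the body of main().

```cpp
// Single-file sketch of f1.cpp, f2.cpp and main.cpp.
// Assumption: constexpr is added only so the value can be verified at
// compile time; the original code does not use it.
constexpr int f1() { return 10; }

constexpr int f2(int i) { return 2 * i; }

// Hypothetical helper: the same computation as main() in main.cpp,
// written as one expression so it is also valid C++11 constexpr.
constexpr int run() { return f2(f1()) + 1; }

// 2 * 10 + 1 == 21 == 0x15, the immediate that the disassembly
// loads into %eax before returning.
static_assert(run() == 0x15, "expected the folded result 21 (0x15)");
```

To turn this into a complete program, add `int main() { return run(); }`. A `mov $0x15,%eax` followed directly by a return in the LTO build therefore means that both calls were inlined across translation units and the arithmetic was folded into a single constant.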