c++ g++ cygwin filesize

Massive executable size with -L/usr/local/lib


Today I encountered a bizarre problem relating to the size of binaries produced by (Cygwin) g++.

When compiling a C++ program that uses standard library functions and passing -L/usr/local/lib as an option, the binary size is absolutely massive (12MB massive).

iostream seems to have the largest effect from what I've tested.

I've identified the 30MB file libstdc++.a in /usr/local/lib as the source of the problem through trial and error. That is, I copied the contents of /usr/local/lib into a separate dir, added that one to the link path instead of /usr/local/lib, and deleted files until the binary size dropped to normal.

Trials:

KEY:
(<group>)
`<command>` -> <size of resulting binary in bytes>

control 1 (literally nothing) [no effect]:

int main() {}
(control)
`g++ -Wall test.cpp` -> 159,574
`g++ -Wall -Os test.cpp` -> 159,610
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 159,574
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 159,610
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 8,704

control 2 (using some other library dynamically - eg stb_image...) [no effect]:

#include "stb_image.h"
int main() {
    int width, height, channels;
    unsigned char *data = stbi_load("image.jpg", &width, &height, &channels, 0);
}
(control)
`g++ -Wall test.cpp stb_image.so` -> 159,944
`g++ -Wall -Os test.cpp stb_image.so` -> 159,980
`g++ -Wall -Os -s test.cpp stb_image.so` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib stb_image.so` -> 159,944
`g++ -Wall -Os test.cpp -L/usr/local/lib stb_image.so` -> 159,980
`g++ -Wall -Os -s test.cpp -L/usr/local/lib stb_image.so` -> 8,704

vector (templates) [slight effect]:

#include <vector>
int main() {
    std::vector<int> v;
    v.push_back(2);
}
(control)
`g++ -Wall test.cpp` -> 190,228
`g++ -Wall -Os test.cpp` -> 160,429
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 1,985,106
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 906,760
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 72,192

iostream [major effect]:

#include <iostream>
int main() {
    std::cout << "iostream" << std::endl;
}
(control)
`g++ -Wall test.cpp` -> 161,829
`g++ -Wall -Os test.cpp` -> 161,393
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 11,899,614
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 11,899,344
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 828,416

Cannot be replicated in C w/ gcc, which makes sense I guess given the problem file is named libstdc++.

If you need more trials, let me know.

My question is: Why? Why does adding a directory with libstdc++.a to the search path increase the binary size so? To my knowledge, nothing should be linked from the linker search path unless stated explicitly with -l<library>. Does it have something to do with /usr/local/lib being searched first, and -lstdc++ being added implicitly, therefore perhaps linking the wrong library...?


Solution

  • The immediate (and perhaps obvious) reason for the program bloat - when you link the program in the bloating way - is that all of the symbols that it references in the Standard C++ library are statically bound to the definitions found in the archive /usr/local/lib/libstdc++.a. All of the archived object files that contain those definitions are extracted by the linker and physically merged into the output program.

    When you link the program in the normal way rather than the bloating way, the same symbols are dynamically bound to the definitions provided by the DSO libstdc++.so that the linker locates in one of its search directories other than /usr/local/lib/. In this case no object code is merged into the program. The linker merely annotates it so that the runtime loader, on launching it, will load that libstdc++.so into the same process and patch the dynamic references with their runtime addresses.

    "Why does adding a directory with libstdc++.a to the search path increase the binary size so? To my knowledge, nothing should be linked from the linker search path unless stated explicitly with -l<library>."

    Your statement in the second sentence is strictly true. However, you are not composing or seeing the whole linker commandline. g++, the GNU frontend driver for C++ compilation and linkage, composes the linker's commandline behind the scenes once compilation is done and it is ready to link. It accepts whatever linkage options you have specified in its own commandline, converts them if necessary into equivalent options understood by the system linker, ld, adds them to a new commandline, then appends to it a lot of boilerplate linker options that are invariant for C++ programs, and finally passes this new commandline to ld to perform the linkage. (This is simplifying somewhat, but it is essentially what happens.)

    If you had to invoke ld yourself to link a C++ program at the command prompt, you would have to type all the boilerplate yourself each time and would never be able to remember it all. If you'd like to see all of it, add the -v (= verbose) option when you invoke g++. Other GCC frontends perform the same function for other languages: gcc for C linkages, gfortran for Fortran linkages, gnat for Ada linkages, etc.

    Among all the boilerplate that g++ adds to the linkage options by default, you will find something like this:

    ...
    -L/usr/lib/gcc/x86_64-linux-gnu/9 \
    -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu \
    -L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib \
    -L/lib/x86_64-linux-gnu \
    -L/lib/../lib \
    -L/usr/lib/x86_64-linux-gnu \
    -L/usr/lib/../lib \
    -L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. \
    ...
    -lstdc++
    ...
    

    That's from my own Ubuntu 19.10 system. So you see, if you link a program with g++ then you are passing -lstdc++ to the linker by default. If you didn't pass it, then any external references made in your code to symbols of the Standard C++ library could not be resolved and the linkage would fail with undefined references.

    The next question is how the linker resolves -lstdc++ to a physical static or shared library somewhere in its search path and uses it.

    By default, it works like this. A library option -lname directs the linker to search first the specified -Ldir directories, in their commandline order, and then its default search directories, in their configured order, for either of the files libname.a (a static library) or libname.so (a dynamic library). As soon as it finds either one of them, it stops searching and inputs that file to the linkage. If it finds both of them in the same search directory, it chooses libname.so. If it inputs a shared library, it performs dynamic symbol binding with the library, which does not add any object files to the program. If it inputs a static library, it performs static symbol binding, which does add object files to the program.
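
    The search rule just described can be sketched as a toy model. This is only an illustration under assumptions, not ld's actual implementation: the names resolve_library and Resolved are made up for the example, the "file system" is a plain set of path strings, and the code needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: which file the search settled on.
struct Resolved {
    std::string path;  // file chosen for the linkage
    bool is_shared;    // true: dynamic binding, false: static binding
};

// Toy model of resolving -lname. `search_dirs` must already be in search
// order: the -L directories first, then the default directories.
// `files` stands in for the disk; nothing here touches the real file system.
std::optional<Resolved> resolve_library(
    const std::string& name,
    const std::vector<std::string>& search_dirs,
    const std::set<std::string>& files)
{
    assert(!name.empty());
    for (const std::string& dir : search_dirs) {
        const std::string shared = dir + "/lib" + name + ".so";
        const std::string archive = dir + "/lib" + name + ".a";
        // Within one directory the shared library wins over the archive.
        if (files.count(shared)) return Resolved{shared, true};
        if (files.count(archive)) return Resolved{archive, false};
        // Neither is present here: try the next directory.
    }
    return std::nullopt;  // a real linker would fail with "cannot find -lname"
}
```

    If the modelled file set holds /usr/local/lib/libstdc++.a plus both libstdc++.a and libstdc++.so in the GCC directory, then putting /usr/local/lib first in search_dirs yields the archive (static binding), while putting the GCC directory first yields libstdc++.so.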

    If you'd like to know what the linker's default search directories are (the ones it searches after exhausting the -L directories) and what order they come in, you need to run ld itself with the -verbose option. You can do this by passing -Wl,-verbose to the frontend.

    Near the start of the -verbose linker output you will find the like of:

    SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); \
    SEARCH_DIR("=/lib/x86_64-linux-gnu"); \
    SEARCH_DIR("=/usr/lib/x86_64-linux-gnu"); \
    SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64"); \
    SEARCH_DIR("=/usr/local/lib64"); \
    SEARCH_DIR("=/lib64"); \
    SEARCH_DIR("=/usr/lib64"); \
    SEARCH_DIR("=/usr/local/lib"); \
    SEARCH_DIR("=/lib"); \
    SEARCH_DIR("=/usr/lib"); \
    SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64"); \
    SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
    

    That is the linker's built-in directory search order. Notice that it contains /usr/local/lib. So it is never necessary for you to specify -L/usr/local/lib (or any of those directories) in the frontend commandline, for g++ or any other frontend, unless you want to change the directory search-order.

    The positions in which -Ldir options appear in the linker commandline in relation to -lname options don't matter. All of the -Ldir options apply to all of the -lname options. But the order in which -Ldir options appear with respect to each other matters, and likewise for the -lname options.
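
    A minimal sketch of that ordering rule, assuming a simplified commandline in which every option is a single -Ldir or -lname token. The names LinkInputs and collect_link_inputs are hypothetical, and this is not how ld actually parses its arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What is left once the whole commandline has been read: one ordered list
// of search directories and one ordered list of requested libraries.
struct LinkInputs {
    std::vector<std::string> search_dirs;  // from -Ldir, in commandline order
    std::vector<std::string> libraries;    // from -lname, in commandline order
};

LinkInputs collect_link_inputs(const std::vector<std::string>& args)
{
    LinkInputs inputs;
    for (const std::string& arg : args) {
        // The two-token forms "-L dir" and "-l name" are not modelled.
        assert(arg != "-L" && arg != "-l");
        if (arg.compare(0, 2, "-L") == 0) {
            inputs.search_dirs.push_back(arg.substr(2));
        } else if (arg.compare(0, 2, "-l") == 0) {
            inputs.libraries.push_back(arg.substr(2));
        }
        // Anything else (object files, other options) is ignored here.
    }
    return inputs;
}
```

    With this model, {"-lfoo", "-L/a", "-L/b"} and {"-L/a", "-L/b", "-lfoo"} produce identical search_dirs, so the library is looked up the same way in both cases. Swapping -L/a and -L/b changes search_dirs, and with it which directory is tried first.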

    If you link your program with no unnecessary linkage options:

    g++ -Wall test.cpp
    

    the linker is going to search for a physical library satisfying -lstdc++.

    On my system, the first directory it is going to search is /usr/lib/gcc/x86_64-linux-gnu/9, and there it will find:

    $ ls /usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.*
    /usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.a  /usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.so
    

    So it has a choice of both libstdc++.a and libstdc++.so and it picks libstdc++.so. Dynamic symbol binding is done. There is no code bloat.

    But if you link your program like:

    g++ -Wall test.cpp -L/usr/local/lib
    

    when /usr/local/lib/libstdc++.a exists and /usr/local/lib/libstdc++.so does not, then /usr/local/lib/ is searched first; libstdc++.a alone is found there and is statically linked. There is code bloat.

    That situation is abnormal because a conventional and proficient install of libstdc++ into /usr/local/lib should place both the static and shared libraries there, so there would still be no code bloat. Your question gives me no insight into how that situation might have arisen.
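
    To check whether a given directory is in that abnormal state, you could probe each search directory for the two file names. The following is a rough sketch, assuming C++17 (std::filesystem) and the Linux-style names libname.so and libname.a used in this answer. Candidates and probe_dirs are names invented for the example, and the list of directories is whatever you pass in; nothing is read from g++ or ld.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// What a single search directory offers for -lname.
struct Candidates {
    std::string dir;
    bool has_shared;  // dir/libname.so exists
    bool has_static;  // dir/libname.a exists
};

// Probe each directory, in the order given, for libname.so and libname.a.
std::vector<Candidates> probe_dirs(const std::vector<std::string>& dirs,
                                   const std::string& name)
{
    assert(!name.empty());
    std::vector<Candidates> report;
    for (const std::string& dir : dirs) {
        const fs::path base(dir);
        std::error_code ec;  // non-throwing overload: errors count as "absent"
        const bool has_shared = fs::exists(base / ("lib" + name + ".so"), ec);
        const bool has_static = fs::exists(base / ("lib" + name + ".a"), ec);
        report.push_back({dir, has_shared, has_static});
    }
    return report;
}
```

    A directory reported with has_static set and has_shared clear, and listed ahead of every directory that does have the shared library, is one that would cause the static linkage described above.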

    When you deleted /usr/local/lib/libstdc++.a you found that the program size reverted to normal. That is because in the absence of that file, the first library satisfying -lstdc++ that the linker found was once again its usual libstdc++.so.

    You observed much less bloat in a program referencing only the <vector> facilities than in one referencing the <iostream> facilities. That's because the <vector> facilities pull much less library code into a static linkage than the <iostream> facilities do.

    In a comment you wonder why the presence of the -shared-libgcc option does not prevent the linkage with /usr/local/lib/libstdc++.a. It is because libgcc is not libstdc++, and -shared-libgcc merely requires linkage with the shared version of libgcc.