Today I ran into a bizarre problem with the size of binaries produced by (Cygwin) g++.
When I compile a C++ program that uses standard library functions and pass -L/usr/local/lib
as an option, the binary is absolutely massive (12MB massive).
iostream seems to have the largest effect from what I've tested.
I've identified the 30MB file libstdc++.a in /usr/local/lib as the source of the problem
through trial and error. That is, I copied the contents of /usr/local/lib into a separate dir,
added that one to the link path instead of /usr/local/lib, and deleted files until the binary
size dropped to normal.
Trials:
KEY:
(<group>)
`<command>` -> <size of resulting binary in bytes>
control 1 (literally nothing) [no effect]:
int main() {}
(control)
`g++ -Wall test.cpp` -> 159,574
`g++ -Wall -Os test.cpp` -> 159,610
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 159,574
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 159,610
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 8,704
control 2 (using some other library dynamically - eg stb_image...) [no effect]:
#include "stb_image.h"
int main() {
    int width, height, channels;
    unsigned char *data = stbi_load("image.jpg", &width, &height, &channels, 0);
}
(control)
`g++ -Wall test.cpp stb_image.so` -> 159,944
`g++ -Wall -Os test.cpp stb_image.so` -> 159,980
`g++ -Wall -Os -s test.cpp stb_image.so` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib stb_image.so` -> 159,944
`g++ -Wall -Os test.cpp -L/usr/local/lib stb_image.so` -> 159,980
`g++ -Wall -Os -s test.cpp -L/usr/local/lib stb_image.so` -> 8,704
vector (templates) [slight effect]:
#include <vector>
int main() {
    std::vector<int> v;
    v.push_back(2);
}
(control)
`g++ -Wall test.cpp` -> 190,228
`g++ -Wall -Os test.cpp` -> 160,429
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 1,985,106
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 906,760
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 72,192
iostream [major effect]:
#include <iostream>
int main() {
    std::cout << "iostream" << std::endl;
}
(control)
`g++ -Wall test.cpp` -> 161,829
`g++ -Wall -Os test.cpp` -> 161,393
`g++ -Wall -Os -s test.cpp` -> 8,704
(test)
`g++ -Wall test.cpp -L/usr/local/lib` -> 11,899,614
`g++ -Wall -Os test.cpp -L/usr/local/lib` -> 11,899,344
`g++ -Wall -Os -s test.cpp -L/usr/local/lib` -> 828,416
Cannot be replicated in C w/ gcc, which makes sense I guess given the problem file is named libstdc++.
If you need more trials, let me know.
My question is: Why? Why does adding a directory with libstdc++.a to the search path increase
the binary size so? To my knowledge, nothing should be linked from the linker search path
unless stated explicitly with -l<library>. Does it have something to do with /usr/local/lib
being searched first, and -lstdc++ being added implicitly, therefore perhaps linking the wrong
library...?
The immediate (and perhaps obvious) reason for the program bloat - when you link the program in
the bloating way - is that all of the symbols that it references in the Standard
C++ library are statically bound to the definitions found in the
archive /usr/local/lib/libstdc++.a. All of the archived object files that
contain those definitions are extracted by the linker and physically merged into
the output program.
When you link the program in the normal way rather than the bloating way, the same
symbols are dynamically bound to the definitions provided by the DSO libstdc++.so
that the linker locates in one of its search directories other than
/usr/local/lib/. In this case no object code is merged into the program. The linker merely
annotates it so that the runtime loader, on launching it, will load that libstdc++.so
into the same process and patch the dynamic references with their runtime
addresses.
Why does adding a directory with libstdc++.a to the search path increase the binary size so? To my knowledge, nothing should be linked from the linker search path unless stated explicitly with -l<library>
Your statement in the second sentence is strictly true. However, you are not composing
or seeing the whole linker commandline. g++ - the GNU frontend driver for C++
compilation and linkage - is composing the linker's commandline, behind the scenes,
when compilation is done and it is ready to do linkage. It accepts whatever
linkage options you have specified in its own commandline, converts them
if necessary into equivalent options understood by the system linker, ld, adds them
to a new commandline, then appends to it a lot of boilerplate linker options that are invariant
for C++ programs, and finally passes this new commandline to ld to perform the linkage.
(This is simplifying somewhat, but it is essentially what happens.)
If you had to invoke ld yourself to link a C++ program at the command prompt, you
would have to type all the boilerplate yourself each time and would never be able to
remember it all. If you'd like to see all of it, then add the -v ( = verbose) option when
you invoke g++. Other GCC frontends perform the same function for other languages: gcc
for C linkages, gfortran for Fortran linkages, gnat for Ada linkages, etc.
In among all the boilerplate that g++ by default adds to the linkage options you will
find something like this:
...
-L/usr/lib/gcc/x86_64-linux-gnu/9 \
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu \
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib \
-L/lib/x86_64-linux-gnu \
-L/lib/../lib \
-L/usr/lib/x86_64-linux-gnu \
-L/usr/lib/../lib \
-L/usr/lib/gcc/x86_64-linux-gnu/9/../../.. \
...
-lstdc++
...
That's from my own Ubuntu 19.10 system. So you see, if you link a program with g++
then you are passing -lstdc++ to the linker by default. If you didn't pass it, then any
external references made in your code to symbols of the Standard C++ library could not be
resolved and the linkage would fail for undefined references.
The next question is how the linker resolves -lstdc++ to a physical
static or shared library somewhere in its search path and uses it.
By default, it's like this. A library option -lname directs the linker to search,
first in the specified -Ldir directories, in their commandline order, and then
in its default search directories, in their configured order, for either of the
files libname.a (a static library) or libname.so (a dynamic library). If
and when it finds either one of them it stops searching and inputs that file to
the linkage. If it finds both of them in the same search directory then it chooses
libname.so. If it inputs a shared library then it performs dynamic symbol binding
with the library, which does not add any object files to the program. If it
inputs a static library then it performs static symbol binding, which does add object
files to the program.
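The search rule just described can be sketched as a small C++ model. This is only an illustration under stated assumptions, not how ld is implemented: resolve_library and its parameters are hypothetical names, the set of path strings stands in for the file system, and the model looks only for the two file names libname.so and libname.a mentioned above.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of resolving -lname (not real linker code).
// l_dirs:         the -Ldir directories, in commandline order
// default_dirs:   the linker's default search directories, in configured order
// existing_files: stand-in for the file system (full paths that "exist")
// Returns the path of the library chosen, or an empty string if none is found.
std::string resolve_library(const std::string& name,
                            const std::vector<std::string>& l_dirs,
                            const std::vector<std::string>& default_dirs,
                            const std::set<std::string>& existing_files)
{
    // -L directories are searched before the default directories.
    std::vector<std::string> search_order = l_dirs;
    search_order.insert(search_order.end(),
                        default_dirs.begin(), default_dirs.end());

    for (const std::string& dir : search_order) {
        const std::string shared_lib = dir + "/lib" + name + ".so";
        const std::string static_lib = dir + "/lib" + name + ".a";
        // If both exist in the same directory, the shared library is chosen.
        if (existing_files.count(shared_lib)) return shared_lib;
        // Otherwise a static library in this directory ends the search.
        if (existing_files.count(static_lib)) return static_lib;
    }
    return std::string();  // nothing found in any search directory
}
```

In this model, if /usr/local/lib holds only libstdc++.a and comes first in the directory list, the function returns /usr/local/lib/libstdc++.a even though a later directory holds libstdc++.so. Whether the result ends in .a or .so corresponds to static or dynamic symbol binding.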
If you'd like to know what the linker's default search directories are - where it
will search after exhausting the -L directories - and what order they come in,
you need to run ld itself with the -verbose option. You can do this by
passing -Wl,-verbose to the front end.
Near the start of the -verbose linker output you will find something like:
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu"); \
SEARCH_DIR("=/lib/x86_64-linux-gnu"); \
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu"); \
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64"); \
SEARCH_DIR("=/usr/local/lib64"); \
SEARCH_DIR("=/lib64"); \
SEARCH_DIR("=/usr/lib64"); \
SEARCH_DIR("=/usr/local/lib"); \
SEARCH_DIR("=/lib"); \
SEARCH_DIR("=/usr/lib"); \
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64"); \
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
That is the linker's built-in directory search order. Notice that it contains
/usr/local/lib. So it is never necessary for you to specify -L/usr/local/lib
(or any of those directories) in the frontend commandline, for g++ or any
other frontend, unless you want to change the directory search-order.
The positions in which -Ldir options appear in the linker commandline in relation
to -lname options don't matter. All of the -Ldir options apply for all
of the -lname options. But the order in which -Ldir options appear with respect to
each other matters, and likewise for the -lname options.
If you link your program with no unnecessary linkage options:
g++ -Wall test.cpp
the linker is going to search for a physical library satisfying -lstdc++.
On my system, the first directory it is going to search is /usr/lib/gcc/x86_64-linux-gnu/9,
and there it will find:
$ ls /usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.*
/usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.a /usr/lib/gcc/x86_64-linux-gnu/9/libstdc++.so
So it has a choice of both libstdc++.a and libstdc++.so and it picks libstdc++.so.
Dynamic symbol binding is done. There is no code bloat.
But if you link your program like:
g++ -Wall test.cpp -L/usr/local/lib
when /usr/local/lib/libstdc++.a exists and /usr/local/lib/libstdc++.so
does not, then /usr/local/lib/ is searched first; libstdc++.a alone is found
there and is statically linked. There is code bloat.
That situation is abnormal because a conventional and proficient install
of libstdc++ into /usr/local/lib should place both the static and shared libraries
there, so there would still be no code bloat. Your question gives me no insight
into how that situation might have arisen.
When you deleted /usr/local/lib/libstdc++.a you found that the program size
reverted to normal. That is because in the absence of that file, the first
library satisfying -lstdc++ that the linker found was once again its usual
libstdc++.so.
You observed much less bloat in a program referencing only the <vector>
facilities than in one referencing the <iostream> facilities. That's because
the <vector> facilities pull much less library code into a static linkage than <iostream>.
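Why the amount of extracted code depends on what the program references can be illustrated with a toy model of linking against a static archive. This is a rough sketch, not the linker's actual algorithm: ArchiveMember, extracted_bytes, and any member names, symbols or sizes you feed it are hypothetical. It only captures the idea that an archived object file is extracted when it defines a symbol that is still undefined, and that an extracted file may in turn reference further symbols.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file inside a static archive (hypothetical model).
struct ArchiveMember {
    std::string name;
    std::size_t size;                  // bytes merged into the program if extracted
    std::set<std::string> defines;     // symbols this object file defines
    std::set<std::string> references;  // symbols it needs from elsewhere
};

// Returns the total size of the members extracted to resolve `undefined`,
// the set of symbols the program references but does not define itself.
std::size_t extracted_bytes(const std::vector<ArchiveMember>& archive,
                            std::set<std::string> undefined)
{
    std::set<std::string> defined;
    std::vector<bool> extracted(archive.size(), false);
    std::size_t total = 0;

    bool progress = true;
    while (progress) {  // rescan until a full pass extracts nothing new
        progress = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (extracted[i]) continue;
            bool needed = false;
            for (const std::string& sym : archive[i].defines) {
                if (undefined.count(sym)) { needed = true; break; }
            }
            if (!needed) continue;

            // Extract this member: its definitions resolve symbols...
            extracted[i] = true;
            total += archive[i].size;
            progress = true;
            for (const std::string& sym : archive[i].defines) {
                defined.insert(sym);
                undefined.erase(sym);
            }
            // ...and its own references may require further members.
            for (const std::string& sym : archive[i].references) {
                if (!defined.count(sym)) undefined.insert(sym);
            }
        }
    }
    return total;
}
```

In this model a program that needs one small, self-contained member pays only for that member, while a program that needs a member with many onward references drags in everything reachable from it.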
In a comment you wonder why the presence of the -shared-libgcc option does
not prevent the linkage with /usr/local/lib/libstdc++.a. It is because libgcc
is not libstdc++, and -shared-libgcc merely requires the linkage of libgcc.so.