Just a simple question, but I couldn't find the answer anywhere. When I put all my object files in an archive, how do I instruct clang++ to take only the object files that are required for linking, so as to avoid undefined-symbol errors caused by symbols in the archive that are not required?
You won't have been able to find the answer you're seeking because what you want to make the linker do is what it does by default. Here's a demonstration. (It's in C rather than C++ merely to spare us the obfuscation of C++ name-mangling).
Three source files:
alice.c
#include <stdio.h>
void alice(void)
{
    puts("alice");
}
bob.c
#include <stdio.h>
void bob(void)
{
    puts("bob");
}
mary.c
#include <stdio.h>
void mary(void)
{
    puts("mary");
}
Compile them and put the object files in an archive:
$ clang -Wall -c alice.c
$ clang -Wall -c bob.c
$ clang -Wall -c mary.c
$ ar rc libabm.a alice.o bob.o mary.o
Here's the member list of the archive:
$ ar -t libabm.a
alice.o
bob.o
mary.o
And here are the symbol tables of those members:
$ nm libabm.a
alice.o:
0000000000000000 T alice
U puts
bob.o:
0000000000000000 T bob
U puts
mary.o:
0000000000000000 T mary
U puts
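where T denotes a symbol defined in the text (code) section, here a function, and U denotes an undefined symbol, i.e. one that the object file references but does not define. puts is defined in the standard C library, which will be linked by default.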
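If you want to inspect member symbol tables programmatically rather than by eye, the nm listing above is easy to classify. Here is a minimal C sketch, assuming GNU nm's default BSD-style output (an optional address, a one-letter type, a name) and symbol names without whitespace. The names parse_nm_line, struct nm_entry and enum nm_kind are hypothetical, not part of any library, and the sketch only distinguishes T/t (defined in the text section) from U (undefined), lumping every other type together.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of one line of nm output. */
enum nm_kind { NM_MEMBER, NM_DEFINED, NM_UNDEFINED, NM_OTHER };

struct nm_entry {
    enum nm_kind kind;
    char type;        /* nm's one-letter type, e.g. 'T' or 'U'; '\0' for member headers */
    char name[256];   /* symbol name, or archive member name without the ':' */
};

/* Parse one line of nm's default (BSD-style) output into *out. */
enum nm_kind parse_nm_line(const char *line, struct nm_entry *out)
{
    char a[256], b[256], c[256];
    int n = sscanf(line, "%255s %255s %255s", a, b, c);

    out->kind = NM_OTHER;
    out->type = '\0';
    out->name[0] = '\0';

    if (n == 1) {
        /* A member header such as "alice.o:" */
        size_t len = strlen(a);
        if (len > 1 && a[len - 1] == ':') {
            memcpy(out->name, a, len - 1);
            out->name[len - 1] = '\0';
            out->kind = NM_MEMBER;
        }
    } else if (n == 2 || n == 3) {
        /* "U puts" has no address; "0000000000000000 T alice" has one. */
        const char *type = (n == 2) ? a : b;
        const char *name = (n == 2) ? b : c;
        out->type = type[0];
        strcpy(out->name, name);   /* fits: sscanf stored at most 255 chars */
        if (out->type == 'U')
            out->kind = NM_UNDEFINED;           /* referenced here, defined elsewhere */
        else if (out->type == 'T' || out->type == 't')
            out->kind = NM_DEFINED;             /* defined in the text section */
    }
    return out->kind;
}
```

Feeding each line of `nm libabm.a` through parse_nm_line gives you, per member, the defined and undefined symbols that the linker's member selection works from.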
Now here's a program that calls alice externally, and so is dependent on alice.o:
sayalice.c
extern void alice(void);
int main(void)
{
    alice();
    return 0;
}
And here's another program that calls alice and bob externally, thus being dependent on alice.o and bob.o:
sayalice_n_bob.c
extern void alice(void);
extern void bob(void);
int main(void)
{
    alice();
    bob();
    return 0;
}
Compile both those sources as well:
$ clang -Wall -c sayalice.c
$ clang -Wall -c sayalice_n_bob.c
The linker option -trace instructs the linker to report the object files and DSOs that are linked. We'll use it now to link program sayalice using sayalice.o and libabm.a:
$ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice.o
(./libabm.a)alice.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
We see that all the boilerplate C libraries and runtime files are linked. And of the object files that we created, just two are linked:
sayalice.o
(./libabm.a)alice.o
The two members of libabm.a that our program does not depend on:
(./libabm.a)bob.o
(./libabm.a)mary.o
were not linked.
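The archive members that were extracted can be picked out of the -trace listing mechanically, because they are exactly the lines of the form (ARCHIVE)MEMBER. Here is a minimal C sketch, assuming one listing line per call with any trailing newline already stripped; split_trace_member is a hypothetical helper, not a library function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Split a -trace line of the form "(ARCHIVE)MEMBER", e.g. "(./libabm.a)alice.o",
 * into its archive path and member name. Returns false for lines that do not
 * name an archive member (plain object files, shared libraries, the banner)
 * or when a buffer is too small. */
bool split_trace_member(const char *line, char *archive, size_t a_size,
                        char *member, size_t m_size)
{
    const char *close;
    size_t a_len, m_len;

    if (line[0] != '(')
        return false;
    close = strchr(line, ')');
    if (close == NULL || close[1] == '\0')
        return false;
    a_len = (size_t)(close - line - 1);   /* text between '(' and ')' */
    m_len = strlen(close + 1);            /* text after ')' */
    if (a_len + 1 > a_size || m_len + 1 > m_size)
        return false;
    memcpy(archive, line + 1, a_len);
    archive[a_len] = '\0';
    memcpy(member, close + 1, m_len + 1); /* includes the terminating '\0' */
    return true;
}
```

Applied to the listing above, only (./libabm.a)alice.o and the libc_nonshared.a member elf-init.oS would be reported as archive members.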
Running the program:
$ ./sayalice
alice
it says "alice".
Then for comparison we'll link program sayalice_n_bob, again with -trace:
$ clang -o sayalice_n_bob sayalice_n_bob.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
This time, three of our object files were linked:
sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o
And the only member of libabm.a that the program does not depend on:
(./libabm.a)mary.o
was not linked.
This program runs like:
$ ./sayalice_n_bob
alice
bob
Here's the global symbol table of the program:
$ nm -g sayalice_n_bob
0000000000400520 T alice
0000000000400540 T bob
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
0000000000601028 D __dso_handle
0000000000601030 D _edata
0000000000601038 B _end
00000000004005d4 T _fini
w __gmon_start__
00000000004003d0 T _init
00000000004005e0 R _IO_stdin_used
00000000004005d0 T __libc_csu_fini
0000000000400560 T __libc_csu_init
U __libc_start_main@@GLIBC_2.2.5
00000000004004f0 T main
U puts@@GLIBC_2.2.5
0000000000400410 T _start
0000000000601030 D __TMC_END__
with alice and bob, but not mary.
So as you see, the linker's default behaviour is the behaviour you are asking how to get. To stop the linker from extracting only the archive members that define symbols referenced in the linkage, and instead make it link all archive members, you have to tell it expressly to do so, by placing the archive within the scope of a --whole-archive option in the linkage command line:
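The two behaviours can be summarised as a small model. This C sketch is a simplification, not ld's actual implementation: it keeps a set of defined and a set of undefined symbols, extracts any archive member that defines a currently undefined symbol (adding that member's own undefined references to the set), and rescans the archive until a full pass extracts nothing new. The whole_archive flag mimics --whole-archive by extracting every member unconditionally. All the names (struct member, struct symset, scan_archive) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 8   /* per member; lists are NULL-terminated, so at most 7 names */
#define MAX_SET 32   /* capacity of a symbol set in this toy model */

/* Hypothetical model of one archive member's symbol table. */
struct member {
    const char *name;
    const char *defines[MAX_SYMS]; /* symbols nm shows as T */
    const char *needs[MAX_SYMS];   /* symbols nm shows as U */
};

struct symset {
    const char *sym[MAX_SET];
    size_t n;
};

static bool has(const struct symset *s, const char *sym)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->sym[i], sym) == 0)
            return true;
    return false;
}

/* Add sym if absent; silently ignored once the set is full (toy model). */
static void add(struct symset *s, const char *sym)
{
    if (!has(s, sym) && s->n < MAX_SET)
        s->sym[s->n++] = sym;
}

static void del(struct symset *s, const char *sym)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->sym[i], sym) == 0) {
            s->sym[i] = s->sym[--s->n];   /* order does not matter */
            return;
        }
}

/* Linking a member resolves what it defines and adds its own references
 * to the undefined set unless something already linked defines them. */
static void link_member(const struct member *m, struct symset *defined, struct symset *undef)
{
    for (size_t i = 0; m->defines[i] != NULL; i++) {
        add(defined, m->defines[i]);
        del(undef, m->defines[i]);
    }
    for (size_t i = 0; m->needs[i] != NULL; i++)
        if (!has(defined, m->needs[i]))
            add(undef, m->needs[i]);
}

static bool defines_any(const struct member *m, const struct symset *undef)
{
    for (size_t i = 0; m->defines[i] != NULL; i++)
        if (has(undef, m->defines[i]))
            return true;
    return false;
}

/* Scan one archive; linked[i] reports whether member i was extracted. */
void scan_archive(const struct member *ar, size_t n, bool whole_archive,
                  struct symset *defined, struct symset *undef, bool *linked)
{
    bool progress = true;

    for (size_t i = 0; i < n; i++)
        linked[i] = false;
    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++)
            if (!linked[i] && (whole_archive || defines_any(&ar[i], undef))) {
                link_member(&ar[i], defined, undef);
                linked[i] = true;
                progress = true;
            }
    }
}
```

Modelling libabm.a as three members that each define one function and reference puts, and starting from sayalice_n_bob.o's undefined alice and bob, the default scan extracts alice.o and bob.o and leaves only puts undefined (for libc to resolve later). With whole_archive set, mary.o is extracted too.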
$ clang -o sayalice_n_bob sayalice_n_bob.o -L. -Wl,--whole-archive -labm -Wl,--no-whole-archive -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice_n_bob.o
(./libabm.a)alice.o
(./libabm.a)bob.o
(./libabm.a)mary.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
There you see that all the archive members are linked:
(./libabm.a)alice.o
(./libabm.a)bob.o
(./libabm.a)mary.o
And the program now defines all of alice, bob and mary:
$ nm -g sayalice_n_bob
0000000000400520 T alice
0000000000400540 T bob
0000000000601030 B __bss_start
0000000000601020 D __data_start
0000000000601020 W data_start
0000000000601028 D __dso_handle
0000000000601030 D _edata
0000000000601038 B _end
00000000004005f4 T _fini
w __gmon_start__
00000000004003d0 T _init
0000000000400600 R _IO_stdin_used
00000000004005f0 T __libc_csu_fini
0000000000400580 T __libc_csu_init
U __libc_start_main@@GLIBC_2.2.5
00000000004004f0 T main
0000000000400560 T mary
U puts@@GLIBC_2.2.5
0000000000400410 T _start
0000000000601030 D __TMC_END__
although it never calls mary.
And a step back
You've asked this question because you believe that if you can link from an archive only those object files that define symbols already referenced in the linkage then the linkage cannot fail with undefined references to symbols that the program never uses. But that isn't true, and here is a demonstration that it isn't.
Another source file:
alice2.c
#include <stdio.h>
extern void david(void);
void alice(void)
{
    puts("alice");
}
void dave(void)
{
    david();
}
Compile that:
$ clang -Wall -c alice2.c
Replace alice.o with alice2.o in libabm.a:
$ ar d libabm.a alice.o
$ ar r libabm.a alice2.o
Then try to link program sayalice as before:
$ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
/usr/bin/ld: mode elf_x86_64
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
sayalice.o
(./libabm.a)alice2.o
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
./libabm.a(alice2.o): In function `dave':
alice2.c:(.text+0x25): undefined reference to `david'
/usr/bin/ld: link errors found, deleting executable `sayalice'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
This time, the only archive member that gets linked is:
(./libabm.a)alice2.o
because only alice is called in sayalice.o. Nevertheless the linkage fails with an undefined reference to function david, which the program never calls. david is called only in the definition of function dave, and dave is never called.
Although dave is never called, its definition is linked because it lies in an object file, alice2.o, that is linked to provide a definition of function alice, which is called. And with the definition of dave in the linkage, the call to david becomes an unresolved reference for which the linker, by default, must find a definition or fail. So it fails.
You see then that the failure of a linkage through undefined reference to a symbol that the program never uses is consistent with the fact that the linker does not link unreferenced object files from an archive.
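The distinction between what the program calls and what the linker must resolve can be made concrete with a toy model. In this C sketch (struct func, program_calls and first_unresolved are hypothetical names), each function records the object file that defines it and the functions it calls. program_calls asks whether a function is reachable from main; first_unresolved applies the linker's rule that every call made from any linked object file needs a definition, reachable or not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 16

/* Hypothetical model: a function, the object file that defines it, and what it calls. */
struct func {
    const char *name;
    const char *object;
    const char *calls[4];   /* NULL-terminated; at most 3 callees */
};

static const struct func *find(const struct func *f, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(f[i].name, name) == 0)
            return &f[i];
    return NULL;   /* not defined by any object in the model (e.g. puts, david) */
}

static bool in_list(const char *const *list, const char *s)   /* NULL-terminated list */
{
    for (; *list != NULL; list++)
        if (strcmp(*list, s) == 0)
            return true;
    return false;
}

static bool reaches(const struct func *f, size_t n, const char *from,
                    const char *target, bool *seen)
{
    const struct func *fn = find(f, n, from);

    if (fn == NULL || seen[fn - f])
        return false;
    seen[fn - f] = true;
    for (size_t i = 0; fn->calls[i] != NULL; i++)
        if (strcmp(fn->calls[i], target) == 0 ||
            reaches(f, n, fn->calls[i], target, seen))
            return true;
    return false;
}

/* What the program uses: can execution starting in main ever call target? */
bool program_calls(const struct func *f, size_t n, const char *target)
{
    bool seen[MAX_FUNCS] = {false};

    if (n > MAX_FUNCS)
        return false;   /* keep the toy model bounded */
    return reaches(f, n, "main", target, seen);
}

/* What the linker requires: every call made by ANY function in a linked object
 * file must be defined in a linked object file or in `provided` (e.g. libc),
 * whether or not main can reach it. Returns the first unresolved name, or NULL. */
const char *first_unresolved(const struct func *f, size_t n,
                             const char *const *linked, const char *const *provided)
{
    for (size_t i = 0; i < n; i++) {
        if (!in_list(linked, f[i].object))
            continue;
        for (size_t j = 0; f[i].calls[j] != NULL; j++) {
            const char *callee = f[i].calls[j];
            const struct func *def = find(f, n, callee);

            if (!in_list(provided, callee) && (def == NULL || !in_list(linked, def->object)))
                return callee;
        }
    }
    return NULL;
}
```

With main in sayalice.o calling alice, and alice and dave both in alice2.o, program_calls reports that david is never reached, yet first_unresolved returns "david" once sayalice.o and alice2.o are the linked objects and puts is treated as provided by libc.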
How to survive undefined references to symbols you don't use
If you face this sort of linkage failure, you can avoid it by directing the linker to tolerate undefined references. You can direct it simply to ignore all undefined references, like:
$ clang -o sayalice sayalice.o -L. -labm -Wl,--unresolved-symbols=ignore-all
$ ./sayalice
alice
Or, more prudently, you can direct it just to give warnings, rather than fail, for undefined references, like:
$ clang -o sayalice sayalice.o -L. -labm -Wl,--warn-unresolved-symbols
./libabm.a(alice2.o): In function `dave':
alice2.c:(.text+0x25): warning: undefined reference to `david'
$ ./sayalice
alice
This way, you can check in the diagnostics that the only undefined symbols are the ones you are expecting.
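That check can be automated. Here is a minimal C sketch, assuming the message format shown above (undefined reference to `SYMBOL') and that the linker's diagnostics have been captured one line per string, for instance by redirecting clang's stderr to a file; other binutils versions or locales may word or quote the message differently. undefined_symbol_in and only_expected_undefined are hypothetical helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* If line contains "undefined reference to `SYMBOL'", copy SYMBOL into out
 * (truncated to out_size - 1 chars; out_size must be > 0) and return true. */
bool undefined_symbol_in(const char *line, char *out, size_t out_size)
{
    static const char marker[] = "undefined reference to `";
    const char *start = strstr(line, marker);
    const char *end;
    size_t len;

    if (start == NULL)
        return false;
    start += sizeof marker - 1;          /* skip past the opening backquote */
    end = strchr(start, '\'');
    if (end == NULL)
        return false;
    len = (size_t)(end - start);
    if (len > out_size - 1)
        len = out_size - 1;              /* a truncated name is unlikely to match */
    memcpy(out, start, len);
    out[len] = '\0';
    return true;
}

/* True if every undefined reference reported in lines[0..n_lines) names a
 * symbol in the NULL-terminated expected list. */
bool only_expected_undefined(const char *const *lines, size_t n_lines,
                             const char *const *expected)
{
    char sym[256];

    for (size_t i = 0; i < n_lines; i++) {
        bool ok = false;

        if (!undefined_symbol_in(lines[i], sym, sizeof sym))
            continue;   /* not an undefined-reference diagnostic */
        for (const char *const *e = expected; *e != NULL; e++)
            if (strcmp(*e, sym) == 0) {
                ok = true;
                break;
            }
        if (!ok)
            return false;
    }
    return true;
}
```

For the --warn-unresolved-symbols output above, only_expected_undefined succeeds with an expected list containing just david, and fails if that list is empty.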