c++ c linker llvm-clang unix-ar

How to take only required object files inside a single (.a) archive


Just a simple question, but I couldn't find the answer anywhere. When all the object files are put in an archive, how do I instruct clang++ to take only the required object files for linking, so as to avoid undefined symbol errors caused by symbols in the archive that are not required?


Solution

  • You won't have been able to find the answer you're seeking because what you want the linker to do is what it does by default. Here's a demonstration. (It's in C rather than C++ merely to spare us the obfuscation of C++ name-mangling.)

    Three source files:

    alice.c

    #include <stdio.h>
    
    void alice(void)
    {
        puts("alice");
    }
    

    bob.c

    #include <stdio.h>
    
    void bob(void)
    {
        puts("bob");
    }
    

    mary.c

    #include <stdio.h>
    
    void mary(void)
    {
        puts("mary");
    }
    

    Compile them and put the object files in an archive:

    $ clang -Wall -c alice.c
    $ clang -Wall -c bob.c
    $ clang -Wall -c mary.c
    $ ar rc libabm.a alice.o bob.o mary.o
    

    Here's the member list of the archive:

    $ ar -t libabm.a
    alice.o
    bob.o
    mary.o
    

    And here are the symbol tables of those members:

    $ nm libabm.a
    
    alice.o:
    0000000000000000 T alice
                     U puts
    
    bob.o:
    0000000000000000 T bob
                     U puts
    
    mary.o:
    0000000000000000 T mary
                     U puts
    

    where T denotes a defined function and U an undefined one. puts is defined in the standard C library, which will be linked by default.

    Now here's a program that calls alice externally, and so is dependent on alice.o:

    sayalice.c

    extern void alice(void);
    
    int main(void)
    {
        alice();
        return 0;
    }
    

    And here's another program that calls alice and bob externally, thus being dependent on alice.o and bob.o.

    sayalice_n_bob.c

    extern void alice(void);
    extern void bob(void);
    
    int main(void)
    {
        alice();
        bob();
        return 0;
    }
    

    Compile both those sources as well:

    $ clang -Wall -c sayalice.c
    $ clang -Wall -c sayalice_n_bob.c
    

    The linker option -trace (also spelled -t or --trace) instructs the linker to report the object files and shared libraries (DSOs) that are linked. We'll use it now to link program sayalice using sayalice.o and libabm.a:

    $ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
    /usr/bin/ld: mode elf_x86_64
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
    sayalice.o
    (./libabm.a)alice.o
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /lib/x86_64-linux-gnu/libc.so.6
    (/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
    

    We see all the boilerplate C libraries and runtimes are linked. And of the object files that we created, just two are linked:

    sayalice.o
    (./libabm.a)alice.o
    

    The two members of libabm.a that our program does not depend on:

    (./libabm.a)bob.o
    (./libabm.a)mary.o
    

    were not linked.

    Running the program:

    $ ./sayalice
    alice
    

    it says "alice".

    Then for comparison we'll link program sayalice_n_bob, again with -trace:

    $ clang -o sayalice_n_bob sayalice_n_bob.o -L. -labm -Wl,-trace
    /usr/bin/ld: mode elf_x86_64
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
    sayalice_n_bob.o
    (./libabm.a)alice.o
    (./libabm.a)bob.o
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /lib/x86_64-linux-gnu/libc.so.6
    (/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
    

    This time, three of our object files were linked:

    sayalice_n_bob.o
    (./libabm.a)alice.o
    (./libabm.a)bob.o
    

    And the only member of libabm.a that the program does not depend on:

    (./libabm.a)mary.o
    

    was not linked.

    This program runs like:

    $ ./sayalice_n_bob
    alice
    bob
    

    Here's the global symbol table of the program:

    $ nm -g sayalice_n_bob
    0000000000400520 T alice
    0000000000400540 T bob
    0000000000601030 B __bss_start
    0000000000601020 D __data_start
    0000000000601020 W data_start
    0000000000601028 D __dso_handle
    0000000000601030 D _edata
    0000000000601038 B _end
    00000000004005d4 T _fini
                     w __gmon_start__
    00000000004003d0 T _init
    00000000004005e0 R _IO_stdin_used
    00000000004005d0 T __libc_csu_fini
    0000000000400560 T __libc_csu_init
                     U __libc_start_main@@GLIBC_2.2.5
    00000000004004f0 T main
                     U puts@@GLIBC_2.2.5
    0000000000400410 T _start
    0000000000601030 D __TMC_END__
    

    with alice and bob, but not mary.

    So as you see, the linker's default behaviour is the behaviour you are asking how to get. To make the linker link all archive members, rather than only those that are referenced in the linkage, you have to tell it to do so expressly, by placing the archive within the scope of a --whole-archive option on the linkage command line:

    $ clang -o sayalice_n_bob sayalice_n_bob.o -L. -Wl,--whole-archive -labm -Wl,--no-whole-archive -Wl,-trace
    /usr/bin/ld: mode elf_x86_64
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
    sayalice_n_bob.o
    (./libabm.a)alice.o
    (./libabm.a)bob.o
    (./libabm.a)mary.o
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /lib/x86_64-linux-gnu/libc.so.6
    (/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
    

    There you see that all the archive members are linked:

    (./libabm.a)alice.o
    (./libabm.a)bob.o
    (./libabm.a)mary.o
    

    And the program now defines all of alice, bob and mary:

    $ nm -g sayalice_n_bob
    0000000000400520 T alice
    0000000000400540 T bob
    0000000000601030 B __bss_start
    0000000000601020 D __data_start
    0000000000601020 W data_start
    0000000000601028 D __dso_handle
    0000000000601030 D _edata
    0000000000601038 B _end
    00000000004005f4 T _fini
                     w __gmon_start__
    00000000004003d0 T _init
    0000000000400600 R _IO_stdin_used
    00000000004005f0 T __libc_csu_fini
    0000000000400580 T __libc_csu_init
                     U __libc_start_main@@GLIBC_2.2.5
    00000000004004f0 T main
    0000000000400560 T mary
                     U puts@@GLIBC_2.2.5
    0000000000400410 T _start
    0000000000601030 D __TMC_END__
    

    although it never calls mary.
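
    As a hedged illustration of when --whole-archive is actually wanted, here is a minimal sketch of a self-registering component. It relies on the GCC/Clang `constructor` attribute; the names register_plugin, register_mary and registered_plugin_count are hypothetical and not part of the demonstration above. If register_mary lived in its own archive member, nothing would reference that member by symbol, so the default linkage would skip it and the registration would never run. Linking the archive with --whole-archive would pull it in.

    ```c
    #include <stdio.h>

    #define MAX_PLUGINS 8

    /* Hypothetical registry of component names. */
    static const char *plugins[MAX_PLUGINS];
    static int plugin_count;

    /* Records a component name, ignoring registrations beyond capacity. */
    void register_plugin(const char *name)
    {
        if (plugin_count < MAX_PLUGINS)
            plugins[plugin_count++] = name;
    }

    /* Runs before main (GCC/Clang extension). No other code calls this
       function by name, so an archive member containing only this would
       not be extracted by default. */
    __attribute__((constructor))
    static void register_mary(void)
    {
        register_plugin("mary");
    }

    /* Returns how many components registered themselves at startup. */
    int registered_plugin_count(void)
    {
        return plugin_count;
    }
    ```

    In this single-file form the constructor is always linked, so registered_plugin_count() returns 1. The point of the sketch is only the shape of code that depends on an object file being linked without being referenced.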

    And a step back

    You've asked this question because you believe that if you can link from an archive only those object files that define symbols already referenced in the linkage then the linkage cannot fail with undefined references to symbols that the program never uses. But that isn't true, and here is a demonstration that it isn't.

    Another source file:

    alice2.c

    #include <stdio.h>
    
    extern void david(void);
    
    void alice(void)
    {
        puts("alice");
    }
    
    void dave(void)
    {
        david();
    }
    

    Compile that:

    $ clang -Wall -c alice2.c
    

    Replace alice.o with alice2.o in libabm.a:

    $ ar d libabm.a alice.o
    $ ar r libabm.a alice2.o
    

    Then try to link program sayalice as before:

    $ clang -o sayalice sayalice.o -L. -labm -Wl,-trace
    /usr/bin/ld: mode elf_x86_64
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crt1.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crti.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtbegin.o
    sayalice.o
    (./libabm.a)alice2.o
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /lib/x86_64-linux-gnu/libc.so.6
    (/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    libgcc_s.so.1 (/usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/libgcc_s.so.1)
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/crtend.o
    /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../x86_64-linux-gnu/crtn.o
    ./libabm.a(alice2.o): In function `dave':
    alice2.c:(.text+0x25): undefined reference to `david'
    /usr/bin/ld: link errors found, deleting executable `sayalice'
    clang: error: linker command failed with exit code 1 (use -v to see invocation)
    

    This time, the only archive member that gets linked is:

    (./libabm.a)alice2.o
    

    because only alice is called in sayalice.o. Nevertheless the linkage fails with an undefined reference to function david, which the program never calls. david is called only in the definition of function dave, and dave is never called.

    Although dave is never called, its definition is linked because it lies in an object file, alice2.o, that is linked to provide a definition of function alice - which is called. And with the definition of dave in the linkage, the call to david becomes an unresolved reference for which the linkage by default must find a definition, or fail. So it fails.

    You see then that a linkage can fail with an undefined reference to a symbol that the program never uses, even though the linker does not link unreferenced object files from an archive. The linker extracts whole object files, not individual functions.
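
    The reasoning above can be sketched as a toy model in C. This is not how ld is implemented, only a simplified illustration: the struct member type and the functions defines, resolve and link_sayalice are hypothetical, and puts is left out because the C library would supply it. The model pulls in a whole member as soon as one of its symbols is needed, and then has to resolve everything that member references.

    ```c
    #include <stdio.h>
    #include <string.h>

    #define MAX_SYMS 4

    /* Toy archive member: the symbols it defines and the ones it references. */
    struct member {
        const char *name;
        const char *defs[MAX_SYMS];    /* NULL-terminated unless full */
        const char *undefs[MAX_SYMS];  /* NULL-terminated unless full */
        int linked;
    };

    /* Returns 1 if member m defines sym, otherwise 0. */
    static int defines(const struct member *m, const char *sym)
    {
        for (int i = 0; i < MAX_SYMS && m->defs[i]; i++)
            if (strcmp(m->defs[i], sym) == 0)
                return 1;
        return 0;
    }

    /* Links the member that defines sym, plus whatever that member needs.
       Returns the number of references that no member can satisfy. */
    static int resolve(struct member *archive, int n, const char *sym)
    {
        for (int i = 0; i < n; i++) {
            if (!defines(&archive[i], sym))
                continue;
            if (archive[i].linked)
                return 0;              /* already part of the linkage */
            archive[i].linked = 1;     /* the whole member comes in */
            int missing = 0;
            for (int j = 0; j < MAX_SYMS && archive[i].undefs[j]; j++)
                missing += resolve(archive, n, archive[i].undefs[j]);
            return missing;
        }
        printf("undefined reference to `%s'\n", sym);
        return 1;
    }

    /* Models linking sayalice.o, which references only alice. */
    int link_sayalice(void)
    {
        struct member libabm[] = {
            { "alice2.o", { "alice", "dave" }, { "david" }, 0 },
            { "bob.o",    { "bob" },           { NULL },    0 },
            { "mary.o",   { "mary" },          { NULL },    0 },
        };
        int missing = resolve(libabm, 3, "alice");
        for (int i = 0; i < 3; i++)
            printf("%s: %s\n", libabm[i].name,
                   libabm[i].linked ? "linked" : "skipped");
        return missing;
    }
    ```

    Calling link_sayalice() reports alice2.o as linked and bob.o and mary.o as skipped, yet still returns one unresolved reference, david, because dave arrives in the same member as alice.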

    How to survive undefined references to symbols you don't use

    If you face this sort of linkage failure, you can avoid it by directing the linker to tolerate undefined references. You can direct it simply to ignore all undefined references, like:

    $ clang -o sayalice sayalice.o -L. -labm -Wl,--unresolved-symbols=ignore-all
    $ ./sayalice
    alice
    

    Or more prudently, you can direct it just to give warnings, rather than fail, for undefined references, like:

    $ clang -o sayalice sayalice.o -L. -labm -Wl,--warn-unresolved-symbols
    ./libabm.a(alice2.o): In function `dave':
    alice2.c:(.text+0x25): warning: undefined reference to `david'
    $ ./sayalice
    alice
    

    This way, you can check in the diagnostics that the only undefined symbols are the ones you are expecting.
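
    If you control the source of the archive member, another possibility, not part of the demonstration above, is to mark the optional reference as weak. The following is a minimal sketch assuming GCC or Clang on an ELF target such as Linux: an undefined weak function has a null address instead of causing a link failure, so the code can test for it before calling. It is a variant of alice2.c in which dave returns an int purely for illustration.

    ```c
    #include <stdio.h>

    /* Weak reference (GCC/Clang extension, ELF targets): if no definition
       of david is linked, its address is null and the linkage succeeds. */
    extern void david(void) __attribute__((weak));

    void alice(void)
    {
        puts("alice");
    }

    /* Calls david only if a definition was linked.
       Returns 1 if david was called, 0 if it was unavailable. */
    int dave(void)
    {
        if (david) {
            david();
            return 1;
        }
        return 0;
    }
    ```

    With this variant in libabm.a, a program that calls only alice should link without any extra linker options, and dave() returns 0 when no definition of david is present.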