
xcode ld detect duplicate symbol in static libraries


This question has been asked previously for gcc, but Darwin's ld (clang?) appears to handle this differently.

Say I have a main() function defined in two files, main1.cc and main2.cc. If I attempt to compile these both together I'll get (the desired) duplicate symbol error:

$ g++ -o main1.o -c main1.cc
$ g++ -o main2.o -c main2.cc
$ g++ -o main main1.o main2.o
duplicate symbol _main in:
    main1.o
    main2.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

But if I instead stick one of these into a static library, when I go to link the application I won't get an error:

$ ar rcs libmain1.a main1.o
$ g++ -o main libmain1.a main2.o
(no error)

With gcc you can wrap the lib with --whole-archive and then gcc's ld will produce an error. This option is not available with the ld that ships w/ xcode.

Is it possible to get ld to print an error?


Solution

  • I'm sure you know that you're not supposed to put an object file containing a main function in a static library. In case any of our readers don't: a library is for containing functions that may be reused by many programs. A program can contain only one main function, and the likelihood is negligible that the main function of one program will be reusable as the main function of another. So main functions don't go in libraries. (There are a few odd exceptions to this rule.)

    On then to the problem you're worried about. For simplicity, I'll exclude linkage of shared/dynamic libraries from consideration in the rest of this.

    Your linker detects a duplicate symbol error (a.k.a. multiple definition error) in the linkage when the competing definitions are in different input object files, but doesn't detect it when one definition is in an input object file and the other is in an input static library. In that scenario, the GNU linker can detect the multiply defined symbol if it is passed the --whole-archive option before the static library. But your linker, the Darwin Mach-O linker, doesn't have that option.

    Note that while your linker doesn't support --whole-archive, it has an equivalent option -all_load. But don't run away with that, because the worry is groundless anyhow. For both linkers:

    • There really is a multiple definition error in the linkage in the [foo.o ... bar.o] case.

    • There really is not a multiple definition error in the linkage in the [foo.o ... libbar.a] case.

    And in addition for the GNU linker:

    • There really is a multiple definition error in the linkage in the [foo.o ... --whole-archive libbar.a] case.

    In no case does either linker allow multiple definitions of a symbol to get into your program undetected and arbitrarily use one of them.

    What's the difference between linking foo.o and linking libfoo.a?

    The linker will only add object files to your program. More precisely, when it meets an input file foo.o, it adds to your program all the symbol references and symbol definitions from foo.o. (For starters at least: it may finally discard unused definitions if you've requested that, and if it can do so without collaterally discarding any used ones).

    A static library is just a bag of object files. When the linker meets an input file libfoo.a, by default it won't add any of the object files in the bag to your program.

    It will only inspect the contents of the bag if it has to, at that point in the linkage.

    It will have to inspect the contents of the bag if it has already added some symbol references to your program that don't have definitions. Those unresolved symbols might be defined in some of the object files in the bag.

    If it has to look in the bag, then it will inspect the object files to see if any of them contain definitions of unresolved symbols already in the program. If there are any such object files then it will add them to the program and consider afresh whether it needs to keep looking in the bag. It stops looking in the bag when it finds no more object files in it that the program needs or has found definitions for all symbols referenced by the program, whichever comes first.

    If any object files in the bag are needed, this adds at least one more symbol definition to your program, and possibly more unresolved symbols. Then the linker carries on. Once it has met libfoo.a and considered which, if any, object files in that bag it needs for your program, it won't consider it again, unless it meets it again, later in the linkage sequence.
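The on-demand scan of a static library described above can be modelled in a few lines of Python. This is only a toy sketch of the rule as stated here, not how either linker is implemented: the function name `pull_members`, the `(name, defs, refs)` tuples and the file names in the example are all hypothetical.

```python
def pull_members(defined, undefined, archive):
    """Return the names of the archive members that would be linked, in order.

    defined:   set of symbols already defined in the program (updated in place)
    undefined: set of symbols referenced but not yet defined (updated in place)
    archive:   list of (name, defs, refs) tuples, where defs and refs are sets
    """
    pulled = []
    pending = list(archive)   # members not linked so far
    progress = True
    # Keep rescanning the bag while something is unresolved and the last
    # pass over it added at least one member.
    while undefined and progress:
        progress = False
        for member in list(pending):
            name, defs, refs = member
            if defs & undefined:                    # member defines a needed symbol
                pending.remove(member)
                pulled.append(name)
                undefined.difference_update(defs)   # these are now resolved
                defined.update(defs)
                undefined.update(refs - defined)    # new unresolved references
                progress = True
    return pulled


# Hypothetical library: f.o needs g, which lives in a member listed before it.
archive = [
    ("g.o", {"g"}, set()),
    ("f.o", {"f"}, {"g"}),
    ("h.o", {"h"}, set()),
]

# The program so far defines main and has an unresolved reference to f.
print(pull_members({"main"}, {"f"}, archive))    # ['f.o', 'g.o']

# Nothing unresolved: the bag is not inspected and no member is linked.
print(pull_members({"main"}, set(), archive))    # []
```

In the first call h.o is never linked, because nothing in the program references h.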

    So...

    Case 1. The input files contain [foo.o ... bar.o]. Both foo.o and bar.o define symbol A. Both object files must be linked, so both definitions of A must be added to the program and that is a multiple definition error. Both linkers detect it.

    Case 2. The input files contain [foo.o ... libbar.a].

    • libbar.a contains object files a.o and b.o.
    • foo.o defines symbol A and references, but does not define, symbol B.
    • a.o also defines A but does not define B, and defines no other symbols that are referenced by foo.o.
    • b.o defines symbol B.

    Then:-

    • At foo.o, the object file must be linked. The linker adds the definition of A and an unresolved reference to B to the program.
    • At libbar.a, the linker needs a definition for unresolved reference B so it looks in the bag.
    • a.o does not define B or any other unresolved symbol. It is not linked. The second definition of A is not added.
    • b.o defines B, so it is linked. The definition of B is added to the program.
    • The linker carries on.

    No two object files that both define A are needed in the program. There is no multiple definition error.

    Case 3. The input files contain [foo.o ... libbar.a].

    • libbar.a contains object files a.o and b.o.
    • foo.o defines symbol A. It references but does not define, symbols B and C.
    • a.o also defines A and it defines B, and defines no other symbols that are referenced by foo.o.
    • b.o defines symbol C.

    Then:-

    • At foo.o, the object file is linked. The linker adds to the program the definition of A and unresolved references to B and C.
    • At libbar.a, the linker needs definitions for unresolved references B and C so it looks in the bag.
    • a.o does not define C. But it does define B. So a.o is linked. That adds the required definition of B, plus the not-required, surplus definition of A.

    That is a multiple definition error. Both linkers detect it. Linkage ends.

    There is a multiple definition error if and only if two definitions of some symbol are contained in object files that are linked in the program. Object files from a static library are linked only to provide definitions of symbols that the program references. If there is a multiple definition error, then both linkers detect it.
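Cases 2 and 3 can be replayed with a small Python model of a left-to-right linkage. It is a sketch under simplifying assumptions (symbols are plain strings, an object file is a hypothetical `(name, defs, refs)` tuple, a static library is a list of such tuples) and it does not reflect the internals of either real linker.

```python
class DuplicateSymbol(Exception):
    """Raised when two linked object files define the same symbol."""


def link(inputs):
    """Link inputs left to right and return the names of the linked objects."""
    defined = {}        # symbol -> name of the object file that defines it
    undefined = set()   # symbols referenced but not yet defined
    linked = []

    def add(name, defs, refs):
        for sym in sorted(defs):
            if sym in defined:
                raise DuplicateSymbol(f"{sym} in {defined[sym]} and {name}")
            defined[sym] = name
        linked.append(name)
        undefined.difference_update(defs)
        undefined.update(refs - set(defined))

    for item in inputs:
        if isinstance(item, tuple):     # object file: always linked
            add(*item)
            continue
        pending = list(item)            # static library: members linked on demand
        progress = True
        while undefined and progress:
            progress = False
            for member in list(pending):
                if member[1] & undefined:
                    pending.remove(member)
                    add(*member)
                    progress = True
    return linked


# Case 2: a.o also defines A, but nothing ever requires a.o.
foo2 = ("foo.o", {"A"}, {"B"})
lib2 = [("a.o", {"A"}, set()), ("b.o", {"B"}, set())]
print(link([foo2, lib2]))               # ['foo.o', 'b.o']

# Case 3: a.o is required for B and brings a second definition of A with it.
foo3 = ("foo.o", {"A"}, {"B", "C"})
lib3 = [("a.o", {"A", "B"}, set()), ("b.o", {"C"}, set())]
try:
    link([foo3, lib3])
except DuplicateSymbol as err:
    print("duplicate symbol", err)      # duplicate symbol A in foo.o and a.o
```

The only difference between the two runs is whether a.o is needed; the duplicate definition of A matters only once a.o is actually linked.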

    So why does the GNU linker option --whole-archive give different outcomes?

    Suppose that libbar.a contains a.o and b.o. Then:

    foo.o --whole-archive -lbar
    

    tells the linker to link all the object files in libbar.a whether they are needed or not. So this fragment of the linkage command is simply equivalent to:

    foo.o a.o b.o
    

    Thus in case 2 above, the addition of --whole-archive is a way of creating a multiple definition error where there is none without it. Not a way of detecting a multiple definition error that was not detected without it.
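To make the equivalence concrete, here is a minimal Python sketch that treats --whole-archive as nothing more than replacing a library by its members before linking. The helper names `flatten_whole_archive` and `duplicates` and the `(name, defs)` tuples are hypothetical, and the data is the case 2 layout.

```python
from collections import defaultdict


def flatten_whole_archive(inputs):
    """Replace every library (a list) by its members, in order."""
    flat = []
    for item in inputs:
        flat.extend(item if isinstance(item, list) else [item])
    return flat


def duplicates(objects):
    """Map each symbol defined more than once to the objects that define it."""
    owners = defaultdict(list)
    for name, defs in objects:
        for sym in sorted(defs):
            owners[sym].append(name)
    return {sym: names for sym, names in owners.items() if len(names) > 1}


foo = ("foo.o", {"A"})
libbar = [("a.o", {"A"}), ("b.o", {"B"})]

flat = flatten_whole_archive([foo, libbar])
print([name for name, _ in flat])   # ['foo.o', 'a.o', 'b.o']
print(duplicates(flat))             # {'A': ['foo.o', 'a.o']}
```

Once every member is forced into the link, a.o's definition of A collides with foo.o's, even though a.o was never needed.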

    And if --whole-archive is mistakenly used as a way of "detecting" fictitious multiple definition errors, then in those cases where the linkage nevertheless succeeds, it is also a way of adding an unlimited amount of redundant code to the program. The same goes for the -all_load option of the Mach-O linker.

    Not satisfied?

    Even when all that is clear, maybe you still hanker for some way to make it an error when an input object file in your linkage defines a symbol that is also defined in another object file that is not needed by the linkage but happens to be contained in some input static library.

    Well, that might be a situation that you want to know about, but it just isn't any kind of linkage error, multiple-definition or otherwise. The purpose of static libraries in linkage is to provide default definitions of symbols that you don't define in the input object files. Provide your own definition in an object file and the library default is ignored.

    If you don't want linkage to work like that - the way it is intended to work - but:-

    • You still want to use a static library
    • You don't want any definition from an input object file ever to prevail over one that's in a member of the static library
    • You don't want to link any redundant object files.

    then the simplest solution (though not necessarily the least time-consuming at build time) is this:

    In your project build extract all the members of the static library as a prerequisite of the link step in a manner that also gives you the list of their names, e.g.:

    $ LIBFOOBAR_OBJS=`ar xv libfoobar.a | sed 's/x - //'g`
    $ echo $LIBFOOBAR_OBJS
    foo.o bar.o
    

    (But extract them someplace where they cannot clobber any object files you build.) Then, again before the link step, run a preliminary throw-away linkage in which $LIBFOOBAR_OBJS replaces libfoobar.a. E.g., instead of

    cc -o prog x.o y.o z.o ... -lfoobar ...
    

    run

    cc -o deleteme x.o y.o z.o ... $LIBFOOBAR_OBJS ...
    

    If the preliminary linkage fails - with a multiple definition error or anything else - then stop there. Otherwise go ahead with the real linkage. You won't link any redundant object files in prog. The price is performing a linkage of deleteme that is redundant unless it fails with a multiple definition error.[1]
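If the build is driven from a script, the same two-step check might look like the Python sketch below. It assumes that `ar xv` prints one `x - member` line per extracted file (which is what the sed command above relies on) and that `cc` is the compiler driver. The function names are hypothetical, and any extra flags or libraries your real link needs are left out.

```python
import os
import subprocess
import tempfile


def members_from_ar_xv(output):
    """Extract member names from `ar xv` output lines of the form 'x - foo.o'."""
    prefix = "x - "
    return [line[len(prefix):] for line in output.splitlines()
            if line.startswith(prefix)]


def throwaway_link_command(objects, members, compiler="cc", output="deleteme"):
    """Build the preliminary link command with the members listed explicitly."""
    return [compiler, "-o", output, *objects, *members]


def check_then_link(objects, library, real_command):
    """Run the throw-away linkage; run the real one only if it succeeds."""
    # Extract into a scratch directory so that no object file you built
    # can be clobbered by a library member of the same name.
    with tempfile.TemporaryDirectory() as scratch:
        listing = subprocess.run(
            ["ar", "xv", os.path.abspath(library)],
            cwd=scratch, check=True, capture_output=True, text=True,
        ).stdout
        members = [os.path.join(scratch, m) for m in members_from_ar_xv(listing)]
        trial = throwaway_link_command(
            objects, members, output=os.path.join(scratch, "deleteme"))
        subprocess.run(trial, check=True)   # raises CalledProcessError on failure
    subprocess.run(real_command, check=True)
```

A call such as `check_then_link(["x.o", "y.o", "z.o"], "libfoobar.a", ["cc", "-o", "prog", "x.o", "y.o", "z.o", "-L.", "-lfoobar"])` would stop with an exception if the preliminary linkage fails, and otherwise go on to the real one.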

    In professional practice, nobody runs builds like that to head off the remote possibility that a programmer has defined a function in one of x.o y.o z.o that knocks out a function defined in a member of libfoobar.a without meaning to. Competence and code review are counted on to avoid that, in the same way they are counted on to avoid a programmer defining a function in x.o y.o z.o to do anything that should be done using library resources.


    [1] Rather than extracting all the object files from the static library for use in the throw-away linkage, you might consider a throwaway linkage using --whole-archive, with the GNU linker, or -all_load, with the Mach-O linker. But there are potential pitfalls with this approach I won't delve into here.