How does static linking without an archive file work?


I have two files

main.c

void swap();

int buf[2] = {1, 2}; 

int main() 
{
    swap();
    return 0;
}

swap.c

extern int buf[];

int* bufp0 = &buf[0]; /* .data */
int* bufp1; /* .bss */

void swap()
{
    int temp;
    
    bufp1 = &buf[1];
    temp = *bufp0;
    *bufp0 = *bufp1;
    *bufp1 = temp;
}

Here are 2 excerpts from a book

During this scan, the linker maintains a set E of relocatable object files that 
will be merged to form the executable, a set U of unresolved symbols 
(i.e., symbols referred to, but not yet defined), and a set D of symbols that 
have been defined in previous input files.
Initially, E, U, and D are empty.

For each input file f on the command line, the linker determines if f is an
object file or an archive. If f is an object file, the linker adds f to E, updates
U and D to reflect the symbol definitions and references in f, and proceeds
to the next input file.

If f is an archive, the linker attempts to match the unresolved symbols in U
against the symbols defined by the members of the archive. If some archive
member, m, defines a symbol that resolves a reference in U, then m is added
to E, and the linker updates U and D to reflect the symbol definitions and
references in m. This process iterates over the member object files in the
archive until a fixed point is reached where U and D no longer change. At
this point, any member object files not contained in E are simply discarded
and the linker proceeds to the next input file.

If U is nonempty when the linker finishes scanning the input files on the
command line, it prints an error and terminates. Otherwise, it merges and
relocates the object files in E to build the output executable file.

The general rule for libraries is to place them at the end of the command
line. If the members of the different libraries are independent, in that no member
references a symbol defined by another member, then the libraries can be placed
at the end of the command line in any order.

If, on the other hand, the libraries are not independent, then they must be
ordered so that for each symbol s that is referenced externally by a member of an
archive, at least one definition of s follows a reference to s on the command line.

For example, suppose foo.c calls functions in libx.a and libz.a that call
functions in liby.a. Then libx.a and libz.a must precede liby.a on the command
line:

unix> gcc foo.c libx.a libz.a liby.a

I ran the following command to statically link the two files (without creating any archive file):

gcc -static -o main.o main.c swap.c

I expected the above command to fail because main.c and swap.c each reference a symbol that is defined in the other. Contrary to my expectations, it succeeded. I expected it to succeed only if I passed main.c again at the end of the command line.

How did the linker resolve the references in both files in this case? Does a linker work differently when it statically links multiple object files rather than archive files? My guess is that the linker circled back to main.c to resolve the reference to buf in swap.c.


Solution

  • Generally, the default behavior of a linker is to include everything from each object module file it is given, and to take from a library only those object modules that define a symbol for which the linker has an unresolved reference at the time it processes the library.

    So, when the linker processes main.o (the object module that gcc compiles from main.c; likewise swap.o for swap.c), it prepares everything in it to go into the output file it is building. That includes remembering (whether in memory or with auxiliary files the linker maintains temporarily) all the symbols defined by main.o and all the symbols that main.o has unresolved references to. When the linker processes swap.o, it adds everything from swap.o to the output file it is building. Further, for any references in main.o that are satisfied by definitions in swap.o, it resolves those references. And, for any references in swap.o that are satisfied by definitions in main.o, it resolves those references. No second pass over main.o is needed, because its definitions are already recorded by the time swap.o is read.

    As the text you quote says, for an object module file:

    “(...) the linker adds f to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file.”

    That step is actually the same for each object module the linker adds to the executable, whether the object module comes from an object module file or comes from a library file. The difference is that if the object module is in a file, then the linker adds it to the executable unconditionally, but, if the object module is in a library, the linker adds it to the executable only if it defines a symbol the linker is currently seeking.