c++, c, compiler-construction

In the compilation system, how does the linker (ld) know what to link myprogram.o against?


I recently read CSAPP and have some questions about its section on the compilation system.

Take a sample program, HelloWorld.c, that just prints hello world. The book says that in the preprocessor phase, the "#include <stdio.h>" line is replaced with the contents of that header file. But when I open stdio.h, I find only a declaration of printf() and no concrete implementation. So at which point in the compilation system is the actual implementation of printf() brought in?

The book also says that in the linking phase, the linker (ld) links helloworld.o with printf.o. How does the linker know to link my object file with printf.o? And why does the compilation system declare this function in the first step (the preprocessor phase) but link the concrete implementation only in the last step (the linking phase)?


Solution

  • Practically, over-simplified:

    • You can compile a function into a library (e.g. a .a or .so file on Unix).
    • The library contains the function body (machine instructions) and the function name. For example, the library libc.so contains a printf function whose code starts at byte offset 0xaabbccdd in the file libc.so.
    • You want to compile your program.
    • The compiler needs to know what arguments printf takes. Does it take an int? A char *? A uint_least64_t? That information is in the header file: int printf(const char *, ...);. The header tells the compiler how to call the function (which parameters it takes and which type it returns). Note that each .c file is compiled separately.
    • The function declaration (which arguments the function takes and what it returns) is not stored in the library file; it is stored only in the header. The library has only the function name (printf) and the compiled function body. The header has int printf(const char *, ...); without a function body.
    • You compile your program. The compiler generates code that places arguments of the proper size where the function expects them (on the stack in this simplified picture; many real calling conventions use registers instead) and that reads the value returned by the function. Your program is now compiled into assembly that looks like push pointer to "%d\n" on the stack; push some int on the stack; call printf; pop from the stack the returned "int"; rest of the instructions;.
    • The linker searches through your compiled program and sees call printf. It then says: "Oh, there is no printf body in your code". So it searches for printf in the libraries to see where it is. The linker goes through all the libraries you link your program with and finds printf in the standard library: it is in libc.so at address 0xaabbccdd. So the linker replaces call printf with a "go to address 0xaabbccdd in the file libc.so" kind of instruction.
    • After all "symbols" (i.e. function names and variable names) are "resolved" (the linker has found them somewhere), you can run your program. The call printf will jump into the file libc.so at the specified location.

    What I have written above is only for illustration purposes.