
Exactly what is a symbol reference in an object file?


I am reading the chapter on linking in Computer Systems: A Programmer's Perspective. It explains how linking works on Linux x86-64 using the program ld. The authors say that, in order to build an executable file from relocatable object files, the linker does two things: symbol resolution and relocation. This is their brief overview of what symbol resolution is:

Object files define and reference symbols, where each symbol corresponds to a function, a global variable, or a static variable (i.e., any C variable declared with the static attribute). The purpose of symbol resolution is to associate each symbol reference with exactly one symbol definition.

But they don't clarify what is meant by symbol reference, even when they begin describing symbol resolution in depth. So how exactly are symbols referenced in relocatable object files?


Solution

  • Consider the following source:

    static int foo() { return 42; }
    static int bar() { return foo() + 1; }
    
    extern int baz();
    
    int main()
    {
      return foo() + bar() + baz();
    }
    

    With this source saved as foo.c and compiled with gcc -c foo.c, the output from objdump -d foo.o on x86_64 Linux is:

    foo.o:     file format elf64-x86-64
    
    Disassembly of section .text:
    
    0000000000000000 <foo>:
       0:   55                      push   %rbp
       1:   48 89 e5                mov    %rsp,%rbp
       4:   b8 2a 00 00 00          mov    $0x2a,%eax
       9:   5d                      pop    %rbp
       a:   c3                      retq
    
    000000000000000b <bar>:
       b:   55                      push   %rbp
       c:   48 89 e5                mov    %rsp,%rbp
       f:   b8 00 00 00 00          mov    $0x0,%eax
      14:   e8 e7 ff ff ff          callq  0 <foo>
      19:   83 c0 01                add    $0x1,%eax
      1c:   5d                      pop    %rbp
      1d:   c3                      retq
    
    000000000000001e <main>:
      1e:   55                      push   %rbp
      1f:   48 89 e5                mov    %rsp,%rbp
      22:   53                      push   %rbx
      23:   48 83 ec 08             sub    $0x8,%rsp
      27:   b8 00 00 00 00          mov    $0x0,%eax
      2c:   e8 cf ff ff ff          callq  0 <foo>
      31:   89 c3                   mov    %eax,%ebx
      33:   b8 00 00 00 00          mov    $0x0,%eax
      38:   e8 ce ff ff ff          callq  b <bar>
      3d:   01 c3                   add    %eax,%ebx
      3f:   b8 00 00 00 00          mov    $0x0,%eax
      44:   e8 00 00 00 00          callq  49 <main+0x2b>
      49:   01 d8                   add    %ebx,%eax
      4b:   48 83 c4 08             add    $0x8,%rsp
      4f:   5b                      pop    %rbx
      50:   5d                      pop    %rbp
      51:   c3                      retq
    

    There are a few things to note here:

    1. Notice how bar calls foo at address 0? How does objdump know that it's foo that's being called? And can it really be at address 0? (Most modern systems leave the zero page of virtual memory unmapped, or map it with PROT_NONE, so no read or write access can happen there.)
    2. Notice how the call to baz from main differs from the calls to foo and bar? Its displacement bytes are all zero, so objdump shows it targeting the very next instruction (49 <main+0x2b>). The compiler knows where foo and bar are relative to the call instruction itself, but it has no idea where baz will be.

    So, given the information above, how can the linker turn this into something sensible? It can't: the disassembly alone does not contain enough information.

    For the linker to turn the reference to baz (which is not visible in the disassembly) into an actual call to baz, it needs additional information. On ELF systems, that information is recorded in a relocation section, here .rela.text, which contains:

    $ readelf -Wr foo.o
    
    Relocation section '.rela.text' at offset 0x5d0 contains 1 entries:
        Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
    0000000000000045  0000000b00000002 R_X86_64_PC32          0000000000000000 baz - 4
    

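    That is the "reference" that the book talks about but doesn't define. It tells the linker: if you can find a definition of baz (in some other object file), work out how far baz is from the location being patched and store that 32-bit displacement in the four bytes at offsets 0x45 through 0x48 of the .text section of foo.o. For R_X86_64_PC32 the stored value is S + A - P, where S is the final address of baz, A is the addend (-4 here), and P is the final address of the bytes being patched. The -4 is there because the CALL displacement is relative to the next instruction after the CALL, which starts 4 bytes past the beginning of the displacement field.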

    And if there is no such definition? The linker will produce an error:

    $ gcc foo.o
    foo.o: In function `main':
    foo.c:(.text+0x45): undefined reference to `baz'
    collect2: error: ld returned 1 exit status
    

    Finally, getting back to point 1 above: can foo really be at address 0?

    No, but the CALL instruction at address 0x14 doesn't actually say CALL 0. It says "call the routine at the address of the next instruction after the call, minus 25": the displacement bytes e7 ff ff ff are -25 as a little-endian 32-bit integer, and the next instruction is at 0x19 (25). If that CALL instruction ends up at address 0x400501 in the final binary, then its target will be 0x400506 - 25 = 0x4004ed, which is where foo will end up. The distance between foo and the CALL does not change when the linker relocates the .text section of foo.o to a different address (linker relaxation notwithstanding, but that's a complicated topic for another day).