
Why does the disassembly of a shared object file compiled with -fPIC -shared introduce dummy addresses like "call 4" and "add 2"?

Consider the following piece of code:

$ cat foo.c
static int foo = 100;

int function(void)
{
    return foo;
}

I understand the disassembly of libfoo.so:

$ gcc -m32 -fPIC -shared -o libfoo.so foo.c
$ objdump -D libfoo.so

000004cc <function>:
 4cc:   55                      push   %ebp
 4cd:   89 e5                   mov    %esp,%ebp
 4cf:   e8 0e 00 00 00          call   4e2 <__x86.get_pc_thunk.cx>
 4d4:   81 c1 c0 11 00 00       add    $0x11c0,%ecx
 4da:   8b 81 18 00 00 00       mov    0x18(%ecx),%eax
 4e0:   5d                      pop    %ebp
 4e1:   c3                      ret    

000004e2 <__x86.get_pc_thunk.cx>:
 4e2:   8b 0c 24                mov    (%esp),%ecx
 4e5:   c3                      ret    
 4e6:   66 90                   xchg   %ax,%ax
...

000016ac <foo>:
    16ac:   64 00 00                add    %al,%fs:(%eax)

In function, the address of foo is computed as 0x4d4 (the value of ecx after the call to __x86.get_pc_thunk.cx) + 0x11c0 + 0x18 = 0x16ac, and 0x16ac is indeed the address of foo.

However, I do not understand the disassembly of the object file:

$ gcc -m32 -fPIC -shared -c foo.c
$ objdump -D foo.o
00000000 <function>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   e8 fc ff ff ff          call   4 <function+0x4>
   8:   81 c1 02 00 00 00       add    $0x2,%ecx
   e:   8b 81 00 00 00 00       mov    0x0(%ecx),%eax
  14:   5d                      pop    %ebp
  15:   c3                      ret    

00000000 <foo>:
   0:   64 00 00                add    %al,%fs:(%eax)

00000000 <__x86.get_pc_thunk.cx>:
   0:   8b 0c 24                mov    (%esp),%ecx
   3:   c3                      ret

Why does it show call 4 <function+0x4>, and why add $0x2,%ecx?

Update: I added the -r flag to objdump (-R fails with the error "not a dynamic object, Invalid operation"):

$ objdump -D -r foo.o
00000000 <function>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   e8 fc ff ff ff          call   4 <function+0x4>
        4: R_386_PC32   __x86.get_pc_thunk.cx
   8:   81 c1 02 00 00 00       add    $0x2,%ecx
        a: R_386_GOTPC  _GLOBAL_OFFSET_TABLE_
   e:   8b 81 00 00 00 00       mov    0x0(%ecx),%eax
        10: R_386_GOTOFF    .data
  14:   5d                      pop    %ebp
  15:   c3                      ret

Now the 4 in call 4 <function+0x4> seems to make sense, because the relocation entry for this instruction is at offset 4 in the text section. I still have no clue where the 0x2 in add $0x2,%ecx comes from.

Solution

The linker performs the relocation so that the final value = symbol + offset - PC, where the offset is the value already stored in the instruction. Note that PC in this formula is the address of the relocated field itself, not the address of the instruction, because the linker knows nothing about instruction boundaries. The assembler, however, does know about them and can encode the proper offsets.

Let's see how call __x86.get_pc_thunk.cx works. On x86, the call instruction uses relative addressing, and the displacement is relative to the PC after it has already been advanced to the following instruction. You can verify this in your first dump:

 4cf:   e8 0e 00 00 00          call   4e2 <__x86.get_pc_thunk.cx>
 4d4:   81 c1 c0 11 00 00       add    $0x11c0,%ecx

Notice that the offset in the instruction is 0e. The already advanced PC is 4d4, and sure enough the target of the call is 4e2 = 4d4 + 0e (all numbers in hex).

Now for the version with the relocation:

   3:   e8 fc ff ff ff          call   4 <function+0x4>
        4: R_386_PC32   __x86.get_pc_thunk.cx

It uses R_386_PC32, but that relocation is at the second byte of the instruction (address 4), while the call needs an offset from the updated PC (address 8), which is 4 bytes further on. This means the correct result is 4 less, hence the instruction contains fffffffc, which is -4. No matter what the address of the call is, this offset is always going to be -4. The disassembler simply adds it to the updated PC, which in this case is 8, and arrives at call 4 by computing 8 - 4.

Okay, on to the R_386_GOTPC.

   3:   e8 fc ff ff ff          call   4 <function+0x4>
        4: R_386_PC32   __x86.get_pc_thunk.cx
   8:   81 c1 02 00 00 00       add    $0x2,%ecx
        a: R_386_GOTPC  _GLOBAL_OFFSET_TABLE_

The __x86.get_pc_thunk.cx function simply loads the return address from the stack into ecx. In this case the return address is 8. The goal is to have the address of _GLOBAL_OFFSET_TABLE_ in ecx, so we need to know how far it is from the reference PC already in ecx and add that distance. The R_386_GOTPC relocation is used for this, but it gives the offset from address 0a, because that is where the relocation entry is. The offset from address 8 is, of course, 2 more. This 2 is what's encoded in the instruction.

To summarize: the offset stored in the instruction is the difference between the relocation address and the required reference point: offset = relocation address - reference. In the first case the reference point is 4 bytes higher, and in the second case it is 2 bytes lower, which gives offsets of -4 and 2 respectively.