c++ assembly x86-64 abi relocation

Calculation of relative offset in small code model


I am trying to understand the RIP-relative offset used in the small code model. Perhaps the only approachable resource on the internet on this topic is https://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models but a few things in that post are still not clear to me. I am using this simple program to work through them:

// sample.cc
int arr[10] = {0};
int arr_big[100000] = {0};
int arr2[500] = {0};
int main() {
  int t = 0;
  t += arr[7];
  t += arr_big[6];
  t += arr2[10];
  return 0;
}

Compilation: g++ -c sample.cc -o sample.o

Object code for the .text section (objdump -dS sample.o):

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
   b:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 11 <main+0x11>
  11:   01 45 fc                add    %eax,-0x4(%rbp)
  14:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 1a <main+0x1a>
  1a:   01 45 fc                add    %eax,-0x4(%rbp)
  1d:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 23 <main+0x23>
  23:   01 45 fc                add    %eax,-0x4(%rbp)
  26:   b8 00 00 00 00          mov    $0x0,%eax
  2b:   5d                      pop    %rbp
  2c:   c3                      ret

Relocation table: (readelf -r sample.o)

Relocation section '.rela.text' at offset 0x1a8 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000d  000300000002 R_X86_64_PC32     0000000000000000 arr + 18
000000000016  000400000002 R_X86_64_PC32     0000000000000040 arr_big + 14
00000000001f  000500000002 R_X86_64_PC32     0000000000061ac0 arr2 + 24

From this answer, what I understand is that the offset is the position of the first byte in the .text section that has to be modified. The compiler does not know the final position of any relocatable symbol in advance, so it emits placeholder bytes filled with 00, which the linker populates later. This explanation makes sense once we look at the objdump output: the first relocation has offset 0xd, and byte 0xd of .text is where the zero bytes start in the line 8b 05 00 00 00 00.

So the linker will fill in the address of arr at this position. R_X86_64_PC32 means "take the symbol value, add the addend and subtract the offset". I do not understand this calculation. What do they mean by "symbol value"? In the small code model all offsets are relative to the instruction pointer (RIP). So for the line mov 0x0(%rip),%eax, the RIP value will be the address of the next instruction (0x11). The offset is 0xd and the addend is 0x18. If we add the addend to RIP and subtract the offset (0x11 + 0x18 - 0xd), we get 0x1c, which is the byte offset of the element at index 7 (1 int = 4 bytes). That makes sense, because the instruction is accessing index 7 of the array arr. What I don't understand is:

  1. How is the relative offset between RIP and arr calculated? Is it calculated by the linker at link time?
  2. Why does it need to be 32 bit?
  3. What does sym. value signify in the relocation table? I am assuming it is the position of each symbol relative to the start of its section. E.g. arr has sym. value 0 as it is the first entry in the .bss section, arr_big has sym. value 0x40 as it is the second entry, placed after the 40-byte arr, and arr2 has sym. value 0x61ac0 as it comes after arr and arr_big (0x40 + 400000 bytes).

Thanks in advance.


Solution

  • The linker first lays out the sections in memory, starting at the absolute image base address, say 0x0040_0000. Suppose section .text is placed at 0x0040_1000 and the section holding arr at 0x0040_2000 (for a zero-initialized array this is .bss rather than .data).
    The symbol main is at offset 0 in section .text, so its value becomes 0x0040_1000.
    The symbol arr is at offset 0 in its section, so its value becomes 0x0040_2000.
    Only then are the relocations resolved.

    Offset          Info           Type           Sym. Value    Sym. Name + Addend
    00000000000d  000300000002 R_X86_64_PC32     0000000000000000 arr + 18

    The low dword of Relocation.Info (2) says that the type is a program-counter-relative 32-bit field relocation. The field is in the .text section at offset 0xd, so its absolute virtual address (VA) is 0x0040_100d.

    The high dword of Relocation.Info (3) says that the relocation refers to the symbol-table entry with index 3, which is the symbol arr with VA 0x0040_2000.
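
The split of Info into its two halves can be checked with a few lines of C++. This is only a sketch: the helper names rela_sym_index and rela_type are made up for illustration, although they do the same shifting and masking as the ELF64_R_SYM and ELF64_R_TYPE macros from <elf.h>.

```cpp
#include <cstdint>

// Hypothetical helper names, chosen for this example only.
// High dword of r_info: index of the target symbol in the symbol table.
constexpr std::uint32_t rela_sym_index(std::uint64_t info) {
    return static_cast<std::uint32_t>(info >> 32);
}

// Low dword of r_info: relocation type (2 is R_X86_64_PC32).
constexpr std::uint32_t rela_type(std::uint64_t info) {
    return static_cast<std::uint32_t>(info & 0xffffffffu);
}

// Info value of the first relocation as printed by readelf -r.
constexpr std::uint64_t kInfoArr = 0x000300000002ULL;

static_assert(rela_sym_index(kInfoArr) == 3, "symbol index is 3 (arr)");
static_assert(rela_type(kInfoArr) == 2, "type is 2 (R_X86_64_PC32)");
```

The static_assert lines make the compiler itself confirm the decoding, so compiling the file is enough to check it.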

    Knowing the final addresses of all symbols, the linker can now perform the calculation.
    For the instruction mentioned above
    b: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 11 <main+0x11>
    it takes the VA of the target symbol arr (0x0040_2000), subtracts the VA of the field (0x0040_100d) and adds the addend 0x18. The result, 0x100b, is written into the four placeholder bytes of the instruction (with .rela relocations the addend comes from the relocation record, so the zeros emitted by the compiler are simply overwritten).
    The CPU will see the instruction mov 0x0(%rip),%eax as b: 8b 05 0b 10 00 00, as you can verify in a debugger. RIP holds the address of the next instruction, 0x0040_1011; adding the 32-bit displacement 0x100b to it gives VA 0x0040_201c, from which EAX is loaded. That is arr[7], at byte offset 0x1c in arr.
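
The arithmetic above can be replayed as a small C++ sketch. The addresses are the example values assumed in this answer (not what a real linker would necessarily choose), and the function and constant names are made up for illustration.

```cpp
#include <cstdint>

// R_X86_64_PC32: symbol VA + addend - VA of the relocated field,
// stored as a signed 32-bit value.
constexpr std::int32_t pc32_field(std::uint64_t sym_va, std::int64_t addend,
                                  std::uint64_t field_va) {
    return static_cast<std::int32_t>(static_cast<std::int64_t>(sym_va) + addend -
                                     static_cast<std::int64_t>(field_va));
}

// What the CPU does at run time: address of the next instruction
// plus the sign-extended 32-bit displacement.
constexpr std::uint64_t rip_relative_target(std::uint64_t next_rip, std::int32_t disp) {
    return next_rip + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// Example addresses assumed in this answer.
constexpr std::uint64_t kTextVA  = 0x00401000;    // start of .text
constexpr std::uint64_t kArrVA   = 0x00402000;    // final address of arr
constexpr std::uint64_t kFieldVA = kTextVA + 0xd; // relocation offset 0xd
constexpr std::uint64_t kNextRip = kFieldVA + 4;  // the field is the last 4 bytes of the mov
constexpr std::int64_t  kAddend  = 0x18;          // from the relocation record

constexpr std::int32_t kDisp = pc32_field(kArrVA, kAddend, kFieldVA);

// The addend is the byte offset of arr[7] minus 4, because RIP points
// 4 bytes past the start of the field when the instruction executes.
static_assert(kAddend == 7 * 4 - 4, "addend compensates for the field width");
static_assert(kDisp == 0x100b, "value patched into the instruction");
static_assert(rip_relative_target(kNextRip, kDisp) == kArrVA + 7 * 4, "loads arr[7]");
```

The same two functions reproduce the other relocations if you plug in their offsets and addends together with whatever final addresses the linker assigned.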

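    3. What does sym. value signify in the relocation table?

    readelf prints this column for your convenience: it is the value of the target symbol, looked up in the symbol table. In an object file that has not been linked yet, that value is the symbol's offset within its section. The relocation record itself contains it only indirectly, as the symbol's ordinal number in the symbol table, stored in the high dword of Rela.Info.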
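
A toy model of that lookup, as a hedged sketch: the struct and function names below are invented for this example and are much simpler than the real Elf64_Sym and Elf64_Rela records. Symbol-table entries 0 to 2 are placeholders, since their contents are not shown in the question.

```cpp
#include <cstdint>

// Simplified stand-ins for ELF records (hypothetical, illustration only).
struct Sym  { const char* name; std::uint64_t value; };
struct Rela { std::uint64_t offset; std::uint64_t info; std::int64_t addend; };

// Entries 3..5 carry the values shown in the readelf output of the question.
constexpr Sym kSymtab[] = {
    {"", 0}, {"(placeholder)", 0}, {"(placeholder)", 0},
    {"arr", 0x0}, {"arr_big", 0x40}, {"arr2", 0x61ac0},
};

// The three relocation records from '.rela.text'.
constexpr Rela kRelocs[] = {
    {0x0d, 0x000300000002ULL, 0x18},
    {0x16, 0x000400000002ULL, 0x14},
    {0x1f, 0x000500000002ULL, 0x24},
};

// What a readelf-like tool does to fill the "Sym. Value" column:
// use the high dword of info as an index into the symbol table.
constexpr const Sym& symbol_of(const Rela& r) {
    return kSymtab[r.info >> 32];
}

static_assert(symbol_of(kRelocs[0]).value == 0x0, "arr");
static_assert(symbol_of(kRelocs[1]).value == 0x40, "arr_big");
static_assert(symbol_of(kRelocs[2]).value == 0x61ac0, "arr2");
```

The relocation records hold no symbol values at all; the column appears only after the index has been resolved against the symbol table.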