Difference between GOT and GOTOFF

I'm a beginner at 32-bit assembly, and I compiled a simple C program to assembly. I understand most of the output except where it uses GOTOFF.

    .file   "main.c"
    .text
    .section    .rodata
.LC0:
    .string "Hello world"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    leal    4(%esp), %ecx
    .cfi_def_cfa 1, 0
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    .cfi_escape 0x10,0x5,0x2,0x75,0
    movl    %esp, %ebp
    pushl   %ebx
    pushl   %ecx
    .cfi_escape 0xf,0x3,0x75,0x78,0x6
    .cfi_escape 0x10,0x3,0x2,0x75,0x7c
    call    __x86.get_pc_thunk.ax
    addl    $_GLOBAL_OFFSET_TABLE_, %eax
    subl    $12, %esp
    leal    .LC0@GOTOFF(%eax), %edx     # <- Here
    pushl   %edx
    movl    %eax, %ebx
    call    puts@PLT
    addl    $16, %esp
    movl    $0, %eax
    leal    -8(%ebp), %esp
    popl    %ecx
    .cfi_restore 1
    .cfi_def_cfa 1, 0
    popl    %ebx
    .cfi_restore 3
    popl    %ebp
    .cfi_restore 5
    leal    -4(%ecx), %esp
    .cfi_def_cfa 4, 4
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .section    .text.__x86.get_pc_thunk.ax,"axG",@progbits,__x86.get_pc_thunk.ax,comdat
    .globl  __x86.get_pc_thunk.ax
    .hidden __x86.get_pc_thunk.ax
    .type   __x86.get_pc_thunk.ax, @function
__x86.get_pc_thunk.ax:
.LFB1:
    .cfi_startproc
    movl    (%esp), %eax
    ret
    .cfi_endproc
.LFE1:
    .ident  "GCC: (GNU) 9.2.0"
    .section    .note.GNU-stack,"",@progbits

Why does it use GOTOFF? Isn't the address of the GOT already loaded in %eax? What is the difference between GOT and GOTOFF?

Solution

symbol@GOTOFF is the offset of the symbol itself from the GOT base (a convenient but arbitrary choice of anchor), so symbol@GOTOFF(%eax) addresses the variable directly once %eax holds the GOT address. An lea with that operand gives you the symbol's address; a mov would load the data at the symbol (the first 4 bytes of the string in this case).

symbol@GOT is the offset, within the GOT, of the GOT entry for that symbol. A mov load from there gives you the address of the symbol. (GOT entries are filled in by the dynamic linker.)
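To make the difference concrete, here is a toy C model of the two operators. It is only a sketch: the byte array standing in for memory, the layout numbers (GOT_BASE, STR_ADDR, GOT_SLOT) and all function names are made up for illustration, and real offsets are chosen by the linker.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy address space: a model "address" is just an index into image[].
   Every number below is a hypothetical layout, not real linker output. */
enum { IMAGE_SIZE = 256, GOT_BASE = 128, STR_ADDR = 16, GOT_SLOT = 12 };

static uint8_t image[IMAGE_SIZE];

/* Store a 32-bit value at a model address. */
static void store32(uint32_t addr, uint32_t value) {
    assert(addr + 4 <= IMAGE_SIZE);
    memcpy(&image[addr], &value, 4);
}

/* Load a 32-bit value from a model address. */
static uint32_t load32(uint32_t addr) {
    assert(addr + 4 <= IMAGE_SIZE);
    uint32_t value;
    memcpy(&value, &image[addr], 4);
    return value;
}

/* Place the string, then fill its GOT slot the way a dynamic linker would. */
void setup_image(void) {
    memcpy(&image[STR_ADDR], "Hello world", 12);   /* 11 chars + NUL */
    store32(GOT_BASE + GOT_SLOT, STR_ADDR);
}

/* leal sym@GOTOFF(%eax), %edx : pure address arithmetic, no memory access.
   The offset is a link-time constant (negative in this toy layout). */
uint32_t lea_gotoff(uint32_t got_base) {
    int32_t gotoff = (int32_t)STR_ADDR - (int32_t)GOT_BASE;
    return got_base + (uint32_t)gotoff;
}

/* movl sym@GOTOFF(%eax), %edx : loads the data stored at the symbol. */
uint32_t mov_gotoff(uint32_t got_base) {
    return load32(lea_gotoff(got_base));
}

/* movl sym@GOT(%eax), %edx : loads the pointer stored in the GOT slot. */
uint32_t mov_got(uint32_t got_base) {
    return load32(got_base + GOT_SLOT);
}
```

In this model lea_gotoff and mov_got both end up with the string's address, but only mov_got touches memory to get it, while mov_gotoff returns the bytes "Hell" rather than an address.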

The question "Why use the Global Offset Table for symbols defined in the shared library itself?" has an example of accessing an extern variable, which does get its address from the GOT and then dereferences it.

BTW, this is position-independent code; your GCC is configured to build PIE executables by default. If you used -fno-pie -no-pie to make a traditional position-dependent executable, you'd just get a normal, efficient pushl $.LC0. (32-bit mode has no RIP-relative addressing, so position-independent code is quite inefficient there: it needs the call to __x86.get_pc_thunk.ax just to find its own address.)
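The arithmetic behind the call / addl / leal sequence in the question can be sketched in C. This is a rough model under stated assumptions: the layout offsets, the load bases and the function names are hypothetical, and in a real build the assembler and linker work out the constants through relocations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link-time layout: offsets from the start of the loaded image. */
enum {
    RET_ADDR_OFF = 0x1019,  /* instruction right after the call to the thunk */
    GOT_OFF      = 0x4000,  /* _GLOBAL_OFFSET_TABLE_ */
    LC0_OFF      = 0x2008   /* the string .LC0 in .rodata */
};

/* call __x86.get_pc_thunk.ax : the thunk returns its own return address,
   which is the run-time address of the instruction after the call. */
uint32_t get_pc_thunk(uint32_t load_base) {
    assert((load_base & 0xfffu) == 0);  /* images are loaded page-aligned */
    return load_base + RET_ADDR_OFF;
}

/* PIE sequence: every constant is a difference of two link-time offsets,
   so none of them depends on where the image ends up in memory. */
uint32_t pie_address_of_lc0(uint32_t load_base) {
    uint32_t eax = get_pc_thunk(load_base);         /* call thunk               */
    eax += (uint32_t)(GOT_OFF - RET_ADDR_OFF);      /* addl $_GLOBAL_OFFSET_TABLE_, %eax */
    return eax + (uint32_t)(LC0_OFF - GOT_OFF);     /* leal .LC0@GOTOFF(%eax), %edx      */
}

/* Non-PIE: pushl $.LC0 embeds an absolute address, valid only for the
   base chosen at link time (0x08048000 is assumed here). */
uint32_t non_pie_address_of_lc0(void) {
    return 0x08048000u + LC0_OFF;
}
```

Calling pie_address_of_lc0 with two different load bases gives the right address both times without changing any constant, which is the whole point of the extra instructions; the non-PIE version is a single immediate but only works at one base.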

In a non-PIE executable (or in a 64-bit PIE), the GOT barely gets used at all. The main executable defines space for the symbols it uses, so it can access them without going through the GOT. libc code uses the GOT anyway (mostly because of symbol interposition in 64-bit code), so letting the main executable provide the symbol doesn't cost anything and makes the non-PIE executable faster.

We can get a non-PIE executable to use the GOT directly for shared library function addresses with -fno-plt, instead of calling into the PLT and having it use the GOT.

    #include <stdio.h>
    void foo() { putchar('\n'); }

Compiled with gcc 9.2 -O3 -m32 -fno-plt on Godbolt (-fno-pie is the default on the Godbolt compiler explorer, unlike on your system):

    foo():
            sub     esp, 20                  # gcc loves to waste an extra 16 bytes of stack
            push    DWORD PTR stdout         # [disp32] absolute address
            push    10
            call    [DWORD PTR _IO_putc@GOT]
            add     esp, 28
            ret

Both push and call have a memory operand using a 32-bit absolute address. push is loading the FILE* value of stdout from a known (link-time-constant) address. (There isn't a text relocation for it.)

call is loading the function pointer that the dynamic linker stored in the GOT, and loading it directly into EIP.
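In C terms, that indirect call behaves like a call through a function-pointer variable that something else filled in before the code ran. A minimal sketch, assuming made-up names throughout (fake_io_putc stands in for libc's _IO_putc, got_io_putc for its GOT slot, and dynamic_linker_fill_got for the dynamic linker's work):

```c
#include <assert.h>
#include <stddef.h>

/* Signature of the stand-in function; the int* sink replaces FILE* so the
   example needs no real I/O. */
typedef int (*putc_like_fn)(int c, int *sink);

/* Stand-in for the library function the GOT slot will point at. */
static int fake_io_putc(int c, int *sink) {
    *sink = c;
    return c;
}

/* One GOT slot: empty until the "dynamic linker" has run. */
static putc_like_fn got_io_putc = NULL;

/* What the dynamic linker does at load time: write the resolved
   function address into the slot. */
void dynamic_linker_fill_got(void) {
    got_io_putc = fake_io_putc;
}

/* Model of foo() built with -fno-plt: load the pointer from the slot
   and jump to it, with no PLT stub in between. */
int foo(int *sink) {
    assert(got_io_putc != NULL);     /* slot must be filled before the call */
    return got_io_putc('\n', sink);  /* call [_IO_putc@GOT] */
}
```

With the PLT, the call would instead target a small stub in the executable, and the stub would do the jump through the GOT slot.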