mold and lld not linking against libc correctly

I've been writing some x64 assembly on linux - exactly what it is is not relevant - and I've come across a strange problem. In my assembly code, I've declared printf as an external label, and call it using the x64 Linux C Calling Convention. The relevant bits look something like this:

extern printf

segment .rodata
    fmt db "%lld", 0x0a, 0x00

segment .text

    mov     rsi, rax ; i64 i want to print
    mov     rdi, fmt ; pointer to the format string
    call    printf

Assembled using nasm -f elf64 file.asm, I get a correct object file. Linking with GNU ld, ld -o file -lc file.o, I get an executable that runs correctly and produces the expected output. So far, so good.

The strange bit comes in when I attempt to do the same with mold and lld. For starters, neither knows where to find libc offhand. This is fine; I asked GCC where to find libc (gcc --print-file-name libc.so, or libc.a; my system has both, in the same directory), the answer being /usr/lib. So I attempted to link my object file(s) again with mold and lld, like this:

    mold/ld.lld -o file -L/usr/lib -lc file.o

They both link without any reported errors, but when I run the generated executables, they both segfault. I haven't investigated the lld version yet, but I threw the mold version into gdb and discovered that the segfault occurs because code within the libc printf implementation performs a jump to 0x00...00.

My question is simple: what is going wrong, and how do I fix it? Both are highly reputable linkers, so I am certain that the problem is me, but what it is that I'm doing wrong is unclear to me. I've attempted to research this problem but, in my admittedly cursory search, I could find no instances of anyone else having a similar problem - or at least, any in which they sorted it out publicly. Are there some flags I'm missing? Is /usr/lib not the place to look? Any assistance would be appreciated.

Solution

Calling libc functions like printf from the ELF entry point (_start) without calling glibc init functions first only works in a dynamically linked executable; the dynamic linker calls libc's init hook functions so it can initialize itself before execution reaches your _start.

But if you link a static executable, nothing runs before your _start, so glibc's init functions never get called. printf then runs expecting data structures like the stdout buffer to be already allocated and initialized, and they aren't.

This is why calling libc functions from _start instead of main is generally not recommended and considered a hack. Some libc implementations don't need init functions to be called, e.g. musl doesn't, IIRC. But glibc does.

If you link a dynamic executable, you need to specify the right dynamic linker path because the default isn't useful on most modern systems. I'm surprised ld -o file -lc file.o worked on your system; on my x86-64 Arch GNU/Linux, GNU Binutils ld's default interpreter path of /lib/ld64.so.1 doesn't exist.

Use readelf -l ./file and look at the INTERP header. e.g. this is what I get from building with gcc -nostartfiles -no-pie -o foo foo.o to have it pass the right options to ld to make a dynamic executable that links -lc but not the CRT start files:

  INTERP         0x0000000000000238 0x0000000000400238 0x0000000000400238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

If you get "no such file or directory" when you try to run ./file, that's usually the problem when you've been using a linker manually. strace ./file will show the execve system call itself returning -ENOENT, even though ls ./file and readelf -a ./file can read the file using the same path string: the file that doesn't exist is the interpreter, not your executable.

An ELF interpreter works kind of like the #!/bin/sh line at the top of an executable text file: the kernel parses that line and runs /bin/sh ./file. But for an ELF binary, the kernel also maps the executable into memory so the ELF interpreter doesn't have to do that with system-calls from user-space.
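
To make the INTERP check concrete, here is a minimal C sketch of the lookup that readelf -l reports. It assumes a 64-bit ELF file with the same byte order as the machine running the check. The function name read_interp is hypothetical, made up for this illustration; the struct and constant names come from <elf.h>.

```c
#include <elf.h>     /* Elf64_Ehdr, Elf64_Phdr, PT_INTERP, ELFMAG */
#include <stdio.h>
#include <string.h>

/* Copy the PT_INTERP path of a 64-bit ELF file into buf.
   Returns 0 on success, or -1 if the file is unreadable, is not ELF64,
   or has no PT_INTERP header (e.g. a static executable).
   read_interp is a hypothetical helper name, not a libc function. */
int read_interp(const char *path, char *buf, size_t buflen)
{
    int rc = -1;
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* Check the ELF magic and that this is a 64-bit object. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto out;

    /* Walk the program header table looking for PT_INTERP. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;
        if (ph.p_filesz == 0 || ph.p_filesz > buflen)
            goto out;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';  /* stored string normally ends in NUL already */
        rc = 0;
        break;
    }
out:
    fclose(f);
    return rc;
}
```

Calling read_interp("./file", buf, sizeof buf) on a dynamic executable should yield the same path that readelf prints after "Requesting program interpreter". If that path doesn't exist on your system, execve fails with ENOENT as described above. A return of -1 for a valid ELF64 file suggests there is no INTERP header at all, i.e. a static executable.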

Possible ABI violations that would make `printf` segfault before printing

The x86-64 System V ABI (and Windows x64, BTW) requires RSP % 16 == 0 before a call to a function, which guarantees that RSP % 16 == 8 on entry to a function (after the call has pushed a return address).

This lets functions use movaps to more efficiently copy locals around on the stack if they want. (Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?)

At the ELF entry point, RSP % 16 == 0 is guaranteed by the x86-64 SysV ABI; it's not a function. (RSP points at argc, not a return address). So if this was your entire actual code, RSP would be aligned correctly.
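
The alignment bookkeeping is plain modular arithmetic, so it can be modeled in a few lines of C. This is only a sketch for checking your own push / sub rsp sequences on paper; the helper names rsp_mod16_after and aligned_for_call are hypothetical and don't correspond to anything in the ABI or in glibc.

```c
/* Value of RSP modulo 16 after `pushes` 8-byte pushes followed by
   `sub rsp, extra`, starting from a known RSP % 16 of `rsp_mod16`.
   Unsigned wraparound keeps the subtraction well defined. */
unsigned rsp_mod16_after(unsigned rsp_mod16, unsigned pushes, unsigned extra)
{
    return (rsp_mod16 - 8u * pushes - extra) & 15u;
}

/* Non-zero when a `call` at this point satisfies the ABI rule
   RSP % 16 == 0 before the call instruction. */
int aligned_for_call(unsigned rsp_mod16)
{
    return (rsp_mod16 & 15u) == 0;
}
```

Starting from 0 (the state at _start), aligned_for_call(rsp_mod16_after(0, 0, 0)) is true, so no adjustment is needed before call printf. Starting from 8 (the state on entry to a normal function such as main), it's false until you do one push or a sub rsp, 8.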

On calling a variadic function like printf, it's also required that AL >= the number of XMM args, but no more than 8.

Really old GCC used to make variadic functions do a computed jump to skip the exact number of movaps stores that dump XMM regs into an array where VA_ARG code can reference them, but modern GCC just uses test al,al / jz to skip all 8 or not. Back then it was important to strictly follow this part of the ABI, but you can be sloppy these days. (This answer shows both versions of the compiler-generated asm, in a question about it breaking with garbage left in AL so it's greater than 8.)

In a dynamic executable, RAX will hold garbage on entry to _start, since the dynamic linker runs in your process before execution reaches it. In a static executable the ABI doesn't guarantee anything about RAX either, but in practice Linux zeroes the registers to avoid leaking kernel info.

So modern builds of glibc will happen to work if AL is outside the 0..8 range, as long as it's non-zero if you're printing any FP args. Of course it's better to pass the actual number of FP args that are in XMM regs and follow the ABI, e.g. xor eax,eax or mov eax,3 or whatever.

Recent builds of glibc in practice do use movaps to the stack within printf, other than for dumping the XMM registers, so now you can't get away with violating that part of the ABI either, even with AL=0 for printing non-FP stuff. (Similarly, scanf compiles to code that happens to require correct stack alignment: glibc scanf Segmentation faults when called from a function that doesn't align RSP)

So this exact code can happen to work in a dynamic executable if you use the right linker options, crashing only from falling off the end without making an _exit system call. I tested it, and that's what happens.

(Of course, redirecting output to a file would leave it empty, because a full-buffered stdout won't get flushed before you segfault, since you don't call exit. And yes, you should call exit, not make a raw eax=231/syscall exit_group system call which would exit without calling libc atexit functions.)
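
The buffering point can be reproduced from C without any assembly. In this sketch, which assumes a POSIX system with fork and pipe, a child process prints to a fully buffered stdout and then terminates either through exit() or through _exit(). On Linux with glibc, _exit() ends the process via the exit_group system call without flushing stdio, which is the C-level equivalent of the raw syscall. The function name bytes_written_by_child is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child whose stdout is a pipe, have it printf one short line and
   then terminate with exit() (use_libc_exit != 0) or _exit().
   Returns the number of bytes that reached the pipe, or -1 on error.
   bytes_written_by_child is a hypothetical name for this illustration. */
long bytes_written_by_child(int use_libc_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    fflush(stdout);              /* don't let the child inherit pending output */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {              /* child */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        setvbuf(stdout, NULL, _IOFBF, BUFSIZ);  /* full buffering, as for a file */
        printf("hello\n");       /* 6 bytes, sits in the stdio buffer */
        if (use_libc_exit)
            exit(0);             /* runs atexit handlers and flushes stdio */
        _exit(0);                /* leaves immediately; the buffer is lost */
    }

    close(fds[1]);               /* parent: read until the child's end closes */
    long total = 0;
    char buf[64];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

With exit() the parent should see all 6 bytes of "hello\n"; with _exit() it should see none. That matches the empty file you'd get from redirecting the output of a program that leaves via a raw exit_group system call, or by crashing.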

But apparently this isn't your full code, so maybe you messed up RSP alignment before the call? But probably not, since you say your code worked when you linked it into a dynamic executable with ld. Or else your system is old so your glibc's printf happens not to require RSP alignment.
