Tags: assembly, linker, ld, gnu-assembler, object-files

If an object file defines _start and doesn't use any libraries, why do I still need to link it before I can execute it?


I have a hello world program:

.global _start

.text

_start:
    # write(1, message, 13)
    mov $1, %rax            # system call 1 is write
    mov $1, %rdi            # file descriptor 1 is stdout
    mov $message, %rsi      # address of string to output
    mov $13, %rdx           # number of bytes
    syscall

    # exit(0)
    mov $60, %rax           # system call 60 is exit
    xor %rdi, %rdi          # we want to return code 0
    syscall

message:
    .ascii "Hello, world\n"

I can assemble this into an object file with:

as hello.s -o hello.o

This object file is not executable. When I try to execute it, I get:

bash: ./hello.o: cannot execute binary file: Exec format error

I need to invoke the linker to make this viable:

ld hello.o -o hello

At this point, the hello program works. However, the use of the linker here is confusing to me. I'm not linking in any external libraries! I seem to just be linking the object file to nothing.

What is the linker doing for such a "self-contained" program?


Solution

  • ELF files have different types, like ET_EXEC (traditional non-PIE executable) or ET_REL (relocatable object file, normally with a .o filename).

    as doesn't have a special-case mode that outputs an executable instead of an object file. At least one other assembler, FASM, does have a mode that outputs an ELF executable directly.

    Given the ELF object file that as produces, you could:

    • link it into a simple static executable like you're doing
    • link it into a PIE executable
    • link it into a dynamic executable, possibly even one that links some .so shared libraries; those could have static constructors (init functions) that run before your _start. (For example glibc's libc.so does this, which is why it happens to work to call libc functions from _start on Linux without manually calling glibc init functions, if you dynamically link.)

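    The .o needs to be linked because no absolute address has been chosen for it to be loaded at, so the assembler can't fill in things like the absolute address that mov $message, %rsi embeds as an immediate. (GAS encodes that as a sign-extended 32-bit immediate; a full 64-bit immediate needs movabs.) The object file only records a relocation there, which the linker resolves once it has picked addresses.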

    (If you had used lea message(%rip), %rsi, the code would be position-independent. In general the distance between the .text and .rodata sections isn't known until link time, but you put your string right in .text, so the RIP-relative offset would be resolved at assemble time, giving you a stand-alone block of code+data. The most efficient way for a non-PIE executable, mov $message, %esi, would still need an absolute (32-bit) address.)

    as doesn't know which of these you want, and GNU Binutils was primarily written for use by compiler back-ends, so there was no point making as more complicated just so it could write an executable ELF file directly; that's what ld is for. This is the Unix philosophy of making small separate tools that each do one thing well.

    If you want to assemble + link with one command, make a shell script, or use a compiler front-end:

    gcc -nostdlib -static -no-pie start.s -o static_executable