
Can I do `ret` instruction from code at _start in MacOS? Linux?

I am wondering if it is legal to return with ret from a program's entry point.

Example with NASM:

section .text
global _start

_start:
    ret

; Linux: nasm -f elf64 foo.asm -o foo.o && ld foo.o
; OS X:  nasm -f macho64 foo.asm -o foo.o && ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo

ret pops a return address from the stack and jumps to it.

But are the top bytes of the stack a valid return address at the program entry point, or do I have to call exit?

Also, the program above does not segfault on OS X. Where does it return to?


    MacOS Dynamic Executables

    When you are using MacOS and link with:

    ld foo.o -lc -macosx_version_min 10.12.0 -e _start -o foo

    you get a dynamically linked executable. _start isn't the true entry point; the dynamic loader is. As one of its last steps, the dynamic loader performs C/C++/Objective-C runtime initialization and then calls the entry point you specified with the -e option. The Apple documentation on Forking and Executing the Process has these paragraphs:

    A Mach-O executable file contains a header consisting of a set of load commands. For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program. If you use Xcode, this is always /usr/lib/dyld, the standard OS X dynamic linker.

    When you call the execve routine, the kernel first loads the specified program file and examines the mach_header structure at the start of the file. The kernel verifies that the file appear to be a valid Mach-O file and interprets the load commands stored in the header. The kernel then loads the dynamic linker specified by the load commands into memory and executes the dynamic linker on the program file.

    The dynamic linker loads all the shared libraries that the main program links against (the dependent libraries) and binds enough of the symbols to start the program. It then calls the entry point function. At build time, the static linker adds the standard entry point function to the main executable file from the object file /usr/lib/crt1.o. This function sets up the runtime environment state for the kernel and calls static initializers for C++ objects, initializes the Objective-C runtime, and then calls the program’s main function

    In your case that is _start. Because you are creating a dynamically linked executable, a ret returns to the code that called _start, which then makes the exit system call for you. This is why it doesn't crash. If you examine the generated executable with gobjdump -Dx foo you should get:

    start address 0x0000000000000000
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         00000001  0000000000001fff  0000000000001fff  00000fff  2**0
                      CONTENTS, ALLOC, LOAD, CODE
    0000000000001000 g       03 ABS    01 0010 __mh_execute_header
    0000000000001fff g       0f SECT   01 0000 [.text] _start
    0000000000000000 g       01 UND    00 0100 dyld_stub_binder
    Disassembly of section .text:
    0000000000001fff <_start>:
        1fff:       c3                      retq

    Notice that the start address is 0, and the code at 0 is dyld_stub_binder. This is the dynamic loader stub that eventually sets up a C runtime environment and then calls your entry point _start. If you don't override the entry point, it defaults to main.

    MacOS Static Executables

    If, however, you build a static executable, no code runs before your entry point, and ret should crash because there is no valid return address on the stack. The documentation quoted above says:

    For programs that use shared libraries or frameworks, one of these commands specifies the location of the linker to be used to load the program.

    A statically built executable doesn't use the dynamic loader dyld with crt1.o embedded in it. (CRT = C runtime, which on MacOS covers C++/Objective-C as well.) No dynamic loading is performed, the C/C++/Objective-C initialization code is not executed, and control is transferred directly to your entry point.

    To build statically, drop -lc (or -lSystem) from the linker command and add the -static option:

    ld foo.o -macosx_version_min 10.12.0 -e _start -o foo -static

    If you run this version, it should produce a segmentation fault. gobjdump -Dx foo produces:

    start address 0x0000000000001fff
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         00000001  0000000000001fff  0000000000001fff  00000fff  2**0
                      CONTENTS, ALLOC, LOAD, CODE
      1 LC_THREAD.x86_THREAD_STATE64.0 000000a8  0000000000000000  0000000000000000  00000198  2**0
    0000000000001000 g       03 ABS    01 0010 __mh_execute_header
    0000000000001fff g       0f SECT   01 0000 [.text] _start
    Disassembly of section .text:
    0000000000001fff <_start>:
        1fff:       c3                      retq

    Notice that the start address is now 0x1fff, which is the entry point you specified (_start). There is no dynamic loader stub as an intermediary.
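
Since nothing sits between the kernel and _start in the static case, the program has to end itself with an exit system call instead of ret. A minimal sketch, assuming x86-64 MacOS and the static link command shown above; the syscall number follows the usual BSD-class convention (0x2000000 plus the call number), which is an assumption not stated in the text above:

```nasm
; Assumed build (same as the static build above):
;   nasm -f macho64 foo.asm -o foo.o
;   ld foo.o -macosx_version_min 10.12.0 -e _start -o foo -static

section .text
global _start

_start:
    mov eax, 0x2000001   ; BSD syscall class (0x2000000) + exit (1)
    xor edi, edi         ; exit status 0 in the first argument register
    syscall              ; exit does not return, so no ret is needed
```

With this in place of the bare ret, the static executable should terminate with status 0 rather than fault.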


    Linux

    Under Linux, when you specify your own entry point, a ret from it will cause a segmentation fault whether you build a static or a dynamically linked executable. There is good information on how ELF executables are run on Linux in this article and the dynamic linker documentation. The key point is that the Linux documentation makes no mention of C/C++/Objective-C runtime initialisation, unlike the MacOS dynamic linker documentation.

    The key difference between the Linux dynamic loader and the MacOS one (dyld) is that the MacOS dynamic loader performs C/C++/Objective-C startup initialization by including the entry point from crt1.o. The code in crt1.o then transfers control to the entry point you specified with -e (the default is main). In Linux, the dynamic loader makes no assumption about the type of code that will be run. After the shared objects are processed and initialized, control is transferred directly to the entry point.
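
Because control arrives at the entry point directly on Linux, the safe way to finish is an explicit exit system call. A minimal sketch, assuming x86-64 Linux and the build line from the question (the -o foo output name is added here for illustration):

```nasm
; Assumed build: nasm -f elf64 foo.asm -o foo.o && ld foo.o -o foo

section .text
global _start

_start:
    mov eax, 60          ; exit system call number on x86-64 Linux
    xor edi, edi         ; exit status 0
    syscall              ; the kernel ends the process here
```

The same source can be linked statically or dynamically; in both cases the process ends in the kernel rather than by returning to a caller.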

    Stack Layout at Process Creation

    FreeBSD (on which MacOS is based) and Linux have one thing in common: when loading 64-bit executables, the layout of the user stack at process creation is the same. The stack for 32-bit processes is similar, but pointers and data are 4 bytes wide, not 8.


    Although there isn't a return address on the stack, there is other data representing the number of arguments, the arguments, environment variables, and other information. This layout is not the same as what the main function in C/C++ expects. It is part of the C startup code to convert the stack at process creation to something compatible with the C calling convention and the expectations of the function main (argc, argv, envp).
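
To make the layout concrete, here is a small sketch for x86-64 Linux that reads the argument count straight from the top of the stack at _start and uses it as the exit status. The offsets in the comments assume the 64-bit layout described above (8-byte slots); the program is illustrative only:

```nasm
; Assumed build: nasm -f elf64 args.asm -o args.o && ld args.o -o args
;
; Stack at process entry (64-bit), lowest address first:
;   [rsp]                 argc
;   [rsp + 8]             argv[0]
;   ...                   argv[1] .. argv[argc-1]
;   [rsp + 8 + 8*argc]    NULL terminating argv
;   then                  environment pointers, NULL, further OS-specific data

section .text
global _start

_start:
    mov rdi, [rsp]       ; argc sits where a return address would be
    mov rsi, [rsp + 8]   ; argv[0], loaded only to show its position (unused)
    mov eax, 60          ; exit system call number on x86-64 Linux
    syscall              ; exit status = low 8 bits of argc
```

Running ./args a b and then echo $? should print 3 (the program name plus two arguments). It also shows why a bare ret faults: the value popped as a return address is argc, not a code address.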

    I wrote more on this subject in this Stack Overflow answer, which shows how a statically linked MacOS executable can traverse the program arguments passed by the kernel at process creation.