
How is the macOS stack initialized at the start of a process?


Curious about how macOS prepares the stack, I wrote an x86_64 assembly program that prints the top of the stack to stdout as soon as the process starts:

global start
start:                      ; entry point of the binary, called by the loader
    push    rsp             ; push the stack pointer to the stack so that we'll see that too
    mov     rdi, 1          ; file to write to: file descriptor 1 (STDOUT)
    lea     rsi, [rsp]      ; source of the write: stack
    mov     rdx, 64         ; number of bytes to write: 64 (8 x 64-bit integers)
    mov     rax, 0x02000004 ; MacOS syscall number for write
    syscall
    mov     rsi, [rsp+16]   ; smoke test: argv contents
    mov     rdx, 16         ; we expect the argv[0] ("./inspect_stack\0") to be 16 bytes long
    mov     rax, 0x02000004
    syscall
    mov     rsi, [rsp+32]   ; another smoke test: envp???
    mov     rdx, 11
    mov     rax, 0x02000004
    syscall
    xor     edi, edi        ; exit status 0 (rdi still held 1 from the write calls)
    mov     rax, 0x02000001 ; MacOS syscall number for exit
    syscall

I assemble, link, and run the program, piping the output through xxd:

nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack | xxd -e -g 8 -c 8

I see something like this (the comments are mine):

00000000: 00007ff7bfeff6b0  ........  # this is the stack pointer we pushed
00000008: 0000000000000001  ........  # argc
00000010: 00007ff7bfeff880  ........  # argv; see the smoke test result
00000018: 0000000000000000  ........  # a null pointer???
00000020: 00007ff7bfeff890  ........  # are these part of envp?
00000028: 00007ff7bfeff89f  ........  # ...seems like an array of pointers stored inline?
00000030: 00007ff7bfeff8dc  ........  # ...and they seem to point at a continuous buffer
00000038: 00007ff7bfeff8ed  ........
00000040: 636570736e692f2e  ./inspec  # the result of the 1st smoke test. yes, argv[0]!
00000048: 006b636174735f74  t_stack.
00000050: 6573552f3d445750  PWD=/Use  # the result of the 2nd smoke test... seems like envp?
00000058:           2f7372  rs/

My understanding was that, at the start of the program, the stack would hold a 64-bit integer (argc) followed by two pointers (to argv and to envp). That does not seem to be the case, unless the envp pointer is null for some reason. Yet the envp array appears to be stored inline, starting right after the null. What is the actual layout of the stack when the process starts?


Solution

  • After inspecting a bit more and passing more arguments, I realized that I was mistaken in expecting two pointers, to argv and to envp, at the top of the stack. Instead, argv and envp are themselves stored inline, as arrays of pointers to the associated strings. Both arrays are null-terminated, so the null value I was seeing was actually the terminator of argv. Adding more arguments makes this a lot clearer:

    nasm -f macho64 inspect_stack.asm && ld inspect_stack.o -static -o inspect_stack && ./inspect_stack first second | xxd -e -g 8 -c 8

    00000000: 00007ff7bfeff698  ........
    00000008: 0000000000000003  ........  # argc
    00000010: 00007ff7bfeff878  x.......  # argv[0]
    00000018: 00007ff7bfeff888  ........  # argv[1]
    00000020: 00007ff7bfeff88e  ........  # argv[2]
    00000028: 0000000000000000  ........  # argv end
    00000030: 00007ff7bfeff895  ........  # envp[0]
    00000038: 00007ff7bfeff8a4  ........  # envp[1] and so on
    00000040: 636570736e692f2e  ./inspec
    00000048: 006b636174735f74  t_stack.
    00000050: 5000646e6f636573  second.P  # the second smoke test now sees argv[2]!
    00000058:           3d4457  WD=       # it seems that the envp strings are located right after the argv strings
    

    TL;DR: I thought that the second and third 64-bit values on the stack were char **argv and char **envp. Instead, they were argv[0] and argv[1]. At process entry (that is, before the push rsp in my program), [rsp] holds argc. To get the char **argv that a C main would expect, I can take the address rsp + 8 (skipping the 8 bytes of argc), for example with lea rsi, [rsp + 8]. To get char **envp, I can mov rax, [rsp] and then take the address rsp + 8 + rax*8 + 8 (8 bytes for argc, then argc pointers, and finally another 8 bytes for the null terminator), for example with lea rdx, [rsp + 8 + rax*8 + 8].