Cycle Through and Print argv[] in x64 ASM

I have been working on what is essentially a while loop to go through all CLI arguments. While working on a solution to print only one element, I noticed a few things; this was the thought process that led me here.

I noticed that if I used lea 16(%rsp), %someRegisterToWrite, I was able to get/print argv[1]. Next I tried lea 24(%rsp), %someRTW and this gave me access to argv[2]. I kept increasing the offset to see if the pattern would continue, and it did.

My thought was to keep adding 8 to %someRTW and increment a "counter" until the counter equals argc. The following code works great when a single argument is entered, but prints nothing with 2 arguments, and when I enter 3 arguments it prints the first 2 with no space in between.

.section __DATA,__data
.section __TEXT,__text
.globl _main
_main:
    lea (%rsp), %rbx        #argc
    lea 16(%rsp), %rcx      #argv[1]
    mov $0x2, %r14          #counter
    L1:
    mov (%rcx), %rsi        #%rsi = user_addr_t cbuf
    mov (%rcx), %r10
    mov 16(%rcx), %r11      
    sub %r10, %r11          #Get number of bytes until next arg
    mov $0x2000004, %eax    #4 = write
    mov $1, %edi            #edi = file descriptor 
    mov %r11, %rdx          #user_size_t nbyte
    syscall
    cmp (%rbx), %r14        #if counter < argc
    jb L2
    jge L3
    L2:
    inc %r14                
    mov 8(%rcx), %rcx       #mov 24(%rsp) back into %rcx
    mov $0x2000004, %eax
    mov $0x20, %rsi         #0x20 = space
    mov $2, %rdx
    syscall
    jmp L1
    L3:
    xor %rax, %rax
    xor %edi, %edi
    mov $0x2000001, %eax
    syscall

Solution

I am going to assume that on 64-bit OS/X you are assembling and linking in such a way that you intentionally bypass the C runtime code. One example would be a static build without the C runtime startup files and the System library, with _main specified as your program entry point. (_start is generally the process entry point unless overridden.)

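In this scenario, the 64-bit kernel loads the Mach-O program into memory and sets up the process stack with the program arguments and environment variables, among other things. The Apple OS/X process stack state at startup is the same as the one documented in Section 3.4 of the System V x86-64 ABI: 0(%rsp) holds argc, followed by the argument pointers argv[0] through argv[argc-1], a NULL pointer, and then the environment variable pointers, which are terminated by another NULL pointer.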

One observation is that the list of argument pointers is terminated with a NULL(0) address. You can use this to loop through all parameters until you find the NULL(0) address as an alternative to relying on the value in argc.
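To make that layout concrete, here is a small C model of the initial stack, treating it as an array of 8-byte slots (this assumes a 64-bit build where uintptr_t is 8 bytes). The helper names count_args_until_null and demo, and the strings in the fake stack, are hypothetical; a real process gets this block from the kernel. Starting at slot 2 (byte offset 16, argv[1]) and stepping one slot at a time visits each argument until the NULL terminator, without ever reading argc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Walk a simulated stack the way the assembly does: start at slot 2
 * (byte offset 16, argv[1]) and advance one 8-byte slot per argument
 * until a NULL pointer is found. Returns how many arguments were seen. */
static size_t count_args_until_null(const uintptr_t *stack)
{
    size_t n = 0;
    for (const uintptr_t *slot = stack + 2; *slot != 0; slot++) {
        printf("argv[%zu] = %s\n", n + 1, (const char *)*slot);
        n++;
    }
    return n;
}

/* Hypothetical initial stack: argc, argv[0..argc-1], NULL, envp[0], NULL.
 * Offsets in the comments assume 8-byte slots. */
static size_t demo(void)
{
    uintptr_t stack[] = {
        3,                        /* 0(%rsp):  argc            */
        (uintptr_t)"progargs",    /* 8(%rsp):  argv[0]         */
        (uintptr_t)"one",         /* 16(%rsp): argv[1]         */
        (uintptr_t)"two",         /* 24(%rsp): argv[2]         */
        0,                        /* 32(%rsp): argv terminator */
        (uintptr_t)"HOME=/tmp",   /* 40(%rsp): envp[0]         */
        0                         /* 48(%rsp): envp terminator */
    };
    return count_args_until_null(stack);
}
```

Calling demo() prints argv[1] = one and argv[2] = two, then returns 2; the environment pointers are never reached because the walk stops at the first NULL.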

The Problems

One problem is that your code assumes that registers are all preserved across a SYSCALL. The SYSCALL instruction itself will destroy the contents of RCX and R11:

SYSCALL invokes an OS system-call handler at privilege level 0. It does so by loading RIP from the IA32_LSTAR MSR (after saving the address of the instruction following SYSCALL into RCX). (The WRMSR instruction ensures that the IA32_LSTAR MSR always contain a canonical address.)

SYSCALL also saves RFLAGS into R11 and then masks RFLAGS using the IA32_FMASK MSR (MSR address C0000084H); specifically, the processor clears in RFLAGS every bit corresponding to a bit that is set in the IA32_FMASK MSR.

One way to avoid this is to use registers other than RCX and R11. Otherwise, you will have to save and restore them around a SYSCALL if you need their values intact. The kernel will also clobber RAX with the return value.
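The same clobbering rules show up when SYSCALL is issued from C through GCC/Clang extended inline assembly: the compiler must be told which registers the instruction destroys. Below is a minimal sketch, assuming an x86-64 target; raw_write is a hypothetical helper, and the non-Apple syscall number assumes Linux x86-64 (where write is 1). Error conventions also differ (Linux returns a negative errno, macOS sets the carry flag), which this sketch does not handle.

```c
#include <stddef.h>

/* Assumption: x86-64 only. On macOS, write is BSD call 4 plus the
 * 0x2000000 class prefix; otherwise assume Linux x86-64 numbering. */
#if defined(__APPLE__)
#define RAW_SYS_WRITE 0x2000004L
#else
#define RAW_SYS_WRITE 1L
#endif

/* Issue write(fd, buf, len) directly with the SYSCALL instruction.
 * RCX (return RIP) and R11 (saved RFLAGS) are listed as clobbers, and RAX
 * is an in/out operand because the kernel overwrites it with the result.
 * RDX is also in/out as a precaution: the macOS kernel may return a
 * second value in it. */
static long raw_write(long fd, const void *buf, size_t len)
{
    long ret = RAW_SYS_WRITE;
    long rdx = (long)len;
    __asm__ volatile("syscall"
                     : "+a"(ret), "+d"(rdx)
                     : "D"(fd), "S"(buf)
                     : "rcx", "r11", "memory");
    return ret;
}
```

Leaving "rcx" and "r11" out of the clobber list would let the compiler keep live values in those registers across the call, which is the C equivalent of the bug in the question's loop.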

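A list of the Apple OS/X system calls provides the details of all the available kernel functions. In 64-bit OS/X code, a class prefix is added to each system call number:

In 64-bit systems, Mach system calls are positive, but are prefixed with 0x2000000 — which clearly separates and disambiguates them from the POSIX calls, which are prefixed with 0x1000000

That quote has the two prefixes reversed: in XNU's osfmk/mach/i386/syscall_sw.h, SYSCALL_CLASS_MACH is 1 and SYSCALL_CLASS_UNIX is 2 (shifted left by 24 bits), so BSD/POSIX calls such as write (4) and exit (1) use the 0x2000000 prefix, while Mach traps use 0x1000000.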
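The class prefix is plain bit arithmetic. This sketch reproduces it with a hypothetical helper, make_syscall_number; the class values 1 (Mach) and 2 (BSD/Unix) and the shift of 24 follow XNU's syscall_sw.h, while the macro and function names are local to this example.

```c
#include <stdint.h>

/* Class encoding modeled on XNU's osfmk/mach/i386/syscall_sw.h:
 * the class sits in the bits from bit 24 upward. */
#define CLASS_SHIFT 24
#define CLASS_MACH  1u   /* Mach traps:          0x1000000 + n */
#define CLASS_UNIX  2u   /* BSD/POSIX syscalls:  0x2000000 + n */

/* Combine a class and a per-class call number into the value loaded
 * into EAX before SYSCALL. */
static uint32_t make_syscall_number(uint32_t cls, uint32_t n)
{
    return (cls << CLASS_SHIFT) | (n & 0x00FFFFFFu);
}
```

make_syscall_number(CLASS_UNIX, 4) yields 0x2000004 and make_syscall_number(CLASS_UNIX, 1) yields 0x2000001, matching the SYS_WRITE and SYS_EXIT constants used in the revised code.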

Your method of computing the length of a command line argument will not work: the string for one argument is not guaranteed to be placed in memory immediately after the previous one. The reliable way is to start at the beginning of the argument you are interested in and search for its NUL(0) terminating character.
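Here is that search written in C, following the same steps as the .strlenloop in the revised assembly: bump an index, then check whether the byte just before it is NUL. The name arg_length is hypothetical; in ordinary C code you would simply call strlen.

```c
#include <stddef.h>

/* Mirror of the assembly's .strlenloop: advance the index first, then
 * test the byte at index-1; stop on NUL and exclude it from the length. */
static size_t arg_length(const char *s)
{
    size_t i = 0;
    do {
        i++;                      /* inc %rdx                 */
    } while (s[i - 1] != '\0');   /* cmpb %al, -1(%rsi,%rdx)  */
    return i - 1;                 /* dec %rdx                 */
}
```

Because the length depends only on the bytes of the string itself, it stays correct no matter where the next argument happens to be stored.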

This code to print a space or separator character won't work:

mov 8(%rcx), %rcx       #mov 24(%rsp) back into %rcx
mov $0x2000004, %eax
mov $0x20, %rsi         #0x20 = space
mov $2, %rdx
syscall

When using the sys_write system call, the RSI register must be a pointer to a character buffer; you can't pass an immediate value like 0x20 (space). You need to put the space or some other separator (like a newline) into a buffer, pass that buffer's address in RSI, and set RDX to the number of bytes to write (1 for a single space, not 2).
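The same rule holds for the POSIX write() wrapper: its second argument is an address, not a character value. A minimal sketch with a hypothetical helper, write_separator, that stores the separator in a one-byte buffer and passes that buffer:

```c
#include <unistd.h>

/* write() takes a pointer to the bytes to output, so place the separator
 * in a one-byte buffer and pass its address with a length of 1. */
static ssize_t write_separator(int fd, char sep)
{
    char buf[1] = { sep };
    return write(fd, buf, sizeof buf);
}
```

In assembly, the equivalent is a one-byte .ascii " " (or "\n") in a data section, loaded with lea label(%rip), %rsi.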

Revised Code

This code applies the ideas above, plus some additional cleanup, and writes each of the command line parameters (excluding the program name) to standard output, separated by newlines. The newline character on Darwin/OS X is 0x0a (\n).

# In 64-bit OSX syscall numbers = 0x2000000+(32-bit syscall #)
SYS_EXIT  = 0x2000001
SYS_WRITE = 0x2000004

STDOUT    = 1

.section __DATA, __const
newline: .ascii "\n"
newline_end: NEWLINE_LEN = newline_end-newline

.section __TEXT, __text
.globl _main
_main:
    mov (%rsp), %r8             # 0(%rsp) = # args. This code doesn't use it
                                #    Only save it to R8 as an example.
    lea 16(%rsp), %rbx          # 8(%rsp)=pointer to prog name
                                # 16(%rsp)=pointer to 1st parameter
.argloop:
    mov (%rbx), %rsi            # Get current cmd line parameter pointer
    test %rsi, %rsi
    jz .exit                    # If it's zero we are finished

    # Compute length of current cmd line parameter
    # Starting at the address in RSI (current parameter) search until
    # we find a NUL(0) terminating character.
    # rdx = length not including terminating NUL character

    xor %edx, %edx              # RDX = character index = 0
    mov %edx, %eax              # RAX = terminating character NUL(0) to look for
.strlenloop:
         inc %rdx               # advance to next character index
         cmpb %al, -1(%rsi,%rdx)# Is character at previous char index
                                #     a NUL(0) character?
         jne .strlenloop        # If it isn't a NUL(0) char then loop again
    dec %rdx                    # We don't want strlen to include NUL(0)

    # Display the cmd line argument
    # sys_write requires:
    #    rdi = output device number
    #    rsi = pointer to string (command line argument)
    #    rdx = length
    #
    mov $STDOUT, %edi
    mov $SYS_WRITE, %eax
    syscall

    # display a new line
    mov $NEWLINE_LEN, %edx
    lea newline(%rip), %rsi     # We use RIP addressing for the
                                #     string address
    mov $SYS_WRITE, %eax
    syscall

    add $8, %rbx                # Go to next cmd line argument pointer
                                #     In 64-bit pointers are 8 bytes
    # lea 8(%rbx), %rbx         # This LEA instruction can replace the
                                #     ADD since we don't care about the flags
                                #     rbx = 8 + rbx (flags unaltered)
    jmp .argloop

.exit:
    # Exit the program
    # sys_exit requires:
    #    rdi = return value
    #
    xor %edi, %edi
    mov $SYS_EXIT, %eax
    syscall
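For comparison, here is a C sketch of the same logic: skip the program name, walk argv until the NULL terminator, compute each length by scanning for NUL, then write the argument followed by a newline. The function print_args is hypothetical and uses the POSIX write() wrapper rather than raw syscalls.

```c
#include <stddef.h>
#include <unistd.h>

/* Write argv[1], argv[2], ... to fd, one per line, stopping at the NULL
 * terminator rather than consulting argc. Returns the number of arguments
 * written, or -1 if a write fails. */
static int print_args(int fd, char *const argv[])
{
    static const char newline[] = "\n";
    int count = 0;

    if (argv[0] == NULL)              /* argc == 0: nothing after argv[0] */
        return 0;

    for (char *const *p = argv + 1; *p != NULL; p++) {
        size_t len = 0;
        while ((*p)[len] != '\0')     /* strlen by scanning for NUL */
            len++;
        if (write(fd, *p, len) < 0 || write(fd, newline, 1) < 0)
            return -1;
        count++;
    }
    return count;
}
```

Called from main as print_args(STDOUT_FILENO, argv), running ./progargs one two would print one and two on separate lines, just like the assembly version.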

If you intend to use code like strlen in various places, then I recommend creating a function that performs that operation. I have hard-coded strlen into the code for simplicity. If you want to improve the efficiency of your strlen implementation, a good place to start is Agner Fog's Optimizing subroutines in assembly language.

This code should assemble and link into a static executable without the C runtime using:

gcc -e _main progargs.s -o progargs -nostartfiles -static