c assembly arguments cpu-registers calling-convention

How do registers work as arguments in assembly?

I am trying to understand how assembly works with arguments and return values.

So far, I have learnt that %eax holds the return value and that, to load a single argument, I need to load the effective address of %rip + offset into %rdi by using leaq var(%rip), %rdi .

To learn more about arguments, I created a C program that passes 10 arguments (11 including the format string) to printf to try and find out the order of registers. I then converted the C code into assembly using gcc on my Mac.

Here is the C code I used:

#include <stdio.h>

int main(){
  printf("%s %s %s %s %s %s %s %s %s %s", "1 ", "2", "3", "4", "5", "6", "7", "8", "9", "10");
  return 0;
}

And here is the assembly output:

.section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 13
  .globl  _main                   ## -- Begin function main
  .p2align  4, 0x90
_main:                                  ## @main
  .cfi_startproc
## %bb.0:
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset %rbp, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register %rbp
  pushq %r15
  pushq %r14
  pushq %rbx
  pushq %rax
  .cfi_offset %rbx, -40
  .cfi_offset %r14, -32
  .cfi_offset %r15, -24
  subq  $8, %rsp
  leaq  L_.str.10(%rip), %r10
  leaq  L_.str.9(%rip), %r11
  leaq  L_.str.8(%rip), %r14
  leaq  L_.str.7(%rip), %r15
  leaq  L_.str.6(%rip), %rbx
  leaq  L_.str(%rip), %rdi
  leaq  L_.str.1(%rip), %rsi
  leaq  L_.str.2(%rip), %rdx
  leaq  L_.str.3(%rip), %rcx
  leaq  L_.str.4(%rip), %r8
  leaq  L_.str.5(%rip), %r9
  movl  $0, %eax
  pushq %r10
  pushq %r11
  pushq %r14
  pushq %r15
  pushq %rbx
  callq _printf
  addq  $48, %rsp
  xorl  %eax, %eax
  addq  $8, %rsp
  popq  %rbx
  popq  %r14
  popq  %r15
  popq  %rbp
  retq
  .cfi_endproc
                                        ## -- End function
  .section  __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
  .asciz  "%s %s %s %s %s %s %s %s %s %s"

L_.str.1:                               ## @.str.1
  .asciz  "1 "

L_.str.2:                               ## @.str.2
  .asciz  "2"

L_.str.3:                               ## @.str.3
  .asciz  "3"

L_.str.4:                               ## @.str.4
  .asciz  "4"

L_.str.5:                               ## @.str.5
  .asciz  "5"

L_.str.6:                               ## @.str.6
  .asciz  "6"

L_.str.7:                               ## @.str.7
  .asciz  "7"


L_.str.8:                               ## @.str.8
  .asciz  "8"

L_.str.9:                               ## @.str.9
  .asciz  "9"

L_.str.10:                              ## @.str.10
  .asciz  "10"


.subsections_via_symbols

After that, I cleaned the code up, which I think removes some macOS-only settings. The code still works.

.text
  .globl  _main                   ## -- Begin function main
_main:                                  ## @main
  pushq %rbp
  movq  %rsp, %rbp
  pushq %r15
  pushq %r14
  pushq %rbx
  pushq %rax
  subq  $8, %rsp
  leaq  L_.str.10(%rip), %r10
  leaq  L_.str.9(%rip), %r11
  leaq  L_.str.8(%rip), %r14
  leaq  L_.str.7(%rip), %r15
  leaq  L_.str.6(%rip), %rbx
  leaq  L_.str(%rip), %rdi
  leaq  L_.str.1(%rip), %rsi
  leaq  L_.str.2(%rip), %rdx
  leaq  L_.str.3(%rip), %rcx
  leaq  L_.str.4(%rip), %r8
  leaq  L_.str.5(%rip), %r9
  movl  $0, %eax
  pushq %r10
  pushq %r11
  pushq %r14
  pushq %r15
  pushq %rbx
  callq _printf
  addq  $48, %rsp
  xorl  %eax, %eax
  addq  $8, %rsp
  popq  %rbx
  popq  %r14
  popq  %r15
  popq  %rbp
  retq

.data
L_.str:                                 ## @.str
  .asciz  "%s %s %s %s %s %s %s %s %s %s"

L_.str.1:                               ## @.str.1
  .asciz  "1 "

L_.str.2:                               ## @.str.2
  .asciz  "2"

L_.str.3:                               ## @.str.3
  .asciz  "3"

L_.str.4:                               ## @.str.4
  .asciz  "4"

L_.str.5:                               ## @.str.5
  .asciz  "5"

L_.str.6:                               ## @.str.6
  .asciz  "6"

L_.str.7:                               ## @.str.7
  .asciz  "7"

L_.str.8:                               ## @.str.8
  .asciz  "8"

L_.str.9:                               ## @.str.9
  .asciz  "9"

L_.str.10:                              ## @.str.10
  .asciz  "10"

I understand that at the beginning of the code the base pointer is pushed onto the stack, and the stack pointer is then copied into the base pointer for later use.

The leaq instructions then load the address of each string into a register that will be used as an argument to printf.

What I want to know is why registers r10, r11, r14 and r15 are loaded before the first argument, and why registers rsi, rdx, rcx, r8 and r9 are loaded after the first argument? Also why are r14 and r15 used instead of r12 and r13?

Also why is 8 added and subtracted from the stack pointer in this case and does it matter which order the registers are pushed and popped?

I hope all the subquestions are related to this question; if not, let me know. Also call me out on any knowledge I may be getting wrong. This is what I have learnt by converting C to assembly.

Solution

First, it looks like you compiled without optimization, so the code does some things it does not need to.

Look at the state of the registers right before the call to printf, leaving aside the values that are pushed on the stack:

rdi = format string
rsi = 1
rdx = 2
rcx = 3
r8 = 4
r9 = 5

Then 6 .. 10 are pushed on the stack in reverse order.

That should give you an idea of the calling convention. The first six parameters go through registers. The remaining parameters get passed on the stack.

What I want to know is why registers r10, r11, r14 and r15 are loaded before the first argument, and why registers rsi, rdx, rcx, r8 and r9 are loaded after the first argument?

That's just the order the compiler chose. The loads are independent of each other, so any order leaves the registers in the same state at the call.

Also why are r14 and r15 used instead of r12 and r13?

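Again, that's what the compiler chose. Note that these are just being used as scratch locations to hold the values before they are pushed. rbx, r14 and r15 are callee-saved in this calling convention, which is why main pushes them on entry and pops them before returning. If the code were optimized, it is likely fewer registers would be used.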

Also why is 8 added and subtracted from the stack pointer in this case and does it matter which order the registers are pushed and popped?

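It could just be boilerplate code the compiler generates for the function. A likely reason is stack alignment: the x86-64 System V calling convention requires the stack pointer to be 16-byte aligned at a call, and five pushed arguments (40 bytes) need 8 more bytes of padding to reach a multiple of 16. The order does matter: registers must be popped in the reverse of the order they were pushed, and the stack arguments must be pushed last to first so that printf finds them in order.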