c assembly parameters x86-64 calling-convention

x86-64 order of passing parameters in registers

I'm curious about the parameter passing procedure in the x86-64 environment and therefore I wrote a snippet of code.

//a.c
extern int shared;
int main(){
    int a=100;
    swap(&a, &shared);
}
//b.c
int shared=1;
void swap(int* a, int* b){
    *a ^= *b ^= *a ^= *b;
}

I compiled the two files with gcc -c -fno-stack-protector a.c b.c, then ran objdump -d a.o to check a.o's disassembly.

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   c7 45 fc 64 00 00 00    movl   $0x64,-0x4(%rbp)
   f:   48 8d 45 fc             lea    -0x4(%rbp),%rax
  13:   be 00 00 00 00          mov    $0x0,%esi
  18:   48 89 c7                mov    %rax,%rdi
  1b:   b8 00 00 00 00          mov    $0x0,%eax
  20:   e8 00 00 00 00          callq  25 <main+0x25>
  25:   b8 00 00 00 00          mov    $0x0,%eax
  2a:   c9                      leaveq 
  2b:   c3                      retq

My working environment is Ubuntu 16.04 x86-64, and I find it hard to understand the order in which the parameters are passed.

As I understand it, the default calling convention here is fastcall, so the parameters are passed from right to left.

I know from the x86-64 System V ABI manual that rdi and rsi are used for passing the first two parameters.

However, according to the disassembly code, rdi is responsible for var a, which is the param on the left, meaning it should be the second param.

Could someone help me point out my mistake?

Solution

Args are numbered from left to right (credit to @R. for spotting that this was your actual confusion; I thought you were talking about the order of asm instructions, and missed the last paragraph of the question.)

Looks normal to me. When the call swap instruction runs:

rdi holds a pointer to a (a local on the stack), set up by lea -0x4(%rbp),%rax and mov %rax,%rdi (instead of just an lea into rdi, because you didn't enable optimization).

rsi holds a pointer to shared, set up by mov $shared,%esi.

al holds 0 because you didn't define or prototype the function before calling it, so gcc has to treat the callee as possibly variadic, and for such calls al gives the number of vector registers used for args. (gcc should have warned you about this even without -Wall.)
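A minimal sketch of the prototype fix, assuming both files are merged into one translation unit for brevity. The name call_swap is a hypothetical stand-in for main from a.c, and the swap body uses a temporary instead of the xor expression from the question so that the example has well-defined behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype visible before the call: the compiler now knows that swap
   takes exactly two int* arguments and is not variadic. */
void swap(int *a, int *b);

int shared = 1;

/* Hypothetical stand-in for main() in a.c; returns the new value of a. */
int call_swap(void) {
    int a = 100;
    /* Args are numbered left to right: &a is arg 1 (rdi) and
       &shared is arg 2 (rsi) under the x86-64 System V ABI. */
    swap(&a, &shared);
    return a;
}

void swap(int *a, int *b) {
    assert(a != NULL && b != NULL);
    int tmp = *a;   /* plain swap through a temporary */
    *a = *b;
    *b = tmp;
}
```

With the prototype in scope, the implicit-declaration warning goes away and the caller no longer needs to set al before the call.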

The disassembly of the .o shows $shared as 0 because the file isn't linked yet: the 0 is a placeholder that the linker fills in from a relocation against the symbol shared (with offset 0). Use objdump -drwC to see the relocation entries next to the instructions. (I like -Mintel as well, instead of AT&T syntax.)

The compiler's asm output (gcc -S) is also easier to read, because there you see $shared directly instead of a number plus a symbol reference. See How to remove "noise" from GCC/clang assembly output?.

It doesn't matter what order registers are written, only their value on entry to the called function.

Same for stack args: if a compiler chooses to use mov to write args to the stack, it can do it in any order.

Only if you choose to use push do you have to go from right to left to leave the first (left-most) arg at the lowest address, as required by all the mainstream C calling conventions for args that aren't passed in registers (if any).

This right-to-left order might be why gcc -O0 (no optimization, plus anti-optimization for debugging) chooses to set registers in that order, even though it doesn't matter.
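The push rule above can be illustrated with a toy model of a downward-growing stack. This is only a sketch in portable C, not real machine state: ToyStack, toy_push, toy_push_args and toy_arg are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_SLOTS 16

/* Toy model of a downward-growing stack: sp indexes the lowest used slot. */
typedef struct {
    long slots[STACK_SLOTS];
    size_t sp;                 /* STACK_SLOTS means the stack is empty */
} ToyStack;

void toy_init(ToyStack *s) {
    s->sp = STACK_SLOTS;
}

/* Like a push instruction: move sp down one slot, then store there. */
void toy_push(ToyStack *s, long value) {
    assert(s->sp > 0);         /* no room left would be a stack overflow */
    s->sp--;
    s->slots[s->sp] = value;
}

/* Push the args right to left, as a push-based caller has to. */
void toy_push_args(ToyStack *s, const long *args, size_t n) {
    for (size_t i = n; i > 0; i--)
        toy_push(s, args[i - 1]);
}

/* Read arg number i (0 = left-most) as an offset upward from sp. */
long toy_arg(const ToyStack *s, size_t i) {
    assert(s->sp + i < STACK_SLOTS);
    return s->slots[s->sp + i];
}
```

After toy_push_args, the left-most arg sits at the lowest index (at sp itself) and later args sit at increasing offsets, which mirrors the layout the calling conventions require for stack args.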

And BTW, xor-swap is pointless even when it's implemented correctly, without the UB that your one-line version has. (See Are there sequence points in the expression a^=b^=a^=b, or is it undefined?)
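For reference, a well-defined xor-swap splits the chained expression into three separate statements. This is a sketch only; the name xor_swap is hypothetical, and the point is the sequencing rather than a recommendation to use it.

```c
#include <assert.h>
#include <stddef.h>

/* Each ^= is its own full expression, so every read and write of *a and *b
   is sequenced, unlike the single chained expression in the question. */
void xor_swap(int *a, int *b) {
    assert(a != NULL && b != NULL);
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
```

Note that if both pointers refer to the same object, the first statement xors the value with itself and the object ends up as 0.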

if(a==b) { *a = *b = 0; } else { int tmp = *a; *a=*b; *b=tmp; } is a more efficient swap that preserves the behaviour of a safe xor-swap of zeroing if both pointers are to the same object. I assume you want that? Why else would you use an xor-swap?
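Written out as a function, the one-liner above might look like the following sketch. The name swap_or_zero is hypothetical; the behaviour follows the description above: a normal swap through a temporary, plus zeroing when both pointers refer to the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Swap through a temporary, but keep the xor-swap's aliasing behaviour:
   if both pointers refer to the same object, that object becomes 0. */
void swap_or_zero(int *a, int *b) {
    assert(a != NULL && b != NULL);
    if (a == b) {
        *a = 0;          /* same object, so *b is zeroed as well */
    } else {
        int tmp = *a;
        *a = *b;
        *b = tmp;
    }
}
```

If the zeroing behaviour isn't wanted, the a == b branch can simply be dropped, since swapping an object with itself through a temporary leaves it unchanged.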

The compiler-generated asm for either will basically suck if you don't enable optimization, just like the code for main sucks for the same reason. And if you do, swap can often inline and be zero instructions, or just cost up to 3 mov instructions between registers; sometimes less. (The compiler can just change its register allocation and decide that a and b are in opposite registers now.)