I am starting with Assembly and, for testing, I've written a simple C program, compiled and disassembled it, in order to see how the parameters are passed. This is the C code:
#include <stdio.h>
#include <stdlib.h>
void calc (float*a,float*b,float*c,float*d) {
a[0]=1000;
b[0]=100.0;
c[0]=99.9;
d[0]=10000;
}
int main() {
float a[100];
float b[100];
float c[100];
float d[100];
calc(a,b,c,d);
}
And this is its disassembly:
default rel
global calc: function
global main: function
SECTION .text align=1 execute ; section number 1, code
calc: ; Function begin
push rbp ; 0000 _ 55
mov rbp, rsp ; 0001 _ 48: 89. E5
mov qword [rbp-8H], rdi ; 0004 _ 48: 89. 7D, F8
mov qword [rbp-10H], rsi ; 0008 _ 48: 89. 75, F0
mov qword [rbp-18H], rdx ; 000C _ 48: 89. 55, E8
mov qword [rbp-20H], rcx ; 0010 _ 48: 89. 4D, E0
nop ; 0054 _ 90
pop rbp ; 0055 _ 5D
ret ; 0056 _ C3
; calc End of function
main: ; Function begin
push rbp ; 0057 _ 55
mov rbp, rsp ; 0058 _ 48: 89. E5
sub rsp, 1600 ; 005B _ 48: 81. EC, 00000640
lea rcx, [rbp-640H] ; 0062 _ 48: 8D. 8D, FFFFF9C0
lea rdx, [rbp-4B0H] ; 0069 _ 48: 8D. 95, FFFFFB50
lea rsi, [rbp-320H] ; 0070 _ 48: 8D. B5, FFFFFCE0
lea rax, [rbp-190H] ; 0077 _ 48: 8D. 85, FFFFFE70
mov rdi, rax ; 007E _ 48: 89. C7
call calc ; 0081 _ E8, 00000000(rel)
mov eax, 0 ; 0086 _ B8, 00000000
leave ; 008B _ C9
ret ; 008C _ C3
; main End of function
I don't understand why the parameters on the stack are of differing size. The first one is in [rbp-8H], which is understandable, as it is a 64-bit address, but the next one is only two bytes further, in [rbp-10H] instead of [rbp-16H].
Why is that, and, most importantly, when I write an assembler procedure that takes these exact parameters, which addresses should I use from rbp?
It seems like I say this a lot, but presumably you haven't heard it yet, so it bears repeating: analyzing the disassembly of unoptimized code is largely a waste of time. When optimizations are disabled, compilers focus on compiling quickly and on keeping the generated code easy to step through in a debugger, not on producing good code.
Unoptimized code is messy, ugly, and confusing. It contains a lot of redundant instructions, doesn't look like what a human being would write, and doesn't match the code found in real-world applications (which are compiled with optimizations enabled).
When you want to analyze assembly code, turn on the optimizer.
When we do so, we see that your code compiles to:
calc(float*, float*, float*, float*):
mov DWORD PTR [rdi], 0x447a0000
mov DWORD PTR [rsi], 0x42c80000
mov DWORD PTR [rdx], 0x42c7cccd
mov DWORD PTR [rcx], 0x461c4000
ret
main:
xor eax, eax
ret
Wait, what happened? Well, the optimizer saw that main has no observable effect except returning 0 (implicitly; not even represented in your code): the call to calc only writes to local arrays that are never read afterwards, so it was eliminated. The entire function was therefore reduced to an instruction that clears the EAX register, followed by a return.
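The immediates that calc stores in the optimized listing are the single-precision bit patterns of the four constants from the C source. As a rough check, assuming float is an IEEE 754 binary32 value (as it is on x86-64), the hypothetical helper float_bits below exposes the raw encoding:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): returns the raw
 * 32-bit pattern of a float, assuming IEEE 754 binary32. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}
```

Comparing float_bits(1000.0f) with 0x447a0000, and likewise for the other three constants, matches the values in the mov instructions.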
From this, though, we can tell that a function's integer result is returned in EAX. This is true in the System V AMD64 calling convention common on Unix systems, and it's also true in the 64-bit calling convention used on Windows, and it's even true across all 32-bit x86 calling conventions that you'll find in the wild. (32-bit results are returned in EAX. 64-bit results are returned in RAX in 64-bit code; in 32-bit code they are returned in EDX:EAX, where the high-order bits are in EDX and the low-order bits are in EAX.)
We can also tell by looking at the calc function's disassembly how it receives its parameters. The first integer parameter is passed in RDI, the second in RSI, the third in RDX, and the fourth in RCX. As per the System V AMD64 calling convention, if there was a fifth parameter, it would be passed in R8, and the sixth parameter would be passed in R9.
In other words, up to the first six integer parameters are passed in registers. After that, any additional integer parameters are passed on the stack.
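To answer the practical part of the question: a hand-written procedure taking these four pointers does not read them from the stack at all; it uses RDI, RSI, RDX, and RCX directly. Here is a minimal sketch, assuming GCC or Clang targeting x86-64 Linux. The name calc_asm is hypothetical, the assembly is embedded through the compilers' top-level asm extension, and other targets get a plain C fallback so the file still builds.

```c
/* Hypothetical name: calc_asm does not appear in the original post. */
void calc_asm(float *a, float *b, float *c, float *d);

#if defined(__x86_64__) && defined(__LP64__) && defined(__linux__) && defined(__GNUC__)
/* System V AMD64: a arrives in RDI, b in RSI, c in RDX, d in RCX. */
__asm__(
    ".text\n"
    ".globl calc_asm\n"
    ".type calc_asm, @function\n"
    ".intel_syntax noprefix\n"
    "calc_asm:\n"
    "    mov dword ptr [rdi], 0x447a0000\n"   /* a[0] = 1000.0f  */
    "    mov dword ptr [rsi], 0x42c80000\n"   /* b[0] = 100.0f   */
    "    mov dword ptr [rdx], 0x42c7cccd\n"   /* c[0] = 99.9f    */
    "    mov dword ptr [rcx], 0x461c4000\n"   /* d[0] = 10000.0f */
    "    ret\n"
    ".att_syntax prefix\n"                    /* restore the compiler's default syntax */
);
#else
/* Portable fallback for targets where the assembly above does not apply. */
void calc_asm(float *a, float *b, float *c, float *d)
{
    a[0] = 1000.0f;
    b[0] = 100.0f;
    c[0] = 99.9f;
    d[0] = 10000.0f;
}
#endif
```

No prologue or stack frame is needed here, because the function never touches the stack beyond its return address.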
Floating-point parameters are passed in XMM registers (XMM0 through XMM7), to facilitate use of SSE instructions. Again, any additional floating-point parameters are passed on the stack.
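Integer and floating-point parameters are counted separately, so a function taking one pointer and one float receives the pointer in RDI and the float in XMM0; a float result also comes back in XMM0. The sketch below illustrates this under the same assumptions (GCC or Clang, x86-64 Linux, top-level asm extension); scale_asm is a hypothetical function, with a C fallback for other targets.

```c
/* Hypothetical function: multiplies *p by factor in place and returns the result. */
float scale_asm(float *p, float factor);

#if defined(__x86_64__) && defined(__LP64__) && defined(__linux__) && defined(__GNUC__)
/* System V AMD64: p arrives in RDI, factor in XMM0; the result is returned in XMM0. */
__asm__(
    ".text\n"
    ".globl scale_asm\n"
    ".type scale_asm, @function\n"
    ".intel_syntax noprefix\n"
    "scale_asm:\n"
    "    mulss xmm0, dword ptr [rdi]\n"   /* xmm0 = factor * *p    */
    "    movss dword ptr [rdi], xmm0\n"   /* store the product back */
    "    ret\n"                           /* result is already in xmm0 */
    ".att_syntax prefix\n"
);
#else
/* Portable fallback for targets where the assembly above does not apply. */
float scale_asm(float *p, float factor)
{
    *p = *p * factor;
    return *p;
}
#endif
```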
You try to make a distinction in the comments between "integer parameters" and "memory parameters", but there is no such thing as the latter. When you pass pointers (or references in C++, which compilers implement in terms of pointers), you are actually passing addresses. Since addresses are just integers, they are passed just like any other integer value—in registers.
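If you pass parameters on the stack, each one occupies 8 bytes (64 bits), and they come one after the other. On entry to the function, the return address is at RSP+0, so the first stack parameter is at an offset of 8 from the stack pointer, RSP, the second at an offset of 16, etc. If the function sets up a frame pointer with push rbp / mov rbp, rsp, the saved RBP occupies RBP+0 and the return address RBP+8, so the first stack parameter is at RBP+16 and the second at RBP+24. The negative offsets in the question's listing (rbp-8H, rbp-10H, and so on) are not stack parameters at all: they are local slots where the unoptimized code saves the four register arguments. There seemed to be a bit of confusion about them that arose from the offsets being represented in hexadecimal, where 10h is equivalent to 16 in decimal, and 18h is equivalent to 24 in decimal, so the slots are 8 bytes apart.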
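Stack parameters only appear once the six integer registers are used up. As a sketch under the same assumptions as before (GCC or Clang, x86-64 Linux, top-level asm extension), the hypothetical function sum_stack_args below takes eight integers and adds the seventh and eighth. With the usual push rbp / mov rbp, rsp prologue, the saved RBP is at [rbp], the return address at [rbp+8], and the two stack parameters at [rbp+16] and [rbp+24].

```c
/* Hypothetical function: returns a7 + a8, the two arguments passed on the stack. */
long sum_stack_args(long a1, long a2, long a3, long a4,
                    long a5, long a6, long a7, long a8);

#if defined(__x86_64__) && defined(__LP64__) && defined(__linux__) && defined(__GNUC__)
/* System V AMD64: a1..a6 arrive in RDI, RSI, RDX, RCX, R8, R9; a7 and a8 on the stack. */
__asm__(
    ".text\n"
    ".globl sum_stack_args\n"
    ".type sum_stack_args, @function\n"
    ".intel_syntax noprefix\n"
    "sum_stack_args:\n"
    "    push rbp\n"                        /* [rbp] will hold the caller's RBP */
    "    mov rbp, rsp\n"
    "    mov rax, qword ptr [rbp+16]\n"     /* a7: first stack parameter  */
    "    add rax, qword ptr [rbp+24]\n"     /* a8: second stack parameter */
    "    pop rbp\n"
    "    ret\n"                             /* integer result in RAX */
    ".att_syntax prefix\n"
);
#else
/* Portable fallback for targets where the assembly above does not apply. */
long sum_stack_args(long a1, long a2, long a3, long a4,
                    long a5, long a6, long a7, long a8)
{
    (void)a1; (void)a2; (void)a3; (void)a4; (void)a5; (void)a6;
    return a7 + a8;
}
#endif
```

Without the prologue, the same two values would be read from [rsp+8] and [rsp+16] on entry.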
This basically covers the fundamentals of the calling convention. But frankly, analyzing disassembly is not a very good way of learning calling conventions. There are a lot more details that you won't necessarily see, and you won't get the big-picture view. You really need to read the fine manual. If you hate manuals, there are more concise (and more simplified) summaries available in various places online, e.g., Wikipedia.