Position of Memory parameters on the Stack x86_64 gcc

I am starting with Assembly and, for testing, I've written a simple C program, compiled and disassembled it, in order to see how the parameters are passed. This is the C code:

#include <stdio.h>
#include <stdlib.h>

void calc (float*a,float*b,float*c,float*d) {
    a[0]=1000;
    b[0]=100.0;
    c[0]=99.9;
    d[0]=10000;
}


int main() {
    float a[100];
    float b[100];
    float c[100];
    float d[100];
    calc(a,b,c,d);
}

And this is its disassembly:

default rel

global calc: function
global main: function


SECTION .text   align=1 execute                         ; section number 1, code

calc:   ; Function begin
        push    rbp                                     ; 0000 _ 55
        mov     rbp, rsp                                ; 0001 _ 48: 89. E5
        mov     qword [rbp-8H], rdi                     ; 0004 _ 48: 89. 7D, F8
        mov     qword [rbp-10H], rsi                    ; 0008 _ 48: 89. 75, F0
        mov     qword [rbp-18H], rdx                    ; 000C _ 48: 89. 55, E8
        mov     qword [rbp-20H], rcx                    ; 0010 _ 48: 89. 4D, E0
        nop                                             ; 0054 _ 90
        pop     rbp                                     ; 0055 _ 5D
        ret                                             ; 0056 _ C3
; calc End of function

main:   ; Function begin
        push    rbp                                     ; 0057 _ 55
        mov     rbp, rsp                                ; 0058 _ 48: 89. E5
        sub     rsp, 1600                               ; 005B _ 48: 81. EC, 00000640
        lea     rcx, [rbp-640H]                         ; 0062 _ 48: 8D. 8D, FFFFF9C0
        lea     rdx, [rbp-4B0H]                         ; 0069 _ 48: 8D. 95, FFFFFB50
        lea     rsi, [rbp-320H]                         ; 0070 _ 48: 8D. B5, FFFFFCE0
        lea     rax, [rbp-190H]                         ; 0077 _ 48: 8D. 85, FFFFFE70
        mov     rdi, rax                                ; 007E _ 48: 89. C7
        call    calc                                    ; 0081 _ E8, 00000000(rel)
        mov     eax, 0                                  ; 0086 _ B8, 00000000
        leave                                           ; 008B _ C9
        ret                                             ; 008C _ C3
; main End of function

I don't understand why the parameters on the stack are of differing size. The first one is in [rbp-8H], which is understandable, as it is a 64-bit address, but the next one is only two bytes further, in [rbp-10H] instead of [rbp-16H].
Why is that, and, most importantly, when I write an assembler procedure that takes these exact parameters, which addresses should I use from rbp?

Solution

It seems like I say this a lot, but presumably you haven't heard it yet, so it bears repeating: analyzing the disassembly of unoptimized code is largely a waste of time. When optimizations are disabled, compilers are focused on two things:

1. Generating code as quickly as possible, so you get the fastest possible compiles, and
2. Making it easy for you to debug the code (for example, by ensuring that you can set a breakpoint on each high-level language statement, and by not reordering instructions, so that you can step through the code).

Unoptimized code is messy, ugly, and confusing. It contains a lot of redundant instructions, doesn't look like what a human being would write, and doesn't match the code found in real-world applications (which are compiled with optimizations enabled).

When you want to analyze assembly code, turn on the optimizer.

When we do so, we see that your code compiles to:

calc(float*, float*, float*, float*):
    mov     DWORD PTR [rdi], 0x447a0000
    mov     DWORD PTR [rsi], 0x42c80000
    mov     DWORD PTR [rdx], 0x42c7cccd
    mov     DWORD PTR [rcx], 0x461c4000
    ret

main:
    xor     eax, eax
    ret

Wait, what happened? Well, the optimizer saw that main doesn't do anything observable except return 0 (implicitly; not even represented in your code), because the arrays that calc fills in are never read. So it reduced the entire function to a single instruction that clears the EAX register, followed by a return.

From this, though, we can tell that a function's integer result is returned in EAX. This is true in the System V AMD64 calling convention common on Unix systems, and it's also true in the 64-bit calling convention used on Windows, and it's even true across all 32-bit x86 calling conventions that you'll find in the wild. (32-bit results are returned in EAX. In 64-bit code, 64-bit results are returned in RAX, of which EAX is the low half. In 32-bit code, 64-bit results are returned in EDX:EAX, where the high-order bits are in EDX and the low-order bits are in EAX.)

We can also tell by looking at the calc function's disassembly how it receives its parameters. The first integer parameter is passed in RDI, the second in RSI, the third in RDX, and the fourth in RCX. As per the System V AMD64 calling convention, if there were a fifth parameter, it would be passed in R8, and a sixth parameter would be passed in R9.

In other words, up to the first six integer parameters are passed in registers. After that, any additional integer parameters are passed on the stack.

Floating-point parameters are passed in XMM registers (XMM0 through XMM7), to facilitate use of SSE instructions. Again, any additional floating-point parameters are passed on the stack.
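The register assignment described above can be sketched as a small C routine. This is only a simplified model, not the ABI's full classification algorithm: it assumes every argument is either a plain integer/pointer or a plain float/double, and it ignores structs, long double, and variadic functions. The names arg_class, assign_sysv_args, INT_REGS, SSE_REGS, and STACK are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified argument classes (an assumption of this sketch):
   integers and pointers are ARG_INTEGER, float and double are ARG_SSE. */
typedef enum { ARG_INTEGER, ARG_SSE } arg_class;

/* Registers used for arguments in the System V AMD64 convention, in order. */
static const char *const INT_REGS[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const SSE_REGS[8] = { "xmm0", "xmm1", "xmm2", "xmm3",
                                         "xmm4", "xmm5", "xmm6", "xmm7" };
static const char STACK[] = "stack";

/* For each argument i, store in out[i] the name of the register that
   carries it, or "stack" once the registers of its class are used up.
   Integer and floating-point arguments are counted independently. */
void assign_sysv_args(const arg_class *classes, size_t count, const char **out)
{
    size_t next_int = 0;
    size_t next_sse = 0;

    assert(count == 0 || (classes != NULL && out != NULL));

    for (size_t i = 0; i < count; i++) {
        if (classes[i] == ARG_INTEGER && next_int < 6) {
            out[i] = INT_REGS[next_int++];
        } else if (classes[i] == ARG_SSE && next_sse < 8) {
            out[i] = SSE_REGS[next_sse++];
        } else {
            out[i] = STACK;
        }
    }
}
```

For the calc function from the question, all four arguments are pointers, so passing four ARG_INTEGER entries yields rdi, rsi, rdx, and rcx, which matches the optimized listing.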

You try to make a distinction in the comments between "integer parameters" and "memory parameters", but there is no such thing as the latter. When you pass pointers (or references in C++, which compilers implement in terms of pointers), you are actually passing addresses. Since addresses are just integers, they are passed just like any other integer value—in registers.
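One more detail in the optimized calc listing: the constants stored through those pointers (0x447a0000 and so on) are the raw bit patterns of the float values from your C source. Here is a minimal sketch for checking that, assuming float is a 32-bit IEEE-754 value, as it is with gcc on x86_64. The helper name float_bits is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float.
   Assumes float is IEEE-754 binary32 (true for gcc on x86_64). */
uint32_t float_bits(float value)
{
    uint32_t bits;

    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);  /* well-defined way to reinterpret bytes */
    return bits;
}
```

Calling float_bits on 1000.0f, 100.0f, 99.9f, and 10000.0f gives 0x447a0000, 0x42c80000, 0x42c7cccd, and 0x461c4000, in the same order as the four mov instructions in calc.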

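If you pass parameters on the stack, each integer or pointer parameter occupies an 8-byte (64-bit) slot, and the slots come one after the other. On entry to the function, the first one is at an offset of 8 from the stack pointer, RSP; the second one is at an offset of 16, etc. (Why does the first parameter start at an offset of 8? Because the first position, RSP+0, is occupied by the return address.) If the function sets up a frame pointer with push rbp and mov rbp, rsp, as the unoptimized code in the question does, then the saved RBP sits at RBP+0 and the return address at RBP+8, so the first stack parameter is at RBP+16 and the second at RBP+24. Note that the negative offsets in the question, [rbp-8H] through [rbp-20H], are not incoming stack parameters at all: they are local slots into which the unoptimized code copies the four register arguments. There also seemed to be a bit of confusion arising from the offsets being represented in hexadecimal, where 10h is equivalent to 16 in decimal, and 18h is equivalent to 24 in decimal, so those slots are each 8 bytes apart.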
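The offset arithmetic for stack-passed parameters can be written out as a rough sketch. It assumes simple integer or pointer arguments that each take one 8-byte slot, and a function that either has just been entered (offsets from RSP) or has executed the usual push rbp / mov rbp, rsp prologue (offsets from RBP). The function names are made up for this illustration, and the index counts only the parameters that did not fit in registers, starting at 0.

```c
#include <assert.h>
#include <stddef.h>

enum { SLOT_SIZE = 8 };  /* each integer or pointer argument takes one 8-byte slot */

/* Offset from RSP at function entry, before any push.
   [rsp+0] holds the return address, so the first stack argument is at +8. */
size_t stack_arg_offset_from_rsp_at_entry(size_t index)
{
    return SLOT_SIZE + index * SLOT_SIZE;
}

/* Offset from RBP after "push rbp; mov rbp, rsp".
   [rbp+0] holds the saved RBP and [rbp+8] the return address,
   so the first stack argument is at +16. */
size_t stack_arg_offset_from_rbp(size_t index)
{
    return 2 * SLOT_SIZE + index * SLOT_SIZE;
}

/* The hexadecimal offsets in the disassembly are 8 bytes apart. */
void check_hex_offsets(void)
{
    assert(0x10 == 16);
    assert(0x18 == 24);
    assert(0x20 == 32);
    assert(0x10 - 0x8 == SLOT_SIZE);
}
```

For a function with eight integer parameters, the seventh and eighth are the stack-passed ones (index 0 and 1), so with a frame pointer they would be read from [rbp+16] and [rbp+24].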

This basically covers the fundamentals of the calling convention. But frankly, analyzing disassembly is not a very good way of learning calling conventions. There are a lot more details that you won't necessarily see, and you won't get the big-picture view. You really need to read the fine manual. If you hate manuals, there are more concise (and more simplified) summaries available in various places online, e.g., Wikipedia.