Is there a standard for saving caller saved registers from within variadic functions?

I'm exploring variadic functions, particularly the assembly they get compiled to. Let's imagine we have a variadic function which calls another function from within it. With reference to the example file below,

#include <stdio.h>

void foo(int arg1, int arg2) {}

void sum(int num_args, ...) {
    foo(69, 420); 
}

int main(int argc, const char **argv) { sum(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); }

I then compiled this with the gcc -S flag and saw that the argument registers %rdi through %r9 (values 1 through 6 passed to sum) were stored to the stack at specific offsets (%rbp - 180, 168, 160, 152, 144, 136, respectively).

My questions are:

Why does rdi get 12 bytes when it's at most 8 bytes? Or is that an incorrect assumption?
Why these arbitrary addresses instead of simple push operations? Is this something I can know a priori without having to tinker/dig into the gcc source?

Thanks

For context, I'm trying to implement an interface for variadic argument management similar to stdarg.h to explore what's going on under the hood and practice writing x86. Not being able to know where the compiler stores the variadic caller's arguments before making calls to functions such as va_start etc. makes it quite difficult to achieve. Ideally I'd want a portable way to know where the arguments are being stored (well, as portable as an x86 routine can be)

Solution

No, there's no "standard" location in the stack frame for a callee to dump its incoming register args.

va_start isn't a function, it's a compiler builtin (or a macro that uses one). It's what gets the compiler to emit the instructions that dump the register args to an array it can index later. (And presumably that array is aligned; if you're talking about 12 bytes of space in the stack frame for RDI, that's probably just 4 bytes of padding for alignment.)

(x86-64 System V varargs is pretty complicated because the callee can't dump the register args contiguous with the stack args, and next-integer vs. next-xmm need separate counters / pointers. Windows x64, by contrast, makes variadic functions very easy with shadow space, at the cost of some efficiency for the more common non-variadic case.)
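To make the "separate counters" point concrete, here is a minimal sketch of a variadic function that pulls both int and double arguments through the standard stdarg.h macros. The function name sum_mixed and the "kinds" string convention are made up for illustration. On x86-64 System V, each va_arg(ap, int) would typically advance the integer-register counter and each va_arg(ap, double) the XMM counter, but that bookkeeping is entirely the compiler's business; the source code stays the same on any target.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: 'kinds' holds one character per variadic argument,
   'i' for an int and 'd' for a double. Returns the sum as a double. */
double sum_mixed(const char *kinds, ...) {
    assert(kinds != NULL);

    va_list ap;
    va_start(ap, kinds);          /* expands to a compiler builtin */

    double total = 0.0;
    for (const char *k = kinds; *k != '\0'; k++) {
        if (*k == 'i') {
            total += va_arg(ap, int);      /* integer-class argument */
        } else if (*k == 'd') {
            total += va_arg(ap, double);   /* floating-point-class argument */
        } else {
            break;                         /* unknown kind: stop reading */
        }
    }

    va_end(ap);
    return total;
}
```

A call such as sum_mixed("idid", 1, 2.5, 3, 4.0) interleaves the two argument classes, which is exactly the case a single "next argument" pointer could not handle on this ABI.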

As for why so far below RBP, the asm output for your example (https://godbolt.org/z/dsqeqaE9h) makes it fairly clear: the space above that is used for dumping XMM0..7 if there are any XMM args (AL!=0). Dumping R9 at rbp-136 is contiguous with XMM0 at rbp-128.
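The offsets can be reproduced with a little arithmetic. The sketch below assumes the register save area layout from the x86-64 System V ABI (six 8-byte integer slots in the order RDI, RSI, RDX, RCX, R8, R9, followed by eight 16-byte XMM slots) and assumes the area starts at rbp-176, which is inferred from the offsets in this particular example, not guaranteed by anything. The constant and function names are invented for illustration.

```c
#include <assert.h>

enum {
    GP_REG_COUNT   = 6,    /* RDI, RSI, RDX, RCX, R8, R9 */
    GP_SLOT_SIZE   = 8,
    FP_REG_COUNT   = 8,    /* XMM0..XMM7 */
    FP_SLOT_SIZE   = 16,
    FP_AREA_START  = GP_REG_COUNT * GP_SLOT_SIZE,                   /* 48 */
    SAVE_AREA_SIZE = FP_AREA_START + FP_REG_COUNT * FP_SLOT_SIZE,   /* 176 */
    /* Assumption: where this one compilation happened to place the area.
       Other locals or another compiler version could move it. */
    SAVE_AREA_BASE = -176
};

/* rbp-relative offset of the slot for the index-th integer register. */
int gp_slot_rbp_offset(int index) {
    assert(index >= 0 && index < GP_REG_COUNT);
    return SAVE_AREA_BASE + index * GP_SLOT_SIZE;
}

/* rbp-relative offset of the slot for XMM<index>. */
int fp_slot_rbp_offset(int index) {
    assert(index >= 0 && index < FP_REG_COUNT);
    return SAVE_AREA_BASE + FP_AREA_START + index * FP_SLOT_SIZE;
}
```

Under these assumptions gp_slot_rbp_offset(1) gives -168 for RSI, gp_slot_rbp_offset(5) gives -136 for R9, and fp_slot_rbp_offset(0) gives -128 for XMM0, matching the stores in the compiler output.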

Why not push? See "What C/C++ compiler can use push pop instructions for creating local variables, instead of just increasing esp once?": it's somewhat of a missed optimization. Using push could save some code size in this case, but the function would still need a sub from RSP to make space for the XMM args. If your function had other local variables, the offsets for the incoming args could be different.

"I'm trying to implement an interface for variadic argument management similar to stdarg.h"

You can't write your own 3rd party vararg stuff without using compiler builtins and interacting with compiler internals. Just using inline asm won't work safely/correctly, especially with optimization enabled. (Notice that with -O1, no incoming args are dumped to the stack because the callee doesn't use them.)
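If the goal is mainly to experiment with your own argument-management interface, one option that stays within what the compiler supports is to layer it on top of va_list instead of replacing it. The following is only a sketch under that assumption; the arg_cursor type and the cursor_next_int and sum_with_cursor functions are hypothetical names, not part of any library. Note that va_start still has to appear inside the variadic function itself.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical wrapper: a va_list plus a count of arguments left to read. */
typedef struct {
    va_list ap;
    int remaining;
} arg_cursor;

/* Fetch the next int through the cursor; the caller must not over-read. */
int cursor_next_int(arg_cursor *cur) {
    assert(cur->remaining > 0);
    cur->remaining--;
    return va_arg(cur->ap, int);
}

/* Sums 'count' int arguments using the cursor interface above. */
int sum_with_cursor(int count, ...) {
    assert(count >= 0);

    arg_cursor cur;
    cur.remaining = count;
    va_start(cur.ap, count);      /* must be done in the variadic function */

    int total = 0;
    while (cur.remaining > 0) {
        total += cursor_next_int(&cur);
    }

    va_end(cur.ap);
    return total;
}
```

Because all register and stack handling is left to va_start and va_arg, this keeps working with optimization enabled, which hand-written inline asm that guesses at stack offsets would not.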