What calling convention should I use to make things portable?

I am writing a C interface for CPU's cpuid instruction. I'm just doing this as kind of an exercise: I don't want to use compiler-depended headers such as cpuid.h for GCC or intrin.h for MSVC. Also, I'm aware that using C inline assembly would be a better choice, since it avoids thinking about calling conventions (see this implementation): I'd just have to think about different compiler's syntaxes. However I'd like to start practicing a bit with integrating assembly and C.

Given that I now have to write a different assembly implementation for each major assembler (I was thinking of GAS, MASM and NASM) and for each of them both for x86-64 and x86, how should I handle the fact that different machines and C compilers may use different calling conventions?

Solution

If you really want to write, as just an exercise, an assembly function that "conforms" to all the common calling conventions for x86_64 (I know only the Windows one and the System V one), without relying on attributes or compiler flags to force the calling convention, let's take a look at what's common.

The Windows GPR passing order is rcx, rdx, r8, r9. The System V passing order is rdi, rsi, rdx, rcx, r8, r9. In both cases, rax holds the return value if it fits and is a piece of POD. Technically speaking, you can get away with a "polyglot" called function if it (0) saves the union of what each ABI considers non-volatile, and (1) returns something that can fit in a single register, and (2) takes no more than 2 GPR arguments, because overlap would happen past that. To be absolutely generic, you could make it take a single pointer to some structure that would hold whatever arbitrary return data you want.

So now our arguments will come through either rcx and rdx or rdi and rsi. How do you tell which will contain the arguments? I'm actually not sure of a good way. Maybe what you could do instead is have a wrapper that puts the arguments in the right spot, and have your actual function take "padding" arguments, so that your arguments always land in rcx and rdx. You could technically expand to r8 and r9 this way.

#ifdef _WIN32
#define CPUID(information) cpuid(information, NULL, NULL, NULL)
#else
#define CPUID(information) cpuid(NULL, NULL, NULL, information)
#endif

// d duplicates a
// c duplicates b
no_more_than_64_bits_t cpuid(void * a, void * b, void * c, void * d);

Then, in your assembly, save the union of what each ABI considers non-volatile, do your thing, put whatever information you want in the structure to which rcx points, and restore.