When coding a function, I usually recall the "Clean Code" principle:
A function shouldn’t have more than 3 arguments.
However, given the x86-64 calling conventions below, I've relaxed it to 4 arguments. That keeps cross-platform functions using CPU registers for their arguments on both platforms, instead of stack operations, which are slower than register access.
System V AMD64 ABI
Integer/Pointer Arguments 1-6: RDI, RSI, RDX, RCX, R8, R9
Floating Point Arguments 1-8: XMM0 - XMM7
Excess Arguments: Stack
Microsoft x64 calling convention
Integer/Pointer Arguments 1-4: RCX, RDX, R8, R9
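Floating Point Arguments 1-4: XMM0 - XMM3 (the four slots are positional and shared, so argument N goes in either the Nth integer register or XMM(N-1), not both)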
Excess Arguments: Stack
Example
Limiting the number of arguments to 4 ensures that both Windows and Linux pass scalar (integer, pointer, or floating-point) arguments in registers instead of on the stack.
#include <stdio.h>

/* Four int arguments: all passed in registers under both conventions. */
int Add(int a, int b, int c, int d) { return a + b + c + d; }

int main(void) {
    int sum = Add(1, 2, 3, 4);
    return 0;
}
Linux Disassembly
mov ecx, 4    ; d -> RCX (4th integer register)
mov edx, 3    ; c -> RDX
mov esi, 2    ; b -> RSI
mov edi, 1    ; a -> RDI
call Add
Windows Disassembly
mov r9d, 4    ; d -> R9 (4th integer register)
mov r8d, 3    ; c -> R8
mov edx, 2    ; b -> RDX
mov ecx, 1    ; a -> RCX
call Add
Question
When writing cross-platform functions, is this micro-optimization a viable new mantra?
A function shouldn’t have more than 4 arguments.
Note
The specific use case is assembling code with UASM (which understands MASM syntax) on Windows and Linux.
With compilers (MSVC, GCC), the optimizer takes care of performance; assemblers (MASM, NASM, UASM) have no such tool.
The 4-argument mantra was chosen for performance (not coding style), so that the generated code starts from an optimized state.
The "clean code" principle you're talking about is most likely about exactly that: code tidiness and readability. Many parameters usually mean long lines and/or parameter lists broken across multiple lines (just look at Win32 functions with 10 arguments).
However, since you seem to be asking from a strictly performance perspective, the number of parameters a function has matters far less than the number of times it is called. Rather than reducing the number of arguments, try to reduce the number of calls. A good way to do this is to write reasonably short functions that the compiler can inline.
Let's suppose that instead of 4 numbers, my code needs to add a batch of 4 and a batch of 5. With a 4-parameter function, I will need to do this:
#include <stdio.h>

int Add(int a, int b, int c, int d) { return a + b + c + d; }

int main(void) {
    int sum1 = Add(1, 2, 3, 4);
    int sum2 = Add(1, 2, 3, Add(4, 5, 0, 0)); /* batch of 5 needs a nested call */
    return 0;
}
Now let's say I use a 5-parameter function:
#include <stdio.h>

int Add(int a, int b, int c, int d, int e) { return a + b + c + d + e; }

int main(void) {
    int sum1 = Add(1, 2, 3, 4, 0); /* one call per sum */
    int sum2 = Add(1, 2, 3, 4, 5);
    return 0;
}
As you can see, the second version uses only 2 calls (because I need 2 different sums), while the first version uses 3. So you're trading 4 mov stack accesses in the 5-arg version (under the Microsoft convention; System V still passes the fifth integer argument in R8) for a whole extra call/ret pair, which, mind you, also makes 2 stack accesses for the return address (while possibly also introducing additional overhead in the form of shadow stack accesses, canary values placed in the new function's stack frame, etc.).
Of course, this example uses an Add function, which the compiler would inline. (Actually, it doesn't even do that: with optimizations enabled, the compiler detects that the sums are never used and that Add has no side effects, so it generates an empty main.) But with a realistic, more complex function, it will almost always be better to call it fewer times.
This is, however, just the tip of the iceberg when it comes to performance. If you are worried about performance, get a profiler and find the bottlenecks in your code. If your code is slow because of a badly written loop, optimize only that loop by hand. Leave most of the optimization to the compiler, because it knows what it's doing and has been refined for decades.
Your requirements also seem somewhat contradictory: you are writing cross-platform functions, yet the specific use case is assembling code with UASM?