I've been reading about assembly functions and I'm confused as to whether to use the enter and exit or just the call/return instructions for fast execution. Is one way fast and the other smaller? For example what is the fastest (stdcall) way to do this in assembly without inlining the function:
static Int32 Add(Int32 a, Int32 b) {
return a + b;
}
int main() {
Int32 i = Add(1, 3);
}
Use call
/ ret
, without making a stack frame with either enter
/ leave
or push&pop rbp / mov rbp, rsp
. gcc (with the default -fomit-frame-pointer
) only makes a stack frame in functions that do variable-size allocation on the stack. This may make debugging slightly more difficult, since gcc normally emits stack unwind info when compiling with -fomit-frame-pointer
, but your hand-written asm won't have that. Normally it only makes sense to write leaf functions in asm, or at least ones that don't call many other functions.
Stack frames mean you don't have to keep track of how much the stack pointer has changed since function entry to access stuff on the stack (e.g. function args and spill slots for locals). Both Windows and Linux/Unix 64bit ABIs pass the first few args in registers, and there are often enough regs that you don't have to spill any variables to the stack. Stack frames are a waste of instructions in most cases. In 32bit code, having ebp
available (going from 6 to 7 GP regs, not counting the stack pointer) makes a bigger difference than going from 14 to 15. Of course, you still have to push/pop
rbp if you do use it, though, because in both ABIs it's a callee-saved register that functions aren't allowed to clobber.
If you're optimizing x86-64 asm, you should read Agner Fog's guides, and check out some of the other links in the x86 tag wiki.
The best implementation of your function is probably:
align 16
global Add
Add:
lea eax, [rdi + rsi]
ret
; the high 32 of either reg doesn't affect the low32 of the result
; so we don't need to zero-extend or use a 32bit address-size prefix
; like lea eax, [edi, esi]
; even if we're called with non-zeroed upper32 in rdi/rsi.
align 16
global main
main:
mov edi, 1 ; 1st arg in SysV ABI
mov esi, 3 ; 2nd arg in SysV ABI
call Add
; return value in eax in all ABIs
ret
align 16
OPmain: ; This is what you get if you don't return anything from main to use the result of Add
xor eax, eax
ret
This is in fact what gcc emits for Add()
, but it still turns main into an empty function, or into a return 4
if you return i
. clang 3.7 respects -fno-inline-functions
even when the result is a compile-time constant. It beats my asm by doing tail-call optimization, and jmp
ing to Add
.
Note that the Windows 64bit ABI uses different registers for function args. See the links in the x86 tag wiki, or Agner Fog's ABI guide. Assembler macros may help for writing functions in asm that use the correct registers for their args, depending on the platform you're targeting.