I need to integrate some C code with an existing project written in assembly. That project uses a number of registers for its internal purposes, so I do not want the C code to overwrite them.
Can I specify which registers GCC can / cannot use? Or should I rather save the registers before calling C code and then restore them?
Also, what other caveats should I be aware of?
Normally the standard calling convention is pretty reasonable: it specifies some registers as call-clobbered and some as call-preserved. Use call-preserved registers for values that you want to survive function calls. For example, see the function-calling convention part of "What are the calling conventions for UNIX & Linux system calls on i386 and x86-64".
The standard, but less descriptive, terms are "caller-saved" vs. "callee-saved" (confusing, because it's normal for nobody to save a call-clobbered register and just let the value die if you don't need it), or "volatile" vs. "non-volatile" (somewhat bogus, because volatile already has an unrelated, specific technical meaning in C).
I like call-preserved vs. call-clobbered because it describes both kinds of registers from the perspective of the current function using them.
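To make the terminology concrete, here is a small C sketch. The names helper and sum_with_calls are hypothetical, and noinline is a GCC/Clang attribute used only so the call is not inlined away. The running total has to survive every call to helper(), so the compiler must keep it somewhere that is call-preserved; exactly where is the compiler's choice, so the comments describe typical x86-64 SysV code-gen, not a guarantee.

```c
#include <assert.h>

/* Hypothetical callee. noinline (GCC/Clang) keeps the call from being
   inlined, so sum_with_calls() really has to deal with a call boundary. */
__attribute__((noinline)) static int helper(int x) {
    return x * 3;
}

/* Returns helper(0) + helper(1) + ... + helper(n - 1). */
int sum_with_calls(int n) {
    assert(n >= 0);
    /* total, i and n are live across each call. helper() may clobber any
       call-clobbered register (RAX, RCX, RDX, RSI, RDI, R8-R11 on x86-64
       SysV), so the compiler typically keeps them in call-preserved
       registers such as RBX or RBP, saving/restoring those itself in
       sum_with_calls()'s own prologue/epilogue. */
    int total = 0;
    for (int i = 0; i < n; i++)
        total += helper(i);
    return total;
}
```

With this definition, sum_with_calls(4) returns 0 + 3 + 6 + 9 = 18.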
You can use whatever custom calling convention you want for calls between hand-written asm functions, documenting the convention in comments on a per-function basis. It's normally a good idea to use the standard calling convention for your platform as much as possible, only customizing when there's a speedup to be had. Most are fairly well-designed and strike a good balance between performance and code-size, passing args efficiently and so on.
One exception to the rule is that the i386 32-bit calling convention (used on Linux) sucks: it passes all args on the stack, not in registers. You can customize the calling convention x86 gcc will use with options like -mregparm=2 -msseregparm. On 32-bit x86, -mregparm=2 passes the first 2 integer args in eax and edx, and -msseregparm passes float and double args (and return values) in SSE registers. Every caller and callee, including any libraries you link against, has to agree on the convention. 32-bit Windows often uses register-arg calling conventions like this, e.g. __fastcall or __vectorcall. If you're on x86, see Agner Fog's calling convention guide (and his other x86 asm optimization guides).
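The command-line option changes the convention for every function in a translation unit. GCC (and Clang) also have a per-function regparm attribute on 32-bit x86 that does the same thing for a single function, which can be handy for one entry point your asm calls. A minimal sketch: clamp_index is a hypothetical example function, and the REGPARM2 macro is only there so the file still compiles on targets where the attribute does not apply.

```c
#include <assert.h>

/* On 32-bit x86, regparm(2) makes GCC pass the first two integer args in
   EAX and EDX instead of on the stack. Elsewhere the macro expands to
   nothing and the target's normal convention is used. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM2 __attribute__((regparm(2)))
#else
#define REGPARM2
#endif

/* Hypothetical function: clamps i into [0, len - 1]. With REGPARM2 on i386,
   an asm caller puts i in EAX and len in EDX; the result still comes back
   in EAX as usual. */
REGPARM2 int clamp_index(int i, int len) {
    assert(len > 0);
    if (i < 0)
        return 0;
    if (i >= len)
        return len - 1;
    return i;
}
```

Every C caller must see this same declaration (e.g. from a shared header). A caller that sees a plain prototype will pass the args on the stack, and the callee will read whatever happens to be in EAX/EDX.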
GCC does have some code-gen options that modify the calling convention registers.
You can tell gcc that it must not touch a register at all with -ffixed-reg, e.g. -ffixed-rbx (so it will still have your value in an interrupt or signal handler, for example).
Or you can tell gcc that a register is call-preserved (-fcall-saved-reg), so it can use it as long as it saves/restores it. This is probably what you want if you just want gcc to put things back when it's done, without limiting its ability to free up registers in cases where having an extra register is worth a save/restore. (If that C code calls back into your asm, it will expect your asm functions to follow the same calling convention you've told it about.)
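The callback caveat is easy to miss, so here is a sketch of where it bites. apply_all and twice are hypothetical names: in the real project cb would point at one of your hand-written asm functions, and twice is just a C stand-in so the example runs. The comments assume this file is built with -fcall-saved-rdx on x86-64.

```c
#include <assert.h>

/* Hypothetical callback type; in the scenario above it would point at asm. */
typedef int (*callback_fn)(int);

/* Returns cb(v[0]) + ... + cb(v[n - 1]).
   If this file is compiled with -fcall-saved-rdx, GCC treats RDX as
   call-preserved here, so it may keep values such as acc or i in RDX across
   the cb() call. An asm callback must then preserve RDX as well, on top of
   the normal SysV call-preserved set (RBX, RBP, R12-R15). */
int apply_all(callback_fn cb, const int *v, int n) {
    assert(cb != 0 && n >= 0);
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += cb(v[i]);
    return acc;
}

/* C stand-in for an asm callback. If compiled with the same
   -fcall-saved-rdx option, GCC would save/restore RDX here automatically
   whenever it used it. */
int twice(int x) {
    return 2 * x;
}
```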
Interestingly, -fcall-saved-reg seems to work even for arg-passing registers, so you could make multiple function calls without reloading those registers.
And finally, -fcall-used-reg tells the compiler it's free to clobber a register.
Note that it's an error to use -fcall-saved on a return-value register, or -fcall-used on the stack or frame pointer, but gcc may silently do silly things instead of warning!
As the GCC manual puts it for -fcall-used-reg: "It is an error to use this flag with the frame pointer or stack pointer. Use of this flag for other registers that have fixed pervasive roles in the machine's execution model produces disastrous results."
So these advanced options may not protect you from yourself if you use them in a silly way; wear your safety goggles + hard hat. You have been warned.
Example (I used x86-64, but the options should work the same way on any other architecture):
```c
// tempt the compiler into using lots of registers
// to keep values across loop iterations.
// (note: p[0] is read even when len <= 0, so p must point to at least one int)
int foo(int a, int *p, int len) {
    int t1 = a * 2, t2 = a - 1, t3 = a >> 3;
    int max = p[0];
    for (int i = 0; i < len; i++) {
        p[i] *= t1;
        p[i] |= t2;
        p[i] ^= t3;
        max = (p[i] < max) ? max : p[i];
    }
    return max;
}
```
Compiled on Godbolt for x86-64 with gcc 6.3 -O3 -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-rsi -fno-tree-vectorize, this gives:
```asm
foo:    # args in the x86-64 SysV convention: int edi, int *rsi, int edx
        lea     r9d, [rdi+rdi]
        lea     r10d, [rdi-1]
        mov     eax, DWORD PTR [rsi]
        sar     edi, 3
        test    edx, edx            # skip the loop entirely if len <= 0
        jle     .L10
        push    rsi                 # save RSI, normally call-clobbered
        lea     r8d, [rdx-1]
        push    rdx                 # and RDX
        lea     r11, [rsi+4+r8*4]
.L3:
        mov     r8d, DWORD PTR [rsi]
        imul    r8d, r9d            # and uses temporaries that need a REX prefix
        or      r8d, r10d
        xor     r8d, edi
        cmp     eax, r8d
        mov     DWORD PTR [rsi], r8d
        cmovl   eax, r8d
        add     rsi, 4              # pointer-increment of RSI as the loop counter
        cmp     r11, rsi
        jne     .L3
        pop     rdx                 # and restore RDX + RSI
        pop     rsi
.L10:
        ret
```
Note the use of r8-r11 as temporaries. These registers require a REX prefix to access, adding 1 byte of code size unless the instruction already needed a REX prefix for 64-bit operand size. So gcc prefers using the low 8 registers (eax..edi) for scratch regs, only using r8d if it would otherwise have to save/restore rbx or rbp.
The code-gen is basically the same without the -fcall-saved-reg options, but with a different choice of registers and no push/pop.