I'm using GCC 4.8.1 to compile C code and I need to detect whether a subtraction underflows on x86-64. Both operands are UNSIGNED. I know this is very easy in assembly, but I'm wondering whether I can write it in C and have GCC optimize it properly, because I can't find a way. This is a heavily used, low-level function, so I need it to be efficient, but GCC doesn't seem to recognize this simple operation. I have tried many ways of hinting at it in C, but it always uses two registers instead of just a sub and a conditional jump. And to be honest, it annoys me to see such wasteful code emitted so MANY times (the function is called a lot).
My best approach in C seemed to be the following:
if((a-=b)+b < b) {
// underflow here
}
Basically, subtract b from a, and if the result underflows, detect it and do some conditional processing (which is unrelated to a's value; for example, it raises an error).
GCC seems unable to reduce the above to just a sub and a conditional jump. Believe me, I have tried many ways of writing it in C and a lot of command-line options (-O3 and -Os included, of course). What GCC generates is something like this (Intel-syntax assembly):
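To spell out what that condition actually tests: after a -= b, adding b back wraps around to the original a, so the comparison boils down to "original a < b". Here is a minimal sketch of that equivalence. The helper names are made up for illustration, and 64-bit operands are assumed to match the registers shown in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the check written as in the snippet above.
   Subtracts b from *a in place and returns nonzero on underflow. */
static int sub_check_roundtrip(uint64_t *a, uint64_t b)
{
    return ((*a -= b) + b) < b;   /* (*a - b) + b wraps back to the old *a */
}

/* Hypothetical helper: the same check, comparing before subtracting. */
static int sub_check_direct(uint64_t *a, uint64_t b)
{
    int underflow = *a < b;
    *a -= b;                      /* unsigned wraparound is well defined in C */
    return underflow;
}

/* Small self-check: both forms agree on the flag and on the result. */
static void sub_check_demo(void)
{
    uint64_t x = 5, y = 5;
    assert(sub_check_roundtrip(&x, 7) && sub_check_direct(&y, 7));
    assert(x == y);               /* both hold the wrapped difference */
}
```

Both forms leave the wrapped difference in a, which doesn't matter here since a's value isn't needed on the underflow path.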
mov rax, rcx ; 'a' is in rcx
sub rcx, rdx ; 'b' is in rdx
cmp rax, rdx ; useless comparison since sub already sets flags
jc underflow
Needless to say, the above is wasteful when all it needs is this:
sub rcx, rdx
jc underflow
This is so annoying because GCC does understand that sub sets the flags: if I cast the result to "int", it generates exactly the two-instruction code above, except that it uses "js" (jump if sign) instead of "jc" (jump if carry). That does not work when the difference between the unsigned values is large enough to have the high bit set. Nevertheless, it shows that GCC is aware of the sub instruction affecting those flags.
Now, maybe I should give up on making GCC optimize this properly and do it with inline assembly, which I have no problem with. Unfortunately, this requires "asm goto" because I need a conditional JUMP, and asm goto is not very efficient when an output is involved, because it is volatile.
I tried something, but I have no idea whether it is "safe" to use. asm goto can't have outputs for some reason. I do not want to make it flush all registers to memory; that would defeat the entire point of doing this, which is efficiency. But if I put empty asm statements, with the 'a' variable as their output, before and after it, will that work and is it safe? Here's my macro:
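To make the "js" problem concrete, here is a small sketch (the helper names are mine, and 32-bit values are used only to keep the constants short). Judging underflow by the sign bit of the difference gives a false positive for 0xF0000000 - 1 and a false negative for 0 - 0x80000001, whereas the borrow test gets both right.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: "underflow" judged by the high bit of the result,
   which is what a js after the sub amounts to. */
static int underflow_by_sign(uint32_t a, uint32_t b)
{
    return ((uint32_t)(a - b) >> 31) != 0;
}

/* Hypothetical helper: the correct unsigned test (borrow, i.e. jc/jb). */
static int underflow_by_borrow(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Small self-check showing where the two tests disagree. */
static void sign_vs_borrow_demo(void)
{
    /* No wraparound, but the difference 0xEFFFFFFF has its high bit set. */
    assert(underflow_by_sign(0xF0000000u, 1u));
    assert(!underflow_by_borrow(0xF0000000u, 1u));

    /* Wraparound, but the difference 0x7FFFFFFF has its high bit clear. */
    assert(!underflow_by_sign(0u, 0x80000001u));
    assert(underflow_by_borrow(0u, 0x80000001u));
}
```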
#define subchk(a,b,g) { typeof(a) _a=a; \
asm("":"+rm"(_a)::"cc"); \
asm goto("sub %1,%0;jc %l2"::"r,m,r"(_a),"r,r,m"(b):"cc":g); \
asm("":"+rm"(_a)::"cc"); }
and using it like this:
subchk(a,b,underflow)
// normal code with no underflow
// ...
underflow:
// underflow occurred here
It's a bit ugly, but it works just fine. In my test scenario it compiles just FINE, without the volatile overhead (flushing registers to memory) and without generating anything bad, and it seems to work. However, this is only a limited test. I can't possibly test this everywhere I use this function/macro (as I said, it is used A LOT), so I'd like to ask anyone knowledgeable: is there something unsafe about the above construct?
In particular, the value of 'a' is NOT NEEDED if underflow occurs. With that in mind, are there any side effects or unsafe things that can happen with my inline asm macro? If not, I'll use it until the compiler optimizes this properly, and then switch back to plain C.
Please don't turn this into a debate about premature optimization; I'm fully aware of that. Please stay on the topic of the question. Thank you.
I'm probably missing something obvious, but why isn't this good enough?
extern void underflow(void) __attribute__((noreturn));
unsigned foo(unsigned a, unsigned b)
{
    unsigned r = a - b;
    if (r > a)
    {
        underflow();
    }
    return r;
}
I have checked, and gcc optimizes it to what you want:
foo:
    movl %edi, %eax
    subl %esi, %eax
    jb .L6
    rep
    ret
.L6:
    pushq %rax
    call underflow
Of course, you can handle underflow however you want; I have only done it this way to keep the asm simple.
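For instance, if calling a noreturn function isn't what you want, the same r > a test can feed a flag instead. This is only a sketch: checked_sub is a made-up helper name, and I have not checked what GCC 4.8.1 generates for this variant. As far as I know, newer compilers (GCC 5 and later) also offer __builtin_sub_overflow for this job, but that is not available in 4.8.1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: stores a - b in *out and returns true
   if the unsigned subtraction wrapped around. */
static bool checked_sub(unsigned a, unsigned b, unsigned *out)
{
    unsigned r = a - b;
    *out = r;
    return r > a;      /* a wrapped result is always larger than a */
}

/* Small self-check of the helper above. */
static void checked_sub_demo(void)
{
    unsigned r;
    assert(!checked_sub(10u, 3u, &r) && r == 7u);   /* no underflow */
    assert(checked_sub(3u, 10u, &r));               /* underflow detected */
}
```

The test works because a wrapped difference equals a plus a positive amount that cannot wrap a second time, while a difference that did not wrap can never exceed a.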