Can a single x86 instruction toggle a boolean value between '0' and '1'?
I thought of following ways but all result in two instruction with -O3 flag of gcc.
status =! status;
status = 1 - status;
status = status == 0 ? 1: 0;
int flip[2] = {1, 0};
status = flip[status];
Is there a faster way to do this?
This is what I tried: https://godbolt.org/g/A3qNUw
What I need is a function which toggles the input and returns, written in a way that compiles to one instruction. Something similar to this function:
int addOne(int n) { return n+1; }
compiles on Godbolt to this:
lea eax, [rdi+1] # return n+1 in a single instruction
ret
To flip a bit in an integer, use xor
like this: foo ^= 1
.
gcc knows this optimization already for bool
, so you can return !status;
like a normal person without losing any efficiency. gcc does compile status ^= 1
to an xor instruction as well. In fact, all your ideas except the table lookup compile to a single xor
instruction with bool
input / return value.
Check it out on the Godbolt compiler explorer with gcc -O3
, with asm output panes for bool
and int
.
MYTYPE func4(MYTYPE status) {
status ^=1;
return status;
}
# same code for bool or int
mov eax, edi
xor eax, 1
ret
vs.
MYTYPE func1(MYTYPE status) {
status = !status;
return status;
}
# with -DMYTYPE=bool
mov eax, edi
xor eax, 1
ret
# with int
xor eax, eax
test edi, edi
sete al
ret
Semi-related: XOR is add-without-carry.
So if you only care about the low bit, you can copy-and-flip-low with lea eax, [rdi+1]
.
See Check if a number is even where that's useful in combination with and eax, 1
to get it done in 2 instructions.
bool
different from int
?The x86-64 System V ABI requires that callers passing a bool
pass a 0 or 1 value, not just any non-zero integer. Thus, the compiler can assume that about the input.
But with int foo
, the C expression !foo
requires "booleanizing" the value. !foo
has type _Bool
/ (aka bool
if you #include <stdbool.h>
), and converting that back to an integer must produce a value of 0 or 1. If the compiler doesn't know that foo
must be 0
or 1
, it can't optimize !foo
to foo^=1
, and can't realize that foo ^= 1
flips a value between truthy / falsy. (In the sense that if(foo)
means if(foo != 0)
in C).
This is why you get test/setcc (zero-extended into a 32-bit int
by xor
-zeroing a register before the test
).
Related: Boolean values as 8 bit in compilers. Are operations on them inefficient?. Stuff like (bool1 && bool2) ? x : y
isn't always compiled as efficiently as you might hope. Compilers are pretty good, but do have missed-optimization bugs.
mov
instruction?It will go away when inlining, if the compiler doesn't need / want to keep the old un-flipped value around for later. But in a stand-alone function, the first arg is in edi
, and the return value needs to be in eax
(in the x86-64 System V calling convention).
Tiny functions like this are a close approximation to what you might get as part of a large function (if this flip couldn't be optimized into something else), but needing the result in a different register is a confounding factor.
x86 doesn't have a copy-and-xor integer instruction, so for a stand-alone function it will take at least a mov
to copy from the arg-passing register to eax
.
lea
is special: it's one of the few integer ALU instructions that can write the result to a different register instead of destroying its input. lea
is a copy-and-shift/add instruction, but there's no copy-and-xor instruction in x86. Many RISC instruction sets have 3-operand instructions, for example MIPS could do xor $t1, $t2, $t3
.
AVX introduced non-destructive versions of vector instructions (saving a lot of movdqa
/ movups
register-copying in a lot of code), but for integer there are only a few new instructions that do different things. rorx eax, ecx, 16
for example does eax = rotate_right(ecx, 16)
, and uses the same VEX encoding that non-destructive AVX instructions use.