Search code examples
c++cassemblyx86micro-optimization

Any possible code that can flip a bit/integer/bool between 0 and 1 in single CPU instruction


Can a single x86 instruction toggle a boolean value between '0' and '1'?

I thought of following ways but all result in two instruction with -O3 flag of gcc.

status =! status;

status = 1 - status;

status  = status == 0 ? 1: 0;

int flip[2] = {1, 0};
status = flip[status];

Is there a faster way to do this?

This is what I tried: https://godbolt.org/g/A3qNUw


What I need is a function which toggles the input and returns, written in a way that compiles to one instruction. Something similar to this function:

int addOne(int n) { return n+1; }

compiles on Godbolt to this:

  lea eax, [rdi+1]    # return n+1 in a single instruction
  ret

Solution

  • To flip a bit in an integer, use xor like this: foo ^= 1.

    gcc knows this optimization already for bool, so you can return !status; like a normal person without losing any efficiency. gcc does compile status ^= 1 to an xor instruction as well. In fact, all your ideas except the table lookup compile to a single xor instruction with bool input / return value.

    Check it out on the Godbolt compiler explorer with gcc -O3, with asm output panes for bool and int.

    MYTYPE func4(MYTYPE status) {
        status ^=1;
        return status;
    }
    
      # same code for bool or int
      mov eax, edi
      xor eax, 1
      ret
    

    vs.

    MYTYPE func1(MYTYPE status) {
        status = !status;
        return status;
    }
    
      # with -DMYTYPE=bool
      mov eax, edi
      xor eax, 1
      ret
    
      # with int
      xor eax, eax
      test edi, edi
      sete al
      ret
    

    Semi-related: XOR is add-without-carry. So if you only care about the low bit, you can copy-and-flip-low with lea eax, [rdi+1]. See Check if a number is even where that's useful in combination with and eax, 1 to get it done in 2 instructions.


    Why is bool different from int?

    The x86-64 System V ABI requires that callers passing a bool pass a 0 or 1 value, not just any non-zero integer. Thus, the compiler can assume that about the input.

    But with int foo, the C expression !foo requires "booleanizing" the value. !foo has type _Bool / (aka bool if you #include <stdbool.h>), and converting that back to an integer must produce a value of 0 or 1. If the compiler doesn't know that foo must be 0 or 1, it can't optimize !foo to foo^=1, and can't realize that foo ^= 1 flips a value between truthy / falsy. (In the sense that if(foo) means if(foo != 0) in C).

    This is why you get test/setcc (zero-extended into a 32-bit int by xor-zeroing a register before the test).

    Related: Boolean values as 8 bit in compilers. Are operations on them inefficient?. Stuff like (bool1 && bool2) ? x : y isn't always compiled as efficiently as you might hope. Compilers are pretty good, but do have missed-optimization bugs.


    What about that extra mov instruction?

    It will go away when inlining, if the compiler doesn't need / want to keep the old un-flipped value around for later. But in a stand-alone function, the first arg is in edi, and the return value needs to be in eax (in the x86-64 System V calling convention).

    Tiny functions like this are a close approximation to what you might get as part of a large function (if this flip couldn't be optimized into something else), but needing the result in a different register is a confounding factor.


    x86 doesn't have a copy-and-xor integer instruction, so for a stand-alone function it will take at least a mov to copy from the arg-passing register to eax.

    lea is special: it's one of the few integer ALU instructions that can write the result to a different register instead of destroying its input. lea is a copy-and-shift/add instruction, but there's no copy-and-xor instruction in x86. Many RISC instruction sets have 3-operand instructions, for example MIPS could do xor $t1, $t2, $t3.

    AVX introduced non-destructive versions of vector instructions (saving a lot of movdqa / movups register-copying in a lot of code), but for integer there are only a few new instructions that do different things. rorx eax, ecx, 16 for example does eax = rotate_right(ecx, 16), and uses the same VEX encoding that non-destructive AVX instructions use.