c x86 atomic compiler-optimization stdatomic

Efficient way to toggle an atomic_bool

Given atomic_bool flag;, how can I toggle it in C in a way that is atomic, portable, and efficient? By "efficient" I mean that it should assemble on x86_64 to lock xorb $1, flag(%rip). The "obvious" flag = !flag; is out because the load and the store are two separate operations, so the toggle as a whole isn't atomic. My next guess was flag ^= true;, which GCC assembles to this mess:

        movzbl  flag(%rip), %eax
0:
        movb    %al, -1(%rsp)
        xorl    $1, %eax
        movl    %eax, %edx
        movzbl  -1(%rsp), %eax
        lock cmpxchgb   %dl, flag(%rip)
        jne     0b

And this mess on Clang:

        movb    flag(%rip), %al
0:
        andb    $1, %al
        movl    %eax, %ecx
        xorb    $1, %cl
        lock            cmpxchgb        %cl, flag(%rip)
        jne     0b

Then I tried specifying a weaker memory order with atomic_fetch_xor_explicit(&flag, true, memory_order_acq_rel); instead. That produces what I want on Clang, but GCC refuses to compile it at all: error: operand type '_Atomic atomic_bool *' {aka '_Atomic _Bool *'} is incompatible with argument 1 of '__atomic_fetch_xor'. Interestingly, if the type is atomic_char instead of atomic_bool, both GCC and Clang emit the assembly I want. Is there a way to get the same result with atomic_bool?

Solution

Mainly summarizing comments:

It looks like the only portable way to atomically toggle your atomic_bool with a single operator is flag ^= 1. But as you noted, neither GCC nor Clang optimizes it into a single lock xor; both fall back to a cmpxchg loop. If you want full portability and standards compliance, I think you just have to put up with that until they fix the missed optimization, which you might want to report.
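If the concern is the memory order rather than the instruction sequence, the compare-exchange functions are applicable to atomic_bool, so a portable toggle with an explicit order can be sketched as below. The helper name toggle_flag is hypothetical, and this is only a sketch: it is still a cmpxchg loop, not the single lock xorb the question asks for.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper (not part of <stdatomic.h>): atomically toggles *flag
 * using the given memory order on success, and returns the value the flag
 * held before the toggle. */
static inline bool toggle_flag(atomic_bool *flag, memory_order order)
{
    bool old = atomic_load_explicit(flag, memory_order_relaxed);

    /* On failure, `old` is overwritten with the current value of *flag,
     * so the loop retries with a fresh expected value. */
    while (!atomic_compare_exchange_weak_explicit(flag, &old, !old,
                                                  order,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

For example, toggle_flag(&flag, memory_order_acq_rel) requests the same ordering as the atomic_fetch_xor_explicit call in the question.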

In principle, another option should be flag -= 1 or flag += -1. The result of the arithmetic is converted back to bool, so any nonzero value becomes true: 1 - 1 gives 0 (false) and 0 - 1 gives -1 (true), which is the same truth table as a toggle. However, GCC compiles it to the same inefficient xor/cmpxchg code as flag ^= 1, and Clang actually miscompiles it: when flag == 0, flag -= 1 stores 0xff, which is not a valid value for a bool. It looks like this was reported several years ago but is still unfixed.
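The truth table is easy to check on a plain, non-atomic bool, which avoids the compiler problems described above. This is only an illustration of the arithmetic; the helper name minus_one is made up for the example.

```c
#include <stdbool.h>

/* Hypothetical helper for illustration: applies `b -= 1` to a plain
 * (non-atomic) bool and returns the result. */
static bool minus_one(bool b)
{
    /* b is promoted to int, 1 is subtracted, and the result is converted
     * back to bool: any nonzero value (including -1) becomes true. */
    b -= 1;
    return b;
}
```

Here minus_one(false) yields true and minus_one(true) yields false, so subtracting 1 behaves as a toggle.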

If you want a workaround, at least on x86 you should be able to do

atomic_fetch_xor((atomic_uchar *)&flag, 1);

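I'm not sure this is strictly conforming: atomic_uchar is _Atomic unsigned char rather than a plain character type, so the character-type exception to strict aliasing may not apply, and the standard doesn't promise that it has the same size and representation as atomic_bool. In practice it is most likely fine anyway, because an atomic access shouldn't be optimized away. To be safe, check the assembly that is generated, or just go ahead and replace the whole thing with the appropriate one-liner of inline asm.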
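A rough sketch of both workarounds follows. The helper names are hypothetical. The static assertion checks only that the two types have the same size, not the same representation. The inline asm assumes GCC or Clang extended asm with AT&T syntax on x86, and falls back to the portable operator elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption, not guaranteed by the C standard: atomic_bool and
 * atomic_uchar have the same size. This holds on mainstream x86 ABIs. */
static_assert(sizeof(atomic_bool) == sizeof(atomic_uchar),
              "atomic_bool and atomic_uchar differ in size");

/* Hypothetical helper: toggle through the atomic_uchar cast. */
static inline void toggle_via_cast(atomic_bool *flag)
{
    atomic_fetch_xor_explicit((atomic_uchar *)flag, 1, memory_order_acq_rel);
}

/* Hypothetical helper: toggle with a single locked xor on x86. */
static inline void toggle_via_asm(atomic_bool *flag)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* "+m" marks the byte as read and written in memory; the "memory"
     * clobber keeps the compiler from reordering other accesses across it. */
    __asm__ volatile("lock xorb $1, %0"
                     : "+m"(*(unsigned char *)flag)
                     :
                     : "cc", "memory");
#else
    *flag ^= 1;   /* portable fallback on other targets or compilers */
#endif
}
```

Either way, it is worth inspecting the generated assembly for the compiler and flags actually in use.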

It's a nice touch that Clang extends the atomic_fetch_* functions to work on atomic_bool, even though the C standard doesn't support it (7.17.7.5p1: "None of these operations is applicable to atomic_bool."). I don't really understand why the standards committee included that exception. All those operations still have to be available on atomic_bool via the compound assignment operators, so omitting them from atomic_fetch_* only deprives the programmer of the ability to choose a weaker memory ordering, without making life any easier for the implementation.

For similar reasons, I also don't understand why they didn't provide atomic_fetch_* for the remaining compound assignment operators. atomic_fetch_mul might not be that useful, but since *= has to work, it shouldn't cost the implementation anything to speak of, and the consistency would be nice.
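Until something like that exists, a multiply with an explicit memory order has to be written by hand. A minimal sketch, assuming a compare-exchange loop is acceptable: the name fetch_mul_explicit is hypothetical (nothing like it is in <stdatomic.h>), and it uses atomic_uint so that wraparound is well defined.

```c
#include <stdatomic.h>

/* Hypothetical function: atomically replaces *obj with *obj * operand and
 * returns the previous value, using `order` when the update succeeds. */
static inline unsigned fetch_mul_explicit(atomic_uint *obj, unsigned operand,
                                          memory_order order)
{
    unsigned old = atomic_load_explicit(obj, memory_order_relaxed);

    /* On failure, `old` is refreshed with the current value and the
     * product is recomputed on the next iteration. */
    while (!atomic_compare_exchange_weak_explicit(obj, &old, old * operand,
                                                  order,
                                                  memory_order_relaxed))
        ;
    return old;
}
```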