If I have `atomic_bool flag;`, how can I write C code to toggle it that's atomic, portable, and efficient? Regarding "efficient", I'd like it to compile on x86_64 to `lock xorb $1, flag(%rip)`. The "obvious" `flag = !flag;` is out because it isn't actually atomic: it is an atomic load followed by a separate atomic store, not a single read-modify-write. My next guess was `flag ^= true;`, which compiled to this mess on GCC:
```asm
    movzbl flag(%rip), %eax
0:
    movb %al, -1(%rsp)
    xorl $1, %eax
    movl %eax, %edx
    movzbl -1(%rsp), %eax
    lock cmpxchgb %dl, flag(%rip)
    jne 0b
```
And this mess on Clang:
```asm
    movb flag(%rip), %al
0:
    andb $1, %al
    movl %eax, %ecx
    xorb $1, %cl
    lock cmpxchgb %cl, flag(%rip)
    jne 0b
```
Then I tried specifying a weaker memory order by doing `atomic_fetch_xor_explicit(&flag, true, memory_order_acq_rel);` instead. This does what I want on Clang, but GCC now completely fails to compile it with `error: operand type '_Atomic atomic_bool *' {aka '_Atomic _Bool *'} is incompatible with argument 1 of '__atomic_fetch_xor'`.
Interestingly, if my type is an `atomic_char` instead of an `atomic_bool`, then both GCC and Clang emit the assembly that I want. Is there a way to do what I want with `atomic_bool`?
Mainly summarizing comments:
It looks like the only portable way to atomically toggle your `atomic_bool` in a single operation is `flag ^= 1`, and since that is a compound assignment, it is always `memory_order_seq_cst`. But as you noted, gcc and clang don't optimize it into a single `lock xorb` and instead fall back to a `lock cmpxchg` loop. If you want full portability and standard compliance, I think you just have to put up with that until they fix this missed optimization, which you might want to report.
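A minimal sketch of the portable version, assuming a C11 compiler with `<stdatomic.h>`; the helper name `toggle_flag` is hypothetical. It shows that the toggle is a single atomic read-modify-write, and also why no weaker ordering can be requested this way.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag = false;

/* Hypothetical helper: atomically toggles the global flag.
   Compound assignment on an _Atomic object is one atomic
   read-modify-write with memory_order_seq_cst (C11 6.5.16.2p3),
   so there is no way to pass a weaker ordering here.
   GCC and Clang currently emit a lock cmpxchg loop for it. */
static void toggle_flag(void)
{
    flag ^= 1;
}
```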
In principle, another option should be `flag -= 1` or `flag += -1`, which have the same truth table when nonzero values are treated as true: `true - 1` is 0, and `false - 1` is -1, which converts back to `true`. However, gcc compiles it to the same inefficient xor/cmpxchg code as `flag ^= 1`, and clang actually miscompiles it: when `flag == 0`, `flag -= 1` sets `flag` to 0xff, which is not a valid `bool` value. It looks like this was reported several years ago but is still unfixed.
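To see why the truth table matches, here is a non-atomic sketch using a plain `bool` (the function name `toggled_by_subtraction` is hypothetical). The key step is the conversion back to `bool`: any nonzero result, including -1, becomes 1. A store of 0xff means that conversion was skipped.

```c
#include <stdbool.h>

/* Non-atomic illustration only. b -= 1 computes (int)b - 1, which is
   0 for true and -1 for false; converting that back to bool maps 0 to
   false and any nonzero value (here -1) to true, i.e. to 1. */
static bool toggled_by_subtraction(bool b)
{
    b -= 1;
    return b;
}
```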
If you want a workaround, at least on x86 you should be able to do

```c
atomic_fetch_xor((atomic_uchar *)&flag, 1);
```

I think it's okay for strict aliasing because `atomic_uchar` is (an atomic version of) a character type. It does rely on `atomic_bool` and `atomic_uchar` having the same size and representation, with `false` stored as 0 and `true` as 1, which holds for GCC and Clang on x86-64. In practice it is most likely fine anyway, because an atomic access shouldn't be optimized away. To be safe, check the generated assembly, or just go ahead and replace the whole thing with the appropriate one-liner of inline asm.
It's a nice touch that clang extends the `atomic_fetch_*` functions to work on `atomic_bool`, even though the C standard doesn't support it (C11 7.17.7.5p1: "None of these operations is applicable to `atomic_bool`."). I don't really understand why the standards committee included that exception. All those operations still have to be available on `atomic_bool` via the compound assignment operators, so omitting them from `atomic_fetch_*` serves only to deprive the programmer of the ability to use a weaker memory ordering, without making life any easier for the implementation.
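Weaker orderings are still reachable in strictly conforming code, just not through `atomic_fetch_*`: an explicit compare-exchange loop accepts any memory order. This sketch (the helper name `atomic_bool_toggle_explicit` is hypothetical) is essentially the loop GCC and Clang already emit for `flag ^= 1`, so it will not produce a single `lock xorb`; its only gain is control over the ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: a conforming stand-in for the missing
   atomic_fetch_xor_explicit on atomic_bool. Toggles *p using the
   given ordering on success and returns the previous value. */
static bool atomic_bool_toggle_explicit(atomic_bool *p, memory_order order)
{
    bool old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, old is refreshed with the current value and we retry.
       The failure ordering is relaxed, which is valid for any success
       ordering. */
    while (!atomic_compare_exchange_weak_explicit(p, &old, !old, order,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

On x86 every `lock cmpxchg` is a full barrier anyway, so the choice of ordering mainly matters on weakly ordered targets.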
For similar reasons, I also don't understand why they didn't provide `atomic_fetch_*` functions for the remaining compound assignment operators. A hypothetical `atomic_fetch_mul` might not be that useful, but since `*=` has to work on atomic objects, it shouldn't cost the implementation anything to speak of, and the consistency would be nice.
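To illustrate, here is a sketch of what such a function would amount to; `my_atomic_fetch_mul` is a made-up name, since C provides no `atomic_fetch_mul`. It is the same compare-exchange loop an implementation must already generate for `x *= n` on an atomic object. It uses `atomic_uint` so that overflow wraps instead of being undefined.

```c
#include <stdatomic.h>

/* Hypothetical: C has no atomic_fetch_mul. Multiplies *p by n
   atomically (seq_cst) and returns the previous value. */
static unsigned my_atomic_fetch_mul(atomic_uint *p, unsigned n)
{
    unsigned old = atomic_load_explicit(p, memory_order_relaxed);
    /* old * n is recomputed on each iteration because a failed
       exchange stores the current value of *p into old. */
    while (!atomic_compare_exchange_weak(p, &old, old * n))
        ;
    return old;
}
```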