Atomic bool fails to protect non-atomic counter

I encountered an issue with a (basic) spinlock mutex that does not seem to work as expected.

Four threads increment a non-atomic counter that is protected by this mutex. The final count does not match the expected result, which makes the mutex appear broken.

Example output:

  result: 2554230
expected: 10000000

In my environment it happens under the following conditions:

  • flag is std::atomic<bool>; anything else, such as std::atomic<int> or std::atomic_flag (with test_and_set), works fine.

  • the code is compiled on x86_64 with gcc 6.3.1 and the -O3 flag.

My question: what could explain this behavior?

#include <iostream>
#include <vector>
#include <atomic>
#include <thread>
#include <mutex>
#include <assert.h>

class my_mutex {
    std::atomic<bool> flag{false};

public:
    void lock()
    {
        // spin until this thread is the one that flips flag from false to true
        while (flag.exchange(true, std::memory_order_acquire));
    }

    void unlock()
    {
        flag.store(false, std::memory_order_release);
    }
};

my_mutex mut;
static int counter = 0;

void increment(int cycles)
{
    for (int i=0; i < cycles; ++i)
    {
        std::lock_guard<my_mutex> lck(mut);
        ++counter;   // non-atomic increment, protected by mut
    }
}

int main()
{
    std::vector<std::thread> vec;
    const int n_thr = 4;
    const int n_cycles = 2500000;

    for (int i = 0; i < n_thr; ++i)
        vec.emplace_back(increment, n_cycles);

    for(auto &t : vec)
        t.join();

    std::cout << "  result: " << counter << std::endl;
    std::cout << "expected: " << n_cycles * n_thr << std::endl;
}


At Voo's request, here is the assembly output for increment():

$ g++ -O3 increment.cpp
$ gdb a.out
Reading symbols from a.out...done.
(gdb) disassemble increment
Dump of assembler code for function increment(int):
   0x0000000000401020 <+0>:     mov    0x20122a(%rip),%ecx        # 0x602250 <_ZL7counter>
   0x0000000000401026 <+6>:     test   %edi,%edi
   0x0000000000401028 <+8>:     mov    $0x1,%edx
   0x000000000040102d <+13>:    lea    (%rdi,%rcx,1),%esi
   0x0000000000401030 <+16>:    jle    0x401058 <increment(int)+56>
   0x0000000000401032 <+18>:    nopw   0x0(%rax,%rax,1)
   0x0000000000401038 <+24>:    mov    %edx,%eax
   0x000000000040103a <+26>:    xchg   %al,0x20120c(%rip)        # 0x60224c <mut>
   0x0000000000401040 <+32>:    test   %al,%al
   0x0000000000401042 <+34>:    jne    0x401038 <increment(int)+24>
   0x0000000000401044 <+36>:    add    $0x1,%ecx
   0x0000000000401047 <+39>:    cmp    %ecx,%esi
   0x0000000000401049 <+41>:    mov    %ecx,0x201201(%rip)        # 0x602250 <_ZL7counter>
   0x000000000040104f <+47>:    movb   $0x0,0x2011f6(%rip)        # 0x60224c <mut>
   0x0000000000401056 <+54>:    jne    0x401038 <increment(int)+24>
   0x0000000000401058 <+56>:    repz retq
End of assembler dump.


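  • Your code is correct. This is GCC bug 80004, "[6 Regression] non-atomic load moved to before atomic load with std::memory_order_acquire". The dump shows it: counter is read into %ecx only once, at +0, before the loop and before the xchg that takes the lock. Each iteration then increments that register copy and stores it back (+41) without ever reloading counter, so the threads overwrite each other's updates and the result falls short.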
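The question reports that a spinlock built on std::atomic_flag with test_and_set does not show the problem. Below is a minimal sketch of that variant as a possible stopgap on an affected GCC 6 build; it is an assumption that it sidesteps bug 80004 in every context (the report above only covers this test), and moving to a GCC release where the bug is resolved is the more direct remedy. The names flag_mutex and count_with_flag_mutex are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical name. Spinlock on std::atomic_flag, using test_and_set/clear
// instead of std::atomic<bool>::exchange/store.
class flag_mutex {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;

public:
    void lock()
    {
        // test_and_set returns the previous state: keep spinning while
        // another thread already holds the lock.
        while (flag.test_and_set(std::memory_order_acquire))
            ;
    }

    void unlock()
    {
        flag.clear(std::memory_order_release);
    }
};

// Hypothetical helper reproducing the question's workload: n_thr threads,
// each doing n_cycles locked increments of a plain int. Returns the count.
int count_with_flag_mutex(int n_thr, int n_cycles)
{
    assert(n_thr > 0 && n_cycles >= 0);
    flag_mutex mut;
    int counter = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < n_thr; ++i)
        threads.emplace_back([&mut, &counter, n_cycles] {
            for (int j = 0; j < n_cycles; ++j) {
                std::lock_guard<flag_mutex> lck(mut);
                ++counter;
            }
        });

    for (auto &t : threads)
        t.join();   // join() makes every increment visible here
    return counter;
}
```

For example, count_with_flag_mutex(4, 2500000) mirrors the question's main() and should return 10000000 with a correct lock and a correct compiler.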