Search code examples
c++multithreadingmutexatomiccondition-variable

C++ atomic is it safe to replace a mutex with an atomic<int>?


#include <iostream>
#include <thread>
#include <mutex>
#include <atomic>
using namespace std;

const int FLAG1 = 1, FLAG2 = 2, FLAG3 = 3;
int res = 0;
atomic<int> flagger;

void func1() 
{
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG1) {}
        res++; // maybe a bunch of other code here that don't modify flagger
        // code here must not be moved outside the load/store (like mutex lock/unlock)
        flagger.store(FLAG2, std::memory_order_relaxed);
    }
    cout << "Func1 finished\n";
}

void func2() 
{
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG2) {}
        res++; // same
        flagger.store(FLAG3, std::memory_order_relaxed);
    }
    cout << "Func2 finished\n";
}

void func3() {
    for (int i=1; i<=1000000; i++) {
        while (flagger.load(std::memory_order_relaxed) != FLAG3) {}
        res++; // same
        flagger.store(FLAG1, std::memory_order_relaxed);
    }
    cout << "Func3 finished\n";
}

int main()
{
    flagger = FLAG1;
    
    std::thread first(func1);
    std::thread second(func2);
    std::thread third(func3);
    
    first.join();
    second.join();
    third.join();
    
    cout << "res = " << res << "\n";
    return 0;
}

My program has a segment that is similar to this example. Basically, there are 3 threads: inputer, processor, and outputer. I found that busy wait using atomic is faster than putting threads to sleep using condition variable, such as:

std::mutex commonMtx;
std::condition_variable waiter1, waiter2, waiter3;
// then in func1()
unique_lock<std::mutex> uniquer1(commonMtx);
while (flagger != FLAG1) waiter1.wait(uniquer1);

However, is the code in the example safe? When I run it gives correct results (-std=c++17 -O3 flag). However, I don't know whether the compiler can reorder my instructions to outside the atomic check/set block, especially with std::memory_order_relaxed. In case it's unsafe, then is there any way to make it safe while being faster than mutex?

Edit: it's guaranteed that the number of thread is < number of CPU cores


Solution

  • std::memory_order_relaxed results in no guarantees on the ordering of memory operations except on the atomic itself.

    All your res++; operations therefore are data races and your program has undefined behavior.

    Example:

    #include<atomic>
    
    int x;
    std::atomic<int> a{0};
    
    void f() {
        x = 1;
        a.store(1, std::memory_order_relaxed);
        x = 2;
    }
    

    Clang 13 on x86_64 with -O2 compiles this function to

        mov     dword ptr [rip + a], 1
        mov     dword ptr [rip + x], 2
        ret
    

    (https://godbolt.org/z/hxjYeG5dv)

    Even on a cache-coherent platform, between the first and second mov, another thread can observe a set to 1, but x not set to 1.

    You must use memory_order_release and memory_order_acquire (or sequential consistency) to replace a mutex.

    (I should add that I have not checked your code in detail, so I can't say that simply replacing the memory order is sufficient in your specific case.)