Tags: c++, c++17, atomic, memory-barriers, stdatomic

Synchronising with mutex and relaxed memory order atomic


I have a shared data structure that is already internally synchronised with a mutex. Can I use an atomic with relaxed memory order to signal changes? A very simplified view of what I mean, in code:

Thread 1

shared_conf.set("somekey","someval");
is_reconfigured.store(true, std::memory_order_relaxed);

Thread 2

if (is_reconfigured.load(std::memory_order_relaxed)) {
  inspect_shared_conf();
}

Is it guaranteed that I'll see the updates in shared_conf? The shared map itself internally synchronises every write/read call with a mutex.


Solution

  • Your example code will work, and yes, you will see the updates. The relaxed ordering will give you the correct behavior. That said, it may not actually be optimal in terms of performance.

    Let's look at a more concrete example, with the mutexes made explicit.

    #include <atomic>
    #include <mutex>
    
    std::mutex m;
    std::atomic<bool> updated{false};  // initialised to false; the proof below relies on this
    foo shared;                        // the internally synchronised data structure
    
    void thr1() {
        m.lock();
        shared.write(new_data);        // modify the shared data under the mutex
        m.unlock();
        updated.store(true, std::memory_order_relaxed);  // signal that a change was made
    }
    
    void thr2() {
        if (updated.load(std::memory_order_relaxed)) {   // did thr1 signal a change?
            m.lock();
            data = shared.read();      // read the shared data under the mutex
            m.unlock();
        }
    }
    

    Informal explanation

    m.lock() is an acquire operation and m.unlock() is release. This means that nothing prevents updated.store(true) from "floating" up into the critical section, past m.unlock() and even shared.write(). At first glance this seems bad, because the whole point of the updated flag was to signal that shared.write() had finished. But no actual harm occurs in that case, because thr1 still holds the mutex m, and so if thr2 starts trying to read the shared data, it will just wait until thr1 drops it.
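
    To picture that reordering, here is a hypothetical "as-if" version of thr1 (the name thr1_as_if is mine, not code you would write) showing the store hoisted into the critical section:

    // Hypothetical "as-if" execution that relaxed ordering permits: the store
    // floats up past m.unlock() and even shared.write(). thr2 may then see
    // updated == true while thr1 still holds m, and will simply block on
    // m.lock() until thr1 releases it.
    void thr1_as_if() {
        m.lock();
        updated.store(true, std::memory_order_relaxed);
        shared.write(new_data);
        m.unlock();
    }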

    What would really be bad is if updated.store() were to float all the way up past m.lock(); then thr2 could potentially see updated.load() == true and take the mutex before thr1. However, this cannot happen because of the acquire semantics of m.lock().

    There could be some related issues in thr2 (a little more complicated, because they would have to be speculative), but again we are saved by the same fact: the updated.load() can sink downward into the critical section, but not past it entirely (because m.unlock() is release).

    But this is an instance where a stronger memory order on the updated operations, although seemingly more expensive, might actually improve performance. If the value true in updated becomes visible prematurely, then thr2 attempts to lock m while it is already locked by thr1, and so thr2 will have to block while it waits for m to become available. But if you changed to updated.store(true, std::memory_order_release) and updated.load(std::memory_order_acquire), then the value true in updated can only become visible after m is truly unlocked by thr1, and so the m.lock() in thr2 should always succeed immediately (ignoring contention by any other threads that might exist).
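
    A minimal sketch of that stronger-ordering variant, reusing the same hypothetical foo, new_data and data from the example above:

    void thr1() {
        m.lock();
        shared.write(new_data);
        m.unlock();
        // Release store: a thread that observes 'true' with an acquire load
        // also observes everything thr1 did before this store, including the unlock.
        updated.store(true, std::memory_order_release);
    }
    
    void thr2() {
        // Acquire load: pairs with the release store in thr1.
        if (updated.load(std::memory_order_acquire)) {
            m.lock();              // by the reasoning above, normally not contended by thr1
            data = shared.read();
            m.unlock();
        }
    }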


    Proof

    Okay, that was an informal explanation, but we know those are always risky when thinking about memory ordering. Let's give a proof from the formal rules of the C++ memory model. I will follow the C++20 standard because I have it handy, but I don't believe there are any significant relevant changes from C++17. See [intro.races] for definitions of the terms used here.

    I claim that, if shared.read() executes at all, then shared.write(new_data) happens before it, and so by write-read coherence [intro.races p18] shared.read() will see the new data.

    The lock and unlock operations on m are totally ordered [thread.mutex.requirements.mutex p5]. Consider two cases: either thr1's unlock precedes thr2's lock (Case I), or vice versa (Case II).

    Case I

    If thr1's unlock precedes thr2's lock in m's lock order, then there is no problem; they synchronize with each other [thread.mutex.requirements.mutex p11]. Since shared.write(new_data) is sequenced before thr1's m.unlock(), and thr2's m.lock() is sequenced before shared.read(), by chasing the definitions in [intro.races] we see that shared.write(new_data) indeed happens before shared.read().
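
    Spelled out, the happens-before chain is:

    thr1: shared.write(new_data)   sequenced before     thr1: m.unlock()
    thr1: m.unlock()               synchronizes with    thr2: m.lock()
    thr2: m.lock()                 sequenced before     thr2: shared.read()
    
    so shared.write(new_data) happens before shared.read().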

    Case II

    Now suppose the contrary, that thr2's lock precedes thr1's unlock in m's lock order. Since locks and unlocks of the same mutex cannot interleave (that's the whole point of a mutex, to provide mutual exclusion), the lock total order on m must be as follows:

    thr2: m.lock()
    thr2: m.unlock()
    thr1: m.lock()
    thr1: m.unlock()
    

    That means that thr2's m.unlock() synchronizes with thr1's m.lock(). Now updated.load() is sequenced before thr2's m.unlock(), and thr1's m.lock() is sequenced before updated.store(true), so it follows that updated.load() happens before updated.store(true). By read-write coherence [intro.races p17], updated.load() must not take its value from updated.store(true), but from some strictly earlier side effect in the modification order of updated; presumably its initialization to false.
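
    The chain behind that happens-before claim, spelled out:

    thr2: updated.load()           sequenced before     thr2: m.unlock()
    thr2: m.unlock()               synchronizes with    thr1: m.lock()
    thr1: m.lock()                 sequenced before     thr1: updated.store(true)
    
    so updated.load() happens before updated.store(true).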

    We conclude that updated.load() must return false in this case. But if that were so, then thr2 would never have tried to lock the mutex in the first place. This is a contradiction, so Case II must never occur.