Tags: c++, c++11, concurrency, memory-model, stdatomic

Will two atomic writes to different locations in different threads always be seen in the same order by other threads?


Similar to my previous question, consider this code:

-- Initially --
std::atomic<int> x{0};
std::atomic<int> y{0};

-- Thread 1 --
x.store(1, std::memory_order_release);

-- Thread 2 --
y.store(2, std::memory_order_release);

-- Thread 3 --
int r1 = x.load(std::memory_order_acquire);   // x first
int r2 = y.load(std::memory_order_acquire);

-- Thread 4 --
int r3 = y.load(std::memory_order_acquire);   // y first
int r4 = x.load(std::memory_order_acquire);

Is the weird outcome r1==1, r2==0 and r3==2, r4==0 possible in this case under the C++11 memory model? What if I were to replace all the std::memory_order_acquire and std::memory_order_release orderings with std::memory_order_relaxed?

On x86 such an outcome seems to be forbidden (see this SO question), but I am asking about the C++11 memory model in general.

Bonus question:

We all agree that with std::memory_order_seq_cst the weird outcome would not be allowed in C++11. Now, Herb Sutter said in his famous atomic<>-weapons talk (at 42:30) that std::memory_order_seq_cst is just like std::memory_order_acq_rel, except that std::memory_order_acquire loads may not move before std::memory_order_release writes. I cannot see how this additional constraint would prevent the weird outcome in the above example. Can anyone explain?


Solution

  • The updated code in the question (with the loads of x and y swapped in Thread 4; see footnote 1) does actually test whether all threads agree on a global store order.

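    Under the C++11 memory model, the outcome r1==1, r2==0, r3==2, r4==0 is allowed with acquire/release ordering and is in fact observable on POWER. Since std::memory_order_relaxed is weaker still, the outcome is also allowed if every operation is relaxed.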

    On x86 this outcome is not possible, because on that architecture "stores are seen in a consistent order by other processors". The outcome is also not allowed in a sequentially consistent execution.


    Footnote 1: The question originally had both readers read x then y. A sequentially consistent execution of that is:

    -- Initially --
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    
    -- Thread 4 --
    int r3 = x.load(std::memory_order_acquire);
    
    -- Thread 1 --
    x.store(1, std::memory_order_release);
    
    -- Thread 3 --
    int r1 = x.load(std::memory_order_acquire);
    int r2 = y.load(std::memory_order_acquire);
    
    -- Thread 2 --
    y.store(2, std::memory_order_release);
    
    -- Thread 4 --
    int r4 = y.load(std::memory_order_acquire);
    

    This results in r1==1, r2==0, r3==0, r4==2. Hence, this is not a weird outcome at all.

    To be able to say that each reader saw a different store order, we need them to read in opposite orders to rule out the last store simply being delayed.
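
The four-thread test from the question, with the readers loading in opposite orders, can be turned into a small harness that counts how often the weird outcome shows up. The following is only a rough sketch: `count_weird_outcomes` is a hypothetical helper name, not something from the question or the standard library, and starting fresh threads on every iteration leaves a very small timing window, so a count of zero on a given machine does not show that the outcome is forbidden. The only guaranteed result is that the count is zero when both template arguments are std::memory_order_seq_cst.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not from the question): runs the four-thread test
// `iterations` times and counts how often the two readers disagree on the
// order of the two stores. StoreOrder must be valid for store() and
// LoadOrder must be valid for load().
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int count_weird_outcomes(int iterations) {
    assert(iterations >= 0);
    int weird = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        // Each result is written by exactly one reader thread and only
        // inspected after join(), so plain ints are race-free here.
        int r1 = 0, r2 = 0, r3 = 0, r4 = 0;

        std::thread t1([&] { x.store(1, StoreOrder); });
        std::thread t2([&] { y.store(2, StoreOrder); });
        std::thread t3([&] {
            r1 = x.load(LoadOrder);  // x first
            r2 = y.load(LoadOrder);
        });
        std::thread t4([&] {
            r3 = y.load(LoadOrder);  // y first
            r4 = x.load(LoadOrder);
        });

        t1.join();
        t2.join();
        t3.join();
        t4.join();

        // Thread 3 saw the store to x but not y; Thread 4 saw y but not x.
        if (r1 == 1 && r2 == 0 && r3 == 2 && r4 == 0) {
            ++weird;
        }
    }
    return weird;
}
```

Calling `count_weird_outcomes<std::memory_order_release, std::memory_order_acquire>(n)` exercises the code from the question, while passing std::memory_order_relaxed for both arguments exercises the relaxed variant. On x86 both are expected to report zero for the reason quoted above; on POWER a nonzero count is possible.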