c++ language-lawyer atomic

Is a synchronization relationship necessary to avoid the duplicate invocation of a function?


Consider this example:

#include <iostream>
#include <atomic>
#include <random>
#include <thread>
// Returns 0, 1, or 2 with equal probability; 0 means "close".
int need_close(){
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<int> distribute(0, 2);
    return distribute(gen);
}
void invoke(std::atomic<bool>& is_close){
    if(is_close.load(std::memory_order::relaxed)){  // #1
        return;
    }
    auto r = need_close();
    if(r==0){
        is_close.store(true,std::memory_order::relaxed);
    }
}
int main(){
    std::atomic<bool> is_close{false};
    std::thread t1([&](){
        for(auto i = 0; i<100000;i++)
        invoke(is_close);
    });
    std::thread t2([&](){
        for(auto i = 0; i<100000;i++)
        invoke(is_close);
    });
    t1.join();
    t2.join();
}

In this example, is relaxed ordering sufficient to avoid calling need_close once a thread sees is_close == true (not immediately after the store, but from whenever the thread is able to read is_close == true)? It seems that I don't need synchronization to avoid a data race, because there are no conflicting actions here. However, from the point of view of a compiler implementation, the compiler may reorder #1 to any place after it because relaxed is used. For example, if #1 were moved to somewhere after the call to need_close, need_close would always be called again even though is_close had been set to true, which is not what I expect. So I wonder whether acquire/release ordering is necessary to stop the compiler from reordering the code, so that the logic behaves as expected.


Solution

  • Apparently what you really meant to ask is whether need_close can be called even if is_close.load(relaxed) returned true, due to the compiler moving the check later. That is despite your edit leaving the title talking about "duplicate" invocation, as if something in this code would otherwise prevent it from being called twice. (As comments and other answers discuss, nothing does: you can get multiple invocations even with seq_cst.)
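As a side note on the "duplicate invocation" reading: if the goal really were that only one thread ever performs some step, a plain load followed by a store would not give that under any memory order, because two threads can both read false before either stores true. A minimal sketch, assuming a single atomic read-modify-write is used instead; try_claim and count_winners are hypothetical names that do not appear in the question:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: returns true for exactly one caller, the one whose
// exchange flips the flag from false to true.
bool try_claim(std::atomic<bool>& claimed) {
    // exchange is one atomic read-modify-write, so two threads cannot both
    // observe the old value `false`. Relaxed is enough for that property.
    return !claimed.exchange(true, std::memory_order_relaxed);
}

// Starts n_threads threads that all race to claim; returns how many won.
int count_winners(int n_threads) {
    std::atomic<bool> claimed{false};
    std::atomic<int> winners{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([&] {
            if (try_claim(claimed)) {
                winners.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : threads) {
        t.join();  // join makes the workers' increments visible here
    }
    return winners.load(std::memory_order_relaxed);
}
```

Under these assumptions count_winners returns 1 for any positive thread count. The atomicity of the exchange is what guarantees the single winner, not the ordering argument passed to it.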


    The cardinal rule of out-of-order execution by CPUs, and of compile-time reordering, is that it must not break single-threaded code.

    If is_close.load() is true, the early-out return runs, so nothing else in the function happens. This is sequenced before the call to need_close.
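A minimal single-threaded sketch of that sequencing, assuming a deterministic stand-in for need_close that always returns 0 and a hypothetical counter (need_close_calls) and driver (count_calls) added only so the calls can be observed:

```cpp
#include <atomic>
#include <cassert>  // for callers that want to assert on the counts

// Hypothetical counter, not in the question: records need_close() calls.
int need_close_calls = 0;

// Stand-in for the question's need_close(); it always asks to close.
int need_close() {
    ++need_close_calls;
    return 0;
}

void invoke(std::atomic<bool>& is_close) {
    if (is_close.load(std::memory_order_relaxed)) {
        return;  // sequenced before the call below, so the call is skipped
    }
    if (need_close() == 0) {
        is_close.store(true, std::memory_order_relaxed);
    }
}

// Returns how many times need_close() ran during 1000 invoke() calls.
int count_calls(bool initially_closed) {
    need_close_calls = 0;
    std::atomic<bool> is_close{initially_closed};
    for (int i = 0; i < 1000; ++i) {
        invoke(is_close);
    }
    return need_close_calls;
}
```

With these definitions, count_calls(true) returns 0 and count_calls(false) returns 1: once the relaxed load yields true, the early return always wins, and no conforming compiler or CPU may make the call observable anyway.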

    Compilers can't just arbitrarily shuffle source lines (which isn't how they optimize anyway). They have to respect sequencing so that single-threaded programs don't break, and they have to respect the sequenced-before rules for multi-threaded programs.

    The CPU might fetch and decode the machine code for need_close() in the shadow of a mispredicted branch (which can happen in a single-threaded program), but the end result has to be as if the code ran in program order with is_close.load() having produced whatever value.

    relaxed means that the value didn't have to be ready until after we've read and/or written some other things. In practice, though, that happens via out-of-order speculative execution by the CPU (which can discard mis-speculated work when it rolls back to a consistent state after a mispredict), not by the compiler statically reordering things so that visible side effects differ from what the abstract machine would produce if no other threads were running.
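For contrast, here is a hedged sketch of a case where the ordering relative to other memory operations does matter: a flag that publishes a separate, non-atomic variable. This is not the question's code; payload, ready and publish_and_read are hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the payload value the reader observes after it sees the flag.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread writer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();                 // wait for the flag
        }
        // The acquire load read the value written by the release store, so
        // the writer's assignment to payload happens-before this read.
        seen = payload;
    });
    writer.join();
    reader.join();
    return seen;
}
```

Here publish_and_read returns 42. With relaxed on both sides, the read of payload would be a data race. The question's code has no such second variable, which is why relaxed is enough there.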

    Since your need_close() has visible side-effects (I/O in the form of constructing and reading from a std::random_device object), in this case there's not anything the compiler could usefully do unconditionally.


    "Who's afraid of a big bad optimizing compiler?" on LWN talks about some practical examples of compile-time reordering, including some subtle and interesting ones, but also some of the limitations of what compilers must not do.

    See also https://preshing.com/20120625/memory-ordering-at-compile-time/ and Preshing's other articles to get a better understanding of how to think about this stuff.