Are there any implicit memory barriers in C++?

In the following code, is using atomics necessary to guarantee race-free semantics on all platforms, or does the use of promise.set_value/future.wait imply some kind of implicit memory barrier that would allow me to rely on the flag write having become visible to the outer thread?

std::atomic_bool flag{false}; // <- does this need to be atomic?
runInThreadPoolBlocking([&]() {
    // do something
    flag.store(true);
});
if (flag.load()) // do something


// simplified runInThreadPoolBlocking implementation
template <typename Callable>
void runInThreadPoolBlocking(Callable func)
{
    std::promise<void> prom;
    auto fut = prom.get_future();

    enqueueToThreadPool([&]() {
        func();
        prom.set_value();
    });

    fut.get();
}

In general, are there any "implicit" memory barriers guaranteed by the standard for things like thread.join() or futures?

Solution

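thread.join() and promise.set_value()/future.wait() are guaranteed by the standard to act as synchronization points: the completion of a thread synchronizes with the return of join() on it, and a set_value() call synchronizes with the wait()/get() call that observes the shared state becoming ready.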

Using atomic_bool is necessary only if the flag can be accessed from several threads concurrently with no other synchronization between the accesses. In this particular case a non-atomic bool is enough. The assignment and the check are on opposite sides of a synchronization point (fut.get()): the write in the lambda is sequenced before prom.set_value(), set_value() synchronizes with fut.get() returning, and runInThreadPoolBlocking() returns only after that, i.e. only after the lambda has executed. So, as long as you don't use the flag in any other place concurrently, the check is guaranteed to see true, and the compiler may not reuse a stale value of the flag across the call.
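To illustrate the reasoning above, here is a minimal sketch with a plain bool. It assumes a fresh std::thread in place of the thread pool, since enqueueToThreadPool is not shown in the question; runInNewThreadBlocking and plainFlagAfterBlockingCall are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical stand-in for runInThreadPoolBlocking: runs func on a fresh
// std::thread (an assumption, the real thread pool is not shown) and blocks
// until the worker has fulfilled the promise.
template <typename Callable>
void runInNewThreadBlocking(Callable func)
{
    std::promise<void> prom;
    auto fut = prom.get_future();

    std::thread worker([&]() {
        func();            // sequenced before set_value()
        prom.set_value();  // synchronizes with the fut.get() below
    });

    fut.get();      // returns only once the shared state is ready
    worker.join();  // keeps prom alive until the worker is done with it
}

// Writes a plain (non-atomic) flag in the worker and reads it afterwards.
bool plainFlagAfterBlockingCall()
{
    bool flag = false;  // no std::atomic: the write and the read are ordered
    runInNewThreadBlocking([&]() { flag = true; });
    assert(flag);       // holds because the write happens before this read
    return flag;
}
```

Calling plainFlagAfterBlockingCall() from main should return true on every conforming implementation. The join() here is only a design choice for the sketch: it makes sure the promise is not destroyed while the worker may still be inside set_value().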

Quoting from cplusplus.com for future::get(), for example:

Data races

The future object is modified. The shared state is accessed as an atomic operation (causing no data races).

The same is stated for promise::set_value(). Among other things,

... atomic operation (causing no data races) ...

means that these accesses never form a data race, that is, a pair of conflicting evaluations neither of which happens before the other.

The same holds for all of the std:: multithreading synchronization primitives and tools where you expect some operations to occur only before or after the synchronization point (std::mutex::lock()/unlock(), thread::join(), etc.).
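A small sketch of the same idea for std::mutex and thread::join(), assuming nothing beyond the standard library (countWithMutexAndJoin and its parameters are hypothetical names for illustration). The counter is a plain int: the mutex orders the increments, and join() orders the final read after all of them.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Increments a plain int from several threads and returns the final value.
int countWithMutexAndJoin(int threads, int incrementsPerThread)
{
    assert(threads >= 0 && incrementsPerThread >= 0);

    int counter = 0;  // plain int, protected by the mutex while workers run
    std::mutex m;

    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&]() {
            for (int j = 0; j < incrementsPerThread; ++j) {
                // Each unlock() synchronizes with the next lock() of m.
                std::lock_guard<std::mutex> lock(m);
                ++counter;
            }
        });
    }

    for (auto& w : workers) {
        w.join();  // thread completion synchronizes with join() returning
    }

    return counter;  // read without the lock: every worker has been joined
}
```

For example, countWithMutexAndJoin(4, 1000) returns 4000. Removing the lock_guard would make the increments a data race, even though the final read after join() would still be ordered.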

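Note that operations on the thread object itself are not synchronized (unlike the operations within the thread it represents, which are ordered by thread::join()), so calling member functions of the same thread object from several threads concurrently still needs external synchronization.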