According to C++ Reference, mutex.lock() is a memory_order_acquire operation, and mutex.unlock() is a memory_order_release operation.
However, memory_order_acquire and memory_order_release are only effective for non-atomic and relaxed atomic operations.
memory_order: Release-Acquire ordering on cppreference
If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B
Can a mutex in C++ guarantee the visibility of atomic operations? An example follows. Can the code at A be reordered before mu.lock(), so that thread b reads x as false?
#include <thread>
#include <atomic>
#include <mutex>   // std::mutex (missing from the original includes)
#include <cassert>
#include <iostream>
#include <unistd.h>

std::atomic<bool> x = {false};
std::mutex mu;

void write_x() {
    mu.lock();
    std::cout << "write_x" << std::endl;
    x.store(true, std::memory_order_release);
    mu.unlock();
}

void read_x() {
    mu.lock();
    std::cout << "read_x" << std::endl;
    assert(x.load(std::memory_order_acquire)); // A
    mu.unlock();
}

int main() {
    std::thread a(write_x);
    usleep(1);
    std::thread b(read_x);
    a.join(); b.join();
    return 0;
}
TL;DR: "all memory writes" means all, not just the kinds mentioned, but the phrasing is confusing. It was probably intended just to point out that even non-atomic and relaxed atomic ops are safely visible across a synchronizes-with, but the phrasing is missing the word "including".
Note that cppreference is a wiki that's intended to explain the standard. It's not normative technical language, and sometimes even explains things in different terms than the ISO C++ standard.
It's generally very good, but don't just assume that it's perfect when something seems strange. From surrounding context (and sanity), like the last sentence in the paragraph saying "everything" with no qualifications, it's still fairly obvious that's what was meant.
ISO C++ is clear. An acquire operation that "sees" a release operation creates a synchronizes-with relationship. Everything before the release is visible to code after the acquire operation.
So in terms of a model where operations access a globally coherent shared state of memory, an acquire operation blocks every later operation from reordering before it, including release and seq_cst operations. (Note that this part of cppreference doesn't make any reference to reordering, just to guaranteed visibility or not. Local reordering of accesses to globally coherent state is in practice how real CPUs work, so it's often more convenient to describe things that way, like you're doing in the question.)
This means that C++'s definition of acquire and release matches standard terminology without insane magic exceptions. https://preshing.com/20120913/acquire-and-release-semantics/
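As a rough illustration of the point above, here is a minimal sketch in which a mutex alone orders an atomic store. The names published and run_mutex_handoff are hypothetical and not from the question, and the store and load are deliberately relaxed to show that the unlock/lock pair provides the ordering rather than the atomic's own memory order.

```cpp
#include <atomic>
#include <cassert>   // only needed for the assert in the usage note
#include <mutex>
#include <thread>

// Hypothetical names for illustration; not taken from the question.
std::mutex mu;
bool published = false;        // plain bool, only touched while holding mu
std::atomic<bool> x{false};

// Returns the value of x that the reader observed after it saw published == true.
bool run_mutex_handoff() {
    published = false;
    x.store(false, std::memory_order_relaxed);
    bool seen = false;

    std::thread writer([] {
        std::lock_guard<std::mutex> lock(mu);
        x.store(true, std::memory_order_relaxed);  // relaxed, yet ordered by the unlock
        published = true;
    });

    std::thread reader([&seen] {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(mu);
                if (published) {
                    // This lock synchronized with the writer's unlock,
                    // so the store to x happens-before this load.
                    seen = x.load(std::memory_order_relaxed);
                    return;
                }
            }
            std::this_thread::yield();  // let the writer take the mutex
        }
    });

    writer.join();
    reader.join();
    return seen;
}
```

Under these assumptions, assert(run_mutex_handoff()); should never fire: once the reader sees published set inside its critical section, the atomic store made in the writer's critical section must be visible too.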
Note that some people use "relaxed atomics" to describe all orderings weaker than seq_cst. Example: Herb Sutter uses it that way in the talk this question is about.
That might be what was meant in that cppreference definition, but IDK why they'd want to exclude seq_cst. All atomic and non-atomic operations are ordered. So perhaps they did mean mo_relaxed, and just wanted to point out that even those are ordered / visible.
(seq_cst could be said to already order itself wrt. everything else, so "of course" it's ordered with respect to acquire and release operations. But that reason seems unlikely.)
If it was intended for emphasis of the fact that weaker orders were also ordered by it, they should have written "including non-atomic and relaxed atomic". Without the word "including", that phrasing can be read as implying only non-atomic and relaxed-atomic. Only an understanding of the big picture and what would be sane or not can give you a correct reading.
Technical writing that needs to be precisely understood will often use the phrase "including but not limited to".
Also note that your example can still trigger the assert, just not for the reason you were worried about.
If thread a is slow to start up, thread b could enter its critical section first and print + read x before the print + store in the other thread happens.
The usual way to write toy examples like that is a loop that spins on an acquire load until it sees a value, e.g. a flag like data_read stored by a release operation after the store you care about. That way you know the read side runs after an acquire operation that synchronized-with a release operation in the write side.
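The spin-wait pattern described above might look something like the following sketch. The flag name data_read comes from the text; payload and run_spin_handoff are hypothetical names added for illustration, and the yield in the loop is just one reasonable way to wait.

```cpp
#include <atomic>
#include <cassert>   // only needed for the assert in the usage note
#include <thread>

// Returns true if the reader saw both earlier writes after observing the flag.
bool run_spin_handoff() {
    std::atomic<bool> x{false};
    std::atomic<bool> data_read{false};  // flag name taken from the text
    int payload = 0;                     // hypothetical non-atomic data

    bool saw_x = false;
    int saw_payload = 0;

    std::thread writer([&] {
        payload = 42;                                    // non-atomic write
        x.store(true, std::memory_order_relaxed);        // relaxed atomic write
        data_read.store(true, std::memory_order_release); // publishes both writes
    });

    std::thread reader([&] {
        // Spin until the acquire load sees the release store.
        while (!data_read.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // Everything sequenced before the release store is visible here.
        saw_x = x.load(std::memory_order_relaxed);
        saw_payload = payload;
    });

    writer.join();
    reader.join();
    return saw_x && saw_payload == 42;
}
```

With this structure, assert(run_spin_handoff()); should hold regardless of which thread starts first, because the reader only proceeds after its acquire load has observed the writer's release store.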