As is well known, there are six std::memory_order values, and two of them are acquire and consume. With acquire semantics, only operations that precede the load in program order, such as S = local1;, can be executed after X.load(std::memory_order_acquire):
static std::atomic<int> X;
static int L, S;
...
void thread_func()
{
    int local1 = L;  // load(L)-load(X) - !!! can be reordered with X !!!
    S = local1;      // store(S)-load(X) - !!! can be reordered with X !!!
    int x_local = X.load(std::memory_order_acquire);  // load(X)
    int local2 = L;  // load(X)-load(L) - can't be reordered with X
    S = local2;      // load(X)-store(S) - can't be reordered with X
}
But which reorderings across load(X) are possible with consume semantics?
static std::atomic<int *> X;
static int L1, L2, S1, S2;
static int L, S;
...
void thread_func()
{
    int *x_ptr_local = new int(1);
    int local1 = L1;                  // load(L1)-load(X) - !!! can be reordered with X !!!
    S1 = local1;                      // store(S1)-load(X) - !!! can be reordered with X !!!
    int dependent_x1 = *x_ptr_local;  // load(x_ptr)-load(X) - !!! can be reordered with X !!!
    S = dependent_x1;                 // store(S)-load(X) - !!! can be reordered with X !!!
    x_ptr_local = X.load(std::memory_order_consume);  // load(X)
    int dependent_x2 = *x_ptr_local;  // load(X)-load(x_ptr) - can't be reordered with X
    S = dependent_x2;                 // load(X)-store(S) - can't be reordered with X
    int local2 = L2;                  // load(X)-load(L2) - !!! can be reordered with X !!!
    S2 = local2;                      // load(X)-store(S2) - !!! can be reordered with X !!!
}
Is it true that only the operations involving dependent_x2 cannot be reordered across X.load(std::memory_order_consume)?
And can all operations on the variables L1, L2, S1, S2 and dependent_x1 be reordered across X.load(std::memory_order_consume), i.e. be performed either before or after it?
memory_order_consume is used to preserve the ordering of data dependencies on the atomic object itself without using heavier synchronization such as that introduced by memory_order_acquire. With memory_order_acquire, all memory operations after the acquire, whether or not they depend on the atomic variable, are forbidden from being reordered before it, whereas memory_order_consume only inhibits the reordering of dependent operations. This is beneficial on more weakly ordered architectures such as ARM and PowerPC, which guarantee the ordering of data-dependent instructions without the need for an explicit barrier.
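To make the acquire half of that comparison concrete, here is a minimal sketch of release/acquire publication of a variable that has no data dependency on the atomic. The names g_data, g_ready and run_acquire_demo are hypothetical and not from the question. Because the flag is loaded with memory_order_acquire, the later read of g_data cannot move ahead of it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical globals for illustration only.
static int g_data = 0;                    // plain, non-atomic payload
static std::atomic<bool> g_ready{false};  // publication flag

// Runs one writer and one reader; returns the value the reader saw.
int run_acquire_demo()
{
    g_data = 0;
    g_ready.store(false, std::memory_order_relaxed);
    int observed = -1;

    std::thread writer([] {
        g_data = 42;                                     // ordinary store
        g_ready.store(true, std::memory_order_release);  // publish
    });
    std::thread reader([&observed] {
        // Spin until the flag is set. The acquire load orders every later
        // memory operation in this thread after it, dependent or not.
        while (!g_ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = g_data;  // independent of g_ready's value, still safe
    });

    writer.join();
    reader.join();
    assert(observed == 42);  // guaranteed by the release/acquire pairing
    return observed;
}
```

If the acquire were replaced by memory_order_consume, the standard would no longer order the read of g_data, since it does not depend on the loaded value.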
Since memory_order_consume deals with data dependencies, most use cases for it involve a std::atomic<T*>. Producer threads can build an entire data structure and publish its address to the atomic pointer using memory_order_release. Consumer threads then load the atomic pointer with memory_order_consume and establish a data dependency with the writer thread's stores if they use the pointer in a data-dependent manner, such as by dereferencing it. The standard guarantees that any dependent loads will reflect the writer thread's stores. However, since the atomic variable is loaded with memory_order_consume, no guarantees can be made about the state of independent variables from the reader thread's perspective.
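The publish-and-consume pattern described above might look like the following sketch. Payload, g_published, producer, consumer and run_publish_demo are hypothetical names chosen for illustration. Note that current mainstream compilers are generally understood to treat memory_order_consume as memory_order_acquire, so this shows the intent of the pattern rather than a measurable speedup.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical data structure published through an atomic pointer.
struct Payload {
    int a;
    int b;
};

static std::atomic<Payload*> g_published{nullptr};

// Producer: fill in the object, then publish its address with release.
void producer(Payload* p)
{
    p->a = 1;
    p->b = 2;
    g_published.store(p, std::memory_order_release);
}

// Consumer: wait for a non-null pointer, then read only through it.
int consumer()
{
    Payload* p = nullptr;
    while ((p = g_published.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    // Both reads dereference the loaded pointer, so they carry a dependency
    // from the consume load and must observe the producer's stores.
    return p->a + p->b;
}

// Runs one producer and one consumer; returns the sum the consumer saw.
int run_publish_demo()
{
    Payload payload{0, 0};
    g_published.store(nullptr, std::memory_order_relaxed);
    int observed = 0;

    std::thread reader([&observed] { observed = consumer(); });
    std::thread writer([&payload] { producer(&payload); });
    writer.join();
    reader.join();

    assert(observed == 3);  // 1 + 2, seen through the dependent loads
    return observed;
}
```

Any variable the consumer reads without going through p (for example, a separate global the producer also wrote) would not be covered by this guarantee.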
In your first example, none of the loads or stores after the memory_order_acquire load can be reordered before it. However, in your second example, any operation that doesn't depend on X or on the value loaded from it is fair game for reordering. Namely, int dependent_x2 = *x_ptr_local; (and the subsequent S = dependent_x2;, which uses that value) are guaranteed to remain ordered with respect to X, but that's it. All other reorderings are possible.