As a follow-up to my previous question: most operations on the atomic<T>
class take a memory_order
parameter. In contrast to a fence, this memory order affects only the atomic on which it operates. Presumably, by using several such atomics you can build a concurrent algorithm where the ordering of the rest of memory is unimportant.
So I have two questions:
The memory ordering parameter on operations on std::atomic<T>
variables does not affect the ordering of that operation per se; rather, it affects the ordering relationships that operation creates with other operations.
e.g. a.store(std::memory_order_release)
on its own tells you nothing about how operations on a
are ordered with respect to anything else, but paired with a call to a.load(std::memory_order_acquire)
from another thread, this then orders other operations --- all writes to other variables (including non-atomic ones) done by the thread that did the store to a
are visible to the thread that did the load, if that load reads the value stored.
On modern processors, some memory orderings on operations are no-ops. e.g. on x86, memory_order_acquire
, memory_order_consume
and memory_order_release
are implicit in the load and store instructions, and do not require separate fences. In these cases the orderings just affect the instruction reordering the compiler can do.
Clarification: The implicit fences in the instructions can mean that the compiler does not need to issue any explicit fence instructions if all the memory ordering constraints are attached to individual operations on atomic variables. If you use memory_order_relaxed
for everything, and add explicit fences then the compiler may well have to explicitly issue those fences as instructions.
e.g. on x86, the XCHG
instruction carries with it an implicit memory_order_seq_cst
fence. There is thus no difference between the generated code for the two exchange operations below on x86 --- they both map to a single XCHG
instruction:
std::atomic<int> ai;
ai.exchange(3,std::memory_order_relaxed);
ai.exchange(3,std::memory_order_seq_cst);
However, I'm not yet aware of any compiler that gets rid of the explicit fence instructions in the following code:

std::atomic_thread_fence(std::memory_order_seq_cst);
ai.exchange(3,std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
I expect compilers will handle that optimization eventually, but there are other similar cases where the implicit fences will allow better optimization.
Also, std::memory_order_consume
can only be applied to direct operations on variables.