Specifically, is there any effective difference between:
i = a.load(memory_order_acquire);
or
a.store(5, memory_order_release);
and
atomic_thread_fence(memory_order_acquire);
i = a.load(memory_order_relaxed);
or
a.store(5, memory_order_relaxed);
atomic_thread_fence(memory_order_release);
respectively?
Do non-relaxed atomic accesses provide signal fences as well as thread fences?
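The fences in your versions are on the wrong side of the atomic operations: a release fence has to come before the store, and an acquire fence has to come after the load. You need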
atomic_thread_fence(memory_order_release);
a.store(5, memory_order_relaxed);
and
i = a.load(memory_order_relaxed);
atomic_thread_fence(memory_order_acquire);
to replace
a.store(5, memory_order_release);
and
i = a.load(memory_order_acquire);
As for the second question: yes, non-relaxed atomic accesses provide the ordering of a signal fence as well as that of a thread fence.
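To see why the fence placement matters, here is a minimal message-passing sketch using the fence-plus-relaxed forms. The names `payload`, `producer`, `consumer`, and `run_once` are made up for illustration and are not from the question. The release fence sits between the plain write and the relaxed store, and the acquire fence sits between the relaxed load and the plain read, so the read of `payload` is ordered after the write.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> a{0};
int payload = 0;  // hypothetical plain (non-atomic) data published through `a`

void producer() {
    payload = 42;                                         // plain write
    std::atomic_thread_fence(std::memory_order_release);  // fence BEFORE the store
    a.store(5, std::memory_order_relaxed);
}

int consumer() {
    // Spin until the relaxed load observes the value written by producer().
    while (a.load(std::memory_order_relaxed) != 5) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // fence AFTER the load
    return payload;  // ordered after the write in producer()
}

// Hypothetical helper: runs one producer/consumer pair and returns what
// the consumer read from `payload`.
int run_once() {
    a.store(0, std::memory_order_relaxed);
    payload = 0;
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

Calling `run_once()` from `main` should always return 42. With the fences placed as in the question (acquire fence before the load, release fence after the store), nothing orders the accesses to `payload` relative to the accesses to `a`, and the program would have a data race.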