I am studying this site: https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync, which is very helpful for understanding atomics.
But this example about relaxed mode is hard to understand:
/* Thread 1: */
y.store (20, memory_order_relaxed);
x.store (10, memory_order_relaxed);

/* Thread 2 */
if (x.load (memory_order_relaxed) == 10)
{
    assert (y.load (memory_order_relaxed) == 20);  /* assert A */
    y.store (10, memory_order_relaxed);
}

/* Thread 3 */
if (y.load (memory_order_relaxed) == 10)
    assert (x.load (memory_order_relaxed) == 10);  /* assert B */
To me, assert B should never fail: for y to become 10, thread 2 must already have seen x == 10, so x should also be 10 when thread 3 checks it.
But the website says either assert in this example can actually FAIL.
I find it much easier to understand atomics with some knowledge of what might be causing their behaviour, so here is some background. Note that none of these concepts are stated in the C++ language itself; they are just some of the possible reasons why things are the way they are.
Compilers, when optimizing, will often refactor the program as long as the effects are the same for a single-threaded program. Atomics circumvent this: they tell the compiler (among other things) that the variable might change at any moment and that its value might be read elsewhere.
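As an illustration (the names here are mine, not from the question): with a plain bool, the compiler is allowed to assume nothing inside the loop changes it and may hoist the load out of the loop entirely; with an atomic, every load actually happens.

#include <atomic>

bool plain_ready = false;
std::atomic<bool> atomic_ready{false};

void spin_plain()
{
    // The compiler may read plain_ready once and turn this into an infinite
    // loop: nothing in the loop body changes it, and a concurrent write from
    // another thread would be a data race anyway.
    while (!plain_ready) { }
}

void spin_atomic()
{
    // Every iteration performs a real load; the compiler must assume another
    // thread can change the value at any moment.
    while (!atomic_ready.load(std::memory_order_relaxed)) { }
}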
Formally, atomics guarantee one thing: there will be no data races. That is, concurrent access to the variable is not undefined behaviour; it will not make your computer explode.
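For example (my own minimal sketch): concurrent unsynchronized increments of a plain int are a data race and therefore undefined behaviour, while the same accesses through std::atomic are well defined even with memory_order_relaxed.

#include <atomic>
#include <thread>

int plain_counter = 0;
std::atomic<int> atomic_counter{0};

void work()
{
    for (int i = 0; i < 100000; ++i)
    {
        ++plain_counter;                                         // data race: undefined behaviour
        atomic_counter.fetch_add(1, std::memory_order_relaxed);  // no data race, well defined
    }
}

int main()
{
    std::thread a(work), b(work);
    a.join();
    b.join();
    // atomic_counter ends up at exactly 200000; plain_counter can be anything,
    // and formally the data race makes the whole program's behaviour undefined.
}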
The CPU might also reorder instructions as it executes them, so operations can happen in a different order at the hardware level, independent of how you wrote the program.
Finally there are the effects of caches, which are faster memories that each hold a partial copy of global memory. Caches are not always in sync, meaning they don't always agree on what is "correct". Different threads may not be using the same cache, and because of this they may not agree on what values the variables have.
What the above amounts to is pretty much what C++ says about the matter: unless explicitly stated otherwise, the ordering of the side effects of each instruction is totally and completely unspecified. It might not even be the same when viewed from different threads.
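A concrete illustration of "not the same viewed from different threads" (my own sketch, not from the question): with only relaxed operations, two reader threads are allowed to observe two independent writes in opposite orders.

#include <atomic>

std::atomic<int> x{0}, y{0};

void writer_x() { x.store(1, std::memory_order_relaxed); }
void writer_y() { y.store(1, std::memory_order_relaxed); }

// run by reader thread 1
bool saw_x_first()
{
    int a = x.load(std::memory_order_relaxed);
    int b = y.load(std::memory_order_relaxed);
    return a == 1 && b == 0;   // "x was written, y not yet"
}

// run by reader thread 2
bool saw_y_first()
{
    int c = y.load(std::memory_order_relaxed);
    int d = x.load(std::memory_order_relaxed);
    return c == 1 && d == 0;   // "y was written, x not yet"
}

// With four threads (two writers, two readers), both readers returning true in
// the same run is a permitted outcome: the readers disagree on which store
// happened first. With memory_order_seq_cst everywhere it would be forbidden.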
Formally, the guarantee of an ordering of side effects is called a happens-before relation. Unless one side effect is guaranteed to happen-before another, there is no ordering between them at all. Loosely, we just call this synchronization.
Now, what is memory_order_relaxed? It tells the compiler to stop meddling, but to not worry about how the CPU and cache (and possibly other things) behave. Therefore, one possibility of why you see the "impossible" assert might be:

1. Thread 1 writes 20 to y and then 10 to x, but only to its own cache.
2. Thread 2 happens to see thread 1's write of 10 to x (so its if is taken) and then writes 10 to y to its cache. (If thread 1's write of 20 to y hasn't reached it yet, assert A fails here as well.)
3. Thread 3 sees thread 2's write of 10 to y, but thread 1's write of 10 to x still hasn't reached its cache, so assert B fails.

This might be completely different from what happens in reality; the point is that anything can happen.
To ensure a happens-before relation between the multiple reads and writes, see Brian's answer.
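I won't repeat that answer, but as a minimal sketch (my own, not a quote of it) of one way to get those happens-before edges with release/acquire: each acquire load that sees the released value synchronizes with the release store, so neither assert can fire.

#include <atomic>
#include <cassert>

std::atomic<int> x{0}, y{0};

void thread1()
{
    y.store(20, std::memory_order_relaxed);
    x.store(10, std::memory_order_release);            // publishes the y = 20 above
}

void thread2()
{
    if (x.load(std::memory_order_acquire) == 10)       // synchronizes with thread1
    {
        assert(y.load(std::memory_order_relaxed) == 20);   // cannot fire
        y.store(10, std::memory_order_release);        // publishes everything seen so far
    }
}

void thread3()
{
    if (y.load(std::memory_order_acquire) == 10)       // synchronizes with thread2
        assert(x.load(std::memory_order_relaxed) == 10);   // cannot fire either
}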
Another construct that provides happens-before relations is std::mutex, which is why mutexes are free from such insanities.
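A tiny sketch of that guarantee (my own, with made-up names): unlocking a mutex happens-before the next lock of the same mutex, so anything written under the lock is visible to the next thread that takes it.

#include <mutex>

std::mutex m;
int shared_data = 0;               // plain int, no atomics needed

void producer()
{
    std::lock_guard<std::mutex> lock(m);
    shared_data = 42;              // write under the lock
}                                  // unlock happens-before the next lock of m

void consumer()
{
    std::lock_guard<std::mutex> lock(m);
    int v = shared_data;           // sees 42 if producer's critical section ran first
    (void)v;
}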