Tags: c++, multithreading, compiler-construction, thread-safety, volatile

Thread safe access to shared data - read/write actually happens and no reordering takes place


From here: https://stackoverflow.com/a/2485177/462608

For thread-safe accesses to shared data, we need a guarantee that

  • the read/write actually happens (that the compiler won't just store the value in a register instead and defer updating main memory until much later)
  • no reordering takes place.

Assume that we use a volatile variable as a flag to indicate whether or not some data is ready to be read. In our code, we simply set the flag after preparing the data, so all looks fine. But what if the instructions are reordered so the flag is set first?

  • In which cases does the compiler store the value in a register and defer updating main memory? [with respect to the above quote]
  • What is the "re-ordering" that the above quote is talking about? In what cases does it happen?

Solution

  • Q: In which cases does the compiler store the value in a register and defer updating main memory?

    A: (This is a broad and open-ended question which is perhaps not very well suited to the Stack Overflow format.) The short answer is: whenever the semantics of the source language (C++, per your tags) allow it and the compiler thinks it's profitable.
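    As a sketch of this, consider a spin-wait on a flag. With a plain `bool`, the compiler is allowed to load the flag into a register once and never re-read memory, so the loop may never terminate under optimization; `std::atomic` forbids that caching. (The variable names here are illustrative, not from the original question.)

    ```cpp
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    // With a plain `bool running` this would be a data race, and the compiler
    // could hoist the load out of the loop (keeping the value in a register),
    // turning it into an infinite spin. std::atomic forces a real memory read.
    std::atomic<bool> running{true};

    int main() {
        std::thread worker([] {
            while (running.load(std::memory_order_relaxed)) {
                // spin until the main thread clears the flag
            }
            std::cout << "worker saw the store\n";
        });
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        running.store(false, std::memory_order_relaxed); // actually reaches memory
        worker.join();
    }
    ```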

    Q: What is the "re-ordering" that the above quote is talking about?

    A: That the compiler and/or CPU issues load and store instructions in an order different from the one dictated by a 1-to-1 translation of the original program source.

    Q: In what cases does it happen?

    A: For the compiler, similarly to the answer of the first question: anytime the original program semantics allow it and the compiler thinks it's profitable. For the CPU it's similar: depending on the architecture's memory model, the CPU can typically reorder memory accesses as long as the original (single-threaded!) result is identical. For instance, both the compiler and the CPU may try to hoist loads as early as possible, since load latency is often critical for performance.
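    The data/flag scenario from the quote can be made safe with C++11 release/acquire ordering, which prohibits exactly the reordering described (the data store sinking below the flag store, or the data load hoisting above the flag load). A minimal sketch, with illustrative names:

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;                    // the payload being published
    std::atomic<bool> ready{false};  // the flag from the quoted example

    void producer() {
        data = 42;                                    // prepare the data first
        ready.store(true, std::memory_order_release); // the data store may not be
                                                      // reordered after this store
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {
            // spin; the acquire load forbids hoisting the read of `data` above it
        }
        assert(data == 42); // guaranteed by the release/acquire pairing
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }
    ```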

    In order to enforce stricter ordering, e.g. for implementing synchronization primitives, CPUs offer various atomic and/or fence instructions, and compilers may, depending on the compiler and source language, provide ways to prohibit reordering.
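    In C++ the portable spelling of such a fence is `std::atomic_thread_fence`. The sketch below shows the standard fence idiom: a release fence before a relaxed store of the flag, paired with an acquire fence after the relaxed load (names are illustrative):

    ```cpp
    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;
    std::atomic<bool> flag{false};

    void writer() {
        payload = 7;
        std::atomic_thread_fence(std::memory_order_release); // orders the payload
                                                             // store before the flag store
        flag.store(true, std::memory_order_relaxed);
    }

    void reader() {
        while (!flag.load(std::memory_order_relaxed)) {
            // spin until the flag is observed
        }
        std::atomic_thread_fence(std::memory_order_acquire); // orders the flag load
                                                             // before the payload load
        assert(payload == 7); // guaranteed by the paired fences
    }

    int main() {
        std::thread a(writer), b(reader);
        a.join();
        b.join();
    }
    ```

    The release fence / relaxed store pairing is equivalent in effect to a release store here, but fences can also cover several relaxed accesses at once, which is why synchronization primitives often use them.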