Do volatile and mutex ensure memory ordering in C++?

Suppose I have two variables:

volatile int a = 0;
int b = 0;

They are shared between two threads. In the first thread I modify these variables in the following order:

a = 1;
b = 2;

In the second thread I do:

while (true) {
    if (b == 2)
        assert(a == 1);
}

Is there a guarantee that the second thread never fails? That is, does the second thread read the values of a and b in the same order in which the first thread wrote them?

As you can see, I made a volatile and b non-volatile. So my question is: does the volatile modifier make any guarantee about the order of memory writes? And if I also make b volatile, will that improve the situation?

Or the only way to guarantee order is to use std::atomic<int> for both a and b?

What about std::mutex? If I protect both variables with a single shared mutex in both threads and use non-volatile variables, will that help with memory ordering? That is, if I do the following (both a and b are non-volatile):

int a = 0, b = 0; // shared
std::mutex m; // shared
// .... In Thread 1 ....
{
    std::unique_lock<std::mutex> l(m);
    a = 1; b = 2;
}
// .... In Thread 2 ....
while (true) {
    std::unique_lock<std::mutex> l(m);
    assert((a == 0 && b == 0) || (a == 1 && b == 2));
}

Does the above solution, using a mutex for the non-volatile variables a and b, guarantee that the assertion never fails, meaning that a and b are either both 0 or both set to their new values 1 and 2? Can it happen that, after the mutex is released, other threads and CPU cores see values other than 1 and 2? For example, if the write to a is delayed, could another core see a equal to 0 and b equal to 2?

I.e. does mutex guarantee memory order and caches propagation between cores? Maybe acquiring/releasing the mutex flushes caches or uses some other memory-ordering-enforcing technique?

Or I have to use std::atomic for all shared variables?

Solution

Is there a guarantee that the second thread never fails? That is, does the second thread read the values of a and b in the same order in which the first thread wrote them?

No, there is no guarantee of anything at all, in fact. Unsynchronized writing of (non-atomic) variables from one thread and reading them from another invokes undefined behavior, meaning that as far as the compiler is concerned, anything can happen, because the program is broken.

So my question is: does the volatile modifier make any guarantee about the order of memory writes?

There are two kinds of re-ordering you have to watch out for when dealing with multiple threads:

1. Re-ordering of instructions at compile-time, by your compiler's optimizer (e.g. it might change your code to b = 2; a = 1; as part of making your program more efficient, as it is allowed to do under the "as-if" rule).
2. On-the-fly re-ordering of memory operations at run-time, by the CPU (out-of-order execution, store buffering and the like), also for performance reasons.

The volatile keyword helps only partially with type (1): the compiler may not re-order or elide volatile accesses relative to other volatile accesses, but it is still free to move accesses to non-volatile variables (such as b in your example) across them. It can't (or at least doesn't) do anything about type (2), and therefore it ends up being insufficient for making multithreaded programs work correctly. Making b volatile as well would stop the compiler from swapping the two writes, but the CPU could still re-order them, and the program would still contain a data race. volatile also gives you no guarantee about when a write becomes visible to another thread. For multithreading, you need stronger magic than volatile can provide (which makes sense, since volatile was never intended to be a multithreading construct; it was intended for simpler use-cases, such as reading memory-mapped device registers).

Or the only way to guarantee order is to use std::atomic for both a and b? What about std::mutex?

Either one of those two approaches should be sufficient to obtain the write-ordering guarantee you are looking for. Only a mutex can provide a more general consistency guarantee, though (see below).
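As a rough illustration of the std::atomic route, here is a minimal sketch of the original two-thread example with both variables made atomic. The function name observe_a_after_b is made up for this sketch, and the default (sequentially consistent) memory order is assumed throughout:

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the question): runs the writer and the
// reader once and returns the value of `a` that the reader observed
// after it had seen b == 2.
int observe_a_after_b() {
    std::atomic<int> a{0};
    std::atomic<int> b{0};
    int seen_a = -1;

    std::thread writer([&] {
        a.store(1);  // default order: std::memory_order_seq_cst
        b.store(2);  // cannot become visible before the store to a
    });

    std::thread reader([&] {
        // Spin until the store to b is visible.
        while (b.load() != 2) {
            std::this_thread::yield();
        }
        // The load above synchronizes with b.store(2), so a == 1 here.
        seen_a = a.load();
    });

    writer.join();
    reader.join();
    return seen_a;  // safe to read: join() synchronizes with thread exit
}
```

Calling observe_a_after_b() should always yield 1, because once the reader has seen b == 2 the earlier store to a is guaranteed to be visible as well.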

does mutex guarantee memory order and caches propagation between cores?

Yes -- as long as every thread locks the mutex before reading from or writing to shared variables (and unlocks the mutex afterwards), then every thread will see the shared variables in a coherent/consistent state. In C++ terms, unlocking a mutex synchronizes with the next lock of that same mutex, so everything written before the unlock is visible to the thread that acquires the lock next. Memory-order and cache-update-propagation issues will all be handled for you by the mutex implementation (assuming the mutex implementation isn't buggy, which is a reliable assumption these days).
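To make that concrete, here is a small sketch of the mutex variant from the question wrapped into a function that can be run and checked. The name run_mutex_check and the iterations parameter are inventions for this example, and the reader loop is bounded so that it terminates:

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: returns true if the reader only ever observed
// (a, b) == (0, 0) or (a, b) == (1, 2) while holding the mutex.
bool run_mutex_check(int iterations) {
    int a = 0, b = 0;  // plain ints, protected by m
    std::mutex m;
    bool consistent = true;

    std::thread writer([&] {
        std::lock_guard<std::mutex> lock(m);
        a = 1;
        b = 2;
    });

    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<std::mutex> lock(m);
            // Both writes happen inside one critical section, so a
            // half-updated state can never be seen from here.
            if (!((a == 0 && b == 0) || (a == 1 && b == 2))) {
                consistent = false;
            }
        }
    });

    writer.join();
    reader.join();
    return consistent;
}
```

For instance, run_mutex_check(100000) is expected to return true no matter how the two threads are scheduled.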

Or I have to use std::atomic for all shared variables?

std::atomic can work, although it only guarantees atomicity and ordering for the individual accesses; it can't help you if you also need non-trivial consistency guarantees across several variables. For example, if thread A needs to set two or more variables, and you need to guarantee that thread B either "sees" all of them set, or sees none of them set (and never sees an interim state where only some of them are set), then you'll need to use a mutex instead.
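The interim state is easy to picture with a single-threaded sketch that simply places the "reader" between the writer's two stores, which is exactly the window a real second thread could land in. The function name interim_state_demo is hypothetical, and the interleaving is simulated rather than produced by real threads:

```cpp
#include <atomic>
#include <utility>

// Hypothetical demo: simulates a reader that happens to run between the
// two atomic stores of a writer, and returns the (a, b) pair it saw.
std::pair<int, int> interim_state_demo() {
    std::atomic<int> a{0};
    std::atomic<int> b{0};

    a.store(1);  // writer: first store

    // Reader scheduled here: each load is atomic, but the pair is not.
    std::pair<int, int> seen{a.load(), b.load()};

    b.store(2);  // writer: second store

    return seen;  // the half-updated state: a already set, b not yet
}
```

Each access is perfectly well ordered, yet the reader still observes a set and b not set. Putting both stores and both loads inside one mutex-protected critical section closes that window.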