
Revisiting `volatile` in C while multithreading


I have read many questions about this topic here, but still feel like the point is being missed by many answers I read.

The Question: Should variables shared between threads in pure C be marked volatile?

Disclaimer: I very much understand that volatile does not mean atomic. That is, just because my variables are marked volatile doesn't mean I don't need to worry about race conditions.

The Point: If I have a variable A which is made atomic via the use of a mutex, shouldn't A still be marked volatile to prevent the compiler from optimizing out reads to A?

Consider the following example:

mutex_t m;
static int flag = 0;

// Thread 1
void writer(void) {
  lock(m);
  flag = 1;
  unlock(m);
}

// Thread 2
void reader(void) {
  int read = 0;
  while (1) {
    lock(m);
    read = flag;
    unlock(m);

    if (read) exit(0);
  }
}

Given the functions writer and reader are executing in different threads, wouldn't it be possible for the compiler to optimize out the repeated reads to flag inside the reader function?

I feel like flag should be marked volatile here, shouldn't it?

Final Disclaimer: I understand that there probably exists a nice atomic/thread-safe type which could be used for small integral values like flag in this example. However, I am more asking this question for the situation where the data shared between two threads is a large struct.


Solution

  • The Question: Should variables shared between threads in pure C be marked volatile?

    It depends. volatile has multiple purposes:

    • Either it can be used to force the compiler to generate code that actually performs every read and write of the variable in memory, as is mandatory in the case of hardware registers (a minimal sketch follows right after this list).
    • Or it can be used to prevent undesired compiler optimizations from happening, including code getting omitted or re-ordered.
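
    A minimal sketch of the first purpose; the register name and address are made up for illustration:

    #include <stdint.h>

    /* Hypothetical memory-mapped UART status register at a made-up address.
       volatile forces every read/write to actually touch the hardware
       instead of a value cached in a CPU register. */
    #define UART_STATUS (*(volatile uint32_t *)0x4000A000u)

    void wait_for_tx_ready(void)
    {
        while ((UART_STATUS & 0x1u) == 0)   /* re-read the register each lap */
            ;
    }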

    Now the reason why most questions on SO regarding this topic are so-so is that the frequent question "should volatile be used in multi-threading?" is actually not a good question. Answerers start to fill in the blanks, and their knee-jerk reaction is to think: "Aha, they are asking if volatile should be used for thread synchronization, bad idea!". People have gone as far as writing papers about how volatile does not do that.

    But if we don't ask the broad question "should volatile be used in multi-threading?" but specify what actual problem we believe volatile would solve, then there are 3 entirely different and separate issues:

    1. Should volatile be used for thread synchronization/preventing race conditions?
    2. Should volatile be used to prevent incorrect optimizations done by the compiler, for example when it doesn't realize that a callback function may be called from outside the program?
    3. Should volatile be used to prevent out-of-order execution and instruction re-ordering, acting as a memory barrier?

    Disclaimer: I very much understand that volatile does not mean atomic. That is, just because my variables are marked volatile doesn't mean I don't need to worry about race conditions.

    Well, that pretty much answers 1), which everyone spends way too much energy arguing about, I don't know why. volatile does indeed not mean atomic access. In fact it might mean the opposite, if it forces what would otherwise be a single-instruction access to a CPU register to get boiled down to multiple instructions accessing a memory location. And so volatile alone doesn't do a thing to prevent race conditions, and I'm not even sure anyone ever claimed as much, so having that debate yet again is beating a dead horse.
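
    To make that concrete, here is a minimal sketch (thread creation omitted) of the classic lost-update race that volatile does nothing to prevent:

    volatile long counter = 0;      /* volatile, but not atomic */

    void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            counter++;              /* load, add, store: another thread can
                                       interleave between those steps */
        return NULL;
    }

    /* With two threads running worker(), the final counter is typically
       well below 200000 despite the volatile qualifier. */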

    The Point: If I have a variable A which is made atomic via the use of a mutex, shouldn't A still be marked volatile to prevent the compiler from optimizing out reads to A?

    There's a slight difference between re-entrant and thread-safe. Re-entrant means that the code is intrinsically safe against race conditions, for example by not accessing shared resources or only accessing them atomically. Thread-safe means that shared resources are protected by a mutex or equivalent means.

    A mutex doesn't make the access atomic. It just blocks other threads using the same mutex from progressing while your thread is holding it. For a mutex to work it must also come with memory barrier behavior, preventing out-of-order execution and instruction re-ordering.

    You may or may not get a "poor man's memory barrier" of sorts when an external function like a mutex lock is called, because the compiler cannot resolve the call and therefore can't assume it has no side effects. But that only prevents re-ordering across the function call; it doesn't help against optimizations on the static variable, which obviously cannot be changed by the external function, hence "poor man's memory barrier". Without static, an external function may update any file-scope variable, so it would have to be reloaded from memory after the call.

    Now volatile could supposedly be the answer to prevent re-ordering/out-of-order execution, if the mutex access doesn't already guarantee that (it probably does). Reading the C standard strictly, volatile should (arguably) act as a memory barrier preventing any form of re-ordering across a volatile access. However, not all compilers necessarily agree with that - details here: Is the C compiler allowed to do out-of-order optimization around an inlined function?
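
    If you want a re-ordering guarantee you don't have to argue about, C11 offers explicit fences in <stdatomic.h>. A sketch of a release-style publish (the names shared_data/ready/publish are made up for illustration):

    #include <stdatomic.h>

    static int shared_data;
    static atomic_int ready;

    void publish(int value)
    {
        shared_data = value;
        /* Explicit barrier: the store above may not be re-ordered past it. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&ready, 1, memory_order_relaxed);
        /* The reading thread pairs this with an acquire fence (or acquire
           load of ready) before it reads shared_data. */
    }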

    Given the functions writer and reader are executing in different threads, wouldn't it be possible for the compiler to optimize out the repeated reads to flag inside the reader function?

    Indeed, that is perfectly possible and even likely in your example - since the variable is static it won't be updated from outside the translation unit and the compiler could therefore start making assumptions about when the variable is updated or not.
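
    To make the concern concrete, a hand-written sketch of the kind of hoisting the compiler would be allowed to do if it can prove that nothing inside the loop modifies the static flag:

    /* Hypothetical result of the optimization: the load of flag is hoisted
       out of the loop, so the loop spins forever if flag was 0 on entry. */
    void reader_as_optimized(void)
    {
        int read = flag;        /* single load */
        while (1) {
            if (read) exit(0);
        }
    }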

    But what it boils down to is whether the compiler is aware that reader and writer are callback functions, and whether it treats such callback functions as "could be executed at any time". "PC-like" compilers usually have such awareness when dealing with the common thread libraries. And threads are most often encountered in PC/hosted systems too. If the compiler is aware that a callback function may be called from something outside the program, then it doesn't need you to mark the variable volatile to tell it that the variable may be updated from something outside the program - that's the same thing.

    But various embedded systems compilers often don't have such awareness. And in the context of embedded systems we are more likely to either deal with low-level interrupts or higher level processes (RTOS). In such situations volatile is often required on shared variables to prevent incorrect optimizations. _Atomic etc wouldn't prevent that, it would only prevent race condition bugs.
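
    A typical bare-metal example where volatile is the right tool (the ISR name is made up, vendor headers omitted):

    static volatile int rx_done = 0;   /* written by the interrupt handler */

    void UART_IRQHandler(void)         /* hypothetical interrupt service routine */
    {
        rx_done = 1;
    }

    int main(void)
    {
        while (!rx_done)               /* without volatile, this load may be
                                          hoisted out and the loop never exits */
            ;
        /* handle the received data ... */
    }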

    So whether the variable needs to be volatile or not depends on the system, compiler and libraries. C is so low-level that it isn't possible to give a universal answer. Most (all?) PC systems with common thread implementations (pthreads, CreateThread etc) guarantee that you don't need volatile. RTOS implementations may or may not do so too. Bare metal embedded systems will usually not make any guarantees at all.

    Making it volatile when it doesn't need to be will hurt performance, but it will not cause bugs. Skipping it may cause bugs on certain systems.
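
    For completeness, a sketch of the portable C11 alternative that the question's final disclaimer alludes to, for a small flag like this one (a large shared struct would still want the mutex):

    #include <stdatomic.h>
    #include <stdlib.h>

    static atomic_int flag;            /* no volatile, no mutex needed here */

    void writer(void)
    {
        atomic_store(&flag, 1);        /* visible to other threads */
    }

    void reader(void)
    {
        while (!atomic_load(&flag))    /* the load cannot be optimized away
                                          or hoisted out of the loop */
            ;
        exit(0);
    }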