c parallel-processing pthreads volatile memory-barriers

Using a flag to communicate between threads

There are many debates on the Internet about the use of the volatile keyword in parallel programming, sometimes with contradictory arguments.

One of the more trustworthy discussions of this topic seems to be this article by Arch Robison. The example he uses is the task of passing a value from one thread to another:

Thread 1 computes a matrix product and gives it to Thread 2, which does something else with it. The matrix is the variable M and the flag is a volatile pointer R.

Thread 1 computes a matrix product M and atomically sets R to point to M.

Thread 2 waits until R!=NULL and then uses M as a factor to compute another matrix product.

In other words, M is a message and R is a ready flag.

The author claims that while declaring R as volatile will solve the problem of propagating the change from Thread 1 to Thread 2, it makes no guarantees about what the value of M will be when this happens, because the assignments to R and M can be reordered. So we need to either make both M and R volatile or use a synchronization mechanism from a library such as pthreads.

My question is how to do the following in C:

1) How to share a single flag between two threads: how to assign to it atomically, make sure the other thread will see the change, and test for the change in the other thread. Is the use of volatile legitimate in this case? Or can some library provide a conceptually better or faster way, probably involving memory barriers?

2) How to do Robison's example correctly, that is, how to send the matrix M from one thread to the other safely (and preferably portably, with pthreads).

Solution

volatile gives you no ordering guarantees with respect to other (non-volatile) memory accesses. At compile time (and at run time on a weakly-ordered ISA), it's similar to _Atomic with memory_order_relaxed, assuming the variable is small enough and aligned enough to be naturally atomic.

Of course with a bool only 1 byte of it ever changes, so seeing anything other than 0 or 1 is impossible.

At run time on strongly-ordered x86, asm loads/stores have acq/rel ordering, so if the compiler happens not to reorder the volatile access at compile time, the code is "safe" for that build, but nothing guarantees that.

See also: When to use volatile with multi threading? (Never: use atomic with memory_order_relaxed if that's what you want.)
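As a rough sketch of a case where relaxed ordering is enough, consider a stop flag that publishes no other data: the worker only needs to notice the change eventually. The names stop_requested, worker and run_stop_demo are made up for this illustration, and it assumes a C11 compiler with pthreads.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical names for illustration only.
static atomic_bool stop_requested = false;

// The worker loops until it is asked to stop. Relaxed ordering is enough
// because no other shared data is published through this flag.
static void *worker(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&stop_requested, memory_order_relaxed)) {
        // do one unit of thread-private work here
    }
    return NULL;
}

// Starts the worker, requests a stop, and waits for the worker to exit.
// Returns 0 on success, -1 if the thread could not be created or joined.
int run_stop_demo(void) {
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    atomic_store_explicit(&stop_requested, true, memory_order_relaxed);
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return 0;
}
```

Calling run_stop_demo() from main should return once the worker has seen the flag. If the flag also had to signal that other data is ready, relaxed ordering would no longer be sufficient.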

For a "data ready" flag, you actually need release / acquire semantics. https://preshing.com/20120913/acquire-and-release-semantics/
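Applied to the setup in the question, where R is a pointer that is NULL until M is complete, a minimal sketch might look like the following. The matrix struct, the names thread1, thread2 and run_handoff, the size N = 4 and the fill values are all assumptions made for illustration. The fill loop stands in for the real matrix product.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N 4   // assumed matrix size, chosen only for this sketch

typedef struct { float v[N][N]; } matrix;   // hypothetical wrapper type

static matrix M;                // the message
static _Atomic(matrix *) R;     // the ready flag: NULL until M is complete

// Thread 1: fill M with plain stores, then publish it with a release store.
static void *thread1(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            M.v[i][j] = (float)(i * N + j);   // stand-in for the matrix product
    atomic_store_explicit(&R, &M, memory_order_release);
    return NULL;
}

// Thread 2: spin until R is non-NULL. The acquire load pairs with the release
// store, so every write to M made before that store is visible afterwards.
static void *thread2(void *arg) {
    float *sum = arg;
    matrix *m;
    while ((m = atomic_load_explicit(&R, memory_order_acquire)) == NULL) {
        // busy-wait; a real program might yield or block instead
    }
    *sum = 0.0f;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            *sum += m->v[i][j];
    return NULL;
}

// Runs both threads once and returns the sum that thread 2 computed from M.
float run_handoff(void) {
    pthread_t t1, t2;
    float sum = -1.0f;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0)
        return -1.0f;
    if (pthread_create(&t2, NULL, thread2, &sum) == 0)
        pthread_join(t2, NULL);
    pthread_join(t1, NULL);
    return sum;   // stays -1.0f if thread 2 could not be started
}
```

Neither M nor its elements need to be volatile or atomic here: the release/acquire pair on R is what orders the plain accesses to M.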

How to share a single flag between two threads - How to atomically assign to it, make sure the other thread will see the change and test for the change in the other thread.

#include <stdatomic.h>
#include <stdbool.h>    // bool, true, false
#define N 64            // matrix dimension (example value)
// shared:
_Atomic bool data_ready = false;
float shared_matrix[N][N];

In the producer:

   write_matrix( &shared_matrix );  // loop that fills a buffer
   atomic_store_explicit(&data_ready, true, memory_order_release);
   // like data_ready = true, but with only release ordering instead of seq_cst, for efficiency

In the consumer:

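#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>   // _mm_pause
#endif

void consumer(void) {
   while (!atomic_load_explicit(&data_ready, memory_order_acquire)) {
#if defined(__x86_64__) || defined(__i386__)
       _mm_pause();   // x86 hint that this is a spin-wait loop
#endif
   }
   // now safe to read the matrix
}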
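If C11 atomics are not available, or if the consumer should sleep instead of spinning, the same hand-off can be sketched with pthreads primitives alone. This is one possible arrangement with a mutex and a condition variable, not the only one. The names lock, ready_cv, matrix_ready, cv_producer, cv_consumer and run_condvar_handoff, the size N = 4 and the fill value are assumptions for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

#define N 4   // assumed matrix size, chosen only for this sketch

static float cv_matrix[N][N];
static bool matrix_ready = false;   // plain bool: only accessed under the mutex
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;

// Producer: fill the matrix, then set the flag under the mutex and signal.
static void *cv_producer(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            cv_matrix[i][j] = 1.0f;   // stand-in for the real computation
    pthread_mutex_lock(&lock);
    matrix_ready = true;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&lock);
    return NULL;
}

// Consumer: sleep until the flag is set, then read the matrix.
static void *cv_consumer(void *arg) {
    float *sum = arg;
    pthread_mutex_lock(&lock);
    while (!matrix_ready)                      // loop guards against spurious wakeups
        pthread_cond_wait(&ready_cv, &lock);
    pthread_mutex_unlock(&lock);
    *sum = 0.0f;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            *sum += cv_matrix[i][j];
    return NULL;
}

// Runs both threads once and returns the sum computed by the consumer.
float run_condvar_handoff(void) {
    pthread_t prod, cons;
    float sum = -1.0f;
    if (pthread_create(&prod, NULL, cv_producer, NULL) != 0)
        return -1.0f;
    if (pthread_create(&cons, NULL, cv_consumer, &sum) == 0)
        pthread_join(cons, NULL);
    pthread_join(prod, NULL);
    return sum;   // stays -1.0f if the consumer could not be started
}
```

The mutex unlock in the producer and the lock in the consumer provide the ordering that the release store and acquire load provide in the atomic version, at the cost of a possible system call when the consumer has to sleep.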