Even for a simple 2-thread communication example, I have difficulty expressing this in the C11 atomic and memory-fence style to obtain proper memory ordering:
shared data:

```c
#include <stdatomic.h>

atomic_int flag;        /* must be an atomic type for atomic_*_explicit */
volatile int bucket;
```

producer thread:

```c
while (true) {
    int value = producer_work();
    while (atomic_load_explicit(&flag, memory_order_acquire))
        ; /* busy wait until the consumer has emptied the bucket */
    bucket = value;
    atomic_store_explicit(&flag, 1, memory_order_release);
}
```

consumer thread:

```c
while (true) {
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ; /* busy wait until the producer has filled the bucket */
    int data = bucket;
    atomic_thread_fence(/* memory_order ??? */);
    atomic_store_explicit(&flag, 0, memory_order_release);
    consumer_work(data);
}
```
As far as I understand, the code above would properly order the store-in-bucket -> flag-store -> flag-load -> load-from-bucket. However, I think there remains a race condition between the load-from-bucket and re-writing the bucket with new data. To force an order following the bucket read, I guess I would need an explicit `atomic_thread_fence()` between the bucket read and the following atomic store. Unfortunately, there seems to be no `memory_order` argument that enforces anything on preceding loads, not even `memory_order_seq_cst`.
A really dirty solution could be to re-assign `bucket` in the consumer thread with a dummy value, but that contradicts the consumer's read-only role.
In the older C99/GCC world I could use the traditional `__sync_synchronize()`, which I believe would be strong enough.
What would be a nicer C11-style solution for synchronizing this so-called anti-dependency?

(Of course I am aware that I would be better off avoiding such low-level coding and using the available higher-level constructs, but I would like to understand...)
> To force an order following the bucket read, I guess I would need an explicit `atomic_thread_fence()` between the bucket read and the following atomic store.
I do not believe the `atomic_thread_fence()` call is necessary: the flag update has release semantics, preventing any preceding load or store operations from being reordered across it. See the formal definition by Herb Sutter:

> A write-release executes after all reads and writes by the same thread that precede it in program order.

This should prevent the read of `bucket` from being reordered to occur after the `flag` update, regardless of where the compiler chooses to store `data`.
That brings me to your comment about another answer:
> The `volatile` ensures that there are ld/st operations generated, which can subsequently be ordered with fences. However, `data` is a local variable, not volatile. The compiler will probably put it in a register, avoiding a store operation. That leaves the load from `bucket` to be ordered with the subsequent reset of `flag`.
It would seem that is not an issue if the `bucket` read cannot be reordered past the `flag` write-release, so `volatile` should not be necessary (though it probably doesn't hurt to have it, either). It's also unnecessary because most function calls (in this case, `atomic_store_explicit(&flag)`) serve as compile-time memory barriers: the compiler will not reorder the read of a global variable past a non-inlined function call, because that function could modify the same variable.
I would also agree with @MaximYegorushkin that you could improve your busy-waiting with `pause` instructions when targeting compatible architectures. GCC and ICC both appear to have a `_mm_pause(void)` intrinsic (probably equivalent to `__asm__ ("pause;")`).