Please consider the following synchronization problem:
initially:
version = 0 // atomic variable
data = 0 // normal variable (there could be many)
Thread A:
version++
data = 3
Thread B:
d = data
v = version
assert(d != 3 || v == 1)
Basically, if thread B sees data = 3
then it must also see version++
.
What's the weakest memory order and synchronization we must impose so that the assertion in thread B is always satisfied?
If I understand C++ memory_order correctly, the release-acquire ordering won't do because that guarantees that operations BEFORE version++
, in thread A, will be seen by the operations AFTER v = version
, in thread B.
Acquire and release fences also work in the same directions, but are more general.
As I said, I need the other direction: B sees data = 3
implies B sees version = 1
.
I'm using this "versioning approach" to avoid locks as much as possible in a data structure I'm designing. When I see something has changed, I step back, read the new version
and try again.
I'm trying to keep my code as portable as possible, but I'm targeting x86-64 CPUs.
You might be looking for a SeqLock, as long as your data doesn't include pointers. (If it does, then you might need something more like RCU to protect readers that might load a pointer, stall / sleep for a while, then deref that pointer much later.)
You can use the SeqLock sequence counter as the version number. (version = tmp_counter >> 1
since you need two increments per write of the payload to let readers detect tearing when reading the non-atomic data
. And to make sure they see the data
that goes with this sequence number. Make sure you don't read the atomic counter a 3rd time; use the local tmp that you read it into to verify match before/after copying data
.)
Readers will have to retry if they happen to attempt a read while data
is being modified. But it's non-atomic, so there's no way if thread B sees data = 3
can ever be part of what creates synchronization; it can only be something you see as a result of synchronizing with a version number from the writer.
See:
Implementing 64 bit atomic counter with 32 bit atomics - my attempt at a SeqLock in C++, with lots of comments. It's a bit of a hack because ISO C++'s data-race UB rules are overly strict; a SeqLock relies on detecting possible tearing and not using torn data, rather than avoiding concurrent access entirely. That's fine on a machine without hardware race detection so that doesn't fault (like all real CPUs), but C++ still calls that UB, even with volatile
(although that puts it more into implementation-defined territory). In practice it's fine.
GCC reordering up across load with `memory_order_seq_cst`. Is this allowed? - A GCC bug fixed in 8.1 that could break a seqlock implementation.
If you have multiple writers, you can use the sequence-counter itself as a spinlock for mutual exclusion between writers. e.g. using an atomic_fetch_or
or CAS to attempt to set the low bit to make the counter odd. (tmp = seq.fetch_or(1, std::memory_order_acq_rel);
, hopefully compiling to x86 lock bts
). If it previously didn't have the low bit set, this writer won the race, but if it did then you have to try again.
But with only a single writer, you don't need to RMW the atomic sequence counter, just store new values (ordered with writes to the payload), so you can either keep a local copy of it, or just do a relaxed load of it, and store tmp+1
and tmp+2
.