My understanding is that a spinlock can be implemented using C++11 atomics with an acquire-CAS on lock and a release-store on unlock, something like this:
#include <atomic>

class SpinLock {
 public:
  void Lock() {
    while (l_.test_and_set(std::memory_order_acquire));
  }
  void Unlock() {
    l_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag l_ = ATOMIC_FLAG_INIT;
};
Consider its use in a function that acquires a lock and then does a blind write to some shared location:
int g_some_int_;

void BlindWrite(int val) {
  static SpinLock lock_;
  lock_.Lock();
  g_some_int_ = val;
  lock_.Unlock();
}
I'm interested in how the compiler is restricted in translating this to generated assembly code.
I understand why the write to g_some_int_ can't migrate past the end of the critical section in the compiler's output: that would mean the write isn't guaranteed to be seen by the next thread that acquires the lock, and that visibility is exactly what the release/acquire ordering guarantees.
But what prevents the compiler from moving it to before the acquire-CAS of the lock flag? I thought that compilers were allowed to re-order writes to different memory locations when generating assembly code. Is there some special rule that prevents a write from being re-ordered to before an atomic store that precedes it in program order?
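To make that concrete, the transformation I'm worried about would look roughly like this, written back as C++ (purely hypothetical; I haven't seen a compiler actually do it, and BlindWriteHoisted is just my name for the transformed version):

// Hypothetical reordering of BlindWrite -- is this allowed?
void BlindWriteHoisted(int val) {
  static SpinLock lock_;
  g_some_int_ = val;  // blind write hoisted above the acquire
  lock_.Lock();       // spins on test_and_set(memory_order_acquire)
  lock_.Unlock();     // clear(memory_order_release)
}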
I'm looking for a language-lawyer answer here, preferably covering std::atomic as well as std::atomic_flag.
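For the std::atomic case, the variant I have in mind is the obvious CAS-based equivalent (just a sketch of what I mean, untested):

class SpinLockCas {
 public:
  void Lock() {
    bool expected = false;
    // Acquire-CAS: retry until the flag goes from false to true.
    while (!l_.compare_exchange_weak(expected, true,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
      expected = false;  // compare_exchange_weak rewrites expected on failure
    }
  }
  void Unlock() {
    l_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> l_{false};
};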
Edit to include something from the comments which maybe asks the question more clearly. The heart of the question is: which part of the standard says that the abstract machine must observe l_ being false before it writes to g_some_int_?
I suspect the answer is either "writes can't be lifted above potentially infinite loops" or "writes can't be lifted above atomic writes". Perhaps it's even "you're wrong that writes can ever be re-ordered at all". But I'm looking for a specific reference in the standard.
Suppose you have two functions that use your spinlock:
SpinLock sl;
int global_int = 0;

int read() {
  sl.Lock();
  int res = global_int;
  sl.Unlock();
  return res;
}

void write(int val) {
  sl.Lock();
  global_int = val;
  sl.Unlock();
}
If two calls to BlindWrite happen concurrently on separate threads, then one (call it A) will acquire the lock; the other (B) will spin in the loop in Lock. A then writes to g_some_int_ and calls Unlock, which contains a call to clear, which is a store-release. The write is sequenced-before the call to clear, since both are in the same thread.
B then wakes in Lock, and this time the test_and_set call returns false. test_and_set is an atomic read-modify-write with acquire semantics, and here it read the value stored by the call to clear, so the call to clear synchronizes-with this call to test_and_set.
The test_and_set call in Lock is an acquire operation and is sequenced-before the write to g_some_int_ in BlindWrite, since it is in the same thread.
Since the first write in thread A is sequenced-before the call to clear, which synchronizes-with the call to test_and_set in thread B, which is in turn sequenced-before the write in thread B, the write in thread A happens-before the write in thread B.
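To summarise the chain (my own annotation of the steps above, with valA and valB standing for the two threads' arguments):

// Thread A:
g_some_int_ = valA;                                   // (1)
l_.clear(std::memory_order_release);                  // (2)  (1) is sequenced-before (2)

// Thread B:
while (l_.test_and_set(std::memory_order_acquire));   // (3)  reads the value stored by (2),
                                                      //      so (2) synchronizes-with (3)
g_some_int_ = valB;                                   // (4)  (3) is sequenced-before (4)

// Hence (1) happens-before (4).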
If the compiler hoisted the write to g_some_int_ above the call to Lock, then it would be possible for the write from thread B to occur before the write in thread A. That would violate the happens-before ordering, so it is not permitted.
In general, this means that the compiler cannot hoist writes (or other memory accesses) above an acquire operation, since doing so may violate the happens-before ordering.
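As a minimal, self-contained way to exercise that guarantee (my own sketch, not from the question, and of course running it proves nothing about what a compiler is permitted to do):

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
 public:
  void Lock() {
    while (l_.test_and_set(std::memory_order_acquire));
  }
  void Unlock() {
    l_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag l_ = ATOMIC_FLAG_INIT;
};

SpinLock g_lock;
int g_counter = 0;  // plain int, protected only by g_lock

constexpr int kThreads = 4;
constexpr int kIters = 100000;

int main() {
  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t) {
    threads.emplace_back([] {
      for (int i = 0; i < kIters; ++i) {
        g_lock.Lock();
        ++g_counter;  // read-modify-write inside the critical section
        g_lock.Unlock();
      }
    });
  }
  for (auto& th : threads) th.join();

  // Each increment happens-before the next acquisition that observes the
  // corresponding Unlock(), so no updates are lost.
  assert(g_counter == kThreads * kIters);
}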