My understanding is that a spinlock can be implemented using C++11 atomics with an acquire-CAS on lock and a release-store on unlock, something like this:
#include <atomic>

class SpinLock {
 public:
  void Lock() {
    while (l_.test_and_set(std::memory_order_acquire));
  }
  void Unlock() {
    l_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag l_ = ATOMIC_FLAG_INIT;
};
Consider its use in a function that acquires a lock and then does a blind write to some shared location:
int g_some_int_;

void BlindWrite(int val) {
  static SpinLock lock_;
  lock_.Lock();
  g_some_int_ = val;
  lock_.Unlock();
}
I'm interested in how the compiler is restricted in translating this to generated assembly code.
I understand why the write to g_some_int_ can't migrate past the end of the critical section in the compiler's output: that would mean the write isn't guaranteed to be seen by the next thread that acquires the lock, and that visibility is exactly what the release/acquire ordering guarantees.
But what prevents the compiler from moving it to before the acquire-CAS of the lock flag? I thought that compilers were allowed to re-order writes to different memory locations when generating assembly code. Is there some special rule that prevents a write from being re-ordered to before an atomic store that precedes it in program order?
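To make that concrete, the transformation I'm worried about would look roughly like this, written back as C++ (purely hypothetical; I haven't seen a compiler actually do it, and BlindWriteHoisted is just my name for the transformed version):

// Hypothetical reordering of BlindWrite -- is this allowed?
void BlindWriteHoisted(int val) {
  static SpinLock lock_;
  g_some_int_ = val;  // blind write hoisted above the acquire
  lock_.Lock();       // spins on test_and_set(memory_order_acquire)
  lock_.Unlock();     // clear(memory_order_release)
}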
I'm looking for a language-lawyer answer here, preferably covering std::atomic as well as std::atomic_flag.
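For the std::atomic case, the variant I have in mind is the obvious CAS-based equivalent (just a sketch of what I mean, untested):

class SpinLockCas {
 public:
  void Lock() {
    bool expected = false;
    // Acquire-CAS: retry until the flag goes from false to true.
    while (!l_.compare_exchange_weak(expected, true,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
      expected = false;  // compare_exchange_weak rewrites expected on failure
    }
  }
  void Unlock() {
    l_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> l_{false};
};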
Edit to include something from the comments which maybe asks the question more clearly. The heart of the question is: which part of the standard says that the abstract machine must observe l_ being false before it writes to g_some_int_?
I suspect the answer is either "writes can't be lifted above potentially infinite loops" or "writes can't be lifted above atomic writes". Perhaps it's even "you're wrong that writes can ever be re-ordered at all". But I'm looking for a specific reference in the standard.
Suppose you have two functions that use your spinlock:
SpinLock sl;
int global_int = 0;

int read() {
  sl.Lock();
  int res = global_int;
  sl.Unlock();
  return res;
}

void write(int val) {
  sl.Lock();
  global_int = val;
  sl.Unlock();
}
If two calls to BlindWrite happen concurrently on separate threads, then one (call it A) will acquire the lock; the other (B) will spin in the loop in Lock. A then writes to g_some_int_ and calls Unlock, which contains a call to clear, which is a store-release. The write is sequenced-before the call to clear, since both are in the same thread.
B then wakes in Lock, and this time the test_and_set call returns false. test_and_set is an atomic read-modify-write with acquire semantics, and here it read the value stored by the call to clear, so the call to clear synchronizes-with this call to test_and_set.
The test_and_set call in Lock is an acquire operation and is sequenced-before the write to g_some_int_ in BlindWrite, since it is in the same thread.
Since the first write in thread A is sequenced-before the call to clear, which synchronizes-with the call to test_and_set in thread B, which is in turn sequenced-before the write in thread B, the write in thread A happens-before the write in thread B.
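To summarise the chain (my own annotation of the steps above, with valA and valB standing for the two threads' arguments):

// Thread A:
g_some_int_ = valA;                                   // (1)
l_.clear(std::memory_order_release);                  // (2)  (1) is sequenced-before (2)

// Thread B:
while (l_.test_and_set(std::memory_order_acquire));   // (3)  reads the value stored by (2),
                                                      //      so (2) synchronizes-with (3)
g_some_int_ = valB;                                   // (4)  (3) is sequenced-before (4)

// Hence (1) happens-before (4).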
If the compiler hoisted the write to g_some_int_ above the call to Lock, then it would be possible for the write from thread B to occur before the write in thread A. That would violate the happens-before ordering, so it is not permitted.
In general, this means that the compiler cannot hoist writes (or other memory accesses) above an acquire operation, since doing so may violate the happens-before ordering.
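As a minimal, self-contained way to exercise that guarantee (my own sketch, not from the question, and of course running it proves nothing about what a compiler is permitted to do):

#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
 public:
  void Lock() {
    while (l_.test_and_set(std::memory_order_acquire));
  }
  void Unlock() {
    l_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag l_ = ATOMIC_FLAG_INIT;
};

SpinLock g_lock;
int g_counter = 0;  // plain int, protected only by g_lock

constexpr int kThreads = 4;
constexpr int kIters = 100000;

int main() {
  std::vector<std::thread> threads;
  for (int t = 0; t < kThreads; ++t) {
    threads.emplace_back([] {
      for (int i = 0; i < kIters; ++i) {
        g_lock.Lock();
        ++g_counter;  // read-modify-write inside the critical section
        g_lock.Unlock();
      }
    });
  }
  for (auto& th : threads) th.join();

  // Each increment happens-before the next acquisition that observes the
  // corresponding Unlock(), so no updates are lost.
  assert(g_counter == kThreads * kIters);
}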