c++multithreading concurrency lock-free reentrancy

Could the following code written for a Reentrant Lock be susceptible to an instruction reordering error?

I recently came across the following code while learning about Reentrant Locks in Lock-Free Concurrency:

class ReentrantLock32
 {
  std::atomic<std::size_t> m_atomic;
  std::int32_t m_refCount;

public:
  ReentrantLock32() : m_atomic(0), m_refCount(0) {}

  void Acquire()
   {
    std::hash<std::thread::id> hasher;
    std::size_t tid = hasher(std::this_thread::get_id());

    if (m_atomic.load(std::memory_order_relaxed) != tid)
     {
       std::size_t unlockValue = 0;
       while (!m_atomic.compare_exchange_weak(
        unlockValue,
        tid,
        std::memory_order_relaxed,
        std::memory_order_relaxed))
       {
        unlockValue = 0;
        PAUSE();
       }
      }
      ++m_refCount;
      std::atomic_thread_fence(std::memory_order_acquire);
     }

  void Release() {
   std::atomic_thread_fence(std:memory_order_release);
   std::hash<std::thread::id> hasher;
   std::size_t tid = hasher(std::this_thread::get_id());
   std::size_t actual = m_atomic.load(std::memory_order_relaxed);
   assert(actual == tid);

   --m_refCount;
   if (m_refCount == 0)
    {
     m_atomic.store(0,std::memory_order_relaxed);
    }
 }
//...
}

However, I'm unsure whether the release fence call doesn't preclude the possibility of later memory operations in the thread preceding it and whether the acquire fence precludes the possibility of earlier memory operation succeeding it. If they don't, wouldn't it technically be possible that an optimisation could cause the line

   if (m_refCount == 0)

to be suceeded by a complete and successful call to Acquire() on the same thread before the call to

     m_atomic.store(0,std::memory_order_relaxed);

in which case the valid incrementation in the reordered Acquire() call would be overwritten by the delayed store() call?

When analyzing this code it also occurred to me that there might be stale data issues which lead to duplicate locks which is questioned here.

There is also another related question to clarify the potential order of memory operations for release fence calls here.

Solution

That can't happen.

The situation you mention takes place within a thread. A variable is always sequentially consistent with itself within a thread. (Otherwise it would be impossible to program.)

If, for example, m_atomic.store(0,std::memory_order_relaxed); is stuck in the store buffer, the CPU knows to look there for the load in this line: if (m_atomic.load(std::memory_order_relaxed) != tid)

Regardless of how relaxed a variable is, within a thread, optimizations aren't allowed to change the semantics of the source code. Atomics only exist to provide visibility ordering guarantees to other threads.

BTW, it's not accurate to say this:

the release fence doesn't preclude the possibility of a later operation in the thread preceding it

According to Jeff Preshing:

A release fence prevents the memory reordering of any read or write which precedes it in program order with any write which follows it in program order. https://preshing.com/20130922/acquire-and-release-fences/

This means that in theory C++ acquire/release atomic thread fences are a bit more strict than the common notion of acquire/release memory barriers, by which reordering is allowed 1-way.