Tags: c++, concurrency, lock-free, reference-counting

Why don't these lock-free reference counting implementations explode (or do they)?


I'm trying to wrap my head around LFRC (lock-free reference counting) algorithms, but I keep finding examples that I assume work without being able to understand why. I'm focused on copying and deletion. There are three implementations in particular that all follow the same general pattern (in pseudo-code):

Copy {
  shared = source->shared;
  if (shared != null) {
     Atomically_Increment_Counter(shared);
  }
}

Delete {
  if (shared != null) {
     oldcounter = Atomically_Decrement_Counter(shared);
     if (oldcounter == value signifying all references released) {
        Deallocate(shared);
     }
  }
}

The first is from the 2004 paper Lock-Free Reference Counting by David L. Detlefs, figure 2, page 8 (edited for formatting):

void LFRCCopy(SNode **v, SNode *w) {
   SNode *oldv = *v;
   if (w != null) 
      add_to_rc(w,1);
   *v = w;
   LFRCDestroy(oldv);
}

void LFRCDestroy(SNode *v) {
   if (v != null && add_to_rc(v, -1)==1) {
     LFRCDestroy(v->L);
     LFRCDestroy(v->R);
     delete v;
   }
}

Where add_to_rc is an atomic fetch-and-add that returns the counter's previous value (which is what makes the == 1 test in LFRCDestroy meaningful).
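
For clarity, here is how I read add_to_rc in std::atomic terms (my own sketch, not code from the paper; the SNode layout here is assumed):

#include <atomic>

struct SNode {
    std::atomic<long> rc{1};          // reference count (field name assumed for illustration)
    SNode *L = nullptr, *R = nullptr;
};

// Returns the value the counter held *before* the addition,
// which is what makes the == 1 test in LFRCDestroy work.
long add_to_rc(SNode* w, long delta) {
    return w->rc.fetch_add(delta, std::memory_order_acq_rel);
}

The second is from the Mat implementation in OpenCV 4 (edited for brevity):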

// Line 405
Mat::Mat(const Mat& m)
    : flags(m.flags), dims(m.dims), rows(m.rows), cols(m.cols), data(m.data),
      datastart(m.datastart), dataend(m.dataend), datalimit(m.datalimit), allocator(m.allocator),
      u(m.u), size(&rows), step(0)
{
    if( u )
        CV_XADD(&u->refcount, 1);
    ...
}

// Line 551
void Mat::release()
{
    if( u && CV_XADD(&u->refcount, -1) == 1 )
        deallocate();
    u = NULL;
    ...
}

Where CV_XADD is likewise an atomic fetch-and-add returning the previous value. And the third is from the std::shared_ptr implementation that ships with libstdc++ in the most recent version of GCC (10.2) (this implementation is very complex and I've edited it down a lot for brevity):

class _Sp_counted_base {

    void _M_add_ref_copy()
    { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }

    // Line 161 for full implementation
    void _M_release() noexcept
    {
       _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
       if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
       {
          _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
          _M_dispose();
          if (_Mutex_base<_Lp>::_S_need_barriers)
             __atomic_thread_fence (__ATOMIC_ACQ_REL);
          // editors note: weak pointer handling removed for brevity
          _M_destroy();
       }
    }

}

class __shared_count {

    __shared_count(const __shared_count& __r) noexcept 
        : _M_pi(__r._M_pi)
    {
        if (_M_pi != nullptr)
           _M_pi->_M_add_ref_copy();
    }
    
    ~__shared_count() noexcept
    {
        if (_M_pi != nullptr)
           _M_pi->_M_release();
    }

    _Sp_counted_base* _M_pi;

}

class __shared_ptr {

    __shared_ptr(const __shared_ptr&) noexcept = default;

    ~__shared_ptr() = default;

    element_type*   _M_ptr;         // Contained pointer.
    __shared_count  _M_refcount;    // Reference counter.

}

Some explanation of that last one since it's kind of indirect:

  • Copy: __shared_ptr default copy constructor copies _M_refcount, whose copy constructor shares the same _Sp_counted_base as the original, and atomically increments the reference counter.
  • Delete: __shared_ptr default destructor destroys _M_refcount, whose destructor calls _M_release on the _Sp_counted_base, which atomically decrements and possibly deallocates the reference counter.
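
In other words, as I understand the mechanics (my own single-threaded illustration, nothing more):

#include <cassert>
#include <memory>

int main() {
    auto p = std::make_shared<int>(42);   // use_count == 1
    {
        std::shared_ptr<int> q = p;       // copy: atomic increment -> use_count == 2
        assert(p.use_count() == 2);
    }                                     // q destroyed: atomic decrement -> use_count == 1
    assert(p.use_count() == 1);
}                                         // p destroyed: count reaches 0, the int is freed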

Anyway, all this boils down to one question, which is the crux of my failure to understand why these work (even that old Dr. Dobb's article and the related questions here on SO [I feel like the answer might be here but I can't see it] raised more questions than answers for me):

In the copy operation, what prevents a race condition where another thread performs the delete operation on the last remaining reference (possibly via another view of the shared object) and deallocates the object between the start of the copy operation and the atomic counter increment -- thus, from what I (mis)understand, causing copy to increment a counter in deallocated memory and crash or dump core everywhere?

That is, referring to the OpenCV implementation above (since examining it is actually what started this whole mini research project):

Mat::Mat(const Mat& m)
    : flags(m.flags), dims(m.dims), rows(m.rows), cols(m.cols), data(m.data),
      datastart(m.datastart), dataend(m.dataend), datalimit(m.datalimit), allocator(m.allocator),
      u(m.u),   // <--- START OF DANGER ZONE 
      size(&rows), step(0)
{
    if( u )     
                // <--- ALMOST AT END OF DANGER ZONE 
        CV_XADD(&u->refcount, 1);
    ...
}

void Mat::release()
{
    if( u && CV_XADD(&u->refcount, -1) == 1 )
        deallocate();
    u = NULL;
    ...
}

What keeps another thread from releasing the last reference via m in the marked "danger zone" in the copy constructor, thus leaving u non-null but pointing to post-deallocation garbage?

What am I missing here? I feel like it might be obvious and maybe I've been awake too long. In particular, the fact that the libstdc++ implementation also seems to follow this pattern gives it a ton of street cred; I just can't understand the details.


Solution

  • Short answer, you are tired!

    This is the classic 'reference counting model'. When an object is created, its reference count is initialised (classically to 1). The creating thread owns that reference. A thread may increment the count when it creates a new reference and decrement it when it releases one, and when the count eventually reaches 0, no references exist and the object is deleted. That's all single-threaded, but your question is about the multi-threaded case.
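
    In code, a minimal sketch of that model (the names here are illustrative, not taken from any of the libraries you quoted):

    #include <atomic>

    // Hypothetical control block, for illustration only.
    struct Control {
        std::atomic<int> refcount{1};              // the creator owns the initial reference
        // ... the managed payload lives with or behind this block ...
    };

    void acquire(Control* c) {                     // caller must already hold a reference
        c->refcount.fetch_add(1, std::memory_order_relaxed);
    }

    void release(Control* c) {                     // caller gives up a reference it holds
        if (c->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete c;                              // this thread held the last reference
    }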

    There are two additional rules for the multi-threaded case.

    1. A thread must not attempt to decrement the counter unless it already holds a reference.
    2. Also (and this is the key one), a thread must not attempt to increment the counter unless it already holds a reference.

    So the race condition you describe -- one thread deallocating the object while another is trying to increase the number of references (here, in the copy) -- shall never happen (and must never happen).

    And if some other thread is concurrently releasing a reference while this thread is increasing the count, the count is always >= 2 at that point, because both threads must be holding a reference.

    __gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) returns the replaced (old) value, so the line if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1) means "this thread exchanged 1 for 0", i.e. "this thread was holding the last reference".
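
    In plain std::atomic terms (a sketch of the same idea, not libstdc++'s actual dispatch machinery):

    #include <atomic>

    std::atomic<int> use_count{1};

    void release_one() {
        int old = use_count.fetch_sub(1);   // returns the value *before* the subtraction
        if (old == 1) {
            // this thread exchanged 1 for 0: it held the last reference,
            // so it is now safe (and its job) to destroy the object
        }
    }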

    There are a number of strategies for ensuring that no thread increments or decrements the counter unless it holds a reference, but the main one is that a thread that does hold a reference increments the count before 'passing' it to another thread, or simply 'surrenders' its reference to the other thread.
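
    For example, with std::shared_ptr (the worker function is made up for illustration):

    #include <cstdio>
    #include <memory>
    #include <thread>

    void worker(std::shared_ptr<int> p) {       // the worker holds its own reference
        std::printf("%d\n", *p);
    }                                           // ...and releases it here

    int main() {
        auto mine = std::make_shared<int>(42);  // this thread owns one reference
        // Copying 'mine' into the thread's arguments increments the count while
        // this thread still holds its reference, so it can never reach 0 mid hand-off.
        std::thread t(worker, mine);
        t.join();
    }                                           // 'mine' released: last reference, object freed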

    In a 'thread pool', where objects are placed on a queue, it may be indeterminate which thread a reference is being passed to, and further synchronisation (e.g. on the queue) is required to ensure that only one thread ever 'claims' that 'shared' reference.
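
    A sketch of such a hand-off queue (names are purely illustrative):

    #include <memory>
    #include <mutex>
    #include <queue>

    // The queue's mutex is the "further synchronisation": whichever worker
    // pops an item becomes the owner of that reference.
    template <typename T>
    class HandoffQueue {
        std::mutex m_;
        std::queue<std::shared_ptr<T>> q_;
    public:
        void push(std::shared_ptr<T> p) {       // producer surrenders its reference
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(p));
        }
        std::shared_ptr<T> try_pop() {          // at most one consumer claims it
            std::lock_guard<std::mutex> lock(m_);
            if (q_.empty())
                return nullptr;
            auto p = std::move(q_.front());
            q_.pop();
            return p;
        }
    };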

    By "holding" I mean either "owns" or "has borrowed". Owning means the thread that owns the reference (will ensure it's eventual release) and 'borrowing' means the actual owning (that will ensure its eventual release) thread will be synchronised with the borrower and ensure it isn't released until either the other thread has (a) acquired a reference or (b) shall not access the object again.

    The code snippets appear valid (I'm not familiar with the details of all those libraries), but confirming that __exchange_and_add_dispatch returns the old value (see "Chapter 29. Concurrency" in the libstdc++ manual) is enough to convince me that this is what is going on.

    However, the downside of these strategies is that all code that increments or decrements the count must conform to the strategy, so we can't know whether code is valid without close inspection of the design and implementation of every point that references the objects. Does each hold a reference? If it increments the count, does it already hold one?

    Weak references are a subtle finesse on this model and use a separate counter that follows the same rules. The weak count may (as is common for std::shared_ptr created with std::make_shared) be allocated in the same block as the main object; the main object may be destructed before the last weak reference goes away, but the memory block itself is only deleted when both counters reach zero, and if weak references still exist when the last strong reference is released, the control block is marked to show that the main object has been destroyed.
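
    A small illustration of that behaviour with std::weak_ptr:

    #include <cassert>
    #include <memory>

    int main() {
        std::weak_ptr<int> w;
        {
            auto s = std::make_shared<int>(7);  // with make_shared, the object and the
            w = s;                              // control block share one allocation
            assert(!w.expired());
        }                                       // last strong reference gone: the int is
                                                // destroyed, but the block survives for w
        assert(w.expired());
        assert(w.lock() == nullptr);            // lock() cannot revive a destroyed object
    }                                           // w destroyed: weak count hits 0, block freed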