c++ · atomic · lock-free

How to avoid race condition when refcounting without a mutex?


I'm trying to figure out how to avoid the race condition in the following code, with thread A acquiring the datablock, then thread B Releasing/deleting it, and then thread A AddRefing it. Is it possible to fix this without a mutex? I think that it's possible to fix this with atomic_thread_fence, but I really have no idea how it would apply to this situation.

#include <atomic>

class Datablock;  // assumed to provide AddRef() and Release()

class Foo
{
    std::atomic<Datablock*> m_datablock;

public:
    Datablock * get_datablock()
    {
        Datablock * datablock = m_datablock.load();
        if(datablock) datablock->AddRef();
        return datablock;
    }

    void set_datablock(Datablock* datablock)
    {
        datablock = m_datablock.exchange(datablock);
        if(datablock) datablock->Release();
    }
};

Solution

  • I think that it's possible to fix this with atomic_thread_fence

    atomic_thread_fence is only useful if you're using memory orderings weaker than the default seq_cst. (See Jeff Preshing's article about C++11 fences for more about fences and memory ordering. Jeff Preshing's articles are excellent; definitely read most of them while you're trying to grok lockless programming.)

    atomic_thread_fence can only limit reordering of how the current thread's memory operations become globally visible. It doesn't itself wait for something in other threads.


    When you try to add a reference, be prepared to find that it had already dropped to zero. i.e. AddRef() can fail, if you were too late and another thread has already started to destroy the refcounted object.

    So the implementation of AddRef would do something like

    bool AddRef() {
        int old_count = m_refcount.load();

        do {
            if (old_count <= 0) {
                // We were too late; the refcount had already dropped to zero,
                // so another thread is already destroying the data block.
                return false;
            }
        } while (!m_refcount.compare_exchange_weak(old_count, old_count + 1));

        return true;
    }
    

    We're using a CAS loop as a conditional fetch_add instead of doing a fetch_add and then undoing it if the old value was too low. The undo approach would require extra work to avoid a race condition when two threads increment at once. (The second thread would see an old_count of 1 and think it was ok.) You could maybe work around that by having the Release function set the refcount to a large negative number before starting to destroy a block, but the CAS loop is easier to verify, and a CAS that almost always succeeds on the first try is barely any slower than an actual fetch_add. The separate atomic load is nearly free compared to CAS, especially on x86. (You could use memory_order_relaxed for it to make it near-free on weakly-ordered architectures too.)


    Note that your refcount can't be part of the datablock that you delete when the refcount reaches zero. If you did that, a thread that called get_datablock and did m_datablock.load(), then slept, then dereferenced that pointer with datablock->AddRef() could segfault (or cause other undefined behaviour) if the pointed-to memory was deleted by another thread while it was asleep.


    This answer doesn't solve the whole problem (managing the refcount blocks while still allowing an exchange in your set_datablock API); I'm not sure that API design really works.

    It's also not a complete working atomic_shared_ptr implementation.

    If you want to know how that works, look at its documentation, or hopefully someone has written a post about how it's implemented. Open-source library implementations of it exist, but they're probably pretty hard to read.