Tags: c++, multithreading, reference-counting, memory-model, stdatomic

Release-Consume ordering for reference counting


Consider the following simple reference counting functions (to be used with boost::intrusive_ptr):

#include <atomic>
#include <cstddef>

class Foo {
    // ...

    std::atomic<std::size_t> refCount_{0};

    friend void intrusive_ptr_add_ref(Foo* ptr)
    {
        ++ptr->refCount_;  // ❶
    }

    friend void intrusive_ptr_release(Foo* ptr)
    {
        if (--ptr->refCount_ == 0) {  // ❷
            delete ptr;
        }
    }
};

I'm still learning memory ordering, and I'm wondering if the default memory ordering used by ++ and -- (memory_order_seq_cst, the same as fetch_add/fetch_sub with no ordering argument) is too strict in this case. Since the only ordering I want to ensure is between ❶ and ❷, I think we can replace ❶ with

ptr->refCount_.fetch_add(1, std::memory_order_release);

and ❷ with

if (ptr->refCount_.fetch_sub(1, std::memory_order_consume) == 1) {

But memory ordering is still new and subtle to me, so I'm not sure if this will work correctly. Did I miss anything?


Solution

  • Going by the libc++ implementation of std::shared_ptr, you probably want memory_order_relaxed for the increment and memory_order_acq_rel for the decrement. Rationalizing this usage:

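    If the count increases, then all that matters is atomicity. The current thread already holds a reference, so it knows the count is greater than zero and the object cannot be deleted underneath it. Other threads are not synchronized by this operation, so they may observe the new value at an indeterminate time, and perhaps out of order with respect to updates of other variables. That is harmless: every read-modify-write on the counter operates on its latest value, so no increment is ever lost.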

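    If the count decreases, then you need to be sure that the current thread has finished all of its accesses to the object before its decrement becomes visible (the release half). The thread that drops the last reference must see every other thread's updates to the object before deleting it (the acquire half), so each decrement must be ordered before the next one. Otherwise, if the counter raced ahead of the object it guards, the object could be destroyed while another thread is still using it, or its destructor could observe stale data.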

    Cppreference has a nice page on memory ordering. It includes this note:

    The specification of release-consume ordering is being revised, and the use of memory_order_consume is temporarily discouraged.

    It also notes that current compilers do not implement consume as specified: they do not track dependency chains and simply promote consume to acquire, so in practice it behaves the same as acquire.