Search code examples
c++boostthread-safetyshared-ptr

Fully thread-safe shared_ptr implementation


Does anybody know of a fully thread-safe shared_ptr implementation? E.g. boost implementation of shared_ptr is thread-safe for the targets (refcounting) and also safe for simultaneous shared_ptr instance reads, but not writes or for read/write.

(see Boost docs, examples 3, 4 and 5).

Is there a shared_ptr implementation that is fully thread-safe for shared_ptr instances?

Strange that boost docs say that:

shared_ptr objects offer the same level of thread safety as built-in types.

But if you compare an ordinary pointer (built-in type) to smart_ptr, then simultaneous write of an ordinary pointer is thread-safe, but simultaneous write to a smart_ptr isn't.

EDIT: I mean a lock-free implementation on x86 architecture.

EDIT2: An example use case for such a smart pointer would be where there are a number of worker threads which update a global shared_ptr with a their current work item and a monitor thread that takes random samples of the work items. The shared-ptr would own the work item until another work item pointer is assigned to it (thereby destroying the previous work item). The monitor would get ownership of the work item (thereby preventing the work item to be destroyed) by assigning it to its own shared-ptr. It can be done with XCHG and manual deletion, but would be nice if a shared-ptr could do it.

Another example is where the global shared-ptr holds a "processor", and is assigned by some thread, and used by some other thread. When the "user" thread sees that the processor shard-ptr is NULL, it uses some alternative logic to do the processing. If it's not NULL, it prevents the processor from being destroyed by assigning it to its own shared-ptr.


Solution

  • Adding the necessary barriers for such a fully thread-safe shared_ptr implementation would likely impact performance. Consider the following race (note: pseudocode abounds):

    Thread 1: global_ptr = A;

    Thread 2: global_ptr = B;

    Thread 3: local_ptr = global_ptr;

    If we break this down into its constituent operations:

    Thread 1:

    A.refcnt++;
    tmp_ptr = exchange(global_ptr, A);
    if (!--tmp_ptr.refcnt) delete tmp_ptr;
    

    Thread 2:

    B.refcnt++;
    tmp_ptr = exchange(global_ptr, B);
    if (!--tmp_ptr.refcnt) delete tmp_ptr;
    

    Thread 3:

    local_ptr = global_ptr;
    local_ptr.refcnt++;
    

    Clearly, if thread 3 reads the pointer after A's swap, then B goes and deletes it before the reference count can be incremented, bad things will happen.

    To handle this, we need a dummy value to be used while thread 3 is doing the refcnt update: (note: compare_exchange(variable, expected, new) atomically replaces the value in variable with new if it's currently equal to new, then returns true if it did so successfully)

    Thread 1:

    A.refcnt++;
    tmp_ptr = global_ptr;
    while (tmp_ptr == BAD_PTR || !compare_exchange(global_ptr, tmp_ptr, A))
        tmp_ptr = global_ptr;
    if (!--tmp_ptr.refcnt) delete tmp_ptr;
    

    Thread 2:

    B.refcnt++;
    while (tmp_ptr == BAD_PTR || !compare_exchange(global_ptr, tmp_ptr, A))
        tmp_ptr = global_ptr;
    if (!--tmp_ptr.refcnt) delete tmp_ptr;
    

    Thread 3:

    tmp_ptr = global_ptr;
    while (tmp_ptr == BAD_PTR || !compare_exchange(global_ptr, tmp_ptr, BAD_PTR))
        tmp_ptr = global_ptr;
    local_ptr = tmp_ptr;
    local_ptr.refcnt++;
    global_ptr = tmp_ptr;
    

    You've now had to add a loop, with atomics in it in the middle of your /read/ operation. This is not a good thing - it can be extremely expensive on some CPUs. What's more, you're busy-waiting as well. You can start to get clever with futexes and whatnot - but by that point you've reinvented the lock.

    This cost, which has to be borne by every operation, and is very similar in nature to what a lock would give you anyway, is why you generally don't see such thread-safe shared_ptr implementations. If you need such a thing, I would recommend wrapping a mutex and shared_ptr into a convenience class to automate locking.