c++multithreading shared-ptr memory-barriers reference-counting

Will full memory barriers around std::shared_ptr's use_count() make it a reliable counter?

I'm implementing a thread-safe "lazy-synchronized" Set as a linked list of nodes connected by shared_ptr's. The algorithm is from "The Art of Multiprocessor Programming". I'm adding an is_empty() function that needs to be linearizable with the existing functions: contains(), add(), remove(). In the code below, you can see remove is a 2 step process. First it "lazy" marks the node by setting marked = nullptr, then it physically moves the linked list next pointers.

Modified classes to support is_empty()

template <class T>
class LazySet : public Set<T> {
    public:
      LazySet ();
      bool contains (const T&) const;
      bool is_empty ()         const;
      bool add      (const T&);
      bool remove   (const T&);
    private:
      bool validate(const std::shared_ptr<Node>&, const std::shared_ptr<Node>&);
      class Node;
      std::shared_ptr<Node> head;
      std::shared_ptr<bool> counter; //note: type is unimportant, will never change true/fase
};

template <class T>
class LazySet<T>::Node {
    public:
      Node ();
      Node (const T&);
      T key;
      std::shared_ptr<bool> marked; //assume initialized to = LazySet.counter
                                    // nullptr means it's marked; otherwise unmarked
      std::shared_ptr<Node> next;
      std::mutex mtx;
};

Relevant modified methods to support is_empty

template <class T>
bool LazySet<T>::remove(const T& k) {
    std::shared_ptr<Node> pred;
    std::shared_ptr<Node> curr;
    while (true) {
        pred = head;
        curr = atomic_load(&(head->next));
        //Find window where key should be in sorted list
        while ((curr) && (curr->key < k)) {
            pred = atomic_load(&curr);
            curr = atomic_load(&(curr->next));
        }
        //Aquire locks on the window, left to right locking prevents deadlock
        (pred->mtx).lock();
        if (curr) { //only lock if not nullptr
            (curr->mtx).lock();
        }
        //Ensure window didn't change before locking, and then remove
        if (validate(pred, curr)) {
            if (!curr) { //key doesn't exist, do nothing
                //## unimportant ##
            } else { //key exists, remove it
                atomic_store(&(curr->marked), nullptr); //logical "lazy" remove
                atomic_store(&(pred->next), curr->next) //physically remove
                (curr->mtx).unlock();
                (pred->mtx).unlock();
                return true;
            }
        } else {
            //## unlock and loop again ##
        }
    }
}

template <class T>
bool LazySet<T>::contains(const T& k) const {
    std::shared_ptr<Node> curr;
    curr = atomic_load(&(head->next));
    //Find window where key should be in sorted list
    while ((curr) && (curr->key < k)) {
        curr = atomic_load(&(curr->next));
    }
    //Check if key exists in window
    if (curr) {
        if (curr->key == k) { //key exists, unless marked
            return (atomic_load(&(curr->marked)) != nullptr);
        } else { //doesn't exist
            return false;
        }
    } else { //doesn't exist
        return false;
    }
}

Node.marked was originally a plain bool, and LazySet.counter didn't exist. The choice to make them shared_ptrs was to be able to be able to atomically modify both a counter on the number of nodes and the lazy removal mark on the nodes. Modifying both simultaneously in remove() is necessary for is_empty() to be linearizable with contains(). (It can't be a separate bool mark and int counter without a double wide CAS or something.) I hope to implement the counter with shared_ptr's use_count() function, but in multithreaded contexts it's only an approximation due to relaxed_memory_order.

I know standalone fences are usually bad practice, and I'm not too familiar with using them. But if I implemented is_empty like below, would the fences ensure it's no longer an approximation, but an exact value for a reliable counter?

template <class T>
bool LazySet<T>::is_empty() const {
    // ## SOME FULL MEMORY BARRIER
    if (counter.use_count() == 1) {
        // ## SOME FULL MEMORY BARRIER
        return true
    }
    // ## SOME FULL MEMORY BARRIER
    return false
}

I only ask because LWG Issue 2776 says:

We can't make use_count() reliable without adding substantially more fencing.

Solution

// ## SOME FULL MEMORY BARRIER
if (counter.use_count() == 1) {
    // ## SOME FULL MEMORY BARRIER

With an acquire fence before, you can make sure you can make sure you "see" the results of all the resets (including during assignment and destruction) of all owners in other threads. The acquire fence gives all following relaxed operation acquire semantics, preventing them from "fetching values in the future" (which is semantic insanity in any case and probably makes all programs formally UB).

(There is no meaningful fence that you could put after the call.)