I'm implementing a thread-safe "lazy-synchronized" set as a linked list of nodes connected by `shared_ptr`s. The algorithm is from "The Art of Multiprocessor Programming". I'm adding an `is_empty()` function that needs to be linearizable with the existing functions: `contains()`, `add()`, and `remove()`. In the code below, you can see `remove` is a two-step process: first it "lazily" marks the node by setting `marked = nullptr`, then it physically unlinks it by updating the list's `next` pointers.
Modified classes to support `is_empty()`:
```cpp
template <class T>
class LazySet : public Set<T> {
public:
    LazySet ();
    bool contains (const T&) const;
    bool is_empty () const;
    bool add (const T&);
    bool remove (const T&);
private:
    bool validate(const std::shared_ptr<Node>&, const std::shared_ptr<Node>&);
    class Node;
    std::shared_ptr<Node> head;
    std::shared_ptr<bool> counter; //note: type is unimportant, the pointed-to value never changes
};

template <class T>
class LazySet<T>::Node {
public:
    Node ();
    Node (const T&);
    T key;
    std::shared_ptr<bool> marked; //assume initialized to LazySet.counter
    // nullptr means it's marked; otherwise unmarked
    std::shared_ptr<Node> next;
    std::mutex mtx;
};
```
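(`add()` and the constructors aren't shown above; the scheme relies on every unmarked node's `marked` sharing ownership with the set's `counter`. Here's a minimal standalone sketch of that invariant — `DemoNode` and `make_node` are my names, not part of the set:)

```cpp
#include <cassert>
#include <memory>

// Standalone sketch (hypothetical names, not from LazySet): a node is
// "unmarked" while its marked pointer aliases the set's counter, so
// counter.use_count() equals 1 (the set's own reference) plus the
// number of unmarked nodes.
struct DemoNode {
    int key;
    std::shared_ptr<bool> marked; // aliases counter while the node is live
    std::shared_ptr<DemoNode> next;
};

std::shared_ptr<DemoNode> make_node(int key, const std::shared_ptr<bool>& counter) {
    auto n = std::make_shared<DemoNode>();
    n->key = key;
    n->marked = counter; // unmarked: shares ownership with counter
    return n;
}
```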
Relevant modified methods to support `is_empty`:
```cpp
template <class T>
bool LazySet<T>::remove(const T& k) {
    std::shared_ptr<Node> pred;
    std::shared_ptr<Node> curr;
    while (true) {
        pred = head;
        curr = atomic_load(&(head->next));
        //Find window where key should be in sorted list
        while ((curr) && (curr->key < k)) {
            pred = curr;
            curr = atomic_load(&(curr->next));
        }
        //Acquire locks on the window; left-to-right locking prevents deadlock
        (pred->mtx).lock();
        if (curr) { //only lock if not nullptr
            (curr->mtx).lock();
        }
        //Ensure window didn't change before locking, and then remove
        if (validate(pred, curr)) {
            if (!curr || curr->key != k) { //key doesn't exist, do nothing
                //## unimportant ##
            } else { //key exists, remove it
                atomic_store(&(curr->marked), std::shared_ptr<bool>{}); //logical "lazy" remove
                atomic_store(&(pred->next), curr->next); //physically remove
                (curr->mtx).unlock();
                (pred->mtx).unlock();
                return true;
            }
        } else {
            //## unlock and loop again ##
        }
    }
}
```
```cpp
template <class T>
bool LazySet<T>::contains(const T& k) const {
    std::shared_ptr<Node> curr;
    curr = atomic_load(&(head->next));
    //Find window where key should be in sorted list
    while ((curr) && (curr->key < k)) {
        curr = atomic_load(&(curr->next));
    }
    //Check if key exists in window
    if (curr) {
        if (curr->key == k) { //key exists, unless marked
            return (atomic_load(&(curr->marked)) != nullptr);
        } else { //doesn't exist
            return false;
        }
    } else { //doesn't exist
        return false;
    }
}
```
`Node.marked` was originally a plain `bool`, and `LazySet.counter` didn't exist. The choice to make them `shared_ptr`s was to be able to atomically modify both a count of the nodes and the lazy-removal mark on a node in a single operation. Modifying both simultaneously in `remove()` is necessary for `is_empty()` to be linearizable with `contains()`. (It can't be a separate `bool` mark and `int` counter without a double-wide CAS or something similar.) I hope to implement the counter with `shared_ptr`'s `use_count()` function, but in multithreaded contexts `use_count()` is only an approximation, because the internal reference count is accessed with `std::memory_order_relaxed`.
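Single-threaded, at least, the bookkeeping works out: storing an empty `shared_ptr` into a node's `marked` both sets the mark and releases that node's reference to `counter` in one call. A quick sanity check of that (this doesn't address the cross-thread accuracy question; `unmarked_count` is a hypothetical helper of mine):

```cpp
#include <cassert>
#include <memory>

// The free-function atomic_store overload for shared_ptr replaces the
// stored pointer and releases the old reference, so marking a node
// (storing nullptr into marked) decrements counter's use_count in the
// same atomic operation that sets the mark.
long unmarked_count(const std::shared_ptr<bool>& counter) {
    return counter.use_count() - 1; // minus the set's own reference
}
```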
I know standalone fences are usually bad practice, and I'm not too familiar with using them. But if I implemented `is_empty` like below, would the fences ensure the count is no longer an approximation but an exact value, giving a reliable counter?
```cpp
template <class T>
bool LazySet<T>::is_empty() const {
    // ## SOME FULL MEMORY BARRIER
    if (counter.use_count() == 1) {
        // ## SOME FULL MEMORY BARRIER
        return true;
    }
    // ## SOME FULL MEMORY BARRIER
    return false;
}
```
I only ask because LWG Issue 2776 says:

> We can't make `use_count()` reliable without adding substantially more fencing.
```cpp
// ## SOME FULL MEMORY BARRIER
if (counter.use_count() == 1) {
    // ## SOME FULL MEMORY BARRIER
```

With an acquire fence before the call, you can make sure you "see" the results of all the resets (including those during assignment and destruction) performed by all owners in other threads. The acquire fence gives all following relaxed operations acquire semantics, preventing them from "fetching values from the future" (which is semantic insanity in any case and probably makes such programs formally UB).

(There is no meaningful fence that you could put after the call.)
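For concreteness, here is a fence-based sketch of the placement discussed above (`is_empty_sketch` is my naming; per LWG 2776, even this does not make `use_count()` an exact cross-thread value, because the fence cannot upgrade the control block's internal relaxed reference-count load):

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Sketch only: the acquire fence orders this thread's subsequent reads
// after release operations it synchronizes with, but the relaxed
// reference-count read inside use_count() stays relaxed, so the result
// remains approximate under concurrent modification.
template <class T>
bool is_empty_sketch(const std::shared_ptr<T>& counter) {
    std::atomic_thread_fence(std::memory_order_acquire);
    return counter.use_count() == 1; // only the set's own reference remains
}
```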