I have been trying to implement a lock-free slist erase operation but I'm having obiviously problems with it. Unfortunately I realy realy need it.
To solve usual ABA cmpxchg related problems I have written tagged_ptr<> "smart pointer" class that embeds a counter to the pointer stored in std::atomic<>. The tag value is incremented each time pointer is updated via CAS in the list:
head.compare_exchange_weak(old, old(newptr))
stores newptr
with incremented tag from old
.
This allows multi-writer transactions, but it does not solve problems of updating two pointers simultaneously. (e.g implementing stack is easy with tagged_ptr<>)
See the code here. On line 256 is the erase() function:
bool erase(list_node * node) {
std::atomic<tagged_ptr<list_node>>* before;
tagged_ptr<list_node> itr, after;
for(;;) {
// Find previous (or head) before-node-ptr
before = &head;
itr = before->load(std::memory_order_acquire);
while(itr) {
if(itr.get() == node) {
break;
} else if(itr.is_void()) {
// Thread interfered iteration.
before = &head;
itr = before->load(std::memory_order_acquire);
} else {
// Access next ptr
before = &itr->next;
itr = before->load(std::memory_order_acquire);
}
}
after = node->next.load(std::memory_order_acquire);
if(after.is_void() || !itr) {
return false;
}
// Point before-ptr to after. (set head or previous node's next ptr)
if(before->compare_exchange_strong(itr, itr(after))) {
// Set node->next to invalid ptr.
// list iterators will see it and restart their operation.
while(!node->next.compare_exchange_weak(after, after().set_void()))
;
return true;
}
// If *before changed while trying to update it to after, retry search.
}
}
In the test code two threads push nodes into list simultaneously and two threads search a random node by data and try erase them. bug I'm having is:
I've some doubt concerning your tagged_ptr
implementation.
Also, I've some doubt concerning this part of the code:
} else if(itr.is_void()) {
// Thread interfered iteration.
before = &head;
itr = before->load(std::memory_order_acquire);
Let's say a thread removed the last node (you had 1 node in the list and both thread calls erase). The remaining thread will query the head pointer, it's void. You'll enter an infinite loop with this part of the code since it's in a while(itr)
loop.
This part is also not atomic:
// Point before-ptr to after. (set head or previous node's next ptr)
if(before->compare_exchange_strong(itr, itr(after))) {
// Set node->next to invalid ptr.
// list iterators will see it and restart their operation.
while(!node->next.compare_exchange_weak(after, after().set_void()))
;
return true;
}
If before
is modified by the first CAS, your node
is a unattached pointer still pointing to the list. Yet another thread can have its before
set to this node
and modify it and returns.
Honestly, if your list is cyclic, it's not that hard to debug, just break under a debugger and follow the list. You'll see when it loops and they you can figure out how it did it. You can also use valgrind for this.
The tagged_ptr
class is hard to grasp, with the "set_void()" method that's setting the internal ptr
to 0xFF..F
yet the boolean test in while(itr) will return true if it's "void". I guess the name should be invalid and not void and it should return false in the bool operator if it is (not true). If the itr becomes "void" (it's possible in the code above as far as I understand) the while(itr)
will loop indefinitely.
For example, let say you had:
Head:A -> B -> C
Then after some thread removal you get
Thread 2 removing C : Head:A, before = &B on first iteration, exiting the while(itr) loop since itr == C (scheduled here)
Thread 1 removing B : Head:A->C and B->C (scheduled just before line 286 of your example)
Thread 2 resume, and will modify B to B->null (line 283)
and then C->null to C->yourVoid (line 286, then it's scheduled)
Then, thread 1 update B->next to B->yourVoid (useless here for understanding the issue)
You now have A->C->yourVoid
Whenever iterating here, you'll have an infinite loop since when itr search arrives on C, the next step is to be a "void" and restarting the iteration from head does not solve anything here, it'll give the same result, the list is broken.