I am reading Joe Duffy's Concurrent Programming on Windows. At the end of the chapter "Memory Models and Lock Freedom", he gives an example of the lock free stack. I have gone through the code and there's one thing I don't understand, that is the need for m_next
field to be marked as volatile
. Because there is a full memory barrier with Interlocked.CompareExchange
right? Does anyone have an idea?
I have pasted the sample code below.
class LockFreeStack<T>
{
class Node
{
internal T m_value;
internal volatile Node m_next;
}
volatile Node m_head;
void Push(T value)
{
Node n = new Node();
n.m_value = value;
Node h;
do
{
h = m_head;
n.m_next = h;
}
while (Interlocked.CompareExchange(ref m_head, n, h) != h);
}
T Pop()
{
Node n;
do
{
n = m_head;
if (n == null) throw new Exception('stack empty');
}
while (Interlocked.CompareExchange(ref m_head, n.m_next, n) != n);
return n.m_value;
}
}
I'll give my two cents, without being 100% sure that what I am about to say is correct. Joe Duffy is a world-class expert in multithreading, but I think that in this implementation he has been overly cautious regarding the cross-thread visibility of the internal state of the LockFreeStack<T>
class. The volatility of the Node.m_next
field is redundant in my opinion. The Node
instances are mutable, but they are only mutated before they are linked in the internal linked list. After that phase they are practically immutable. That mutable phase is performed by a single thread, so there is no chance that another thread may see a stale version of a Node
instance.
That leaves only the possibility of instructions re-ordering, as a reason for declaring the Node.m_next
as volatile
. Since the n.m_next = h;
assignement is sandwiched between reading another volatile field (m_head
), and an Interlocked.CompareExchange
operation, I would assume that a possible re-ordering of the instructions that would compromise the correctness of the implementation is already prevented, but as I said I am not 100% sure.
I am pretty sure though that the implementation of the LockFreeStack<T>
class could be improved performance-wise, at the cost of becoming slightly more allocatey, by making immutable the Node
class. Nowadays (C# 9) this can be achieved simply by switching from type class
to type record
. This is how such an implementation would look like:
class LockFreeStack<T>
{
record Node(T Value, Node Next) { }
Node _head;
void Push(T value)
{
Node h = Volatile.Read(ref _head);
while (true)
{
Node n = new Node(value, h);
var original = Interlocked.CompareExchange(ref _head, n, h);
if (original == h) break;
h = original;
}
}
T Pop()
{
Node h = Volatile.Read(ref _head);
while (true)
{
if (h == null) throw new Exception("Stack empty");
var original = Interlocked.CompareExchange(ref _head, h.Next, h);
if (original == h) break;
h = original;
}
return h.Value;
}
}
Notice that the cost of volatility is incurred only once for each Push
or Pop
operation. One could even argue that the Volatile.Read
of the _head
field could be omitted altogether, since a possible stale _head
value would be corrected anyway by the first Interlocked.CompareExchange
operation, costing only an extra iteration of the while (true)
loop.