c#multithreading concurrency volatile interlocked

Why declare a variable as volatile and use Interlocked on it at the same time?

I am reading Joe Duffy's Concurrent Programming on Windows. At the end of the chapter "Memory Models and Lock Freedom", he gives an example of the lock free stack. I have gone through the code and there's one thing I don't understand, that is the need for m_next field to be marked as volatile. Because there is a full memory barrier with Interlocked.CompareExchange right? Does anyone have an idea?

I have pasted the sample code below.

class LockFreeStack<T>
{
    class Node
    {
        internal T m_value;
        internal volatile Node m_next;
    }

    volatile Node m_head;

    void Push(T value)
    {
        Node n = new Node();
        n.m_value = value;

        Node h;
        do
        {
            h = m_head;
            n.m_next = h;
        }
        while (Interlocked.CompareExchange(ref m_head, n, h) != h);
    }

    T Pop()
    {
        Node n;
        do
        {
            n = m_head;
            if (n == null) throw new Exception('stack empty');

        }
        while (Interlocked.CompareExchange(ref m_head, n.m_next, n) != n);

        return n.m_value;
    }
}

Solution

I'll give my two cents, without being 100% sure that what I am about to say is correct. Joe Duffy is a world-class expert in multithreading, but I think that in this implementation he has been overly cautious regarding the cross-thread visibility of the internal state of the LockFreeStack<T> class. The volatility of the Node.m_next field is redundant in my opinion. The Node instances are mutable, but they are only mutated before they are linked in the internal linked list. After that phase they are practically immutable. That mutable phase is performed by a single thread, so there is no chance that another thread may see a stale version of a Node instance.

That leaves only the possibility of instructions re-ordering, as a reason for declaring the Node.m_next as volatile. Since the n.m_next = h; assignement is sandwiched between reading another volatile field (m_head), and an Interlocked.CompareExchange operation, I would assume that a possible re-ordering of the instructions that would compromise the correctness of the implementation is already prevented, but as I said I am not 100% sure.

I am pretty sure though that the implementation of the LockFreeStack<T> class could be improved performance-wise, at the cost of becoming slightly more allocatey, by making immutable the Node class. Nowadays (C# 9) this can be achieved simply by switching from type class to type record. This is how such an implementation would look like:

class LockFreeStack<T>
{
    record Node(T Value, Node Next) { }

    Node _head;

    void Push(T value)
    {
        Node h = Volatile.Read(ref _head);
        while (true)
        {
            Node n = new Node(value, h);
            var original = Interlocked.CompareExchange(ref _head, n, h);
            if (original == h) break;
            h = original;
        }
    }

    T Pop()
    {
        Node h = Volatile.Read(ref _head);
        while (true)
        {
            if (h == null) throw new Exception("Stack empty");
            var original = Interlocked.CompareExchange(ref _head, h.Next, h);
            if (original == h) break;
            h = original;
        }
        return h.Value;
    }
}

Notice that the cost of volatility is incurred only once for each Push or Pop operation. One could even argue that the Volatile.Read of the _head field could be omitted altogether, since a possible stale _head value would be corrected anyway by the first Interlocked.CompareExchange operation, costing only an extra iteration of the while (true) loop.