Tags: c++, multithreading, concurrency, atomic, lock-free

Lock-free linked list insert


Suppose there's a multithreaded application in which a single thread inserts elements into a circular linked list while a number of worker threads walk through the list, carrying out the actual processing.

Say the node type is similar to this:

struct Node
{
    // ...
    std::atomic< Node * > next;
};

And the method that performs the insertion contains the following snippet:

auto newNode = new Node( ); // (A)

newNode->next.store( previousNode->next.load( std::memory_order_relaxed ) ,
    std::memory_order_relaxed ); // (B)

previousNode->next.store( newNode , std::memory_order_relaxed ); // (C)

where previousNode has already been determined to be the node that should precede newNode in the list.
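
For reference, the snippet wrapped into a single function might look like this (just a sketch, assuming the Node type above; the name insertAfter is made up for illustration):

// Called from the single inserter thread only.
void insertAfter( Node * previousNode )
{
    auto newNode = new Node( ); // (A)

    // Link the new node to the current successor of previousNode.
    newNode->next.store( previousNode->next.load( std::memory_order_relaxed ) ,
        std::memory_order_relaxed ); // (B)

    // Splice the new node into the list.
    previousNode->next.store( newNode , std::memory_order_relaxed ); // (C)
}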

The worker threads walk through the list in a manner similar to this:

// ...
while ( true )
{
    ProcessNode( * currentNode );
    currentNode = currentNode->next.load( std::memory_order_relaxed );
}

It's not a problem if a node just created at (A) is skipped by the worker threads until its predecessor is updated at (C).

Is there any issue with such a design? I'm concerned that, at the assembly level, the code generated for (B) and (C) could be something like this:

LOAD( R1 , previousNode->next ) // (1) loads previousNode->next into register R1
WRITE( newNode->next , R1 ) // (2) writes R1 to newNode->next
WRITE( previousNode->next , newNode ) // (3) writes newNode to previousNode->next

And then some optimization could reorder it to:

LOAD( R1 , previousNode->next ) // (1)
WRITE( previousNode->next , newNode ) // (3)
WRITE( newNode->next , R1 ) // (2)

and that can break the worker threads, because one of them can now reach newNode before its next member is initialized.

Is this a legitimate concern? What does the standard say about this?


Solution

  • You have a legitimate concern.

    Exactly as you say, the compiler could legally reorder your stores to this:

    auto temp = previousNode->next.load( std::memory_order_relaxed );
    previousNode->next.store( newNode , std::memory_order_relaxed ); // (C)
    newNode->next.store( temp , std::memory_order_relaxed ); // (B)
    

    You have now inserted your node before its values were initialized! Whether or not this actually happens is the wrong question: it is perfectly legal for the compiler to do.

    Here's an example of how a weakly-ordered CPU could do the same thing to the relaxed stores:

    auto temp = previousNode->next.load( std::memory_order_relaxed );
    // previousNode->next is now hot in the cache

    newNode->next.store( temp , std::memory_order_relaxed ); // (B)
    // Suppose newNode is in the cache, but newNode->next is a cache miss

    previousNode->next.store( newNode , std::memory_order_relaxed ); // (C)
    // While waiting for the cache update of newNode->next, get other work done:
    // write newNode into previousNode->next, which was pulled into the cache by the 1st line.
    

    This won't happen on x86, because x86 has total store order. On ARM, though... you have once again inserted your node before its values were initialized.
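
    For what it's worth, the same hardware reordering can also be ruled out while keeping the stores themselves relaxed, by putting an explicit release fence between (B) and (C). A sketch of that alternative:

    newNode->next.store( temp , std::memory_order_relaxed ); // (B)
    // A worker that sees (C) via an acquire load is guaranteed to also see (B).
    std::atomic_thread_fence( std::memory_order_release );
    previousNode->next.store( newNode , std::memory_order_relaxed ); // (C)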

    Better to stick with acquire/release, though:

    auto temp = previousNode->next.load( std::memory_order_acquire );
    newNode->next.store( temp , std::memory_order_release ); // (B)
    previousNode->next.store( newNode , std::memory_order_release ); // (C)
    

    The relevant release is the one on line (C), because it keeps line (B) from being moved after it. Line (B) has a data dependency on the load in the first line, so realistically it's not going to be reordered. But use acquire for that load and release for line (B) anyway: it's semantically correct, it won't hurt anything, and it might prevent some obscure system or future optimization from breaking your code.
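
    Putting it together, the fixed insertion might read like this (a sketch, assuming the Node type from the question; the name insertAfter is made up for illustration):

    // Called from the single inserter thread only.
    void insertAfter( Node * previousNode )
    {
        auto newNode = new Node( ); // (A)

        auto temp = previousNode->next.load( std::memory_order_acquire );
        newNode->next.store( temp , std::memory_order_release ); // (B)

        // The release on (C) publishes the fully-initialized node.
        previousNode->next.store( newNode , std::memory_order_release ); // (C)
    }

    One more point worth noting: the release on (C) only synchronizes with a worker whose load of the next pointer is at least an acquire, so the traversal in the worker threads should be

    currentNode = currentNode->next.load( std::memory_order_acquire );

    rather than relaxed; otherwise a worker that sees newNode is still not guaranteed to see its initialized members.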