Lock-free Tree Node Assignment in Preallocated Vector of Nodes

I'm currently trying to multithread the creation of a tree where the class which holds the tree preallocates a std::vector of Nodes in a chunk of a particular size (conceptually the size is arbitrary). Additional chunks of Node are created only when necessary, this is because the tree quickly becomes very large and I wanted to avoid constantly using the new operator for time efficiency.

These vectors of Nodes are defined as: std::vector< std::vector< Node > > nodes

head keeps track of the position within the inner vector and chunkCount keeps track of which outer vector is being used currently.

The vectors are resized in the constructor as:

nodes.resize( 1 );
nodes[chunkCount].resize( CHUNK_SIZE );

And a simplified version of Node would be:

typedef struct Node {
    int val;
    Node* subnodes[5];
} Node;

Creation of a new Node goes as follows:

void TreeClass::createNode( Node* node, short index, int val )
{
    omp_set_lock( &treeLock ); // treeLock belongs to TreeClass
    head++;
    if( head == CHUNK_SIZE ) {
        std::vector< Node > tempNodeVec( CHUNK_SIZE );
        nodes.push_back( tempNodeVec );
        chunkCount++;
        head = 0;
    }
    node->subnodes[index] = &( nodes[chunkCount][head] );
    omp_unset_lock( &treeLock );

    node->subnodes[index]->val = val;
}

This works perfectly fine. However my concern is that all but one thread is blocked when creating a node and this happens frequently so a lot of time is spent blocked or locking/unlocking treeLock, hence my desire to make this function lock-free but my attempts so far have failed.

Altering head and chunkCount are easy enough without the lock using #pragma omp atomic (or by using std::atomic< int >s), but it is the logic behind assuring that the if( ... ) statement is executed only once and before any thread proceeds to assign the address of the child, i.e. to assure they use the correct/updated chunkCount and head.

One thought from reading about lock-free algorithms would be to use std::atomic< Node* > subnodes[5] in Node and perform a CAS operation waiting for the correctly updated head and chunkCnt but without knowing what would be "correct", how can I know what I'm waiting for?

Another (naive) idea was:

int myHead;
if( ++head == CHUNK_SIZE ) {
    std::vector< Node > tempNodeVec( CHUNK_SIZE );
    nodes.push_back( tempNodeVec );
    chunkCount++;
    myhead = head = 0;
} else {
    myhead = head;
    while( head > CHUNK_SIZE )
        myHead = ++head;
}
node->subnodes[index] = &( nodes[chunkCount][myHead] );

The idea being that only one thread goes into the if( ... ) and until it has set head to 0, the rest will be stuck in the else { ... } but I can already see the many problems with this approach.

Any help with this would be appreciated.

Solution

I would suggest you use a thread-private memory pool. To that end you can use an annotation like:

#pragma omp threadprivate(nodes)

Not only is that far simpler than trying to protect access to the shared memory pool, it will also likely be better for performance due to data locality.

Note: going lockfree based on atomics with your solution is impossible because nodes[chunkCount] - which is needed for every allocation - must always be protected from nodes.push_back.

Fully featured memory pooling is more complex, but as a little step, you can try using std::deque. It provides what you need without messing with two vectors - inserting elements in constant time while not invalidating pointers of existing elements. You have less control but it's a good start.