I'm currently trying to multithread the creation of a tree where the class which holds the tree preallocates a std::vector
of Node
s in a chunk of a particular size (conceptually the size is arbitrary). Additional chunks of Node
are created only when necessary, this is because the tree quickly becomes very large and I wanted to avoid constantly using the new
operator for time efficiency.
These vectors of Node
s are defined as: std::vector< std::vector< Node > > nodes
head
keeps track of the position within the inner vector and chunkCount
keeps track of which outer vector is being used currently.
The vectors are resized in the constructor as:
nodes.resize( 1 );
nodes[chunkCount].resize( CHUNK_SIZE );
And a simplified version of Node
would be:
typedef struct Node {
int val;
Node* subnodes[5];
} Node;
Creation of a new Node
goes as follows:
void TreeClass::createNode( Node* node, short index, int val )
{
omp_set_lock( &treeLock ); // treeLock belongs to TreeClass
head++;
if( head == CHUNK_SIZE ) {
std::vector< Node > tempNodeVec( CHUNK_SIZE );
nodes.push_back( tempNodeVec );
chunkCount++;
head = 0;
}
node->subnodes[index] = &( nodes[chunkCount][head] );
omp_unset_lock( &treeLock );
node->subnodes[index]->val = val;
}
This works perfectly fine. However my concern is that all but one thread is blocked when creating a node and this happens frequently so a lot of time is spent blocked or locking/unlocking treeLock
, hence my desire to make this function lock-free but my attempts so far have failed.
Altering head
and chunkCount
are easy enough without the lock using #pragma omp atomic
(or by using std::atomic< int >
s), but it is the logic behind assuring that the if( ... )
statement is executed only once and before any thread proceeds to assign the address of the child, i.e. to assure they use the correct/updated chunkCount
and head
.
One thought from reading about lock-free algorithms would be to use std::atomic< Node* > subnodes[5]
in Node
and perform a CAS operation waiting for the correctly updated head
and chunkCnt
but without knowing what would be "correct", how can I know what I'm waiting for?
Another (naive) idea was:
int myHead;
if( ++head == CHUNK_SIZE ) {
std::vector< Node > tempNodeVec( CHUNK_SIZE );
nodes.push_back( tempNodeVec );
chunkCount++;
myhead = head = 0;
} else {
myhead = head;
while( head > CHUNK_SIZE )
myHead = ++head;
}
node->subnodes[index] = &( nodes[chunkCount][myHead] );
The idea being that only one thread goes into the if( ... )
and until it has set head
to 0, the rest will be stuck in the else { ... }
but I can already see the many problems with this approach.
Any help with this would be appreciated.
I would suggest you use a thread-private memory pool. To that end you can use an annotation like:
#pragma omp threadprivate(nodes)
Not only is that far simpler than trying to protect access to the shared memory pool, it will also likely be better for performance due to data locality.
Note: going lockfree based on atomics with your solution is impossible because nodes[chunkCount]
- which is needed for every allocation - must always be protected from nodes.push_back
.
Fully featured memory pooling is more complex, but as a little step, you can try using std::deque
. It provides what you need without messing with two vectors - inserting elements in constant time while not invalidating pointers of existing elements. You have less control but it's a good start.