Search code examples
c++stlopenmp

Efficient parallel union of sets with OpenMP


I need to calculate a global std::set (or equivalently a global std::unordered_set) in an OpenMP parallelised programm. At the moment every thread has a local std::set which then later the union is calculated from using

#pragma omp critical //critical as std container inserting is not thread safe 
global_set.insert(local_set.begin(), local_set.end());

However this creates an effectively serial section of code, where each thread inserts its local set into the global set one after the other.

How can I improve on this by parallelising the union of the sets? The union is preceded by a large block of work, is there a convenient way to give all threads different amounts of work to let the others work while one is inserting the elements in the set? Or can the union itself be efficiently parallelised (for example by unioning sets in a 'binary tree' fashion)?


Solution

  • You should read up on OpenMP reductions, and, in particular user-defined reductions. That lets you pass the problem to the OpenMP implementation, which will very likely perform the reduction up a tree.

    Of course, whether that is beneficial is not clear; it may be that it simply introduces a lot of copying and memory allocation which is still slower than the style of code you show.