I have a parallel C++ algorithm that looks somewhat like this:
void recursiveFunction(int p, std::unordered_set<int> localSet) {
    vector<someStruct> toCall;
    if (p > SOMEINT) {
        return;
    }
    #pragma omp parallel for schedule(dynamic, 1) shared(localSet, someGlobalVector, someOtherGlobalVector)
    for (int i = 0; i < someGlobalVector[p].size(); ++i) {
        auto someLocalValue = someGlobalVector[p][i];
        auto someOtherLocal = someOtherGlobalVector[p][i];
        std::unordered_set<int> filtered;
        for (auto elem : localSet) {
            if (someLocalValue.count(elem) == 0) {
                filtered.insert(elem);
            }
        }
        if (checkSomeCondition(p+1, filtered)) {
            #pragma omp critical
            {
                toCall.push_back({someOtherLocal, filtered});
            }
        }
    }
    for (auto [elem1, elem2] : toCall) {
        recursiveFunction(p+1, elem2);
    }
}
Is iteration through a shared unordered_set thread safe?
No, an unordered_set is not thread-safe by definition. However, this does not necessarily imply that one's code has a race condition when using such structures; for instance, if one only reads the data structure, even concurrently, then one is on the safe side. This is the reason why truly immutable data structures are thread-safe by definition: one cannot modify those structures, hence the name.
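For instance, in the following self-contained sketch (the set contents and loop bounds are made up purely for illustration), every thread only calls count() on the shared unordered_set, so no synchronization is needed:
#include <omp.h>
#include <unordered_set>
#include <cstdio>

int main() {
    // built once, before the parallel region, and never modified afterwards
    const std::unordered_set<int> lookup = {1, 2, 3, 5, 8, 13};
    long hits = 0;
    // concurrent read-only accesses (count/find) are safe without locking
    #pragma omp parallel for reduction(+ : hits)
    for (int i = 0; i < 1000; ++i) {
        if (lookup.count(i % 20) != 0) {
            ++hits;
        }
    }
    printf("hits = %ld\n", hits);
}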
On the other hand, if the unordered_set is being modified concurrently by multiple threads, then one has a race condition. Nonetheless, one can still write to an unordered_set in a thread-safe manner, as long as one ensures mutual exclusion when performing the writes. Non-thread-safe data structures such as unordered_set rely upon the code accessing them to ensure mutual exclusion whenever those structures are modified concurrently by multiple threads.
To avoid race conditions, one can use OpenMP's critical construct to synchronize the accesses to the shared structure during the insert operation:
#pragma omp critical
{
    // the code that must run under mutual exclusion
}
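As a minimal, self-contained sketch of that idea (the set name and the computed values are made up for illustration), each insert into the shared set is wrapped in a critical region:
#include <omp.h>
#include <unordered_set>
#include <cstdio>

int main() {
    std::unordered_set<int> shared_set;
    #pragma omp parallel for
    for (int i = 0; i < 100; ++i) {
        int value = (i * i) % 37;  // some per-iteration result
        // only one thread at a time may modify the shared set
        #pragma omp critical
        {
            shared_set.insert(value);
        }
    }
    printf("distinct values: %zu\n", shared_set.size());
}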
However, that would slow down the parallelization due to the overhead of such a synchronization mechanism. Alternatively, one can create data structures that will only be accessed individually by each thread, insert into those structures, and have the main thread merge those structures into one (outside the parallel region).
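A minimal sketch of that alternative, assuming one merely wants to collect values into a single set, might look as follows (the names and values are again illustrative):
#include <omp.h>
#include <unordered_set>
#include <vector>
#include <cstdio>

int main() {
    int n_threads = omp_get_max_threads();
    std::vector<std::unordered_set<int>> per_thread(n_threads);
    #pragma omp parallel num_threads(n_threads)
    {
        int tid = omp_get_thread_num();
        // each thread writes only to its own set: no synchronization needed
        #pragma omp for
        for (int i = 0; i < 100; ++i) {
            per_thread[tid].insert((i * i) % 37);
        }
    }
    // merge sequentially, outside the parallel region
    std::unordered_set<int> result;
    for (auto& s : per_thread) {
        result.insert(s.begin(), s.end());
    }
    printf("distinct values: %zu\n", result.size());
}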
Let us break down your code a bit:
void recursiveFunction(int p, std::unordered_set<int> localSet) {
    #pragma omp parallel for schedule(dynamic, 1) shared(localSet, someGlobalVector, someOtherGlobalVector)
    for (int i = 0; i < someGlobalVector[p].size(); ++i) {
        auto someLocalValue = someGlobalVector[p][i];
        auto someOtherLocal = someOtherGlobalVector[p][i];
        std::unordered_set<int> filtered;
        for (auto elem : localSet) {
            if (someLocalValue.count(elem) == 0) {
                filtered.insert(elem);
            }
        }
    }
}
The localSet, someGlobalVector, and someOtherGlobalVector are shared, but they are only read. Hence, as long as no other thread outside this function (running at the same time) modifies those structures, you are fine. The vector<someStruct> toCall is shared and modified, but the modification happens within a critical construct, therefore there is no race condition there.
To conclude, as it is, and assuming that no other thread is modifying the aforementioned shared structures, there is no race condition in your code. Nonetheless, you should profile your code to measure the overhead of that critical region. If the overhead is too high, you can try the following:
void recursiveFunction(int p, std::unordered_set<int> localSet, int total_threads) {
    std::vector<std::vector<someStruct>> thread_tocall(total_threads);
    if (p > SOMEINT) {
        return;
    }
    #pragma omp parallel num_threads(total_threads) shared(localSet, someGlobalVector, someOtherGlobalVector, thread_tocall)
    {
        int thread_id = omp_get_thread_num();
        #pragma omp for schedule(dynamic, 1)
        for (int i = 0; i < someGlobalVector[p].size(); ++i) {
            auto someLocalValue = someGlobalVector[p][i];
            auto someOtherLocal = someOtherGlobalVector[p][i];
            std::unordered_set<int> filtered;
            for (auto elem : localSet) {
                if (someLocalValue.count(elem) == 0) {
                    filtered.insert(elem);
                }
            }
            if (checkSomeCondition(p+1, filtered)) {
                // each thread writes only to its own vector, so no synchronization is needed
                thread_tocall[thread_id].push_back({someOtherLocal, filtered});
            }
        }
    }
    // merge and recurse sequentially, outside the parallel region
    for (int i = 0; i < total_threads; i++) {
        for (auto [elem1, elem2] : thread_tocall[i]) {
            recursiveFunction(p+1, elem2, total_threads);
        }
    }
}
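For the initial call one could, for example, pass the number of threads obtained from omp_get_max_threads(); the starting set below is a hypothetical placeholder:
#include <omp.h>
// ... globals, someStruct, and recursiveFunction as in the code above ...

int main() {
    std::unordered_set<int> initialSet = {1, 2, 3};  // hypothetical starting set
    recursiveFunction(0, initialSet, omp_get_max_threads());
}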