I have a function which draws a random sample:
Sample sample();
I have a function which checks whether a sample is valid:
bool is_valid(Sample s);
This simulates a conditional distribution. Now I want a lot of valid samples (most samples will not be valid), so I want to parallelize this code with OpenMP:
vector<Sample> valid_samples;
while (valid_samples.size() < n) {
    Sample s = sample();
    if (is_valid(s)) {
        valid_samples.push_back(s);
    }
}
How would I do this? Most of the OpenMP examples I found were simple for loops where the number of iterations is known at the start. The sample() function has a
thread_local std::mt19937_64 gen([](){
    std::random_device d;
    std::uniform_int_distribution<int> dist(0, 10000);
    return dist(d);
}());
as its random number generator. Is this valid and thread-safe if I assume that my device has a source of randomness? Are there better solutions?
You may employ OpenMP task parallelism. The simplest solution would be to define a task as a single sample insertion:
vector<Sample> valid_samples(n); // pre-sized so that parallel writes to distinct slots are safe

void insert_ith(size_t i)
{
    do {
        valid_samples[i] = sample();
    } while (!is_valid(valid_samples[i]));
}
#pragma omp parallel
{
    #pragma omp single // one thread creates the tasks, the whole team executes them
    {
        for (size_t i = 0; i < n; i++)
        {
            #pragma omp task // each task fills exactly one slot
            insert_ith(i);
        }
    }
}
Note that there might be performance issues with such a single-task-single-insertion mapping. First, there would be false sharing involved, since tasks write to adjacent elements of valid_samples that share cache lines. Likely worse, task management carries some overhead, which can be significant for very small tasks. In such a case the remedy is simple: instead of a single insertion per task, insert multiple items at once, such as 100. The chunk size is a trade-off: lower values create more tasks and thus more overhead, higher values may result in worse load balancing.
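A minimal sketch of the chunked variant, assuming the same valid_samples, sample(), and is_valid() as above (insert_chunk and the chunk size of 100 are illustrative choices, not part of the original code):

const size_t chunk = 100; // illustrative; tune for your workload

// Hypothetical helper: fill slots [begin, end) with valid samples.
void insert_chunk(size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
    {
        do {
            valid_samples[i] = sample();
        } while (!is_valid(valid_samples[i]));
    }
}

#pragma omp parallel
{
    #pragma omp single
    {
        for (size_t i = 0; i < n; i += chunk)
        {
            size_t end = i + chunk < n ? i + chunk : n; // last chunk may be shorter
            #pragma omp task firstprivate(i, end)
            insert_chunk(i, end);
        }
    }
}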
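As for the generator: the thread_local engine is thread-safe, because every thread constructs, seeds, and uses its own instance. One weakness of the shown seeding is that the seed is drawn from only 10001 possible values, so two threads can end up with identical random streams. A sketch of a more robust seeding, still assuming std::random_device actually provides entropy on your platform, draws several words from it via std::seed_seq:

thread_local std::mt19937_64 gen = [](){
    std::random_device d;
    // Several 32-bit words of entropy instead of one small int,
    // making seed collisions between threads far less likely.
    std::seed_seq seq{d(), d(), d(), d()};
    return std::mt19937_64(seq);
}();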