I have a function which draws a random sample:
Sample sample();
I have a function which checks whether a sample is valid:
bool is_valid(Sample s);
This simulates a conditional distribution. Now I want a lot of valid samples (most samples will not be valid), so I want to parallelize this code with OpenMP:
vector<Sample> valid_samples;
while (valid_samples.size() < n) {
    Sample s = sample();
    if (is_valid(s)) {
        valid_samples.push_back(s);
    }
}
How would I do this? Most of the OpenMP examples I found were simple for loops where the number of iterations is known at the start. The sample() function has a
thread_local std::mt19937_64 gen([](){
    std::random_device d;
    std::uniform_int_distribution<int> dist(0, 10000);
    return dist(d);
}());
as its random number generator. Is this valid and thread-safe if I assume that my device has a source of randomness? Are there better solutions?
You may employ OpenMP task parallelism. The simplest solution would be to define a task as a single sample insertion:
vector<Sample> valid_samples(n); // pre-sized so that parallel writes to distinct slots are safe

void insert_ith(size_t i)
{
    do {
        valid_samples[i] = sample();
    } while (!is_valid(valid_samples[i]));
}
#pragma omp parallel
{
    #pragma omp single // one thread creates the tasks, the whole team executes them
    {
        for (size_t i = 0; i < n; i++)
        {
            #pragma omp task // each task fills exactly one slot
            insert_ith(i);
        }
    }
}
Note that there might be performance issues with such a single-task-single-insertion mapping. First, there would be false sharing involved, since tasks write to adjacent elements of valid_samples that share cache lines. Likely worse, task management carries some overhead, which can be significant for very small tasks. In such a case the remedy is simple: instead of a single insertion per task, insert multiple items at once, such as 100. The chunk size is a trade-off: lower values create more tasks and thus more overhead, higher values may result in worse load balancing.
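A minimal sketch of the chunked variant, assuming the same valid_samples, sample(), and is_valid() as above (insert_chunk and the chunk size of 100 are illustrative choices, not part of the original code):

const size_t chunk = 100; // illustrative; tune for your workload

// Hypothetical helper: fill slots [begin, end) with valid samples.
void insert_chunk(size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
    {
        do {
            valid_samples[i] = sample();
        } while (!is_valid(valid_samples[i]));
    }
}

#pragma omp parallel
{
    #pragma omp single
    {
        for (size_t i = 0; i < n; i += chunk)
        {
            size_t end = i + chunk < n ? i + chunk : n; // last chunk may be shorter
            #pragma omp task firstprivate(i, end)
            insert_chunk(i, end);
        }
    }
}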
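As for the generator: the thread_local engine is thread-safe, because every thread constructs, seeds, and uses its own instance. One weakness of the shown seeding is that the seed is drawn from only 10001 possible values, so two threads can end up with identical random streams. A sketch of a more robust seeding, still assuming std::random_device actually provides entropy on your platform, draws several words from it via std::seed_seq:

thread_local std::mt19937_64 gen = [](){
    std::random_device d;
    // Several 32-bit words of entropy instead of one small int,
    // making seed collisions between threads far less likely.
    std::seed_seq seq{d(), d(), d(), d()};
    return std::mt19937_64(seq);
}();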