I am trying to parallelize a for loop with RcppThread. The unparallelized version looks like this:
IntegerVector simulate_pos(NumericVector x_pop,
NumericVector y_pop,
int n_studies,
int sample_size_min,
int sample_size_max,
bool replace,
float lower_limit,
float upper_limit){
IntegerVector pos(n_studies);
int npop = x_pop.size();
NumericVector index_pop(npop);
for (int i = 0; i < npop; i++){
index_pop[i] = i;
}
// HERE IS THE LOOP-------------------------------------------------
for (int k = 0; k < n_studies; k++){
pos[k] = simulate_one_pos(x_pop, y_pop, index_pop, sample_size_min,
sample_size_max, replace, lower_limit,
upper_limit);
}
// ------------------------------------------------------------------
return(pos);
}
Now I thought it would be no problem to use parallelFor:
std::vector<int> simulate_pos(NumericVector x_pop,
NumericVector y_pop,
int n_studies,
int sample_size_min,
int sample_size_max,
bool replace,
float lower_limit,
float upper_limit,
int n_threads){
std::vector<int> pos(n_studies);
int npop = x_pop.size();
NumericVector index_pop(npop);
for (int i = 0; i < npop; i++){
index_pop[i] = i;
}
// HERE IS THE LOOP-------------------------------------------------
RcppThread::parallelFor(0, pos.size(), [&] (int i){
pos[i] = simulate_one_pos(x_pop, y_pop, index_pop, sample_size_min,
sample_size_max, replace, lower_limit,
upper_limit);
});
// -----------------------------------------------------------------
return(pos);
}
To stick to the RcppThread-paper (https://arxiv.org/pdf/1811.00450.pdf) I used an std::vector
as the return value instead of the Rcpp equivalent IntegerVector
.
Sometimes the code works, sometimes I get a stack imbalance error, sometimes it just hangs. I assume that I am making a big conceptual mistake and have to note that I am pretty much a noob at C++.
Is the problem that several threads are reading the same memory address at the same time? Or are the Rcpp data structures (e.g. NumericVector) causing issues?
The complete code can be found on github: https://github.com/johannes-titz/fastpos/tree/rcppthread
To run for yourself:
devtools::install_github("johannes-titz/fastpos@rcppthread")
pop <- fastpos::create_pop(0.5, 1e5)
x <- pop[,1]
y <- pop[,2]
lower_limit <- 0.4
upper_limit <- 0.6
n_studies <- 50
sample_size_min <- 20
sample_size_max <- 1000
res <- fastpos::simulate_pos(x, y, n_studies, sample_size_min, sample_size_max, TRUE, lower_limit,
upper_limit, 4)
PS: also tried to use pool.pushReturn
, but with the same outcome.
EDIT: The problem was indeed using Rcpp data structures (NumericVector
). When replacing all of them with std::vector
it runs fine. Now, without the Rcpp sugar, I had to find a way how to sample from a std::vector
(inside the function I call in the loop), but this is obviously worth it.
You write
Is the problem that several threads are reading the same memory address at the same time? Or are the Rcpp data structures (e.g. NumericVector) causing issues?
and I am inclined to say "likely". See the excellent documentation in package RcppParallel, and how it builds up from small examples.
I know this sounds unappealing but I really recommending building from a really small function with maybe zero or one arguments slowly adding and making sure that when you cross the bridge of adding any R-accessible or created memory that things still work as intended.
Sadly, we can't slap a parallel outer loop around inner R code and "hope for the best". OpenMP and friends are more demanding, and R's single-threaded nature imposes more constraints.
(And you probably know one can of course run higher-level parallelism from R over R functions, but that is a different topic and approach.)