library(foreach)
library(doMC)
# Draws one standard normal value; signals an error if it exceeds the threshold
myfun <- function(threshold){
  val <- rnorm(1, mean = 0, sd = 1)
  if(val > threshold){
    stop("bad")
  }else return(val)
}
results <- vector("list", length = 10)
parallel_fun <- function(reps, threshold){
  registerDoMC(cores = 48)
  # One task per replicate; a single failing task aborts the whole foreach call
  results = foreach (j = 1:reps, .combine = rbind) %dopar% {
    myfun(threshold)
  }
}
> parallel_fun(reps = 10, threshold = 0)
Error in { : task 1 failed - "bad"
The above is a simple, reproducible example. I want to parallelize myfun for a total of reps = 10 replicates. myfun may stop if the generated val is greater than some threshold. In that case, I want to stop running myfun and not have it return val. In the end, I want results to contain 10 vals that do not exceed threshold. Therefore, I thought a while loop might be more appropriate here, since I want to keep running until I have 10 values that satisfy the threshold. Is it possible to repurpose foreach to parallelize a while loop?
Using exceptions for control flow is often discouraged; ideally, the function should produce a valid value on its own rather than signal an error.
In this specific example, you are simulating a truncated normal distribution, so you could use the rtruncnorm function from the truncnorm package.
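For the one-sided case here (values at or below threshold), the truncnorm call would be along the lines of rtruncnorm(10, b = threshold). If you would rather avoid the dependency, a minimal base-R sketch uses inverse transform sampling instead: draw uniforms on (0, pnorm(threshold)) and map them back through qnorm, so no value is ever rejected. The helper name rtrunc_upper is hypothetical, and for extremely low thresholds pnorm() underflows to 0, so this is only meant for moderate thresholds.

```r
# Hypothetical helper: n draws from N(mean, sd) truncated above at `threshold`,
# via inverse transform sampling (base R only, no rejection loop).
rtrunc_upper <- function(n, threshold, mean = 0, sd = 1) {
  # Probability mass of the normal distribution at or below the threshold
  p_max <- pnorm(threshold, mean = mean, sd = sd)
  # Uniforms on (0, p_max) mapped through the normal quantile function
  # fall in (-Inf, threshold], up to floating-point rounding
  qnorm(runif(n, min = 0, max = p_max), mean = mean, sd = sd)
}

set.seed(1)
results <- rtrunc_upper(10, threshold = 0)
```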
Alternatively, rewrite myfun so that it always returns a valid value:
myfun = function(threshold){
  # do-while style: draw first, then test; exit once the value is acceptable
  repeat{
    val = rnorm(1, 0, 1)
    if(val <= threshold)
      break
  }
  val
}
This is just one of the possible variants. Here I am using repeat with break as a custom do-while construct.
Note that, depending on the threshold, the loop may need a very large number of iterations, or never finish at all if the threshold lies outside the range of values the function can produce. Tread carefully: either cap the number of iterations, or check beforehand that the threshold is within the range of the function in question; ideally, do both.
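To make the warning concrete: each attempt of the rejection loop succeeds with probability p = pnorm(threshold), so the number of attempts per accepted value is geometric with mean 1/p. A small sketch of a preliminary check built on that; the function names and the max_expected cap are hypothetical choices for illustration.

```r
# Expected number of rnorm() draws the repeat loop needs per accepted value:
# attempts ~ Geometric(p) with p = P(X <= threshold), so E[attempts] = 1 / p.
expected_attempts <- function(threshold, mean = 0, sd = 1) {
  1 / pnorm(threshold, mean = mean, sd = sd)
}

# Hypothetical preliminary check: refuse thresholds that need too many draws
check_threshold <- function(threshold, max_expected = 1e6) {
  e <- expected_attempts(threshold)
  if (!is.finite(e) || e > max_expected) {
    stop("threshold ", threshold, " needs about ", signif(e, 3),
         " draws per value; choose a larger threshold")
  }
  invisible(e)
}

expected_attempts(0)   # 2
expected_attempts(-3)  # about 741
```

With threshold = -10 the expected count is on the order of 10^23 draws per value, which check_threshold() rejects before any simulation starts.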
With this, you should be able to run foreach just as you are doing now.
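A sketch of the full call with the rewritten myfun, assuming foreach and doMC are installed (doMC forks processes, so it runs in parallel only on Unix-like systems). The core count of 2 is an arbitrary placeholder, and .combine = c is used instead of rbind so that the result is a plain numeric vector rather than a one-column matrix.

```r
library(foreach)
library(doMC)

# Rejection loop: redraw until the value does not exceed the threshold
myfun = function(threshold){
  repeat{
    val = rnorm(1, 0, 1)
    if(val <= threshold)
      break
  }
  val
}

parallel_fun = function(reps, threshold, cores = 2){
  registerDoMC(cores = cores)
  # Every task now always succeeds, so foreach returns exactly `reps` values
  foreach(j = 1:reps, .combine = c) %dopar% {
    myfun(threshold)
  }
}

results = parallel_fun(reps = 10, threshold = 0)
```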
If you do not control myfun, you need to write a wrapper around it; the construct is almost identical to the function above:
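wrap_myfun = function(threshold){
  repeat{
    # silent = TRUE suppresses the printed error message on every failed attempt
    val = try(myfun(threshold), silent = TRUE)
    # try() returns an object of class "try-error" when myfun() fails
    if(!inherits(val, "try-error"))
      break
  }
  val
}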
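If you prefer not to inspect try-error objects, tryCatch can express the same retry loop: the error handler returns NULL, which simply triggers another attempt. This is a sketch; wrap_myfun_tc is a hypothetical name, and the question's original myfun is repeated so the block runs on its own.

```r
# Original myfun from the question: signals an error instead of returning
myfun <- function(threshold){
  val <- rnorm(1, mean = 0, sd = 1)
  if(val > threshold){
    stop("bad")
  } else return(val)
}

# Retry wrapper: any error from myfun() becomes NULL and the loop tries again
wrap_myfun_tc <- function(threshold){
  repeat{
    val <- tryCatch(myfun(threshold), error = function(e) NULL)
    if(!is.null(val))
      return(val)
  }
}

set.seed(42)
x <- wrap_myfun_tc(0)
```

Like try(), this catches every error, including unrelated bugs inside myfun, so it is only appropriate when the error is known to mean "rejected draw".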
If you need to keep track of how many iterations it took to generate each number, you can rewrite the repeat loop as a for loop, or simply add a counter together with an optional iteration cap:
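wrap_myfun = function(threshold, .maxiter=10^9, .default=NA){
  iter = 1
  repeat{
    val = try(myfun(threshold), silent = TRUE)
    # Accept the first attempt that did not raise an error
    if(!inherits(val, "try-error"))
      break
    # Give up after .maxiter failed attempts and return the default instead
    if(iter >= .maxiter){
      val = .default
      break
    }
    iter = iter + 1
  }
  list("value"=val, "iterations"=iter)
}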
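A quick way to check the counter logic is to use thresholds for which the outcome is deterministic: with threshold = Inf the question's myfun never fails, and with threshold = -Inf it always fails, so the cap is reached. The sketch below repeats the question's myfun and the counting wrapper (using try(..., silent = TRUE)) so that it runs on its own; the small .maxiter = 5 is only for the demonstration.

```r
# Original myfun from the question
myfun <- function(threshold){
  val <- rnorm(1, mean = 0, sd = 1)
  if(val > threshold){
    stop("bad")
  } else return(val)
}

wrap_myfun = function(threshold, .maxiter=10^9, .default=NA){
  iter = 1
  repeat{
    val = try(myfun(threshold), silent = TRUE)
    if(!inherits(val, "try-error"))
      break
    if(iter >= .maxiter){
      val = .default
      break
    }
    iter = iter + 1
  }
  list("value"=val, "iterations"=iter)
}

ok = wrap_myfun(Inf)                     # first draw always accepted
capped = wrap_myfun(-Inf, .maxiter = 5)  # every draw rejected, cap reached

# Collect several replicates into a data frame (sequential here for brevity)
runs = lapply(1:3, function(i) wrap_myfun(0))
df = data.frame(value = sapply(runs, `[[`, "value"),
                iterations = sapply(runs, `[[`, "iterations"))
```

Inside foreach, each task could return such a one-row data frame and be combined with .combine = rbind.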
Alternatively, instead of assigning a default value, you can call `stop("maximum iterations reached")`. Which one to use depends on how serious the problem is.
This way, you have moved all the retry logic into the data-generating function, and you do not have to manage the task queue implemented in foreach. The load should be distributed roughly evenly among the cores (apart from the randomly long run time of some iterations, which you cannot influence).
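One caveat on load balancing: doMC passes tasks to mclapply, which by default preschedules them into fixed chunks up front, so a worker that draws several slow tasks can lag behind. The doMC vignette shows passing .options.multicore = list(preschedule = FALSE) to foreach so that each task is forked separately and idle workers pick up the next one. A sketch under that assumption, with an arbitrary 2 cores and the rewritten myfun:

```r
library(foreach)
library(doMC)
registerDoMC(cores = 2)

# Rejection loop: redraw until the value does not exceed the threshold
myfun = function(threshold){
  repeat{
    val = rnorm(1, 0, 1)
    if(val <= threshold)
      break
  }
  val
}

# Disable prescheduling so tasks are handed out one at a time
mcoptions = list(preschedule = FALSE)
results = foreach(j = 1:10, .combine = c, .options.multicore = mcoptions) %dopar% {
  myfun(0)
}
```

Without prescheduling a new process is forked per task, so this pays off only when individual tasks are slow compared with that overhead.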