Sorry if these are dumb questions, but I know next to nothing about how parallel processing works in practice.
My questions are:
- Q1. Is a function like purrr::map() within future.apply::future_apply() also being run in parallel?
- Q2. What happens if I run furrr::future_map() inside of a future.apply() function?
- Q3. Assuming I did the above, would I include another plan(multiprocess) call before furrr::future_map()?
Author of the future framework here.
- Q1. Is a function like purrr::map() within future.apply::future_apply() also being run in parallel?
No. There is nothing in 'purrr' that runs in parallel.
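One way to check this yourself is to record the process ID from inside the inner loop. This is a minimal sketch, assuming the 'future', 'future.apply' and 'purrr' packages are installed; the two-worker multisession plan and the variable names are only illustrative choices.

```r
library(future)
library(future.apply)

# Illustrative backend: two background R sessions on the local machine.
plan(multisession, workers = 2)

# Outer loop: future_lapply() may hand each element to a different worker.
# Inner loop: purrr::map() is an ordinary loop inside whichever worker runs it.
pids <- future_lapply(1:2, function(i) {
  unlist(purrr::map(1:3, function(j) Sys.getpid()))
})

# Reset the backend, which also shuts down the background sessions.
plan(sequential)

# Number of distinct process IDs seen by each inner purrr::map() call.
inner_unique <- vapply(pids, function(p) length(unique(p)), integer(1))
print(inner_unique)
```

If every entry of inner_unique is 1, each inner purrr::map() call stayed inside a single R process.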
- Q2. What happens if I run furrr::future_map() inside of a future.apply() function?
It will fall back to running sequentially, that is, as if plan(sequential) were set for the inner call. The reason for this is to protect against recursive, nested parallelism, which is rarely wanted. This is explained in the future vignette 'A Future for R: Future Topologies'. In some cases it is reasonable to use nested parallelism, e.g. distributed processing on multiple machines where you in turn parallelize across multiple cores on each machine. This can be done by using:
plan(list(tweak(cluster, workers = c("n1", "n2", "n3")), multisession))
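The sequential fallback can be observed on a single machine by comparing process IDs. The following is a rough sketch using only the 'future' package; the two-worker multisession plan stands in for whatever backend you actually use, and the variable names are made up for illustration.

```r
library(future)

# Single-layer plan: only the outermost futures are parallelized.
plan(multisession, workers = 2)

main_pid <- Sys.getpid()

pids <- value(future({
  outer_pid <- Sys.getpid()
  # A future created inside another future. No second layer was
  # declared in plan(), so this one is resolved sequentially.
  inner_pid <- value(future(Sys.getpid()))
  c(outer = outer_pid, inner = inner_pid)
}))

# Reset the backend, which also shuts down the background sessions.
plan(sequential)

# The outer future ran in a background session ...
print(pids[["outer"]] != main_pid)
# ... and the inner future ran in that same session.
print(pids[["inner"]] == pids[["outer"]])
```

The same comparison should hold when the inner call is furrr::future_map(), since it also creates futures under the current plan.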
- Q3. Assuming I did the above, would I include another plan(multiprocess) call before furrr::future_map()?
You don't want to set plan() "inside" your code / functions. Leave the control of plan() to whoever will use your code / call your functions. Also, one doesn't want to force a nested number of cores such as in plan(list(tweak(multisession, workers = ncores), tweak(multisession, workers = ncores))), because that will use ncores^2 cores, which will overload your computer. Using the default number of cores, as in plan(list(multisession, multisession)), will not have this problem, because in the second layer there will be only one core available anyway.
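As a sketch of what leaving plan() to the user can look like: the function below, slow_square(), is a hypothetical helper invented for this example. It uses future.apply internally but never calls plan() itself, so the caller decides whether it runs sequentially or in parallel.

```r
library(future)
library(future.apply)

# Hypothetical helper: creates futures, but does not touch plan().
slow_square <- function(xs) {
  future_sapply(xs, function(x) x^2)
}

# The caller chooses the backend. First sequentially ...
plan(sequential)
res_seq <- slow_square(1:4)

# ... then in two background R sessions, without changing slow_square().
plan(multisession, workers = 2)
res_par <- slow_square(1:4)

# Reset the backend, which also shuts down the background sessions.
plan(sequential)

# Both backends give the same result.
print(identical(res_seq, res_par))
```

The function body is identical under both plans; only the caller's plan() call changes where the work is done.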