This is a follow-up to a previous question about learning multiple models.
The use case is that I have multiple observations for each subject, and I want to train a separate model for each subject. See Hadley's excellent presentation on how to do this.
In short, this is possible using dplyr and purrr, like so:
library(purrr)
library(dplyr)
library(fitdistrplus)
dt %>%
  split(dt$subject_id) %>%
  map(~ fitdist(.$observation, "norm"))
Since the model building is an embarrassingly parallel task, I was wondering whether dplyr and purrr have an easy-to-use parallelization mechanism for such tasks (like a parallel map).
If these libraries don't provide easy parallelization, could it be done using the classic R parallelization libraries (parallel, foreach, etc.)?
There is now the furrr package, which provides parallel versions of purrr's mapping functions. For example, something like:
library(dplyr)
library(furrr)
library(fitdistrplus)  # provides fitdist()
plan(multisession)  # or perhaps plan(multicore) (forked, not available on Windows); see ?plan
dt %>%
  split(dt$subject_id) %>%
  future_map(~ fitdist(.$observation, "norm"))
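As for the classic route the question asks about, here is a minimal sketch using the base parallel package. The toy data frame dt and the worker count of 2 are assumptions made up for illustration; only the column names subject_id and observation come from the question.

```r
library(parallel)
library(fitdistrplus)

# Hypothetical toy data standing in for `dt`: 3 subjects, 50 observations each
set.seed(1)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = rnorm(150, mean = rep(c(0, 5, 10), each = 50))
)

# One numeric vector of observations per subject
obs_by_subject <- split(dt$observation, dt$subject_id)

# Start a small cluster of worker processes (2 is an arbitrary choice)
cl <- makeCluster(2)

# Workers are fresh R sessions, so load the package on each of them
clusterEvalQ(cl, library(fitdistrplus))

# Fit one normal distribution per subject, in parallel
fits <- parLapply(cl, obs_by_subject, function(x) fitdist(x, "norm"))

stopCluster(cl)

# Estimated mean and sd for each subject
sapply(fits, function(f) f$estimate)
```

On Linux or macOS, parallel::mclapply(obs_by_subject, function(x) fitdist(x, "norm"), mc.cores = 2) should be a shorter fork-based alternative that needs no explicit cluster setup. It does not run in parallel on Windows.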