
R - Parallelizing the fitting of multiple models (with dplyr and purrr)


This is a follow-up to a previous question about learning multiple models.

My use case: I have multiple observations for each subject, and I want to fit a separate model for each subject. See Hadley's excellent presentation on how to do this.

In short, this can be done with dplyr and purrr like so:

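library(purrr)
library(dplyr)
library(fitdistrplus)   # provides fitdist()

# One normal-distribution fit per subject
dt %>% 
    split(dt$subject_id) %>%   # named list of data frames, one per subject
    map(~ fitdist(.$observation, "norm")) 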
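The question's pipeline can be tried end to end on a small simulated data set. This is a minimal sketch, assuming dplyr, purrr and fitdistrplus are installed; the data frame `dt` below is hypothetical, made up for illustration, and only its column names `subject_id` and `observation` come from the question. The last step, collecting the estimates with `map_dfr()`, is an addition for inspecting the results.

```r
library(dplyr)
library(purrr)
library(fitdistrplus)

# Hypothetical example data: 3 subjects with 50 normally distributed observations each
set.seed(42)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 10))
)

# split() returns a named list of data frames, one per subject_id;
# map() then fits a normal distribution to each subject's observations
fits <- dt %>%
  split(dt$subject_id) %>%
  map(~ fitdist(.$observation, "norm"))

# Collect the fitted parameters (mean, sd) into one row per subject
params <- map_dfr(fits, ~ as.list(.x$estimate), .id = "subject_id")
params
```

Each element of `fits` is a full `fitdist` object, so `summary()`, `plot()` and the other fitdistrplus methods can still be applied per subject.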

Since fitting one model per subject is an embarrassingly parallel task, I was wondering whether dplyr or purrr provides an easy-to-use parallelization mechanism for such tasks (something like a parallel map).

If these libraries don't provide easy parallelization, could it be done with the classic R parallelization libraries (parallel, foreach, etc.)?


Solution

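  • There is now the furrr package, which provides parallel versions of purrr's map functions (future_map() and friends) built on the future framework. For example, something like: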

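    library(dplyr)
    library(furrr)
    library(fitdistrplus)   # provides fitdist()
    
    # plan() comes from the future package; multisession runs background R sessions
    plan(multisession)   # or perhaps:  plan(multicore), see ?plan
    
    # future_map() is a parallel drop-in replacement for purrr::map()
    dt %>% 
        split(dt$subject_id) %>%
        future_map(~fitdist(.$observation, "norm"))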
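As for the classic libraries asked about in the question: the same per-subject fits can also be spread over workers with base R's parallel package. This is a minimal sketch using `parLapply()` on a PSOCK cluster instead of furrr, assuming fitdistrplus is installed; the data frame `dt` is a hypothetical stand-in built from simulated data, and the worker count of 2 is an arbitrary choice.

```r
library(parallel)
library(fitdistrplus)

# Hypothetical example data: 3 subjects with 50 normally distributed observations each
set.seed(42)
dt <- data.frame(
  subject_id  = rep(c("a", "b", "c"), each = 50),
  observation = c(rnorm(50, mean = 0), rnorm(50, mean = 5), rnorm(50, mean = 10))
)

# Named list of data frames, one per subject
groups <- split(dt, dt$subject_id)

# A PSOCK cluster starts separate R processes and works on all platforms,
# including Windows (mclapply() is a fork-based alternative on Unix-alikes)
cl <- makeCluster(2)

# Each worker is a fresh R session, so the package must be loaded there too
invisible(clusterEvalQ(cl, library(fitdistrplus)))

# Fit one normal distribution per subject, distributed across the workers
fits <- parLapply(cl, groups, function(d) fitdist(d$observation, "norm"))

# Shut the worker processes down when done
stopCluster(cl)

# Estimated mean for each subject
sapply(fits, function(f) f$estimate[["mean"]])
```

The explicit cluster setup (loading packages on every worker, stopping the cluster) is the bookkeeping that furrr's `plan()` handles for you, which is the main convenience of the furrr approach.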