r, parallel-processing, missing-data, imputation

How to use parallel computing for missRanger in imputation of missing values?


I am imputing missing values with missRanger, and it takes too long because I have 1000 variables. I tried to use parallel computing, but it does not make the process any faster. Here is the code:

library(doParallel)
cores=detectCores()
cl <- makeCluster(cores[1]-1) 
registerDoParallel(cl)
library(missRanger)
train[1:lengthvar] <- missRanger(train[1:lengthvar], pmm.k = 3, num.trees = 100)
stopCluster(cl)

I am not sure what to add to this code to make it work.


Solution

  • missRanger is built on ranger, a parallelized random forest implementation for R. The imputation therefore already uses multiple cores, and wrappers like doParallel only clutter the code without making it faster.

    Instead, try to speed up the calculations by passing relevant arguments to ranger via the ... argument of missRanger, e.g.

    • num.trees = 20 (fewer trees per forest) or

    • max.depth = 8 (shallower trees).

    Disclaimer: I am the author of missRanger.
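
A minimal sketch of the advice above, not code from the answer itself. It assumes the missRanger and ranger packages are installed, and it uses iris with artificially generated NAs as a stand-in for the asker's train data. The names train_na and imputed are illustrative. num.threads is a ranger argument; it is set explicitly here only as an assumption, because the default thread count can differ between ranger versions.

```r
library(missRanger)

# Toy data standing in for `train`: iris with roughly 10% of values set to NA.
set.seed(1)
train_na <- generateNA(iris, p = 0.1)

# No cluster setup is needed: ranger handles multithreading itself.
# Leave one core free, but never request fewer than one thread.
n_threads <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

# num.trees and max.depth trade some imputation accuracy for speed.
# num.threads is forwarded to ranger (an assumption, see the note above).
imputed <- missRanger(
  train_na,
  pmm.k = 3,
  num.trees = 20,
  max.depth = 8,
  num.threads = n_threads,
  seed = 1,
  verbose = 0
)

# Count of missing values remaining after imputation
sum(is.na(imputed))
```

With 1000 variables, the number of trees and the tree depth are likely the main levers, since missRanger fits one forest per variable with missing values in each iteration.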