I am imputing missing values with missRanger, and it takes too long because I have 1000 variables. I tried to use parallel computing, but it does not make the process any faster. Here is the code:
library(doParallel)
cores=detectCores()
cl <- makeCluster(cores[1]-1)
registerDoParallel(cl)
library(missRanger)
train[1:lengthvar] <- missRanger(train[1:lengthvar], pmm.k = 3, num.trees = 100)
stopCluster(cl)
I am not sure what to add to this code to make it work.
missRanger is based on ranger, a parallelized random forest implementation in R. Thus, the code is already running on all cores, and wrapping it in something like doParallel only makes the code clumsier without making it faster.
Instead, try to speed up the calculations by passing relevant arguments to ranger via the ... argument of missRanger, e.g. num.trees = 20 or max.depth = 8.
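As a minimal sketch of that suggestion, the call below drops the cluster setup and passes the two tuning arguments straight through to ranger. The toy data set (iris with artificially generated missing values) and the names dat and imputed are assumptions for illustration only; the specific values 20 and 8 are just the examples mentioned above, not tuned recommendations.

```r
library(missRanger)

# Toy data with about 10% missing values per column (illustrative only).
set.seed(1)
dat <- generateNA(iris, p = 0.1)

# No doParallel / cluster setup: ranger is multithreaded on its own.
imputed <- missRanger(
  dat,
  pmm.k = 3,        # predictive mean matching, as in the question
  num.trees = 20,   # fewer trees per forest than in the question (100)
  max.depth = 8,    # cap tree depth so each tree is cheaper to grow
  verbose = 0       # suppress progress output
)
```

Applied to the code in the question, this would amount to removing the doParallel lines and adding num.trees = 20 and max.depth = 8 to the existing missRanger call on train[1:lengthvar]. Smaller forests and shallower trees trade some imputation accuracy for speed, so it is worth checking the results on a subset first.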
Disclaimer: I am the author of missRanger.