
rfsrc() in the R randomForestSRC package is not using multiple cores


I am using R (on 32-bit Windows 7) for text classification with random forests. Because the dataset is large, I searched the Internet for ways to speed up model building and came across the randomForestSRC package.

I have followed all the steps in the package's installation manual, yet when rfsrc() runs, R uses only one of the logical cores (the same as with randomForest()), and CPU utilization peaks at 25%. I set the following options as per the manual:

options(mc.cores = detectCores() - 1, rf.cores = detectCores() - 1)

I am using Windows 7 Professional 32-bit Service Pack 1 on an Intel i3-2120 CPU with 4 logical cores. Could anyone shed some light on what I might be missing? Any other efficient way to use randomForest across multiple cores would also be helpful!


Solution

  • The problem is that randomForestSRC uses the mclapply function for parallel execution, but mclapply relies on forking processes, which Windows doesn't support, so it can't run in parallel there. randomForestSRC can also use OpenMP for multithreaded parallel execution, but that isn't built into the binary distribution from CRAN, so you have to build the package from source with OpenMP support enabled.

    I think your two options are:

    • Build randomForestSRC with OpenMP support on your Windows machine;
    • Call a random forest function in parallel yourself.
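
    As a rough illustration of why the second option works on Windows, the sketch below uses only the base parallel package. A PSOCK cluster starts separate R processes instead of forking, so it doesn't depend on what mclapply needs. The toy function square is a hypothetical stand-in for real work, and the worker count of 2 is an arbitrary assumption:

    ```r
    library(parallel)

    # Hypothetical stand-in for a real task such as growing part of a forest.
    square <- function(i) i^2

    # A PSOCK cluster launches independent R processes that talk over sockets,
    # so it does not rely on fork() and should also work on Windows.
    cl <- makePSOCKcluster(2)

    # Ask each worker for its process ID to confirm they are separate processes.
    pids <- unlist(clusterCall(cl, Sys.getpid))

    # Distribute the toy task over the workers.
    res <- unlist(parLapply(cl, 1:4, square))

    stopCluster(cl)  # always shut the workers down when finished

    print(pids)
    print(res)
    ```

    If the printed process IDs differ, the work really is being spread over more than one R process.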

    Here's a simple parallel example using the randomForest package with foreach and doParallel that is derived from an example in the foreach vignette:

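    library(randomForest)
    library(doParallel)  # also attaches foreach and parallel

    # Start one worker process per logical core; PSOCK clusters work on Windows
    workers <- detectCores()
    cl <- makePSOCKcluster(workers)
    registerDoParallel(cl)

    # Toy data: 100 observations, 5 predictors, two balanced classes
    x <- matrix(runif(500), 100)
    y <- gl(2, 50)
    ntree <- 1000

    # Grow an equal share of the trees on each worker, then merge the
    # sub-forests with randomForest's combine()
    rf <- foreach(n=rep(ceiling(ntree/workers), workers),
                  .combine=combine, .multicombine=TRUE,
                  .packages='randomForest') %dopar% {
      randomForest(x, y, ntree=n)
    }

    stopCluster(cl)  # shut the workers down when finished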
    

    This example should work on Windows, Mac OS X and Linux. See the foreach vignette for more information.
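
    One detail worth noting: ceiling(ntree/workers) rounds up, so when ntree isn't a multiple of the worker count, the combined forest ends up with slightly more than ntree trees (for example, 3 workers at 334 trees each gives 1002). If the exact count matters, a small helper along these lines could be used. This is only a sketch, and split_trees is a hypothetical name, not part of randomForest or foreach:

    ```r
    # Hypothetical helper: split ntree into per-worker counts that sum to
    # exactly ntree, giving the leftover trees to the first few workers.
    split_trees <- function(ntree, workers) {
      base  <- ntree %/% workers   # trees that every worker grows
      extra <- ntree %% workers    # leftover trees, handed out one each
      counts <- rep(base, workers) + rep(c(1, 0), c(extra, workers - extra))
      counts[counts > 0]           # drop idle workers when workers > ntree
    }

    print(split_trees(1000, 4))
    print(split_trees(1000, 3))
    ```

    The returned vector could then be passed as n= in the foreach call in place of rep(ceiling(ntree/workers), workers).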