
mclapply vs parLapply speeds


I'm running on Linux and have used mclapply without trouble. I run into some errors with parLapply, even after using clusterEvalQ.

Before I go further to resolve the issue, is there any point? That is, could there be a significant speed difference between the two, or do people just use parLapply when on Windows?

I've read about parLapplyLB and can see the uses of that approach, but if I'm strictly comparing mclapply and parLapply, do the FORK and PSOCK approaches differ much in speed?

The nature of my function may determine the answer; it uses stri_extract.


Solution

  • Some quick benchmarks suggest that mclapply can be slightly faster, but this probably depends on the specific system and problem. The more balanced the jobs and the slower the individual tasks, the less it should matter which function you use. Note that the parLapply timings below include starting and stopping the cluster.

    library(parallel)
    library(microbenchmark)
    
    # Unbalanced jobs: each task draws between 10^1 and 10^7 random numbers
    microbenchmark(
      parLapply = {
        cl <- makeCluster(2)
        parLapply(cl, rep(1:7, 3), function(x) {set.seed(1); rnorm(10^x)})
        stopCluster(cl)
      },
      mclapply = mclapply(rep(1:7, 3), function(x) {set.seed(1); rnorm(10^x)}, mc.cores = 2),
      times = 10
    )
    
    #Unit: seconds
    #     expr     min      lq     mean   median       uq      max neval
    #parLapply 1.85548 2.04397 3.332970 3.071284 4.323514 6.294364    10
    #mclapply  1.62610 1.65288 2.217407 1.849594 2.243418 5.435189    10
    
    
    # Balanced jobs: every task draws 10^6 random numbers
    microbenchmark(
      parLapply = {
        cl <- makeCluster(2)
        parLapply(cl, rep(6, 20), function(x) {set.seed(1); rnorm(10^x)})
        stopCluster(cl)
      },
      mclapply = mclapply(rep(6, 20), function(x) {set.seed(1); rnorm(10^x)}, mc.cores = 2),
      times = 10
    )
    
    #Unit: milliseconds
    #     expr      min        lq      mean   median       uq      max neval
    #parLapply 1150.657 1188.9750 1705.1364 1242.739 2071.276 3785.516    10
    # mclapply  820.692  932.2262  994.4404 1000.402 1079.930 1117.863    10
    
    sessionInfo()
    #R version 3.3.1 (2016-06-21)
    #Platform: x86_64-pc-linux-gnu (64-bit)
    #Running under: Ubuntu 14.04.5 LTS
    #
    #locale:
    # [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
    # [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
    # [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
    #
    #attached base packages:
    #[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
    #
    #other attached packages:
    #[1] microbenchmark_1.4-2.1 doParallel_1.0.10      iterators_1.0.8        foreach_1.4.3         
    #
    #loaded via a namespace (and not attached):
    # [1] colorspace_1.2-6 scales_0.4.0     plyr_1.8.4       tools_3.3.1      gtable_0.2.0     Rcpp_0.12.4     
    # [7] ggplot2_2.1.0    codetools_0.2-14 grid_3.3.1       munsell_0.4.3
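
Since the question mentions errors with parLapply even after clusterEvalQ, here is a minimal sketch of the usual PSOCK setup next to the mclapply equivalent. It is only an illustration under assumptions: `extract_digits` and `inputs` are hypothetical stand-ins for the asker's function and data, and base R regex replaces stri_extract so the sketch runs without stringi. With stringi, the workers would presumably need `clusterEvalQ(cl, library(stringi))` instead of the placeholder package loaded here. The cluster is created outside the timed region, so start-up cost is not counted.

```r
library(parallel)

# Hypothetical stand-in for the asker's function (base R instead of stringi).
extract_digits <- function(x) regmatches(x, regexpr("[0-9]+", x))

# Hypothetical input data: "item-001", "item-002", ...
inputs <- sprintf("item-%03d", 1:200)

# PSOCK: start the workers once, outside any timed region.
cl <- makeCluster(2)  # type = "PSOCK" is the default

# PSOCK workers are fresh R sessions: objects defined in the master
# session have to be exported to them explicitly.
clusterExport(cl, "extract_digits", envir = environment())

# Packages are loaded on each worker with clusterEvalQ.
# 'stats' is only a placeholder for whatever package the function needs.
invisible(clusterEvalQ(cl, library(stats)))

t_psock <- system.time(
  res_psock <- parLapply(cl, inputs, function(s) extract_digits(s))
)
stopCluster(cl)

# FORK via mclapply: child processes inherit the master's workspace,
# so no export step is needed. Forking is unavailable on Windows,
# where mc.cores must be 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
t_fork <- system.time(
  res_fork <- mclapply(inputs, extract_digits, mc.cores = cores)
)

# Both approaches should return the same list of results.
print(identical(res_psock, res_fork))
print(rbind(psock = t_psock[["elapsed"]], fork = t_fork[["elapsed"]]))
```

If the cluster is reused across many calls, the one-time start-up and export cost is spread out, and the remaining difference between the two approaches comes mostly from sending data to and from the workers.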