r, parallel-processing, plot, multicore

Multicore generation of plots


I have a for loop that generates a plot via png() and dev.off() and saves it to the working directory.

The loop I have is similar to the following example:

test.df<-data.frame(id=1:25000, x=rnorm(25000),y=rnorm(25000))

for (i in test.df$id){
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi",ylab="psi")
}

The for loop runs and generates thousands of plots. Is it possible to run it in parallel on all 8 cores of my system so that I get the plots faster?

PS. The code is an example. My original problem and plots are much more complicated. Don't go viral on the example.


Solution

  • Provided you are using a reasonably recent version of R (the parallel package has shipped with R since version 2.14.0), this should be straightforward. The trick is to create a function that can be run on any core, in any order. First we create our data frame:

    test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))
    

    Next we create the function that runs on each core:

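    # id is used as a row index, which works here because id equals the row number.
    # I could also pass the row or the entire data frame instead.
    myplot = function(id) {
      fname = paste0("/tmp/plot", id, ".png")  # /tmp assumes a Unix-like system
      png(fname)
      plot(test.df$x[id], test.df$y[id],
           xlab="chi", ylab="psi")
      dev.off()
      return(fname)
    }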
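
    As a rough sketch of the "pass the row" idea mentioned in the comment above, the function below receives its data as arguments instead of reading a global data frame. The names plot_point, out_dir and small.df are made up for this illustration, and the example runs serially with mapply just to show the calling pattern:

    ```r
    # Hypothetical variant of myplot(): the data arrive as arguments,
    # so the function does not depend on a global test.df.
    plot_point = function(id, x, y, out_dir = tempdir()) {
      fname = file.path(out_dir, paste0("plot", id, ".png"))
      png(fname)
      # close the graphics device even if plot() throws an error
      on.exit(dev.off())
      plot(x, y, xlab = "chi", ylab = "psi")
      fname
    }

    small.df = data.frame(id = 1:3, x = rnorm(3), y = rnorm(3))
    # mapply() walks the three columns in step: one call per row
    files = mapply(plot_point, small.df$id, small.df$x, small.df$y)
    ```

    If a function like this is run on cluster workers, it is probably safer to pass an explicit out_dir, since each worker session has its own temporary directory that is removed when the worker exits.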
    

    Then I load the parallel package (this comes with base R)

    library(parallel)
    

    and then use mclapply (or, on any operating system, a cluster):

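    no_of_cores = 8
    ## Unix-alikes only: mclapply() forks, which Windows does not support
    mclapply(1:nrow(test.df), myplot, 
             mc.cores = no_of_cores)
    
    ## All operating systems: socket cluster
    cl = makeCluster(no_of_cores)
    ## workers start empty, so send them the data that myplot() reads
    clusterExport(cl, "test.df")
    parSapply(cl, 1:nrow(test.df), myplot)
    stopCluster(cl)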
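
    The two variants above can be wrapped so that the same call works on every platform. This is only a sketch: run_parallel and its export argument are hypothetical names, not part of the parallel package, and the squaring function is a stand-in for myplot:

    ```r
    library(parallel)

    # Hypothetical helper: fork on Unix-alikes, use a socket cluster on Windows.
    run_parallel = function(X, FUN, cores = 2, export = character()) {
      if (.Platform$OS.type == "windows") {
        cl = makeCluster(cores)
        # shut the workers down when the function exits, even after an error
        on.exit(stopCluster(cl))
        # copy any named global objects (e.g. "test.df") to the workers
        if (length(export) > 0) clusterExport(cl, export)
        parLapply(cl, X, FUN)
      } else {
        mclapply(X, FUN, mc.cores = cores)
      }
    }

    # Stand-in workload; both branches return a list
    squares = run_parallel(1:4, function(i) i^2)
    ```

    Instead of hard-coding the number of cores, detectCores() from the same package reports how many the machine has, although it can return NA on some systems, so a fallback value is worth having.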
    

    There are two advantages here:

    1. The package parallel comes with R, so we don't need to install anything extra
    2. We can switch off the "parallel" part:

      sapply(1:nrow(test.df), myplot)
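
    One way to use the serial version is as a dry run on a few ids before starting all the workers, because errors then show up in the current session with a normal traceback. A minimal sketch, where demo_plot is a made-up stand-in for myplot that writes into a temporary directory:

    ```r
    # Stand-in for myplot(): writes one PNG per id into a temporary directory
    out_dir = tempdir()
    demo_plot = function(id) {
      fname = file.path(out_dir, paste0("demo", id, ".png"))
      png(fname)
      plot(id, id, xlab = "chi", ylab = "psi")
      dev.off()
      fname
    }

    # Serial dry run on a handful of ids
    files = sapply(1:3, demo_plot)
    # Check that every expected file was actually written
    all_written = all(file.exists(files))
    ```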