I have a for loop which generates a plot via png() and dev.off() and saves it to the working directory. The loop I have is similar to the following example:
test.df <- data.frame(id=1:25000, x=rnorm(25000), y=rnorm(25000))
for (i in test.df$id){
  png(paste0("plot", i, ".png"))
  plot(test.df$x[test.df$id==i], test.df$y[test.df$id==i], xlab="chi", ylab="psi")
  dev.off()
}
The for loop will run and generate thousands of plots. Is it possible to make it run in parallel on all 8 cores of my system so that I get the plots faster?
PS. The code is only an example; my original problem and plots are much more complicated, so please don't dwell on the example itself.
Provided you are using a reasonably recent version of R (2.14 or later, when the parallel package became part of base R), this should be straightforward. The trick is to create a function that can be run on any core in any order. First we create our data frame:
test.df = data.frame(id=1:250, x=rnorm(250),y=rnorm(250))
Next we create the function that runs on each core:
## I could also pass the row or the entire data frame
myplot = function(id) {
  fname = paste0("/tmp/plot", id, ".png")
  png(fname)
  plot(test.df$x[id], test.df$y[id],
       xlab="chi", ylab="psi")
  dev.off()
  return(fname)
}
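Before going parallel, it's worth checking that the function works on its own. A minimal sanity check (the row index 1 here is just an arbitrary choice):

## Run the function once and confirm the png file was written
f = myplot(1)
file.exists(f)  # should be TRUE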
Then I load the parallel package (this comes with base R)
library(parallel)
and then use mclapply:
no_of_cores = 8

## Non-Windows: fork-based parallelism
mclapply(1:nrow(test.df), myplot,
         mc.cores = no_of_cores)

## All OSes: socket cluster
cl = makeCluster(no_of_cores)
clusterExport(cl, "test.df")
parSapply(cl, 1:nrow(test.df), myplot)
stopCluster(cl)
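Rather than hard-coding 8, you can ask parallel how many cores the machine reports. This is only a sketch; detectCores() counts logical cores, so you may prefer to leave one free for the rest of the system:

no_of_cores = detectCores()
## or, more conservatively:
no_of_cores = max(1, detectCores() - 1)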
There are two advantages here:

- parallel comes with R, so we don't need to install anything extra.
- We can switch off the "parallel" part:
sapply(1:nrow(test.df), myplot)
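If you want to gauge the actual speed-up on your real plots, a rough comparison along these lines should do (timings depend heavily on how expensive each plot is and on disk speed):

## Serial
system.time(sapply(1:nrow(test.df), myplot))
## Parallel (non-Windows)
system.time(mclapply(1:nrow(test.df), myplot, mc.cores = no_of_cores))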