Tags: r, parallel-processing, lapply, mclapply

Application of mclapply() to a function writing to a global variable


I'm trying to use parallel::mclapply to speed up the following code:

library(raster)
library(HistogramTools)  # for AddHistograms
# Create a first h here for the first band... omitted for brevity
# (filename, var, i and histbreaks are also defined elsewhere)
readNhist <- function(n, mconst) {
  l <- raster(filename[i], varname=var[i], band=n, na.rm=TRUE)
  gain(l) <- mconst
  # Add this band's histogram to the global accumulator h
  h <<- AddHistograms(h, hist(l, plot=FALSE, breaks=histbreaks, right=FALSE))
}
lapply(2:10000, readNhist, mconst=1)
# Then do stuff with the h histogram...

The code above works as expected. With mclapply (below), the result is nowhere near what I expect: the histograms are all wrong.

library(raster)
library(HistogramTools)  # for AddHistograms
library(parallel)
# Create a first h here for the first band... omitted for brevity
# (filename, var, i and histbreaks are also defined elsewhere)
readNhist <- function(n, mconst) {
  l <- raster(filename[i], varname=var[i], band=n, na.rm=TRUE)
  gain(l) <- mconst
  # Add this band's histogram to the global accumulator h
  h <<- AddHistograms(h, hist(l, plot=FALSE, breaks=histbreaks, right=FALSE))
}
mclapply(2:10000, readNhist, mconst=1, mc.cores=7)
# Then do stuff with the h histogram...

I feel like there's something vital I'm missing with the application of parallel computation to this function.


Solution

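  • The problem is the <<-, which is bad practice in general as far as I can gather. mclapply runs the function in forked worker processes, each with its own copy of h, so the updates made in the workers never reach the h in the main session.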

    The function can be rearranged thusly:

    readNhist <- function(n, mconst) {
      l <- raster(filename, varname=var, band=n, na.rm=TRUE)
      gain(l) <- mconst
      # Return this band's histogram instead of writing to a global
      hist(l, plot=FALSE, breaks=histbreaks, right=FALSE)
    }
    

    And called like this:

    hists <- mclapply(2:nbands, readNhist, mconst=gain, mc.cores=ncores)
    ch <- AddHistograms(x=hists)  # merge the per-band histograms into one
    h <- AddHistograms(h, ch)     # add the result to the first band's histogram
    rm(ch, hists)
    

    This is pretty fast even with a huge number of layers (and thus histograms).
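
To see why the global assignment gets lost, here is a minimal base-R sketch (not the original raster code). It assumes a Unix-like system where mclapply forks its workers. The names demo_breaks, band_counts, add_to_global and bands are hypothetical stand-ins for the per-band histogram work. Each forked worker updates its own copy of the global, so the copy in the main session stays untouched, whereas returning the pieces and combining them in the parent should give the same totals as a sequential run.

```r
library(parallel)

# Forking is not available on Windows, where mclapply() only accepts
# mc.cores = 1 and then behaves like lapply(); fall back to one core there.
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# Hypothetical stand-in for one band's histogram: bin counts of some
# deterministic values, using fixed breaks shared by all bands.
demo_breaks <- 0:10
band_counts <- function(n) {
  x <- (n * (0:9)) %% 10 + 0.5
  hist(x, breaks = demo_breaks, plot = FALSE, right = FALSE)$counts
}

bands <- 1:20

# Pattern 1: accumulate into a global with <<- (the approach from the question).
total_global <- integer(length(demo_breaks) - 1)
add_to_global <- function(n) {
  total_global <<- total_global + band_counts(n)
  invisible(NULL)
}
invisible(mclapply(bands, add_to_global, mc.cores = ncores))
# When the workers are forked (ncores > 1), total_global in this session
# is still all zeros at this point: only the workers' copies were updated.

# Pattern 2: return each result and combine the pieces in the parent.
pieces <- mclapply(bands, band_counts, mc.cores = ncores)
total_reduced <- Reduce(`+`, pieces)

# Sequential reference for comparison.
total_serial <- Reduce(`+`, lapply(bands, band_counts))
stopifnot(identical(total_reduced, total_serial))
```

The second pattern is the same idea as collecting the histograms from mclapply and merging them afterwards with AddHistograms: the workers only compute and return, and the main session does all the combining.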