extract from {raster} package using excessive memory


I have been using the extract function from the raster package to extract data from raster files for areas defined by shapefiles. However, I am having problems with the amount of memory this process now requires. I have a large number of shapefiles (~1000), and the raster files are large (~1.6 GB).

My process is:

    shp <- mclapply(list.files(pattern="*.shp",full.names=TRUE), readShapePoly,mc.cores=6)
    ndvi <- raster("NDVI.dat")
    mc<- function(y) {
      temp <- gUnionCascaded(y)
      extract <- extract(ndvi,temp)
      mean <- range(extract, na.rm=T )[1:2]
      leng <- length(output)
    }
    output <- lapply(shp, mc)

Are there any changes I can make to reduce the memory load? I tried loading fewer shapefiles, which worked for about 5 minutes before the memory spiked again. It's a quad-core 2.4 GHz computer with 8 GB of RAM.


Solution

  • I would do this (untested):

    ## Clearly we need these packages, and their dependencies
    library(raster)
    library(rgeos)
    library(maptools)  ## for readShapePoly(), as used in the question
    ## list.files() takes a regular expression, not a glob
    shpfiles <- list.files(pattern = "\\.shp$", full.names = TRUE)
    ndvi <- raster("NDVI.dat")
    ## initialize an object to store the results for each shpfile
    res <- vector("list", length(shpfiles))
    names(res) <- shpfiles
    ## loop over files
    for (i in seq_along(shpfiles)) {
      ## read this shapefile and do the union
      temp <- gUnionCascaded(readShapePoly(shpfiles[i]))
      ## extract for this shape data (and don't call it "extract")
      extracted <- extract(ndvi, temp)
      ## further processing, save result (min and max, so not "mean")
      rng <- range(extracted, na.rm = TRUE)
      res[[i]] <- rng  ## plus whatever else you need
    }
    

    It's not at all clear what the return value of mc() above is meant to be, so I ignore it. Looping over the files one at a time should be far more memory-efficient, and probably faster, than what you tried originally. I doubt it's worth using the parallel stuff at all here.
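
    The per-shape loop can be tried on toy data before pointing it at the real files. Below is a minimal sketch, assuming a small in-memory raster in place of NDVI.dat and rectangles built from extents in place of the dissolved shapefiles. make_rect() and summarise_shape() are hypothetical helper names, and cropping to the shape's bounding box before calling extract() is an extra step I am assuming will help, not something measured on your data.

    ```r
    library(sp)      # SpatialPolygons class
    library(raster)  # raster(), extent(), crop(), extract()

    ## Toy stand-in for NDVI.dat (assumption): 10 x 10 cells of 100 x 100 map
    ## units, filled row by row from the top-left corner with 1..100
    ndvi <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 1000, ymn = 0, ymx = 1000)
    values(ndvi) <- 1:100

    ## Hypothetical helper: a rectangle as a SpatialPolygons object,
    ## standing in for one dissolved shapefile
    make_rect <- function(xmin, xmax, ymin, ymax) {
      as(extent(xmin, xmax, ymin, ymax), "SpatialPolygons")
    }

    shapes <- list(
      top_left     = make_rect(0, 500, 500, 1000),
      bottom_right = make_rect(500, 1000, 0, 500)
    )

    ## Hypothetical helper: summarise the raster cells under one shape
    summarise_shape <- function(r, shape) {
      ## crop to the shape's bounding box so extract() sees a small raster
      r_small <- crop(r, extent(shape))
      ## extract() returns a list (one element per polygon), so flatten it
      vals <- unlist(extract(r_small, shape))
      ## return everything explicitly, in one named vector
      c(min = min(vals, na.rm = TRUE),
        max = max(vals, na.rm = TRUE),
        n   = sum(!is.na(vals)))
    }

    ## same pattern as above: preallocate, then process one shape at a time
    res <- vector("list", length(shapes))
    names(res) <- names(shapes)
    for (i in seq_along(shapes)) {
      res[[i]] <- summarise_shape(ndvi, shapes[[i]])
    }
    print(res)
    ```

    Returning a named vector from the helper makes it explicit what each iteration produces, which is the part that was unclear in mc().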