Search code examples
rparallel-processingterraclamp

R: Use terra to rescale large raster files and run in parallel


I have some very large raster files and want to rescale the values of these rasters to 0-1. Here are the information of my large rasters:

class       : SpatRaster 
dimensions  : 161239, 132862, 1  (nrow, ncol, nlyr)
resolution  : 25, 25  (x, y)
extent      : 2634975, 5956525, 1385025, 5416000  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs 
source      : mean_temp.tif 
name        : mean 
min value   : -9.1 
max value   : 15.4 

To rescale the raster values to 0~1, I separate the raster values to three parts:

  • For pixels with values <= 5th quantile, all of these pixels get values 0.
  • Pixel values that are greater or equal to 95% will get 1.
  • Pixel values between the 5th quantile and 95th quantile with be
    normalized to 0~1.

I use terra and sf R packages for rescaling and make smaller tiles of the large raster.

Here is how I do it:

library(terra)
library(sf)

min_offset <- rast("test.tif")

quantiles_min <- spatSample(min_offset, 10^4, "regular")
q2 <- as.data.frame(quantile(quantiles_min, probs=c(0.05, 0.95), na.rm=TRUE))
print(q2)
#Read in data in sf format.
c_shape <- read_sf("test.shp") 

# create an initial grid for centroid determination
c_grid <- st_make_grid(c_shape, cellsize = c(5e5, 5e5), square = TRUE) |>
  st_as_sf()

c_grid_spat <- vect(c_grid)
## make tiles of large rasters:
filename <- paste0("/tile/tile", "_.tif")
tile <- makeTiles(min_offset,c_grid_spat, filename, overwrite = TRUE)
tile_rasters <- c(lapply(tile, rast))

Rescaling <- function(x){
  ##clamp to get the range of values that are need to be rescaled.
  middle_min <- clamp(x,q2[1,1], q2[2,1], values = FALSE)
  
  #use normalization to rescale the middle range values.
  normalized_mid <- ((middle_min-q2[1,1])/(q2[2,1]- q2[1,1]))|> round(digits = 1)
  
  
  ## Values that are lower or equal to 5th quantile became 0
  min_q5 <- clamp(x, upper = q2[1,1], value=FALSE)|> round(digits = 1)
  
  min_repl5 <- init(min_q5, fun=0) |> mask(min_q5)
  
  ##values that are greater or equal to 95% get  1.
  min_q95 <- clamp(x,q2[2,1],Inf, value = FALSE)|> round(digits = 1)
  min_repl95 <- init(min_q95, fun=1) |> mask(min_q95)
  
  ##save raster
  final_min<- merge(normalized_mid,min_repl5,min_repl95)
  return(final_min)
}

x <- rast(tile_rasters[2]);Rescaling(x) #this works


MIs <- function(w){
  r <- rast(w)
  y <- Rescaling(r)
  wrap(y,proxy=TRUE)
} #This function work. when > i=1;  r <- tile_rasters[i] ;MI_tile <- MIs(r)

###Test function###
test <- MIs(tile_rasters[2])#this work with next line.
test <- terra::unwrap(test)#this work

When I tried to apply the two functions above in foreach parallel, I encountered some errors:

library(parallelly)
library(doParallel)
library(foreach)

cl <- makeClusterPSOCK(length(tile_rasters), autoStop = TRUE)
registerDoParallel(cl)

 test <- foreach(x =1:length(tile_rasters),
                 .packages = "terra") %dopar% {
                   resu <- MIs(tile_rasters[i])
                   return(resu)
                 }
stopCluster(cl)

Error message:

Error in { : task 1 failed - "NULL value passed as symbol address"

Does anyone has clues that how I can solve this problem to have my function parallelized on each raster tile?

For the test.shp file, you can find it from here, for the test.tif file, download from here.

To merge all the tile results into a final raster, I use vrt function, but it seems to still take quite some memory space. Here is how I did it:

t.lst <- list.files("/AllTiles/", pattern=".tif", full.names=TRUE)
MIs_final <- function(t.lst, fout="") {
  r <- vrt(t.lst)
  if (fout != "") {
    writeRaster(r, fout, overwrite=TRUE)
    fout
  } else {
    wrap(r)
  }
} 

Solution

  • You wrap the SpatRaster that is send back from a node, but you also need to wrap the input SpatRaster. But I would send the filenames instead. Otherwise you would probably run out of memory.

    Something like this

    Set up

    library(terra)
    f <- system.file("ex/elev.tif", package="terra")
    r <- rast(f)
    qnt <- global(r, quantile, probs=c(0.05, 0.95), na.rm=TRUE, maxcell=10^4) |> unlist()
    qrng <- diff(qnt)
    tiles <- makeTiles(r, 20, overwrite = TRUE)
    

    A function that takes the file (this function should be much faster than yours; perhaps reducing the benefit of parallelization)

    
    rescaler1 <- function(f){
        x <- rast(f)
        norm <- (x-qnt[1])/qrng
        rcl <- rbind(c(-Inf, qnt[1], 0), c(qnt[2], Inf, 1))
        cls <- classify(x, rcl, other=NA)
        cover(cls, norm) |> round(1)
    }
    

    or easier to understand (but probably slower)

    rescaler2 <- function(f) {
        x <- rast(f)
        x <- ifel(x < qnt[1], 0,  
                ifel(x > qnt[2], 1, (x-qnt[1]) / qrng))
        round(x, 1)
    }
    

    And now you can do

    a <- rescaler1(tiles[2])
    b <- rescaler2(tiles[2])
    

    And a function like this

    MIs <- function(f, fout="") {
        r <- rescaler1(f)
        if (fout != "") {
            writeRaster(r, fout, overwrite=TRUE)
            fout
        } else {
            wrap(r)
        }
    } 
    

    And you need to export qnt and qrng to the nodes

    Here is an alternative approach with multiple cores

    fresc <- function(x){
        x <- ifelse(x < qnt[1], 0,  
                ifelse(x > qnt[2], 1, (x-qnt[1]) / qrng))
        round(x, 1)
    }
    
    r <- rast(f)
    a  <- app(r, fresc)
    
    library(parallel)
    cl <- makeCluster(4)
    clusterExport(cl, c("qnt", "qrng"))
    b <- app(r, fresc, cores=cl)