Search code examples
rfor-loopparallel-processingrasterdoparallel

Parallelization in R


I am new to parallelization and I want to improve the current version of parallelization of my code to see if I can improve the time. I have 4 rasters and one shapefile with 9000 features. I want for each raster to perform the mean in each of the 9000 features.

I could do the parallelization in a way that I use a core to perform the operation over each raster using mclapply. In total I use 4 cores. However, I have more cores available and I would like to also divide in the remaining cores the different features. For instance, one core run the mean of the 3000 features in the first raster, the second core run the mean of the other 3000 features, and go on...

Could someone give some advice in how to do it?

This is my current code:

library(terra)
library(stringr)
library(dplyr)

irr <- function(x) {

      g <-  terra::vect("prueba.SHP")
      ra <-  rast(x)

      val <- c()
      df <- data.frame()
      
         for (i in 1:nrow(g)) {
               
            d <- terra::extract(ra, g[i],exact = TRUE) 
            d2 <- d1 %>% group_by(ID) %>% summarize(ss = sum(fraction * d1[,2])) 
            val <- c(val,d2)
                    
         }
      
       df <- data.frame("perc_ir_agr"=val)
       write.csv(df,paste(str_sub(x,end=-5),".csv",sep=""))
   
}

ff <-parallel::mclapply(list.files(pattern="\\.tif$"),irr, mc.cores=parallel::detectCores())

Solution

  • It is not quite clear what you mean to take from the inner for loop, surely d2 is a data.frame? And collecting that data.frame with c(val, d2) might not best do what you originally wanted.

    Perhaps you can make the code work for you.

    library(terra)
    library(stringr)
    library(dplyr)
    library(purrr)
    library(furrr)
    
    g <-  terra::vect("prueba.SHP")
    
    irr <- function(x) {
    
        ra <-  rast(x)
    
        segments <- cut(seq_len(nrow(g), parallel::detectCores()))
        slevels <- levels(segments)
    
        l <- future_map( slevels, function(level) {
    
            rows_to_process <- which(segments == level)
    
            map( rows_to_process, function(i) {
                d1 <- terra::extract(ra, g[i], exact = TRUE)
                d1 %>% group_by(ID) %>% summarize(ss = sum(fraction * d1[,2]))
            }) %>% list_rbind()
    
        })
    
        val <- list_rbind(l)
        df <- val %>% rename(perc_ir_agr="ss")
        write.csv(df,paste(str_sub(x,end=-5),".csv",sep=""))
    
    }
    
    for(tif_file in list.files(pattern="\\.tif$")) {
        irr(tif_file)
    }