Search code examples
rnested-loopsgeospatialweighted-averagepopulation

Calculating population-weighted number of hot days using large geospatial data sources


I am calculating the population-weighted number of hot days for a set of countries over multiple years. I have to weight each hot day value in a grid cell by the population located in that grid cell to calculate a final estimate of the population-weighted number of hot days. I have a working code but it does take a long time dealing with large countries. This is caused by a nested loop where I effectively loops through each grid cell individually. Therefore, the code is not time efficient and I am looking for any suggestions to code this more time efficiently.

I cannot add any data since these involve large geospatial files.

# Load input data (hot days – geospatial file – 1 deg. resolution)
hot_days <- rast('file_path.nc')
pop <- rast('file_path.tif')

# Import shape country 
shapefile <- st_read('country_path.shp')     
   
# Crop, reproject and mask raster by shapefile 
v <- vect(shapefile)        
clip1 <- crop(hot_days, ext(shapefile), snap='out') 
clip2 <- crop(pop, ext(shapefile))
mask1 <- mask(clip1, v) 
mask2 <- mask(clip2, v)           

# Prepare grid raster to loop through
grid_rows <- nrow(mask1)  # Number of rows in the grid 
grid_cols <- ncol(mask1)  # Number of columns in the grid        

# Specify cell size 
cell_size <- res(mask1)[[1]]        

# Calculate the extent of the grid based on the raster's extent 
grid_extent <- ext(mask1)        

grid_raster <- rast(nrows = grid_rows, ncols = grid_cols, nlyrs = 1, crs = 'ESRI:54009', extent = grid_extent, resolution = cell_size)        

# Base value for calculating population-weighted number of hot days 
sum_pop_hd <- 0
sum_pop <- 0        

# Loop through each cell       
for (row in 1:grid_rows) {
        for (col in 1:grid_cols) {
          hd_cell <- mask1[row, col][[1]]                      
          if (is.na(hd_cell) == FALSE) {                         
          # Extract the cell as a SpatRaster             
          cell <- grid_raster[row, col, drop = FALSE]             
          cell <- terra::as.polygons(cell)                          

          # Produce values per cell             
          pop_cell <- terra::mask(mask2, cell, touches = FALSE)             
          pop_cell_sum <- terra::global(pop_cell, fun = "sum", na.rm = T)             
          pop_cell_sum <- pop_cell_sum[[1]]                         

            if (is.na(pop_cell_sum) == TRUE) {               
            sum_pop <- sum_pop # basically not change in values               
            sum_pop_hd <- sum_pop_hd             
            } else if (is.na(pop_cell_sum) == FALSE | ((pop_cell_sum == 0) == TRUE)) {               
            sum_pop <- sum_pop + pop_cell_sum               
            weighted <- pop_cell_sum * hd_cell               
            sum_pop_hd <- sum_pop_hd + weighted            
            }                         
          }                            
}              
}        

# Population-weighted number of hot days - final estimate
pwhd <- sum_pop_hd / sum_pop

Solution

  • Without the data, it is not easy to guess your output. However, all this loop seems redundant and not necessary (for raster operation, looping is often not necessary). Using some dummy data and an example code, I will illustrate my idea. First, I will skip the part of cropping and masking on the boundaries of the shapefile since this operation could be done once for each country shp and do not change the main workflow. Second, I assume that population and hot days rasters are in the same crs, have the same extent and resolutions. Third, I would convert one raster grid into vector polygons and use that grid to extract the weighted mean of hot days.

    library(terra)
    library(exactextractr)
    library(sf)
    pop <- rast(nrows=10, ncols=10)
    values(pop) <- round(runif(100),2)*100
    pop_vect <- as.polygons(pop,dissolve=F)
    hd <- rast(nrows=10, ncols=10)
    values(hd) <- round(runif(100),2)*10
    
    wm <- exact_extract(hd,sf::st_as_sf(pop_vect),fun="weighted_mean",weights=pop)
    

    This is just an example, but I think that you can adapt it to obtain what you need. If you can provide more information about you rasters it would be easy to help. For instance, resolution, n° of layer, extent, crs etc. for hot_days and pop