Search code examples
rgisrasterterra

Replace all cell values in a raster from updated aggregate polygon values (i.e. Update gridded population estimates)


Goal: Produce an updated gridded population raster for 2023 based on 2020 gridded population raster and 2023 polygon population estimates.

To do this, I have taken the approach to try to update all cell values in a raster with new cell values stored in a table by cell number.

I have a large raster with gridded population estimates from 2020, county polygons, and a table of 2023 population by county. I would like to update the cell values in the 2020 raster in the same distribution/density, but reflecting the 2023 population.

I have started with the following:

  1. Use exactextractr::exact_extract to sum the 2020 cell values (population) within each county polygon.
  2. Use terra:extract to extract all cell values in the 2020 raster
  3. Joined 1 and 2 to calculate the proportion of the county population for each cell (i.e. cell value / county pop).
  4. Joined the 2023 county population estimates to 3 and calculated new cell values reflecting the 2023 population (2023 county pop * proportion of county population in each cell)
  5. I can't seem to figure out how to get the new cell values into the 2020 raster.

I have a table with the county ID, cell number, old cell value, and new cell value, but I'm unsure how to update the 2020 raster with the new values. If this was a dataframe, I would join by cell number or just directly assign the new values, assuming the same row order. I have looked at the following SO articles (and many more), but they seem to address only conditional value replacement.

Below is a somewhat reproducible example (thanks to Zonal Statistics R (raster/polygon) for the generic raster and polygons). The cell numbers are not showing up in the extract step the way they do with the real data, but otherwise quite similar. Ideally, I would like to join by cell number to have some peace of mind and not be dependent on the row order.

library(terra)
library(exactextractr)
library(tidyverse)


# Create raster
ras <- raster(nrows=100, ncols=80, xmn=0, xmx=1000, ymn=0, ymx=800)
val <- runif(ncell(ras))
values(ras) <- val

# Create polygon within raster extent
xym <- cbind(runif(3,0,1000), runif(3,0,800))
p <- Polygons(list(Polygon(xym)),1)
sp <- SpatialPolygons(list(p))
spdf <- SpatialPolygonsDataFrame(sp, data=data.frame(1))

# Mask raster to within polygon
r2 <- mask(ras, spdf)

plot(r2)
lines(spdf)

#Create table of for new polygon value
new_tbl <- 
  tibble(X1 = 1,
         new_sum = 10000)

#1. Use ```extact_extractr::exact_extract``` to sum the 2020 cell values within each polygon.

poly_pop <- exact_extract(
  r2, 
  spdf, 
  fun = 'sum',
  weights = 'area',
  append_cols = T
) 

#2. Use ```terra:extract``` to extract all cell values in the old raster

r2_cell_values <- 
  terra::extract(
  r2, # raster file
  spdf, # polygons
  df = TRUE, 
  cells = TRUE, # Include cell numbers
  ID = TRUE,  # Include polygon IDs
  exact = TRUE # Include the proportion of the cell covered by the polygon
)

#3. Joined 1 and 2 to calculate the proportion of the value for each cell (i.e. cell value / polygon sum).

cell_props <-
r2_cell_values %>%
  filter(!is.na(layer)) %>% # Remove NA values
  left_join(poly_pop, by = c("ID" = "X1")) %>%
  mutate(
    cell_prop = layer / sum
  )

#4. Joined thenew values to 3 and calculated new cell values reflecting the mew polygon value (new polygon value * proportion of old polygon value in each cell)

new_cell_tbl <- 
cell_props %>%
  left_join(new_tbl, by = c("ID" = "X1")) %>%
  mutate(new_cell_value = cell_prop * new_sum)

#5. I can't seem to figure out how to get the new cell values into the 2020 raster. 

x <- rasterize(spdf, r2, 1)
cellStats(x, 'sum') == nrow(new_cell_tbl) # Check whether number of values in r2 and new_cell_tbl are the same

values(r2) <- new_cell_tbl$new_cell_value

I end up with this error message: Error in setValues(x, value) : length(values) is not equal to ncell(x), or to 1, even though I try to test whether the number of raster cells and length of the new vector are the same in the step before.

I'm fairly new to this, so any guidance on how I can update the 2020 cells based on the 2023 admin area values would be appreciated. Thanks!


Solution

  • Example data

    library(terra)
    pop <- rast(system.file("ex/elev.tif", package="terra"))
    names(pop) <- "pop2020"
    adm <- vect(system.file("ex/lux.shp", package="terra"))
    

    Compute relative population per admin unit for "2020"

    adm <- extract(pop, adm, "sum", na.rm=TRUE, bind=TRUE)
    pop_agg <- rasterize(adm, pop, "pop2020")
    rel_pop <- pop / pop_agg
    

    Use the relative distribution to compute the raster cell counts for "2030"

    #example data
    adm$pop2030 <- adm$pop2020 * (1:12) / 2
    # solution
    new_pop <- rasterize(adm, pop, "pop2030") * rel_pop