Search code examples
rr-sfterra

How to adjust population data in a raster dataset using census data from an administrative boundary shapefile


I downloaded 100m resolution population raster data from WorldPop, and through transprojection and cropping, I got a city's population spatial distribution raster data, in addition, I have the administrative boundary shapefile data of the county level of this city, and my purpose is to correct the population raster based on the census (county level) data of the year.

##Mask and crop one county from the population data
popc <- trim(mask(pop, county[1, ]))
##Calculate the correction factor and make correction
POPc <- popc*(county[1, ]$census_data/global(popc, sum, na.rm = T)[1,])

I plan to crop out the population rasters for each county separately, then correct them by calculating their correction factors, and finally merge the rasters. However, this way is time-consuming, I wonder if there is an easier way to do this?


Solution

  • If your intention is to use the ratio between the county populations from WorldPop and county$census_data to adjust your WorldPop cell values, you can do this using raster arithmetic:

    1. Create a new version of the WorldPop raster where each cell represents the WorldPop county population it falls within e.g. pop1;
    2. Using the original WorldPop raster as a reference, use rasterize() to create a raster version of the county shapefile with corresponding county$census_data population values e.g. county_r;
    3. Use county_r and pop1 to define the ratio to adjust the cell values in pop

    Note that this example uses the default CRS from the tigris data, and is at a lower resolution than your WorldPop data for the sake of speed and simplicity.

    Step 0: Load packages and create example data

    library(sf) # 1.0-16 
    library(terra) # 1.7.71
    library(tigris) # 2.1
    library(dplyr) # 1.1.4
    
    # Create example city by county sf data using tigris package
    # Add ID to your actual data for left_join() in Step 1
    county <- counties("Maine", cb = TRUE) %>%
      mutate(ID = 1:n())
    
    # Get extent of county and use values to create a proxy WorldPop SpatRaster
    st_bbox(county)
    
    #      xmin      ymin      xmax      ymax 
    # -71.08392  42.97776 -66.94989  47.45969
    
    set.seed(1)
    pop <- rast(resolution = 100/11113.9, nlyrs = 1,
                xmin = -71.08392, xmax = -66.94989, 
                ymin = 42.97776, ymax = 47.45969, 
                names = "population",
                crs = crs(county)) %>%
      init("cell") %>%
      crop(county, mask = TRUE)
    
    pop[] <- sample(1:100, nrow(pop) * ncol(pop), replace = TRUE)
    
    pop
    # class       : SpatRaster
    # dimensions  : 498, 459, 1  (nrow, ncol, nlyr)
    # resolution  : 0.008997742, 0.008997742  (x, y)
    # extent      : -71.08392, -66.95396, 42.97776, 47.45864  (xmin, xmax, ymin, ymax)
    # coord. ref. : lon/lat NAD83 (EPSG:4269)
    # source(s)   : memory
    # name        : population
    # min value   :          1
    # max value   :        100
    

    Step 1: Create population by county raster from WorldPop data

    pop1 <- extract(pop, county, fun = sum, na.rm = TRUE) %>%
      left_join(county, ., by = "ID") %>%
      rasterize(., pop, field = "population")
    

    Step 2: rasterize() county shapefile with county$census population values to corresponding cells

    county_r <- extract(pop, county, fun = sum, na.rm = TRUE) %>%
      left_join(county, ., by = "ID") %>%
      mutate(census_data = ceiling(population * 1.1)) %>% # For illustrative purposes, delete this line for your actual data
      rasterize(., pop, field = "census_data")
    

    Step 3: Adjust original WorldPop values using county_r and pop1

    POPc <- pop * (county_r / pop1)
    
    POPc
    # class       : SpatRaster
    # dimensions  : 498, 459, 1  (nrow, ncol, nlyr)
    # resolution  : 0.008997742, 0.008997742  (x, y)
    # extent      : -71.08392, -66.95396, 42.97776, 47.45864  (xmin, xmax, ymin, ymax)
    # coord. ref. : lon/lat NAD83 (EPSG:4269)
    # source(s)   : memory
    # name        : population
    # min value   :     1.1000
    # max value   :   110.0018