Search code examples
pythonrmapsterratidyterra

How to extract the density population of specific longitude and latitude points from the GHSL population data?


I have a data frame with about 20,000 rows containing the latitude and longitude of points spread across the world. I need to determine the population density at those points, so I considered using the GHSL population data. However, I am unsure how to extract that information from the GHSL population data, or if it is even possible. (


Solution

  • As you haven't provided any example data, information about which GHSL year to use, or your preferred language, here is an R reprex using example data.

    I have used the 30arcsec resolution GHSL (~100m cell size) as the 3arcsec option was too large for my computer to handle. If you need a finer resolution, you can download the GHSL tile by tile, but that would be laborious. Given you point data are at global scale, I suspect ~100m will suffice. This example assumes your coordinates are WGS84/EPSG:4326.

    First, load required packages and create example point df:

    library(rnaturalearth)
    library(terra)
    library(sf)
    library(dplyr)
    library(ggplot2)
    
    # Load world polygons and project to WGS84 Pseudo Mercator/EPSG:3857
    world <- ne_countries() %>%
      filter(!admin == "Antarctica") %>%
      st_transform(3857)
    
    # Create example points df using world and convert coordinates to WGS84:EPSG4326
    set.seed(1)
    df <- st_sample(world, size = 20000, type = "random") %>%
      st_as_sf() %>%
      st_transform(4326) %>%
      mutate(ID = 1:n(),
             lon = st_coordinates(.)[,1],
             lat = st_coordinates(.)[,2]) %>%
      data.frame() %>%
      select(-x)
    
    head(df)
    #   ID          lon       lat
    # 1  1   26.2272108 -1.853224
    # 2  2  146.9548044 65.990806
    # 3  3 -105.8491530 73.182944
    # 4  4 -116.4395691 70.634154
    # 5  5   97.1429112 34.367544
    # 6  6   -0.8282728 46.360984
    

    Note the addition of the "ID" column. You will need to add this to your actual df as the output from terra::extract() creates an "ID" column, and adding the "ID" column provides a way to join your population density data back to your original data. Using dplyr::left_join() is not strictly necessary, you could just use the pop_den data, but I have assumed you have other columns in your df, so using a join makes things easier to keep track of. The workflow below joins the data to sf_points, but you could also just join pop_den back to your df using the same method.

    Next, load GHSL as a SpatRaster:

    # Load WGS84/EPSG:4326 GHSL, 30 arcsec, 202 from working directory,
    # previously unzipped and downloaded from
    # https://human-settlement.emergency.copernicus.eu/download.php
    ghsl <- rast("data/GHS_BUILT_S_E2020_GLOBE_R2023A_4326_30ss_V1_0.tif")
    ghsl <- ghsl * 1
    names(ghsl) <- "pop_density_2020"
    
    ghsl
    # class       : SpatRaster 
    # dimensions  : 21384, 43201, 1  (nrow, ncol, nlyr)
    # resolution  : 0.008333333, 0.008333333  (x, y)
    # extent      : -180.0012, 180.0071, -89.10042, 89.09958  (xmin, xmax, ymin, ymax)
    # coord. ref. : lon/lat WGS 84 (EPSG:4326) 
    # source      : spat_391c2813558f_14620.tif 
    # varname     : GHS_BUILT_S_E2020_GLOBE_R2023A_4326_30ss_V1_0 
    # name        : pop_density_2020 
    # min value   :                0 
    # max value   :           848108 
    

    Now create an sf object from your df, extract cell values from ghsl, and join the result to sf_points:

    # Create sf from df
    sf_points <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
    
    # Return population density per point
    pop_den <- extract(ghsl, sf_points)
    
    # Join population density to sf_points
    sf_points <- left_join(sf_points, pop_den, by = "ID")
    
    # Examine result
    filter(sf_points, pop_density_2020 > 0)
    # Simple feature collection with 2542 features and 2 fields
    # Geometry type: POINT
    # Dimension:     XY
    # Bounding box:  xmin: -155.8613 ymin: -55.07594 xmax: 177.6207 ymax: 72.03816
    # Geodetic CRS:  WGS 84
    # First 10 features:
    #     ID pop_density_2020                    geometry
    # 1   22             1365  POINT (-91.87298 35.95215)
    # 2   45               21   POINT (21.92427 41.46684)
    # 3   53               19  POINT (-80.00786 37.48202)
    # 4   62              165   POINT (76.22779 19.11223)
    # 5   65               88   POINT (101.8265 24.57473)
    # 6   68             1712  POINT (-114.7794 42.88764)
    # 7   72              366  POINT (-87.25635 33.31995)
    # 8   73            76958   POINT (127.4405 36.84436)
    # 9   81              720 POINT (-79.93223 -1.357552)
    # 10 107               30  POINT (15.16695 -13.33818)