Search code examples
rgisspatialr-sf

Finding ratio of perimeter with surrounding neighbors in R


I have a shape file at Sample Data

I read the data in R as sf object named "SD" and have columns "GEOID" and "geometry" along with other columns. I created a new column in the data named "POP"

SD$POP <- sample(2000:8000, 35)

First, I want to measure the area and perimeter for each "GEOID".

Second, I need to find how ratio of each GEOID's perimeter shared with it's surrounding GEOIDs. For example: say GEOID 09 is surrounded by GEOID 06, 08, 10, 13. I want to find what percent of GEOID 09's perimeter is shared with 06, 08, 10, and 13 individually. Then Create new columns "POP_PERCENT" and "ADJ_POP" which will be

ADJ-POP = SUM OF POP for GEOID 06, 08, 10, and 13' .

POP_PERCENT =  Percent share with 06 + Percent share with 08 + Percent share with 10 +
Percent share with 13 

My apologies for verbose question. I would really appreciate any help.


Solution

  • I think generally what you need is st_intersection plus some data manipulation. The dataset does not have a GEOID column, and I'm not completely clear what the expected output for POP_PERCENT is (adjacent population weighted by percent perimeter overlap?). So this isn't a totally complete answer.

    Here is a start using sf and some other tidyverse functions. This calculates the area, perimeter, and intersections of each polygon. I created a GEOID column from the rownames. Hopefully that is enough to get you going in the right direction.

    First calculate area and perimeter of each

    library(tidyverse)
    library(sf)
    
    SD_ap <- SD %>%
      st_transform(5070) %>% #convert from lat/long to projected coord system
      mutate(area = st_area(.),
             length_perim = lwgeom::st_perimeter(.))
    
    #-----
      GEOID  POP                       geometry              area         perim
    1     1 6582 MULTIPOLYGON (((860088.4 12...   357945281 [m^2]  139932.2 [m]
    2     2 3199 MULTIPOLYGON (((843395.6 11...   137754181 [m^2]  100879.4 [m]
    

    Then calculate the overlapping perimeters. Convert the polygons to lines, then run st_intersection. Instead of the 35 polygons, this returns 141 line segments, distinguished by original overlapping polygons (origins)

    SD_overlap <- SD_ap %>%
      st_cast('MULTILINESTRING') %>%
      st_intersection() %>%
      mutate(length_intersection = st_length(.)) %>%
      filter(length_intersection > units::as_units(0,'m'))
    
    #----
         GEOID  POP              area  length_perim n.overlaps origins length_intersection
    1        1 6559   357945281 [m^2]  139932.2 [m]          2    1, 2        6761.611 [m]
    2        2 4781   137754181 [m^2]  100879.4 [m]          2    2, 3       24201.037 [m]
    2.2      2 4781   137754181 [m^2]  100879.4 [m]          2    2, 4       31628.198 [m]
    3        3 3424   295676500 [m^2]  212930.4 [m]          2    3, 4       22967.725 [m]
    

    Plot the first 4 GEOIDs to check this is working out as expected

    ggplot(data = filter(SD, GEOID %in% 1:4),
           aes(fill = as.factor(GEOID),
               label = GEOID)) +
      geom_sf(alpha = .2) +
      geom_sf(data = filter(SD_overlap, map_lgl(origins, ~all(.x %in% 1:4))), 
              aes(color = map_chr(origins, ~paste(.x, collapse = ','))),
              fill = NA,
              lwd = 2,
              show.legend = 'line') +
      geom_sf_label(fill = "white",  # override the fill from aes()
                    fun.geometry = sf::st_centroid) +
      guides(fill = 'none')+
      labs(color='Overlap')
    
    

    enter image description here


    To answer the question of what populations are adjacent, you can try something like this.

    sum_adj_pops <- function(GEOID_i){
      
      ID_adj <- SD_overlap %>%
        st_drop_geometry() %>%
        filter(map_lgl(origins, ~GEOID_i %in% .x)) %>%
        unnest_longer(origins) %>%
        filter(origins != GEOID_i) %>%
        pull(origins)
      
      SD %>%
        st_drop_geometry() %>%
        filter(GEOID %in% ID_adj) %>%
        summarize(POP_adj = sum(POP)) %>%
        pull(POP_adj)
    }
    
    SD %>%
      mutate(POP_adj = map_dbl(GEOID, sum_adj_pops))
    
    #----
       GEOID  POP                       geometry POP_adj
    1      1 6559 MULTIPOLYGON (((-86.64623 3...   24024
    2      2 4781 MULTIPOLYGON (((-86.861 33....   27132
    3      3 3424 MULTIPOLYGON (((-86.97402 3...   21907
    4      4 4230 MULTIPOLYGON (((-86.74407 3...   22259
    5      5 7054 MULTIPOLYGON (((-87.78275 3...   22021
    

    Data

    url_shp <- 'https://github.com/ahadzaman1002/example_data/raw/main/fe_2007_01_sldu00.zip'
    
    tmp_zip <- tempfile(fileext = '.zip')
    tmp_dir <- tempdir()
    download.file(url = url_shp, tmp_zip)
    unzip(tmp_zip, exdir = tmp_dir)
    
    path_shp <- list.files(tmp_dir, '.shp$', full.names = TRUE)
    
    set.seed(415) #for reproducibility with sample()
    
    SD <- st_read(path_shp) %>%
      mutate(POP = sample(2000:8000, 35)) %>%
      rowid_to_column('GEOID')