Search code examples
rr-sfr-sp

Is there a way to identify geographies that overlap areas, not just borders?


I'm trying to take a dataset of ZIP codes and limit it to just ZIP codes within Chicago. However, any way I try to do this merge captures either far too many or too few ZIP codes. Here's a reproducible example:

## Load packages
library(tigris)
library(sf)
library(ggplot2)

## Load shapefiles
ZIPs <- tigris::zctas(cb = TRUE) 
ZIPs <- sf::st_as_sf(ZIPs)

places <- tigris::places(state = "17", cb = T)
chicago <- places[places$NAME == "Chicago",]
chicago <- sf::st_as_sf(chicago)

## Filter ZIPs to those within Chicago using st_intersects
overlap <- st_filter(ZIPs, chicago, .predicate = st_intersects) #Using st_intersects captures too many ZIPs

## Visualize ZIPs vs Chicago
ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

first attempt - captures too many ZIP codes

## Try again using st_within
overlap <- st_filter(ZIPs, chicago, .predicate = st_within) #Using st_within captures too few ZIPs

## Visualize ZIPs vs Chicago
ggplot() +
  geom_sf(data = overlap, color = "black", size = 1) +
  geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)

second attempt - captures too few

I've also tried to use sp::over for this task but run into the same problem. There are clearly some ZIPs that are mostly outside of Chicago, but legitimately have some overlap (e.g. the three ZIPs on the top left of the first map). However there are others that only intersect along a border (e.g. top right), and even one that doesn't appear to intersect at all (bottom right). I want to exclude from this map any ZIPs that only intersect by border. Any advice?


Solution

  • Here I proposed a function that can filter the ZIPs based on the ratio of intersected area and the original area compared to a threshold. Below is an example how to use this function. It seems like threshold = 0.3 works pretty well, but you can set any threshold based on your needs.

    ## Load packages
    library(tigris)
    library(sf)
    library(ggplot2)
    library(dplyr)
    
    # A function that can filter ZIPs based on the ratio of intersected area to original area
    # The default of the threshold is set to be 0.3
    # If the ratio is larger than or equal to 0.3, that ZIPs would be kept
    intersection_area <- function(x, y, threshold = 0.3){
      z <- st_intersection(x, y)
      z2 <- z %>% 
        mutate(Area_Inter = as.numeric(st_area(.))) %>%
        select(ZCTA5CE10, Area_Inter) %>%
        st_set_geometry(NULL)
      x2 <- x %>%
        st_filter(y, .predicate = st_intersects)  %>%
        mutate(Area = as.numeric(st_area(.))) %>%
        select(ZCTA5CE10, Area) %>%
        left_join(z2, by = "ZCTA5CE10") %>%
        mutate(Area_Ratio = Area_Inter/Area) %>%
        filter(Area_Ratio >= threshold)
      return(x2)
    }
    
    overlap <- intersection_area(ZIPs, chicago)
    
    ## Visualize ZIPs vs Chicago
    ggplot() +
      geom_sf(data = overlap, color = "black", size = 1) +
      geom_sf(data = chicago, color = NA, fill = "blue", alpha = .25)
    

    enter image description here