Search code examples
rgeospatialgeocodingspatialr-sf

How to keep the only intersection of the spatial features & remove everything outside of a boundary?


I am trying to get rid of the spatial geometry that falls outside of the shapefile boundary I read. Is it possible to do this without manual software like Photoshop? Or me manually removing the tracts which span outside of the city's boundries. For example, I took out 14 tracts, this is there result: enter image description here

I have provided all of the subset of the data and the key to test it yourself. Code script is below, and the dataset is https://github.com/THsTestingGround/SO_geoSpatial_crop_Quest.

I have done st_intersection(gainsville_df$Geomtry$x, gnv_poly$geometry) after I converted Geomtry to the sf, but I don't know what to do next to get rid of those portions.

library(sf)
library(tigris)
library(tidyverse)
library(tidycensus)
library(readr)
library(data.table)

#reading the shapefile
gnv_poly <-  sf::st_read("PATH\\GIS_cgbound\\cgbound.shp") %>% 
                sf::st_transform(crs = 4326) %>% 
                sf::st_polygonize() %>% 
                sf::st_union()

#I have taken the "geometry" of latitude and longitude because it was corrupting my csv, but we can rebuild like so
gnv_latlon <- readr::read_csv("new_dataframe_data.csv") %>% 
                dplyr::select(ID,
                              Latitude,
                              Longitude,
                              Location) %>%
                dplyr::mutate(Location = gsub(x= Location, pattern = "POINT \\(|\\)", replacement = "")) %>% 
                tidyr::separate(col = "Location", into = c("lon", "lat"), sep = " ") %>% 
                sf::st_as_sf(coords = c(4,5)) %>% 
                sf::st_set_crs(4326)

#then you can match the ID from gnv_latlon to 
gainsville_df <- fread("new_dataframe_data.csv", drop = c("Latitude","Longitude", "Census Code"))

gainsville_df <-  merge(gnv_latlon, gainsville_df, by = "ID")

#remove latitude and longitude points that fall outside of the polygon
dplyr::mutate(gainsville_df, check = as.vector(sf::st_intersects(x = gnv_latlon, y = gnv_poly, sparse = FALSE))) -> outliers_before
sf::st_filter(x= outliers_before, y= gnv_poly, predicate= st_intersects) -> gainsville_df

#Took out my census api key because of a feed back from a SO member. Please add a comment
#if you would like my census key.

#I use this function from tidycensus to retrieve the country shapfiles. 

alachua <- tidycensus::get_acs(state = "FL", county = "Alachua",  geography = "tract", geometry = T, variables = "B01003_001")
gainsville_df$Geomtry <- NULL
gainsville_df$Geomtry <- alachua$geometry[match(as.character(gainsville_df$`Geo ID`), alachua$GEOID)]

#gets us the first graph with bounry
ggplot() + 
  geom_sf(data = gainsville_df,aes(geometry= Geomtry, fill= Population), alpha= 0.2) +
  coord_sf(crs = "+init=epsg:4326")+ 
  geom_sf(data= gnv_poly) #with alpha added, we get the transparent boundary

Now I would like to get the second image without doing any future manual manipulation.
From this..... enter image description here

to this, possible ? enter image description here

Found this Compare spatial polygons and keep or delete common boundaries in R but the person here wanted to remove just the boundaries from one shapefile. And i tried to manipulate it to nothing.

EDIT Here is what I've tried after SymbolixAU direction, but my idx variable is number from 1:7

fl <- sf::st_read("PATH\\GIS_cgbound\\cgbound.shp") %>%  sf::st_transform(crs = 4326)
gainsville_df$Geomtry <-  sf::st_as_sf(gainsville_df$Geomtry) %>%  sf::st_transform(crs= 4326)

#normal boundry plot
plot( fl[, "geometry"] )

# And we can make a boundary by selecting some of the goemetries and union-ing them
boundary <- fl[ gnv_poly$geometry %in% gainsville_df$Geomtry, ]
boundary <- sf::st_union( fl ) %>% sf::st_as_sf()

## So now 'boundary' represents the area you want to cut out of your total shapes

## So you can find the intersection by an appropriate method
## st_contains will tell you all the shapes from 'fl' contained within the boundary
idx <- sf::st_contains(x = boundary, y = fl)

#doesn't work, thus no way of knowing the overlaps
#plot( fl[ idx[[1]], "geometry" ] ) 

#several more plots which i can't make sense of
plot( fl[ st_intersection(gainsville_df$Geomtry, gnv_poly$geometry), ])
plot(gainsville_df$Geomtry) #this just plots tracts

Solution

  • I'm going to use library(mapdeck) to plot everything, mainly because it's a library I've developed so I'm very familiar with it. It uses Mapbox maps, so you'll need a Mapbox Token to use it.

    First, get the data

    library(sf)
    library(data.table)
    
    fl <- sf::st_read("~/Documents/github/SO_geoSpatial_crop_Quest/GIS_cgbound/cgbound.shp") %>%  sf::st_transform(crs = 4326)
    gainsville_df <- fread("~/Documents/github/SO_geoSpatial_crop_Quest/new_dataframe_data.csv")
    sf_gainsville <- sf::st_as_sf(gainsville_df, wkt = "Location")
    
    ## no need to transform, because it's already in Lon / Lat (?)
    sf::st_crs( sf_gainsville ) <- 4326
    #install.packages("tidycensus")
    library(tidycensus)
    
    tidycensus::census_api_key("21adc0b3d6e900378af9b7910d04110cdd38cd75", install = T, overwrite = T)
    alachua <- tidycensus::get_acs(state = "FL", county = "Alachua",  geography = "tract", geometry = T, variables = "B01003_001")
    alachua <- sf::st_transform( alachua, crs = 4326 )
    

    This is what we're working with. I'm plotting the polygons and the boundary path

    library(mapdeck)
    
    set_token( secret::get_secret("MAPBOX") )
    
    ## this is what the polygons and the Alachua boundary looks like
    mapdeck() %>%
      add_polygon(
        data = alachua
        , fill_colour = "NAME"
      ) %>%
      add_path(
        data = fl
        , stroke_width = 50
      )
    

    enter image description here

    To start with I'm going to make a polygon of the boundary

    boundary_poly <- sf::st_cast(fl, "POLYGON")
    

    Then we can get those polygons completely within the boundary

    idx <- sf::st_contains(
      x = boundary_poly
      , y = alachua
    )
    
    idx <- unlist( sapply( idx, `[`) )
    
    sf_contain <- alachua[ idx, ]
    
    mapdeck() %>%
      add_polygon(
        data = sf_contain
        , fill_colour = "NAME"
      ) %>%
      add_path(
        data = fl
      )
    

    enter image description here

    And those which 'touch' the boundary

    idx <- sf::st_crosses(
      x = fl
      , y = alachua
    )
    
    idx <- unlist( idx )
    
    sf_crosses <- alachua[ idx, ]
    
    mapdeck() %>%
      add_polygon(
        data = sf_crosses
        , fill_colour = "NAME"
      ) %>%
      add_path(
        data = fl
      )
    

    enter image description here

    Those which are completely on the outside are the polygons that neither touch the boundary, nor are inside it

    sf_outside <- sf::st_difference(
      x = alachua
      , y = sf::st_union( sf_crosses )
    )
    
    sf_outside <- sf::st_difference(
      x = sf_outside
      , y= sf::st_union( sf_contain )
    )
    
    mapdeck() %>%
      add_polygon(
        data = sf_outside
        , fill_colour = "NAME"
      ) %>%
      add_path(
        data = fl
      )
    

    enter image description here

    what we need is a way to 'cut' those which touch the boundary ( sf_crosses) so we have a 'inside' and an 'outside' section for each polygon

    We need to operate on each polygon at a time and 'split' it by the lines which intersect it.

    There may be a way to do this with lwgeom::st_split, but I kept getting errors

    To help with this I'm using a development version of my sfheaders library

    # devtools::install_github("dcooley/sfheaders")
    
    res <- lapply( 1:nrow( sf_crosses ), function(x) {
      
      ## get the intersection of the polygon and the boundary
      sf_int <- sf::st_intersection(
        x = sf_crosses[x, ]
        , y = fl
      )
      
      ## we only need lines, not MULTILINES
      sf_lines <- sfheaders::sf_cast(
        sf_int, "LINESTRING"
      )
      
      ## put a small buffer around the lines to make them polygons
      sf_polys <- sf::st_buffer( sf_lines, dist = 0.0005 )
      
      ## Find the difference of these buffers and the polygon
      sf_diff <- sf::st_difference(
        sf_crosses[x, ]
        , sf::st_union( sf_polys )
      )
      
      ## this result is a MULTIPOLYGON, which is the original polygon from 
      ## sf_crosses[x, ], split by the lines which cross it
      sf_diff
    })
    
    
    ## The result of this is all the polygons which touch the boundary path have been split
    sf_res <- do.call(rbind, res)
    

    so sf_res should now be all the polygons which 'touch' the path, but split where the path crosses them

    mapdeck() %>%
      add_polygon(
        data = sf_res
        , stroke_colour = "#FFFFFF"
        , stroke_width = 100
      ) %>%
      add_path(
        data = fl
        , stroke_colour = "#FF00FF"
      )
    

    enter image description here

    And we can see this by zooming in

    enter image description here

    Now we can find which ones are inside and outside the path

    sf_in <- sf::st_join(
      x = sf_res
      , y = boundary_poly
      , left = FALSE
    )
    
    sf_out <- sf::st_difference(
      x = sf_res
      , y = sf::st_union( boundary_poly )
    )
    
    
    mapdeck() %>%
      add_path(
        data = fl
        , stroke_width = 50
        , stroke_colour = "#000000"
      ) %>%
      add_polygon(
        data = sf_in
        , fill_colour = "NAME"
        , palette = "viridis"
        , layer_id = "in"
      ) %>%
      add_polygon(
        data = sf_out
        , fill_colour = "NAME"
        , palette = "plasma"
        , layer_id = "out"
      )
    

    enter image description here

    Now have all the objects we care about

    • sf_contain - all the polygons completely within the bondary
    • sf_in - all the polygons touching the boundary on the inside
    • sf_out - all the polygons touching the boundary on the outside
    • sf_outside - all the other polygons
    mapdeck() %>%
      add_path(
        data = fl
        , stroke_width = 50
        , stroke_colour = "#000000"
      ) %>%
      add_polygon(
        data = sf_contain
        , fill_colour = "NAME"
        , palette = "viridis"
        , layer_id = "contained_within_boundary"
      ) %>%
      add_polygon(
        data = sf_in
        , fill_colour = "NAME"
        , palette = "cividis"
        , layer_id = "touching_boundary_inside"
      ) %>%
      add_polygon(
        data = sf_out
        , fill_colour = "NAME"
        , palette = "plasma"
        , layer_id = "touching_boundary_outside"
      ) %>%
      add_polygon(
        data = sf_outside
        , fill_colour = "NAME"
        , palette = "viridis"
        , layer_id = "outside_boundary"
      )
    

    enter image description here