I have two spatial data frames that I've loaded using the sf
package in R. In the first data frame I have (61) US counties, while in the second I have a subpart of each of these (61) counties. Both data frames have the same identifying columns and the same values for these identifying columns, so for each county in the first data frame there is one, and only, corresponding subpart in the second data frame. My goal is: I would like to obtain a third spatial data frame whose geometry column is the difference between the county and its subpart.
I tried to implement the above idea basically using the following code:
library(tidyverse)
library(magrittr)
library(sf)
# Loading shapefiles
dfc <- st_read('data/counties.shp') %>% # Shapefile with (61) US counties
arrange(., st, cty) # Columns that uniquely identify observations
dfp <- st_read('data/county_parts.shp') %>% # Shapefile with (61) US county parts
arrange(., st, cty) # Columns that uniquely identify observations
# Geometry difference between each corresponding pair of observations
geom <- map(seq(nrow(dfc)), function(r)
st_difference(dfc[r, 'geo_cty'], dfp[r, 'geo_sub'])
(In the above code, geo_cty
is the column with the county geometry and geo_sub
is the county geometry subpart. Once sorted by st
and cty
, each row r
in dfc
and dfp
have have the same (st, cty)
values, and so the row geo_cty
covers geo_sub
.)
The above code ran smoothly, but I have not succeeded in creating a spatial data frame whose geometry column is the geom
list that I've created. For instance, I tried running the following code:
df <- bind_cols(dfc, select(dfp, geo_sub), geom)
and I've got the following error message:
Error in `stop_vctrs()`:
! Can't recycle `..1` (size 61) to match `..58` (size 0).
Run `rlang::last_error()` to see where the error occurred.
How can I create the desired spatial data frame? What is the best way to implement the idea that I have explained?
I'd rather use join here to make sure geometries are correctly aligned, this would also handle cases where dataset lengths differ for whatever reason (the following example generates such input). Join creates an sf
object with 2 geometry columns that can be conveniently mutated with map2()
+ st_difference()
:
library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(patchwork)
### prepare reprex
# sf example dataset
nc <-
read_sf(system.file("shape/nc.shp", package="sf")) |>
select(CNTY_ID, NAME)
# reduce the sample to Johnston and neighbouring polygons
nc <- st_filter(nc, st_buffer(nc[nc$NAME == "Johnston",], 10))
# generate "subregions", buffers around centroids;
# drop few records (20%) to simulte a partially matching datasest
set.seed(123)
nc_sub <- nc |>
select(-NAME) |>
st_centroid() |>
st_buffer(10000) |>
slice_sample(prop = .8)
# -------------------------------------------------------------------------
# join datasets, the other sf object (left_join y arg) must be converted to
# tibble/data.frame first; result is an sf object with 2 geometry columns
nc <- nc_sub |>
st_set_geometry("geo_sub") |>
as_tibble() |>
left_join(x = nc, y = _, by = "CNTY_ID")
# find difference between geom columns,
# store results in 3rd geomerty column (geom_diff)
nc <- nc |>
mutate(geo_diff = purrr::map2(geometry, geo_sub, st_difference) |>
st_sfc(crs = st_crs(geometry)))
# we can change active geometry and drop other geometry columns to get a
# regular single-geometry sf object, if needed:
nc_diff <- st_set_geometry(nc, "geo_diff") |> select(!starts_with("geo"))
Resulting sf
object:
nc_diff
#> Simple feature collection with 8 features and 2 fields
#> Geometry type: GEOMETRY
#> Dimension: XY
#> Bounding box: xmin: -79.21665 ymin: 34.55375 xmax: -77.67122 ymax: 36.26004
#> Geodetic CRS: NAD27
#> # A tibble: 8 × 3
#> CNTY_ID NAME geo_diff
#> <dbl> <chr> <GEOMETRY [°]>
#> 1 1897 Franklin MULTIPOLYGON (((-78.25455 35.81553, -78.26685 35.84838, -78.…
#> 2 1913 Nash POLYGON ((-78.20562 35.7254, -78.21155 35.73819, -78.23405 3…
#> 3 1938 Wake POLYGON ((-78.99881 35.60132, -78.93889 35.76144, -78.94444 …
#> 4 1979 Wilson POLYGON ((-78.1249 35.59752, -78.16245 35.68298, -78.1618 35…
#> 5 1989 Johnston MULTIPOLYGON (((-78.53874 35.31512, -78.53947 35.33646, -78.…
#> 6 2029 Wayne POLYGON ((-78.16518 35.19322, -78.2574 35.22095, -78.31013 3…
#> 7 2030 Harnett POLYGON ((-78.71606 35.25998, -78.81239 35.25872, -78.87457 …
#> 8 2083 Sampson POLYGON ((-78.11374 34.69918, -78.15676 34.67715, -78.25681 …
Visualize:
g1 <- ggplot(nc) +
geom_sf() +
ggtitle("regions") +
theme_bw()
g2 <- ggplot() +
geom_sf(data = st_union(nc), fill = NA) +
geom_sf(data = nc_sub) +
ggtitle("sub-regions") +
theme_bw() +
theme(axis.text.y = element_blank())
g3 <- ggplot(nc_diff) +
geom_sf() +
ggtitle("st_difference()") +
theme_bw() +
theme(axis.text.y = element_blank())
wrap_plots(g1,g2,g3)
Input dataset, a small subset of nc.shp
:
nc
#> Simple feature collection with 8 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -79.21665 ymin: 34.55375 xmax: -77.67122 ymax: 36.26004
#> Geodetic CRS: NAD27
#> # A tibble: 8 × 3
#> CNTY_ID NAME geometry
#> * <dbl> <chr> <MULTIPOLYGON [°]>
#> 1 1897 Franklin (((-78.25455 35.81553, -78.26685 35.84838, -78.30841 35.8934…
#> 2 1913 Nash (((-78.18693 35.72511, -78.20562 35.7254, -78.21155 35.73819…
#> 3 1938 Wake (((-78.92107 35.57886, -78.99881 35.60132, -78.93889 35.7614…
#> 4 1979 Wilson (((-78.06533 35.58204, -78.1249 35.59752, -78.16245 35.68298…
#> 5 1989 Johnston (((-78.53874 35.31512, -78.53947 35.33646, -78.60082 35.4030…
#> 6 2029 Wayne (((-78.16319 35.18229, -78.16518 35.19322, -78.2574 35.22095…
#> 7 2030 Harnett (((-78.61274 35.24383, -78.71606 35.25998, -78.81239 35.2587…
#> 8 2083 Sampson (((-78.11377 34.72099, -78.11374 34.69918, -78.15676 34.6771…
nc
after joining (note missing geo_sub
records) and st_difference()
, 3 geometry columns:
nc
#> Simple feature collection with 8 features and 2 fields
#> Active geometry column: geometry
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -79.21665 ymin: 34.55375 xmax: -77.67122 ymax: 36.26004
#> Geodetic CRS: NAD27
#> # A tibble: 8 × 5
#> CNTY_ID NAME geometry geo_sub
#> * <dbl> <chr> <MULTIPOLYGON [°]> <POLYGON [°]>
#> 1 1897 Franklin (((-78.25455 35.81553, -78.26685 3… EMPTY
#> 2 1913 Nash (((-78.18693 35.72511, -78.20562 3… ((-77.92674 35.88657, -7…
#> 3 1938 Wake (((-78.92107 35.57886, -78.99881 3… ((-78.65973 35.87538, -7…
#> 4 1979 Wilson (((-78.06533 35.58204, -78.1249 35… ((-77.83458 35.6446, -77…
#> 5 1989 Johnston (((-78.53874 35.31512, -78.53947 3… EMPTY
#> 6 2029 Wayne (((-78.16319 35.18229, -78.16518 3… ((-77.89675 35.34607, -7…
#> 7 2030 Harnett (((-78.61274 35.24383, -78.71606 3… ((-78.93731 35.44001, -7…
#> 8 2083 Sampson (((-78.11377 34.72099, -78.11374 3… ((-78.29317 35.05105, -7…
#> # ℹ 1 more variable: geo_diff <GEOMETRY [°]>
Created on 2023-11-01 with reprex v2.0.2