I have a shape file at Sample Data
I read the data in R as sf object named "SD" and have columns "GEOID" and "geometry" along with other columns. I created a new column in the data named "POP"
SD$POP <- sample(2000:8000, 35)
First, I want to measure the area and perimeter for each "GEOID".
Second, I need to find how ratio of each GEOID's perimeter shared with it's surrounding GEOIDs. For example: say GEOID 09 is surrounded by GEOID 06, 08, 10, 13. I want to find what percent of GEOID 09's perimeter is shared with 06, 08, 10, and 13 individually. Then Create new columns "POP_PERCENT" and "ADJ_POP" which will be
ADJ-POP = SUM OF POP for GEOID 06, 08, 10, and 13' .
POP_PERCENT = Percent share with 06 + Percent share with 08 + Percent share with 10 +
Percent share with 13
My apologies for verbose question. I would really appreciate any help.
I think generally what you need is st_intersection
plus some data manipulation. The dataset does not have a GEOID
column, and I'm not completely clear what the expected output for POP_PERCENT
is (adjacent population weighted by percent perimeter overlap?). So this isn't a totally complete answer.
Here is a start using sf
and some other tidyverse
functions. This calculates the area, perimeter, and intersections of each polygon. I created a GEOID
column from the rownames
. Hopefully that is enough to get you going in the right direction.
First calculate area and perimeter of each
library(tidyverse)
library(sf)
SD_ap <- SD %>%
st_transform(5070) %>% #convert from lat/long to projected coord system
mutate(area = st_area(.),
length_perim = lwgeom::st_perimeter(.))
#-----
GEOID POP geometry area perim
1 1 6582 MULTIPOLYGON (((860088.4 12... 357945281 [m^2] 139932.2 [m]
2 2 3199 MULTIPOLYGON (((843395.6 11... 137754181 [m^2] 100879.4 [m]
Then calculate the overlapping perimeters. Convert the polygons to lines, then run st_intersection
. Instead of the 35 polygons, this returns 141 line segments, distinguished by original overlapping polygons (origins
)
SD_overlap <- SD_ap %>%
st_cast('MULTILINESTRING') %>%
st_intersection() %>%
mutate(length_intersection = st_length(.)) %>%
filter(length_intersection > units::as_units(0,'m'))
#----
GEOID POP area length_perim n.overlaps origins length_intersection
1 1 6559 357945281 [m^2] 139932.2 [m] 2 1, 2 6761.611 [m]
2 2 4781 137754181 [m^2] 100879.4 [m] 2 2, 3 24201.037 [m]
2.2 2 4781 137754181 [m^2] 100879.4 [m] 2 2, 4 31628.198 [m]
3 3 3424 295676500 [m^2] 212930.4 [m] 2 3, 4 22967.725 [m]
Plot the first 4 GEOID
s to check this is working out as expected
ggplot(data = filter(SD, GEOID %in% 1:4),
aes(fill = as.factor(GEOID),
label = GEOID)) +
geom_sf(alpha = .2) +
geom_sf(data = filter(SD_overlap, map_lgl(origins, ~all(.x %in% 1:4))),
aes(color = map_chr(origins, ~paste(.x, collapse = ','))),
fill = NA,
lwd = 2,
show.legend = 'line') +
geom_sf_label(fill = "white", # override the fill from aes()
fun.geometry = sf::st_centroid) +
guides(fill = 'none')+
labs(color='Overlap')
To answer the question of what populations are adjacent, you can try something like this.
sum_adj_pops <- function(GEOID_i){
ID_adj <- SD_overlap %>%
st_drop_geometry() %>%
filter(map_lgl(origins, ~GEOID_i %in% .x)) %>%
unnest_longer(origins) %>%
filter(origins != GEOID_i) %>%
pull(origins)
SD %>%
st_drop_geometry() %>%
filter(GEOID %in% ID_adj) %>%
summarize(POP_adj = sum(POP)) %>%
pull(POP_adj)
}
SD %>%
mutate(POP_adj = map_dbl(GEOID, sum_adj_pops))
#----
GEOID POP geometry POP_adj
1 1 6559 MULTIPOLYGON (((-86.64623 3... 24024
2 2 4781 MULTIPOLYGON (((-86.861 33.... 27132
3 3 3424 MULTIPOLYGON (((-86.97402 3... 21907
4 4 4230 MULTIPOLYGON (((-86.74407 3... 22259
5 5 7054 MULTIPOLYGON (((-87.78275 3... 22021
Data
url_shp <- 'https://github.com/ahadzaman1002/example_data/raw/main/fe_2007_01_sldu00.zip'
tmp_zip <- tempfile(fileext = '.zip')
tmp_dir <- tempdir()
download.file(url = url_shp, tmp_zip)
unzip(tmp_zip, exdir = tmp_dir)
path_shp <- list.files(tmp_dir, '.shp$', full.names = TRUE)
set.seed(415) #for reproducibility with sample()
SD <- st_read(path_shp) %>%
mutate(POP = sample(2000:8000, 35)) %>%
rowid_to_column('GEOID')