How can I get a data frame listing each state and county in the U.S. along with the latitude and longitude of the center of each county?
A version of this question has been asked many times over the past 10-15 years and many of the answers use packages or functions that are no longer available
(see example here using the defunct R package GISTools
).
I have tried using a more recent answer by @sebdalgarno from the link above that used the state of North Carolina as an example.
library(sf)
library(ggplot2)
library(regos) # not in the original answer but needed for this example
# I transform to utm because st_centroid is not recommended for use on long/lat
nc <- st_read(system.file('shape/nc.shp', package = "sf")) %>%
st_transform(32617)
# using rgeos
sp_cent <- gCentroid(as(nc, "Spatial"), byid = TRUE)
# using sf
sf_cent <- st_centroid(nc)
# plot both together to confirm that they are equivalent
ggplot() +
geom_sf(data = nc, fill = 'white') +
geom_sf(data = sp_cent %>% st_as_sf, color = 'blue') +
geom_sf(data = sf_cent, color = 'red')
However, the problem with the answer above is two-fold: 1) it does not work with other states (I tried Virginia by replacing nc
with va
, see below) and 2) the example is only for one state and I want all 50.
# I transform to utm because st_centroid is not recommended for use on long/lat
va <- st_read(system.file('shape/va.shp', package = "sf")) %>%
st_transform(32617)
# Error: `dsn` must point to a source, not an empty string.
I would like a data frame that has four columns which include state name, county name, latitude, and longitude of the center of each county.
In case anyone else comes across this question @JindraLacko answer was very helpful. Below is some code to answer the specific question regarding the requested data frame.
library(tidyverse) # to extract lat and long from a geometry object
library(dplyr) # data wrangling
library(sf) # for getting centroids
library(tigris) # for getting county shapefile
library(cdlTools) # converting state fips to state name
# includes all US territories
centroids <- counties(resolution = "20m") %>%
st_centroid()
# convert state fips to state name
centroids$STATE_NAME <- fips(centroids$STATEFP, to = 'Name')
# isolating the lower 48 states and Washington DC
centroids$STATEFP <- as.numeric(centroids$STATEFP)
lower48_cent <- centroids[centroids$STATEFP %in% c(1,3:14,16:56),]
# Extracting centroid latitude and longitude from the geometry column
lower48_cent <- lower48_cent %>%
mutate(lat = unlist(map(lower48_cent$geometry,2)),
lon = unlist(map(lower48_cent$geometry,1)))
# subsetting down to the final data frame of state name, county, lat and long centroids
lower48_cent <- as.data.frame(lower48_cent[,c(19,6,20,21)])
lower48_cent <- lower48_cent[,1:4]
View(lower48_cent)