Search code examples
rmapsr-sfr-sp

Extract centroid latitude and longitude for each county in the U.S. using R


How can I get a data frame listing each state and county in the U.S. along with the latitude and longitude of the center of each county?

A version of this question has been asked many times over the past 10-15 years and many of the answers use packages or functions that are no longer available (see example here using the defunct R package GISTools).

I have tried using a more recent answer by @sebdalgarno from the link above that used the state of North Carolina as an example.

library(sf)
library(ggplot2)
library(regos) # not in the original answer but needed for this example

# I transform to utm because st_centroid is not recommended for use on long/lat 
nc <- st_read(system.file('shape/nc.shp', package = "sf")) %>% 
  st_transform(32617)

# using rgeos
sp_cent <- gCentroid(as(nc, "Spatial"), byid = TRUE)

# using sf
sf_cent <- st_centroid(nc)

# plot both together to confirm that they are equivalent
ggplot() + 
  geom_sf(data = nc, fill = 'white') +
  geom_sf(data = sp_cent %>% st_as_sf, color = 'blue') + 
  geom_sf(data = sf_cent, color = 'red') 

However, the problem with the answer above is two-fold: 1) it does not work with other states (I tried Virginia by replacing nc with va, see below) and 2) the example is only for one state and I want all 50.

# I transform to utm because st_centroid is not recommended for use on long/lat 

va <- st_read(system.file('shape/va.shp', package = "sf")) %>% 
  st_transform(32617)

# Error: `dsn` must point to a source, not an empty string.

I would like a data frame that has four columns which include state name, county name, latitude, and longitude of the center of each county.


Solution

  • In case anyone else comes across this question @JindraLacko answer was very helpful. Below is some code to answer the specific question regarding the requested data frame.

    library(tidyverse) # to extract lat and long from a geometry object
    library(dplyr) # data wrangling
    library(sf) # for getting centroids
    library(tigris) # for getting county shapefile
    library(cdlTools) # converting state fips to state name
    
    # includes all US territories
    
    centroids <- counties(resolution = "20m") %>% 
      st_centroid()
    
    # convert state fips to state name
    
    centroids$STATE_NAME <- fips(centroids$STATEFP, to = 'Name')
    
    # isolating the lower 48 states and Washington DC
    
    centroids$STATEFP <- as.numeric(centroids$STATEFP)
    
    lower48_cent <- centroids[centroids$STATEFP %in% c(1,3:14,16:56),]
    
    # Extracting centroid latitude and longitude from the geometry column
    
    lower48_cent <- lower48_cent %>%
      mutate(lat = unlist(map(lower48_cent$geometry,2)),
             lon = unlist(map(lower48_cent$geometry,1)))
    
    # subsetting down to the final data frame of state name, county, lat and long centroids
    
    lower48_cent <- as.data.frame(lower48_cent[,c(19,6,20,21)])
    lower48_cent <- lower48_cent[,1:4]
    
    View(lower48_cent)