Search code examples
rgisgeospatialspatialgeo

Getting the center latitude and longitude of counties in the United States and Canada


I am able to get the center latitude and longitude of each US county and Canadian census geographic unit (CGU). However, there are some false counties that arise through the process, particularly in the Great Lakes region. In this region, there are some random polygons within the Great Lakes (state/international borders?) that produce these false counties. Is there a way to remove these aquatic polygons so that I only get the central latitude and longitude of the land polygons?

library(ggplot2)
library(dplyr)
library(geodata)
library(sf)

# Gets the polygons of each US county and Canadian CGU

aa <- gadm(country = c('CAN', 'USA'), level = 2, path=tempdir()) %>%
  st_as_sf()

# Gets the central latitude and longitude of each county and CGU

centroids_aa <- aa %>% 
  st_centroid(of_largest_polygon = TRUE)

# Map of centroids

ggplot() +
  geom_sf(data = aa,
          mapping = aes(geometry = geometry),
          color = "black") +
  geom_sf(data = centroids_aa,
          mapping = aes(geometry = geometry),
          color = "black",
          fill = 'red',
          shape = 21,
          size = 2) +
  coord_sf(xlim = c(-93, -81.5), ylim = c(41.5, 50), expand = F) +
  theme_bw() +
  theme(plot.margin = grid::unit(c(2,2,0,2), "mm"),
        text = element_text(size = 16),
        axis.text.x = element_text(size = 14, color = "black"),
        axis.text.y = element_text(size = 14, color = "black"),
        panel.grid.major = element_blank())

The obviously incorrect counties and CGUs are yellow, the red ones appear correct. I say obvious because there are no counties or CGUs in the aquatic sections of the Great Lakes. enter image description here


Solution

  • Looks like the TYPE_2 column (and ENGTYPE_2, which might be better) of the data set is Water body for those records - can you filter them out based on that? So maybe:

    aa <- gadm(country = c('CAN', 'USA'), level = 2, path=tempdir()) %>%
      st_as_sf() %>% 
      filter(ENGTYPE_2 != 'Water body')
    
    centroids_aa <- aa %>% 
      st_centroid(of_largest_polygon = TRUE)
    

    FWIW, changing the geom_sf for centroids_aa shows the values (moving and changing the fill argument):

    ggplot() +
      geom_sf(data = aa,
              mapping = aes(geometry = geometry),
              color = "black") +
      geom_sf(data = centroids_aa,
              mapping = aes(geometry = geometry, fill = TYPE_2),
              color = "black",
              shape = 21,
              size = 2) +
      coord_sf(xlim = c(-93, -81.5), ylim = c(41.5, 50), expand = F) +
      theme_bw() +
      theme(plot.margin = grid::unit(c(2,2,0,2), "mm"),
            text = element_text(size = 16),
            axis.text.x = element_text(size = 14, color = "black"),
            axis.text.y = element_text(size = 14, color = "black"),
            panel.grid.major = element_blank())