Search code examples
rgoogle-mapsggplot2ggmap

Plotting convex hulls in R using ggplot/ggmap


I'm interested in plotting a polygon/convex hull representations of the neighbourhoods in the Ames dataset, similar to the approach used in the O'Reilly version of Tidy-modelling with R.

It appears as though Chapter 4 for the same paragraph has two different images depending on which source you use, the first image is from the O'Reilly version of the book from here and the second is from tmwr.org

O'Reilly version

tmwr.org version

Personally I prefer the first as it's easier to see the overlap in neighbourhoods. Tmwr.org has a GitHub repo however the convex hull code is not captured there.

From their repo I can see they download a shape file from this website and use this for the road mapping in the background and then overlay the polygons.

I'm trying to create the polygons and use google maps as a background however I'm not having much success adding the google maps layer.

At the bottom of the code I can plot the google maps layer but when I introduce ames_sf I get the following error:

Error in `geom_sf()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 4th layer.
Caused by error:
! object 'lon' not found

As you can see from the convex hulls I've produced with ggplot Vs. the google map image they're using different coordinate values which may be the issue. I'm confused as on line 46 ames_sf is defined with crs = 3857 and then towards the bottom on line 90 ggmap(map) is passed to coord_sf where the st_crs is also set to 3857.

Below is the code I've cobbled together (some of which is taken from here) so far and I'm looking for some help layering in the google maps background. Any help that can be shared would be greatly appreciated.

# LOAD THE REQUIRED LIBRARIES
pacman::p_load(sf, ggplot2, dplyr, RColorBrewer, AmesHousing)

# ASSIGN A PALETTE OF 8 COLORS FROM THE 'DARK2' PALETTE TO A VARIABLE
col_vals <- brewer.pal(8, "Dark2")

# CREATE A NAMED VECTOR 'ames_cols' WITH NEIGHBORHOOD NAMES MAPPED TO SPECIFIC COLORS FROM 'col_vals'
# COPIED FROM THE TMWR.ORG REPO
ames_cols <- c(
  North_Ames = col_vals[3],
  College_Creek = col_vals[4],
  Old_Town = col_vals[8],
  Edwards = col_vals[5],
  Somerset = col_vals[8],
  Northridge_Heights = col_vals[4],
  Gilbert = col_vals[2],
  Sawyer = col_vals[7],
  Northwest_Ames = col_vals[6],
  Sawyer_West = col_vals[8],
  Mitchell = col_vals[3],
  Brookside = col_vals[1],
  Crawford = col_vals[4],
  Iowa_DOT_and_Rail_Road = col_vals[2],
  Timberland = col_vals[2],
  Northridge = col_vals[1],
  Stone_Brook = col_vals[5],
  South_and_West_of_Iowa_State_University = col_vals[3],
  Clear_Creek = col_vals[1],
  Meadow_Village = col_vals[1],
  Briardale = col_vals[7],
  Bloomington_Heights = col_vals[1],
  Veenker = col_vals[2],
  Northpark_Villa = col_vals[2],
  Blueste = col_vals[5],
  Greens = col_vals[6],
  Green_Hills = col_vals[8],
  Landmark = col_vals[3],
  Hayden_Lake = "red" # 'Hayden_Lake' SPECIFICALLY ASSIGNED THE COLOR RED
)

# LOAD AND PREPARE THE AMES HOUSING DATA
ames_data <- make_ames() %>%
  janitor::clean_names()

# CONVERT AMES DATA INTO AN SF (SIMPLE FEATURES) OBJECT FOR SPATIAL ANALYSIS AND
# TRANSFORM EPSG 3857 (Pseudo-Mercator, what Google uses)
ames_sf <- st_as_sf(ames_data, coords = c("longitude", "latitude"), crs = 3857)

# CREATE POLYGONS FOR EACH NEIGHBORHOOD BY AGGREGATING POINTS INTO A CONVEX HULL
ames_sf <- ames_sf %>%
  group_by(neighborhood) %>%
  summarize() %>%
  st_convex_hull()

# PLOT THE CONVEX HULLS USING GGPLOT
#ggplot(data = ames_sf) +
#  geom_sf(aes(fill = neighborhood), color = "white", size = 0.5) +
#  scale_fill_manual(values = ames_cols) +
#  theme_minimal() +
#  labs(title = 'Neighborhoods in Ames') +
#  theme(legend.position = "none")

map <- get_map("Ames, Iowa", maptype = "roadmap", zoom = "auto", scale = "auto", source = "google")

# Define a function to fix the bbox to be in EPSG:3857
ggmap_bbox <- function(map) {
  if (!inherits(map, "ggmap")) stop("map must be a ggmap object")
  # Extract the bounding box (in lat/lon) from the ggmap to a numeric vector,
  # and set the names to what sf::st_bbox expects:
  map_bbox <- setNames(
    unlist(attr(map, "bb")),
    c("ymin", "xmin", "ymax", "xmax")
  )

  # Coonvert the bbox to an sf polygon, transform it to 3857,
  # and convert back to a bbox (convoluted, but it works)
  bbox_3857 <- st_bbox(st_transform(st_as_sfc(st_bbox(map_bbox, crs = 4326)), 3857))

  # Overwrite the bbox of the ggmap object with the transformed coordinates
  attr(map, "bb")$ll.lat <- bbox_3857["ymin"]
  attr(map, "bb")$ll.lon <- bbox_3857["xmin"]
  attr(map, "bb")$ur.lat <- bbox_3857["ymax"]
  attr(map, "bb")$ur.lon <- bbox_3857["xmax"]
  map
}

# Use the function:
map <- ggmap_bbox(map)

ggmap(map) +
  coord_sf(crs = st_crs(3857)) + # force the ggplot2 map to be in 3857
  geom_sf(data = ames_sf, aes(fill = neighborhood), color = "white", size = 0.5, alpha = 0.5)

Using ggmaps I can create the polygons and print them to a coordinate grid but I'm struggling to add in the map background.

ggplot coordinate system

google maps coordinate system


Solution

  • When you compare those 2 plots in you question, the first (or at least 2nd) thing you should notice is the difference in lat/lon values. Even when not quite sure what the normal range for lat/lon should be, having housing data from U.S. placed on the Equator (lat ~0) is a quite clear indicator that it's better to take a few steps back.

    Main culprit here is crs handling, with st_as_sf(ames_data, ... ,crs = 3857) you are not actually transforming coordinates but defining lon/lat values as easting and northing of 3857 Pseudo-Mercator, crs here should match the actual crs of input coordinates. To spot such issues early, visualize geospatial data as much as you can, after every step, if needed. And it's far more convenient with something like mapview()

    Instead of ggmap and Google tiles, the following example uses ggspatial and Carto Light render of OpenStreetMap data.

    library(AmesHousing)
    library(sf)
    #> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
    library(ggplot2)
    library(dplyr)
    library(mapview)
    library(ggspatial)
    
    ames_sf <- make_ames() %>% 
      select(Neighborhood, Latitude, Longitude) %>% 
      janitor::clean_names() %>% 
      # crs in st_as_sf does not transform but defines the projection of source coordinates,
      # when dealing with lat/lon, it's almost always WGS84 aka EPSG:4326
      st_as_sf(coords = c("longitude", "latitude"), crs = "WGS84") %>% 
      group_by(neighborhood) %>% 
      summarise() %>% 
      st_convex_hull()
    print(ames_sf, n = 3)
    #> Simple feature collection with 28 features and 1 field
    #> Geometry type: GEOMETRY
    #> Dimension:     XY
    #> Bounding box:  xmin: -93.69315 ymin: 41.9865 xmax: -93.57743 ymax: 42.06339
    #> Geodetic CRS:  WGS 84
    #> # A tibble: 28 × 2
    #>   neighborhood                                                          geometry
    #> * <fct>                                                            <POLYGON [°]>
    #> 1 North_Ames    ((-93.60507 42.03456, -93.60795 42.03456, -93.62474 42.03461, -…
    #> 2 College_Creek ((-93.6843 42.01339, -93.68792 42.01404, -93.69205 42.01621, -9…
    #> 3 Old_Town      ((-93.62186 42.0258, -93.62374 42.02648, -93.62416 42.02997, -9…
    #> # ℹ 25 more rows
    
    # visualize with mapview to check that coordinates / location are OK
    mapview(ames_sf, zcol = "neighborhood")
    

    ggplot(ames_sf) +
      annotation_map_tile(type = "cartolight", zoomin = 0 ) +
      geom_sf(aes(fill = neighborhood), alpha = .2, show.legend = FALSE)
    #> Zoom: 13
    #> Fetching 16 missing tiles
    #> ...complete!
    

    Created on 2023-11-06 with reprex v2.0.2