Search code examples
rggplot2geospatialtigris

Use merge to combine shapefile and data to map using US zip codes


I'm trying to make a map based on zip codes in California, US.

Using the solution from my previous question (How to create a map using zip codes in R?), I obtained CA zip codes using:

library(tigris)
library(ggplot2)

zipcode <- zctas(year = 2000, state = "CA", cb = TRUE) 

Then, I have my cleaned zip code file (named "ca") that has a column of zip codes and a column of count number, as integer.

I combined the two dataset using the following, zipcode goes first because that will save geometry.

zipcodes <- merge(zipcode, ca, by = "ZCTA", all.x = TRUE) %>%
  rename(COUNT = "Count")

Here's the code to plot:

ggplot() +
  geom_sf(data = zipcodes, aes(fill = COUNT)) +
  scale_fill_gradientn(colors = c("red", "cornsilk", "#f0d080"),
                       values = c(1, 14150, 28307, na.value = "gray")) + #28307 is the max count.
  theme_minimal() 

It's producing the following map, which doesn't seem to like my gradient code...

enter image description here

Some rows do have NA for COUNT because my dataset doesn't have a count for every zip, but I want to keep their geometry for the purpose of plotting a complete map.

When plotting, I also got the error message saying:

Warning messages:
1: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
2: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
3: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion

Solution

  • According to the docs (see ?scale_fill_gradientn) values

    gives the position (between 0 and 1) for each colour in the colours vector.

    Hence, you have to rescale the the vector passed to values to the range of 0 to 1, e.g. by dividing by the max value or using scales::rescale(c(1, 14150, 28307)) which will also put the first value at the 0 position.

    Besides that, you included the na.value in the values vector.

    library(tigris)
    library(ggplot2)
    library(dplyr)
    
    zipcode <- zctas(year = 2000, state = "CA", cb = TRUE)
    
    set.seed(123)
    
    ca <- data.frame(
      ZCTA = sample(zipcode$ZCTA, 200),
      Count = sample(28307, 200)
    )
    
    zipcodes <- merge(zipcode, ca, by = "ZCTA", all.x = TRUE) %>%
      rename(COUNT = "Count")
    
    ggplot() +
      geom_sf(data = zipcodes, aes(fill = COUNT)) +
      scale_fill_gradientn(
        colors = c("red", "cornsilk", "#f0d080"),
        values = scales::rescale(c(1, 14150, 28307)),
        na.value = "gray"
      ) +
      theme_minimal()