Tags: r, ggplot2, maps, choropleth, choroplethr

Display values for US states together with the state names on a map


I am trying to create a map of US states with the choroplethr package, using a simple data set, df2 (it has the same region and value columns as df_pop_state), and the code provided in the package documentation.

library(choroplethr)

data("df_pop_state")

df2 <- read.csv("ShareDF-chro.csv", header=TRUE, stringsAsFactors=FALSE)



# contents of ShareDF-chro.csv, so the example also runs without the file
df2 <- data.frame(
region = c("alabama", "alaska", "arizona", "arkansas", 
"california", "colorado", "connecticut", "delaware", "district of columbia", 
"florida", "georgia", "hawaii", "idaho", "illinois", "indiana", 
"iowa", "kansas", "kentucky", "louisiana", "maine", "maryland", 
"massachusetts", "michigan", "minnesota", "mississippi", "missouri", 
"montana", "nebraska", "nevada", "new hampshire", "new jersey", 
"new mexico", "new york", "north carolina", "north dakota", "ohio", 
"oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina", 
"south dakota", "tennessee", "texas", "utah", "vermont", "virginia", 
"washington", "west virginia", "wisconsin", "wyoming"), 

value = c(1.15, 0.11, 6.21, 2.41, 8.42, 13.57, 3.57, 4.55, 7.08, 9.42, 5.21, 
0.108, 9.09, 2.56, 4.51, 9.65, 6.76, 3.54, 0.17, 1.99, 6.66, 
3.88, 7.31, 4.86, 4.85, 2.39, 0.25, 0.05, 0.21, 0.11, 3.86, 0.05, 
7.31, 1.91, 0.41, 4.55, 0.002, 2.65, 3.14, 0.71, 1.94, 0.13, 
2.2, 12.65, 0.05, 0.074, 5.79, 7.5, 0.12, 2.6, 0.33),
stringsAsFactors = FALSE)

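# match the shares to df_pop_state by region name instead of relying on row order
df_pop_state$value <- df2$value[match(df_pop_state$region, df2$region)]

state_choropleth(df_pop_state, title = "US State's X-Capital share data", num_colors = 2, legend = "Capital Share")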


My question is: how can I show the corresponding X-capital share values on the map along with the state abbreviations (ideally with the abbreviations in a slightly smaller font)? Thanks, I appreciate your help.


Solution

  • Here is a solution using only ggplot2 (plus usmap for the map polygons).

    • Get the polygon data with usmap::us_map().
    • Left join it with your share data (convert your region names to title case first, so they match usmap's full column).
    • Compute label positions for the text annotation (here, the centre of each state's bounding box).
    • Put those positions and the shares into a separate data frame.
    • Draw the polygons with geom_polygon.
    • Draw your labels (state abbreviation and share) with geom_text, combining them with paste (you could also use annotate).
    • Pass the data separately to each layer (leave the main ggplot() call empty).
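The label-position step above can also be sketched in base R with aggregate(), without dplyr. This is a minimal sketch on a made-up toy polygon table (poly and bbox_mid are hypothetical names for illustration); with the real data you would pass us_val instead. Like the answer's mean(range(...)), it takes the midpoint of each state's bounding box, which is not a true area centroid and can fall outside irregularly shaped states.

```r
# Hypothetical toy polygons: two "states" with three vertices each
poly <- data.frame(
  full = c("A", "A", "A", "B", "B", "B"),
  x    = c(0, 4, 2, 10, 14, 12),
  y    = c(0, 0, 6, 1, 1, 5),
  stringsAsFactors = FALSE
)

# Midpoint of the range, i.e. the centre of the bounding box along one axis
bbox_mid <- function(v) mean(range(v))

# One row per state, applying bbox_mid to both coordinate columns
centroids <- aggregate(cbind(x, y) ~ full, data = poly, FUN = bbox_mid)
names(centroids) <- c("full", "centroid.x", "centroid.y")
print(centroids)
```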

    The advantage is that ggplot syntax makes it very easy to control the colour and fill aesthetics, and to customise line thickness and text size.
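For example, if two classes (share > 3) are too coarse, the shares can be binned into several classes with base R's cut() and mapped to fill instead. A minimal sketch on the first six shares from the data; the breaks and labels are arbitrary choices for illustration, and share_bin is a hypothetical name.

```r
# First six values of share (Alabama to Colorado)
share <- c(1.15, 0.11, 6.21, 2.41, 8.42, 13.57)

# Right-closed intervals: (-Inf, 1], (1, 3], (3, 6], (6, Inf]
share_bin <- cut(share,
                 breaks = c(-Inf, 1, 3, 6, Inf),
                 labels = c("up to 1", "1 to 3", "3 to 6", "over 6"))
print(table(share_bin))
```

After adding such a column to us_val, it could replace the logical test in the plot, e.g. aes(fill = share_bin) together with scale_fill_brewer(palette = "Blues").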

    As for the state abbreviations, I only used the first two letters of each name, which is often not the official abbreviation (Alabama and Alaska would both become "AL", for example). There is almost certainly a ready-made vector for converting names to abbreviations.
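To see how often the two-letter shortcut goes wrong, it can be compared with the official USPS codes in base R's datasets::state. A quick check over the 50 state names (note that state.name does not include the District of Columbia):

```r
# First two letters of each state name, as in the answer's label column
naive <- toupper(substr(state.name, 1, 2))

# Number of states whose prefix was already used by an earlier state
n_clash <- sum(duplicated(naive))
print(n_clash)

# States whose naive label differs from the official abbreviation
mismatch <- state.name[naive != state.abb]
print(head(mismatch))
```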

    library(usmap)
    library(tidyverse)
    
    us <- usmap::us_map()
    
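    # str_to_title() gives "District Of Columbia", but usmap spells it
    # "District of Columbia", so fix that one name or DC will not join
    region <- str_to_title(region)
    region[region == "District Of Columbia"] <- "District of Columbia"
    
    share_df <- data.frame(region, share, stringsAsFactors = FALSE)
    
    us_val <- 
      left_join(us, share_df, by = c("full" = "region"))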
    
    us_centroids <- 
      us_val %>%
      group_by(full) %>% 
      summarise(centroid.x = mean(range(x)), 
                centroid.y = mean(range(y)),
                label = unique(toupper(str_sub(full,1,2))),
                share = unique(share))
    
    ggplot() + 
      geom_polygon(data = us_val, 
                   aes(x,y, group = group, fill = share > 3), 
                   color = "black",
                   size = .1) +  # use linewidth = .1 in ggplot2 >= 3.4.0
      geom_text(data = us_centroids, 
                aes(centroid.x, centroid.y, label = paste(label, "\n", share)),
                size = 5/14*8) +  # about 8 pt; geom_text size is in mm
      scale_fill_brewer(name = "State Share", 
                        palette = "Blues", 
                        labels = c(`TRUE`="More than 3",`FALSE`="Less than 3")) +
      theme_void()
    

    Created on 2020-05-06 by the reprex package (v0.3.0)
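On the text size: geom_text() sizes are in millimetres, and 1 mm is 72.27 / 25.4 (about 2.85) points, the value ggplot2 exports as .pt. So size = 5/14*8 is roughly 8-pt text. A small base-R sketch of the conversion (pt_per_mm and font_size are hypothetical helper names, defined here so it runs without ggplot2):

```r
# Points per millimetre; same value as ggplot2::.pt
pt_per_mm <- 72.27 / 25.4

# Convert a font size in points to geom_text()'s size units (mm)
font_size <- function(pt) pt / pt_per_mm

answer_size <- 5 / 14 * 8    # value used in the answer
exact_size  <- font_size(8)  # exact size for 8-pt text
print(round(c(answer = answer_size, exact = exact_size), 3))
```

For the smaller abbreviations the question asks for, one option is two geom_text() layers on us_centroids: one for label with, say, size = font_size(6), and one for share with size = font_size(8) and a nudge_y offset so the two lines do not overlap. The exact sizes and offset are a matter of taste.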

    Update: regarding the abbreviations, check out ?datasets::state. It contains the official abbreviations (state.abb) and the state names (state.name), as well as approximate state centers (state.center). So a lot of this data is already built in :)
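A minimal sketch of using those built-in vectors as a lookup table. Because state.name covers only the 50 states, the District of Columbia entry ("DC") is added by hand; lookup, abb and labels are hypothetical names, and the four regions are a subset of the data below. Also note that, according to ?state, state.center holds longitude/latitude (with Alaska and Hawaii placed just off the West Coast), while usmap::us_map() uses projected coordinates, so those centers would need transforming (for example with usmap::usmap_transform()) before they line up with the polygons.

```r
# A few regions and shares taken from the data
region <- c("alabama", "alaska", "district of columbia", "wyoming")
share  <- c(1.15, 0.11, 7.08, 0.33)

# Name -> abbreviation lookup from datasets::state, plus DC added by hand
lookup <- c(setNames(state.abb, tolower(state.name)),
            "district of columbia" = "DC")

abb <- unname(lookup[region])

# Two-line label: abbreviation on top, share underneath
labels <- sprintf("%s\n%.2f", abb, share)
print(labels)
```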

    Data

    region =  c("alabama", "alaska", "arizona", "arkansas", 
               "california", "colorado", "connecticut", "delaware", "district of columbia", 
               "florida", "georgia", "hawaii", "idaho", "illinois", "indiana", 
               "iowa", "kansas", "kentucky", "louisiana", "maine", "maryland", 
               "massachusetts", "michigan", "minnesota", "mississippi", "missouri", 
               "montana", "nebraska", "nevada", "new hampshire", "new jersey", 
               "new mexico", "new york", "north carolina", "north dakota", "ohio", 
               "oklahoma", "oregon", "pennsylvania", "rhode island", "south carolina", 
               "south dakota", "tennessee", "texas", "utah", "vermont", "virginia", 
               "washington", "west virginia", "wisconsin", "wyoming")
    
    share = c(1.15, 0.11, 6.21, 2.41, 8.42, 13.57, 3.57, 4.55, 7.08, 9.42, 5.21, 
              0.108, 9.09, 2.56, 4.51, 9.65, 6.76, 3.54, 0.17, 1.99, 6.66, 
              3.88, 7.31, 4.86, 4.85, 2.39, 0.25, 0.05, 0.21, 0.11, 3.86, 0.05, 
              7.31, 1.91, 0.41, 4.55, 0.002, 2.65, 3.14, 0.71, 1.94, 0.13, 
              2.2, 12.65, 0.05, 0.074, 5.79, 7.5, 0.12, 2.6, 0.33)