Search code examples
rggplot2data-manipulationdata-wrangling

How to color countries according to the count of rows on a map in R?


The columns of interest of the data look like this

wheels

  name                  country  
1 The Southern Star     Australia
2 Wheel of Brisbane     Australia
3 Wheel of Perth        Australia
4 Wiener Riesenrad      Austria  
5 Niagara SkyWheel      Canada   
6 Beijing Great Wheel   China    
7 Changsha Ferris Wheel China   

How to color each country according to the number of rows on the following map ?

library(tidyverse); library(maptools); library(sf)
# Main data
wheels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-08-09/wheels.csv')

# Data for the map
world <- map_data("world")
data(wrld_simpl)
countries <- st_as_sf(wrld_simpl)

# Plot
df_plot <- right_join(wheels, countries, by = c("country" = "NAME"))
ggplot(df_plot, aes(LON, LAT)) +
  geom_map(data = world, map = world, aes(long, lat, map_id = region), color = "white", fill = "#66BBBB", size = 0.1) 

Do I have to summarise the data using dplyr::count for example ?

EDIT : I made a dataframe giving the number of rows per country in the wheels dataframe

# Number of rows per country
data_count <- wheels %>% count(country)
# Joining the dataframes 
df_plot <- right_join(wheels, countries, by = c("country" = "NAME")) %>% 
  left_join(data_count)

enter image description here


Solution

  • You can summarize the wheels data frame by using group_by and count. Then left join this to your countries data frame:

    df_plot <- left_join(countries, wheels %>% group_by(country) %>% count(),
                         by = c("NAME" = "country"))
    
    ggplot(df_plot) + 
      geom_sf(aes(fill = n)) +
      scale_fill_viridis_c(na.value = "#000035")
    

    enter image description here

    The problem here is that your left_join requires an exact match between the name of the country stored in wheels$country and countries$NAME, but there are some mismatches:

    unique(wheels$country)[is.na(match(unique(wheels$country), countries$NAME))]
    #> [1] "USA"          "Tailand"      "UK"           "UAE"         
    #> [5] "Phillippines" "Iran"         "S Korea"      "Dubai" 
    

    So a more complete solution would be:

    left_join(countries,
              wheels %>%
                mutate(country = case_when(
                   country == "UK"           ~ "United Kingdom",
                   country == "USA"          ~ "United States",
                   country == "Tailand"      ~ "Thailand",
                   country == "UAE"          ~ "United Arab Emirates",
                   country == "Dubai"        ~ "United Arab Emirates",
                   country == "S Korea"      ~ "Korea, Republic of",
                   country == "Phillippines" ~ "Philippines",
                   country == "Iran"         ~ "Iran (Islamic Republic of)",
                   TRUE ~ country)) %>%
                count(country),
               by = c("NAME" = "country")) %>%
      ggplot() + 
      geom_sf(aes(fill = n)) +
      scale_fill_viridis_c(na.value = "#000040")
    

    enter image description here