
Correcting country names to make them match a different naming convention


I want to make a world map with ggplot as follows:

library(ggplot2)
library(countrycode)
library(dplyr)
library(tidyverse)

worldmap <- map_data("world")
# Set colors
vec_AMIS_Market <- c("Canada", "China", "United States of America", "Republic of Korea", "Russian Federation")
worldmap_AMIS_Market <- mutate(worldmap, fill = ifelse(region %in% vec_AMIS_Market, "green", "lightgrey"))

# Use scale_fill_identity to set correct colors
ggplot(worldmap_AMIS_Market, aes(long, lat, fill = fill, group=group)) + 
  geom_polygon(colour="gray") + 
  ggtitle("Availability of AMIS Supply and Demand Data - Monthly") +
  scale_fill_identity()

As you can see, the US is not filled in green, because in the worldmap_AMIS_Market data the US is written as USA, while my vector uses United States of America. The same goes for Russia and South Korea. Since I will go through this process for around 50 different datasets, I would prefer not to manually correct every country that does not match.

Is there any way to solve issues like this? I have a couple of ideas, but not an actual solution:

  1. I could do fuzzy matching, but that won't work for USA -> United States.
  2. I know the package countrycode can convert countries to ISO codes etc., but I don't think it has an option to correct country names (https://github.com/vincentarelbundock/countrycode).
  3. I could somehow collect all alternative naming conventions for all countries, and then do a fuzzy match on those. But I don't know where to get the alternative names from, and I am not sure I would be able to write the fuzzy-matching code for this scenario.

Could someone perhaps help me fix this?


Solution

  • One option is to use countrycode::countryname() to convert both the map regions and your vector to a common naming convention.

    Note: countrycode::countryname() emits a warning listing the names it could not match unambiguously, so it will not work in every case. In this example, though, the names it fails on are small islands and territories.

    library(ggplot2)
    library(countrycode)
    library(dplyr)
    library(tidyverse)
    
    worldmap <- map_data("world")
    # Set colors
    vec_AMIS_Market <- c("Canada", "China", "United States of America", "Republic of Korea", "Russian Federation")
    
    worldmap_AMIS_Market <- mutate(worldmap, region = countryname(region), fill = ifelse(region %in% countryname(vec_AMIS_Market), "green", "lightgrey"))
    #> Warning in countrycode_convert(sourcevar = sourcevar, origin = origin, destination = dest, : Some values were not matched unambiguously: Ascension Island, Azores, Barbuda, Canary Islands, Chagos Archipelago, Grenadines, Heard Island, Madeira Islands, Micronesia, Saba, Saint Martin, Siachen Glacier, Sint Eustatius, Virgin Islands
    
    # Use scale_fill_identity to set correct colors
    ggplot(worldmap_AMIS_Market, aes(long, lat, fill = fill, group=group)) + 
      geom_polygon(colour="gray") + 
      ggtitle("Availability of AMIS Supply and Demand Data - Monthly") +
      scale_fill_identity()
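
A related approach, sketched here as an alternative rather than as part of the answer above, is to compare ISO codes instead of names. The sketch uses countrycode::countrycode() with origin = "country.name" and destination = "iso3c". The short map_names vector is a hypothetical stand-in for the region column of map_data("world"), and the variable names are made up for illustration.

```r
library(countrycode)

# Hypothetical stand-in for the map's region names (the spellings the map uses).
map_names <- c("USA", "South Korea", "Russia", "Canada", "China")

# Names as they appear in the dataset to be highlighted (from the question).
data_names <- c("Canada", "China", "United States of America",
                "Republic of Korea", "Russian Federation")

# Convert both sides to ISO3 codes. warn = FALSE silences the warning;
# names that cannot be converted become NA.
map_iso  <- countrycode(map_names,  origin = "country.name",
                        destination = "iso3c", warn = FALSE)
data_iso <- countrycode(data_names, origin = "country.name",
                        destination = "iso3c", warn = FALSE)

# Dataset names that could not be converted and need manual attention.
unmatched <- data_names[is.na(data_iso)]

# Map regions that belong to the dataset, compared by code instead of by name.
# The is.na() guard keeps two unconverted names from matching each other.
highlight <- map_names[!is.na(map_iso) & map_iso %in% data_iso]
```

With 50 datasets, checking unmatched after each conversion shows which names still need a manual fix, for example through the custom_match argument of countrycode(). Converting the unique region names once and joining the codes back onto the map data should also be cheaper than converting every polygon row.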