Missing observations when using str_replace_all


I build a dataset of map data as follows:

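library(ggplot2)  # map_data(); needs the maps package installed
library(dplyr)    # rename(), filter(), mutate(), %>%
library(stringr)  # str_replace_all()

worldMap_df <- map_data("world") %>%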
  rename(Economy = region) %>%
  filter(Economy != "Antarctica") %>%
  mutate(Economy = str_replace_all(Economy,
                                   c("Brunei" = "Brunei Darussalam",
                                     "Macedonia" = "Macedonia, FYR",
                                     "Puerto Rico" = "Puerto Rico US",
                                     "Russia" = "Russian Federation",
                                     "UK" = "United Kingdom",
                                     "USA" = "United States",
                                     "Palestine" = "West Bank and Gaza",
                                     "Saint Lucia" = "St Lucia",
                                     "East Timor" = "Timor-Leste")))

Several countries appear under Economy as separate entries that I want to merge into a single label with str_replace_all. One example is the observations for which Economy is either "Trinidad" or "Tobago".

I've used the following, but it relabels only some of the observations:

trin_tobago_vector <- c("Trinidad", "Tobago")
worldMap_df$Economy <- str_replace_all(worldMap_df$Economy, trin_tobago_vector, "Trinidad and Tobago")

After running this, some observations have "Trinidad and Tobago" under Economy, whilst others remain "Trinidad" or "Tobago". Can anyone see what I'm doing wrong here?


Solution

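  • You are passing str_replace_all an unnamed pattern vector, trin_tobago_vector. The function is vectorised over both string and pattern, so the two patterns are paired with the 'Economy' column position by position: the first element is checked against "Trinidad", the second against "Tobago", the third against "Trinidad", and so on. A row is therefore relabelled only when its value happens to line up with the pattern at that position. Do the replacement in two steps instead: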

    worldMap_df$Economy <- str_replace_all(worldMap_df$Economy, "^Trinidad$", "Trinidad and Tobago")
    worldMap_df$Economy <- str_replace_all(worldMap_df$Economy, "^Tobago$", "Trinidad and Tobago")
    

    or use a named vector:

    trin_tobago_vector <- c("^Trinidad$" = "Trinidad and Tobago", "^Tobago$" = "Trinidad and Tobago")
    worldMap_df$Economy <- str_replace_all(worldMap_df$Economy, trin_tobago_vector)
    

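    The ^ and $ anchors make each pattern match the whole string, so only values that are exactly "Trinidad" or "Tobago" are replaced. This matters most for the named vector: its replacements are applied one after another, so without the anchors the "Tobago" pattern would match again inside the newly written "Trinidad and Tobago".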
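
To see the difference on a small scale, here is a minimal sketch using a made-up vector (economy is a hypothetical stand-in for worldMap_df$Economy, not the real map data). The patterns are written out at full length so the position-by-position pairing is explicit and the call does not depend on how a given stringr version recycles a shorter pattern vector.

```r
library(stringr)

# Hypothetical toy values standing in for worldMap_df$Economy
economy <- c("Trinidad", "Tobago", "Tobago", "Trinidad", "Chile", "Trinidad")

# Unnamed pattern vector: pattern i is only tried against string i.
# This is the pairing c("Trinidad", "Tobago") ends up with when it is
# repeated along the column.
paired_patterns <- c("Trinidad", "Tobago", "Trinidad", "Tobago", "Trinidad", "Tobago")
partial <- str_replace_all(economy, paired_patterns, "Trinidad and Tobago")
# Only positions 1 and 2 line up with their pattern, so only they change.

# Named, anchored vector: every pattern is tried on every string, in order.
fixed <- str_replace_all(
  economy,
  c("^Trinidad$" = "Trinidad and Tobago", "^Tobago$" = "Trinidad and Tobago")
)

# Named vector without anchors: the second pattern matches inside the
# result of the first replacement.
doubled <- str_replace_all(
  "Trinidad",
  c("Trinidad" = "Trinidad and Tobago", "Tobago" = "Trinidad and Tobago")
)

print(partial)
print(fixed)
print(doubled)
```

In this sketch, partial still contains bare "Tobago" and "Trinidad" entries, which mirrors the partially relabelled column from the question, while fixed relabels every matching row and leaves "Chile" alone.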