How to group a column with character values in a new column in R

I have a data set with a countries column. I want to create a new column that classifies each country into one of three categories: first world, second world, or third world. I'm relatively new to R and am having trouble finding a function that works with character values.

My dataset contains countries like this, and I have three vectors listing the countries in each category, as shown below:

nt_final_table$`Country name`
#[1] "Finland"                   "Denmark"                   "Switzerland"              
#[4] "Iceland"                   "Netherlands"               "Norway"                   
#[7] "Sweden"                    "Luxembourg"                "New Zealand"              
#[10] "Austria"                   "Australia"                 "Israel"       

first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea",
"Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")

Second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")

Third_world_countries <- c("Somalia","Niger","South Sudan")

I would like a new column containing one of the values First World, Second World, or Third World, based on the Country name column.

Any help would be appreciated! Thanks!

Solution

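Here are two ways you could do this.

Using the dplyr package

You could use case_when from the dplyr package to do this. The conditions are checked in order and the first one that matches determines the value. The example stores the countries in a column named country_name; if your column is called `Country name`, wrap the name in backticks inside case_when.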


library(dplyr)

country_name <- c("Finland", "Denmark", "Switzerland", "Iceland", "Netherlands", "Norway", "Sweden", "Luxembourg", "New Zealand",
                  "Austria", "Australia", "Israel")

nt_final_table <- data.frame(country_name)

first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea", "Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")

second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")

third_world_countries <- c("Somalia","Niger","South Sudan")

nt_final_table_categorized <- nt_final_table %>%
  mutate(category = case_when(
    country_name %in% first_world_countries  ~ "First",
    country_name %in% second_world_countries ~ "Second",
    country_name %in% third_world_countries  ~ "Third",
    TRUE                                     ~ "Not listed"  # fallback: country is in none of the vectors
  ))

nt_final_table_categorized

Sample output

   country_name   category
1       Finland Not listed
2       Denmark      First
3   Switzerland      First
4       Iceland      First
5   Netherlands      First
6        Norway      First
7        Sweden      First
8    Luxembourg      First
9   New Zealand      First
10      Austria      First
11    Australia      First
12       Israel      First
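
In the output above, Finland falls through to "Not listed" because it appears in none of the three vectors. Before categorizing, it can help to check which countries would fall through, and whether any country sits in more than one vector. The following is a minimal base R sketch of that check; the shortened vectors and the names all_listed, unlisted and duplicated_countries are assumptions made for illustration, not part of the original code.

```r
# Toy data: a few countries from the example, with shortened category vectors
country_name <- c("Finland", "Denmark", "Switzerland")
first_world_countries  <- c("Denmark", "Switzerland")  # shortened for illustration
second_world_countries <- c("Poland")                  # shortened for illustration
third_world_countries  <- c("Niger")                   # shortened for illustration

# Every country that has a category
all_listed <- c(first_world_countries, second_world_countries, third_world_countries)

# Countries in the data that appear in none of the three vectors
unlisted <- setdiff(country_name, all_listed)
unlisted

# Countries that appear in more than one vector (ideally empty)
duplicated_countries <- all_listed[duplicated(all_listed)]
duplicated_countries
```

Any country reported in unlisted either needs adding to one of the vectors or is spelled differently in the data (for example "USA" versus "United States").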

Using base R

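In base R, we could create a data frame that lists every country with its category, then use merge with all.x = TRUE to perform a left join of the two data frames. Note that merge sorts the result by the key column by default, so the rows come back in alphabetical order, and countries without a match get NA.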

country_name <- c("Finland", "Denmark", "Switzerland", "Iceland", "Netherlands", "Norway", "Sweden", "Luxembourg", "New Zealand",
                  "Austria", "Australia", "Israel")

nt_final_table <- data.frame(country_name)

first_world_countries <- c("Australia","Austria","Belgium","Canada","Denmark","France","Germany","Greece","Iceland","Ireland","Israel","Italy","Japan","Luxembourg","Netherlands","New Zealand","Norway","Portugal","South Korea", "Spain","Sweden","Switzerland","Turkey","United Kingdom","USA")

second_world_countries <- c("Albania","Armenia","Azerbaijan","Belarus","Bosnia and Herzegovina","Bulgaria","China","Croatia","Cuba","Czech Republic","EastGermany","Estonia","Georgia","Hungary","Kazakhstan","Kyrgyzstan","Laos","Poland","Romania","Russia","Serbia","Slovakia","Slovenia","Tajikistan","Turkmenistan","Ukraine","Uzbekistan","Vietnam")

third_world_countries <- c("Somalia","Niger","South Sudan")

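# Reuse the name country_name so the lookup table gets the same key column
# (nt_final_table has already been built, so it is not affected)
country_name <- c(first_world_countries, second_world_countries, third_world_countries)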

categories <- c(rep("First", length(first_world_countries)),
                rep("Second",length(second_world_countries)),
                rep("Third",length(third_world_countries)))

all_countries_categorised <- data.frame(country_name, categories)

# all.x = TRUE keeps every row of nt_final_table (left join)
nt_final_table_categorized <- merge(nt_final_table, all_countries_categorised, by = "country_name", all.x = TRUE)

nt_final_table_categorized

Sample output

   country_name categories
1     Australia      First
2       Austria      First
3       Denmark      First
4       Finland       <NA>
5       Iceland      First
6        Israel      First
7    Luxembourg      First
8   Netherlands      First
9   New Zealand      First
10       Norway      First
11       Sweden      First
12  Switzerland      First
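
The merge output above comes back sorted by country_name rather than in the original row order. If the original order matters, one alternative is a named lookup vector combined with match(). This is a minimal sketch that swaps in a different base R technique from the merge shown above; the shortened vectors and the name lookup are assumptions made for illustration.

```r
# Toy data: a few countries from the example
country_name <- c("Finland", "Denmark", "Switzerland", "Austria")
nt_final_table <- data.frame(country_name)

# Shortened category vectors, for illustration only
first_world_countries  <- c("Austria", "Denmark", "Switzerland")
second_world_countries <- c("Poland", "Hungary")
third_world_countries  <- c("Niger")

# Named lookup vector: names are countries, values are categories
lookup <- c(
  setNames(rep("First",  length(first_world_countries)),  first_world_countries),
  setNames(rep("Second", length(second_world_countries)), second_world_countries),
  setNames(rep("Third",  length(third_world_countries)),  third_world_countries)
)

# match() gives the position of each country in the lookup (NA if absent),
# so the new column lines up with the existing rows
idx <- match(nt_final_table$country_name, names(lookup))
nt_final_table$category <- unname(lookup[idx])

nt_final_table
```

Countries that are not in the lookup get NA, as with merge; they could be replaced afterwards with a label such as "Not listed" if that is preferred.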