I have a dataset of individuals nested within countries. One of the variables is each individual's native language. I also have a second, lookup dataset (called languages below) that lists the national languages of each country. For example (my actual data differs, but has the same basic structure):
individuals <- data.frame(
  cntry = c("AT", "AT", "AT", "BE", "BE", "BE", "HU"),
  lang = c("GER", "ENG", "FRE", "FRE", "DUT", "ARA", "HUN")
)
languages <- data.frame(
  lcntry = c("AT", "DE", "BE", "BE", "HU"),
  llang = c("GER", "GER", "FRE", "DUT", "HUN")
)
I want to generate a new logical variable in the individuals data frame which tells me whether the language recorded for each individual is contained in the respective country's national language list.
To clarify: I want natlang to be TRUE only if the language is listed as a national language for that specific country, so just doing this will not work for me:
individuals <- individuals %>% mutate(natlang = lang %in% languages$llang)
I have tried the following, which runs, but gives incorrect results:
individuals <- individuals %>%
mutate(natlang = lang %in% languages$llang[languages$lcntry == cntry])
I have also tried the following:
individuals <- individuals %>%
mutate(natlang = lang %in% filter(languages, lcntry == cntry)$llang)
This fails with the error:
Error in `mutate()`:
ℹ In argument: `natlang = lang %in% filter(languages, lcntry == cntry)$lang`.
Caused by error in `filter()`:
ℹ In argument: `lcntry == cntry`.
Caused by error:
! `..1` must be of size 5 or 1, not size 7.
Backtrace:
1. individuals %>% ...
17. dplyr:::dplyr_internal_error(...)
I assume that my problem has to do with the two data frames being of different lengths, and mutate trying to vectorize everything to the length of the first data frame, but I'm not sure about this.
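That suspicion about lengths is close to the mark. Inside mutate(), cntry is the whole 7-element column, so the first attempt compares it with the 5-element languages$lcntry once, with recycling, instead of once per row. The base R sketch below reproduces that computation step by step on the example data. The names mask and pool are only illustrative, and suppressWarnings() hides the "longer object length" warning that R would otherwise raise.

```r
individuals <- data.frame(
  cntry = c("AT", "AT", "AT", "BE", "BE", "BE", "HU"),
  lang  = c("GER", "ENG", "FRE", "FRE", "DUT", "ARA", "HUN")
)
languages <- data.frame(
  lcntry = c("AT", "DE", "BE", "BE", "HU"),
  llang  = c("GER", "GER", "FRE", "DUT", "HUN")
)

# lcntry (length 5) is recycled to length 7 and compared position by position
# with cntry, which has nothing to do with matching countries row by row.
mask <- suppressWarnings(languages$lcntry == individuals$cntry)

# The mask then selects a single pooled set of languages for ALL rows.
pool <- languages$llang[mask]

# Every individual is tested against that one pool, not their own country's list.
wrong <- individuals$lang %in% pool

print(pool)   # "GER" "DUT"
print(wrong)  # TRUE FALSE FALSE FALSE TRUE FALSE FALSE
```

Rows 4 (BE, FRE) and 7 (HU, HUN) come out FALSE even though both are national languages, which is why the result looks plausible but is wrong.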
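You can simply left join on both country and language, after adding a natlang = TRUE column to the lookup table. Individuals without a matching row get NA: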
left_join(
  individuals,
  mutate(languages, natlang = TRUE),
  by = c("cntry" = "lcntry", "lang" = "llang")
)
Output:
  cntry lang natlang
1    AT  GER    TRUE
2    AT  ENG      NA
3    AT  FRE      NA
4    BE  FRE    TRUE
5    BE  DUT    TRUE
6    BE  ARA      NA
7    HU  HUN    TRUE
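Since the question asks for a logical variable, the NA values for non-matching rows probably need to become FALSE. A minimal sketch of one way to do that, assuming dplyr is loaded and the lookup table has no duplicated country/language pairs (otherwise the join would duplicate individuals, and distinct() on the lookup would be needed first). The names result and check are only illustrative; check is a join-free alternative that compares pasted country/language keys.

```r
library(dplyr)

individuals <- data.frame(
  cntry = c("AT", "AT", "AT", "BE", "BE", "BE", "HU"),
  lang  = c("GER", "ENG", "FRE", "FRE", "DUT", "ARA", "HUN")
)
languages <- data.frame(
  lcntry = c("AT", "DE", "BE", "BE", "HU"),
  llang  = c("GER", "GER", "FRE", "DUT", "HUN")
)

# Same join as above, then replace the NA left by unmatched rows with FALSE.
result <- individuals %>%
  left_join(
    mutate(languages, natlang = TRUE),
    by = c("cntry" = "lcntry", "lang" = "llang")
  ) %>%
  mutate(natlang = coalesce(natlang, FALSE))

# Join-free alternative: build a combined country/language key on both sides.
check <- paste(individuals$cntry, individuals$lang) %in%
  paste(languages$lcntry, languages$llang)

print(result)
```

Both approaches should give TRUE for rows 1, 4, 5 and 7 and FALSE for the rest on this example data.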