Is there a function in R similar to "MATCH" in Excel?
For example, suppose I have a vector of people's educational degrees:
> edu
chr [1:4] "Bachelor" "NA" "Master" "Superieur"
And a table that maps education programs to the international ISCED classification:
> ISCED
Main education program                  English translation              Code
Brevet d'enseignement supérieur (BES)   certificate of higher education  5
bachelier de transition                 Bachelor                         6
Bachelor                                Bachelor                         6
Master                                  Master                           7
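For anyone who wants to reproduce this, here is a minimal sketch of the same data as R code. The column names Main_education_program and Code are the ones used in the answer; English_translation is an assumed name, and Code is assumed to be numeric.

```r
# The degrees to look up ("NA" is a literal string here, not a missing value)
edu <- c("Bachelor", "NA", "Master", "Superieur")

# Mapping table; "\u00e9" is the accented e in "supérieur".
# English_translation is an assumed column name.
ISCED <- data.frame(
  Main_education_program = c("Brevet d'enseignement sup\u00e9rieur (BES)",
                             "bachelier de transition",
                             "Bachelor",
                             "Master"),
  English_translation = c("certificate of higher education",
                          "Bachelor", "Bachelor", "Master"),
  Code = c(5, 6, 6, 7),
  stringsAsFactors = FALSE
)

str(edu)
```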
I wonder if there is a function that can partially match the strings in the vector edu against the first column of the data frame ISCED and, when there is a match, return the corresponding code (5, 6 or 7).
I know there are functions like "%like%" and "grepl", but they take one pattern at a time, and I am looking for something that checks every value of edu rather than one particular string that I have to specify each time.
Does anybody have any insights? Or would you suggest looping over edu with "grepl"?
Thank you!
One way is to use grep. Collapse edu into a single pattern with paste0(..., collapse = "|"), use grep to get the indices of the rows whose first column (Main_education_program) matches that pattern, and then use those indices to fetch the respective Code from the data frame.
ISCED$Code[grep(paste0(edu, collapse = "|"), ISCED$Main_education_program)]
#[1] 6 7
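To see what grep is actually matching against, it can help to print the intermediate pattern and indices. A minimal sketch, rebuilding the example data with the same column names (Main_education_program and Code):

```r
edu <- c("Bachelor", "NA", "Master", "Superieur")
ISCED <- data.frame(
  Main_education_program = c("Brevet d'enseignement sup\u00e9rieur (BES)",
                             "bachelier de transition", "Bachelor", "Master"),
  Code = c(5, 6, 6, 7)
)

# Join the degrees into one regular expression; "|" means "or"
pattern <- paste0(edu, collapse = "|")
print(pattern)
# [1] "Bachelor|NA|Master|Superieur"

# Row numbers whose program matches any of the alternatives
idx <- grep(pattern, ISCED$Main_education_program)
print(idx)
# [1] 3 4

# Codes of the matching rows
print(ISCED$Code[idx])
# [1] 6 7
```

Note that this gives one code per matching row of ISCED, not one per element of edu, so the positions of the unmatched degrees are lost. Also, each element of edu is treated as a regular expression, so a degree containing characters such as "(" or "." would need escaping.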
EDIT
To get the updated output the OP asked for (one value per element of edu, with NA where there is no match), we can use sapply to loop over every element in edu and check whether it is present in Main_education_program:
sapply(edu, function(x) {
  idx <- grep(x, ISCED$Main_education_program)  # rows whose program contains x
  if (length(idx) > 0) ISCED$Code[idx[1]] else NA  # first match, or NA if none
})
which returns
#  Bachelor        NA    Master Superieur
#         6        NA         7        NA
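Since the question also asks about a plain loop with grepl, here is a minimal sketch of an explicit for loop that produces the same one-value-per-degree result. It assumes the same column names, keeps the first matching row when several match, and uses fixed = TRUE so each degree is matched as a literal substring instead of a regular expression (a deliberate swap from the regex matching above).

```r
edu <- c("Bachelor", "NA", "Master", "Superieur")
ISCED <- data.frame(
  Main_education_program = c("Brevet d'enseignement sup\u00e9rieur (BES)",
                             "bachelier de transition", "Bachelor", "Master"),
  Code = c(5, 6, 6, 7)
)

# Pre-allocate the result: stays NA unless a match is found
codes <- rep(NA_real_, length(edu))
names(codes) <- edu

for (i in seq_along(edu)) {
  # TRUE for every row whose program contains edu[i] literally
  hit <- grepl(edu[i], ISCED$Main_education_program, fixed = TRUE)
  if (any(hit)) {
    codes[i] <- ISCED$Code[which(hit)[1]]  # keep the first matching row
  }
}

print(codes)
```

The sapply version is essentially this loop written more compactly; the explicit loop can be easier to extend, for example to also record which row matched.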
If we need the result without names, we can wrap it in unname (or pass USE.NAMES = FALSE to sapply):
unname(sapply(edu, function(x) {
  idx <- grep(x, ISCED$Main_education_program)  # rows whose program contains x
  if (length(idx) > 0) ISCED$Code[idx[1]] else NA  # first match, or NA if none
}))
#[1] 6 NA 7 NA
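For the exact-match case, base R's match() is the closest counterpart to Excel's MATCH with match type 0: it returns the position of the first exact match, or NA. A minimal sketch under the same data; it does no partial matching, so it only works here because the degrees in edu happen to equal entries of Main_education_program exactly.

```r
edu <- c("Bachelor", "NA", "Master", "Superieur")
ISCED <- data.frame(
  Main_education_program = c("Brevet d'enseignement sup\u00e9rieur (BES)",
                             "bachelier de transition", "Bachelor", "Master"),
  Code = c(5, 6, 6, 7)
)

# Position of the first exact match of each degree, NA if there is none
pos <- match(edu, ISCED$Main_education_program)
print(pos)
# [1]  3 NA  4 NA

# Indexing with NA yields NA, so unmatched degrees get an NA code
print(ISCED$Code[pos])
# [1]  6 NA  7 NA
```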