I'm trying to match the strings in a list (my_list) against the name column of a data frame (df). Where there is a match (TRUE), I need to populate a new_name column in df with the first string of the matching list element (my_list[[i]][1]); where there is no match, with the value from the "cat" column.
My data frame is as follows:
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)
My list:
travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)
My for loop with ifelse and grepl is as follows:
for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- ifelse(
      grepl(paste(my_list[[i]], collapse="|"), tolower(df[j, "name"])),
      my_list[[i]][1],
      df[j, "cat"])
  }
}
Expected output is :
df["new_name"]<- c("leasure", "none", "none", "transportation", "communication")
df
         name            cat       new_name
1 NETFLIX.COM           none        leasure
2      BlueTV           none           none
3         smv           none           none
4       trafi transportation transportation
5     alkatel  communication  communication
Currently, the for loop I wrote produces an exact copy of the "cat" column, meaning that every case is treated as non-matching (FALSE) in the ifelse function. I'm not sure what's wrong here... Any help would be appreciated!
It doesn't make sense to use ifelse() in that context: it is for vectorized selection. But your code would work if you had the pattern matching right. Unfortunately, for j == 1 and i == 2 (when you expected a match), your pattern is "leasure|MTV|NETFLIX.COM" and you are trying to match it to tolower(df[j, "name"]), which is "netflix.com". Since grepl() is case-sensitive by default, the uppercase pattern never matches the lowercased name. You should map both strings to lowercase, or set ignore.case = TRUE in the grepl() call. For example,
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)
travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)
for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <-
      if (grepl(paste(my_list[[i]], collapse="|"), df[j, "name"],
                ignore.case = TRUE))
        my_list[[i]][1]
      else df[j, "cat"]
  }
}
df
#>          name            cat       new_name
#> 1 NETFLIX.COM           none        leasure
#> 2      BlueTV           none           none
#> 3         smv           none           none
#> 4       trafi transportation transportation
#> 5     alkatel  communication  communication
Created on 2021-08-10 by the reprex package (v2.0.0)
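Since ifelse() is meant for vectorized selection, the inner row loop can be dropped entirely. The following is only a sketch of that idea, not a drop-in replacement: the loop variable names (group, pattern, hit) are my own, and it behaves slightly differently from the nested loops above in that a match on an earlier list element is kept rather than being overwritten when a later element does not match.

```r
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat <- c("none", "none", "none", "transportation", "communication")
df <- data.frame(name, cat, stringsAsFactors = FALSE)

travel <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel, leasure)

# Start from the fallback value, then overwrite only the rows that match.
df$new_name <- df$cat
for (group in my_list) {
  # One alternation pattern per list element, matched against the whole column.
  pattern <- paste(group, collapse = "|")
  hit <- grepl(pattern, df$name, ignore.case = TRUE)
  # ifelse() works on the full column at once: matched rows get the first
  # string of the group, all other rows keep their current value.
  df$new_name <- ifelse(hit, group[1], df$new_name)
}
df
```

For the example data this gives the same new_name column as the loop version.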
Generally speaking, using pattern matching to find whether a string is in a list is tricky; be really careful that the strings in my_list never include any characters that grepl() treats as special in a regular expression. For your example you'll get the same result as grepl() gives using the test
tolower(df[j, "name"]) %in% tolower(my_list[[i]])
but that's not true for all possible name values: the grepl() code will allow partial matches (e.g. df[j, "name"] equal to "netflix.com in a long string") and %in% won't.
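To make both caveats concrete, here is a small sketch. The test strings and the helper matches_literal() are hypothetical, made up for illustration; the helper shows one possible way to get partial matching without regex semantics, by lowercasing both sides and calling grepl() with fixed = TRUE for each candidate.

```r
leasure <- c("leasure", "MTV", "NETFLIX.COM")
pattern <- paste(leasure, collapse = "|")

# Partial match: grepl() finds the name inside a longer string, %in% does not.
long_name <- "netflix.com in a long string"
partial_grepl <- grepl(pattern, long_name, ignore.case = TRUE)    # TRUE
partial_in <- tolower(long_name) %in% tolower(leasure)            # FALSE

# Special characters: "." in "NETFLIX.COM" matches any single character,
# so a string without a literal dot is still accepted.
dot_grepl <- grepl(pattern, "netflixXcom", ignore.case = TRUE)    # TRUE

# Hypothetical helper: case-insensitive substring test with no regex semantics.
matches_literal <- function(x, candidates) {
  any(vapply(tolower(candidates), grepl, logical(1),
             x = tolower(x), fixed = TRUE))
}
dot_literal <- matches_literal("netflixXcom", leasure)            # FALSE
long_literal <- matches_literal(long_name, leasure)               # TRUE
```

Which of the three behaviours is right depends on whether partial matches are wanted and on whether the strings in my_list are meant as patterns or as literal names.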