
R: for loop with ifelse() and grepl() does not give the expected results


I'm trying to match the strings in a data frame (df) against the strings in my_list. Depending on whether there is a match (TRUE/FALSE), I need to populate a new_name column in df with the first string of the matching list element (my_list[[i]][1]) when TRUE, or with the value of the "cat" column when there is no match.

My data frame is as follows:

name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)

My list:

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)

My for loop with ifelse and grepl is as follows:

for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- ifelse(
      grepl(paste(my_list[[i]], collapse = "|"), tolower(df[j, "name"])),
      my_list[[i]][1],
      df[j, "cat"])
  }
}

The expected output is:

df["new_name"]<- c("leasure", "none", "none", "transportation", "communication")
df

         name            cat       new_name
1 NETFLIX.COM           none        leasure
2      BlueTV           none           none
3         smv           none           none
4       trafi transportation transportation
5     alkatel  communication  communication

With the for loop I wrote, I currently get an exact copy of the "cat" column, meaning that every case is treated as non-matching (FALSE) in the ifelse() call. I'm not sure what's wrong here. Any help would be appreciated!


Solution

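  • It doesn't make sense to use ifelse() in that context: it is meant for vectorized selection, while here you are testing one value at a time, so a plain if/else fits better. But your code would give the expected result for this data if you had the pattern matching right. Unfortunately, for j == 1 and i == 2 (where you expected a match), your pattern is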

    "leasure|MTV|NETFLIX.COM"
    

    and you are trying to match it to tolower(df[j, "name"]), which is

    "netflix.com"
    

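    grepl() is case-sensitive by default, so the "NETFLIX.COM" in the pattern cannot match "netflix.com". You should convert both the pattern and the name to lowercase, or set ignore.case = TRUE in the grepl() call. For example,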

    name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
    cat<- c("none", "none", "none", "transportation", "communication")
    df<-data.frame(name, cat)
    
    travel<- c("travel","air_com", "AIRCAT", "tivago")
    leasure<- c("leasure","MTV", "NETFLIX.COM")
    my_list<- list(travel, leasure)
    
    for (j in 1:nrow(df)) {
      for (i in 1:length(my_list)) {
        df[j, "new_name"] <- 
          if( grepl(paste(my_list[[i]], collapse="|"), df[j, "name"],
                ignore.case = TRUE))
            my_list[[i]][1] 
          else df[j, "cat"]
      }
    }
    df
    #>          name            cat       new_name
    #> 1 NETFLIX.COM           none        leasure
    #> 2      BlueTV           none           none
    #> 3         smv           none           none
    #> 4       trafi transportation transportation
    #> 5     alkatel  communication  communication
    

    Created on 2021-08-10 by the reprex package (v2.0.0)

    Generally speaking, using pattern matching to test whether a string is in a list is tricky: be really careful that the strings in my_list never include characters that grepl() treats as special in a regular expression (the "." in "NETFLIX.COM" is already one, since it matches any single character). For your example, you'll get the same result as the grepl() call gives by using the test

    tolower(df[j, "name"]) %in% tolower(my_list[[i]])
    

    but that's not true for all possible name values: the grepl() code allows partial matches (e.g. df[j, "name"] equal to "netflix.com in a long string"), while %in% only accepts exact matches.
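To make the caution about special characters concrete: because "." is a regex wildcard, the alternation built from my_list also matches names that only look similar. One way around it, offered as a sketch rather than the only option, is to wrap each entry in \Q...\E, which PCRE (perl = TRUE) treats as literal text; this assumes no entry itself contains "\E". The name "netflixXcom" is made up purely for illustration.

```r
leasure <- c("leasure", "MTV", "NETFLIX.COM")

# Unescaped pattern: "." matches any character, so this made-up name matches too
grepl(paste(leasure, collapse = "|"), "netflixXcom", ignore.case = TRUE)
#> [1] TRUE

# Quote each entry with \Q...\E so PCRE matches it literally, then join with "|"
literal_pattern <- paste0("\\Q", leasure, "\\E", collapse = "|")
grepl(literal_pattern, "netflixXcom", ignore.case = TRUE, perl = TRUE)
#> [1] FALSE

# Partial matches are still allowed, unlike the %in% test
grepl(literal_pattern, "NETFLIX.COM in a long string", ignore.case = TRUE, perl = TRUE)
#> [1] TRUE
```

This keeps the partial-matching behaviour of the grepl() approach while removing the wildcard surprise; if you actually want exact matches, the %in% test is simpler.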
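One more caveat about the corrected loop: df[j, "new_name"] is reassigned on every pass of the inner loop, so only the result for the last element of my_list survives. That is harmless here because the only expected match (leasure) is the last element, but a name matching travel would be overwritten with df[j, "cat"] when i == 2. Below is a minimal sketch of a first-match version that stops with break. It assumes the same data plus a hypothetical sixth row, "AIRCAT tickets", added only to show the difference.

```r
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel", "AIRCAT tickets")  # last row is hypothetical
cat <- c("none", "none", "none", "transportation", "communication", "none")
df <- data.frame(name, cat, stringsAsFactors = FALSE)  # character columns, also on R < 4.0

travel <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel, leasure)

df$new_name <- df$cat  # default: keep the existing category when nothing matches
for (j in seq_len(nrow(df))) {
  for (i in seq_along(my_list)) {
    if (grepl(paste(my_list[[i]], collapse = "|"), df$name[j], ignore.case = TRUE)) {
      df$new_name[j] <- my_list[[i]][1]  # label of the first matching list element
      break                              # stop so later elements cannot overwrite it
    }
  }
}
df$new_name
```

The first five rows come out as in the expected output. The hypothetical last row keeps "travel", whereas the loop without break would end up assigning it "none".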