
Reducing nested if else statements with grepl in R


In R, I have a data frame, which has a column 'food' with 100+ different string values.

For instance:

id <- c("1", "2", "3", "4", "5", "6")
food <- c("X1_", "X2_", "X3_", "X4_", "X5_", "X100_")
df <- data.frame(id, food)

I would like to create a new column 'food_final' based on the strings in the column 'food'. I started writing the code using nested ifelse calls and grepl, but with 100+ different string values, 100+ nested ifelse calls is clearly not the cleanest way of doing this, and in any case there is a limit to how deeply they can be nested.

Example of what I have tried so far:

df$food_final<-ifelse(grepl("X1_", df$food, ignore.case=TRUE), "1",
                      ifelse(grepl("X2_", df$food, ignore.case=TRUE), "2",
                             ifelse(grepl("X3_", df$food, ignore.case=TRUE), "3",
                                    ifelse(grepl("X4_", df$food, ignore.case=TRUE), "4",
                                        ifelse(grepl("X5_", df$food, ignore.case=TRUE), "5",
                                             ifelse(grepl("X100_", df$food, ignore.case=TRUE), "100", NA))))))

What is the best way of creating this new column 'food_final', instead of using so many nested ifelse statements?

Thank you in advance.


Solution

  • In case you only want to extract the number (this removes every non-digit character):

    df$food_final <- gsub("\\D", "", df$food)
    
    df
    #  id  food food_final
    #1  1   X1_          1
    #2  2   X2_          2
    #3  3   X3_          3
    #4  4   X4_          4
    #5  5   X5_          5
    #6  6 X100_        100
    

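    Or, in case the patterns map to arbitrary labels, you can do basically the same as what you are doing with the nested ifelse: test every pattern against each row and keep the name of the first one that matches.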

    x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
    apply(sapply(x, grepl, df$food, ignore.case=TRUE), 1, function(y) names(x)[y][1])
    #[1] "1"   "2"   "3"   "4"   "5"   "100"
    

    Or using Reduce, which tests each pattern only against the rows that are still NA:

    x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")
    Reduce(function(a,b) {
      i <- is.na(a)
      a[i][grepl(x[b], df$food[i], ignore.case=TRUE)] <- b
      a
    }, names(x), rep(NA, nrow(df)))
    #[1] "1"   "2"   "3"   "4"   "5"   "100"
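
For 100+ patterns it may be convenient to wrap the first-match logic in a small function. The following is only a sketch of the same idea as the Reduce version, written as a plain loop; the name `first_match_label` and its arguments are made up for illustration and are not part of base R. It assumes `patterns` is a named character vector whose names are the labels and whose values are the regular expressions, checked in order.

```r
# Hypothetical helper (not a base R function): label each string with the
# name of the first pattern that matches it, or NA if none match.
first_match_label <- function(strings, patterns, ignore.case = TRUE) {
  out <- rep(NA_character_, length(strings))
  for (label in names(patterns)) {
    todo <- is.na(out)            # rows that have no label yet
    if (!any(todo)) break         # stop early once every row is labelled
    hit <- grepl(patterns[[label]], strings[todo], ignore.case = ignore.case)
    out[todo][hit] <- label       # earlier patterns win, like nested ifelse
  }
  out
}

# Same example data and lookup vector as in the question and answer above
id <- c("1", "2", "3", "4", "5", "6")
food <- c("X1_", "X2_", "X3_", "X4_", "X5_", "X100_")
df <- data.frame(id, food)
x <- c("1"="X1_", "2"="X2_", "3"="X3_", "4"="X4_", "5"="X5_", "100"="X100_")

df$food_final <- first_match_label(df$food, x)
df$food_final
#[1] "1"   "2"   "3"   "4"   "5"   "100"

# Strings without any matching pattern stay NA
first_match_label(c("x2_extra", "banana"), x)
#[1] "2" NA
```

Because the patterns are tried in the order they appear in `x`, more specific patterns should come before more general ones, just as in a chain of nested ifelse calls.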