Search code examples
rdataframeif-statementsapplystrsplit

Apply strsplit and ifelse condition accross multiple columns in R


I have a dataframe as shown below and the output from this code is shown below.

  • Essentially, I want to read values from columns which can be of two formats as shown in col1 and col2.
  • If the format is as in col1 then we split text and read third element and if the format is as in col2 we read the cell value as it is.
  • We then want to pass condition if the values in columns (col1 and col3 after splitting col1) to create col4 and col5.
  • Finally, we want the result in col6.
  • the variable "selection" is a list of numbers

Question: Since in the original dataframe, there can be multiple columns of value format type as in col1 and col2, how the code below can be modified to get the desired output?

#STEP1

df <- data.frame(
  col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
  col2 = c("107", "108", "106")
)

#STEP2

split_text <- strsplit(df$col1, "_")

third_elements <- sapply(split_text, function(x) if(length(x) >= 3) x[3] else NA)

#STEP3

df$col3<-third_elements

#STEP4

selection<-c(107,102,108)

df$col4<-ifelse(df$col2 %in% selection,"SELECT","NOTSELECT")

df$col5<-ifelse(df$col3 %in% selection,"SELECT","NOTSELECT")

#STEP5

df$col6<-paste(df$col4,df$col5,sep = ",")

Output from above code:

       col1 col2 col3      col4      col5                col6
1 abc_1_102  107  102    SELECT    SELECT       SELECT,SELECT
2 abc_1_103  108  103    SELECT NOTSELECT    SELECT,NOTSELECT
3 xyz_1_104  106  104 NOTSELECT NOTSELECT NOTSELECT,NOTSELECT

Desired output

       col1 col2                col6
1 abc_1_102  107       SELECT,SELECT
2 abc_1_103  108    NOTSELECT,SELECT
3 xyz_1_104  106 NOTSELECT,NOTSELECT

Solution

  • You can do this all in one go with by pasting two ifelse statements together. The ifelse for col2 is straightforward. The ifelse for col3 uses grepl to search for any of the numbers in select by creating a search string using paste(..., collapse = "|") (pasting the "or" operator between them). The outer paste(..., sep = ",") puts it all together:

    df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
                     ifelse(grepl(paste(selection, collapse = "|"), df$col1), "SELECT", "NOTSELECT"),
                     sep = ",")
    

    Or to more safely add a word boundary to the second ifelse (thanks to @r2evans!)

    df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
                     ifelse(grepl(
                       paste0("(^|_)(", paste(selection, collapse = "|"), ")(_|$)\\b"), 
                       df$col1),
                       "SELECT", "NOTSELECT"),
                     sep = ",")
    

    Both give the same output in this example:

           col1 col2                col6
    1 abc_1_102  107       SELECT,SELECT
    2 abc_1_103  108    SELECT,NOTSELECT
    3 xyz_1_104  106 NOTSELECT,NOTSELECT