Search code examples
rconditional-statementscreate-tabledplyrmutated

create(mutate) column with a condition of another one


I have this data

     COL
    AABC1
   AAAABD2
  AAAAAABF3

I would like to make a certain column like this:

     COL         NEW_COL
    AABC1          T1
   AAAABD2         T2
  AAAAAABF3        T3

If COL contains 'BC', NEW_COL will be T1

contains 'BD', it will be T2

contains 'BF', it will be T3.

I would like to use mutate and grepl function but I have 80 conditions (like BC>T1) so that the code does not work in the R.

With the table like:

    CLASS       NEW_COL
    BC          T1
    BD          T2
    BF          T3

Could I use mutate(create) new column with above standard table ??


Solution

  • Here's your data:

    DF <- data.frame(COL = c("AABC1",
                             "AAAABD2",
                             "AAAAABF3"), 
                     stringsAsFactors = FALSE)
    
    lookup_tbl <- data.frame(CLASS = c("BC", "BD", "BF"),
                             NEW_COL = c("T1", "T2", "T3"), 
                             stringsAsFactors = FALSE)
    

    Your problem is solved by a join, after some initial preparation.

    To prepare DF, you need to add a column that extracts any instance of CLASS in the lookup table from COL in DF. Then you can join normally. In R:

    library(dplyr)
    DF %>%
      mutate(CLASS = gsub(paste0("^.*(", 
                                paste0(lookup_tbl[["CLASS"]], collapse = "|"),
                                ").*$"), 
                          "\\1",
                          lookup_tbl[["CLASS"]])) %>%
      # or inner_join as required
      left_join(lookup_tbl, by = "CLASS")
    

    How the solution should behave COL matches zero or more than one instance in CLASS will need to be specified. The above handles both cases, but maybe not how you'd like.