Combine logical vectors to create non-logical vector


I have two logical vectors in a data frame:

df <- data.frame(log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE), log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE))

I want to make a third column by combining these two. This new column should not simply contain logical values; instead, each row should get one of three values: "high" (where log1 is TRUE), "outlier" (where log2 is TRUE), or "normal" (otherwise). "High" takes precedence, so the third column should show "high" and not "outlier" for row 5.

I guess it's possible to do this using if and else, but I couldn't make it work with the following code:

df$new <- NA
if(df$log1 == TRUE){
  df$new <-  "high"
  } else if(df$log2 == TRUE) {
    df$new  <-  "outlier"
    } else {
      df$new  <-  "normal"
      }

Can anyone help?


Solution

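  • if() tests a single (length-one) condition, so it cannot walk down a column row by row. This is all about the vectorized ifelse and its derivatives.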

    base R

    ifelse(df$log1, "high", ifelse(df$log2, "outlier", "normal"))
    # [1] "outlier" "normal" "high"   "normal" "high"  
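
    As a minimal sketch of how this fits the original data frame, the result can be assigned straight to df$new. The second half shows an alternative that avoids nesting by starting from the default and overwriting in order of increasing priority; the name new_alt is made up for this illustration.

    ```r
    # Same data frame as in the question
    df <- data.frame(
      log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE),
      log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE)
    )

    # ifelse() evaluates the test element by element; the outer test is log1,
    # so "high" wins whenever log1 is TRUE (including row 5)
    df$new <- ifelse(df$log1, "high", ifelse(df$log2, "outlier", "normal"))

    # Alternative without nesting (new_alt is a hypothetical name):
    # start from the default, then overwrite, highest-priority label last
    new_alt <- rep("normal", nrow(df))
    new_alt[df$log2] <- "outlier"
    new_alt[df$log1] <- "high"

    # Both approaches should agree
    identical(df$new, new_alt)
    ```

    The overwrite approach relies on the order of the assignments, so the label that should take precedence has to come last.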
    

    dplyr

    We can nest dplyr::if_else, but once nesting is needed, case_when is usually easier to read.

    library(dplyr)
    df %>%
      mutate(
        new1 = if_else(log1, "high", if_else(log2, "outlier", "normal")), 
        new2 = case_when(log1 ~ "high", log2 ~ "outlier", TRUE ~ "normal")
      )
    #    log1  log2    new1    new2
    # 1 FALSE  TRUE outlier outlier
    # 2 FALSE FALSE  normal  normal
    # 3  TRUE FALSE    high    high
    # 4 FALSE FALSE  normal  normal
    # 5  TRUE  TRUE    high    high
    

    data.table

    Similarly, fifelse and fcase:

    library(data.table)
    as.data.table(df)[, new1 := fifelse(log1, "high", fifelse(log2, "outlier", "normal"))
      ][, new2 := fcase(log1, "high", log2, "outlier", default = "normal")][]
    #      log1   log2    new1    new2
    #    <lgcl> <lgcl>  <char>  <char>
    # 1:  FALSE   TRUE outlier outlier
    # 2:  FALSE  FALSE  normal  normal
    # 3:   TRUE  FALSE    high    high
    # 4:  FALSE  FALSE  normal  normal
    # 5:   TRUE   TRUE    high    high
    

    Note that while dplyr::case_when above uses tilde-formulas, as in cond1 ~ value1, cond2 ~ value2, the fcase variant uses alternating arguments: cond1, value1, cond2, value2, and so on.

    Also, the default= argument works so long as it is a constant. If a dynamic default value (i.e., based on table contents) is desired, then one needs an all-TRUE condition as the last pair, as in fcase(..., rep(TRUE, .N), NEWVALUE).
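
    To make the last point concrete, here is a small sketch of a dynamic default. It assumes an extra character column, here called fallback, which is not part of the original data and exists only for this illustration; rows matching neither condition take their value from that column.

    ```r
    library(data.table)

    dt <- data.table(
      log1 = c(FALSE, FALSE, TRUE, FALSE, TRUE),
      log2 = c(TRUE, FALSE, FALSE, FALSE, TRUE),
      fallback = c("a", "b", "c", "d", "e")  # hypothetical column, for illustration only
    )

    # The final pair uses an all-TRUE condition, so any row not matched by
    # log1 or log2 falls through to the per-row value in `fallback`
    dt[, new := fcase(
      log1, "high",
      log2, "outlier",
      rep(TRUE, .N), fallback
    )]

    dt$new
    # [1] "outlier" "b"       "high"    "d"       "high"
    ```

    All values passed to fcase need to share one type, which is why fallback is a character column here.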