Search code examples
rif-statementdplyrmutaterowwise

Mutate a new column according to the values of each row


I have the following toy data frame.

toy.df <- data.frame(Name = c("group1", "group2", "group3", "group4", "group5", "group6", "group7"), 
                 col1 = c("pos", "neg", "NA", "pos","neg", "NA", "pos"),
                 col2 = c("pos", "pos", "NA", "pos","neg","NA", "neg"),
                 col3 = c("pos", "NA", "pos", "NA", "neg", "neg", "neg"))

I would like to mutate a new column that check the values of all columns per row. If they are all "pos" or "NA" mutate "pos", if they are all "neg" or "NA" mutate "neg" and if they are "pos" or "neg" or "NA" mutate "both".

The new column looks as follows:

col4 <- c("pos", "both", "pos", "pos","neg", "neg","both")

Here is the final data frame:

 Name  col1 col2 col3 col4
group1  pos  pos  pos  pos
group2  neg  pos  NA  both
group3  NA   NA   pos  pos
group4  pos  pos   NA  pos
group5  neg  neg  neg  neg
group6  NA   NA   neg  neg
group7  pos  neg  neg both

Solution

  • Since the "NA" in your data frame is literal "NA", we need to turn it into real missing value NA by na_if. Then use case_when to supply the conditions for new column assignment. We need rowwise for it to work in every row. The final TRUE ~ "unknown" in case_when captures strings other than "pos" and "neg" in col1 to col3.

    I added two entries to show the behaviour when all rows are NA, or when there's a typo in the columns.

    library(dplyr)
    
    toy.df %>% 
      rowwise() %>%  
      mutate(across(everything(), ~na_if(.x, "NA")),
             col4 = case_when(all(is.na(c_across(col1:col3))) ~ NA,
                              all(c_across(col1:col3) == "pos", na.rm = T) ~ "pos",
                              all(c_across(col1:col3) == "neg", na.rm = T) ~ "neg",
                              all(c_across(col1:col3) %in% c("pos", "neg", NA)) ~ "both",
                              TRUE ~ "unknown")) %>% 
      ungroup()
    
    # A tibble: 9 × 5
      Name   col1  col2  col3  col4   
      <chr>  <chr> <chr> <chr> <chr>  
    1 group1 pos   pos   pos   pos    
    2 group2 neg   pos   NA    both   
    3 group3 NA    NA    pos   pos    
    4 group4 pos   pos   NA    pos    
    5 group5 neg   neg   neg   neg    
    6 group6 NA    NA    neg   neg    
    7 group7 pos   neg   neg   both   
    8 group8 NA    NA    NA    NA     
    9 group9 pos   pos   typo  unknown
    

    Data

    toy.df <- structure(list(Name = c("group1", "group2", "group3", "group4", 
    "group5", "group6", "group7", "group8", "group9"), col1 = c("pos", 
    "neg", "NA", "pos", "neg", "NA", "pos", NA, "pos"), col2 = c("pos", 
    "pos", "NA", "pos", "neg", "NA", "neg", NA, "pos"), col3 = c("pos", 
    "NA", "pos", "NA", "neg", "neg", "neg", NA, "typo")), class = "data.frame", row.names = c(NA, 
    -9L))