Create a new column with mutate() when the value in any of several other columns is TRUE (or 1)


I have a dataframe (my_dataframe) with 5 columns, all holding 0 or 1 values. I would like to create a new column called cn7_any that is 1 whenever any of columns 2:5 equals 1.

my_dataframe <- structure(list(cn7_normal = c(1L, 1L, 1L, 1L, 1L, 1L), 
    cn7_right_paralysis_central = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_right_paralysis_peripheral = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_central = c(0L, 0L, 0L, 0L, 0L, 0L), 
    cn7_left_paralysis_peripheral = c(0L, 0L, 0L, 0L, 0L, 0L)), 
    row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
> head(my_dataframe)
# A tibble: 6 x 5
  cn7_normal cn7_right_paralysis_cen… cn7_right_paralysis_perip… cn7_left_paralysis_cen… cn7_left_paralysis_peri…
       <int>                    <int>                      <int>                   <int>                    <int>
1          1                        0                          0                       0                        0
2          1                        0                          0                       0                        0

I could do it successfully with case_when():

my_dataframe <- my_dataframe %>%
  mutate(cn7_paralisis_any = case_when(
    cn7_right_paralysis_central == 1 ~ 1,
    cn7_right_paralysis_peripheral == 1 ~ 1,
    cn7_left_paralysis_central == 1 ~ 1,
    cn7_left_paralysis_peripheral == 1 ~ 1,
    TRUE ~ 0
  ))

Although it worked, I wonder whether there is a simpler, less verbose solution. I feel I should be using any() somehow. Any ideas?


Solution

  • Your data is all zeroes, so I changed a couple of values (in rows 2 and 4) to prove the point.
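
    The edit itself is not shown. Judging from the printed output further down, it was presumably something like the sketch below. The data.frame() rebuild and the two assignments are assumptions inferred from that output, not the original code.

    ```r
    # Rebuild the question's data: cn7_normal is 1, every paralysis column is 0.
    my_dataframe <- data.frame(
      cn7_normal                     = rep(1L, 6),
      cn7_right_paralysis_central    = rep(0L, 6),
      cn7_right_paralysis_peripheral = rep(0L, 6),
      cn7_left_paralysis_central     = rep(0L, 6),
      cn7_left_paralysis_peripheral  = rep(0L, 6)
    )

    # Assumed edit: flag one paralysis column in row 2 and one in row 4.
    my_dataframe$cn7_left_paralysis_peripheral[2] <- 1L
    my_dataframe$cn7_left_paralysis_central[4]    <- 1L

    # TRUE only for the rows where at least one of columns 2:5 is non-zero.
    any_flag <- rowSums(my_dataframe[, 2:5]) > 0
    ```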

    rowSums(my_dataframe[,2:5]) > 0
    # [1] FALSE  TRUE FALSE  TRUE FALSE FALSE
    +(rowSums(my_dataframe[,2:5]) > 0)
    # [1] 0 1 0 1 0 0
    
    my_dataframe$cn7_any <- +(rowSums(my_dataframe[,2:5]) > 0)
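
    In case the leading + looks odd: unary plus on a logical vector coerces it to integer, so it behaves like a terse as.integer(). A minimal illustration (the flags vector is made up for the example):

    ```r
    # Hypothetical logical vector standing in for `rowSums(...) > 0`.
    flags <- c(FALSE, TRUE, FALSE, TRUE)

    plus_version    <- +flags             # unary plus: logical -> integer
    integer_version <- as.integer(flags)  # the same conversion, spelled out

    identical(plus_version, integer_version)
    ```

    as.integer() is arguably easier to read, so use whichever you prefer.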
    

    Within dplyr,

    my_dataframe %>%
      mutate(cn7_any = rowSums(across(-cn7_normal, ~ . > 0)) > 0)
    # # A tibble: 6 x 6
    #   cn7_normal cn7_right_paralysis_central cn7_right_paralysis_peripheral cn7_left_paralysis_central cn7_left_paralysis_peripheral cn7_any
    #        <int>                       <int>                          <int>                      <int>                         <int> <lgl>  
    # 1          1                           0                              0                          0                             0 FALSE  
    # 2          1                           0                              0                          0                             1 TRUE   
    # 3          1                           0                              0                          0                             0 FALSE  
    # 4          1                           0                              0                          1                             0 TRUE   
    # 5          1                           0                              0                          0                             0 FALSE  
    # 6          1                           0                              0                          0                             0 FALSE  
    

    What you're computing is really a logical flag rather than a number, but if you want 0/1 values, just wrap the result in +(...) as above:

    my_dataframe %>%
      mutate(cn7_any = +(rowSums(across(-cn7_normal, ~ . > 0)) > 0))
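
    Since the question mentions any(): if your dplyr is recent enough (1.0.4 or later, as far as I know), if_any() is the row-wise "any" across a set of columns and may read closer to the intent. This is a different technique from the rowSums() approach above, shown only as a sketch. The tibble is rebuilt here with the same assumed edits to rows 2 and 4, and contains("paralysis") is my choice of selector.

    ```r
    library(dplyr)

    # Same data as the question, with rows 2 and 4 edited (assumed, see above).
    my_dataframe <- tibble(
      cn7_normal                     = rep(1L, 6),
      cn7_right_paralysis_central    = rep(0L, 6),
      cn7_right_paralysis_peripheral = rep(0L, 6),
      cn7_left_paralysis_central     = c(0L, 0L, 0L, 1L, 0L, 0L),
      cn7_left_paralysis_peripheral  = c(0L, 1L, 0L, 0L, 0L, 0L)
    )

    # if_any() returns one logical per row: TRUE when any selected column is 1.
    result <- my_dataframe %>%
      mutate(cn7_any = as.integer(if_any(contains("paralysis"), ~ .x == 1)))
    ```

    Selecting with contains("paralysis") instead of -cn7_normal means an existing cn7_any column (for example, from the base R assignment earlier) is not swept into the check.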