Search code examples
dataframedplyrdata-manipulation

Mutate column conditionally, if any NAs take the highest value and grab the column names


Consider this sample data

# A tibble: 10 x 3
       x     y     z
   <int> <dbl> <dbl>
 1     1   1       5
 2     2   3       6
 3     3   4       7
 4     4   7       8
 5     5  NA       9
 6     6  12      10
 7     7  NA      NA
 8     8   2      20
 9     9   5.5    25
10    10  NA       8

I would like to mutate a new column value that rowSums if there is no NAs present in any of the columns.

  • If there are, take the highest value in the row times 1.2.

  • BUT, if there are only one column with a value, take that value

  • Finally, another column NA_column with the names of columns containing NA in that row!

What I have in mind, but could not figure out the rest.

df %>% 
  mutate(
    value = case_when(any_vars(is.na()) ~ BIGGEST VALUE * 1.2,
                      TRUE ~ rowsum()), 
    NA_column = columns that has NA in that row
  )

DATA

df <- tibble(
  x = 1:10, 
  y = c(1, 3, 4, 7, NA, 12, NA, 2, 5.5, NA), 
  z = c(5:10, NA, 20, 25, 8)
)

Solution

  • Use rowwise and c_across to perform operations

    library(dplyr)
    
    df_out <- df %>% 
      rowwise() %>%
      mutate(
        value = ifelse(anyNA(c_across(everything())), max(c_across(everything()), na.rm = T) * 1.2, x), # or 'y' or 'z' instead of x
        NA_column = paste(colnames(df)[is.na(c_across(x:z))], collapse = " | ")
      ) %>% 
      ungroup() 
    
    df_out
    # A tibble: 10 × 5
           x     y     z value NA_column
       <int> <dbl> <dbl> <dbl> <chr>    
     1     1   1       5   1   ""       
     2     2   3       6   2   ""       
     3     3   4       7   3   ""       
     4     4   7       8   4   ""       
     5     5  NA       9  10.8 "y"      
     6     6  12      10   6   ""       
     7     7  NA      NA   8.4 "y | z"  
     8     8   2      20   8   ""       
     9     9   5.5    25   9   ""       
    10    10  NA       8  12   "y"