Search code examples
rdplyrcase-when

Using a column name as a fail output in case_when


I'm comparing a couple of columns of ID variables pulled in from a group of files to a master list, these IDs should match across columns. I'm using case_when to make sure they each meet the master list, but I was wondering if there was a convenient way to change the "TRUE ~" output to the column name of the value that failed the test?

Here's the basic gist,

a <- c(1,2,3,4)
b <- c(1,2,3,4)
c <- c(1,2, "NA", "NA")
d <- c(1,2,3,4)

example <- data.table(a,b,c,d)

example %>% mutate(
   test = case_when(
      a==b & a==c & a==d ~ "PASS",
   TRUE ~ "FAIL")
)

Which will do the basic job of flagging any values that fail the test, but is there a modular way to change the TRUE ~ "FAIL" output to the name of the column that produced the failure, "c" in this case?


Solution

  • library(dplyr)
    
    example %>% 
      rowwise() %>% 
      dplyr::mutate(cond = cond = names(example)[which(!replace_na(a == c_across(b:d), F)) + 1] %>% 
                      paste(collapse = ",")) %>% 
      ungroup()
    

    Note: The + 1 is because this looks up column names by index based on which did not pass. + 1 is to skip column a so that cond shows the correct column. paste is used in case there are multiple columns that do not meet the condition. They will be comma separated in cond.

    Also as mentioned in the comments by @r2evans about coersion I modified c <- c(1,2, "NA", "NA") to be c <- c(1,2, NA, NA).

    Output

          a     b     c     d cond 
      <dbl> <dbl> <dbl> <dbl> <chr>
    1     1     1     1     1 ""   
    2     2     2     2     2 ""   
    3     3     3    NA     3 "c"  
    4     4     4    NA     4 "c" 
    

    Multiple columns fail condition

    a <- c(1,2,3,4)
    b <- c(1,2,3,4)
    c <- c(1,2, NA, NA)
    d <- c(1,2,3,6)
    
    example <- data.frame(a,b,c,d)
    
    example %>% 
      rowwise() %>% 
      dplyr::mutate(cond = cond = names(example)[which(!replace_na(a == c_across(b:d), F)) + 1] %>% 
                      paste(collapse = ",")) %>% 
      ungroup()
    
          a     b     c     d cond 
      <dbl> <dbl> <dbl> <dbl> <chr>
    1     1     1     1     1 ""   
    2     2     2     2     2 ""   
    3     3     3    NA     3 "c"  
    4     4     4    NA     6 "c,d"