I'm comparing a couple of columns of ID variables pulled in from a group of files to a master list, these IDs should match across columns. I'm using case_when to make sure they each meet the master list, but I was wondering if there was a convenient way to change the "TRUE ~" output to the column name of the value that failed the test?
Here's the basic gist,
a <- c(1,2,3,4)
b <- c(1,2,3,4)
c <- c(1,2, "NA", "NA")
d <- c(1,2,3,4)
example <- data.table(a,b,c,d)
example %>% mutate(
test = case_when(
a==b & a==c & a==d ~ "PASS",
TRUE ~ "FAIL")
)
Which will do the basic job of flagging any values that fail the test, but is there a modular way to change the TRUE ~ "FAIL" output to the name of the column that produced the failure, "c" in this case?
library(dplyr)
example %>%
rowwise() %>%
dplyr::mutate(cond = cond = names(example)[which(!replace_na(a == c_across(b:d), F)) + 1] %>%
paste(collapse = ",")) %>%
ungroup()
Note: The + 1
is because this looks up column names by index based on which did not pass. + 1
is to skip column a
so that cond
shows the correct column. paste
is used in case there are multiple columns that do not meet the condition. They will be comma separated in cond
.
Also as mentioned in the comments by @r2evans about coersion I modified c <- c(1,2, "NA", "NA")
to be c <- c(1,2, NA, NA)
.
Output
a b c d cond
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 1 1 1 ""
2 2 2 2 2 ""
3 3 3 NA 3 "c"
4 4 4 NA 4 "c"
Multiple columns fail condition
a <- c(1,2,3,4)
b <- c(1,2,3,4)
c <- c(1,2, NA, NA)
d <- c(1,2,3,6)
example <- data.frame(a,b,c,d)
example %>%
rowwise() %>%
dplyr::mutate(cond = cond = names(example)[which(!replace_na(a == c_across(b:d), F)) + 1] %>%
paste(collapse = ",")) %>%
ungroup()
a b c d cond
<dbl> <dbl> <dbl> <dbl> <chr>
1 1 1 1 1 ""
2 2 2 2 2 ""
3 3 3 NA 3 "c"
4 4 4 NA 6 "c,d"