When I filter to check for inequality between two columns, I need to be explicit about NAs in the data, because filter keeps rows where the condition evaluates to TRUE, and NAs sort of break that evaluation. Here is what I mean:
Take the following data:
df <- data.frame(x = c(1:4,NA,6:10), y = c(1,2,NA,3,6,NA,7,8,9,11))
df
x y
1 1 1
2 2 2
3 3 NA
4 4 3
5 NA 6
6 6 NA
7 7 7
8 8 8
9 9 9
10 10 11
When I want to know which rows are unequal, the following code doesn't cut it:
library(dplyr)
df %>% filter(x != y)
x y
1 4 3
2 10 11
Instead, I need to be explicit about NAs:
df %>% filter(x != y | (is.na(x) & !is.na(y)) | (!is.na(x) & is.na(y)))
x y
1 3 NA
2 4 3
3 NA 6
4 6 NA
5 10 11
...but that is just too verbose. I tried with other methods, like base subsetting and subset
, but this inconvenience persists.
So, is there a more straightforward way to check for inequality accounting for NAs??
Format them first:
df %>% filter(format(x) != format(y))
giving:
x y
1 3 NA
2 4 3
3 NA 6
4 6 NA
5 10 11