Search code examples
rfilterinequality

Efficient way to check for inequality between two columns accounting for NAs in R?


When I filter to check for inequality between two columns, I need to be explicit about NAs in the data, because filter keeps rows where the condition evaluates to TRUE, and NAs sort of break that evaluation. Here is what I mean:

Take the following data:

df <- data.frame(x = c(1:4,NA,6:10), y = c(1,2,NA,3,6,NA,7,8,9,11))
df
    x  y
1   1  1
2   2  2
3   3 NA
4   4  3
5  NA  6
6   6 NA
7   7  7
8   8  8
9   9  9
10 10 11

When I want to know which rows are unequal, the following code doesn't cut it:

library(dplyr)
df %>% filter(x != y)

   x  y
1  4  3
2 10 11

Instead, I need to be explicit about NAs:

df %>% filter(x != y | (is.na(x) & !is.na(y)) | (!is.na(x) & is.na(y)))

   x  y
1  3 NA
2  4  3
3 NA  6
4  6 NA
5 10 11

...but that is just too verbose. I tried with other methods, like base subsetting and subset, but this inconvenience persists.

So, is there a more straightforward way to check for inequality accounting for NAs??


Solution

  • Format them first:

    df %>% filter(format(x) != format(y))
    

    giving:

       x  y
    1  3 NA
    2  4  3
    3 NA  6
    4  6 NA
    5 10 11