How to extract the IDs of non-matching values between 2 data frames by ID in R?


I am trying to build a report of all non-matching values between two data frames. I tried to apply the solution here, but the intersect function does not work because the data frames have different numbers of columns.

I am using the comparedf function from the arsenal package, which does a good job of showing me the differences between the data frames, but I am not sure how to store the non-matching rows in another data frame or vector.

Here is an example:

df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))

library(arsenal)
summary(comparedf(df1, df2, by = "id"))

This gives the following output:

Table: Differences detected

var.x   var.y   id   values.x   values.y    row.x   row.y
------  ------  ---  ---------  ---------  ------  ------
var     var     b    2          3               2       2
var     var     c    3          4               3       3
var     var     d    4          2               4       4
var2    var2    c    3          4               3       3
var2    var2    d    4          3               4       4

Is there a way to extract the IDs from this table as a vector? Subsetting df1 to only these IDs would also work.

Edit: I added another variable column because in my real dataset multiple columns are being compared at the same time.


Solution

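  • The summary of a comparedf object is a list, and its diffs.table element holds the "Differences detected" table as a data frame, so the id column can be extracted as a vector. When several columns are compared, an id appears once per differing column, so wrap the result in unique() if each ID is needed only once.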

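    df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                      var = c(1, 2, 3, 4, 5))
    df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                      var = c(1, 3, 4, 2, 5))
    library(arsenal)
    # summary() of a comparedf object is a list of tables
    cmp_summary <- summary(comparedf(df1, df2, by = "id"))
    # diffs.table has one row per differing value
    diff_table <- cmp_summary$diffs.table
    # an id repeats once per differing column, so deduplicate
    diff_ids <- unique(diff_table$id)
    # rows of df1 with at least one non-matching value
    df1_diff <- df1[df1$id %in% diff_ids, ]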
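
  • As a cross-check that does not depend on arsenal, the same IDs can be found with base R. This is only a sketch of a different technique (merge plus a column-wise comparison), not what comparedf does internally. It assumes both data frames have the same ids and the same compared columns, and that there are no NA values. The names both, vars, mismatch, diff_ids and df1_diff are made up for illustration.

```r
# Same example data as in the question (two compared columns).
df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))

# Align the two data frames on id; shared columns get .x / .y suffixes.
both <- merge(df1, df2, by = "id")

# Columns to compare: everything except the key.
vars <- setdiff(names(df1), "id")

# Logical matrix, one column per variable: TRUE where .x and .y disagree.
mismatch <- sapply(vars, function(v) {
  both[[paste0(v, ".x")]] != both[[paste0(v, ".y")]]
})

# ids with at least one differing value, and the matching rows of df1.
diff_ids <- both$id[rowSums(mismatch) > 0]
df1_diff <- df1[df1$id %in% diff_ids, ]
```

    Here diff_ids holds the IDs as a vector and df1_diff is df1 restricted to those IDs, which covers both forms asked for in the question. If the real data contains NA values, the != comparison returns NA and would need explicit handling.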