I am trying to build a report of all non-matching values between two data frames. I tried to apply the solution here, but the intersect function does not work because the number of columns differs.
I am using the comparedf function from the arsenal package, which does a good job of showing me the differences between the data frames, but I am not sure how to store the non-matching rows in another data frame or vector.
Here is an example:
df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))
library(arsenal)
summary(comparedf(df1, df2, by = "id"))
Which gives the following output:
Table: Differences detected
var.x var.y id values.x values.y row.x row.y
------ ------ --- --------- --------- ------ ------
var var b 2 3 2 2
var var c 3 4 3 3
var var d 4 2 4 4
var2 var2 c 3 4 3 3
var2 var2 d 4 3 4 4
Is there a way to extract the IDs from this table as a vector? Subsetting df1 to only these IDs would also work.
Edit: I added another variable column because in my real dataset multiple columns are being compared at the same time.
This returns a vector of the ids reported in the differences table of the comparedf summary:
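df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5))
library(arsenal)
# Summarise the comparison; the result holds the "Differences detected" table
vec1 <- summary(comparedf(df1, df2, by = "id"))
# One row per differing value, including the id it belongs to
df4 <- vec1$diffs.table
# ids of the rows that differ
list1 <- df4$id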
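
When several columns are compared at once, as in the edited question, the same id can show up more than once in diffs.table (c and d each appear twice in the table above). The following is a minimal sketch of one way to handle that, using the two-column example from the question. It assumes the id column of diffs.table carries the name of the by variable, as in the output shown; the names cmp, diff_ids and df1_diff are only illustrative.

```r
library(arsenal)

# Two-column example data from the question
df1 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 2, 3, 4, 5),
                  var2 = c(1, 2, 3, 4, 5))
df2 <- data.frame(id = c("a", "b", "c", "d", "e"),
                  var = c(1, 3, 4, 2, 5),
                  var2 = c(1, 2, 4, 3, 5))

# Summary object that contains the "Differences detected" table
cmp <- summary(comparedf(df1, df2, by = "id"))

# Each differing value is one row, so an id repeats when more than one
# column differs; unique() keeps each id once. as.character() guards
# against the id column being a factor.
diff_ids <- unique(as.character(cmp$diffs.table$id))

# Rows of df1 whose id has at least one non-matching value
df1_diff <- df1[df1$id %in% diff_ids, ]
```

Here diff_ids can go straight into a report, and the same %in% filter applied to df2 would give the corresponding rows on the other side.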