I have two large datasets in newdata & olddata dataframes with ID as primary column key & 30 other columns. While most of the IDs are common between new & old, there are few old IDs that are deleted & not present in new and vice versa, new IDs are added in New that were not present in Old.
now, i am trying to put label to all 3 categories as Common, New & Old and while New & Common are working fine, i am not having so much success with Old. Not Sure what am i missing and any pointers would be appreciated
commonIDs<-data.frame(intersect(old_data$IDs,new_data$IDs))
#New Ids
added <- NULL
added <- new_data[!new_data$IDs %in% commonIDs$IDs,]
added <- data.frame(Remarks ="New", added)
#Deleted IDs
deleted <- NULL
deleted <- old_data[!old_data$IDs %in% commonIDs$IDs,]
I also tried another approach but still not luck
new <- data.frame(new_data[is.na(match(new_data$IDs,commonIDs$IDs)),])
old <- data.frame(old_data[is.na(match(old_data$IDs,commonIDs$IDs)),])
You might also consider trying a package developed around this use case.
Using @r2evans' data:
dat <- data.frame(id = 1:10)
dat1 <- dat[-(3:4),,drop=FALSE]
dat2 <- dat[-(6:10),,drop=FALSE]
We could run:
library(waldo)
compare(dat1, dat2)
and get this output (pasted as picture to retain color formatting), which shows us that dat1
is missing the data and rownames of rows 3-4, and dat2
is missing those for rows 6-10.