
How to print observations that exist in one data frame but are missing from the other in R?


I have one data frame with 3719 rows (the actual data) and another with 3721 rows (produced by my code), so there are 2 extra observations.

I tried setdiff, but it returns zero rows:

dplyr::setdiff(d1,d2)

Output:
[1] col1 col2 col3 col4
[5] col5 col6
<0 rows> (or 0-length row.names)

I also tried it the other way round:

dplyr::setdiff(d2,d1)

Output:
[1] col1 col2 col3 col4
[5] col5 col6
<0 rows> (or 0-length row.names)
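A likely explanation, offered as an assumption since the real data isn't shown: setdiff() checks whether each row of one data frame occurs anywhere in the other, not how many times it occurs. If the 2 extra rows in d2 are exact copies of rows already in d1, both directions come back empty. The base-R sketch below reproduces this with small hypothetical d1 and d2 (not the original data); row_key() is a hypothetical helper that pastes all columns of a row into one string.

```r
# Hypothetical data: d2 is d1 plus exact copies of rows 2 and 4
d1 <- data.frame(col1 = 1:5, col2 = letters[1:5])
d2 <- rbind(d1, d1[c(2, 4), ])

# Hypothetical helper: one string key per row built from all columns
# (assumes the "\r" separator never appears in the data)
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Set-style membership test: count rows of d2 that are absent from d1
n_missing <- sum(!(row_key(d2) %in% row_key(d1)))
n_missing  # 0, even though d2 has 2 more rows than d1

# duplicated() flags the repeated rows instead
d2[duplicated(d2), ]
```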

How can I identify those 2 extra observations in R?


Solution

  • Option 1 - use the %in% operator

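    # Make fake data: b is mtcars without its first two rows
    a <- mtcars
    b <- mtcars[3:nrow(mtcars), ]
    
    # Use the row names as an id column that identifies each observation
    a$id <- rownames(a)
    b$id <- rownames(b)
    
    # Rows in a but not in b
    a[!(a$id %in% b$id), ]
    # Rows in b but not in a
    b[!(b$id %in% a$id), ]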
    

  • Option 2 - use merge() with all = TRUE

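    # Flag membership in each data frame
    a$flaga <- 1
    b$flagb <- 1
    
    # Full outer join on id: a missing flag means the id is absent from that data frame
    d <- merge(a[, c("id", "flaga")], b[, c("id", "flagb")], by = "id", all = TRUE)
    
    # ids present in only one of the two data frames
    d[is.na(d$flaga) | is.na(d$flagb), "id"]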
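Both options rely on an id column that identifies the same observation in both data frames. If the real data has no such key, and the extra rows may be exact duplicates (which a plain %in% test cannot see), one base-R sketch is to number each repeat of a row before comparing. The data frames and the row_key() helper here are hypothetical, and the approach assumes every column should take part in the match.

```r
# Hypothetical data: d2 has two extra rows, one of them a copy of (2, "b")
d1 <- data.frame(col1 = c(1, 2, 3), col2 = c("a", "b", "c"))
d2 <- rbind(d1, data.frame(col1 = c(2, 9), col2 = c("b", "z")))

# Hypothetical helper: one string key per row built from all columns
# (assumes the "\r" separator never appears in the data)
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

k1 <- row_key(d1)
k2 <- row_key(d2)

# Number each occurrence of a key within its own data frame, so the
# second copy of a row gets a different tag than the first
occ1 <- paste(k1, ave(seq_along(k1), k1, FUN = seq_along))
occ2 <- paste(k2, ave(seq_along(k2), k2, FUN = seq_along))

# Rows of d2 with no counterpart in d1, counting duplicates
extra_in_d2 <- d2[!(occ2 %in% occ1), ]
# Rows of d1 with no counterpart in d2
extra_in_d1 <- d1[!(occ1 %in% occ2), ]

extra_in_d2
```

Numbering occurrences turns a multiset comparison into an ordinary set comparison: the second copy of a row gets tag 2 and only matches if the other data frame also contains that row twice.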