Search code examples
rduplicatesunique

Extract rows that have duplicates for certain column but are unique in another column


I'm am doing some coding in R. I would want to show the rows that have duplicates for columns ID and NAME but have different values for AGE.

For example I have this table:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
333|    Chery| 30
444|    Megan| 16
555|  Charles| 37
111|     Mark| 23
222|     Anne| 22
333|    Chery| 30
111|     Mark| 22

As of now I have this code:

readfile <- read.csv(file='/home/user/shane/names.csv')
dat <- data.frame(ID=c(readfile$ID),NAME=c(readfile$NAME),AGE=c(readfile$AGE))
nam <- duplicated(dat[,c('ID','NAME)]) | duplicated(dat[,c('ID','NAME], fromLast = TRUE)
readfile[nam,]

The output looks like this:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
333|    Chery| 30
111|     Mark| 23
222|     Anne| 22
333|    Chery| 30
111|     Mark| 22

I would want the output to be:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
111|     Mark| 23
222|     Anne| 22
111|     Mark| 22

I would want to remove the columns with the ID = 333 as they have the same value in Age. would anyone have a suggestion?


Solution

  • I just tweaked your code :)

    library(plyr) 
    
    dat1 <- ddply(dat, .(ID, NAME, AGE), nrow) 
    dat2 <- merge(dat1, dat, by=c("ID", "NAME", "AGE")) 
    dat3 <- dat2[!(!duplicated(dat2[, 1:2], fromLast=T) & !duplicated(dat2[, 1:2])),] 
    dat3[dat3$ID %in% dat3[dat3$V1 == 1, 1], 1:3]
    

    Output is:

       ID NAME AGE
    1 111 Mark  22
    2 111 Mark  22
    3 111 Mark  23
    4 222 Anne  21
    5 222 Anne  22
    

    Sample data:

    dat <- data.frame(ID=c(111,222,333,444,555,111,222,333,111), 
                      NAME=c('Mark','Anne','Chery','Megan','Charles','Mark','Anne','Chery','Mark'), 
                      AGE=c(22,21,30,16,37,23,22,30,22)) 
    #   ID    NAME AGE
    #1 111    Mark  22
    #2 222    Anne  21
    #3 333   Chery  30
    #4 444   Megan  16
    #5 555 Charles  37
    #6 111    Mark  23
    #7 222    Anne  22
    #8 333   Chery  30
    #9 111    Mark  22
    


    Update: Corrected formatting for better reading