Search code examples
rduplicatesunique

Select multiple duplicate rows based on specific values in next column


This is a follow up question to Kikapp's answer.

I want to remove participant IDs which lack all the time-points. In other way around, I want to select rows which have all the four time (11, 21, 31, 41). See the sample data dropbox link

Here is my try based on Kikapp's answer. For some reason, it doesn't work. Let me know how to make it better.

data2 <- df[df$ID %in% names(table(df$ID))[table(df$ID) > 3],] 

I get 4695 rows or objects or IDs for time == 11, time == 21,time == 41 while 4693 for time == 31; however, I want they should be equal.


Solution

  • You can use dplyr for this task for a much faster result:

    df1 <- df %>% group_by(ID) %>% 
        dplyr::filter(length(unique(time)) == 4) %>% 
        data.frame()
    

    However, when you get the counts of ID's for each time value you will find out that there are 32's hidden there (2 rows in total):

    df1 %>% group_by(time) %>% 
        dplyr::summarise(Counts = n()) %>% 
        data.frame()
    
    #Output:
    time Counts
     11   4695  
     21   4695  
     31   4693  
     32      2  
     41   4695 
    

    This shows that you have 2 rows with values of 32. As it turns out, that was due to a typo on your part. So you can change them with df$time[df$time == 32] <- 31 and run the code again.

    I hope this was helpful.

    Thanks!