
Remove groups with fewer than three unique observations


I would like to subset my data frame to keep only groups that have observations on 3 or more DIFFERENT days. I want to get rid of groups that have fewer than 3 observations, or whose observations do not come from at least 3 different days.

Here is a sample data set:

Group   Day
1       1 
1       3
1       5
1       5
2       2
2       2  
2       4 
2       4
3       1
3       2
3       3
4       1
4       5

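So for the above example, groups 1 and 3 will be kept, and groups 2 and 4 will be removed from the data frame. Group 2 has four observations but only two distinct days (2 and 4), and group 4 has only two observations.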
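To make the example reproducible, here is a minimal sketch that builds the sample data in base R and counts the distinct days per group. The names `df` and `counts` are my own choices for illustration and do not come from the original data.

```r
# Sample data from the table above (the name df is an assumption)
df <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# Number of distinct days observed in each group
counts <- tapply(df$Day, df$Group, function(x) length(unique(x)))
print(counts)
# Groups 1 and 3 have 3 distinct days; groups 2 and 4 have 2
```

Only the groups whose count is 3 or more should survive the subsetting.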

I hope this makes sense. I imagine the solution will be quite simple, but I can't work it out (I'm quite new to R and not very fast at coming up with solutions to things like this). I thought maybe the diff function could come in handy, but I didn't get much further.


Solution

  • With data.table you could do:

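    library(data.table)
    # DT is the sample data as a data.table with columns Group and Day
    # Return a group's rows (.SD) only if it has at least 3 distinct days
    DT[, if (uniqueN(Day) >= 3) .SD, by = Group]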
    

    which gives:

       Group Day
    1:     1   1
    2:     1   3
    3:     1   5
    4:     1   5
    5:     3   1
    6:     3   2
    7:     3   3
    

    Or with dplyr:

    library(dplyr)
    # Keep only the groups that have at least 3 distinct days
    DT %>% 
      group_by(Group) %>% 
      filter(n_distinct(Day) >= 3)
    

    which gives the same result.
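If you would rather not load a package, the same idea can be sketched in base R with `ave()`, which computes a per-group value and repeats it for every row of that group. This is not part of the original answer; the names `df`, `n_days`, and `result` are assumptions chosen for illustration.

```r
# Sample data from the question (the name df is an assumption)
df <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# For every row, the number of distinct days within that row's group
n_days <- ave(df$Day, df$Group, FUN = function(x) length(unique(x)))

# Keep only the rows whose group has at least 3 distinct days
result <- df[n_days >= 3, ]
print(result)
```

This should keep the seven rows of groups 1 and 3, with the original row names preserved rather than renumbered.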