Search code examples
rrow

Deleting rows for which a column value appears fewer than a given number of times conditional on uniqueness


I have data that looks like

date player market 
1-1     1     1
1-1     2     1
1-1     1     2
1-2     2     1
1-2     3     2
1-2    12     3
1-2    11     3
1-2     2     3
1-3    24     3
1-3     4     4
1-3    21     1
1-4     1     1
1-4    51     1
1-4     1     1
1-5     1     2

I want to delete all rows related to the column Market that have appear in fewer than in three unique dates.

So this would delete all rows except for those in which the Market column is 1 or 2. This is a simplified version of what I am doing, so looping is out of the question.

Is there a quick way to do this?

There are 60,000 markets.


Solution

  • df is your data.frame to begin with -

    library(data.table)
    dt <- data.table(df)
    dt[,Freq := length(unique(date)), by = market]
    dt2 <- dt[Freq > 2]
    

    Output -

    > dt2
        date player market Freq
     1:  1-1      1      1    4
     2:  1-1      2      1    4
     3:  1-1      1      2    3
     4:  1-2      2      1    4
     5:  1-2      3      2    3
     6:  1-3     21      1    4
     7:  1-4      1      1    4
     8:  1-4     51      1    4
     9:  1-4      1      1    4
    10:  1-5      1      2    3