R: delete rows in a data frame where the number of rows per index is smaller than a certain value


I want to delete certain rows in a data frame when the number of rows with the same index is smaller than a pre-specified value.

> fof.6.5[1:15, 1:3]
   draw Fund.ID Firm.ID
1     1    1667     666
2     1    1572     622
3     1    1392     553
4     1     248      80
5     1    3223     332
6     2    2959    1998
7     2    2659    1561
8     2   14233    2517
9     2   10521   12579
10    2    3742    1045
11    3    9093   10121
12    3   15681   21626
13    3   26371   70170
14    4   27633   52720
15    4   13751     656

In this example, I want each index to have 5 rows. The third draw (which is my index) has fewer than 5 rows. How can I delete the draws like the third one if they have fewer than 5 rows?


Solution

  • You could do this using dplyr (assuming your data is in a data frame called dt):

    library(dplyr)
    dt %>% group_by(draw) %>% filter(n() >= 5) %>% ungroup()
    

    Or you could use table or xtabs:

    # Count the rows in each draw
    tab <- xtabs(~ draw, dt)

    # Drop every row whose draw has fewer than 5 rows
    dt[!dt$draw %in% as.numeric(names(which(tab < 5))), ]
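For anyone who wants to try this without extra packages, here is a minimal base R sketch of the same idea using ave(). It is not part of the original answer. The data frame below is rebuilt by hand from the 15 rows printed in the question (only draw and Fund.ID), and min_rows, rows_per_draw and kept are hypothetical names chosen for illustration.

```r
# First 15 rows of the question's data (draw and Fund.ID only), rebuilt by hand
dt <- data.frame(
  draw    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Fund.ID = c(1667, 1572, 1392, 248, 3223, 2959, 2659, 14233, 10521, 3742,
              9093, 15681, 26371, 27633, 13751)
)

min_rows <- 5  # hypothetical name for the pre-specified threshold

# ave() applies length() within each draw and returns one count per row
rows_per_draw <- ave(seq_along(dt$draw), dt$draw, FUN = length)

# Keep only the rows that belong to a draw with at least min_rows rows
kept <- dt[rows_per_draw >= min_rows, ]

print(table(kept$draw))
```

On this 15-row excerpt, draw 4 is dropped along with draw 3, because only two of its rows fall inside the printed slice. On the full data frame it would be kept if it has at least 5 rows.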