r dataframe dplyr data-analysis percentile

How to filter the top 10 percent of a column in a data frame, grouped by id, using dplyr


I have the following data frame:

id   total_transfered_amount day
1       1000                 2
1       2000                 3
1       3000                 4
1       1000                 1
1       10000                4
2       5000                 3
2       6000                 4
2       40000                2
2       4000                 3
2       4000                 3
3       1000                 1
3       2000                 2
3       3000                 3
3       30000                3
3       3000                 3

I need to filter out the rows that fall above the 90th percentile of the 'total_transfered_amount' column for each id separately, preferably using the dplyr package. For example, I need to filter out the following rows:

2       40000                2
3       30000                3

Solution

We can use data.table

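     library(data.table)
     # setDT() converts df1 to a data.table by reference; within each id,
     # keep the rows whose amount exceeds that id's 90th percentile
     setDT(df1)[, .SD[quantile(total_transfered_amount, 0.9) < 
                      total_transfered_amount], by = id]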
     #    id total_transfered_amount day
     #1:  1                   10000   4
     #2:  2                   40000   2
     #3:  3                   30000   3
    

Or we can use base R

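    # ave() returns the per-id comparison as 0/1 numbers, so as.logical()
    # turns it back into TRUE/FALSE for row indexing
    df1[with(df1, as.logical(ave(total_transfered_amount, id, 
                  FUN = function(x) quantile(x, 0.9) < x))), ]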
    #   id total_transfered_amount day
    #5   1                   10000   4
    #8   2                   40000   2
    #14  3                   30000   3
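
Since the question asks for dplyr, a grouped filter() should give the same rows. This is only a minimal sketch: it assumes df1 holds the data shown in the question (rebuilt here so the snippet runs on its own), and top_rows and remaining_rows are illustrative names, not anything from the original post.

```r
library(dplyr)

# Assumption: df1 mirrors the data frame shown in the question
df1 <- data.frame(
  id = rep(1:3, each = 5),
  total_transfered_amount = c(1000, 2000, 3000, 1000, 10000,
                              5000, 6000, 40000, 4000, 4000,
                              1000, 2000, 3000, 30000, 3000),
  day = c(2, 3, 4, 1, 4, 3, 4, 2, 3, 3, 1, 2, 3, 3, 3)
)

# Within each id, keep the rows above that id's 90th percentile
top_rows <- df1 %>%
  group_by(id) %>%
  filter(total_transfered_amount > quantile(total_transfered_amount, 0.9)) %>%
  ungroup()
# keeps the rows with amounts 10000 (id 1), 40000 (id 2) and 30000 (id 3)

# If the goal is to drop those rows instead, flip the comparison
remaining_rows <- df1 %>%
  group_by(id) %>%
  filter(total_transfered_amount <= quantile(total_transfered_amount, 0.9)) %>%
  ungroup()
```

quantile() uses its default interpolation (type 7) here, the same as in the data.table and base R versions, so the cutoffs match.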