Search code examples
raggregatequantile

List values in all rows and columns based on all unique values in another column to find 80th percentile value for each unique value


I have a data frame of 100 rows and 20 column. Column 1 contains IDs which can be grouped. Column 2 to 20 contains numeric integer values. The aim is to find the 75 percentile value for each unique ID in column 1 considering all values in column 2 to 20.

What I have tried?

I have used the code in the following link using aggregate() function.

aggregate(GO~GeneID,data=df,FUN = function(x) paste0(x,collapse = '; '))

How to list row values in a column based on grouping value in R?

However, you could use this code to make a list of values for one column at a time for all unique IDs in column 1. So the 80th percentile of the list of values when derived is only the 80th percentile for the values in that column. What I want is 80th percentile of all values in all columns for each unique ID in column 1.


Solution

  • A data.table grouping operation with .SD makes it easy.

    library(data.table)
    
    setDT(df)[,.(p75 = quantile(unlist(.SD), 0.75)), keyby = id]
    #>     id    p75
    #>  1:  1 797.00
    #>  2:  2 786.00
    #>  3:  3 753.75
    #>  4:  4 724.50
    #>  5:  5 793.75
    #>  6:  6 781.75
    #>  7:  7 808.00
    #>  8:  8 788.50
    #>  9:  9 761.00
    #> 10: 10 784.00
    

    Data:

    set.seed(1935200967)
    
    df <- cbind(
      data.frame(id = sample(10, 100, 1)),
      matrix(sample(1e3, 1900, 1), 100)
    )