r dplyr apply lapply summarize

How to find the quantiles of each variable of a data.frame


I have a data frame with multiple variables, and I would like to find the quantiles (using quantile()) of each of these variables.

Sample code:

testtable = data.frame(groupvar = c(rep('x',100), rep('y',100)), 
                       numericvar = rnorm(200))

I want to apply quantile(., c(.05, .1, .25, .5, .75, .9, .95)) to numericvar for each group in groupvar (x and y). The ideal result would look like

   x    y
  .05 .05
  .1  .1
  .25 .25
  .5  .5
  .75 .75
  .9  .9
  .95 .95

where each entry is a quantile of x or y. For example, .05 is the 5th percentile of the distribution of x, .1 is the 10th percentile of the distribution of x, etc.

I tried summarise in dplyr but ran into a problem because my quantile function is returning a vector of length 7.

What is the best way to do this?


Solution

  • Here is a base R solution: unstack the data frame so that each group becomes its own column, then calculate the quantiles of each column, i.e.

    v1 <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)  # probabilities to evaluate
    sapply(unstack(testtable, numericvar ~ groupvar), function(i) quantile(i, v1))
    

    which gives (the exact values vary from run to run, because testtable is built with rnorm()),

                  x           y
    5%  -1.82980882 -1.49900735
    10% -1.26047295 -1.02626933
    25% -0.83928910 -0.68248217
    50%  0.02757385 -0.02096953
    75%  0.64842517  0.48624513
    90%  1.63382801  1.09722178
    95%  1.91104161  1.72846846
    

    where v1 is the vector of probabilities, c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95).
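
A possible variation on the same idea, shown only as a sketch: instead of unstack(), split numericvar by groupvar with split() and pass the probabilities to quantile() through its probs argument. The set.seed() call and the name res are assumptions added here so the example is reproducible; they are not part of the original question or answer.

```r
# Sketch only: set.seed() and `res` are illustrative additions.
set.seed(1)
testtable <- data.frame(groupvar = c(rep('x', 100), rep('y', 100)),
                        numericvar = rnorm(200))

# Probabilities to evaluate, as in the answer above
v1 <- c(0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95)

# split() returns one numeric vector per group; sapply() applies
# quantile() to each and binds the results into a matrix with one
# column per group and one row per probability.
res <- sapply(split(testtable$numericvar, testtable$groupvar),
              quantile, probs = v1)
res
```

This should produce the same layout as the unstack() version (rows labelled 5% to 95%, columns x and y), and split() also works when the groups have different sizes.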