r dataframe summary percentile

R - Get a summary table containing specified percentile levels for a dataframe


I want a summary table that shows more than the usual descriptive statistics produced by R's summary(x) function, for instance the 10th and 90th percentiles. Other answers I found online compute these values, but not in tabulated form.

I am looking for a way to add the specified percentile levels to the summary table generated by summary(x).

Here's example data:

df = data.frame("a"=seq(1,10), "b"=seq(10,100,10),
                "c"=letters[seq(1,10)], "d"=seq(5,95,10))



Solution

  • # generate data
    df = data.frame("a"=seq(1,10), "b"=seq(10,100,10), "c"=letters[seq(1,10)], "d"=seq(5,95,10))
    
    # filter numerical columns
    ndf = Filter(is.numeric,df)
    features = colnames(ndf)
    
    # percentiles reqd
    p_reqd = c(0,0.10,0.25,0.5,0.75,0.90,0.95,1)   # more percentile levels can be specified here
                                                   # after adding/removing, adjust p_lev as well
    
    # labels for specified percentiles + mean
    p_lev = c('Min','10%','25%','50%','Mean','75%','90%','95%','Max')
    
    # create an empty data frame with one row per label
    final = data.frame(row.names = p_lev)
    
    # loop over the numeric columns; the mean is inserted right after the median
    for (i in features) {
      x = ndf[,i]
      q = quantile(x, p_reqd)            # named vector: "0%", "10%", ..., "100%"
      k = which(names(q) == "50%")       # position of the median
      final[[i]] = unname(c(q[1:k], round(mean(x), 2), q[(k+1):length(q)]))
    }
    
    # custom summary table
    final
    

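As a possible alternative to the loop above, the same kind of table can be sketched with sapply(), which avoids maintaining the p_lev labels by hand because quantile() names its own output. This is only a minimal sketch under a few assumptions: pct_summary is a hypothetical helper name (not a base R function), the mean is placed in the last row rather than next to the median, and na.rm = TRUE is an added choice that the original code does not make.

```r
# Example data from the question
df <- data.frame(a = seq(1, 10), b = seq(10, 100, 10),
                 c = letters[seq(1, 10)], d = seq(5, 95, 10))

# Hypothetical helper: the requested quantiles plus the mean for one vector
pct_summary <- function(x, probs = c(0, 0.10, 0.25, 0.5, 0.75, 0.90, 0.95, 1)) {
  q <- quantile(x, probs = probs, na.rm = TRUE)  # names are "0%", "10%", ...
  c(q, Mean = mean(x, na.rm = TRUE))             # append the mean as a last entry
}

# Keep only the numeric columns, then summarise each one;
# sapply() binds the equally long named vectors into a matrix
num_cols <- Filter(is.numeric, df)
tab <- sapply(num_cols, pct_summary)

# One column per numeric variable, one row per statistic
round(tab, 2)
```

To change the percentile levels, pass a different probs vector, e.g. sapply(num_cols, pct_summary, probs = c(0.05, 0.5, 0.95)); the row labels follow automatically.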