
dplyr summarise_all with quantile and other functions


I have a data frame PatientsA:

    Height Weight   Age   BMI
    <dbl>  <dbl> <dbl> <dbl>
 1   161    72.2    27  27.9
 2   164    61.0    21  22.8
 3   171    72.0    30  24.6
 4   169.   63.9    25  22.9
 5   174.   64.4    27  21.1
 6   160    50.9    22  19.9
 7   172    77.5    22  26.3
 8   165    54.5    22  20  
 9   173    82.4    29  27.5
10   169    76.6    22  26.9

and I would like to get some statistics for each column. I have the following working code, which handles only the quantiles:

genStat <- PatientsA  %>%
  summarise_all(funs(list(quantile(., probs = c(0.25, 0.5, 0.75))))) %>%
  unnest %>%
  transpose %>%
  setNames(., c('25%', '50%', '75%')) %>%
  map_df(unlist) %>%
  bind_cols(data.frame(vars = names(PatientsA)), .)

I need to add mean and sd to summarise_all, like this:

genStat <- PatientsA  %>%
  summarise_all(funs(mean,sd,list(quantile(., probs = c(0.25, 0.5, 0.75))))) %>%
  unnest %>%
  transpose %>%
  setNames(., c('mean','sd','25%', '50%', '75%')) %>%
  map_df(unlist) %>%
  bind_cols(data.frame(vars = names(PatientsA)), .)

This straightforward approach fails with the following error:

Error in names(object) <- nm : 'names' attribute [5] must be the same length as the vector [3]

I'm a newbie in R, so what is the right syntax for completing this task?


Solution

  • This is what I would suggest. There is a little repetition in the code (calling quantile three times) but overall I think it is easier to understand and debug.

    library(tidyverse)    
    
    PatientsA %>% 
      gather("variable", "value") %>% 
      group_by(variable) %>% 
      summarize(mean_val = mean(value), 
                sd_val = sd(value), 
                q25 = quantile(value, probs = .25),
                q50 = quantile(value, probs = .5),
                q75 = quantile(value, probs = .75))
    
    
    ## A tibble: 4 x 6
    #  variable mean_val sd_val   q25   q50   q75
    #  <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl>
    #1 Age          24.7   3.33  22    23.5  27  
    #2 BMI          24.0   3.08  21.5  23.8  26.7
    #3 Height      168.    5.01 164.  169   172. 
    #4 Weight       67.5  10.3   61.7  68.2  75.5
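
If you would rather keep the "apply a set of functions to every column" shape of the original summarise_all attempt, the following is a minimal sketch of one possible alternative. It assumes a reasonably recent dplyr and tidyr (with across() and pivot_longer()). The names stat_funs and the "__" separator are my own choices for illustration, and the data is retyped from the printed tibble in the question, so the values are rounded.

```r
library(dplyr)
library(tidyr)

# Data retyped from the question's printout (rounded as displayed).
PatientsA <- tibble(
  Height = c(161, 164, 171, 169, 174, 160, 172, 165, 173, 169),
  Weight = c(72.2, 61.0, 72.0, 63.9, 64.4, 50.9, 77.5, 54.5, 82.4, 76.6),
  Age    = c(27, 21, 30, 25, 27, 22, 22, 22, 29, 22),
  BMI    = c(27.9, 22.8, 24.6, 22.9, 21.1, 19.9, 26.3, 20, 27.5, 26.9)
)

# Hypothetical helper list: every function returns a single number,
# so each column/function pair becomes exactly one summary cell.
stat_funs <- list(
  mean = ~ mean(.x),
  sd   = ~ sd(.x),
  q25  = ~ quantile(.x, probs = 0.25, names = FALSE),
  q50  = ~ quantile(.x, probs = 0.50, names = FALSE),
  q75  = ~ quantile(.x, probs = 0.75, names = FALSE)
)

genStat <- PatientsA %>%
  # One wide row with columns such as Height__mean, Height__sd, ...
  summarise(across(everything(), stat_funs, .names = "{.col}__{.fn}")) %>%
  # Split the names at "__": the first part becomes the row label,
  # the second part (".value") becomes the statistic columns.
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "__")

print(genStat)
```

The result should have one row per original column and one column per statistic, like the table above. The "__" separator is used instead of a single underscore only so that the split stays unambiguous if a column name itself contains an underscore.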