
dplyr – get certain summary statistics for multiple columns of a dataframe


I want to create a table of summary statistics by applying several summary functions to multiple variables. I've managed to do it using summarise() and across(), but I get a wide data frame that is hard to read. Is there a better alternative (perhaps using purrr), or an easy way of reshaping the data?

Here is a reproducible example (in my real code, the funs list also contains functions I've written myself):

data <- as.data.frame(cbind(estimator1 = rnorm(3), 
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

If I use summarise and across I obtain:

estimator1_mean estimator1_median estimator2_mean estimator2_median
      0.9506083          1.138536       0.5789924         0.7598719
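
The call that produced this output is not shown. A minimal sketch of what it presumably looks like, assuming the data and funs objects defined above (the name wide is only illustrative, and the random values will differ from run to run):

```r
library(dplyr)

data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# across() applies every function in funs to every column and names each
# result <column>_<function>, which yields a single wide row
wide <- data |>
  summarise(across(everything(), funs))
wide
```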

What I would like to obtain is:

         estimator1 estimator2
mean     0.9506083  0.5789924        
median   1.138536   0.7598719

Solution

  • You can use pivot_longer() with the special ".value" entry in names_to. ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely (see the pivot_longer() documentation). For example:

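      library(dplyr)
      data |>
        # across() names each result <column>_<function>, e.g. estimator1_mean
        summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
        # split names at "_" (assumes the original column names contain no "_"):
        # the first part becomes an output column (.value), the second goes to "stats"
        tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")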
    
      stats  estimator1 estimator2
      <chr>       <dbl>      <dbl>
    1 mean        0.221    0.448  
    2 median      0.110    0.429  
    3 var         0.770    0.00288
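
If you would rather skip the reshape step, a nested sapply() in base R gives the requested layout directly, with one row per function and one column per variable. This is only a sketch under a few assumptions: it returns a numeric matrix rather than a data frame, set.seed() is added solely for reproducibility, and the name summary_mat is made up for illustration.

```r
set.seed(42)  # assumption: added only so the sketch is reproducible
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# The outer sapply() loops over the columns, the inner one over the functions.
# The result is a matrix whose row names are the function names and whose
# column names are the column names of the data frame.
summary_mat <- sapply(data, function(col) sapply(funs, function(f) f(col)))
summary_mat
```

as.data.frame(summary_mat) turns the result into a data frame and keeps the row names. If you prefer purrr, purrr::map_dbl() could stand in for the inner sapply().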