I want to create a summary statistics table for some summary functions for multiple variables. I've managed to do it using summarise and across, but I get a wide dataframe which is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?
Here is a reproducible example (the funs list contains additional functions I've created myself):
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)
If I use summarise and across, I obtain:
  estimator1_mean estimator1_median estimator2_mean estimator2_median
        0.9506083          1.138536       0.5789924         0.7598719
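The call that produces this wide output is not shown above. A minimal sketch of what it presumably looks like, assuming funs is passed straight to across() and the default column naming is kept (set.seed() is an addition so the sketch is repeatable; the values will differ from those shown):

```r
library(dplyr)

set.seed(1)  # assumption: added only for repeatability, not part of the question
data <- as.data.frame(cbind(estimator1 = rnorm(3),
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

# One row; one column per variable/function pair,
# named "{.col}_{.fn}" by across()'s default naming rule
wide <- data |>
  summarise(across(everything(), funs))

names(wide)
```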
What I would like to obtain is:
       estimator1 estimator2
mean    0.9506083  0.5789924
median   1.138536  0.7598719
You can use pivot_longer() with .value in names_to (".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely; see the pivot_longer() documentation), e.g.
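library(dplyr)

data |>
  # one column per variable/function pair, e.g. estimator1_mean
  summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
  # split each name at "_": the first part becomes the output column, the second the stats label
  tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")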
  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448
2 median      0.110    0.429
3 var         0.770    0.00288
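One caveat: names_sep = "_" splits at every underscore, so it stops working as intended once a variable name itself contains one. A minimal sketch of one possible workaround, assuming hypothetical column names est_1 and est_2 and function names without underscores: a greedy names_pattern splits at the last underscore only. The sketch also moves the stats column into row names to match the layout asked for in the question, and shows a base R alternative that skips the reshaping step altogether.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: added only for repeatability
# Hypothetical variable names that contain "_" themselves
data <- data.frame(est_1 = rnorm(3), est_2 = runif(3))
funs <- list(mean = mean, median = median)

res <- data |>
  summarise(across(everything(), funs)) |>
  # Greedy first group: "est_1_mean" -> ".value" = "est_1", stats = "mean"
  pivot_longer(everything(),
               names_to = c(".value", "stats"),
               names_pattern = "(.*)_(.*)")

# Optional: turn the stats column into row names, as in the desired output
res <- as.data.frame(res)
rownames(res) <- res$stats
res$stats <- NULL
res

# Base R alternative: apply every function to every column;
# the result is a matrix with functions as rows and variables as columns
out <- sapply(data, function(x) sapply(funs, function(f) f(x)))
out
```

The row-name step is optional; keeping stats as a regular column is usually easier to work with in later dplyr steps.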