Creating list of lists with summary statistics for input to summary_table() in R


I am following the instructions laid out here to create a clean table of summary statistics.

In these instructions, the input to the summary_table() function is a list of lists, as shown here:

our_summary1 <-
  list("Miles Per Gallon" =
         list("min"       = ~ min(.data$mpg),
              "max"       = ~ max(.data$mpg),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$mpg)),
       "Displacement" =
         list("min"       = ~ min(.data$disp),
              "median"    = ~ median(.data$disp),
              "max"       = ~ max(.data$disp),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$disp)),
       "Weight (1000 lbs)" =
         list("min"       = ~ min(.data$wt),
              "max"       = ~ max(.data$wt),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$wt)),
       "Forward Gears" =
         list("Three" = ~ qwraps2::n_perc0(.data$gear == 3),
              "Four"  = ~ qwraps2::n_perc0(.data$gear == 4),
              "Five"  = ~ qwraps2::n_perc0(.data$gear == 5))
  )
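The "Forward Gears" entry differs from the others because its inner names come from the values of gear rather than from a fixed set of statistics. Below is a minimal sketch of generating such an entry from a column's distinct values. The helper make_level_summaries() is hypothetical, and the inner names come out as the raw values ("3", "4", "5"), so they would still need renaming to match "Three", "Four", "Five":

```r
# Hypothetical helper: one qwraps2::n_perc0() formula per distinct value of `col`.
# The formulas are only built here, not evaluated, so qwraps2 is not needed yet.
make_level_summaries <- function(data, col) {
  vals <- sort(unique(data[[col]]))
  out <- lapply(vals, function(v) {
    # deparse() writes the value as R source, e.g. 4 -> "4", "a" -> "\"a\""
    as.formula(sprintf("~ qwraps2::n_perc0(.data$%s == %s)", col, deparse(v)))
  })
  names(out) <- as.character(vals)
  out
}

gear_summary <- make_level_summaries(mtcars, "gear")
```

After something like names(gear_summary) <- c("Three", "Four", "Five"), this list could stand in for the hand-written "Forward Gears" entry.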

I have 48 variables in my dataset, and each variable has its own column. Is there a cleaner way to loop over all the columns in my data frame and build an object like the one above, rather than typing it out by hand? I would ideally prefer a solution using the tidyverse.

One option I considered was converting my data to long format, grouping with group_by() on the column that holds the original variable names, and then using summarise(). However, my understanding is that this would yield a single list, not the list of lists that summary_table() needs.

If there is a completely different way of creating a summary table than what I am trying to do here, please let me know. This one looked the neatest of the options I was considering. For each variable, I'd like to be able to rename it and include the minimum value, maximum value, mean, and standard deviation.


Solution

  • As you noted, you could pivot your data into a longer format and use summarize(). The trick is to create a list column inside summarize():

    library(dplyr)
    library(tidyr)
    
    summarized <- mtcars %>%
      pivot_longer(cols = c(mpg, wt, disp)) %>%
      group_by(name) %>%
      summarize(lst = list(list(mean = mean(value),
                                max = max(value),
                                min = min(value),
                                sd = sd(value))))
    
    summarized
    #> # A tibble: 3 x 2
    #>   name  lst             
    #> * <chr> <list>          
    #> 1 disp  <named list [4]>
    #> 2 mpg   <named list [4]>
    #> 3 wt    <named list [4]>
    

    This can then be turned into a named list of lists with deframe() from the tibble package, which uses the first column (name) as the names and the second column (lst) as the values.

    library(tibble)
    result <- deframe(summarized)
    
    str(result)
    #> List of 3
    #>  $ disp:List of 4
    #>   ..$ mean: num 231
    #>   ..$ max : num 472
    #>   ..$ min : num 71.1
    #>   ..$ sd  : num 124
    #>  $ mpg :List of 4
    #>   ..$ mean: num 20.1
    #>   ..$ max : num 33.9
    #>   ..$ min : num 10.4
    #>   ..$ sd  : num 6.03
    #>  $ wt  :List of 4
    #>   ..$ mean: num 3.22
    #>   ..$ max : num 5.42
    #>   ..$ min : num 1.51
    #>   ..$ sd  : num 0.978
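Note that result holds computed numbers, whereas the question passes summary_table() one-sided formulas such as ~ min(.data$mpg), so result is probably not a drop-in replacement for our_summary1. Below is a minimal sketch that instead generates the formula list for every column, with the display names taken from a lookup vector. The labels vector and the make_summaries() helper are hypothetical, and the statistics simply mirror the min / max / mean (sd) entries from the question:

```r
# Hypothetical lookup: column name -> display label.
# With 48 columns, this is the only part that has to be written out by hand.
labels <- c(mpg  = "Miles Per Gallon",
            disp = "Displacement",
            wt   = "Weight (1000 lbs)")

# Build the inner list of one-sided formulas for a single column
make_summaries <- function(col) {
  list(
    "min"       = as.formula(sprintf("~ min(.data$%s)", col)),
    "max"       = as.formula(sprintf("~ max(.data$%s)", col)),
    "mean (sd)" = as.formula(sprintf("~ qwraps2::mean_sd(.data$%s)", col))
  )
}

# One inner list per column, named by its display label
our_summary <- setNames(lapply(names(labels), make_summaries), unname(labels))

str(our_summary, max.level = 1)
```

The result has the same shape as our_summary1, so it could be passed as qwraps2::summary_table(mtcars, our_summary). If you prefer the tidyverse, purrr::map() can replace lapply() here.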