dplyr: summarise each column and return list columns


I want to summarize each column in a tibble with a custom summary function that returns differently sized tibbles depending on the data.

Let’s say my summary function is this:

mysummary <- function(x) {quantile(x)[1:sample(1:5, 1)] %>% as_tibble}

It can be applied to one column as such:

cars %>% summarise(speed.summary = list(mysummary(speed)))

But I can't figure out how to do this for every column at once using summarise_all (or something similar).

Using the cars data, the desired output would be:

tribble(
~speed.summary,        ~dist.summary, 
mysummary(cars$speed), mysummary(cars$dist)
)

# A tibble: 1 x 2
  speed.summary    dist.summary    
  <list>           <list>          
1 <tibble [5 x 1]> <tibble [2 x 1]>    

Of course the actual data has many more columns...

Suggestions?


Solution

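  • We can use summarise_all(), wrapping each call in list() so that every column's result is stored as a single list element. Because mysummary() picks a random length, the tibble sizes vary between runs.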

    res <- cars %>%
            summarise_all(funs(summary = list(mysummary(.)))) %>% 
            as_tibble()
    res
    # A tibble: 1 x 2
    #   speed_summary    dist_summary    
    #  <list>           <list>          
    #1 <tibble [3 x 1]> <tibble [2 x 1]>
    
    res$speed_summary
    #[[1]]
    # A tibble: 3 x 1
    #   value
    #* <dbl>
    #1  4.00
    #2 12.0 
    #3 15.0
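
    Note that funs() has been deprecated since dplyr 0.8.0, so the call above may emit a warning on recent versions. A minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later, could look like the following. The `.names` pattern is chosen here only to reproduce the `speed_summary` / `dist_summary` column names; no output is shown because the tibble sizes are random.

```r
library(dplyr)
library(tibble)

# Same summary function as in the question: keeps a random-length
# prefix (1 to 5 values) of the quantiles and returns it as a tibble.
mysummary <- function(x) {
  quantile(x)[1:sample(1:5, 1)] %>% as_tibble()
}

res <- cars %>%
  summarise(
    across(
      everything(),
      ~ list(mysummary(.x)),        # list() keeps each tibble as one cell
      .names = "{.col}_summary"     # assumed naming, mirrors the funs() output
    )
  ) %>%
  as_tibble()

res
res$speed_summary[[1]]
```

    Each cell of `res` is a list element holding one tibble, so `res$speed_summary[[1]]` extracts the summary for `speed`.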