
Getting a summary of a dataframe that will have a column of dataframes containing the data of each group in R


I would normally use the following code to get what I want, but I'm wondering what a good way to do this is now that cur_data() is deprecated.

library(dplyr)  # also provides the %>% pipe

df <- data.frame(col_a = c("a", "a", "b", "c", "c"),
                 col_b = c(1, 2, 3, 4, 5),
                 col_c = c("q", "w", "e", "r", "t"))
dfdata <- df %>%
  dplyr::group_by(col_a) %>%
  dplyr::summarise(data = list(dplyr::cur_data()))
dfdata
# A tibble: 3 × 2
  col_a data
  <chr> <list>
1 a     <tibble [2 × 2]>
2 b     <tibble [1 × 2]>
3 c     <tibble [2 × 2]>
dfdata$data[[1]]
# A tibble: 2 × 2
  col_b col_c
  <dbl> <chr>
1     1 q
2     2 w

Thank you!

I have looked through the documentation for pick() and reframe() but wasn't able to work out how to use them for this.


Solution

  • I would use tidyr::nest() (its .by argument requires tidyr 1.3.0 or later):

    library(tidyr)
    
    dfdata <- df %>% 
      nest(.by = col_a, .key = "data")
    

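    But if you want to stay close to your original code, replace cur_data() with pick(everything()); pick() was added in dplyr 1.1.0 as the replacement for cur_data():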

    library(dplyr)
    
    dfdata <- df %>% 
      group_by(col_a) %>% 
      summarise(data = list(pick(everything())))
    

    Result from either approach:

    dfdata
    # A tibble: 3 × 2
      col_a data
      <chr> <list>
    1 a     <tibble [2 × 2]>
    2 b     <tibble [1 × 2]>
    3 c     <tibble [2 × 2]>
    
    dfdata$data[[1]]
    # A tibble: 2 × 2
      col_b col_c
      <dbl> <chr>
    1     1 q
    2     2 w
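To confirm that nest() keeps every row, the list-column can be expanded again with tidyr::unnest(). This is a minimal sketch assuming tidyr 1.3.0 or later; the name roundtrip and the final checks are illustrative only.

```r
library(tidyr)  # tidyr re-exports the %>% pipe

df <- data.frame(col_a = c("a", "a", "b", "c", "c"),
                 col_b = c(1, 2, 3, 4, 5),
                 col_c = c("q", "w", "e", "r", "t"))

# Nest: one row per col_a value, remaining columns stored as a tibble in "data"
dfdata <- df %>%
  nest(.by = col_a, .key = "data")

# Unnest: expand the list-column back into one row per original observation
roundtrip <- dfdata %>%
  unnest(cols = data)

# Compare with the original column by column
# (roundtrip may be a tibble while df is a plain data.frame)
identical(names(roundtrip), names(df))
identical(roundtrip$col_b, df$col_b)
```

nest() keeps the groups in order of first appearance, so the unnested rows come back in the original order here.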
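Since dplyr 1.1.0, summarise() also accepts a per-operation .by argument, so the separate group_by() step can be folded into the call. This is a sketch assuming dplyr 1.1.0 or later; the name dfdata_by is illustrative. With .by, groups are kept in order of first appearance rather than sorted, and the printed result may not look exactly like the tibble output shown above.

```r
library(dplyr)  # also provides the %>% pipe

df <- data.frame(col_a = c("a", "a", "b", "c", "c"),
                 col_b = c(1, 2, 3, 4, 5),
                 col_c = c("q", "w", "e", "r", "t"))

# .by groups only for this call, so no group_by()/ungroup() is needed.
# pick(everything()) returns the current group's non-grouping columns as a tibble;
# wrapping it in list() gives exactly one row per group.
dfdata_by <- df %>%
  summarise(data = list(pick(everything())), .by = col_a)

dfdata_by$data[[1]]
```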