Tags: r, dplyr, summarize

How to combine two different dplyr summaries in a single command


I am trying to create a grouped summary that reports the number of records in each group and then also shows the means of a series of variables.

I can only work out how to do this as two separate summaries, which I then join together. This works fine, but is there a more elegant way to do it?

dailyn <- daily %>%  # this summarises n
  group_by(type) %>%
  summarise(n = n())

dailymeans <- daily %>%  # this summarises the means
  group_by(type) %>%
  summarise_at(vars(starts_with("d.")), ~ mean(., na.rm = TRUE))

dailysummary <- inner_join(dailyn, dailymeans, by = "type")  # this joins the two parts together

The data I'm working with is a data frame like this:

daily<-data.frame(type=c("A","A","B","C","C","C"),
                  d.happy=c(1,5,3,7,2,4),
                  d.sad=c(5,3,6,3,1,2))

Solution

  • You can do this in one pipeline: group by type, use mutate() instead of summarise() so that every row gets its group's count and means, and then use slice() to keep the first row of each type:

    daily %>% group_by(type) %>%
      mutate(n = n()) %>%
      mutate_at(vars(starts_with("d.")), ~ mean(., na.rm = TRUE)) %>%
      slice(1L)
    

    Edit: It might be clearer how this works with this modified example, which keeps the original columns and adds each group mean as a new column with a _mean suffix:

    daily_summary <- daily %>% group_by(type) %>%
      mutate(n = n()) %>%
      mutate_at(vars(starts_with("d.")), list(mean = ~ mean(., na.rm = TRUE)))
    
    daily_summary
    # Source: local data frame [6 x 6]
    # Groups: type [3]
    # 
    # # A tibble: 6 x 6
    #    type d.happy d.sad     n d.happy_mean d.sad_mean
    #  <fctr>   <dbl> <dbl> <int>        <dbl>      <dbl>
    #1      A       1     5     2     3.000000          4
    #2      A       5     3     2     3.000000          4
    #3      B       3     6     1     3.000000          6
    #4      C       7     3     3     4.333333          2
    #5      C       2     1     3     4.333333          2
    #6      C       4     2     3     4.333333          2
    
    daily_summary %>% 
      slice(1L)
    
    # Source: local data frame [3 x 6]
    # Groups: type [3]
    # 
    # # A tibble: 3 x 6
    #    type d.happy d.sad     n d.happy_mean d.sad_mean
    #  <fctr>   <dbl> <dbl> <int>        <dbl>      <dbl>
    #1      A       1     5     2     3.000000          4
    #2      B       3     6     1     3.000000          6
    #3      C       7     3     3     4.333333          2
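Another option, assuming dplyr 1.0.0 or later (across() was introduced in that release; your installed version is an assumption here): across() can sit inside the same summarise() call as n(), so the count and the means come out of a single grouped summary with no join and no slice(). A minimal sketch using the sample data from the question:

```r
library(dplyr)

# Sample data from the question
daily <- data.frame(type = c("A", "A", "B", "C", "C", "C"),
                    d.happy = c(1, 5, 3, 7, 2, 4),
                    d.sad = c(5, 3, 6, 3, 1, 2))

# One summarise() call: the group size plus the mean of every column
# whose name starts with "d." (across() assumes dplyr >= 1.0.0)
dailysummary <- daily %>%
  group_by(type) %>%
  summarise(n = n(),
            across(starts_with("d."), ~ mean(.x, na.rm = TRUE)))

dailysummary
```

The result has one row per type with the columns type, n, d.happy and d.sad, where the last two hold the group means, so there is nothing to slice() away afterwards.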