Tags: r, dataframe, dplyr, summarize

Find mean of counts within groups


I have a dataframe that looks like this:

library(tidyverse)
x <- tibble(
  batch = rep(c(1, 2), each = 10),
  exp_id = c(rep('a', 3), rep('b', 2), rep('c', 5), rep('d', 6), rep('e', 4))
)

I can run the code below to get the count per exp_id within each batch:

x %>% group_by(batch,exp_id) %>% 
  summarise(count=n())  

which generates:

  batch exp_id count
  <dbl> <chr>  <int>
1     1 a          3
2     1 b          2
3     1 c          5
4     2 d          6
5     2 e          4

A really ugly way to get the mean of these counts for each batch is:

x %>% group_by(batch,exp_id) %>% 
  summarise(count=n()) %>% 
  ungroup() %>% 
  group_by(batch) %>% 
  summarise(avg_exp = mean(count))

which generates:

  batch avg_exp
  <dbl>   <dbl>
1     1    3.33
2     2    5 

Is there a more succinct and "tidy" way to generate this?


Solution

library(dplyr)
group_by(x, batch) %>%
  summarize(avg_exp = mean(table(exp_id)))
# # A tibble: 2 x 2
#   batch avg_exp
#   <dbl>   <dbl>
# 1     1    3.33
# 2     2    5
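
This works because table(exp_id), evaluated inside each batch group, tallies the rows per exp_id, and mean() then averages those tallies. As a hedged sketch of two alternatives that stay within dplyr verbs (not part of the original answer; the names res_ratio and res_count are only illustrative): the mean count per exp_id equals the number of rows in the batch divided by the number of distinct exp_id values, and count() can replace the group_by() + summarise(count = n()) pair from the question.

```r
library(dplyr)

# Same example data as in the question
x <- tibble(
  batch  = rep(c(1, 2), each = 10),
  exp_id = c(rep("a", 3), rep("b", 2), rep("c", 5), rep("d", 6), rep("e", 4))
)

# Alternative 1: rows in the batch / distinct exp_id values in the batch
res_ratio <- x %>%
  group_by(batch) %>%
  summarise(avg_exp = n() / n_distinct(exp_id))

# Alternative 2: count() produces a column named n, which is then averaged per batch
res_count <- x %>%
  count(batch, exp_id) %>%
  group_by(batch) %>%
  summarise(avg_exp = mean(n))
```

One caveat to keep in mind: if exp_id were a factor rather than a character vector, table() would also report a zero count for every level that does not occur in a given batch, which would pull the mean down. The two variants above only consider values that are actually present in each batch.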