Getting mean of rows above criteria by group with dplyr


I am trying to get the mean of b, by group, using only the rows where b is above a certain threshold (here 4). Unfortunately, my attempts do not give the right result.

Data:

df <- data.frame(
  id=c(1:10),
  group=c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b"),
  b=rnorm(10,5,1)
)
> df
   id group        b
1   1     a 4.154182
2   2     a 5.958000
3   3     a 3.346686
4   4     a 5.689609
5   5     a 5.003576
6   6     b 5.127969
7   7     b 4.841127
8   8     b 3.268419
9   9     b 3.601477
10 10     b 5.796909

Attempts:

df %>%
  dplyr::group_by(group) %>%
  summarise(
    mean=mean(b>4)
  )

df %>%
  dplyr::group_by(group) %>%
  summarise(
    mean=mean(which(b>4))
  )

Solution

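  • You need to subset b with b[b > 4] before running mean. mean(b > 4) averages a logical vector, so it returns the share of values above 4, and mean(which(b > 4)) averages the row positions within each group rather than the values. (The .by argument requires dplyr 1.1.0 or later; with older versions use group_by(group) instead.)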

    df %>%
      summarise(mean = mean(b[b > 4]), .by = group)
    

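    Alternatively, filter the rows before summarising. A group with no values above 4 is dropped from this result, whereas the b[b > 4] version returns NaN for it: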

    df %>%
      filter(b > 4) %>%
      summarise(mean = mean(b), .by = group)
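
To check the above on fixed numbers, here is a small sketch that reuses the values printed in the question instead of rnorm(), so the result does not change between runs. It assumes dplyr 1.1.0 or later (for .by); the names threshold, checks, by_subset and by_filter are made up for this illustration and are not part of the original post.

```r
library(dplyr)

# Values copied from the printed data frame in the question
df <- data.frame(
  id = 1:10,
  group = rep(c("a", "b"), each = 5),
  b = c(4.154182, 5.958000, 3.346686, 5.689609, 5.003576,
        5.127969, 4.841127, 3.268419, 3.601477, 5.796909)
)

threshold <- 4  # hypothetical name for the cut-off used in the question

# Side-by-side comparison of the two attempts and the subsetting fix
checks <- df %>%
  summarise(
    share_above   = mean(b > threshold),         # attempt 1: proportion of rows above the threshold
    mean_position = mean(which(b > threshold)),  # attempt 2: mean of within-group row positions
    mean_above    = mean(b[b > threshold]),      # fix: mean of the values above the threshold
    .by = group
  )
print(checks)

# Edge case: with a cut-off of 5.9, group "b" has no qualifying rows
by_subset <- df %>%
  summarise(mean = mean(b[b > 5.9]), .by = group)   # keeps group "b" with NaN

by_filter <- df %>%
  filter(b > 5.9) %>%
  summarise(mean = mean(b), .by = group)            # group "b" is absent

print(by_subset)
print(by_filter)
```

Which of the two forms to prefer depends on whether groups without any qualifying rows should still appear in the output.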