How can I group variables by several criteria in a data frame to count the number of values greater than each grouped mean?

I am trying to count data on a big data frame. First I estimated means by group, and I 've calculated means for each group of each variable. I you want to count the occurrences of values in each group that are greater than their respective group means.

I have a data frame simmilar to this:

stage   sex cort    granu   
a   m   0.53    50
a   m   0.65    93
a   f   3.18    49
a   m   1.51    45
a   m   0.28    57
a   f   0.50    36
a   f   1.20    21
a   f   2.36    58
b   m   0.27    74
b   m   0.16    36
b   f   0.42    27
b   m   0.88    65
b   f   2.19    85
b   m   0.24    74
b   f   0.89    52
b   m   0.34    98
b   f   2.59    52
b   f   0.74    63
c   m   6.36    54
c   f   2.07    24
c   m   1.46    12
c   m   2.07    36
c   f   1.46    75
c   m   2.19    96
c   f   6.36    21
c   f   2.19    85

I would like to count the number of values that are greater than the mean of the group, to the moment I did something like this




mean_c<- aggregate(x= data$cort,
                       by = list(data$sex,data$stage),      
                       FUN = mean)

cah<-dplyr::filter(data,stage=="a" & sex=="h")
cah<-dplyr::select(cah, c(cort))
countah<-corta>1.81 #mean for the group (a,h)

cam<-dplyr::filter(data,stage=="a" & sex=="m")
cam<-dplyr::select(cam, c(cort))
countam<-corta>0.74 #mean for the group (a,m)

sum(countah)

But, I need to do the same with several variables in all the stages (a,b and c) and with both sexes (f,m). I can repeat the last four lines with each mean, but I think there may be an easier and less redundant way to do this but I can't think of any.

Solution

The functions group_by() and summarise() are what you need:

library(dplyr)

df <- read.table(text = "stage   sex cort    granu   
a   m   0.53    50
a   m   0.65    93
a   f   3.18    49
a   m   1.51    45
a   m   0.28    57
a   f   0.50    36
a   f   1.20    21
a   f   2.36    58
b   m   0.27    74
b   m   0.16    36
b   f   0.42    27
b   m   0.88    65
b   f   2.19    85
b   m   0.24    74
b   f   0.89    52
b   m   0.34    98
b   f   2.59    52
b   f   0.74    63
c   m   6.36    54
c   f   2.07    24
c   m   1.46    12
c   m   2.07    36
c   f   1.46    75
c   m   2.19    96
c   f   6.36    21
c   f   2.19    85", header = TRUE, check.names = FALSE)

df1 <- df %>%
  group_by(stage, sex) %>%
  summarise(sum_gt_mean = sum(cort > mean(cort)), .groups = "drop")

df1
# A tibble: 6 × 3
stage    sex    sum_gt_mean
<chr>   <chr>     <int>
1 a       f         2
2 a       m         1
3 b       f         2
4 b       m         1
5 c       f         1
6 c       m         1