I am trying to count data on a big data frame. First I estimated means by group, and I 've calculated means for each group of each variable. I you want to count the occurrences of values in each group that are greater than their respective group means.
I have a data frame simmilar to this:
stage sex cort granu
a m 0.53 50
a m 0.65 93
a f 3.18 49
a m 1.51 45
a m 0.28 57
a f 0.50 36
a f 1.20 21
a f 2.36 58
b m 0.27 74
b m 0.16 36
b f 0.42 27
b m 0.88 65
b f 2.19 85
b m 0.24 74
b f 0.89 52
b m 0.34 98
b f 2.59 52
b f 0.74 63
c m 6.36 54
c f 2.07 24
c m 1.46 12
c m 2.07 36
c f 1.46 75
c m 2.19 96
c f 6.36 21
c f 2.19 85
I would like to count the number of values that are greater than the mean of the group, to the moment I did something like this
mean_c<- aggregate(x= data$cort,
by = list(data$sex,data$stage),
FUN = mean)
cah<-dplyr::filter(data,stage=="a" & sex=="h")
cah<-dplyr::select(cah, c(cort))
countah<-corta>1.81 #mean for the group (a,h)
cam<-dplyr::filter(data,stage=="a" & sex=="m")
cam<-dplyr::select(cam, c(cort))
countam<-corta>0.74 #mean for the group (a,m)
sum(countah)
But, I need to do the same with several variables in all the stages (a,b and c) and with both sexes (f,m). I can repeat the last four lines with each mean, but I think there may be an easier and less redundant way to do this but I can't think of any.
The functions group_by()
and summarise()
are what you need:
library(dplyr)
df <- read.table(text = "stage sex cort granu
a m 0.53 50
a m 0.65 93
a f 3.18 49
a m 1.51 45
a m 0.28 57
a f 0.50 36
a f 1.20 21
a f 2.36 58
b m 0.27 74
b m 0.16 36
b f 0.42 27
b m 0.88 65
b f 2.19 85
b m 0.24 74
b f 0.89 52
b m 0.34 98
b f 2.59 52
b f 0.74 63
c m 6.36 54
c f 2.07 24
c m 1.46 12
c m 2.07 36
c f 1.46 75
c m 2.19 96
c f 6.36 21
c f 2.19 85", header = TRUE, check.names = FALSE)
df1 <- df %>%
group_by(stage, sex) %>%
summarise(sum_gt_mean = sum(cort > mean(cort)), .groups = "drop")
df1
# A tibble: 6 × 3
stage sex sum_gt_mean
<chr> <chr> <int>
1 a f 2
2 a m 1
3 b f 2
4 b m 1
5 c f 1
6 c m 1