I would like to summarise (or aggregate) a data frame so that factors are summarised as counts and numbers are summarised by means.
For df1, I would like the output, summarised by cat1, to look like df2. Thanks.
id1 <- 1:10
cat1 <- c("A","A","A","B","C","C","C","C","C","C")
fact1 <- c("M","M","F","M","F","F","M","M","M","M")
set.seed(11)
num1 <- runif(10)
df1 <- data.frame(id1, cat1, fact1, num1)
df1$cat1 <- as.factor(df1$cat1)
cat2 <- c("A","B","C")
fact2.F <- c(1, 0, 2)
fact2.M <- c(2, 1, 4)
num2.mean <- c(0.2627922, 0.01404791, 0.3999875)
df2 <- data.frame(cat2, fact2.F, fact2.M, num2.mean)
To reiterate, the summary/aggregation should give a count for each level of a factor and a mean for numeric data. So for cat1 == "A" there are two "M"s and one "F".
A base R approach, combining two aggregate calls, one for the mean of num1 and one for the sum of fact1:
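# Count each level of fact1 per cat1, then bind on the per-group mean of num1
cbind(aggregate(fact1 ~ cat1, df1, function(x)
        sapply(unique(df1$fact1), function(y) sum(x %in% y))),
      aggregate(num1 ~ cat1, df1, mean)[-1])  # [-1] drops the duplicate cat1 column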
  cat1 fact1.M fact1.F       num1
1    A       2       1 0.26279216
2    B       1       0 0.01404791
3    C       4       2 0.39998755
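If there are several factor and numeric columns to summarise, the same idea can be wrapped in a small helper. The following is only a sketch in base R: summarise_by() and its arguments (df, by, cols) are hypothetical names made up for this example, not an existing function. It assumes every non-numeric column should be counted per level (via table) and every numeric column averaged (via tapply).

```r
# Rebuild the example data from the question.
cat1  <- c("A","A","A","B","C","C","C","C","C","C")
fact1 <- c("M","M","F","M","F","F","M","M","M","M")
set.seed(11)
num1  <- runif(10)
df1   <- data.frame(id1 = 1:10, cat1 = factor(cat1), fact1 = factor(fact1), num1)

# summarise_by() is a hypothetical helper, not part of base R.
# df: data frame, by: name of the grouping column, cols: columns to summarise.
summarise_by <- function(df, by, cols) {
  groups <- factor(df[[by]])  # factor() also drops unused levels
  pieces <- lapply(cols, function(col) {
    x <- df[[col]]
    if (is.numeric(x)) {
      # Numeric column: one mean per group.
      out <- data.frame(as.vector(tapply(x, groups, mean)))
      names(out) <- paste0(col, ".mean")
    } else {
      # Factor/character column: one count column per level.
      out <- as.data.frame.matrix(table(groups, x))
      names(out) <- paste(col, names(out), sep = ".")
    }
    out
  })
  res <- data.frame(levels(groups), do.call(cbind, pieces))
  names(res)[1] <- by
  rownames(res) <- NULL
  res
}

res <- summarise_by(df1, by = "cat1", cols = c("fact1", "num1"))
print(res)
```

The result has the columns cat1, fact1.F, fact1.M and num1.mean, which is the same layout as df2 in the question. Count columns come out in level order (F before M) because table sorts by factor level, whereas the aggregate version above orders them by first appearance in the data.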