
R Summarise (aggregate) a data frame with counts or statistics


I would like to summarise (or aggregate) a data frame so that factors are summarised as counts and numbers are summarised by means.

So for df1, I would like the output, summarised by cat1, to look like df2. Thanks.

id1 <- 1:10
cat1 <- c("A","A","A","B","C","C","C","C","C","C")
fact1 <- c("M","M","F","M","F","F","M","M","M","M")
set.seed(11)
num1 <- runif(10)

df1 <- data.frame(id1, cat1, fact1, num1)
df1$cat1 <- as.factor(df1$cat1)

cat2 <- c("A","B","C")
fact2.F <- c(1, 0, 2)
fact2.M <- c(2, 1, 4)
num2.mean <- c(0.2627922, 0.01404791, 0.3999875)

df2 <- data.frame(cat2, fact2.F, fact2.M, num2.mean)

To reiterate, the summary/aggregation should give counts for each level of a factor and a mean for numeric data. So for cat1 == "A" there are two "M"s and one "F".


Solution

  • A base R approach combining two aggregate() calls: one counting the occurrences of each fact1 value within each cat1 group, and one taking the mean of num1

    cbind(aggregate(fact1 ~ cat1, df1, function(x) 
            sapply(unique(df1$fact1), function(y) sum(x %in% y))), 
          aggregate(num1 ~ cat1, df1, mean)[-1])
      cat1 fact1.M fact1.F       num1
    1    A       2       1 0.26279216
    2    B       1       0 0.01404791
    3    C       4       2 0.39998755
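
An alternative that may be easier to read is sketched below. It is not part of the answer above, just another base R way to get the same shape: table() produces the per-group counts and tapply() the per-group means. The names counts, means and res are made up for illustration, and the column name num1.mean is an assumption chosen to mirror the num2.mean column of df2 in the question.

```r
# Rebuild the data from the question
id1   <- 1:10
cat1  <- c("A","A","A","B","C","C","C","C","C","C")
fact1 <- c("M","M","F","M","F","F","M","M","M","M")
set.seed(11)
num1  <- runif(10)
df1   <- data.frame(id1, cat1, fact1, num1)

# Cross-tabulate cat1 by fact1: one row per cat1 group, one column per fact1 value
counts <- as.data.frame.matrix(table(df1$cat1, df1$fact1))
names(counts) <- paste0("fact1.", names(counts))

# Mean of num1 within each cat1 group (a vector named by group)
means <- tapply(df1$num1, df1$cat1, mean)

# Combine, matching the means to the rows of counts by group name
res <- data.frame(cat1 = rownames(counts),
                  counts,
                  num1.mean = as.vector(means[rownames(counts)]),
                  row.names = NULL)
res
```

Here the count columns come out in sorted order (fact1.F, then fact1.M), and groups with no occurrences of a value get a 0 rather than being dropped. With several factor or numeric columns, the same two steps could be repeated per column and the pieces combined with cbind().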