I need to compute some descriptive statistics on a dataset. Specifically, I need to create a table that gives, for each level of a factor, the mean of another variable:
city  mean(age)
1     14
2     15
3     23
4     34
What is the fastest way to do this in R?
I also need to do the same thing in two dimensions:
      mean(age)
city  male  female
1     12    13
2     15    16
3     21    22
4     34    33
I also wonder whether it is possible to apply other functions, such as max, min, or sum.
Edit: I have added a dataset to make it easier to create examples:
dato <- data.frame(years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2), sex = rep(c("M", "F"), 12), city = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3))
You could try dcast (with the data.table package added for a faster dcast on big data sets):
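library(data.table)
library(reshape2)
# setDT() converts dato to a data.table by reference; one column per level of sex
# (with data.table >= 1.9.6, plain dcast() from data.table itself also works)
dcast.data.table(setDT(dato), city ~ sex, value.var = "years", fun.aggregate = mean)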
#    city        F        M
# 1:    1 41.33333 24.00000
# 2:    2 35.66667 21.66667
# 3:    3 35.66667 21.66667
# 4:    4 41.33333 24.00000
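Regarding the other functions mentioned in the question (max, min, sum): as far as I can tell, it should be enough to swap the aggregating function. A minimal sketch, assuming data.table >= 1.9.6 (which ships its own dcast method) and the example data from the question; the name wide_max is just for illustration:

```r
library(data.table)

# Example data from the question
dato <- data.frame(years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
                   sex   = rep(c("M", "F"), 12),
                   city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3))

# Same reshaping as the mean example, but aggregating with max
# (wide_max is a hypothetical name used only in this sketch)
wide_max <- dcast(as.data.table(dato), city ~ sex,
                  value.var = "years", fun.aggregate = max)
wide_max
```

Replacing max with min or sum should work the same way.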
You could also just use data.table in the regular way:
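# keyby groups by city and sex and sorts the result;
# the summary is stored in a new object so that dato is not overwritten
res <- setDT(dato)[, list(mean = mean(years)), keyby = list(city, sex)]
res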
#    city sex     mean
# 1:    1   F 41.33333
# 2:    1   M 24.00000
# 3:    2   F 35.66667
# 4:    2   M 21.66667
# 5:    3   F 35.66667
# 6:    3   M 21.66667
# 7:    4   F 41.33333
# 8:    4   M 24.00000
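For the one-dimensional table from the first part of the question, the same data.table idiom can probably be reduced to a single grouping variable, and several statistics can be computed in one call. A sketch under the assumption that the question's example data is used; stats_city is a made-up name:

```r
library(data.table)

# Example data from the question, built directly as a data.table
dato <- data.table(years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
                   sex   = rep(c("M", "F"), 12),
                   city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3))

# One row per city, one column per statistic
# (stats_city is a hypothetical name used only in this sketch)
stats_city <- dato[, .(mean = mean(years),
                       min  = min(years),
                       max  = max(years),
                       sum  = sum(years)),
                   keyby = city]
stats_city
```

Using keyby = .(city, sex) instead would give the same statistics for every city/sex combination.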
Or the dplyr package (also very fast):
library(dplyr)
dato %>%
  group_by(city, sex) %>%
  summarize(mean(years))
#   city sex mean(years)
# 1    1   F    41.33333
# 2    1   M    24.00000
# 3    2   F    35.66667
# 4    2   M    21.66667
# 5    3   F    35.66667
# 6    3   M    21.66667
# 7    4   F    41.33333
# 8    4   M    24.00000
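The dplyr version can likewise be extended to several statistics and to a single grouping factor. A rough sketch, again assuming the question's example data; by_city is a name I made up for the result:

```r
library(dplyr)

# Example data from the question
dato <- data.frame(years = rep(c(12, 13, 14, 15, 15, 16, 34, 67, 45, 78, 17, 42), 2),
                   sex   = rep(c("M", "F"), 12),
                   city  = rep(c(1, 2, 3, 4, 4, 3, 2, 1), 3))

# One row per city with named summary columns
# (by_city is a hypothetical name used only in this sketch)
by_city <- dato %>%
  group_by(city) %>%
  summarise(mean = mean(years),
            min  = min(years),
            max  = max(years),
            sum  = sum(years))
by_city
```

Adding sex to group_by() should give the long form of the two-dimensional table with the same columns.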