I am using ddply in R and I break the data in two different ways, but I want a subtotal of both. This is the function I am using
require(plyr)
dfx <- data.frame(
group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
sex = sample(c("M", "F"), size = 29, replace = TRUE),
age = runif(n = 29, min = 18, max = 54)
)
# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfx, .(group, sex), summarize,
mean = round(mean(age), 2),
sd = round(sd(age), 2))
I also want to summarize (mean, sd) by group and (mean,sd) summary of the entire data set. Is there a way to include this in the same ddply?
You can replicate the data 4 times: - including sex and group - including sex - including group - not including any column
The columns that are not included become "all"
require(plyr)
dfx <- data.frame(
group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
sex = sample(c("M", "F"), size = 29, replace = TRUE),
age = runif(n = 29, min = 18, max = 54)
)
# replicate the data not taking account of one or more attributed
dfAll <- dfx
dfAll$group <- "all"
dfAll$sex <- "all"
dfGroup <- dfx
dfGroup$group <- "all_group"
dfSex <- dfx
dfSex$group <- "all_sex"
dfToGroup <- rbind(dfx, dfGroup, dfSex, dfAll)
# Note the use of the '.' function to allow
# group and sex to be used without quoting
ddply(dfToGroup, .(group, sex), summarize,
mean = round(mean(age), 2),
sd = round(sd(age), 2))