Subtotal with ddply in R


I am using ddply in R to split the data by two variables (group and sex), and I also want subtotals for each of them. This is the code I am using:

    require(plyr)
    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54)
    )

    # Note the use of the '.' function to allow
    # group and sex to be used without quoting
    ddply(dfx, .(group, sex), summarize,
          mean = round(mean(age), 2),
          sd = round(sd(age), 2))

In addition to the (mean, sd) summary by group and sex, I want the same summary by group alone and for the entire data set. Is there a way to include these in the same ddply call?


Solution

  • You can replicate the data 4 times:

    - including sex and group
    - including sex
    - including group
    - not including any column

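    In each copy, the columns that are not included are overwritten with a placeholder label ("all", "all_group" or "all_sex"), so ddply treats each placeholder as one extra level and returns the subtotals alongside the regular rows.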

    require(plyr)
    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54)
    )
    
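    # Replicate the data, overwriting one or both grouping attributes
    # with a placeholder so that each copy yields one level of subtotal
    dfAll <- dfx                   # grand total: both columns collapsed
    dfAll$group <- "all"
    dfAll$sex <- "all"
    dfGroup <- dfx                 # subtotal per sex: group collapsed
    dfGroup$group <- "all_group"
    dfSex <- dfx                   # subtotal per group: sex collapsed
    dfSex$sex <- "all_sex"
    dfToGroup <- rbind(dfx, dfGroup, dfSex, dfAll)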
    
    # Note the use of the '.' function to allow
    # group and sex to be used without quoting
    ddply(dfToGroup, .(group, sex), summarize,
          mean = round(mean(age), 2),
          sd = round(sd(age), 2))
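
    The replicate-and-relabel steps above can also be wrapped in a small helper. The following is only a sketch under some assumptions: `relabel` is a hypothetical function name that does not come from plyr, the seed is fixed purely so the run is reproducible, and a single placeholder "all" is used for every collapsed column. The last lines compare the grand-total row with a summary computed directly on the original data.

    ```r
    library(plyr)

    set.seed(42)  # assumption: fixed seed, only for reproducibility
    dfx <- data.frame(
      group = c(rep("A", 8), rep("B", 15), rep("C", 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: copy of df with the named columns
    # overwritten by a placeholder label
    relabel <- function(df, cols, label = "all") {
      df[cols] <- label
      df
    }

    dfToGroup <- rbind(
      dfx,                              # group x sex cells
      relabel(dfx, "sex"),              # subtotal per group
      relabel(dfx, "group"),            # subtotal per sex
      relabel(dfx, c("group", "sex"))   # grand total
    )

    res <- ddply(dfToGroup, .(group, sex), summarize,
                 mean = round(mean(age), 2),
                 sd = round(sd(age), 2))

    # Sanity check: the all/all row should equal the summary of the full data
    grand <- res[res$group == "all" & res$sex == "all", ]
    direct <- c(mean = round(mean(dfx$age), 2), sd = round(sd(dfx$age), 2))
    print(res)
    ```

    Rows where only one of the two columns reads "all" are the subtotals for the other column. This relies on "all" not being a real value of group or sex in the data.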