r dataframe plyr summary

converting summary created using 'by' to data.frame


df1 <- data.frame(c(2,1,2), c(1,2,3,4,5,6), seq(141,170)) # create data.frame (short vectors are recycled)
names(df1) <- c("gender", "age", "height") # column names
df1$gender <- factor(df1$gender,
                     levels=c(1,2),
                     labels=c("female","male")) # gives levels and labels to gender
df1$age <- factor(df1$age,
                  levels=c(1,2,3,4,5,6),
                  labels=c("16-24","25-34","35-44","45-54","55-64","65+")) # gives levels and labels to age groups

I am looking to produce a summary of the height values subsetted by gender and then age.

Using the subset and by functions as follows provides the output I want:

females <- subset(df1, gender == "female") # subsetting by gender (compare against the factor label, not the original code 1)
males <- subset(df1, gender == "male")

foutput=by(females$height,females$age,summary) #producing summary subsetted by age
moutput=by(males$height,males$age,summary)

However, I need the result as a data.frame so that I can export it alongside frequency tables using XLConnect.

Is there a way to convert the output to a data.frame, or an elegant alternative, possibly using plyr?


Solution

  • Here's one approach using plyr:

    > ddply(df1, c("gender", "age"), function(x) summary(x$height))
      gender   age Min. 1st Qu. Median Mean 3rd Qu. Max.
    1 female 25-34  142     148    154  154     160  166
    2 female 55-64  145     151    157  157     163  169
    3   male 16-24  141     147    153  153     159  165
    4   male 35-44  143     149    155  155     161  167
    5   male 45-54  144     150    156  156     162  168
    6   male   65+  146     152    158  158     164  170
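
  • If you would rather convert the existing by output directly, a base R sketch along these lines should also work. It assumes each element of the by object is a named numeric vector (as summary returns for a numeric input), so the elements can be stacked with rbind. The names fmat and fsummary are hypothetical, chosen only for this illustration.

```r
# Example data from the question (the shorter vectors are recycled to 30 rows)
df1 <- data.frame(gender = c(2, 1, 2), age = c(1, 2, 3, 4, 5, 6), height = seq(141, 170))
df1$gender <- factor(df1$gender, levels = c(1, 2), labels = c("female", "male"))
df1$age <- factor(df1$age, levels = 1:6,
                  labels = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+"))

# Subset on the factor label; droplevels() removes age groups with no rows,
# so the by object has no empty entries
females <- droplevels(subset(df1, gender == "female"))
foutput <- by(females$height, females$age, summary)

# Stack the per-group summaries into a matrix with one row per age group
fmat <- do.call(rbind, foutput)

# Move the group labels from the row names into an ordinary column
fsummary <- data.frame(age = rownames(fmat), fmat, check.names = FALSE)
rownames(fsummary) <- NULL
```

    The same steps applied to the male subset give a second data.frame. Unlike the ddply call, this produces one table per gender, so the ddply version is probably more convenient when a single combined table is wanted.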