Short version of question: How can I use ddply to summarize my dataframe grouped by several variables?
I currently use this code to summarize by Condition:
ddply(ExampleData, .(Condition), summarize, Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
How can I adjust the code to summarize by two variables (Condition and Block)?
Desired output format something like:
Condition Block Average SD N Med
1 A 1 0.50 .. .. ..
2 A 2 0.80 .. .. ..
3 B 1 0.90 .. .. ..
4 B 2 0.75 .. .. ..
====
Longer version of question with example data.
Dataframe:
ExampleData <- structure(list(Condition = c("A", "A", "A", "B", "B", "B"), Block = c(1,
2, 1, 2, 1, 2), Var1= c(0.6, 0.8, 0.4, 1, 0.9, 0.5)), row.names = c(NA,
6L), class = "data.frame")
which is:
  Condition Block Var1
1 A 1 0.6
2 A 2 0.8
3 A 1 0.4
4 B 2 1.0
5 B 1 0.9
6 B 2 0.5
I realize there are alternative ways to get the summary, but it would help my learning to understand how to adjust the function I already have. I didn't succeed in making it work, and I couldn't find an example here on Stack Overflow. I am looking for something like:
ddply(ExampleData, .c(Condition,Block), summarize, Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
(or .(Condition*Block) or list(Condition,Block) or ... ??)
Just remove the c from the .variables argument, so that it reads .(Condition, Block). Your code then becomes:
library(plyr)
ddply(ExampleData, .(Condition, Block), summarize, Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
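As a minimal sketch of the above, here is the call run against the example data from the question, together with two other ways of writing the .variables argument (a character vector and a formula) that, as far as I know, ddply also accepts. The object names by_quoted, by_names and by_formula are only illustrative. Note that SD comes out as NA for any Condition/Block combination with a single observation.

```r
library(plyr)

# Example data from the question, rebuilt with data.frame() for readability
ExampleData <- data.frame(
  Condition = c("A", "A", "A", "B", "B", "B"),
  Block     = c(1, 2, 1, 2, 1, 2),
  Var1      = c(0.6, 0.8, 0.4, 1, 0.9, 0.5)
)

# Grouping variables quoted with .()
by_quoted <- ddply(ExampleData, .(Condition, Block), summarize,
                   Average = mean(Var1, na.rm = TRUE),
                   SD      = sd(Var1),
                   N       = length(Var1),
                   Med     = median(Var1))

# Same grouping, given as a character vector of column names
by_names <- ddply(ExampleData, c("Condition", "Block"), summarize,
                  Average = mean(Var1, na.rm = TRUE),
                  SD      = sd(Var1),
                  N       = length(Var1),
                  Med     = median(Var1))

# Same grouping, given as a one-sided formula
by_formula <- ddply(ExampleData, ~ Condition + Block, summarize,
                    Average = mean(Var1, na.rm = TRUE),
                    SD      = sd(Var1),
                    N       = length(Var1),
                    Med     = median(Var1))

print(by_quoted)
```

Each result should have one row per Condition/Block combination, with Average values of 0.50, 0.80, 0.90 and 0.75, matching the desired output in the question.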
By the way, you might want to switch to using dplyr instead of plyr:
https://blog.rstudio.com/2014/01/17/introducing-dplyr/
If you were to do this in dplyr:
summarize(group_by(ExampleData, Condition, Block), Average=mean(Var1, na.rm=TRUE), SD=sd(Var1),N=length(Var1), Med =median(Var1))
You could also use the pipe operator (%>%), so this becomes:
ExampleData %>%
  group_by(Condition, Block) %>%
  summarise(Average = mean(Var1, na.rm = TRUE),
            SD      = sd(Var1),
            N       = length(Var1),
            Med     = median(Var1))
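For completeness, a self-contained sketch of the dplyr approach on the question's example data. Two things here are my additions and not part of the code above: n() is used in place of length(Var1), and .groups = "drop" (which assumes dplyr 1.0.0 or later) returns an ungrouped result. The summarise call is also written as dplyr::summarise because, if plyr happens to be attached after dplyr, plyr's summarise masks dplyr's and the grouping is ignored, giving a single row.

```r
library(dplyr)

# Example data from the question, rebuilt with data.frame() for readability
ExampleData <- data.frame(
  Condition = c("A", "A", "A", "B", "B", "B"),
  Block     = c(1, 2, 1, 2, 1, 2),
  Var1      = c(0.6, 0.8, 0.4, 1, 0.9, 0.5)
)

result <- ExampleData %>%
  group_by(Condition, Block) %>%
  # Namespaced so that a later library(plyr) cannot mask it
  dplyr::summarise(
    Average = mean(Var1, na.rm = TRUE),
    SD      = sd(Var1),       # NA when a group has only one observation
    N       = n(),            # number of rows in the current group
    Med     = median(Var1),
    .groups = "drop"          # assumption: dplyr >= 1.0.0
  )

print(result)
```

The result should again have four rows, one per Condition/Block combination.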