This is a simple question, and I am sure it is easily solvable with `tapply`, `apply`, `by`, or similar. However, I am still relatively new to this and would like to ask for advice.
The problem:
I have a data frame with, say, five columns, of which columns 4 and 5 are factors. For each level of the factor in column 5, I want to apply a function to columns 1:3 for each group defined by column 4. In principle this is easy to do. However, I want the output as a nice table, and I want to learn how to do this elegantly, which is why I am asking here.
Example:
```r
# y and f are recycled to length 6: y = 1,2,1,2,1,2 and f = 1,2,3,1,2,3
df <- data.frame(x1 = 1:6, x2 = 12:17, x3 = 3:8, y = 1:2, f = 1:3)
```
Now, the command

```r
by(df[, 1:3], df$y, sum)
```

gives me the sum over columns 1:3 for each level of `y`, which is almost what I want. Two additional steps are needed. The first is to do this for each level of `f`, which is almost trivial: I could simply wrap `lapply` around the command above. The second is where I am stuck: I want to collect the results in a table, and perhaps use it to generate a heatmap.
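For reference, the nested-loop route described above can be sketched in base R as follows, assuming the example `df` from the question. It uses `sapply` rather than `lapply` so that the per-level results simplify into a matrix: the outer call splits `df` by `f`, and the inner call repeats the `by(df[, 1:3], df$y, sum)` step within each subset. It assumes every `(y, f)` combination occurs in the data; if one is missing, the inner vectors can differ in length and `sapply` returns a list instead of a matrix.

```r
df <- data.frame(x1 = 1:6, x2 = 12:17, x3 = 3:8, y = 1:2, f = 1:3)

# Outer loop: one subset of df per level of f.
# Inner loop: split that subset's columns 1:3 by y and sum each piece,
# i.e. the by(df[, 1:3], df$y, sum) step repeated within each level of f.
res <- sapply(split(df, df$f), function(d) {
  sapply(split(d[, 1:3], d$y), sum)
})

res  # rows = levels of y, columns = levels of f
```

This works, but the nesting is exactly the kind of boilerplate a dedicated split-apply-combine tool avoids.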
Hence: is there an easy, more elegant way to do this that produces a matrix of the results? This seems like an everyday task for data scientists, so I suspect there is an existing built-in solution...
Thanks for any help or any hint, no matter how small!
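You can use the `plyr` and `reshape2` packages to accomplish this. First, compute one value per combination of `y` and `f`:

```r
library(plyr)
# Sum over columns 1:3 only; each piece d passed by ddply also
# contains the grouping columns y and f, which should not be summed.
# The unnamed scalar result ends up in a column called V1.
df2 <- ddply(df, .(y, f), function(d) sum(d[, 1:3]))
```

Then, to turn it into an `f`-by-`y` matrix: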
```r
library(reshape2)
acast(df2, f ~ y, value.var = "V1")
```
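If adding packages is not an option, base R's `tapply` can build an `f`-by-`y` matrix in a single call. A sketch using the question's example `df`: each row's columns 1:3 are first collapsed with `rowSums`, and those row totals are then summed within every `(f, y)` cell. This shortcut relies on the function being `sum` (summing row totals equals summing the whole block); a non-additive summary such as a median would need a different approach. Cells for combinations absent from the data come out as `NA`.

```r
df <- data.frame(x1 = 1:6, x2 = 12:17, x3 = 3:8, y = 1:2, f = 1:3)

# Collapse columns 1:3 to one total per row, then sum those totals
# within every (f, y) cell; tapply() returns an f-by-y matrix directly.
m <- tapply(rowSums(df[, 1:3]), list(f = df$f, y = df$y), sum)

m
```

The result is a plain numeric matrix, so it can be passed straight to `heatmap(m, Rowv = NA, Colv = NA, scale = "none")` for a quick heatmap without dendrogram reordering.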