Let's use mtcars
to explain the situation.
What I want to do is the same below for multiple columns. To have the mean of a column qsec
(in the example) regarding another column with a specific value (4 and 6, in the example below). I'll compare the result later so maybe I would store the results in a vector
table(mtcars$cyl)
4 6 8
11 7 14
mean(mtcars$qsec[mtcars$cyl == 4], na.rm = T)
mean(mtcars$qsec[mtcars$gear == 4], na.rm = T)
I would like to check the means of qsec
regarding the cyl
, and let's say gear
and carb
, with the same "pattern" for the mean i.e. mean of observations with 4 and mean of observations 6. In the true dataset would be several columns that have the same set of numbers (2, 0 and 1). I'll compare the means of a column (in the example qsec
) with observations 2 and 0.
I've tried to look at the functions like tapply
, apply
, sapply
. But I'm stuck in having the condition in the mean applying for every column (at once).
Hope I made myself clear. Thank you!
What I understand you're looking for is the mean of qsec
for each level of cyl
, gear
, and carb
separately, not in combination. This code gets you that, but doesn't directly let you select specific levels of those factors. If you need to be able to do that second part, I think you should be able to tweak this to get there, but I'm not sure how...
apply(mtcars[,c("cyl","gear","carb")], 2, function(x) {
aggregate(mtcars[,"qsec"],list(x),mean)
})
Output:
$cyl
Group.1 x
1 4 19.13727
2 6 17.97714
3 8 16.77214
$gear
Group.1 x
1 3 17.692
2 4 18.965
3 5 15.640
$carb
Group.1 x
1 1 19.50714
2 2 18.18600
3 3 17.66667
4 4 16.96500
5 6 15.50000
6 8 14.60000