Search code examples
rdataframeaggregatemeantapply

Mean with condition for multiple columns in r


Let's use mtcars to explain the situation.

What I want to do is the same below for multiple columns. To have the mean of a column qsec (in the example) regarding another column with a specific value (4 and 6, in the example below). I'll compare the result later so maybe I would store the results in a vector

table(mtcars$cyl)
4  6  8
11  7 14

mean(mtcars$qsec[mtcars$cyl == 4], na.rm = T)
mean(mtcars$qsec[mtcars$gear == 4], na.rm = T)

I would like to check the means of qsec regarding the cyl, and let's say gear and carb, with the same "pattern" for the mean i.e. mean of observations with 4 and mean of observations 6. In the true dataset would be several columns that have the same set of numbers (2, 0 and 1). I'll compare the means of a column (in the example qsec) with observations 2 and 0.

I've tried to look at the functions like tapply, apply, sapply. But I'm stuck in having the condition in the mean applying for every column (at once).

Hope I made myself clear. Thank you!


Solution

  • What I understand you're looking for is the mean of qsec for each level of cyl, gear, and carb separately, not in combination. This code gets you that, but doesn't directly let you select specific levels of those factors. If you need to be able to do that second part, I think you should be able to tweak this to get there, but I'm not sure how...

    apply(mtcars[,c("cyl","gear","carb")], 2, function(x) {
      aggregate(mtcars[,"qsec"],list(x),mean)
    })
    

    Output:

    $cyl
      Group.1        x
    1       4 19.13727
    2       6 17.97714
    3       8 16.77214
    
    $gear
      Group.1      x
    1       3 17.692
    2       4 18.965
    3       5 15.640
    
    $carb
      Group.1        x
    1       1 19.50714
    2       2 18.18600
    3       3 17.66667
    4       4 16.96500
    5       6 15.50000
    6       8 14.60000