r dplyr apply lapply tapply

Combine different apply functions in R


I really love the apply family in R, but I think I am still not getting the most out of it.

with(mtcars, tapply(mpg, cyl, mean))

sapply(mtcars, mean)

These two calls, for example, are really nice, but how can I combine them to get the mean of each variable for every category of cyl?

With dplyr it is quite easy, I guess:

library(dplyr)

# summarise_all() still works, but is superseded by summarise(across(everything(), mean))
mtcars %>%
    group_by(cyl) %>%
    summarise_all(mean)

So maybe another question is: why is it useful to learn all these apply functions at all, when dplyr makes the problem so easy to solve? :-)


Solution

  • If you're looking for a base R solution, then you can use split to separate your data frame by cyl, then use sapply as before:

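    # One data frame per distinct value of cyl
    S <- split(mtcars, mtcars$cyl)
    # Mean of every column within each group; returns a list of named vectors
    lapply(S, function(x) sapply(x, mean))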
    

    Your second question is primarily opinion-based, so I'll give mine: tidyverse packages, like dplyr, build on top of base R functionality to provide a convenient and consistent interface for common data manipulation operations. For this reason, they are generally preferable, but they may not always be available in a particular development environment. In that case, it is helpful to know how to fall back on base R functionality.
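
Coming back to the first question, the split-then-apply idea can also be written more compactly. The following is only a sketch of two base R variants, not the single right way to do it. The variable names `by_cyl`, `means_by_cyl` and `agg` are made up for illustration, and the code assumes every column of the data frame is numeric, which holds for mtcars.

```r
# mtcars ships with base R (package 'datasets'), so no extra packages are needed.

# Variant 1: split by cyl, then take the column means of each piece.
# sapply() simplifies the result to a matrix with one row per variable
# and one column per level of cyl.
by_cyl <- sapply(split(mtcars, mtcars$cyl), colMeans)

# Transpose to get one row per level of cyl, closer in shape to the dplyr result.
means_by_cyl <- t(by_cyl)

# Variant 2: aggregate() with a formula; "." stands for every column other than cyl.
agg <- aggregate(. ~ cyl, data = mtcars, FUN = mean)

print(means_by_cyl)
print(agg)
```

The first variant returns a numeric matrix whose row names are the cyl levels, while `aggregate` returns a data frame with cyl as its first column, so the choice mostly depends on which shape is more convenient downstream.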