r, function, mean

How to write a function that calculates grouped means for only the numeric variables in a data frame


I need to create a function that takes a data set and the name or position of one of its factor variables, and calculates the mean of each numeric variable for each level of that factor. I want to write the function myself rather than rely on packages, because I'm learning to program functions.

I have this function, but it is not working: the results contain missing values.

promedioXvariable <- function(df, cat) {
  res <- list()
  for (x in levels(df[[cat]])) {
    aux <- list()
    for (var in colnames(df)) {
      if(class(df[[var]]) == "numeric") {
        aux[[var]] <- with(df, tapply(var, x, mean))
      }
    }
    res[[x]] <- aux
  }
  return(res)
}

I want a result structured like this, but my function returns NAs:

$setosa
$setosa$Sepal.Length
setosa
    NA

Solution

  • Your main problem is here:

    aux[[var]] <- with(df, tapply(var, x, mean))
    

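    tapply() expects a factor (or a list of factors) with one entry per observation as its INDEX argument, but you are passing a single factor level as a character string (x). In addition, var holds the column name as a string rather than the column's values, so mean() is applied to text and returns NA. Instead, subset the data to the rows where the cat variable equals the factor level x, and take the mean of the column over those rows: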

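    promedioXvariable <- function(df, cat) {
      res <- list()
      # Loop over each level of the grouping factor
      for (x in levels(df[[cat]])) {
        aux <- list()
        for (var in colnames(df)) {
          # is.numeric() always returns a single TRUE/FALSE and also accepts integer columns
          if (is.numeric(df[[var]])) {
            # Mean of this column over the rows that belong to level x
            aux[[var]] <- mean(df[df[[cat]] == x, var])
          }
        }
        # Collapse the per-column means into a named numeric vector
        res[[x]] <- unlist(aux)
      }
      res
    }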
    
    promedioXvariable(iris, "Species")
    
    $setosa
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
           5.006        3.428        1.462        0.246 
    
    $versicolor
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
           5.936        2.770        4.260        1.326 
    
    $virginica
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
           6.588        2.974        5.552        2.026
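
As noted above, tapply() works when INDEX is the whole grouping factor rather than a single level. The following is only a sketch of that alternative in base R, not part of the original answer. The name grouped_means is hypothetical, and it assumes df is a plain data.frame whose cat column is a factor.

```r
# Hypothetical helper: grouped means of every numeric column, base R only.
grouped_means <- function(df, cat) {
  # `cat` may be a column name or a column position
  group <- df[[cat]]
  # Names of the numeric columns (a factor is not numeric, so `cat` is excluded)
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  # For each numeric column, tapply() splits the values by the full
  # grouping factor and applies mean() to each group
  sapply(df[num_cols], function(col) tapply(col, group, mean))
}

means <- grouped_means(iris, "Species")
means["setosa", "Sepal.Length"]
```

Here the result is a matrix with one row per factor level and one column per numeric variable, instead of a list of vectors. Passing the column position (grouped_means(iris, 5)) should give the same matrix.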