I need to write a function that receives a data set and the name or position of one of its factor variables, and calculates the mean of each numeric variable for each level of that factor. I need to write the function myself rather than rely on a package, because I'm learning to program functions.
I have this function, but it is not working: the results are missing values.
promedioXvariable <- function(df, cat) {
  res <- list()
  for (x in levels(df[[cat]])) {
    aux <- list()
    for (var in colnames(df)) {
      if(class(df[[var]]) == "numeric") {
        aux[[var]] <- with(df, tapply(var, x, mean))
      }
    }
    res[[x]] <- aux
  }
  return(res)
}
The result I want is something like this, but my function returns NAs instead of the means:
$setosa
$setosa$Sepal.Length
setosa
    NA
Your main problem is here:
  aux[[var]] <- with(df, tapply(var, x, mean))
`tapply()` expects a factor or list of factors as the `INDEX` arg, but you’re just passing one factor level as a character (`x`). Instead, you can subset your data to rows where the `cat` variable is equal to the factor level `x`:
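promedioXvariable <- function(df, cat) {
  res <- list()
  # One pass per level of the grouping factor
  for (x in levels(df[[cat]])) {
    aux <- list()
    for (var in colnames(df)) {
      # is.numeric() also accepts integer columns, unlike class(...) == "numeric"
      if (is.numeric(df[[var]])) {
        # Mean of this column over the rows that belong to level x
        aux[[var]] <- mean(df[df[[cat]] == x, var])
      }
    }
    # Collapse the per-variable means into a named numeric vector
    res[[x]] <- unlist(aux)
  }
  res
}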
promedioXvariable(iris, "Species")

$setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246

$versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326

$virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026
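
For comparison, here is a minimal sketch of how `tapply()` could be used the way it is designed, by passing the whole factor column as `INDEX` rather than a single level. The name `promedioTapply` is made up for this illustration and is not part of the original question or answer. It uses base R only and assumes `df` is a plain data frame.

```r
# Hypothetical alternative to promedioXvariable(), built on tapply()
promedioTapply <- function(df, cat) {
  # The grouping factor; cat may be a column name or a position
  grupo <- df[[cat]]

  # Flag the numeric columns, then exclude the grouping column itself
  es_num <- vapply(df, is.numeric, logical(1))
  es_num[cat] <- FALSE

  # For each numeric column, tapply() returns one mean per factor level;
  # sapply() binds those named vectors into a matrix (levels x variables)
  sapply(df[es_num], function(col) tapply(col, grupo, mean))
}

promedioTapply(iris, "Species")
```

This should give the same means as above, arranged as a matrix with one row per level and one column per numeric variable. In the original function, `tapply(var, x, mean)` received the column name and the level name as character strings rather than the column values and the factor, which is why it returned `NA`.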