There are lots of questions on SO with similar titles, but I can't find any that match my circumstances, and I haven't been able to adapt them to resolve my error. From what I understand there is an issue with object lengths, but I don't understand why.
I'm looking for a base R solution to calculate the means of multiple columns in a data frame. It's complicated because this will be used within a larger function, and (a) the names and number of columns may vary, and (b) the names and number of grouping variables will vary. I keep getting the error "variable lengths differ (found for 'Group')"; perhaps I need a different way to specify the columns to aggregate?
# Example data
df <- data.frame("Location" = rep(LETTERS[1:16], each = 100),
                 "Group" = sample(1:200, size = 1600, replace = TRUE),
                 "Type" = rep(rep(c("Big", "Small"), each = 100), times = 8),
                 "Var.1" = rnorm(1600, mean = 10),
                 "Var.2" = rnorm(1600, mean = 5),
                 "Var.3" = rnorm(1600, mean = 42),
                 "Var.4" = rnorm(1600, mean = 250))
# Direct call to aggregate, works as expected, returns means of the Var columns.
df.means <- aggregate(cbind(Var.1, Var.2, Var.3, Var.4) ~ Group + Type,
                      data = df, FUN = mean)
## More flexible approach not working...
# Create a string identifying the column names for aggregate,
# needs to be flexible as length(df) is variable.
cols.to.agg <- noquote(paste(colnames(df)[4:length(df)], collapse = " , "))
# Grouping variable, here it is just the one column "Type",
# but cannot assume this is fixed.
grouping.col <- noquote(colnames(df)[3])
# Couple of approaches, but they fail with
# "variable lengths differ (found for 'Group')"
df.means <- aggregate(cbind(cols.to.agg) ~ Group + grouping.col,
data = df, FUN = mean)
df.means <- aggregate(as.formula(paste0("cbind(Cols.to.agg) ~ Group
+ ", grouping.col)), data = df, FUN = mean)
So, I'm looking to return df.means, but with flexibility in the names and number of columns.
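I wouldn't use noquote. Inside the formula, cols.to.agg and grouping.col are evaluated as the variables themselves (length-one strings), not as the columns they name, so their lengths differ from the 1600 values of Group. Instead, just concatenate the column names as strings and then convert the result to a formula.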
cols.to.agg <- colnames(df)[4:length(df)]
grouping.col <- colnames(df)[3]
form <- paste0('cbind(',
               paste(cols.to.agg, collapse=', '),
               ') ~ Group + ',
               paste(grouping.col, collapse=' + '))
df.means <- aggregate(as.formula(form), data = df, FUN = mean)
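Since this is meant to live inside a larger function, another option may be to skip the formula entirely and pass the column names to the data frame method of aggregate through its by argument. The sketch below is only an illustration: the helper name agg_means, its arguments, the seed, and the smaller example data are assumptions, not part of the question. It also builds the formula version for comparison.

```r
# Smaller stand-in for the question's data (sizes and seed are assumptions,
# chosen only to keep the example quick and reproducible).
set.seed(1)
df <- data.frame(Location = rep(LETTERS[1:4], each = 10),
                 Group = sample(1:3, size = 40, replace = TRUE),
                 Type = rep(rep(c("Big", "Small"), each = 10), times = 2),
                 Var.1 = rnorm(40, mean = 10),
                 Var.2 = rnorm(40, mean = 5))

# Hypothetical helper: value.cols and group.cols are character vectors of
# column names, so both can vary in name and number.
agg_means <- function(data, value.cols, group.cols) {
  # data[value.cols] and data[group.cols] are data frames (lists of columns),
  # which is what aggregate()'s x and by arguments expect.
  aggregate(data[value.cols], by = data[group.cols], FUN = mean)
}

value.cols <- colnames(df)[4:ncol(df)]
group.cols <- c("Group", colnames(df)[3])

by.names <- agg_means(df, value.cols, group.cols)

# The same aggregation through a formula built from strings, for comparison.
form <- as.formula(paste0("cbind(", paste(value.cols, collapse = ", "), ") ~ ",
                          paste(group.cols, collapse = " + ")))
by.formula <- aggregate(form, data = df, FUN = mean)

# Compare the aggregated value columns from the two approaches.
all.equal(by.names[value.cols], by.formula[value.cols])
```

One difference to keep in mind: the formula method applies na.action = na.omit by default, dropping any row with an NA in the columns involved, while the by version passes the columns to FUN as they are. With missing values the two can therefore disagree unless NA handling is made explicit.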