R aggregate variable lengths differ error


There are lots of questions on SO with similar titles, but I can't find any that match my circumstances, and I haven't been able to adapt them to resolve my error. From what I understand, there is an issue with object lengths, but I don't understand why.

I'm looking for a base R solution to calculate the means of multiple columns in a data frame. It's complicated because this will be used within a larger function, where (a) the names and number of columns to aggregate may vary, and (b) the names and number of grouping variables will vary. I keep getting the error variable lengths differ (found for 'Group'). Perhaps I need a different way to specify the columns to aggregate?

# Example data
df <- data.frame("Location" = rep(LETTERS[1:16], each = 100), 
                 "Group" = sample(1:200, size = 1600, replace = TRUE), 
                 "Type" = rep(rep(c("Big", "Small"), each = 100), times = 8), 
                 "Var.1" = rnorm(1600, mean = 10), 
                 "Var.2" = rnorm(1600, mean = 5), 
                 "Var.3" = rnorm(1600, mean = 42), 
                 "Var.4" = rnorm(1600, mean = 250))

# Direct call to aggregate, works as expected, returns means of the Var columns.
df.means <- aggregate(cbind(Var.1, Var.2, Var.3, Var.4) ~ Group + Type, 
            data = df, FUN = mean)


## More flexible approach not working...

# Create a string identifying the column names for aggregate,
# needs to be flexible as length(df) is variable.
cols.to.agg <- noquote(paste(colnames(df)[4:length(df)], collapse = " , "))

# Grouping variable, here it is just the one column "Type",
# but cannot assume this is fixed.
grouping.col <- noquote(colnames(df)[3])

# Couple of approaches, but they fail with 
# "variable lengths differ (found for 'Group')"
df.means <- aggregate(cbind(cols.to.agg) ~ Group + grouping.col,
            data = df, FUN = mean)
df.means <- aggregate(as.formula(paste0("cbind(cols.to.agg) ~ Group + ",
            grouping.col)), data = df, FUN = mean)

So, I'm looking to return df.means but with flexibility in names and numbers of columns.


Solution

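  • I wouldn't use noquote, which only changes how a string is printed. In your attempts, cbind(cols.to.agg) is still evaluated as a single character string rather than as the four Var columns, so its length does not match Group. Instead, concatenate the column names into one string and then convert that string to a formula.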

    cols.to.agg <- colnames(df)[4:length(df)]
    grouping.col <- colnames(df)[3]
    form <- paste0('cbind(', 
                   paste(cols.to.agg, collapse=', '), 
                   ') ~ Group + ',
                   paste(grouping.col, collapse=' + '))
    df.means <- aggregate(as.formula(form), data = df, FUN = mean)
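
As a possible alternative to building a formula string, aggregate() also has a data-frame method that takes the value columns and the grouping columns directly, so both can be passed as character vectors of names. The sketch below is not part of the original answer: the helper name group_means, its argument names, and the small toy data set are all made up for illustration.

```r
# Hypothetical helper (not from the original answer): group means via the
# data-frame method of aggregate(), which takes columns instead of a formula.
group_means <- function(data, value.cols, group.cols) {
  aggregate(x = data[value.cols],   # columns to average, selected by name
            by = data[group.cols],  # grouping columns, passed as a named list
            FUN = mean)
}

# Small made-up data set, only to demonstrate the call.
set.seed(1)
toy <- data.frame(Group = rep(1:2, times = 4),
                  Type  = rep(c("Big", "Small"), each = 4),
                  Var.1 = rnorm(8, mean = 10),
                  Var.2 = rnorm(8, mean = 5))

# Any number of value columns and grouping columns can be supplied.
toy.means <- group_means(toy,
                         value.cols = c("Var.1", "Var.2"),
                         group.cols = c("Group", "Type"))
print(toy.means)
```

One difference to keep in mind: the formula interface drops rows containing NA by default (na.action = na.omit), whereas this form passes the values straight to FUN, so mean() returns NA for any group with missing values unless na.rm = TRUE is also passed to aggregate().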