Looping over variables for multilevel regression produces type error


I am writing a multilevel regression model in which the second level begins with a dataframe of predictands (coefficients from the first level) and a dataframe of predictors. Both dataframes have the same number of observations. I wish to loop over the predictands (the columns of the first dataframe) and use lm() to regress each of them against the entire second dataframe of predictors. However, when I do, I get an error that I cannot figure out.

Example:

data(iris)
iris1 <- iris[-5] # remove the categories
iris2 <- iris[-5] * 6

for (col in names(iris1)) {
    lm(iris1[col] ~ iris2)
}

## Error in model.frame.default(formula = iris1[col] ~ iris2, drop.unused.levels = TRUE) : 
##   invalid type (list) for variable 'iris1[col]'

I just can't understand what this means or why R considers iris1[col] to be a list. For simplicity's sake I've tried merging them:

for (col in names(iris1)) {
  tmp_df <- cbind(iris1[col], iris2)
  colnames(tmp_df) <- letters[1:5]  # to avoid duplicate names
  lm(1 ~ ., tmp_df)
}

## Error in model.frame.default(formula = 1 ~ ., data = tmp_df, drop.unused.levels = TRUE) : 
##   variable lengths differ (found for 'a')

And this one's particularly frustrating because they are clearly the same length.


Solution

  • Note that lm can accept a matrix on the left-hand side of the formula, so we could do this:

    lm(as.matrix(iris1) ~., iris2)
    

    or if we want a separate lm object for each column of iris1:

    regr <- function(y) lm(y ~., iris2)
    Map(regr, iris1)
    

    or

    regr2 <- function(nm) {
      fo <- as.formula(sprintf("iris1$%s ~.", nm))
      do.call("lm", list(fo, quote(iris2)))
    }
    Map(regr2, names(iris1))
    

    or use lm.fit, which returns a plain list of fit components rather than an lm object:

    regr.fit <- function(y) lm.fit(cbind(1, as.matrix(iris2)), y)
    Map(regr.fit, iris1)
    

    Note that in each of the Map versions, the component names of the result will be the column names of iris1.
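
As for the errors in the question: iris1[col] uses single brackets, which return a one-column data frame (a list) rather than a numeric vector, and in the second attempt the left-hand side of the formula is the constant 1, whose length (1) differs from the 150 rows of tmp_df. A minimal sketch of the original loop repaired with double brackets, under the question's iris1/iris2 setup, might look like this (the names fits and mfit are illustrative and not part of the original answer):

```r
data(iris)
iris1 <- iris[-5]      # responses, same setup as in the question
iris2 <- iris[-5] * 6  # predictors

# Single brackets keep the data frame (list) wrapper; double brackets
# extract the underlying numeric vector that lm() accepts as a response.
class(iris1["Sepal.Length"])    # "data.frame"
class(iris1[["Sepal.Length"]])  # "numeric"

# The question's loop, using [[ ]] for the response and iris2 as data,
# and storing each fit instead of discarding it.
fits <- list()
for (col in names(iris1)) {
  fits[[col]] <- lm(iris1[[col]] ~ ., data = iris2)
}

# Cross-check one response against the matrix-response fit, which holds
# one column of coefficients per column of iris1.
mfit <- lm(as.matrix(iris1) ~ ., iris2)
all.equal(unname(coef(fits[["Sepal.Width"]])),
          unname(coef(mfit)[, "Sepal.Width"]))
```

The loop and the matrix-response call should agree on the coefficients, so the choice is mostly about whether a single object or a list of separate lm objects is more convenient downstream.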