r, for-loop, linear-regression, predict

How to use predict.lm in a for loop?


I need to fit a series of linear regressions in which the predictors are added to the model one at a time, so I use a for loop to fit the models.

set.seed(98274)                          # Creating example data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data)                               # Head of data

mod_summaries <- list()                  # Create empty list

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  mod_summaries[[i - 1]] <- summary(     # Store regression model summary in list
    lm(y ~ ., data[ , c("y", predictors_i)]))
  
}

Then I tried to predict on test data with those models in another for loop. My code is below.

## Test
set.seed(44)                          # Creating test data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.19 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.11 * y
x3 <- rnorm(1000) - 0.12 * x1 + 0.28 * x2 - 0.33 * y
test <- data.frame(y, x1, x2, x3)


predict_models <- matrix(nrow = nrow(test), ncol = 3)

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  predict_models[,i-1] <- predict.lm(mod_summaries[[i-1]], test[,2:i])
  
}
predict_models

But it throws the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list
In addition: Warning message:
In predict.lm(mod_summaries[[i - 1]], test[, 2:i]) :
  calling predict.lm(<fake-lm-object>) ... 

Solution

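  • First, you want to store the models themselves, not their summaries: predict needs the fitted lm object, and a summary.lm object is not one (hence the <fake-lm-object> warning).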

    ## Preallocate a list of known length instead of growing it in the loop
    ## (it now holds the models, despite the name)
    mod_summaries <- vector('list', ncol(data) - 1L)
    
    for (i in seq_len(ncol(data))[-1]) {
      predictors_i <- colnames(data)[2:i]
      mod_summaries[[i - 1]] <- lm(y ~ ., data[, c("y", predictors_i)])
    }
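
    A quick way to see why the summaries cannot be used for prediction is to compare the classes of the two objects. This is only a sketch on made-up data; the names toy, fit and smry are illustrative and do not come from the question.

    ```r
    set.seed(1)                                   # toy data, for illustration only
    toy  <- data.frame(y = rnorm(20), x1 = rnorm(20))

    fit  <- lm(y ~ x1, data = toy)                # the fitted model
    smry <- summary(fit)                          # a report about the model

    class(fit)                                    # "lm"
    class(smry)                                   # "summary.lm"

    inherits(fit,  "lm")                          # TRUE:  usable with predict()
    inherits(smry, "lm")                          # FALSE: predict.lm() treats it as a fake lm object

    # If the summaries are still needed, they can be derived from the stored model later
    smry$r.squared
    head(predict(fit, newdata = toy))
    ```

    Storing the models loses nothing, because summary() can be called on any stored model whenever its statistics are needed.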
    

    Then, the new data passed to predict does not need to change from model to model: only the columns that appear in the model are used, so passing the whole test data frame is sufficient.

    predict_models <- matrix(nrow=nrow(test), ncol=ncol(test) - 1L)
    for (i in seq_len(ncol(data))[-1]) {
      predict_models[, i - 1] <- predict.lm(mod_summaries[[i - 1]], test)
    }
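
    The claim that extra columns are ignored can be checked directly. The sketch below uses made-up data; train, new and fit_x1 are illustrative names, not objects from the question.

    ```r
    set.seed(2)                                   # toy data, for illustration only
    train <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
    new   <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

    fit_x1 <- lm(y ~ ., data = train[, c("y", "x1")])   # model that only knows x1

    # Predict once with every column supplied, once with only x1 supplied
    p_full   <- predict(fit_x1, newdata = new)
    p_subset <- predict(fit_x1, newdata = new[, "x1", drop = FALSE])

    all.equal(p_full, p_subset)                   # TRUE

    # Without drop = FALSE a single-column selection is no longer a data frame
    is.data.frame(new[, 2:2])                     # FALSE
    ```

    The last line is worth noting: selecting one column as in test[, 2:2] returns a plain vector, which is consistent with the "'data' must be a data.frame" error in the question. Passing the full test data frame sidesteps that as well.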
    

    That's all there is to it.

    head(predict_models)
    #              [,1]        [,2]       [,3]
    # [1,] -0.115690784 -0.19149611 -0.4815419
    # [2,] -0.004721430  0.03814865  0.1894562
    # [3,] -0.110812904  0.02312155  0.2579051
    # [4,]  0.004264032 -0.06147035 -0.2328833
    # [5,]  0.320110168 -0.04145044 -0.3229186
    # [6,] -0.040603638  0.01977484 -0.1090088
    

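    Alternatively, and more idiomatic in R, you could do the same in just two lines of code without for loops. Note that the \(i) lambda shorthand requires R 4.1.0 or later; on older versions write function(i) instead.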

    ms <- lapply(seq_along(data)[-1], \(i) lm(reformulate(names(data)[2:i], 'y'), data))
    pm <- sapply(ms, predict, test)
    head(pm)
    #           [,1]        [,2]       [,3]
    # 1 -0.115690784 -0.19149611 -0.4815419
    # 2 -0.004721430  0.03814865  0.1894562
    # 3 -0.110812904  0.02312155  0.2579051
    # 4  0.004264032 -0.06147035 -0.2328833
    # 5  0.320110168 -0.04145044 -0.3229186
    # 6 -0.040603638  0.01977484 -0.1090088
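
    To confirm that the loop and the lapply/sapply versions give the same predictions, the two can be compared on a small example. This is a sketch on made-up data; train, new and the fits_*/pred_* names are illustrative, not taken from the question.

    ```r
    set.seed(3)                                   # toy data, for illustration only
    train <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
    new   <- data.frame(y = rnorm(30),  x1 = rnorm(30),  x2 = rnorm(30),  x3 = rnorm(30))

    # Loop version: nested models y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3
    fits_loop <- vector("list", ncol(train) - 1L)
    pred_loop <- matrix(nrow = nrow(new), ncol = ncol(train) - 1L)
    for (i in seq_len(ncol(train))[-1]) {
      fits_loop[[i - 1]] <- lm(y ~ ., data = train[, c("y", names(train)[2:i])])
      pred_loop[, i - 1] <- predict(fits_loop[[i - 1]], newdata = new)
    }

    # lapply/sapply version; function(i) is used instead of \(i) so it also runs on R < 4.1
    fits_apply <- lapply(seq_along(train)[-1],
                         function(i) lm(reformulate(names(train)[2:i], "y"), data = train))
    pred_apply <- sapply(fits_apply, predict, newdata = new)

    # sapply keeps row names, the preallocated matrix has none, so drop them first
    all.equal(pred_loop, unname(pred_apply))      # TRUE
    ```

    The only visible difference between the two results is the row labelling, which matches the two printed outputs above.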