I need to fit a linear regression. Since the predictors are added to the model one at a time, I want to use a for loop to fit the models.
set.seed(98274) # Creating example data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data) # Head of data
mod_summaries <- list() # Create empty list
for(i in 2:ncol(data)) { # Head of for-loop
  predictors_i <- colnames(data)[2:i] # Create vector of predictor names
  mod_summaries[[i - 1]] <- summary( # Store regression model summary in list
    lm(y ~ ., data[ , c("y", predictors_i)]))
}
Then I tried to predict the test data with those models in another for loop. My code is below.
## Test
set.seed(44) # Creating test data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.19 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.11 * y
x3 <- rnorm(1000) - 0.12 * x1 + 0.28 * x2 - 0.33 * y
test <- data.frame(y, x1, x2, x3)
predict_models <- matrix(nrow = nrow(test), ncol = 3)
for(i in 2:ncol(data)) { # Head of for-loop
  predictors_i <- colnames(data)[2:i] # Create vector of predictor names
  predict_models[,i-1] <- predict.lm(mod_summaries[[i-1]], test[,2:i])
}
predict_models
But it throws the following error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
'data' must be a data.frame, environment, or list
In addition: Warning message:
In predict.lm(mod_summaries[[i - 1]], test[, 2:i]) :
calling predict.lm(<fake-lm-object>) ...
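First, you want to store the models themselves, not their summaries: predict needs the fitted "lm" object, while summary() returns a "summary.lm" object, which is what the <fake-lm-object> warning is complaining about.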
mod_summaries <- vector('list', ncol(data) - 1L) ## preallocate list of known length, it's way more efficient
for (i in seq_len(ncol(data))[-1]) {
  predictors_i <- colnames(data)[2:i]
  mod_summaries[[i - 1]] <- lm(y ~ ., data[, c("y", predictors_i)])
}
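To see why the summaries fail, it helps to look at the class of each object. This is a minimal sketch on made-up toy data (the names `toy` and `fit` are hypothetical and not part of the question):

```r
# Toy data, names chosen for illustration only
set.seed(1)
toy <- data.frame(x = rnorm(20))
toy$y <- 2 * toy$x + rnorm(20)

fit <- lm(y ~ x, data = toy)

class(fit)           # "lm"
class(summary(fit))  # "summary.lm"

# predict() needs an object that inherits from "lm" to reach predict.lm;
# the summary object does not
fit_is_lm     <- inherits(fit, "lm")
summary_is_lm <- inherits(summary(fit), "lm")

# Prediction works on the fitted model itself
pred <- predict(fit, newdata = data.frame(x = c(0, 1)))
```

If you still need the summaries, you can always compute them later from the stored models, e.g. with `lapply(mod_summaries, summary)`.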
Then, the data passed to predict
doesn't actually need to change from model to model: only the columns used in the
model are looked up, so passing the whole test data frame is sufficient.
predict_models <- matrix(nrow=nrow(test), ncol=ncol(test) - 1L)
for (i in seq_len(ncol(data))[-1]) {
  predict_models[, i - 1] <- predict.lm(mod_summaries[[i - 1]], test)
}
That's actually it.
head(predict_models)
#              [,1]        [,2]       [,3]
# [1,] -0.115690784 -0.19149611 -0.4815419
# [2,] -0.004721430  0.03814865  0.1894562
# [3,] -0.110812904  0.02312155  0.2579051
# [4,]  0.004264032 -0.06147035 -0.2328833
# [5,]  0.320110168 -0.04145044 -0.3229186
# [6,] -0.040603638  0.01977484 -0.1090088
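The point about newdata can be checked directly. The sketch below uses made-up data (`train`, `new_obs` and `fit_x1` are hypothetical names) to show that extra columns in newdata are ignored, and that subsetting a single column with `[ , j]` drops to a plain vector unless `drop = FALSE` is given. The latter is probably what produced the "'data' must be a data.frame" message for the first model in the question, although I have not traced that step by step.

```r
# Made-up training and new data, for illustration only
set.seed(2)
train <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
train$y <- 1 + 0.5 * train$x1 + rnorm(50)
new_obs <- data.frame(x1 = rnorm(5), x2 = rnorm(5), unused = letters[1:5])

fit_x1 <- lm(y ~ x1, data = train)

# Passing the full data frame: columns not in the model are ignored
p_full <- predict(fit_x1, newdata = new_obs)

# Passing only the needed column, kept as a data frame
p_sub <- predict(fit_x1, newdata = new_obs[, "x1", drop = FALSE])

same <- isTRUE(all.equal(p_full, p_sub))

# Without drop = FALSE a single column becomes a numeric vector
dropped_is_df <- is.data.frame(new_obs[, "x1"])
```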
Alternatively, and more R-ish, you can do the same in just two lines of code, without for
loops (the \(i) lambda shorthand needs R 4.1 or later).
ms <- lapply(seq_along(data)[-1], \(i) lm(reformulate(names(data)[2:i], 'y'), data))
pm <- sapply(ms, predict, test)
head(pm)
#           [,1]        [,2]       [,3]
# 1 -0.115690784 -0.19149611 -0.4815419
# 2 -0.004721430  0.03814865  0.1894562
# 3 -0.110812904  0.02312155  0.2579051
# 4  0.004264032 -0.06147035 -0.2328833
# 5  0.320110168 -0.04145044 -0.3229186
# 6 -0.040603638  0.01977484 -0.1090088
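If the reformulate call looks opaque, here is a small sketch of what it builds for each step. The vector `preds` is a hypothetical stand-in for `names(data)[-1]`, and `function(i)` is used instead of `\(i)` so it also runs on older R versions:

```r
# Stand-in for names(data)[-1]; hypothetical, for illustration only
preds <- c("x1", "x2", "x3")

# One formula per nested model, adding one predictor at a time
forms <- lapply(seq_along(preds),
                function(i) reformulate(preds[seq_len(i)], response = "y"))

# Turn the formulas into text to inspect them
form_text <- vapply(forms, function(f) paste(deparse(f), collapse = ""),
                    character(1))
print(form_text)
```

Each formula is then fitted on the full training data, so no column subsetting is needed on either the training or the test side.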