I have been working my way through the College dataset from the ISLR package in R. I want to perform best subset selection on the training set and plot the training set MSE associated with the best model of each size.
This is my code. First I split the data 70/30:
# splitting the data into 70/30
subset <- sample(nrow(college)*0.7)
collegetrain <- college[subset, ]
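One thing worth double-checking in the split: `sample(nrow(college)*0.7)` is called with a single number (543.9), so R samples from `1:543` and returns a shuffled copy of the first 543 row indices rather than a random 70% of all 777 rows. A minimal sketch of a random 70/30 split, using the built-in `mtcars` data as a stand-in for `college` (the names `dat`, `train_idx`, `train` and `test` are illustrative and not part of the code in this post):

```r
# Stand-in data; swap in `college` for the real analysis (assumption).
dat <- mtcars
set.seed(1)  # arbitrary seed, only for reproducibility

n <- nrow(dat)
# Draw floor(0.7 * n) distinct row indices from ALL rows 1..n.
train_idx <- sample(n, size = floor(0.7 * n))

train <- dat[train_idx, ]   # 70% of the rows
test  <- dat[-train_idx, ]  # the remaining 30%
```

With this form the training rows can come from anywhere in the data, and the test set is simply every row that was not drawn.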
The dataset has 777 observations: 543 in the training set and 234 in the test set. There are 18 variables; 17 are numeric and 1 is a yes/no factor (this doesn't need to be changed).
When I run the code below, I get this error message: Error in s$which[id, , drop = FALSE] : subscript out of bounds
regfit.full <- regsubsets(Apps ~ ., data = collegetrain, nvmax = 20)
train.mat <- model.matrix(Apps ~ ., data = collegetrain, nvmax = 20)
val.errors <- rep(NA, 20)
for (i in 1:17) {
  coefi <- coef(regfit.full, id = i)
  pred <- train.mat[, names(coefi)] %*% coefi
  val.errors[i] <- mean((pred - collegetrain$Apps)^2)
}
plot(val.errors, xlab = "Number of predictors", ylab = "Training MSE",
     pch = 19, type = "b")
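A possible way to avoid hard-coding the number of models: as far as I can tell, `coef()` on a `regsubsets` fit indexes `summary(fit)$which` by `id`, so it raises this kind of "subscript out of bounds" error whenever `id` is larger than the number of models that were actually fitted. The sketch below takes the loop bound from that matrix instead of a fixed number. It assumes the `leaps` package is installed and uses `mtcars` (response `mpg`) as a stand-in for `collegetrain`; `regfit`, `n.models` and `train.mse` are illustrative names.

```r
library(leaps)  # provides regsubsets(); assumed to be installed

train <- mtcars  # stand-in for collegetrain (assumption)
regfit <- regsubsets(mpg ~ ., data = train, nvmax = ncol(train) - 1)

# One row of `which` per fitted model, so this is the largest valid id.
n.models <- nrow(summary(regfit)$which)

train.mat <- model.matrix(mpg ~ ., data = train)
train.mse <- rep(NA_real_, n.models)
for (i in seq_len(n.models)) {
  coefi <- coef(regfit, id = i)
  # Keep only the columns used by model i, then form fitted values.
  pred <- train.mat[, names(coefi), drop = FALSE] %*% coefi
  train.mse[i] <- mean((pred - train$mpg)^2)
}

plot(train.mse, xlab = "Number of predictors", ylab = "Training MSE",
     pch = 19, type = "b")
```

As a cross-check, the same training MSE values should also be recoverable without a loop as `summary(regfit)$rss / nrow(train)`.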