I am trying to calculate the MSE for multiple random forests created by varying the mtry, nodesize, and ntree parameters. I pass these as arguments to the randomForest function inside three nested "for" loops, using the loop indices as the parameter values. I want to store the resulting MSE values in a one-dimensional array and compare them. My problem is the last line of the loop body, where I try to collect the 729 MSE values next to each other in a vector. How can I store them from within a nested loop like the one below?
set.seed(425)
toyota_idx = sample(1:nrow(ToyotaCorolla), nrow(ToyotaCorolla)*0.7)
toyota_train = ToyotaCorolla[toyota_idx,]
toyota_test = ToyotaCorolla[-toyota_idx,]
## random forest
forest.mse = rep(0, 729)
for (i in 1:9){
  for (j in 1:9){
    for (k in 1:9){
      bag.toyota = randomForest(Price~., data=toyota_train, mtry=i, nodesize=j, ntree=k, importance=TRUE)
      toyota.prediction = predict(bag.toyota, newdata=toyota_test)
      forest.mse <- c(forest.mse, mean((toyota.prediction-toyota_test$Price)^2))
    }
  }
}
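With a flat vector, it's going to be a real pain to work out afterwards which element belongs to which combination of i, j, k. Also note that c(forest.mse, ...) appends each new value after the 729 zeros you preallocated instead of overwriting them, so you end up with a vector of length 1458.
Try making a data.frame with your mtry, nodesize, etc., and slot in the MSE per row: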
library(randomForest)
set.seed(425)
# simulated stand-in for the ToyotaCorolla data: a response plus 10 predictors
ToyotaCorolla = data.frame(Price = runif(100), matrix(rnorm(100*10), ncol=10))
toyota_idx = sample(1:nrow(ToyotaCorolla), nrow(ToyotaCorolla)*0.7)
toyota_train = ToyotaCorolla[toyota_idx,]
toyota_test = ToyotaCorolla[-toyota_idx,]
## random forest
# one row per parameter combination (9 x 9 x 9 = 729 rows)
Grid = expand.grid(mtry=1:9, nodesize=1:9, ntree=1:9)
Grid$forest.mse = NA
for(i in 1:nrow(Grid)){
  bag.toyota = randomForest(Price~., data=toyota_train,
    mtry=Grid$mtry[i], nodesize=Grid$nodesize[i], ntree=Grid$ntree[i], importance=TRUE)
  toyota.prediction = predict(bag.toyota, newdata=toyota_test)
  # store the test MSE in the row holding the parameters that produced it
  Grid$forest.mse[i] = mean((toyota.prediction-toyota_test$Price)^2)
}
head(Grid)
  mtry nodesize ntree forest.mse
1    1        1     1  0.1431115
2    2        1     1  0.1652446
3    3        1     1  0.2253738
4    4        1     1  0.1352773
5    5        1     1  0.1561385
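Because every MSE sits in the same row as the parameters that produced it, comparing the results comes down to ordinary data.frame indexing. Here is a minimal sketch of that step. To keep it runnable without the randomForest package, it fills the grid with a made-up function, fake_mse, which is purely hypothetical and only stands in for the fit/predict/MSE code above. The names best and top5 are also just illustrative.

```r
# Hypothetical stand-in for "fit a forest and compute the test MSE".
# In practice, replace this with the randomForest/predict code above.
fake_mse <- function(mtry, nodesize, ntree) {
  (mtry - 4)^2 + (nodesize - 2)^2 + 1 / ntree
}

# One row per parameter combination, plus an empty column for the result
Grid <- expand.grid(mtry = 1:9, nodesize = 1:9, ntree = 1:9)
Grid$forest.mse <- NA_real_

for (i in seq_len(nrow(Grid))) {
  Grid$forest.mse[i] <- fake_mse(Grid$mtry[i], Grid$nodesize[i], Grid$ntree[i])
}

# Row with the smallest MSE, parameters included
best <- Grid[which.min(Grid$forest.mse), ]

# Five best combinations, smallest MSE first
top5 <- head(Grid[order(Grid$forest.mse), ], 5)
```

With the real forests in place of fake_mse, best gives you the mtry, nodesize, and ntree of the lowest test MSE directly, with no index arithmetic needed.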