Tags: r, loops, vector, random-forest, mse

Nested Loops for Calculating MSE of Random Forest


I am trying to calculate the MSE for multiple random forests built by varying the mtry, nodesize, and ntree parameters. I pass these parameters to the randomForest function as variables and iterate over them with three nested "for" loops. I want to store the resulting MSE values in a one-dimensional array and compare them. My problem is the last line inside the loops, where I try to place the 729 MSE values next to each other in the array. How can I store them from within a nested loop like the one below?

set.seed(425)
toyota_idx =sample(1:nrow(ToyotaCorolla),nrow(ToyotaCorolla)*0.7)
toyota_train = ToyotaCorolla[toyota_idx,]
toyota_test=ToyotaCorolla[-toyota_idx,]

##random forest
forest.mse=rep(0,729)

for (i in 1:9){
  for (j in 1:9){
    for (k in 1:9){
      bag.toyota=randomForest(Price~.,data=toyota_train,mtry=i,nodesize=j,ntree=k,importance =TRUE)
      toyota.prediction = predict(bag.toyota ,newdata=toyota_test)
      forest.mse <- c(forest.mse, mean((toyota.prediction-toyota_test$Price)^2))
    }
  }
}

Solution

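  • With a flat vector, it is going to be painful to work out afterwards which element belongs to which i, j, k. Note also that c(forest.mse, ...) appends to the 729 zeros you preallocated instead of overwriting them, so the vector ends up with 1458 elements.

    Try making a data.frame with your mtry, nodesize, etc. and slot in the MSE per row: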

    library(randomForest)
    
    set.seed(425)
    # simulated stand-in for the ToyotaCorolla data: a response plus 10 numeric predictors
    ToyotaCorolla = data.frame(Price = runif(100),matrix(rnorm(100*10),ncol=10))
    
    toyota_idx =sample(1:nrow(ToyotaCorolla),nrow(ToyotaCorolla)*0.7)
    toyota_train = ToyotaCorolla[toyota_idx,]
    toyota_test=ToyotaCorolla[-toyota_idx,]
    
    ##random forest
    # one row per parameter combination (9*9*9 = 729 rows)
    Grid = expand.grid(mtry=1:9,nodesize=1:9,ntree=1:9)
    Grid$forest.mse = NA
    
    for(i in 1:nrow(Grid)){
      # fit one forest with the parameters from row i
      bag.toyota=randomForest(Price~.,data=toyota_train,
        mtry=Grid$mtry[i],nodesize=Grid$nodesize[i],ntree=Grid$ntree[i],importance =TRUE)
      toyota.prediction = predict(bag.toyota ,newdata=toyota_test)
      # test-set MSE, stored next to the parameters that produced it
      Grid$forest.mse[i] = mean((toyota.prediction-toyota_test$Price)^2)
    }
    
    head(Grid)
      mtry nodesize ntree forest.mse
    1    1        1     1  0.1431115
    2    2        1     1  0.1652446
    3    3        1     1  0.2253738
    4    4        1     1  0.1352773
    5    5        1     1  0.1561385
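
If you would rather keep the three nested loops, one possible approach is to fill a preallocated vector through a running counter and attach the parameters afterwards. The sketch below is base R only and is not a definitive implementation. The score() function is a hypothetical, deterministic placeholder for "fit a forest and return its test MSE", so that the indexing can be checked without the randomForest package. In real use you would swap in the randomForest and predict calls.

```r
# Hypothetical placeholder for the fit-and-evaluate step (NOT a real MSE):
# smallest at mtry = 3, nodesize = 5 and the largest ntree.
score <- function(mtry, nodesize, ntree) {
  (mtry - 3)^2 + (nodesize - 5)^2 + 1 / ntree
}

n <- 9
forest.mse <- numeric(n^3)  # preallocate 729 slots
idx <- 0                    # running position in forest.mse

for (i in 1:n) {            # mtry
  for (j in 1:n) {          # nodesize
    for (k in 1:n) {        # ntree
      idx <- idx + 1
      forest.mse[idx] <- score(i, j, k)  # overwrite slot idx, do not append
    }
  }
}

# expand.grid varies its first argument fastest, so list the innermost
# loop variable (ntree) first to match the order in which the loops ran.
Grid <- expand.grid(ntree = 1:n, nodesize = 1:n, mtry = 1:n)
Grid$forest.mse <- forest.mse

# Row with the smallest stored value
best <- Grid[which.min(Grid$forest.mse), ]
print(best)
```

The argument order in expand.grid matters here. If it does not mirror the loop nesting, the MSE values end up paired with the wrong parameters, which is the main reason the single loop over Grid rows is less error-prone.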