Tags: r, loops, machine-learning, decision-tree

Is there a way to calculate training and test error multiple times using a loop?


I am trying to prune a decision tree to create 19 trees that have 2-20 terminal nodes, and I would like to calculate the training and test error for each. I used this code:

range <- c(2:20)

for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  
  plot(prune.fit) # all the plots :) 
  text(prune.fit, pretty = 0)
}

which generated the trees as expected, but it stopped working once I added the training and test error calculations. I then tried this:

for (i in range) {
    pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
    mean((pred.fittrain - my_ahp_train$sale_price)^2)
    
    pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
    mean((pred.fittest - my_ahp_test$sale_price)^2)
}

but it just gave me an error. I don't know how to fix this so that it calculates for each individual tree. If anyone has any tips please let me know!

For the training and test error calculation I tried the following code:

range <- c(2:20)

for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  
  plot(prune.fit) # all the plots :) 
  text(prune.fit, pretty = 0)

  pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)

  pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}

AND

range <- c(2:20)

for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  
  plot(prune.fit) # all the plots :) 
  text(prune.fit, pretty = 0)

  pred.fittrain <- predict(prune.fit, newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)

  pred.fittest <- predict(prune.fit, newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}

AND

for (i in range) {
    pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
    mean((pred.fittrain - my_ahp_train$sale_price)^2)
    
    pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
    mean((pred.fittest - my_ahp_test$sale_price)^2)
}

I was expecting one of these to generate the training and test errors for each decision tree.


Solution

  • It's hard to answer without knowing which packages are used and without any data, but the following code might be a step forward; see if it makes sense:

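    library(magrittr)  # provides the %>% pipe used at the end

    # Build one row of results per tree size, then bind the rows into a matrix
    results <- lapply(2:20, function(i) {
      # prune.fit is a single tree here, so it is passed to predict() directly
      prune.fit <- prune.tree(fit, best = i)

      # training error (MSE)
      prediction_train <- predict(prune.fit, newdata = my_ahp_train)
      mse_train <- mean((prediction_train - my_ahp_train$sale_price)^2)

      # test error (MSE)
      prediction_test <- predict(prune.fit, newdata = my_ahp_test)
      mse_test <- mean((prediction_test - my_ahp_test$sale_price)^2)

      c(i = i, mse_train = mse_train, mse_test = mse_test)
    }) %>% do.call(rbind, .)

    results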
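
If you would rather keep a `for` loop, two things in the attempts above are worth noting. Inside a `for` loop R does not auto-print the value of a bare expression, so the `mean(...)` lines show nothing even when they run. And `prune.fit[i]` subsets the single tree object rather than selecting the i-th tree. Below is a minimal sketch of a loop that stores each error in a preallocated data frame. It assumes `prune.tree()` comes from the `tree` package, and it uses simulated stand-in data (`dat`, `my_train`, `my_test` are hypothetical names) because the original data is not available. To avoid asking for a size that the fitted tree cannot provide, it only loops over sizes that appear in the cost-complexity sequence.

```r
library(tree)  # assumption: prune.tree() in the question is from the 'tree' package

set.seed(1)

# Hypothetical stand-in data; replace with my_ahp_train / my_ahp_test
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$sale_price <- 100 + 50 * dat$x1 + 30 * (dat$x2 > 0.5) + rnorm(n, sd = 5)
train_idx <- sample(n, 300)
my_train <- dat[train_idx, ]
my_test  <- dat[-train_idx, ]

fit <- tree(sale_price ~ ., data = my_train)

# Sizes available in the pruning sequence, excluding the single-node tree
sizes <- sort(unique(prune.tree(fit)$size))
sizes <- sizes[sizes >= 2]

# Preallocate one row per tree size
errors <- data.frame(size = sizes, mse_train = NA_real_, mse_test = NA_real_)

for (j in seq_along(sizes)) {
  pruned <- prune.tree(fit, best = sizes[j])

  pred_train <- predict(pruned, newdata = my_train)
  pred_test  <- predict(pruned, newdata = my_test)

  # Store the errors explicitly; a bare mean(...) would be discarded here
  errors$mse_train[j] <- mean((pred_train - my_train$sale_price)^2)
  errors$mse_test[j]  <- mean((pred_test - my_test$sale_price)^2)
}

print(errors)
```

With your own data, swap in `my_ahp_train` and `my_ahp_test` and keep the existing `fit`. The resulting `errors` data frame can then be plotted, for example `plot(errors$size, errors$mse_test, type = "b")`, to compare tree sizes.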