I am trying to prune a decision tree to create 19 trees that have 2-20 terminal nodes, and I would like to calculate the training and test error for each. I used this code:
range <- c(2:20)
for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  plot(prune.fit) # all the plots :)
  text(prune.fit, pretty = 0)
}
which worked well to generate the trees, but when I added in the training and test error it wouldn't work. I then tried this:
for (i in range) {
  pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)
  pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}
but it just gave me an error. I don't know how to fix this so that it calculates for each individual tree. If anyone has any tips please let me know!
For the training and test error calculation, I tried the following versions of the code:
range <- c(2:20)
for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  plot(prune.fit) # all the plots :)
  text(prune.fit, pretty = 0)
  pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)
  pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}
AND
range <- c(2:20)
for (i in range) {
  prune.fit <- prune.tree(fit, best = i)
  plot(prune.fit) # all the plots :)
  text(prune.fit, pretty = 0)
  pred.fittrain <- predict(prune.fit, newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)
  pred.fittest <- predict(prune.fit, newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}
AND
for (i in range) {
  pred.fittrain[i] <- predict(prune.fit[i], newdata = my_ahp_train)
  mean((pred.fittrain - my_ahp_train$sale_price)^2)
  pred.fittest[i] <- predict(prune.fit[i], newdata = my_ahp_test)
  mean((pred.fittest - my_ahp_test$sale_price)^2)
}
I was expecting one of these to generate the training and test errors for each decision tree.
It's hard to answer without knowing which packages are used and without any data, but the following code might be a step forward. See if it makes sense:
library(magrittr) # provides the %>% pipe used at the end
lapply(2:20, function(i) {
  prune.fit <- prune.tree(fit, best = i)
  # train
  prediction_train <- predict(prune.fit, newdata = my_ahp_train)
  mse_train <- mean((prediction_train - my_ahp_train$sale_price)^2)
  # repeat the same for test
  prediction_test <- predict(prune.fit, newdata = my_ahp_test)
  mse_test <- mean((prediction_test - my_ahp_test$sale_price)^2)
  # one named row per tree size
  c(i = i, mse_train = mse_train, mse_test = mse_test)
}) %>% do.call(rbind, .) # bind the rows into a single matrix
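If you would rather keep a for loop, here is a minimal sketch of the same idea. Three things in the attempts from the question look like the likely culprits: `pred.fittrain[i] <- ...` assigns into an object that was never created, `prune.fit[i]` indexes into the components of a single tree object rather than picking "the i-th tree", and a bare `mean(...)` inside a for loop is not printed automatically, so its value has to be stored or printed explicitly. The sketch assumes `prune.tree()` comes from the `tree` package, and since no data was posted it simulates a stand-in data set; the columns `x1`, `x2`, `x3` and the `tree.control()` setting are my own placeholders, not anything from your data.

```r
library(tree)  # assumption: prune.tree() in the question comes from this package

# Simulated stand-in data; the real my_ahp_train / my_ahp_test were not posted
set.seed(1)
n <- 400
sim <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
sim$sale_price <- 100 * sim$x1 + 50 * sim$x2^2 + rnorm(n, sd = 5)
train_idx <- sample(n, 300)
my_ahp_train <- sim[train_idx, ]
my_ahp_test  <- sim[-train_idx, ]

# Grow a fairly large tree so that pruning has something to remove
fit <- tree(sale_price ~ ., data = my_ahp_train,
            control = tree.control(nobs = nrow(my_ahp_train), mindev = 0.001))

# Preallocate one row per requested tree size
sizes <- 2:20
errors <- data.frame(size = sizes, mse_train = NA_real_, mse_test = NA_real_)

for (k in seq_along(sizes)) {
  # prune.fit is a single tree object: use it directly, not prune.fit[i]
  prune.fit <- prune.tree(fit, best = sizes[k])
  pred_train <- predict(prune.fit, newdata = my_ahp_train)
  pred_test  <- predict(prune.fit, newdata = my_ahp_test)
  # Store the results; values computed inside a for loop are not auto-printed
  errors$mse_train[k] <- mean((pred_train - my_ahp_train$sale_price)^2)
  errors$mse_test[k]  <- mean((pred_test  - my_ahp_test$sale_price)^2)
}

print(errors)
```

With your own objects you would drop the simulation part and keep only the loop. Note that if the pruning sequence has no tree of exactly the requested size, `prune.tree()` may return a tree of a different size (and can warn if `best` exceeds the size of the full tree), so some rows can end up identical.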