In the tree package, we can use the following code to choose the number of terminal nodes:
library(tree)
tree.model = tree(...)
tree.prune = prune.tree(tree.model, best = 20)
This returns a new tree with 20 terminal nodes.
In the rpart package, the following code can be used for this:
library(rpart)
rpart.model = rpart(...)
rpart.prune = prune.rpart(rpart.model, cp = ?)
Here cp is the cost-complexity parameter, but I want an argument similar to best in prune.tree.
The rpart package doesn't have an argument similar to the best argument of the tree package. The tree package was developed to cover functionality that rpart was missing.
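To choose an appropriate number of nodes, you can tune other parameters in rpart. For example:
library(rpart)
# minsplit/minbucket limit how far the tree is grown; xval is the number of CV folds.
# minbucket must be a number here: minsplit is not a variable in the calling environment.
fit.control <- rpart.control(minsplit = 20, minbucket = round(20/3), xval = 10)
# Name the arguments: the third positional argument of rpart() is weights, not method.
rpart.model <- rpart(formula, data = data, method = method, control = fit.control)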
Then inspect the cross-validated error (xerror) against cp, e.g. with printcp() or plotcp(), to choose a cp value. You can also tune cp automatically using the caret package. For example:
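library(caret)
# 10-fold cross-validation repeated 5 times; for method = "rpart", train() searches over cp
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
model <- train(x = train_data,
               y = labels,
               method = "rpart",
               trControl = ctrl)
model$bestTune  # the selected cp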
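If you specifically want a fixed number of terminal nodes, a fitted rpart object already stores its cost-complexity sequence in fit$cptable, where nsplit + 1 is the number of terminal nodes of each subtree. The sketch below emulates prune.tree(..., best = k) by reading a cp value off that table. prune_to_leaves is a hypothetical helper, not part of rpart. It assumes the tree was grown large enough (small cp in rpart.control) for the requested size to appear in the table, and it follows prune.tree's documented rule, as I understand it, that when no subtree of the requested size exists the next larger one is returned. Rather than relying on exact floating-point equality with a table entry, it prunes at a cp strictly inside the interval where the chosen subtree is optimal (the geometric mean of adjacent CP values).

```r
library(rpart)

# Hypothetical helper (not part of rpart): emulate prune.tree(..., best = best)
# by choosing a cp from the fitted tree's cost-complexity table.
prune_to_leaves <- function(fit, best) {
  cpt <- fit$cptable
  leaves <- cpt[, "nsplit"] + 1  # terminal nodes = splits + 1
  # Smallest subtree with at least `best` leaves; rows are ordered by size.
  idx <- which(leaves >= best)
  # Nothing large enough, or the largest subtree requested: keep the full tree.
  if (length(idx) == 0 || idx[1] == nrow(cpt)) return(fit)
  row <- idx[1]
  # Row `row` is optimal for cp in [CP[row], CP[row - 1]); pick a point inside.
  cp_val <- if (row == 1) 2 * cpt[1, "CP"] else sqrt(cpt[row, "CP"] * cpt[row - 1, "CP"])
  prune(fit, cp = cp_val)
}

# Usage with the kyphosis data shipped with rpart; cp = 0 grows a larger tree
# so that more subtree sizes are available in the cptable.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0))
pruned <- prune_to_leaves(fit, best = 3)
print(pruned)

# Alternatively, let cross-validation pick the size (xerror is present
# because rpart.control's default is xval = 10).
cv_size <- fit$cptable[which.min(fit$cptable[, "xerror"]), "nsplit"] + 1
cv_tree <- prune_to_leaves(fit, best = cv_size)
```

Working from the cptable keeps the choice on the same nested sequence of subtrees that prune.rpart itself uses, so the result is always one of the trees you would get from some cp value.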