
How can I choose the number of nodes in rpart?


In the tree package, we can use the following code to choose the number of terminal nodes:

tree.model = tree(...)
tree.prune = prune.tree(tree.model, best = 20)

This code returns a new tree with 20 terminal nodes.

In the rpart package, the following code can be used for pruning:

rpart.model = rpart(...)
rpart.prune = prune.rpart(rpart.model, cp =?)

Here cp is the cost-complexity parameter, but I want something similar to the best argument of prune.tree.


Solution

  • The rpart package has no argument equivalent to best in the tree package. The tree package was developed to cover functionality that rpart was missing.

    To choose an appropriate number of nodes, you can tune other parameters in rpart. For example:

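    library(rpart)
    # minbucket defaults to round(minsplit / 3) inside rpart.control, so it is not set here
    prune.control <- rpart.control(minsplit = 20, xval = 10)
    rpart.model <- rpart(formula, data = data, method = method, control = prune.control)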
    

    Then examine the cross-validated error against cp (for example with printcp() or plotcp()) to choose a cp value. You can also tune the cp value automatically using the caret package. For example:

    library(caret)
    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

    # With method = "rpart", train() tunes cp over a grid of candidate values
    model <- train(x = train_data,
                   y = labels,
                   method = "rpart",
                   trControl = ctrl)
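If the goal is specifically a tree with about N terminal nodes, one possible workaround is to read the cp value off the fitted model's cptable, since each row's nsplit + 1 is the number of terminal nodes of the corresponding subtree. The sketch below is only an illustration under assumptions: prune_to_leaves is a hypothetical helper (not part of rpart), and the built-in iris data stands in for your own. Because the cp table can skip some sizes, the result has at most best leaves, not necessarily exactly that many.

```r
library(rpart)

# Hypothetical helper (not part of rpart): prune to at most `best` leaves.
prune_to_leaves <- function(fit, best) {
  cptab <- fit$cptable
  # Rows of the cp table whose subtree has at most `best` terminal nodes
  ok <- which(cptab[, "nsplit"] + 1 <= best)
  # Take the largest such subtree and prune at its cp value
  row <- ok[which.max(cptab[ok, "nsplit"])]
  prune(fit, cp = cptab[row, "CP"])
}

# Grow a deliberately large tree first (cp = 0), then prune it back
full <- rpart(Species ~ ., data = iris, method = "class",
              control = rpart.control(cp = 0, minsplit = 2))
small <- prune_to_leaves(full, best = 3)

# Count terminal nodes of the pruned tree
n_leaves <- sum(small$frame$var == "<leaf>")
```

The initial tree has to be grown large enough (small cp, small minsplit) for the requested size to appear in the cp table at all; otherwise the helper simply returns the largest subtree available.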