
Make overfitting tree with maximum depth using ctree


When plotting a ctree model from partykit, I understand that it chooses defaults that prevent overfitting with overgrown trees. These defaults sometimes result in an overly simple tree. To use a post-pruning technique, I want to grow an overfitted, potentially full-grown tree with ctree and then work on the pruning later. I have tried many different things, but my code throws an error.

This Stack Overflow answer on using all variables to build the tree is not what I want. I don't necessarily want all variables; I want the tree to reach its maximum depth and be as overgrown as possible.

Basically, how do I make the tree grow as deep as possible?

See code and output below:

library(partykit)
treemodel <- ctree(Species ~ ., iris)
plot(treemodel)

I read the help pages and the package documentation but don't see many options to customize this. The most promising one is the control parameter, but the documentation isn't very detailed. After searching other forums, I gave the following a try:

treemodel <- ctree(Species ~ ., iris, control=mincriterion)

I also tried:

treemodel <- ctree(Species ~ ., iris, control="mincriterion")

But both attempts throw an error:

Error in if (sum(weights) < ctrl$minsplit) return(partynode(as.integer(id))) : argument is of length zero

I am using partykit 1.1-1 and R on macOS.


Solution

  • ctree from partykit accepts, through its control argument, a list of settings created by ctree_control(), which you can use to control aspects of the tree fit.

    Passing control=mincriterion or control="mincriterion" is not correct, hence the error. control expects the list of control parameters returned by ctree_control(), not a bare name or a character value.
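As a minimal sketch of what control expects (assuming partykit is installed; the variable name ctrl is only for illustration), ctree_control() builds the list of settings, and that list is what goes into control:

```r
library(partykit)

# ctree_control() returns a list of tuning settings; arguments that are
# not supplied keep their defaults.
ctrl <- ctree_control(mincriterion = 0.8)

is.list(ctrl)   # the control object is a list, not a character string
ctrl$minsplit   # the element the error message in the question refers to

# Pass the list itself (not a name or a string) to ctree().
treemodel <- ctree(Species ~ ., data = iris, control = ctrl)
```

Building the control object separately like this also makes it easy to reuse the same settings across several fits.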

    In particular, you want to pass into ctree_control the following:

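    • mincriterion: acts as a "regulator" for the depth of the tree; smaller values result in larger trees. It is the value that 1 - p-value must exceed for a split, so with mincriterion = 0.8 the p-value must be smaller than 0.2 for a node to split.
    • minsplit and minbucket: the minimum number of observations (sum of weights) a node needs to be considered for splitting, and the minimum number in a terminal node. Set both to 0 so these size requirements are always met and never stop the splitting.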
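To see how mincriterion regulates tree size, here is a small sketch on the iris data from the question. The names thresholds and n_leaves are made up for this illustration, and minsplit = 2 with minbucket = 1 are assumed small values chosen for the sketch rather than the 0 suggested above:

```r
library(partykit)

# Candidate thresholds, from strict to very permissive.
thresholds <- c(0.95, 0.8, 0.5, 0.005)

# Fit one tree per threshold and count its terminal nodes with width().
# minsplit = 2 and minbucket = 1 are illustrative small values.
n_leaves <- sapply(thresholds, function(mc) {
  fit <- ctree(Species ~ ., data = iris,
               control = ctree_control(mincriterion = mc,
                                       minsplit = 2L, minbucket = 1L))
  width(fit)
})

# One row per threshold with the resulting number of terminal nodes.
data.frame(mincriterion = thresholds, terminal_nodes = n_leaves)
```

The number of terminal nodes should not shrink as mincriterion is lowered, because a lower threshold only lets more candidate splits through.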

    From the package authors themselves:

    A split is implemented when the criterion exceeds the value given by mincriterion as specified in ctree_control. For example, when mincriterion = 0.95, the p-value must be smaller than 0.05 in order to split this node. This statistical approach ensures that the right-sized tree is grown without additional (post-)pruning or cross-validation

    So with that, the final code using control=ctree_control():

    diab_model <- ctree(diabetes ~ ., diab_train, control = ctree_control(mincriterion=0.005, minsplit=0, minbucket=0))
    plot(diab_model)
    

    The first line creates the decision tree with the defaults overridden, and the second line plots the ctree object (diab_train and diabetes stand in for your own data and response). With these settings you get a heavily overgrown tree that splits as deep as the data allow. Experiment with the values of mincriterion, minsplit, and minbucket; they can also be treated as hyperparameters. Here's the output of plot(diab_model):

    [Image: plot of diab_model]
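For the iris data from the question, a comparable sketch might look like the following. The object names default_tree and overgrown_tree are illustrative, and minsplit = 2 with minbucket = 1 are assumptions of this sketch (small values that still leave an observation on each side of a split), not the settings used above:

```r
library(partykit)

# Tree grown with the package defaults.
default_tree <- ctree(Species ~ ., data = iris)

# Tree grown with a very permissive threshold and tiny node sizes.
# maxdepth is left at its default (Inf), so the depth is not capped.
overgrown_tree <- ctree(
  Species ~ ., data = iris,
  control = ctree_control(mincriterion = 0.005, minsplit = 2L, minbucket = 1L)
)

# Compare the number of terminal nodes and the depth of both trees.
c(default = width(default_tree), overgrown = width(overgrown_tree))
c(default = depth(default_tree), overgrown = depth(overgrown_tree))

plot(overgrown_tree)
```

Comparing width() and depth() before plotting is a quick way to confirm that the relaxed settings actually produced a larger tree to prune later.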