
format split labels in rpart.plot


I am plotting a tree with rpart.plot::prp(), much like:

library("rpart.plot")
data("ptitanic")
data <- ptitanic
data$sibsp <- as.integer(data$sibsp) # just to show that these are integers
data$age <- as.integer(data$age) # just to show that these are integers
tree <- rpart(survived~., data=data, cp=.02)
prp(tree, fallen.leaves = FALSE, type = 4, extra = 1, varlen = 0, faclen = 0, yesno.yshift = -1)


Even though certain variables are integers (age and sibsp), rpart places the split point halfway between two integer values, which looks arbitrary and confuses the viewer. Nobody has 2.5 siblings/spouses aboard; the logical split is sibsp >= 3.

I have looked at split.fun in this excellent tutorial and ?prp. Other than using a regex to capture the number, format it properly, and replace it in the label string, I can't think of any solutions within prp.
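For reference, the regex idea could be sketched roughly as follows. This is only an illustration under assumptions: `round_int_splits` is a hypothetical helper (not part of rpart.plot), the labels are assumed to look like `sibsp >= 2.5` or `sibsp < 2.5`, and the integer-valued variables have to be named by hand. For integer data, both `x >= 2.5` and `x < 2.5` describe the same partition as `x >= 3` and `x < 3`, so rounding the threshold up is safe in both cases.

```r
# Hypothetical helper (not part of rpart.plot): for variables known to hold
# only integers, rewrite "var >= 2.5" / "var < 2.5" as "var >= 3" / "var < 3".
round_int_splits <- function(labs, int_vars) {
    # capture groups: variable name, operator, numeric threshold
    pattern <- "^(.*[^<>= ]) *(>=|<) *(-?[0-9]+\\.?[0-9]*)$"
    for (i in seq_along(labs)) {
        m <- regmatches(labs[i], regexec(pattern, labs[i]))[[1]]
        # m is empty when the label does not match (e.g. factor splits)
        if (length(m) == 4 && m[2] %in% int_vars) {
            labs[i] <- paste(m[2], m[3], ceiling(as.numeric(m[4])))
        }
    }
    labs
}

# Wrapper with the argument list that prp() passes to split.fun
split.fun <- function(x, labs, digits, varlen, faclen) {
    round_int_splits(labs, int_vars = c("sibsp", "age"))
}

round_int_splits(c("sibsp >= 2.5", "sibsp < 2.5", "sex = male"),
                 int_vars = c("sibsp", "age"))
```

The wrapper would then be passed as `split.fun = split.fun` in the `prp()` call above. Labels that do not match the assumed pattern are returned unchanged.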

A workaround I am considering is to pass a modified tree (object of class rpart) where the contents have been rounded. Is it possible to do this by modifying tree$splits?
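A rough sketch of that workaround, assuming the usual layout of an rpart object, where `tree$splits` is a matrix with one row per split (row names are the variable names) and an `index` column holding the numeric split point. `round_split_points` is a hypothetical helper, and whether prp picks up the modified values for its labels should be checked on the actual plot.

```r
# Hypothetical helper: round split points up for variables known to be
# integer-valued. Assumes tree$splits has variable names as row names and
# an "index" column with the numeric split point.
round_split_points <- function(tree, int_vars) {
    rows <- rownames(tree$splits) %in% int_vars
    tree$splits[rows, "index"] <- ceiling(tree$splits[rows, "index"])
    tree
}

# Usage with the tree from above (not run here):
# tree2 <- round_split_points(tree, c("sibsp", "age"))
# prp(tree2, fallen.leaves = FALSE, type = 4, extra = 1, varlen = 0, faclen = 0)
```

Note that this also rounds any competitor and surrogate splits on the named variables, and it only leaves the partition unchanged if those variables really contain nothing but integers.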

Any other ideas?


Solution

  • Version 3.0.0 of the rpart.plot package (July 2018) treats predictors with integer values specially to automatically get the results you want.

    So rpart.plot now automatically prints sibsp >= 3 instead of sibsp >= 2.5, since it sees that in the training data all values of sibsp are integral.

    Section 4.1 of the vignette for the rpart.plot package has an example.
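To see whether a predictor would count as "integral" in this sense, and whether the installed package is new enough, something like the following could be used. This is only an illustration of the idea; `is_integral` is a hypothetical helper, the vectors are made-up values, and it is not necessarily how rpart.plot performs the check internally.

```r
# Hypothetical helper: TRUE if every non-missing value is a whole number
is_integral <- function(x) {
    is.numeric(x) && all(x == round(x), na.rm = TRUE)
}

sibsp_like <- c(0, 1, 1, 3, NA, 8)    # made-up values, whole numbers only
fare_like  <- c(7.25, 71.28, 8.05)    # made-up values with fractions

is_integral(sibsp_like)   # TRUE
is_integral(fare_like)    # FALSE

# Check the installed version, if the package is available
if (requireNamespace("rpart.plot", quietly = TRUE)) {
    print(utils::packageVersion("rpart.plot") >= "3.0.0")
}
```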