I am plotting a tree with rpart.plot::prp(), much like:
library("rpart.plot")
data("ptitanic")
data <- ptitanic
data$sibsp <- as.integer(data$sibsp) # just to show that these are integers
data$age <- as.integer(data$age) # just to show that these are integers
tree <- rpart(survived~., data=data, cp=.02)
prp(tree, fallen.leaves = FALSE, type=4, extra=1, varlen=0, faclen=0, yesno.yshift=-1)
Even though certain variables are integers (age and sibsp), rpart creates a seemingly arbitrary split point, which confuses the viewer. Nobody has 2.5 siblings/spouses aboard; the logical split is sibsp >= 3.
I have looked at split.fun in this excellent tutorial and ?prp. Other than using a regex to capture the number, format it properly, and replace it in the label string, I can't think of any solutions within prp.
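For reference, the regex route might look roughly like the sketch below. It assumes the labels reach split.fun as strings of the form "sibsp >= 2.5" or "< 2.5" (one space after the operator), and that every numeric predictor in the tree is integer-valued, since it rounds every fractional threshold up. ceil_thresholds is a hypothetical helper name, not part of rpart.plot.

```r
# Hypothetical helper: round fractional thresholds up to the next integer.
# For integer-valued x, "x >= 2.5" is equivalent to "x >= 3" and
# "x < 2.5" is equivalent to "x < 3", so the meaning is unchanged.
ceil_thresholds <- function(labs) {
  # a decimal number directly preceded by ">= " or "< " (assumed label format)
  pattern <- "(?<=>= |< )-?[0-9]+\\.[0-9]+"
  m <- gregexpr(pattern, labs, perl = TRUE)
  regmatches(labs, m) <- lapply(regmatches(labs, m), function(nums) {
    format(ceiling(as.numeric(nums)), scientific = FALSE, trim = TRUE)
  })
  labs
}

# Wrapper with the argument list that prp passes to split.fun
split.fun <- function(x, labs, digits, varlen, faclen) ceil_thresholds(labs)

# Usage, with the tree from above:
# prp(tree, type = 4, extra = 1, split.fun = split.fun)
```

This only touches the label text, so the fitted tree itself is unchanged. It would be wrong for a genuinely continuous predictor, which is why it feels fragile.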
A workaround I am considering is to pass a modified tree (an object of class rpart) whose split points have been rounded. Is it possible to do this by modifying tree$splits?
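Concretely, the edit I have in mind is sketched below. It relies on the documented layout of the splits matrix in an rpart object (columns count, ncat, improve, index, adj, with ncat equal to +1 or -1 for a continuous variable and index holding the split point). Whether prp builds its labels from that index column is an assumption I have not verified, and ceil_int_splits is a hypothetical helper name.

```r
# Hypothetical helper: round up the split points of integer-valued predictors
# in a copy of an rpart splits matrix.
ceil_int_splits <- function(splits, int_vars) {
  is_continuous <- abs(splits[, "ncat"]) == 1   # +/-1 marks a continuous split
  is_int_var <- rownames(splits) %in% int_vars  # row names are variable names
  rows <- is_continuous & is_int_var
  splits[rows, "index"] <- ceiling(splits[rows, "index"])
  splits
}

# Usage, with the tree from above (plot the copy, keep the original tree):
# tree2 <- tree
# tree2$splits <- ceil_int_splits(tree$splits, c("sibsp", "age"))
# prp(tree2, type = 4, extra = 1)
```

For integer-valued data, moving a cut from 2.5 to 3 sends every observation the same way as before, so predictions from the modified copy should not change.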
Any other ideas?
Version 3.0.0 of the rpart.plot package (July 2018) treats predictors with integer values specially to automatically get the results you want.
So rpart.plot now automatically prints sibsp >= 3 instead of sibsp >= 2.5, since it sees that in the training data all values of sibsp are integral.
Section 4.1 of the vignette for the rpart.plot package has an example.
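To make "all values are integral" concrete, here is a minimal sketch of such a check. It illustrates the condition only; it is not the package's actual code, and is_integral is a hypothetical name. Note that the test is on the values, not on the storage type, so a column stored as double still qualifies.

```r
# Hypothetical illustration: TRUE when x is numeric and every non-missing
# value is a whole number (the storage type may still be double).
is_integral <- function(x) {
  is.numeric(x) && all(x == floor(x), na.rm = TRUE)
}

is_integral(c(0, 1, 2, NA))  # whole numbers stored as doubles
is_integral(c(0.5, 1))       # contains a fractional value

# To confirm the installed package is new enough for this behaviour:
# packageVersion("rpart.plot") >= "3.0.0"
```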