
Wrong labels in rpart tree


I am running into a labeling issue when using rpart in R.

Here's my situation.

I'm working on a dataset with categorical variables. Here's an extract of my data:

head(Dataset)
Entity  IL  CP  TD  Budget
     2   1   3   2     250
     5   2   2   1     663
     6   1   2   3     526
     2   3   1   2     522

When I plot my decision tree and add the labels using

plot(tree) 
text(tree)

I get the wrong labels: for Entity, I get "abcd".

Why does this happen, and how can I fix it?

Thank you for your help


Solution

  • By default, text.rpart (the text() call that follows plot.rpart) labels the levels of factor variables with letters: the first level is a, the second is b, and so on. Example:

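    library(rpart)
    library(ggplot2)  # only needed for the diamonds data

    data("diamonds")
    df <- diamonds[1:2000, ]

    # color, cut and clarity are all factors
    fit <- rpart(price ~ color + cut + clarity, data = df)
    plot(fit)
    text(fit)  # factor levels appear as a, b, c, ...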
    

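    To see which letter stands for which level, or to keep the base plot and show the full level names, something like the following may help. It is a minimal sketch on made-up data shaped like the question's extract: the values and the names `Dataset`, `tree` and `legend_tbl` are illustrative assumptions, not the asker's real data. It relies on the `xlevels` attribute that rpart stores on the fitted object, and on the `pretty` and `minlength` arguments of `text()` and `labels()` for rpart objects.

    ```r
    library(rpart)

    # Made-up data loosely shaped like the question's extract (illustrative only)
    set.seed(1)
    n <- 200
    Dataset <- data.frame(
      Entity = factor(sample(c(2, 5, 6), n, replace = TRUE)),
      IL     = factor(sample(1:3, n, replace = TRUE))
    )
    # Let Budget depend on Entity so the tree has a split to label
    base_budget <- c("2" = 250, "5" = 650, "6" = 500)
    Dataset$Budget <- unname(base_budget[as.character(Dataset$Entity)]) +
      rnorm(n, sd = 20)

    tree <- rpart(Budget ~ Entity + IL, data = Dataset)

    # Which letter stands for which level: letters follow the order of levels()
    entity_levels <- attr(tree, "xlevels")$Entity
    legend_tbl <- data.frame(letter = letters[seq_along(entity_levels)],
                             level  = entity_levels)
    print(legend_tbl)

    # Split labels as text: default (letters) vs. full level names
    default_labels <- labels(tree)
    full_labels    <- labels(tree, minlength = 0)
    print(full_labels)

    # Same tree, drawn with full level names instead of letters
    plot(tree, margin = 0.1)
    text(tree, pretty = 0)
    ```

    With `pretty = 0` the labels are not abbreviated at all, so trees with many or long level names can get crowded; that is one reason a dedicated plotting package is attractive.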

    In my opinion, instead of customizing this plot, it is easier to use rpart.plot, a package dedicated to plotting rpart trees:

    library(rpart.plot)
    prp(fit)
    


    It has many customization options, for example:

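    prp(fit,
        type = 4,               # separate left/right split labels; label all nodes
        extra = 101,            # show number and percentage of observations per node
        fallen.leaves = TRUE,   # place the leaves at the bottom of the plot
        box.palette = colorRampPalette(c("red", "white", "green3"))(10),  # box fill colors
        round = 2,              # rounding of the node box corners
        branch.lty = 2,         # dashed branch lines
        branch.lwd = 1,         # branch line width
        space = -1,             # horizontal padding of text inside the boxes
        varlen = 0,             # full variable names
        faclen = 0)             # full factor level names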
    


    Another option is:

    library(rattle)
    fancyRpartPlot(fit,
                   type = 4)
    


    It uses prp internally, with different defaults.